Data Engineering Solution Cuts Costs by 4x & Enables Real-Time Reporting for eClinical Company

Services:

Industry:

Healthcare

Location:

USA

Client since:

2021

Business Challenge

The Client had several data sources stored in different databases (e.g., client and order databases), and analyzing them required access to all of that data. As the company’s customer base grew, it faced the challenge of accumulating large amounts of data securely and cost-effectively, in near real time, for analysis and reporting.

Time- and cost-consuming data aggregation

The Client was spending 4x more on its data pipeline, which ran at only 70% effectiveness. Additionally, data arrived with a delay of 2 hours.

Inability to generate reports

Data analysts on the Client’s side knew how to build the reports they needed, but the process was cumbersome because there was no single place where all data was stored. Report generation used to overload the data and analytics team.

Lack of expertise in data engineering

The Client was looking for a skilled data engineering partner. Sombra was a good fit, since it has extensive experience, focuses on what matters to the Client’s business, and gets things done right. The results speak for themselves.

How we worked

Agreed on sharing responsibilities with the Client

The Sombra team suggested a dedicated development team for the Client. Under this engagement model, both parties share responsibilities as follows:

Client

  • Establishes and manages priorities.
  • Defines and drives milestones for the project.
  • Increases or decreases the project scope during our cooperation.

Sombra

  • Manages the development team to achieve defined goals.
  • Keeps the Client informed about team progress through recurring syncs.

Strategic alignment

The Client had previously developed a solution, so Sombra needed to evaluate it and define the most efficient way to improve it, allowing the Client to see fast ROI for their business. To deliver a solid product with minimal change requests, our team took the following steps:

  • Assessed the existing solution.
  • Analyzed the Client’s functional and non-functional requirements.
  • Provided our recommendations for the new system.
  • Approved this new vision with the Client.

Solution design

With the vision approved, the team moved forward with developing a brand-new architecture based on new approaches and a modern tech stack.

Solution

During the implementation phase, the Sombra Team of professionals used the following tools and technologies:

AWS stack: data lake on Amazon S3, DWH on Amazon Redshift, Apache Spark on Amazon EMR, Apache Airflow on Amazon MWAA
Data clients: Sisense, Holistics

To deliver the solution, we:

  • Implemented distributed data processing using Apache Spark on Amazon EMR, designed to handle data at a scale of hundreds of gigabytes.
  • Established a three-layer data architecture (Raw → Trusted → Analytical) to ensure the delivery of high-quality, normalized analytical data to Data Clients for business intelligence (BI) and reporting purposes.
  • Orchestrated complex data pipelines that integrate various data sources, including files, databases, and APIs, using Apache Airflow.
  • Implemented robust data security measures, including encryption of sensitive data both in transit and at rest, along with strict data access policies and other best practices for data security.
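
The Raw → Trusted → Analytical layering described above can be illustrated with a minimal sketch. This is not the project's code: the actual solution ran on Apache Spark and Amazon S3, whereas the example below uses plain Python so it runs anywhere. The sample records, field names, and the functions to_trusted and to_analytical are all hypothetical and serve only to show what each layer is responsible for.

```python
from collections import defaultdict

# Hypothetical raw records, kept exactly as they might arrive from a source
# system: everything is a string, and there are gaps and duplicates.
RAW_ORDERS = [
    {"order_id": "1", "client_id": "c1", "amount": "120.50"},
    {"order_id": "2", "client_id": "c2", "amount": "80"},
    {"order_id": "2", "client_id": "c2", "amount": "80"},    # duplicate row
    {"order_id": "3", "client_id": "", "amount": "15.25"},   # missing client
    {"order_id": "4", "client_id": "c1", "amount": "n/a"},   # unparseable amount
]


def to_trusted(raw_rows):
    """Raw -> Trusted: validate rows, cast types, and drop duplicates."""
    seen_ids = set()
    trusted = []
    for row in raw_rows:
        if not row["client_id"]:
            continue  # reject rows without a client reference
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows whose amount cannot be parsed
        if row["order_id"] in seen_ids:
            continue  # keep only the first copy of each order
        seen_ids.add(row["order_id"])
        trusted.append(
            {
                "order_id": int(row["order_id"]),
                "client_id": row["client_id"],
                "amount": amount,
            }
        )
    return trusted


def to_analytical(trusted_rows):
    """Trusted -> Analytical: aggregate into a report-ready shape."""
    totals = defaultdict(float)
    for row in trusted_rows:
        totals[row["client_id"]] += row["amount"]
    return dict(totals)


trusted = to_trusted(RAW_ORDERS)
report = to_analytical(trusted)
print(report)
```

In this sketch only the analytical output would be exposed to BI tools, while the raw layer stays untouched so that the trusted and analytical layers can be rebuilt if validation rules change.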

Business Value

Applying proven methodologies, the Sombra team built a solution that addressed the Client’s initial pain points. Moreover, the project was delivered on time and within scope, helping the Client achieve their business goals without unnecessary complications.

  • The new approach reduced regular data infrastructure costs by 50%.
  • Implementing modern technologies enabled near real-time data syncing with delays of no more than 2 minutes, improving data accuracy and timeliness.
  • The Client achieved 99.9% reliability in data pipeline workflows, ensuring uninterrupted operations.
