Solution
During implementation, the Sombra team used the following tools and technologies:
AWS stack: data lake on Amazon S3, DWH on Amazon Redshift, Apache Spark on Amazon EMR, Apache Airflow on Amazon MWAA
Clients: Sisense, Holistics
To deliver the solution, we:
- Implemented distributed data processing using Apache Spark on Amazon EMR, designed to handle data at a scale of hundreds of gigabytes.
- Established a three-layer data architecture (Raw → Trusted → Analytical) to deliver high-quality, normalized analytical data to data clients for business intelligence (BI) and reporting.
- Used Apache Airflow to orchestrate complex data pipelines that integrate various data sources, including files, databases, and APIs.
- Implemented robust data security measures, including encryption of sensitive data in transit and at rest, strict data access policies, and other security best practices.
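
To make the Raw → Trusted → Analytical flow more concrete, here is a minimal sketch in plain Python. It is an illustration only, not the project's actual Spark code: the sample records, field names, and the functions `to_trusted` and `to_analytical` are hypothetical, and a production version would run the same steps as distributed jobs over files in the data lake.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal, InvalidOperation

# Hypothetical records as they might land in the Raw layer:
# untyped strings, inconsistent formatting, duplicates, bad values.
RAW_ORDERS = [
    {"order_id": "1", "customer": " Alice ", "amount": "19.90", "day": "2024-01-05"},
    {"order_id": "2", "customer": "bob", "amount": "5.10", "day": "2024-01-05"},
    {"order_id": "2", "customer": "bob", "amount": "5.10", "day": "2024-01-05"},
    {"order_id": "3", "customer": "Carol", "amount": "n/a", "day": "2024-01-06"},
]


def to_trusted(raw_rows):
    """Raw -> Trusted: validate values, normalize types, drop duplicates."""
    seen_ids = set()
    trusted = []
    for row in raw_rows:
        try:
            # Store money as integer cents to avoid floating-point drift.
            amount_cents = int(Decimal(row["amount"]) * 100)
            day = date.fromisoformat(row["day"])
        except (InvalidOperation, ValueError):
            continue  # reject rows that fail validation
        if row["order_id"] in seen_ids:
            continue  # reject duplicate order IDs
        seen_ids.add(row["order_id"])
        trusted.append(
            {
                "order_id": int(row["order_id"]),
                "customer": row["customer"].strip().lower(),
                "amount_cents": amount_cents,
                "day": day.isoformat(),
            }
        )
    return trusted


def to_analytical(trusted_rows):
    """Trusted -> Analytical: aggregate into a reporting-friendly shape."""
    revenue_by_day = defaultdict(int)
    for row in trusted_rows:
        revenue_by_day[row["day"]] += row["amount_cents"]
    return dict(sorted(revenue_by_day.items()))


if __name__ == "__main__":
    trusted_orders = to_trusted(RAW_ORDERS)
    print(to_analytical(trusted_orders))
```

In this sketch each layer has a single responsibility: the Raw data is kept untouched, the Trusted layer holds only validated and typed rows, and the Analytical layer exposes aggregates that a BI tool could query directly.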