
Example from Financial Services

Data lakes typically follow a medallion architecture, which comprises the following layers and components:
| Architectural Layer | Purpose | Key Technologies & Role |
| --- | --- | --- |
| Bronze Layer (Raw) | The initial landing zone for all data. It’s an immutable, permanent record of the raw data as it was sourced. | Azure Data Factory, AWS Glue, Python: Used for data ingestion from various sources. Azure Data Lake Storage Gen2, Amazon S3: Provide the foundational storage for the raw data. |
| Data Transformation | This is where raw data is transformed into a clean, structured, and prepared state. | Databricks, Amazon SageMaker: The compute engines used to read, transform, and write data. Databricks supports Iceberg tables. |
| Apache Iceberg | An open-source table format that adds a reliable layer on top of your data lake storage. It enables ACID transactions, schema evolution, and time travel. | Apache Iceberg: Sits between Databricks and Data Lake Storage, managing table metadata and ensuring data reliability. |
| Silver Layer (Prepared) | Contains the organization’s “single source of truth.” The data here is cleaned, standardized, and ready for use. | Apache Iceberg tables on Amazon S3 or Azure Data Lake Storage Gen2: Data is stored as Iceberg tables in your data lake, which gives the cleaned data transactional consistency. |
| Gold Layer (Consumption) | Consists of highly curated, use-case-specific datasets. The data is optimized for fast consumption and reporting. | Azure Synapse Analytics, AWS Glue: Act as the high-performance analytics engines that read from the Silver layer. |
| Consumption Tools | These are the end-user tools that connect to the Gold layer to access and visualize the data. | Power BI, Amazon QuickSight: Used for business intelligence, reporting, and creating dashboards. Machine Learning Models: Consume the prepared data for advanced analytics and model training. |
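
The flow through the layers above can be sketched in a few lines of plain Python. This is only an illustrative stand-in: a real pipeline would run on an engine such as Databricks and write Iceberg tables, and the record fields, sample transactions, and the function names `to_silver` and `to_gold` are all hypothetical, chosen here to fit a financial-services example.

```python
from collections import defaultdict
from decimal import Decimal, InvalidOperation

# Bronze: raw records kept exactly as sourced (hypothetical sample data).
# A tuple is used so the raw landing zone is never modified in place.
BRONZE = (
    {"txn_id": "T1", "account": " acc-001 ", "amount": "120.50", "currency": "usd"},
    {"txn_id": "T2", "account": "ACC-002", "amount": "75.00", "currency": "USD"},
    {"txn_id": "T2", "account": "ACC-002", "amount": "75.00", "currency": "USD"},  # duplicate
    {"txn_id": "T3", "account": "ACC-001", "amount": "not-a-number", "currency": "USD"},
    {"txn_id": "T4", "account": "acc-001", "amount": "30.25", "currency": "USD"},
)


def to_silver(bronze):
    """Silver: drop invalid rows, deduplicate, and standardize values."""
    seen = set()
    silver = []
    for row in bronze:
        if row["txn_id"] in seen:
            continue  # skip duplicate transactions
        try:
            amount = Decimal(row["amount"])
        except InvalidOperation:
            continue  # skip rows whose amount cannot be parsed
        seen.add(row["txn_id"])
        silver.append({
            "txn_id": row["txn_id"],
            "account": row["account"].strip().upper(),
            "amount": amount,
            "currency": row["currency"].upper(),
        })
    return silver


def to_gold(silver):
    """Gold: a use-case-specific aggregate, here the total amount per account."""
    totals = defaultdict(Decimal)  # Decimal() defaults to 0
    for row in silver:
        totals[row["account"]] += row["amount"]
    return dict(totals)


silver = to_silver(BRONZE)
gold = to_gold(silver)
for account, total in sorted(gold.items()):
    print(account, total)
```

The Bronze data is never changed; each later layer is derived from the one before it, so Silver and Gold can be rebuilt from the raw record whenever the cleaning or aggregation rules change.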