
Data Operations + Agile
Data Operations (‘DataOps’) is inspired by the Agile-premised ‘Development Operations’ (‘DevOps’) model. The ‘DevOps’ model, which usually includes security (DevSecOps…
Read More »

The icebergth is hereth. Apache Iceberg is an open-source table format for large-scale data systems, designed to provide efficient and reliable management of s…
Read More »
Databricks LakeFlow is built on top of Databricks Workflows and Delta Live Tables. It provides orchestration comparable to Apache Airflow, built into the Databricks eco syst…
Read More »
A straightforward method to automate data ingestion from S3 buckets (data lake) to a Redshift (data warehouse) cluster, using Glue. Create a Redshift cluster…
Read More »
[Data engineering lifecycle from “Fundamentals of Data Engineering” by Joe Reis and Matt Housley] Data Ingestion Challenges Data ingestion can be complicated. There are usu…
Read More »
AWS Glue is a metadata catalogue service with Extract-Transform-Load logic. The Glue catalogue is based on Hive and consists of a MySQL DB and a Java front end. Glue &…
Read More »
Data flowing into the Data Lake changes over time. Table changes are captured by change data capture (CDC). Changes in the source database are delivered …
Read More »
Amazon Redshift is a petabyte-scale columnar data warehouse that is efficient at storing raw data and collecting data from various sources. Redshift su…
Read More »
Data products are the end result of file or data movements to the cloud, ETL, processing, de-duplication, curation, and storage in a consumable layer. There is …
Read More »