Trilogix Cloud

Cloud Models
Apache Iceberg: an overview
The icebergth is hereth. Apache Iceberg is an open-source table format for large-scale data systems, designed to provide efficient and reliable management of s…
Data
Databricks Lakeflow (replaces Airflow)
Databricks LakeFlow is built on top of Databricks Workflows and Delta Live Tables. It is an implementation of Apache Airflow built into the Databricks eco syst…
AWS Technology
Data Partitioning
Data files or tables are parsed into smaller units. This is also called ‘partitioning’. A partition is usually performed against a primary attribut…
Cloud Design Goals
Parquet file format for Data Lakes
Parquet is a file format standard used in many enterprises. It allows the standardisation of files and provides a common framework for queries and storage. Par…
Cloud Models
Databricks and Snowflake: Summary
Databricks and Snowflake overlap in many areas. Firms deploying both need to clearly demarcate the epics and use case journeys to be supported by the technolog…
Data
Automating S3 to Redshift with Glue
A straightforward method to automate data ingestion from S3 buckets (data lake) to a Redshift (data warehouse) cluster by using Glue. Create a Redshift cluster…
AWS Technology
AWS and Data Pipeline Ingestion
[Figure: Data engineering lifecycle, from “Fundamentals of Data Engineering” by Matt Housley] Data Ingestion Challenges: Data ingestion can be complicated. There are usu…
AWS Technology
AWS Glue introduction
AWS Glue is a metadata catalogue service with Extract-Transform-Load logic. The Glue catalogue is based on Hive and is a MySQL DB and a Java front end. Glue &…
Data
Data Lake Design and Change Data Capture
Data flowing into the Data Lake changes over time. Changes to data tables are captured by change data capture (CDC). Changes in the source database are delivered …

Category: Data