
Open table formats, such as Apache Iceberg, enable scale-out data warehousing directly on a data lake. This architecture has become known as a data lakehouse. Since the creation of enterprise data warehouses, such as Teradata and Oracle, customers have been locked in by proprietary data storage (i.e., table formats and metadata catalogs). Data lakes were open, but couldn’t deliver the same functionality and performance as data warehouses. Customers have adopted open table formats to achieve the combination of vendor-agnostic data representation and data warehouse capabilities and performance. In this blog post, we explore the journey of open table formats, and why the industry has selected Apache Iceberg as the open standard.
The story begins in 2019 when Netflix contributed Iceberg to the Apache Software Foundation. Netflix was already using Iceberg internally, and once it became an open source project it was quickly adopted by other large tech companies, including Apple, Adobe, and Airbnb. This move marked a strategic shift towards an open, vendor-agnostic solution for representing tables in object storage.
While Delta Lake emerged around the same time frame, it was not (and still is not) an open format. Over time, some of its source code was released, but key elements remained proprietary. For example, a Delta Lake-based lakehouse with the Databricks Unity Catalog cannot accept writes from non-Databricks compute engines, and other catalogs impose serious restrictions, such as not supporting concurrent writes. Iceberg clearly stood out: it is vendor-agnostic, and dozens of leading companies collaborate on its roadmap and actively contribute to the project. It quickly gained traction due to its open nature and compatibility with modern data teams’ needs.
Throughout 2021–2022, Apache Iceberg and Delta Lake were locked in a two-horse race. The tide began to turn in 2023 as most of the major players in the data lake/lakehouse market rallied behind Iceberg.
Iceberg distinguishes itself not only through its extensive ecosystem of technologies and products but also by the wide array of companies contributing to the project. This diversity brings several advantages, including rapid innovation and assurance that the project will continue to evolve independently of any single company. The following diagram illustrates the variety of contributors to the Iceberg project compared to those of the Delta Lake project:
