Lead-in: Enterprises and organizations face extreme data challenges, both in terms of volume and the number of heterogeneous sources on-premises and across many clouds. By combining the best of data warehouses and data lakes into its unique Lakehouse architecture, Informatica offers fast, secure access to all of an organization’s data, be it structured, semi-structured or unstructured.
Body: Modern enterprises routinely generate enormous volumes of data, which are difficult to process, store and use correctly without the right approach. Artificial intelligence (AI) has raised the stakes considerably: before an enterprise can implement AI in its workflows, it must prepare data at the right scale across its entire lifecycle. Data integration addresses another key aspect of the data deluge: it brings the data streams from across a business (including customer data, orders, supply chain and logistics metrics, accounting, and more) together into a single, coherent and cohesive repository.
As data collection and storage techniques and technologies have advanced, businesses have worked with a variety of aggregation, organization and storage tools, most notably data warehouses, data lakes and, more recently, data lakehouses.
A data lakehouse offers a unified platform where raw and curated data can coexist, where auditing and governance hold sway, and where data and development teams can build data pipelines without duplication across systems. This makes a lakehouse well suited to AI, machine learning, analytics, data science and a great deal more. Please note that Informatica offers a variety of market-leading data lakehouse platforms and solutions.
Lakehouse architectures give enterprises the ability to access and query data directly; there is no need to extract data from storage and process it through a data warehouse before use. This architecture permits direct access to schema-on-write (structured) data, and safely and securely handles, on the fly, the schema-on-read processing needed for unstructured and semi-structured data.
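The contrast between the two access patterns can be sketched in a few lines of Python. This is a minimal illustration only; the schema, field names and helper functions are invented for the example and are not part of any Informatica or lakehouse API.

```python
# Toy contrast between schema-on-write (shape data before storing it)
# and schema-on-read (store raw data, apply the schema at query time).
# The schema and records below are illustrative assumptions.
import json

SCHEMA = {"order_id": int, "amount": float}

def write_on_schema(record: dict) -> dict:
    """Schema-on-write: validate and coerce fields before the record is stored."""
    return {field: caster(record[field]) for field, caster in SCHEMA.items()}

def read_on_schema(raw: str) -> dict:
    """Schema-on-read: keep the raw text as-is; apply the schema only when queried."""
    parsed = json.loads(raw)
    return {field: caster(parsed[field]) for field, caster in SCHEMA.items()}

# Structured path: the record is shaped at write time.
stored = write_on_schema({"order_id": "42", "amount": "19.99"})

# Semi-structured path: the raw JSON blob (with extra fields) is shaped
# only when it is read, so ingestion never blocks on a schema check.
raw_blob = '{"order_id": 42, "amount": 19.99, "note": "gift"}'
queried = read_on_schema(raw_blob)
```

Either way, the consumer sees records in the same shape; the difference is simply when the schema is enforced.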
With the bulk of today’s data residing somewhere in a cloud, it’s vital that such data remain secure and private, and that its quality stay as high as technology will allow. Thus, enterprises must cope not only with higher data volumes and usage, but with data in every conceivable format, spread across many locations on-premises and in multiple public and private clouds.
The key to overcoming these challenges is to have the right data strategy in place, along with the right leaders to drive that strategy, and to eliminate silos so that data can flow where and when it’s needed, anywhere in the organization. This means ensuring data quality through proper preparation and labeling or tagging, within a secure, robust and scalable architecture that can handle huge data volumes. Informatica’s data lakehouse platforms and solutions take exactly this approach as their key motivation and thrust.
Open table formats provide specialized layouts and labels to manage and organize data in a lakehouse architecture. They rely on metadata to manage and maintain the physical data objects available for access and use, which lets applications and tools query all kinds of relevant data and find every item that matches a specific query. This approach accommodates open table formats such as Iceberg, Delta and Hudi, works with hyperscalers and data warehouses, and interoperates with interfaces into the major public clouds (e.g., AWS, Microsoft Azure, and GCP).
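The metadata-driven idea behind open table formats can be illustrated with a toy manifest. This is not a real Iceberg, Delta or Hudi implementation; the file names and statistics below are invented to show how a metadata layer lets a query engine find matching data without scanning every physical object.

```python
# Toy sketch of an open-table-format metadata layer: the manifest lists
# physical data files plus per-file statistics, so a query can use
# metadata alone to skip files that cannot contain matching rows.
# All file names and stats here are hypothetical.
manifest = [
    {"file": "orders-2023.parquet", "min_amount": 1.0,  "max_amount": 99.0},
    {"file": "orders-2024.parquet", "min_amount": 50.0, "max_amount": 500.0},
]

def files_for_query(min_needed: float) -> list[str]:
    """Return only the data files whose stats say they might match."""
    return [m["file"] for m in manifest if m["max_amount"] >= min_needed]

# A query for amounts >= 200 can rule out the 2023 file from metadata
# alone, so only the 2024 file needs to be opened and scanned.
candidates = files_for_query(200.0)
```

Real table formats layer much more on top of this (snapshots, schema evolution, ACID commits), but file pruning via metadata is a large part of why lakehouse queries are fast and cheap.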
This approach is popular because it works. In the final analysis, it makes accessing data inside a lakehouse faster, better and cheaper. That means more data can be processed and addressed more quickly, large volumes of data can be accessed at lower cost, and structured, semi-structured and unstructured data are all equally accessible.
Informatica supports open table formats in various ways. Its goal is to support all permutations of such formats, along with different file formats and different data catalogs. That means Iceberg, Delta and Hudi are on the roadmap. It means Informatica works with all hyperscalers, including AWS, Microsoft Fabric OneLake and ADLS Gen2, plus Google Cloud Storage. On the metadata side, Informatica also works with hyperscaler catalogs, plus the Hive Metastore, and REST-based catalogs such as Nessie and Gravitino. Indeed, Informatica’s goal is to support every possible permutation, to give customers what they want and require.
For more information about Informatica and its Data Lakehouse offerings, please consult the following: