Abstract
arXiv:2401.09621v1 [cs.DB] 17 Jan 2024
Contemporary approaches to data management increasingly rely
on unified analytics and AI platforms to foster collaboration,
interoperability, seamless access to reliable data, and high
performance. Data Lakes featuring open standard table formats
such as Delta Lake, Apache Hudi, and Apache Iceberg are central
components of these data architectures. Choosing the right format
for managing a table is crucial for achieving the objectives mentioned
above. The challenge lies in selecting the best format, a task
that is onerous and may yield only temporary results, as the ideal choice
may shift over time with data growth, evolving workloads, and the
competitive development of table formats and processing engines.
Moreover, restricting data access to a single format can hinder data
sharing, resulting in diminished business value over the long term.
The ability to interoperate seamlessly between formats, with
negligible overhead, can effectively address these challenges. Our
solution in this direction is an innovative omni-directional translator,
XTable, that facilitates writing data in one format and reading
it in any format, thus achieving the desired format interoperability.
In this work, we demonstrate the effectiveness of XTable through
application scenarios inspired by real-world use cases.