
Dagster & Iceberg

This library provides I/O managers for reading and writing Apache Iceberg tables. It also provides a Dagster resource for accessing Iceberg tables.

Installation

uv add dagster-iceberg

Example

import pyarrow as pa
from dagster_iceberg.config import IcebergCatalogConfig
from dagster_iceberg.io_manager.arrow import PyArrowIcebergIOManager

import dagster as dg


@dg.asset
def my_table() -> pa.Table:
    n_legs = pa.array([2, 4, 5, 100])
    animals = pa.array(["Flamingo", "Horse", "Brittle stars", "Centipede"])
    names = ["n_legs", "animals"]
    return pa.Table.from_arrays([n_legs, animals], names=names)


defs = dg.Definitions(
    assets=[my_table],
    resources={
        "io_manager": PyArrowIcebergIOManager(
            name="default",
            config=IcebergCatalogConfig(
                properties={
                    "type": "sql",
                    "uri": "sqlite:////tmp/warehouse/pyiceberg_catalog.db",
                    "warehouse": "file:///tmp/warehouse",
                }
            ),
            namespace="default",
        )
    },
)
)

About Apache Iceberg

Iceberg is a high-performance format for huge analytic tables. It brings the reliability and simplicity of SQL tables to big data, while making it possible for multiple engines to safely work with the same tables, at the same time.