Tabular Iceberg Sink Connector - Cannot read from Trino

Hi,

We are currently exploring Kafka Connect to read data from Kafka (specifically, the Azure Event Hubs Kafka endpoint) and write it to Iceberg tables. We are using Polaris for the REST catalog and Azure as the object store.

I created a catalog in Polaris using the following payload:

{
  "catalog": {
    "type": "INTERNAL",
    "name": "catalogName",
    "properties": {
      "default-base-location": "abfss://container@storage-account.dfs.core.windows.net"
    },
    "storageConfigInfo": {
      "storageType": "AZURE",
      "tenantId": "xxxxxxxxxxxxx",
      "allowedLocations": ["abfss://container@storage-account.dfs.core.windows.net/warehouse"]
    }
  }
}
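For reference, here is a sketch of how that payload can be submitted to create the catalog. The `/api/management/v1/catalogs` path is an assumption based on a default Polaris deployment, and the host and bearer token are placeholders; the request is built but not sent here.

```python
import json
from urllib import request

# Assumed Polaris management endpoint; adjust host/port to your deployment.
POLARIS_URL = "http://apache-polaris:8181/api/management/v1/catalogs"

# Same payload as above.
payload = {
    "catalog": {
        "type": "INTERNAL",
        "name": "catalogName",
        "properties": {
            "default-base-location": "abfss://container@storage-account.dfs.core.windows.net"
        },
        "storageConfigInfo": {
            "storageType": "AZURE",
            "tenantId": "xxxxxxxxxxxxx",
            "allowedLocations": [
                "abfss://container@storage-account.dfs.core.windows.net/warehouse"
            ],
        },
    }
}

def build_create_catalog_request(token: str) -> request.Request:
    """Build the POST that would create the catalog (not sent here;
    pass it to urllib.request.urlopen against a live Polaris instance)."""
    return request.Request(
        POLARIS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

req = build_create_catalog_request("xxxxxx")  # placeholder token
print(req.get_method(), req.full_url)
```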

Next, I created a namespace in Polaris and set up a sink connector in Kafka Connect with the following configuration:

{
  "connector.class": "io.tabular.iceberg.connect.IcebergSinkConnector",
  "tasks.max": "2",
  "topics": "metric",
  "iceberg.tables": "feed.replay-messages",
  "iceberg.tables.auto-create-enabled": "true",
  "iceberg.tables.schema-force-optional": "true",
  "iceberg.catalog.type": "rest",
  "iceberg.catalog.uri": "http://apache-polaris:8181/api/catalog",
  "iceberg.catalog.io-impl": "org.apache.iceberg.azure.adlsv2.ADLSFileIO",
  "iceberg.catalog.include-credentials": "true",
  "iceberg.catalog.warehouse": "catalogName",
  "iceberg.catalog.token": "xxxxxx",
  "name": "sink-feed",
  "key.converter": "org.apache.kafka.connect.json.JsonConverter",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "key.converter.schemas.enable": "false",
  "value.converter.schemas.enable": "false"
}
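For completeness, a connector config like the one above is normally registered through the Kafka Connect REST API with an idempotent `PUT /connectors/<name>/config` (the standard create-or-update endpoint). A minimal sketch; the `kafka-connect:8083` hostname is a placeholder, and the request is built but not sent here:

```python
import json
from urllib import request

# Assumed Kafka Connect REST endpoint; adjust to your deployment.
CONNECT_URL = "http://kafka-connect:8083"

# Same connector config as above (the "name" key is carried in the URL).
config = {
    "connector.class": "io.tabular.iceberg.connect.IcebergSinkConnector",
    "tasks.max": "2",
    "topics": "metric",
    "iceberg.tables": "feed.replay-messages",
    "iceberg.tables.auto-create-enabled": "true",
    "iceberg.tables.schema-force-optional": "true",
    "iceberg.catalog.type": "rest",
    "iceberg.catalog.uri": "http://apache-polaris:8181/api/catalog",
    "iceberg.catalog.io-impl": "org.apache.iceberg.azure.adlsv2.ADLSFileIO",
    "iceberg.catalog.include-credentials": "true",
    "iceberg.catalog.warehouse": "catalogName",
    "iceberg.catalog.token": "xxxxxx",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter.schemas.enable": "false",
}

def build_upsert_request(name: str, cfg: dict) -> request.Request:
    """PUT /connectors/<name>/config creates or updates the connector
    idempotently; the request is returned, not sent."""
    return request.Request(
        f"{CONNECT_URL}/connectors/{name}/config",
        data=json.dumps(cfg).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = build_upsert_request("sink-feed", config)
print(req.get_method(), req.full_url)
```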

I also set the environment variables AZURE_CLIENT_ID, AZURE_TENANT_ID, and AZURE_CLIENT_SECRET in both the Kafka Connect and Polaris pods.

The table gets created successfully, and I can see both the metadata and data folders. The metadata folder contains a JSON file, and the data folder has a few Parquet files.

However, when I attempt to read the table from Trino (after setting up the catalog with the correct configuration), I only see the column names, but no rows are returned. I’ve checked the Parquet files in Azure Blob Storage, and they do contain data.
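One way to narrow this down: Trino's Iceberg connector exposes metadata tables that show what the table's metadata actually points at. Iceberg reads are snapshot-based, so if no snapshot was ever committed, Trino returns an empty result even when Parquet files sit in the data folder. A sketch, assuming a Trino catalog named `iceberg` (catalog and table names here are placeholders; adjust to yours):

```sql
-- If this returns no rows, the sink never committed a snapshot,
-- which would explain an empty table despite existing Parquet files.
SELECT * FROM iceberg.feed."replay-messages$snapshots";

-- Lists the data files tracked by the current snapshot.
SELECT * FROM iceberg.feed."replay-messages$files";
```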

Could someone help me resolve this issue?

Help me understand: where are you reading from with Trino?

If you are querying through Trino's Kafka connector, perhaps this note from the Trino docs would help:

“Topics can be live. Rows appear as data arrives, and disappear as segments get dropped. This can result in strange behavior if accessing the same table multiple times in a single query (e.g., performing a self join).”

Hi, thanks for replying. I am reading from Event Hubs (via its Kafka endpoint) and ingesting into Iceberg tables in Azure Storage, which is where all my data lands for warehousing purposes. I am then trying to read that Iceberg data with Trino. I am not querying Kafka directly.

Aha. Then I’d think filing an issue with Trino or reaching out to their community would be the best next step.