Tabular Iceberg Sink Connector - Cannot read from Trino

Hi,

We are currently exploring Kafka Connect to read data from Kafka (specifically, the Azure Event Hubs Kafka endpoint) and write it to Iceberg tables. I'm using Polaris for the REST catalog and Azure as the object store.

I created a catalog in Polaris using the following payload:

{
  "catalog": {
    "type": "INTERNAL",
    "name": "catalogName",
    "properties": {
      "default-base-location": "abfss://container@storage-account.dfs.core.windows.net"
    },
    "storageConfigInfo": {
      "storageType": "AZURE",
      "tenantId": "xxxxxxxxxxxxx",
      "allowedLocations": ["abfss://container@storage-account.dfs.core.windows.net/warehouse"]
    }
  }
}
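For reference, I submitted that payload to the Polaris management API roughly as follows. The endpoint path is the one from the Polaris quickstart, and $POLARIS_TOKEN is a bearer token I obtained beforehand:

curl -X POST "http://apache-polaris:8181/api/management/v1/catalogs" \
  -H "Authorization: Bearer $POLARIS_TOKEN" \
  -H "Content-Type: application/json" \
  -d @create-catalog.json   # file containing the payload above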

Next, I created a namespace in Polaris and set up a sink connector in Kafka Connect with the following configuration:

{
  "connector.class": "io.tabular.iceberg.connect.IcebergSinkConnector",
  "tasks.max": "2",
  "topics": "metric",
  "iceberg.tables": "feed.replay-messages",
  "iceberg.tables.auto-create-enabled": "true",
  "iceberg.tables.schema-force-optional": "true",
  "iceberg.catalog.type": "rest",
  "iceberg.catalog.uri": "http://apache-polaris:8181/api/catalog",
  "iceberg.catalog.io-impl": "org.apache.iceberg.azure.adlsv2.ADLSFileIO",
  "iceberg.catalog.include-credentials": "true",
  "iceberg.catalog.warehouse": "catalogName",
  "iceberg.catalog.token": "xxxxxx",
  "name": "sink-feed",
  "key.converter": "org.apache.kafka.connect.json.JsonConverter",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "key.converter.schemas.enable": "false",
  "value.converter.schemas.enable": "false"
}
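The connector itself was registered through the Kafka Connect REST interface, roughly like this (host and port are placeholders for our Connect deployment):

curl -X PUT "http://kafka-connect:8083/connectors/sink-feed/config" \
  -H "Content-Type: application/json" \
  -d @sink-feed.json   # file containing the configuration above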

I also set the environment variables AZURE_CLIENT_ID, AZURE_TENANT_ID, and AZURE_CLIENT_SECRET in both the Kafka Connect and Polaris pods.

The table gets created successfully, and I can see both the metadata and data folders. The metadata folder contains a JSON file, and the data folder has a few Parquet files.

However, when I attempt to read the table from Trino (with the catalog set up roughly as shown below), I only see the column names; no rows are returned. I've checked the Parquet files in Azure Blob Storage, and they do contain data.
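For completeness, my Trino catalog file looks roughly like this (values redacted; property names are as I recall them from the Trino Iceberg and Azure filesystem docs, so verify them against your Trino version):

cat > etc/catalog/iceberg.properties <<'EOF'
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://apache-polaris:8181/api/catalog
iceberg.rest-catalog.warehouse=catalogName
iceberg.rest-catalog.security=OAUTH2
iceberg.rest-catalog.oauth2.token=xxxxxx
fs.native-azure.enabled=true
azure.auth-type=OAUTH
azure.oauth.tenant-id=xxxxxxxxxxxxx
azure.oauth.client-id=xxxxxxxxxxxxx
azure.oauth.secret=xxxxxxxxxxxxx
EOF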

Could someone help me resolve this issue?

Help me understand: where are you reading from with Trino?

If you are querying Kafka directly through Trino's Kafka connector, perhaps this note from the Trino docs would help:

“Topics can be live. Rows appear as data arrives, and disappear as segments get dropped. This can result in strange behavior if accessing the same table multiple times in a single query (e.g., performing a self join).”

Hi, thanks for replying. I am reading from Event Hubs (via the Kafka endpoint) and ingesting into Iceberg on Azure Storage; that is where all my data lands for warehousing purposes. I am then trying to read that Iceberg data with Trino. I am not querying Kafka directly.

Aha. Then I’d think filing an issue with Trino or reaching out to their community would be the best next step.
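One quick check before you do, in case it narrows things down: as I understand it, the sink connector writes data files first and only commits a snapshot on an interval, so "columns but no rows" can simply mean no snapshot has landed in the table metadata yet. Assuming your Trino catalog is named iceberg, something like this would show the commit history:

trino --server http://trino:8080 --execute '
  SELECT committed_at, snapshot_id, operation
  FROM iceberg.feed."replay-messages$snapshots"'
# An empty result means no commit has happened yet,
# even though data files already exist in storage.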
