Data Snapshot using KSQL

Hi there

I am hoping you can help.

Is there any way to terminate & catch the response without using limit?

{
“ksql”: “select * from USER_STREAM WHERE ID <=10 EMIT CHANGES;”,
“streamsProperties”: {“auto.offset.reset”:“earliest”}
}

When in the Confluent Control Centre (CC) or running this via the CLI I can see this stream being populated in real-time.

There are only 4 rows (ID between 1-4) in the stream. So this query will never end unless I append with ‘limit 4’, anything more and the query will keep running which, I appreciate, is intended & fine when working in a CLI or Confluent CC, or in an application that can appear to stream the data.

However, I want to query this stream outside the KSQL CLI or Confluent CC- say in Postman or calling upon the MSXML2.serverXMLHTTP functionality within SQL or VBA, whereby the query needs to terminate to be able to get the response - giving me a snapshot in time of the data on how it currently stands.

Upon a bit of research I understood that I am to “materialize” the stream in to a table, somehting like this:
CREATE TABLE Materialised_Users AS
SELECT
ID,
LATEST_BY_OFFSET(Firstname) AS Firstname,
LATEST_BY_OFFSET(Lastname) AS Lastname,
LATEST_BY_OFFSET(Email) AS Email
FROM USER_STREAM
GROUP BY ID
EMIT CHANGES;

Which allows me to perform queries such as
{
“ksql”: “select * from MATERIALISED_USERS WHERE ID =2;”,
“streamsProperties”: {“auto.offset.reset”:“earliest”}
}

Which terminates and gives a response in VBA/Postman etc.

However if I wanted to perform a more ambiguous query such as
{
“ksql”: “select * from MATERIALISED_USERS WHERE ID <=10;”,
“streamsProperties”: {“auto.offset.reset”:“earliest”}
}

I get the response:

“error_code”: 40001,

“Bound on ‘ID’ must currently be ‘=’. Add EMIT CHANGES if you intended to issue a push query. Pull queries require a WHERE clause that: - limits the query to a single key, e.g. SELECT * FROM X WHERE <key-column>=Y;.”

But adding EMIT CHANGES puts me back to the never terminating issue again and I don’t just want to pull one record back.

I hope this makes sense, happy to provide more info on request but any help would be really appreciated.

Thanks,
Stu

Seems you are using an older version of ksqlDB. Since version 0.17.0, range pull queries on tables are supported.

Cf. https://www.confluent.io/blog/announcing-ksqldb-0-17-0-new-features-and-updates/#table-scans

Btw: we are also working on “pull queries on STREAMS”: ksql/klip-53-pull-queries-on-streams.md at ecb43e2ffb8450b3902968a2709c86898291dbc7 · confluentinc/ksql · GitHub – Ie, it should be allowed to issue SELECT * FROM stream without the EMIT CHANGES clause and the query will terminate when it reaches the end of the topic, thus the workaround to create a TABLE won’t be necessary any longer.

Hi Matthias,

Thank you for your response. I will look to update the ksqlDB version and try the create table work around again.
Nice to know you are working on allowing pull queries on streams.

I will aim to feedback after I have updated ksqlDB.

Thanks,
Stu

1 Like

This topic was automatically closed after 30 days. New replies are no longer allowed.