Create a KSQLDB Cluster and split workload

Hi, everyone. I’m creating some microservices and I’d like to use KSQLDB.

I plan to have more than one instance of each service where I will materialize some views.

I’ve seen that is possible to split/partition this materialized views contents accross these instances, like said in this confluent article:

To allow users to GET any order, the Orders Service creates a queryable materialized view (‘Orders View’, using a state store in each instance of the service, so any Order can be requested historically. Note also that the Orders Service is partitioned over three nodes, so GET requests must be routed to the correct node to get a certain key. This is handled automatically using the [nteractive Queries to expose the HTTP endpoint. (Alternatively, we could also implement this view with an external database, via Kafka Connect.)

I’ve done some tests and I created a KSQLDB cluster by setting the same ‘KSQL_KSQL_SERVICE_ID’ (I’m using Docker) for both, also making sure one can talk to other (this is important for perform pull queries, for example). I also make sure my topic has more than one partition.

The problem is: it seems that when materializing a view each instance holds a full copy of the dataset, because I can pull querie any record from any instance.

I’d appreciate any help.

Cheers;

If you point to any ksqlDB server to issue a pull query, if it does not host certain record, it will forward the request to the other server automatically to answer the query.

The quote refers to Kafka Streams, not ksqlDB. Kafka Streams does not have such a routing layer.

Cf. https://www.confluent.io/blog/ksqldb-pull-queries-high-availability/

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.