Setting consumer partition

GiuseppeR · 18 October 2021 09:27

Hi I have a data source which write on a meteodata Kafka topic with 3 partitions (replica factor 1, I have only 1 broker).
I need to create a consumer group which read data from the topic meteodata and from partition with id 0 and write these data on a database.
How can I set with a python client the partition where consumer reads data? In the other 2 partition data are available?
Any help is apreciated.
g

rick · 19 October 2021 13:26

@GiuseppeR the Confluent provided python client supports the consumer.assign function. Note: When using assign instead of subscribe, you are manually managing the consumer partition assignment and so consumer groups are not relevant. This stack over flow answer describes this well:

Here is a stack overflow answer for the python assign usage:

The API documentation is here:
http://docs.confluent.io/current/clients/confluent-kafka-python/index.html

Good luck

GiuseppeR · 20 October 2021 09:24

Thank you @rick. I try to test 2 consumers of the same groupid:

# Create Consumer instance
consumer_conf = ccloud_lib.pop_schema_registry_params_from_config(conf)
consumer_conf['group.id'] = 'dbsink_consumer'
consumer_conf['auto.offset.reset'] = 'earliest'
consumer = Consumer(consumer_conf)


# Read arguments and configurations and initialize
args = ccloud_lib.parse_args()
config_file = args.config_file
topic = args.topic
conf = ccloud_lib.read_ccloud_config(config_file)

# Create Producer instance
producer_conf = ccloud_lib.pop_schema_registry_params_from_config(conf)
producer = Producer(producer_conf)
# Subscribe to topic
consumer.subscribe([topic])

When the 2 consumers run at the same time, it seems one consumer haven’t data from the partition (I have a topic with 3 partitions hosted on the same broker). So I have a question: Kafka with a topic with 3 partitions writes the same data on the 3 partitions? If Kafka writes the same data on the 3 partitions It seems using consumer.assign is the solution: manually I decide where a consumer takes data…But I have soome doubt if Kafka write the same data on the 3 topics.
Thanks.

rick · 20 October 2021 12:50

Let me try and focus on the producer side of the question.

Partitioning strategy refers to the method that a producer uses to decide which partition to send records to. The strategy is configurable, however, there is a default strategy. Depending on which version Apache Kafka you are using will determine the default strategy. With Kafka 2.4 and later the default strategy is called the Sticky Partitioner. What’s most important to know about this is that the strategy depends a lot on the Key you assign the record you are producing. If you are assigning a Key to the record, the strategy will calculate a hash from the key and use that to determine which partition to send the record to. This is done to attempt to send records of the same key to the same partition over time. If your records do not have a key, the Sticky Partitioner will send records to random partitions (which balances out over time).

Here is a great blog post with the details:

I hope this helps explain the partitioning strategy and that may help you decide if you need to manage your own consumer partition assignment or you can (preferably) use the Consumer Group feature to allow the Consumer Group Coordinator do that work for you.

GiuseppeR · 20 October 2021 15:30

Hi @rick thank you for your precious help. I’m using Confluent Platform 6.1.0.
I try to use the consumer groups, so each consumer is part of a consumer group: I have 3 consumers in 3 differents consumer groups, I have always 3 partitions of the topic consumed from consumers.
In this manner the systems seems work…but I’m not sure all messages received from my topic are consumed at the same manner from the 3 consumers.

This is a code part of my producer:

# Read arguments and configurations and initialize
args = ccloud_lib.parse_args()
config_file = args.config_file
topic = args.topic
conf = ccloud_lib.read_ccloud_config(config_file)

def acked(err, msg):
    global delivered_records
    """Delivery report handler called on
    successful or failed delivery of message
    """
    if err is not None:
        print("Failed to deliver message: {}".format(err))
    else:
        delivered_records += 1
# Create Producer instance
producer_conf = ccloud_lib.pop_schema_registry_params_from_config(conf)
producer = Producer(producer_conf)

producer.produce(topic, key=record_key, value=record_value, on_delivery=acked)
producer.poll(1)

Is it correct using this approach?
Thanks.

rick · 20 October 2021 17:56

Can you try and elaborate on what you see that leads you to believe there is some kind of issue with your applications?

GiuseppeR · 22 October 2021 07:48

Hi @rick I suppose Kafka writes different data in the 3 partitions. I try to create a new topic with one partition and 3 different consumers in 3 different consumer groups.
With some experiments, I saw with a topic with 3 partitions and 3 consumers in 3 different consumer groups, one consumer seems to not receive some data (I set an alert if a value is over a threshold) but I’m sure this threshold was passed but I haven’t received any alert.
Thanks.
Thanks.

rick · 22 October 2021 13:27

@GiuseppeR you could try further debugging with kcat a very clean and simple command line utility for Kafka. Here is an example I used to look at some data in a topic with multiple partitions.

kcat -F <my-config-file> -C -t six-parts -o beginning -f "%t:%p:%o\n"

Here i’m priting everything from the topic six-parts and for each record i’m printing out the topic, partition, and offset. More details on how to use this on the README in the GitHub repository linked above.

Topic		Replies	Views
I am using node-rdkafka. How can I assign consumer to read specific partitions? Confluent Cloud	0	1525	20 July 2023
Partition assignment for consumer concurrency Clients	0	1611	23 September 2023
Confluent_kafka - Python consumer not injesting from all the Partitions of a Topic at the same time Non-Java Clients	0	3546	18 June 2021
Need a little help regarding Kafka partition assignment Certification and Training	3	114	26 August 2024
Multiple consumers, single partition Stream Processing	1	22	18 June 2025

Setting consumer partition

Related topics