Why am I losing messages on read?

I’ve got a topic with replication = 3 and a producer with acks=1. Consumers did not see some of the messages. How can I avoid this without turning on acks=all?

Hi there! Could you please elaborate a little more? Are your consumers in a consumer group reading from that topic, or are they reading independently? How many partitions are in the topic?

50 partitions, in a consumer group.

Kafka only guarantees that a message is not lost after it has been replicated. If you use acks=1, you get the ack back before replication has finished. Thus, if there is a broker-side error and replication fails, the record could get lost.

If you want to avoid data loss, you need to wait until replication was successful, and thus use acks=all. Btw: you also need to set min.insync.replicas=2. The semantics of acks=all are related to min.insync.replicas.

  • acks=all does not mean that all followers need to have replicated a message
  • acks=all means that all in-sync followers need to have replicated a message
    • Thus, if everything is healthy and both followers are in-sync, you get the ack after both have replicated
    • However, if a follower starts to lag, it might drop out of the in-sync replica set. Thus, even with acks=all you might get the ack if only one, or even no, follower replicated the data (if both followers lag and dropped out of the in-sync set). – Using min.insync.replicas=2 can fix this, because if both followers drop out of the in-sync set, writes are rejected right away until at least one follower catches up again (see the config sketch below).
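For reference, here is a minimal producer sketch under those settings, assuming the plain Java client; the topic name, bootstrap address, and class name are placeholders. Note that min.insync.replicas=2 is a broker/topic setting, not a producer setting, so it is configured on the topic rather than in this code.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DurableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Wait until all in-sync replicas have the record before the ack comes back.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Optional: avoid duplicates introduced by internal retries.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            // With acks=all and min.insync.replicas=2 on the topic, this fires
                            // (e.g. NotEnoughReplicasException) if too few replicas are in sync.
                            exception.printStackTrace();
                        }
                    });
        }
    }
}
```

With this setup, a send only succeeds once at least two replicas (leader plus one follower) have the record; if fewer than two replicas are in sync, the write is rejected instead of silently risking data loss.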

I’m not losing messages on write (if acks=1, it means it will be 100% replicated sometime in the future if no failure happens)…

Messages are being skipped/swallowed on read.

Master - VVVVVVVVVVVVVV
Replica1 - VVVVVSSVVVSSVV
Replica2 - VVVSVSSVSSSVVV

V - ok
S - skipped by consumer

That’s weird, is that all from the same topic-partition? Maybe there is another consumer reading from the same topic, belonging to the same consumer group? How did you test that you were missing messages?

Same topic, not the same partition.

Totals did not match… when I turned on acks=all on the same code base, everything worked fine.

It was working fine in the dev env where replica = 0. We changed Kafka to Amazon MSK, where we had a 3-node cluster and replica = 2, and the problem appeared.

The other fact pointing to messages being skipped on read is this:

When I run the producer to the end and ONLY after it’s done start the consumer, everything is fine.

When I run the producer and consumer together, messages are skipped (when acks=1).

It’s hard to imagine that you lose data on read (ie, that the consumer skips messages). When do you commit offsets? Before or after you have finished processing? If you commit before, you get at-most-once semantics and it’s expected that messages might be skipped (eg, if a rebalance happens). If you commit after processing, you get at-least-once and no message should be skipped (you could read a single message multiple times though).
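To make that concrete, here is a minimal at-least-once consumer sketch, assuming the plain Java client; the topic name, group id, and bootstrap address are placeholders. The key point is that offsets are committed only after processing has finished.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Disable auto-commit so we control exactly when offsets are committed.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // do the actual work first
                }
                // Commit only after processing finished: at-least-once, not at-most-once.
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("partition=%d offset=%d value=%s%n",
                record.partition(), record.offset(), record.value());
    }
}
```

If the application crashes between process() and commitSync(), the same records are simply read again after the restart; nothing is skipped.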

How do you determine that the consumer skips? Based on the input (ie, you send 10 messages and thus expect that 10 messages are read)? This might not be the right way to evaluate whether the writes were actually successful. Can you verify the start and end offsets of your topic, to see how many messages you really got? Also, on read, can you verify the offsets of the messages? If you don’t use transactions, there should not be any offset gaps.
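One way to do that check, assuming the Java client (topic name and bootstrap address are placeholders), is to compare the beginning and end offsets of every partition; for a non-compacted topic without transactions, the difference is the number of records actually stored:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TopicOffsetCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> partitions = consumer.partitionsFor("my-topic").stream()
                    .map(info -> new TopicPartition(info.topic(), info.partition()))
                    .collect(Collectors.toList());

            Map<TopicPartition, Long> begin = consumer.beginningOffsets(partitions);
            Map<TopicPartition, Long> end = consumer.endOffsets(partitions);

            long total = 0;
            for (TopicPartition tp : partitions) {
                long count = end.get(tp) - begin.get(tp);
                total += count;
                System.out.printf("%s: begin=%d end=%d count=%d%n",
                        tp, begin.get(tp), end.get(tp), count);
            }
            System.out.println("total records in topic: " + total);
        }
    }
}
```

Comparing that total against the number of successful producer callbacks, and against what the consumer received, shows whether the records were never written or were written but not read.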

Messages are being skipped/swallowed on read.

Master - VVVVVVVVVVVVVV
Replica1 - VVVVVSSVVVSSVV
Replica2 - VVVSVSSVSSSVVV

By default, all writes and reads are done by the leader (what you call the master). So it does not seem right that you have “skipped” reads on the followers.

It was working fine in the dev env where replica = 0. We changed Kafka to Amazon MSK, where we had a 3-node cluster and replica = 2, and the problem appeared.

Btw: a small correction on terminology. You cannot have replication.factor=0 – the minimum replication factor is 1 (in which case you only have the leader replica). If you have replication=3, you have a leader plus two followers. The leader also counts toward the number of replicas.
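If it helps when you recreate the environment, here is a small sketch of creating such a topic with replication factor 3 (a leader plus two followers) and min.insync.replicas=2, assuming the Java AdminClient; the topic name, partition count, and bootstrap address are placeholders.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // 50 partitions, replication factor 3 (leader + two followers).
            NewTopic topic = new NewTopic("my-topic", 50, (short) 3)
                    // Broker-side durability setting that makes acks=all meaningful.
                    .configs(Map.of(TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG, "2"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```

Setting min.insync.replicas at creation time means the acks=all guarantee discussed above is effective from the very first write.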


I hope I have time to re-create the environment and run this job again, so I can show you the results and you can dig into everything yourself.