Sunday, May 21, 2023

High Watermark

Introduction
In summary, the explanation is as follows:
Overall, the High Water Mark is a critical component of Kafka's replication and message delivery guarantees. It marks the boundary up to which messages in a partition are considered committed, and consumers can read only up to that boundary.
From the Leader's Perspective
The explanation is as follows:
In Apache Kafka, the High Water Mark (HWM) is the offset up to which messages have been successfully replicated to all in-sync replicas (ISR) of a partition.

The HWM is maintained by the leader replica of a partition, which tracks replication progress across its followers. When a message is produced to a partition, it is written to the leader replica's local log and assigned an offset. Followers then fetch the message from the leader, and the leader advances the HWM once every in-sync replica has copied it. A replica that falls too far behind is removed from the ISR and no longer holds the HWM back.
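The bookkeeping described above can be sketched as a toy model. This is not Kafka's implementation: the class `PartitionLeader` and its methods are hypothetical names, replica 0 is assumed to be the leader, and offsets follow the "next offset to write" convention, so an HWM of 2 means offsets 0 and 1 are committed.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLeader:
    """Toy model of a partition leader (hypothetical, for illustration only)."""
    # Log end offset (next offset to be written) per replica id.
    # Replica 0 is assumed to be the leader itself.
    leo: dict = field(default_factory=lambda: {0: 0, 1: 0, 2: 0})
    isr: set = field(default_factory=lambda: {0, 1, 2})
    high_watermark: int = 0

    def append(self, count: int = 1) -> None:
        """A producer writes `count` records to the leader's local log."""
        self.leo[0] += count
        self._advance_hwm()

    def follower_fetched(self, replica_id: int, up_to: int) -> None:
        """A follower has copied the leader's log up to offset `up_to`."""
        self.leo[replica_id] = min(up_to, self.leo[0])
        self._advance_hwm()

    def _advance_hwm(self) -> None:
        # The HWM is the smallest LEO among in-sync replicas; it never moves back.
        candidate = min(self.leo[r] for r in self.isr)
        self.high_watermark = max(self.high_watermark, candidate)


leader = PartitionLeader()
leader.append(3)                  # leader LEO = 3, followers still at 0
hwm_after_append = leader.high_watermark
leader.follower_fetched(1, 3)     # replica 1 caught up, replica 2 has not
hwm_after_one_follower = leader.high_watermark
leader.follower_fetched(2, 2)     # replica 2 copied offsets 0 and 1
hwm_after_both = leader.high_watermark
print(hwm_after_append, hwm_after_one_follower, hwm_after_both)
```

The HWM stays at 0 until the slowest in-sync replica makes progress, which is why a single lagging follower can delay visibility for consumers.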
From the Consumer's Perspective
The explanation is as follows:
Consumers use the HWM as the upper bound on what they can read from a partition: messages beyond it are not yet visible to them. As a consumer reads, it tracks its position in the partition, and the group periodically commits the offset of the next message to process. The next time a consumer reads from the partition, it resumes from that offset and reads up to the HWM.
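A minimal sketch of the range a consumer can read, assuming the log is a plain Python list and the HWM is an exclusive upper bound. The function `fetchable` and its parameters are hypothetical and only illustrate the arithmetic.

```python
def fetchable(log, position, high_watermark, max_records=500):
    """Return the records visible to a consumer at `position`.

    Records at or beyond `high_watermark` are not returned, even though
    they may already exist in the leader's log.
    """
    end = min(high_watermark, position + max_records)
    return log[position:end]


log = ["m0", "m1", "m2", "m3", "m4"]   # leader's log, LEO = 5
hwm = 3                                 # only m0..m2 are fully replicated
position = 1                            # the consumer has already processed m0

batch = fetchable(log, position, hwm)
position += len(batch)                  # resume here on the next read
print(batch, position)
```

Here `m3` and `m4` stay invisible to the consumer until the HWM moves past them.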
How the Consumer Finds the HWM Value
The explanation is as follows. In short, the broker reports the current HWM in every Fetch response, so the consumer sees the new value each time it fetches.
Consumers do not learn the High Water Mark (HWM) from topic metadata. The metadata response tells a consumer which broker leads each partition and which replicas and in-sync replicas the partition has, but it does not carry the HWM.

When a consumer starts, it sends a metadata request for the topics it wants to consume and uses the response to locate the leader of each assigned partition. It then sends Fetch requests, by default to those leaders. Each Fetch response includes the partition's current HWM alongside the returned records.

During normal operation, the HWM advances as new messages are produced and replicated. Kafka consumers pull data, so the broker does not push a notification when the HWM changes; the consumer simply sees the updated value in its next Fetch response.

A consumer fetches in a loop. For each request the broker returns the messages available from the requested offset, up to the HWM for the partition.
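The following sketch models a broker that returns the HWM with each fetch, which is how a consumer keeps its view of the HWM current without any push from the broker. `Broker`, `Consumer`, and `FetchResponse` are hypothetical in-memory stand-ins, not classes from a Kafka client library.

```python
from dataclasses import dataclass


@dataclass
class FetchResponse:
    records: list
    high_watermark: int   # the broker reports the HWM with every fetch


class Broker:
    """In-memory stand-in for a partition leader (illustration only)."""

    def __init__(self):
        self.log = []
        self.high_watermark = 0

    def fetch(self, offset, max_records=100):
        end = min(self.high_watermark, offset + max_records)
        return FetchResponse(self.log[offset:end], self.high_watermark)


class Consumer:
    def __init__(self, broker):
        self.broker = broker
        self.position = 0
        self.last_seen_hwm = None   # unknown until the first fetch

    def poll(self):
        response = self.broker.fetch(self.position)
        self.last_seen_hwm = response.high_watermark
        self.position += len(response.records)
        return response.records

    def lag(self):
        """Records between the consumer's position and the last HWM it saw."""
        return self.last_seen_hwm - self.position


broker = Broker()
broker.log = ["a", "b", "c"]
broker.high_watermark = 2        # "c" is not yet fully replicated

consumer = Consumer(broker)
first = consumer.poll()
broker.high_watermark = 3        # replication catches up; nothing is pushed
second = consumer.poll()         # the consumer learns the new HWM here
print(first, second, consumer.last_seen_hwm, consumer.lag())
```

Because the value is refreshed only when the consumer fetches, a consumer's view of the HWM can be slightly stale between polls.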
High Water Mark (HWM) vs Log End Offset (LEO)
The explanation is as follows:
The HWM is the offset up to which messages have been successfully replicated to all in-sync replicas of a partition. It is maintained by the leader replica and bounds what consumers can read from the partition.

The LEO, on the other hand, is the offset at the end of a replica's log, that is, the position of the most recently written message. Each replica has its own LEO, and the values may differ across replicas because of replication lag.

On an idle or lightly loaded partition the HWM and the leader's LEO are usually equal, because followers catch up quickly and the leader advances the HWM as soon as every in-sync replica has copied each message. Under load, or when a follower is slow because of network issues, the leader's LEO runs ahead of the HWM, indicating that some messages have been written to the leader's log but are not yet replicated to all in-sync replicas. The HWM never exceeds the leader's LEO.
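The relationship between the two offsets can be illustrated with a small calculation. It assumes the HWM equals the smallest LEO among the in-sync replicas; the function `describe` and the replica ids are hypothetical.

```python
def describe(leo_by_replica, isr, leader=0):
    """Summarise HWM vs. the leader's LEO for one partition (toy model)."""
    hwm = min(leo_by_replica[r] for r in isr)
    leader_leo = leo_by_replica[leader]
    return {"hwm": hwm, "leader_leo": leader_leo, "gap": leader_leo - hwm}


# All replicas caught up: the HWM equals the leader's LEO.
caught_up = describe({0: 10, 1: 10, 2: 10}, isr={0, 1, 2})

# Replica 2 lags but is still in the ISR: the leader's LEO is ahead of the HWM.
lagging = describe({0: 10, 1: 10, 2: 7}, isr={0, 1, 2})

# Replica 2 has been dropped from the ISR: it no longer holds the HWM back.
shrunk_isr = describe({0: 10, 1: 10, 2: 7}, isr={0, 1})

print(caught_up, lagging, shrunk_isr)
```

The `gap` value is the number of records that exist in the leader's log but are not yet visible to consumers.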


