Kafka Broker Metrics And Their Debugging

If you are new to Kafka, please read the first three posts of the series listed below. Otherwise, dive in.

Introduction to Kafka

Kafka Internals

Reliable Data Delivery in Kafka

Troubleshooting Under Replicated Kafka Partitions

If you are preparing for an interview, this post contains most of the things that you should know about Kafka.

Active Controller Count

The active controller is the one broker in a Kafka cluster that is designated to perform administrative tasks such as reassigning partitions. The active controller count metric tells us whether a given broker is the controller for the cluster. Its value is either 0 or 1. This metric is emitted per broker.

What if two brokers say that they are the controller?

Because the metric is 1 only on the broker that is currently the controller, and a Kafka cluster requires exactly one controller at any given time, exactly one broker should report 1. If two brokers report 1, something has gone wrong: typically a controller thread that should have exited has become stuck.

What should you do when more than one broker claims to be the controller?

This situation affects the administrative tasks of the cluster, such as partition reassignment. A reasonable first step is to restart the brokers that claim to be the controller.

Metric Name

kafka.controller:type=KafkaController,name=ActiveControllerCount
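
Since the metric is emitted per broker, one way to spot the situation described above is to look at the values from all brokers together: exactly one of them should be 1. The following is a minimal sketch, assuming the per-broker values have already been collected (for example over JMX) into a dictionary. The function name and the dictionary shape are illustrative assumptions, not part of Kafka.

```python
def check_active_controller(counts):
    """Classify a cluster from per-broker ActiveControllerCount values.

    counts maps a broker id to the value that broker reported (0 or 1).
    Returns a (status, controllers) tuple, where controllers is the sorted
    list of broker ids that reported 1.
    """
    controllers = sorted(b for b, value in counts.items() if value == 1)
    if len(controllers) == 1:
        return "ok", controllers
    if not controllers:
        return "no_controller", controllers
    return "multiple_controllers", controllers


# Hypothetical readings from a three-broker cluster.
print(check_active_controller({1: 0, 2: 1, 3: 0}))
print(check_active_controller({1: 1, 2: 1, 3: 0}))
```

The returned broker ids are the candidates for the restart mentioned above.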

Request Handler Idle Ratio

Following are the two thread pools used by Kafka to handle requests:

Network Handlers

These threads are responsible for reading data from and writing data to clients across the network. This does not require significant processing, so network handlers are not exhausted easily.

Request Handlers

The request handler threads, however, are responsible for servicing the client request itself, which includes reading or writing the messages to disk. The request handler idle ratio metric indicates the percentage of time the request handlers are not in use. The lower this number, the more loaded the broker is. If the idle ratio drops below 20%, it is advisable to check whether the cluster is undersized or has some other problem.

Kafka uses purgatory to efficiently handle requests.
Read about purgatory here.

Metric Name

kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent
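
The 20% guideline above can be turned into a simple check. This is a minimal sketch with an illustrative function name. It assumes the value is a fraction between 0 and 1, which is how this MBean is documented despite the "Percent" in its name, so 20% corresponds to 0.2. Verify the scale against your Kafka version before relying on it.

```python
IDLE_WARNING_THRESHOLD = 0.20  # the 20% guideline from the text, as a fraction


def request_handler_load(idle_ratio, threshold=IDLE_WARNING_THRESHOLD):
    """Return 'overloaded' if the idle ratio is below the threshold, else 'ok'.

    idle_ratio is assumed to be a fraction in [0, 1], not a 0-100 percentage.
    """
    if not 0.0 <= idle_ratio <= 1.0:
        raise ValueError("idle_ratio must be a fraction between 0 and 1")
    return "overloaded" if idle_ratio < threshold else "ok"


# Hypothetical readings from two brokers.
print(request_handler_load(0.65))
print(request_handler_load(0.12))
```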

All Topics Bytes In

The all topics bytes in rate, expressed in bytes per second, is useful as a measurement of how much message traffic your brokers are receiving from producing clients. This is a good metric to trend over time to help you determine when you need to expand the cluster or do other growth-related work. It is also useful for evaluating if one broker in a cluster is receiving more traffic than the others, which would indicate that it is necessary to rebalance the partitions in the cluster.

Metric Name

kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec
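
To judge whether one broker is receiving more traffic than the others, you can compare each broker's bytes in rate with the cluster average. The following is a rough sketch, assuming the rates have already been collected into a dictionary keyed by broker id. The function name and the 25% tolerance are arbitrary assumptions chosen for illustration, not recommended values.

```python
def brokers_above_average(bytes_in, tolerance=0.25):
    """Return the broker ids whose bytes in rate exceeds the cluster mean
    by more than the given tolerance (0.25 means 25% above the mean).

    bytes_in maps a broker id to its BytesInPerSec rate in bytes per second.
    """
    if not bytes_in:
        return []
    mean = sum(bytes_in.values()) / len(bytes_in)
    limit = mean * (1 + tolerance)
    return sorted(b for b, rate in bytes_in.items() if rate > limit)


# Hypothetical rates: broker 3 takes roughly twice the traffic of the others.
rates = {1: 10_000_000, 2: 11_000_000, 3: 21_000_000}
print(brokers_above_average(rates))
```

A broker that shows up here repeatedly is a candidate for the partition rebalancing mentioned above.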

All Topics Bytes Out

The all topics bytes out rate, similar to the bytes in rate, is another overall growth metric. In this case, the bytes out rate shows the rate at which consumers are reading messages out. The outbound bytes rate may scale differently than the inbound bytes rate.

The outbound bytes rate also includes the replica traffic. This means that if all of the topics are configured with a replication factor of 2, we will see a bytes out rate equal to the bytes in rate when there are no consumer clients.

Metric Name

kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec
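
The replication example above can be worked through as arithmetic. This sketch takes the statement at face value: every byte produced is fetched once by each follower replica and once by each consumer group that reads all topics. The function name and the "every group reads everything" simplification are assumptions for illustration. Newer Kafka versions also expose a separate ReplicationBytesOutPerSec metric, so check how your version accounts for replica fetches.

```python
def expected_bytes_out(bytes_in, replication_factor, consumer_groups=0):
    """Estimate the cluster-wide bytes out rate from the bytes in rate.

    Assumes replica fetches are counted in bytes out and that every
    consumer group reads every message exactly once.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    replica_traffic = bytes_in * (replication_factor - 1)  # one copy per follower
    consumer_traffic = bytes_in * consumer_groups          # one copy per group
    return replica_traffic + consumer_traffic


# Replication factor 2 and no consumers: bytes out equals bytes in.
print(expected_bytes_out(5_000_000, replication_factor=2))
# Replication factor 3 and two consumer groups: four times bytes in.
print(expected_bytes_out(5_000_000, replication_factor=3, consumer_groups=2))
```

This also shows why the outbound rate may scale differently than the inbound rate: it grows with the number of followers and consumer groups, not only with producer traffic.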

Other Important Kafka Broker Metrics

| Name | Description | Metric Name |
| --- | --- | --- |
| All topics messages in | The messages in rate shows the number of individual messages, regardless of their size, produced per second. This is useful as a growth metric as a different measure of producer traffic. | kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec |
| Partition count | The partition count for a broker generally doesn't change that much, as it is the total number of partitions assigned to that broker. This includes every replica the broker has, regardless of whether it is a leader or follower for that partition. | kafka.server:type=ReplicaManager,name=PartitionCount |
| Leader count | The leader count metric shows the number of partitions that the broker is currently the leader for. As with most other measurements in the brokers, this one should be generally even across the brokers in the cluster. | kafka.server:type=ReplicaManager,name=LeaderCount |
| Offline partitions | This measurement is only provided by the broker that is the controller for the cluster (all other brokers will report 0), and shows the number of partitions in the cluster that currently have no leader. | kafka.controller:type=KafkaController,name=OfflinePartitionsCount |
Kafka Broker Metrics
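
Two of the metrics in the table lend themselves to a quick cluster-wide summary. Since only the controller reports a non-zero offline partitions count, summing the per-broker values gives the cluster figure, and since the leader count should be roughly even, the gap between the highest and lowest broker is a simple indicator of imbalance. The sketch below assumes both metrics have already been collected into dictionaries keyed by broker id. The function name and the output keys are illustrative assumptions.

```python
def cluster_summary(leader_counts, offline_counts):
    """Summarise LeaderCount and OfflinePartitionsCount across brokers.

    leader_counts maps broker id to LeaderCount.
    offline_counts maps broker id to OfflinePartitionsCount; brokers that
    are not the controller report 0, so the sum is the controller's value.
    """
    offline = sum(offline_counts.values())
    if leader_counts:
        spread = max(leader_counts.values()) - min(leader_counts.values())
    else:
        spread = 0
    return {"offline_partitions": offline, "leader_spread": spread}


# Hypothetical readings: broker 2 is the controller and sees 3 offline partitions.
print(cluster_summary({1: 40, 2: 42, 3: 20}, {1: 0, 2: 3, 3: 0}))
```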

Reference:

Kafka – The Definitive Guide by Neha Narkhede, Gwen Shapira & Todd Palino

