I am still poking around Prometheus, and so far I am liking it a lot. In this article I am going to go over how to connect it to JMX in Kafka/Zookeeper to start getting metrics out of those systems for monitoring. I found some of the information for this article at https://blog.rntech.co.uk/2016/10/20/monitoring-apache-kafka-with-prometheus/ [1] and https://www.robustperception.io/monitoring-kafka-with-prometheus/ [2].

Be forewarned, this is going to be a long article as I poke at this tool to get it working the way I want it to.
Prometheus JMX Exporter
The JMX Exporter, located at https://github.com/prometheus/jmx_exporter/ [3] on GitHub, is a lightweight HTTP server that exposes JMX data as Prometheus-compatible metrics that Prometheus can then scrape.
First go at it
I have a single Kafka server running. On that server I am going to download and run the JMX exporter. Download the jar file (link obtained from https://github.com/prometheus/jmx_exporter/#running ).
> cd
> mkdir jmx_exporter
> cd jmx_exporter
> wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.9/jmx_prometheus_javaagent-0.9.jar
Create a configuration file in yaml format.

> vi prometheus_kafkfa.yml
And place the following in it (taken from https://github.com/prometheus/jmx_exporter/#configuration ).
---
hostPort: 127.0.0.1:1234
jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:1234/jmxrmi
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false
whitelistObjectNames: ["org.apache.cassandra.metrics:*"]
blacklistObjectNames: ["org.apache.cassandra.metrics:type=ColumnFamily,*"]
rules:
  - pattern: "^org.apache.cassandra.metrics<type=(\w+), name=(\w+)><>Value: (\d+)"
    name: cassandra_$1_$2
    value: $3
    valueFactor: 0.001
    labels: {}
    help: "Cassandra metric $1 $2"
    type: GAUGE
    attrNameSnakeCase: false
Set the -javaagent option in KAFKA_OPTS
In my kafka setup I am starting the kafka server with a script at /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh, so I am going to tweak that and add a KAFKA_OPTS.

> sudo vi /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh
And add the following to the top (change JMX_DIR to your own location).

JMX_DIR="/home/patman/jmx_exporter"
export KAFKA_OPTS="$KAFKA_OPTS -javaagent:$JMX_DIR/jmx_prometheus_javaagent-0.9.jar=1234:$JMX_DIR/prometheus_kafkfa.yml"
Now start up Kafka (this is the command I use).

> sudo /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh /opt/kafka/kafka_2.11-0.10.1.0/config/server.properties
And I get an error. Looks like I have an issue with my JMX Exporter config file. Let me fix that; open up the yaml config file again.

> vi ~/jmx_exporter/prometheus_kafkfa.yml
I am going to place the following in it (which is a copy of the example config given at https://github.com/rama-nallamilli/kafka-prometheus-monitoring/blob/master/prometheus-jmx-exporter/confd/templates/kafka.yml.tmpl [4]).
---
hostPort: 127.0.0.1:1234
ssl: false
lowercaseOutputLabelNames: false
lowercaseOutputName: true
rules:
  - pattern: kafka.cluster<type=(.+), name=(.+), topic=(.+), partition=(.+)><>Value
    name: kafka_cluster_$1_$2
    labels:
      topic: "$3"
      partition: "$4"
  - pattern: kafka.log<type=Log, name=(.+), topic=(.+), partition=(.+)><>Value
    name: kafka_log_$1
    labels:
      topic: "$2"
      partition: "$3"
  - pattern: kafka.controller<type=(.+), name=(.+)><>(Count|Value)
    name: kafka_controller_$1_$2
  - pattern: kafka.network<type=(.+), name=(.+)><>Value
    name: kafka_network_$1_$2
  - pattern: kafka.network<type=(.+), name=(.+)PerSec, request=(.+)><>Count
    name: kafka_network_$1_$2_total
    labels:
      request: "$3"
  - pattern: kafka.network<type=(.+), name=(\w+), networkProcessor=(.+)><>Count
    name: kafka_network_$1_$2
    labels:
      request: "$3"
    type: COUNTER
  - pattern: kafka.network<type=(.+), name=(\w+), request=(\w+)><>Count
    name: kafka_network_$1_$2
    labels:
      request: "$3"
  - pattern: kafka.network<type=(.+), name=(\w+)><>Count
    name: kafka_network_$1_$2
  - pattern: kafka.server<type=(.+), name=(.+)PerSec\w*, topic=(.+)><>Count
    name: kafka_server_$1_$2_total
    labels:
      topic: "$3"
  - pattern: kafka.server<type=(.+), name=(.+)PerSec\w*><>Count
    name: kafka_server_$1_$2_total
    type: COUNTER
  - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), topic=(.+), partition=(.*)><>(Count|Value)
    name: kafka_server_$1_$2
    labels:
      clientId: "$3"
      topic: "$4"
      partition: "$5"
  - pattern: kafka.server<type=(.+), name=(.+), topic=(.+), partition=(.*)><>(Count|Value)
    name: kafka_server_$1_$2
    labels:
      topic: "$3"
      partition: "$4"
  - pattern: kafka.server<type=(.+), name=(.+), topic=(.+)><>(Count|Value)
    name: kafka_server_$1_$2
    labels:
      topic: "$3"
    type: COUNTER
  - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), brokerHost=(.+), brokerPort=(.+)><>(Count|Value)
    name: kafka_server_$1_$2
    labels:
      clientId: "$3"
      broker: "$4:$5"
  - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+)><>(Count|Value)
    name: kafka_server_$1_$2
    labels:
      clientId: "$3"
  - pattern: kafka.server<type=(.+), name=(.+)><>(Count|Value)
    name: kafka_server_$1_$2
  - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*><>Count
    name: kafka_$1_$2_$3_total
  - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, topic=(.+)><>Count
    name: kafka_$1_$2_$3_total
    labels:
      topic: "$4"
    type: COUNTER
  - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, topic=(.+), partition=(.+)><>Count
    name: kafka_$1_$2_$3_total
    labels:
      topic: "$4"
      partition: "$5"
    type: COUNTER
  - pattern: kafka.(\w+)<type=(.+), name=(.+)><>(Count|Value)
    name: kafka_$1_$2_$3_$4
    type: COUNTER
  - pattern: kafka.(\w+)<type=(.+), name=(.+), (\w+)=(.+)><>(Count|Value)
    name: kafka_$1_$2_$3_$6
    labels:
      "$4": "$5"
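To make sense of what one of these rules does, here is a quick Python sketch of the idea. This is not the exporter's actual code, and the bean string is made up for illustration; it just shows how a rule's regex pulls a metric name and labels out of an mBean attribute.

```python
import re

# The exporter matches each rule against a flattened string shaped like
# "domain<prop1=..., prop2=...><>Attribute". This is the topic/partition
# rule from the config above.
pattern = re.compile(
    r"kafka\.server<type=(.+), name=(.+), topic=(.+), partition=(.*)><>(Count|Value)")

# A hypothetical mBean attribute string for a broker with one topic.
bean = "kafka.server<type=BrokerTopicMetrics, name=BytesInPerSec, topic=test, partition=0><>Count"

m = pattern.match(bean)
# name: kafka_server_$1_$2 (lowercased because lowercaseOutputName: true),
# labels: topic = "$3", partition = "$4"
metric = f"kafka_server_{m.group(1)}_{m.group(2)}".lower()
labels = {"topic": m.group(3), "partition": m.group(4)}
print(metric, labels)
# → kafka_server_brokertopicmetrics_bytesinpersec {'topic': 'test', 'partition': '0'}
```

In the real exporter the rules above this one would get first crack at the string, which is why ordering matters; this sketch only exercises one rule in isolation.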
Now start up Kafka again.

> sudo /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh /opt/kafka/kafka_2.11-0.10.1.0/config/server.properties
Now try and curl the /metrics endpoint.

> curl localhost:1234/metrics
Now I have lots and lots of data, probably more than I want. So I am going to tweak the config file again and build it up from the basics to get what I want.
Visual VM
To make my life simpler I am going to get VisualVM talking to JMX directly so I can see what variables are available. I am going to edit my kafka startup script to add a few variables to make JMX available at a port.

> sudo vi /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh
And place the following in it.

export KAFKA_OPTS="$KAFKA_OPTS -Dcom.sun.management.jmxremote"
#Should retrieve local IP address
IP_ADDR=`ip route get 8.8.8.8 | awk '{print $NF; exit}'`
export KAFKA_OPTS="$KAFKA_OPTS -Djava.rmi.server.hostname=$IP_ADDR"
export KAFKA_OPTS="$KAFKA_OPTS -Dcom.sun.management.jmxremote.port=9090"
export KAFKA_OPTS="$KAFKA_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
export KAFKA_OPTS="$KAFKA_OPTS -Dcom.sun.management.jmxremote.ssl=false"
Now start up Kafka again.

> sudo /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh /opt/kafka/kafka_2.11-0.10.1.0/config/server.properties
Download VisualVM, unzip it, and start it up. Accept the license, then add a JMX Connection. Enter the IP address and port (in my case the kafka server lives at 192.168.0.140), then click OK. Now you should have the new connection.
Double click on it, click on the Monitor tab, and you can now see stuff scrolling by.

Go to Tools -> Plugins, select the Available Plugins tab, check the VisualVM-MBeans checkbox, and click Install. Click Next, accept the license, click Install, then click Finish. Close the connection. Now you should have an MBeans tab. There are your beans. :)
Simpler Config File
Let me start very basic and get one piece of data from the JVM and one from Kafka. Here is a very simple yaml file.

---
lowercaseOutputName: true
But if I run with this it seems I get everything I can possibly get. How many variables? Run this quick command to check.

> curl -s localhost:1234/metrics | grep -v "^#" | wc -l

In my case it is 2,609 variables. That is far more than I need to start with.
Let me see if I can start simpler by using a whitelist.
Whitelist and rules
I think I found my process_cpu_seconds_total variable here in java.lang.OperatingSystem { processCpuTime }. Let me see if I can narrow it down to that.
Here is my first go at it.

---
lowercaseOutputName: true
whitelistObjectNames: ["java.lang.OperatingSystem:*"]
Restarting Kafka and checking…

> curl -s localhost:1234/metrics | grep -v "^#" | wc -l

OK that took it down a bit. Now I am at 48 variables.
Although looking at the data in java.lang.OperatingSystem…

> curl -s localhost:1234/metrics | grep -v "^#"

I probably want most if not all of this data.
But to learn this tool better I want to try and narrow it down to one variable if at all possible.
Wait, something is up. If I change the yaml file to this:

---
lowercaseOutputName: true
whitelistObjectNames: ["java2222.lang.OperatingSystem:*"]

I get the same 48 metrics. Are those default metrics I get no matter what?
Maybe, for now, I should focus on just filtering out the other MBeans. Let me try to narrow the kafka data down to this section. This is where it gets tricky if you are not a JMX expert, and I am not a JMX expert.
As a test, all I want is the Count under kafka.server's BrokerTopicMetrics BytesInPerSec. To do this you need to pick apart four parts; the first three can be used in the whitelist:

1. Domain: in this case the domain = kafka.server
2. Type: in this case the type = BrokerTopicMetrics
3. Name: in this case the name = BytesInPerSec
I can use these first three parts to create a whitelistObjectNames. Here is a simple yaml file using that.

---
lowercaseOutputName: true
whitelistObjectNames: ["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,*"]
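If you are curious how that whitelist entry actually selects beans, here is a rough Python sketch of JMX ObjectName pattern matching. The matches helper is hypothetical (the exporter uses the JVM's own ObjectName matching), but the idea is the same: the domain must match, every key=value pair in the pattern must be present, and the trailing ",*" allows extra keys.

```python
# Hypothetical helper illustrating JMX ObjectName pattern matching,
# not the exporter's real code.
def matches(pattern, object_name):
    p_domain, p_props = pattern.split(":", 1)
    o_domain, o_props = object_name.split(":", 1)
    if p_domain != o_domain:
        return False
    # Collect the key=value pairs the pattern demands, skipping the "*" wildcard.
    wanted = dict(kv.split("=") for kv in p_props.split(",") if kv != "*")
    have = dict(kv.split("=") for kv in o_props.split(","))
    # Every demanded pair must appear; ",*" lets the bean carry extra keys.
    return all(have.get(k) == v for k, v in wanted.items())

whitelist = "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,*"
print(matches(whitelist, "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec"))  # True
print(matches(whitelist, "kafka.server:type=ReplicaManager,name=LeaderCount"))        # False
```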
If I restart Kafka with this yaml file I can run this curl command to grab the kafka data points.

> curl -s localhost:1234/metrics | grep -v "^#" | grep kafka
Here is what I got out:

kafka_server_brokertopicmetrics_count{name="BytesInPerSec",} 0.0
kafka_server_brokertopicmetrics_oneminuterate{name="BytesInPerSec",} 0.0
kafka_server_brokertopicmetrics_meanrate{name="BytesInPerSec",} 0.0
kafka_server_brokertopicmetrics_fifteenminuterate{name="BytesInPerSec",} 0.0
kafka_server_brokertopicmetrics_fiveminuterate{name="BytesInPerSec",} 0.0
We get five data points. If you look at VisualVM (which you may need to restart) you can see that the five data points are coming from this section. For example, Count => kafka_server_brokertopicmetrics_count{name="BytesInPerSec",} 0.0

How can I narrow it down further? Get it down to just the Count, for example?
I need to use rules!
Here is an example of a rule that keeps only the Count. (It uses regex to grab the type, name, and attribute variables.)

---
lowercaseOutputName: true
whitelistObjectNames: ["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,*"]
rules:
  - pattern: kafka.server<type=(.+), name=(.+)><>(Count)
    name: WHITEBOARDCODER_kafka_server_$1_$2_$3
Here I have one rule; you can have more. If you have multiple rules they are applied in order, and the first pattern that matches is used. If no pattern matches, the attribute is not collected.
If I run with this config file I should get one single kafka data point.

> curl -s localhost:1234/metrics | grep -v "^#" | grep kafka
Here is what I got out:

whiteboardcoder_kafka_server_brokertopicmetrics_bytesinpersec_count 0.0
I am still getting the other 48 default JVM data points, and I am not sure how to narrow those down, but as for kafka I narrowed it down to the one point!
Counters and gauges
If you look at the output of the kafka data point you will see that it is set as a gauge. What if I want it to be a counter? I can edit that in the yaml file; just add a counter type to the rule.
---
lowercaseOutputName: true
whitelistObjectNames: ["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,*"]
rules:
  - pattern: kafka.server<type=(.+), name=(.+)><>(Count)
    name: WHITEBOARDCODER_kafka_server_$1_$2_$3
    type: COUNTER
Now it comes out as a counter.
Multiple filters
Let me try something a little more fancy… Every one of these names has the same attributes. I am going to use that to get data for Count and FifteenMinuteRate for every Name. Here is my yaml file.
---
lowercaseOutputName: true
whitelistObjectNames: ["kafka.server:type=BrokerTopicMetrics,name=*,*"]
rules:
  - pattern: kafka.server<type=(.+), name=(.+)><>(Count|FifteenMinuteRate)
    name: WHITEBOARDCODER_kafka_server_$1_$2_$3
    type: GAUGE
I should end up with 16 variables.

> curl -s localhost:1234/metrics | grep -v "^#" | grep kafka

And I do! Nice, it worked!
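Where does 16 come from? Eight BrokerTopicMetrics names times the two attributes the alternation keeps. A quick Python check of that arithmetic; the eight names listed are what I would expect on a 0.10 broker with no per-topic beans, so treat them as illustrative rather than authoritative:

```python
import re

# Typical kafka.server BrokerTopicMetrics names on a 0.10 broker (assumed).
names = [
    "MessagesInPerSec", "BytesInPerSec", "BytesOutPerSec", "BytesRejectedPerSec",
    "FailedFetchRequestsPerSec", "FailedProduceRequestsPerSec",
    "TotalFetchRequestsPerSec", "TotalProduceRequestsPerSec",
]
# Each name exposes these five meter attributes.
attrs = ["Count", "FifteenMinuteRate", "FiveMinuteRate", "OneMinuteRate", "MeanRate"]

# The rule from the yaml above: only Count and FifteenMinuteRate survive.
pattern = re.compile(r"kafka\.server<type=(.+), name=(.+)><>(Count|FifteenMinuteRate)")

candidates = [f"kafka.server<type=BrokerTopicMetrics, name={n}><>{a}"
              for n in names for a in attrs]
matched = [s for s in candidates if pattern.match(s)]
print(len(matched))  # 8 names x 2 matching attributes = 16
```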
A good place to read up on what you should be monitoring is the Kafka docs: https://kafka.apache.org/documentation/#monitoring [5]. That section lists the metrics they do graphing and alerting on.
I am going to try and create a yaml file to get just this data. OK, this one is close, but it is missing a few things. Here are the things it is missing:
· kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs
· kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs
· kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec
· kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+)
The only reason I did not add these is because I cannot see them in VisualVM on my machine, probably due to the fact that it is not set up to replicate (only a single kafka test instance).

With a single topic on the kafka cluster this gives me 123 kafka data points. That is a lot more manageable than what I had before with 2,500+.
References

[1] Monitoring Apache Kafka with Prometheus
https://blog.rntech.co.uk/2016/10/20/monitoring-apache-kafka-with-prometheus/

[2] Monitoring Kafka with Prometheus
https://www.robustperception.io/monitoring-kafka-with-prometheus/

[3] JMX Exporter github page
https://github.com/prometheus/jmx_exporter/

[4] JMX Exporter example yaml file
https://github.com/rama-nallamilli/kafka-prometheus-monitoring/blob/master/prometheus-jmx-exporter/confd/templates/kafka.yml.tmpl
Accessed 4/2017

[5] Kafka Docs Monitoring
https://kafka.apache.org/documentation/#monitoring