In this article I am going to go over a quick explanation of how Kafka works, then install it on an Ubuntu 16.04 server and run a few basic commands to make sure it's working.
What does Kafka do?
First, what is Kafka and why would I want it?
From Wikipedia [1]:
Apache Kafka is an
open-source stream processing platform developed by the Apache Software
Foundation written in Scala and Java. The project aims to provide a unified,
high-throughput, low-latency platform for handling real-time data feeds. Its
storage layer is essentially a "massively scalable pub/sub message queue
architected as a distributed transaction log," making it highly
valuable for enterprise infrastructures to process streaming data.
OK, fantastic. What does that mean?
A good place to start educating yourself on Kafka is the project's own documentation: http://kafka.apache.org/documentation/ [2]
But before you dive into that, let me go over a simple "How
Kafka Works" example.
Records, Topics and Partitions
In Kafka information is stored as a Record. A Record contains three pieces of
information.
1. Value: The stored message (messages are typically small, ~10KB)
2. Key: An optional key can be associated with a record
3. Timestamp: As of version 0.10.0, records include a timestamp
Records are written sequentially to a Partition
of a Topic.
This image shows the anatomy of a Topic that contains a single Partition. As records come in, they are appended to the end of the partition, forming an immutable sequence of records. If I add one more record to the above, the new record is appended to the end.
When a record is added to a partition it is given a sequential ID number called the offset. In this case the newly added record has an offset of 6.
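To see all three record fields in practice, the console tools installed later in this article can produce keyed records and print keys and timestamps on the consumer side. A minimal sketch, assuming the kafka-console-* wrapper scripts set up below, a broker on localhost:9092, and a topic named topic-one:

> kafka-console-producer --broker-list localhost:9092 --topic topic-one \
  --property "parse.key=true" --property "key.separator=:"

Typing a line such as user42:hello would produce a record with key "user42" and value "hello". On the consumer side, --property print.key=true (and, on 0.10+, --property print.timestamp=true) prints the key and timestamp alongside the value.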
Lifecycle of a Record?
How long does a Record stay around? That depends.
In the server.properties file there are a few settings that
determine how long a record stays in a partition.
Name                | Description                                                                                          | Type | Default
--------------------|------------------------------------------------------------------------------------------------------|------|--------------------
log.retention.hours | The number of hours to keep a log file before deleting it, tertiary to the log.retention.ms property | int  | 168
log.retention.bytes | The maximum size of the log before deleting it                                                       | long | -1 (no size limit)
These are the basic properties that set the retention rules for a record. The default setting is to remove a record once it is older than 168 hours (7 days). You can also set a byte-size limit; if you do, records will be removed from the front of the partition whenever the partition grows beyond that size, until the total is back under the limit.
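Concretely, the retention section of a broker's config/server.properties might look like this (the byte limit is illustrative, not a recommendation; it applies per partition):

# Delete records older than 7 days (the default)
log.retention.hours=168
# Also delete from the front once a partition's log exceeds ~1 GB;
# the default of -1 means no size limit
log.retention.bytes=1073741824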
For example…
In this Topic with one partition, 6 records have been written. The first two records were written on day 1 and the rest on day 5.
Eight days later, if we look at the partition, we will see that the first two records have been removed. They were removed based on the log.retention.hours setting in the server properties: everything 7 days or older is removed.
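Retention can also be overridden per topic instead of broker-wide. A hedged sketch using the kafka-topics wrapper set up later in this article (retention.ms is the topic-level counterpart of the broker's log.retention.* settings; the one-day value is just an example):

> kafka-topics --zookeeper localhost:2181 --alter \
  --topic topic-one --config retention.ms=86400000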
Reading from a Topic
When a consumer starts up, it subscribes to a Topic.
Although a new consumer can read every message in a topic, it is more typical to subscribe to a topic and wait for new records to be sent to it.
When a new record is added to the Topic, it is sent to all Consumers attached to that Topic/Partition. In this example the Consumer subscribed to this Topic after Record '4' was added, so no records were sent to the Consumer until the next record, Record '5', was added to the topic.
As long as this consumer is attached, all records added to this Topic/Partition will be sent to it.
Multiple Consumers can be attached to the same
Topic/Partition.
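To make that distinction concrete, the console consumer installed later in this article waits for new records by default, and a single flag switches it to reading everything still retained. A quick sketch, assuming a broker on localhost:9092:

# Default: only records produced after this consumer attaches are shown
> kafka-console-consumer --bootstrap-server localhost:9092 --topic topic-one

# --from-beginning replays every record still retained in the partition
> kafka-console-consumer --bootstrap-server localhost:9092 \
  --topic topic-one --from-beginning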
OK, now that gives us some basics. With that in mind, I am going to install Kafka on Ubuntu 16.04 and do a few tests.
Installing Kafka on Ubuntu 16.04
I have a basic Ubuntu
16.04 server installed.
Install Oracle Java 1.8
You need Java installed on the machine, and I prefer installing Oracle's Java over OpenJDK. Run the following commands to install it.
> echo oracle-java8-installer \
  shared/accepted-oracle-license-v1-1 select true | \
  sudo /usr/bin/debconf-set-selections

> echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" | \
  sudo tee /etc/apt/sources.list.d/webupd8team-java.list

> echo "deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" | \
  sudo tee -a /etc/apt/sources.list.d/webupd8team-java.list

> sudo apt-key adv --keyserver \
  hkp://keyserver.ubuntu.com:80 --recv-keys EEA14886

> sudo apt-get update

> sudo apt-get -y install oracle-java8-installer
Now check the Java version:

> java -version
Install Zookeeper
Now we need to install ZooKeeper. I am not a ZooKeeper guy… yet, but it's needed for a Kafka install.

> sudo apt-get install zookeeperd

Test to make sure it's up:

> netstat -ant | grep :2181

You should see a line showing port 2181 in the LISTEN state; that means ZooKeeper is up.
Install Kafka
Here is Kafka's download page: https://kafka.apache.org/downloads.html [3]
This is where I found the URL to download.
> wget \
  http://apache.cs.utah.edu/kafka/0.10.1.0/kafka_2.11-0.10.1.0.tgz
Make a directory for Kafka and untar it.

> sudo mkdir /opt/kafka
> sudo tar -xvf kafka_2.11-0.10.1.0.tgz -C /opt/kafka
Try it out real quick to make sure it runs.
> sudo /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh \
  /opt/kafka/kafka_2.11-0.10.1.0/config/server.properties
Looks good.
Leave it running and use the Kafka console tools to talk to it. These tools are located at /opt/kafka/kafka_2.11-0.10.1.0/bin/.
I am going to set up some simple wrapper scripts to make it simpler to run these commands.
> sudo vi /bin/kafka-topics

And place the following in it:

#!/bin/bash
exec "/opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-topics.sh" "$@"

Make it executable:

> sudo chmod 755 /bin/kafka-topics
Let me do the same thing for kafka-console-consumer.

> sudo vi /bin/kafka-console-consumer

And place the following in it:

#!/bin/bash
exec "/opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-console-consumer.sh" "$@"

Make it executable:

> sudo chmod 755 /bin/kafka-console-consumer
Let me do the same thing for kafka-console-producer.

> sudo vi /bin/kafka-console-producer

And place the following in it:

#!/bin/bash
exec "/opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-console-producer.sh" "$@"

Make it executable:

> sudo chmod 755 /bin/kafka-console-producer
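As an aside, all three wrappers follow the same pattern, so you could generate them in one loop instead; a minimal sketch of the same idea, using the install path from above:

> for t in topics console-consumer console-producer; do
    printf '#!/bin/bash\nexec "/opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-%s.sh" "$@"\n' "$t" | \
      sudo tee /bin/kafka-$t > /dev/null
    sudo chmod 755 /bin/kafka-$t
  done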
Creating a Topic
In Kafka you post messages to topics. Currently you have no topics set up. To prove this, run this command.

> kafka-topics --zookeeper localhost:2181 --list

You should get nothing returned.
Now create a topic.

> kafka-topics --create \
  --zookeeper localhost:2181 \
  --replication-factor 1 \
  --partitions 1 \
  --topic "topic-one"
For this simple example I will not go into multiple
Partitions or Replication-factor.
And now list all topics again.
> kafka-topics --zookeeper localhost:2181 --list
There it is…
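If you want more than the name, the --describe flag shows the partition count, replication factor, and leader assignment for the topic we just created:

> kafka-topics --zookeeper localhost:2181 --describe --topic topic-one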
Send Message to Topic
In one terminal start a producer.

> kafka-console-producer --broker-list \
  localhost:9092 --topic topic-one

Then in another terminal start a consumer and listen to the topic.

> kafka-console-consumer --bootstrap-server \
  localhost:9092 --topic topic-one
Now on the producer side type in some messages. Each time you hit return, it will send the line you typed. Messages produced are consumed on the other side.
While I am at it, let me add another consumer.

> kafka-console-consumer --bootstrap-server \
  localhost:9092 --topic topic-one
Also, you can feed it an entire file. Let me create a test file.

> vi /tmp/test.txt

And place the following in it:

Line 01 This is line 1
Line 02 each line becomes a record
Line 03 That is how the console-producer works
Line 04 just to show you
Now run this command to feed the test.txt file into the Kafka Topic.

> kafka-console-producer --broker-list \
  localhost:9092 --topic topic-one < /tmp/test.txt
Each line of the file becomes a record. That is the way the console-producer works.
You could just pipe the info.

> cat /tmp/test.txt | kafka-console-producer --broker-list \
  localhost:9092 --topic topic-one
Or you can use it to tail a file.

> tail -f -n +1 /tmp/test.txt | kafka-console-producer \
  --broker-list localhost:9092 --topic topic-one
Now just append to the /tmp/test.txt file and watch the new line get sent as a message.

> echo "APPEND ME" >> /tmp/test.txt
There you go: a very basic overview of a very basic Kafka Topic with one partition.
(More to come as I do more research.)
References
[1] Kafka Wikipedia page, https://en.wikipedia.org/wiki/Apache_Kafka. Accessed 12/2016.
[2] Kafka documentation page, http://kafka.apache.org/documentation/. Accessed 12/2016.
[3] Kafka download page, https://kafka.apache.org/downloads.html. Accessed 12/2016.