This guide assumes you have a pagerduty account and are familiar with API key service alerts. If you don't know how to set up pagerduty, I have posted a few guides on the subject at http://www.whiteboardcoder.com/2014/08/pagerduty-getting-started.html and http://www.whiteboardcoder.com/2014/10/pagerduty-api-service-alert.html
It also assumes you have a Sensu Master/client set up.
If you need help setting up Sensu, check out these guides I wrote: http://www.whiteboardcoder.com/2014/10/sensu-getting-started.html or http://www.whiteboardcoder.com/2014/10/sensu-setting-up-client.html
Another guide I found on this same subject can be found at http://www.pagerduty.com/docs/guides/sensu-integration-guide/ [1]
This guide is quick and simple but a little outdated. For example, it says Sensu will not resolve the pagerduty incident once triggered; that is no longer true (with the code they are using).
Installing redphone gem
There is a gem called redphone; its github page is at https://github.com/portertech/redphone [2]. It supports pagerduty, pingdom, loggly, and statusPage. All the API requests are done over SSL.
To install this gem run
> sudo gem install redphone
|
To list your installed gems run this command
> gem list
|
Before I get too far…
Before I get too far into this I want to use the redphone gem and create a simple ruby script to trigger a pagerduty alert.
Head over to http://www.pagerduty.com/ and log in to your account.
I am going to create a new temporary API service to use for
this test. Here is how to go about that.
Click on Services
Click on +Add New Service
Give it a name and select an Escalation Policy. Set the Integration type to "Use our API directly" and click Add service.
Copy the Service API Key.
For the purpose of this write-up I will use 999XXXXXXXXXXb1 as my key.
A simple ruby Script
OK, I have my API key. Now I want to write a simple ruby script to trigger the pagerduty alert. This script in no way will use Sensu, but it will use the redphone gem I installed.
> vi pagerduty_test.ruby
|
Here is my code
(change the api_key to your own)
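#!/usr/bin/ruby
require 'redphone/pagerduty'

# Service API key copied from the pagerduty service page
api_key = '999XXXXXXXXXXb1'

# Ask pagerduty to open an incident on that service
response = Redphone::Pagerduty.trigger_incident(
  :service_key => api_key,
  :description => "This is a Test Alert from Patrick"
)

if response['status'] == 'success'
  puts "pagerduty Alert issued!"
else
  # Print the whole response so the failure reason is visible
  puts "Error issuing pagerduty '" + response.inspect + "'"
end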
|
Save it then make the file executable
> chmod u+x pagerduty_test.ruby
|
Run the program
> ./pagerduty_test.ruby
|
It successfully triggered my alert wahoo!
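To get a feel for what the gem is doing for me, here is a minimal sketch of the request body a "trigger" event carries. This is my understanding of the pagerduty generic events API, not code taken from redphone; the endpoint URL and the helper name build_trigger_payload are assumptions made for illustration, and nothing is actually sent.

```ruby
require 'json'

# Assumed endpoint of the pagerduty generic events API (not from this post).
PAGERDUTY_EVENTS_URL = 'https://events.pagerduty.com/generic/2010-04-15/create_event.json'

# Hypothetical helper: builds the JSON body for a "trigger" event.
# The incident_key is optional; it is only added when one is given.
def build_trigger_payload(service_key, description, incident_key = nil)
  payload = {
    'service_key' => service_key,
    'event_type'  => 'trigger',
    'description' => description
  }
  payload['incident_key'] = incident_key if incident_key
  JSON.generate(payload)
end

# Print the body that would be POSTed to PAGERDUTY_EVENTS_URL.
puts build_trigger_payload('999XXXXXXXXXXb1', 'This is a Test Alert from Patrick')
```

Using the gem is still the easier route; the sketch is only meant to show that a trigger is little more than a service key, an event type, and a description.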
Now to get to work with Sensu…
Pagerduty Sensu Handler
Now that I have figured out how to use the RedPhone gem I
need to figure out Sensu Handlers.
The Sensu doc page for this is located at http://sensuapp.org/docs/0.16/handlers
[2]
Looking over this page there are several different types of handlers: pipe, TCP, UDP, AMQP, and Sets.
For my near term purposes I think I only really need pipe and sets.
Pipe handlers execute a script and pass the event in via STDIN.
Sets are used for grouping handlers. It's a way to send the same event to several handlers at the same time. For example, if you want an event to go to two different Pipe handlers, one which sends a message to HipChat and one that sends an email, you can use a set handler. I am not going to use a Set handler in this document, but I thought it worth mentioning.
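To make the pipe idea concrete, here is a small sketch of what a pipe handler has to do: read one JSON event from an input stream and act on it. The helper name summarize_event and the sample event are made up for illustration; the field names (action, client name, check name, check output) are the ones I have seen in Sensu events, so treat them as assumptions.

```ruby
require 'json'
require 'stringio'

# Hypothetical helper: reads one Sensu event (JSON) from any IO object
# and returns a one-line summary of it.
def summarize_event(io)
  event  = JSON.parse(io.read)
  client = event['client']['name']
  check  = event['check']['name']
  output = event['check']['output'].to_s.strip
  "#{event['action']}: #{client}/#{check} - #{output}"
end

# Made-up sample event. A real pipe handler would pass $stdin instead.
sample_event = {
  'action' => 'create',
  'client' => { 'name' => 'sensu-master' },
  'check'  => { 'name' => 'check_file', 'output' => "file missing\n" }
}
puts summarize_event(StringIO.new(JSON.generate(sample_event)))
```

The sensu-handler library used later in this post does this reading and parsing for you, which is why the pagerduty handler can simply refer to @event.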
I found this github repo https://github.com/sensu/sensu-community-plugins/tree/master/handlers/notification
[3] which contains several Sensu Handlers you can just copy and use. The pagerduty handler can be found at https://github.com/sensu/sensu-community-plugins/blob/master/handlers/notification/pagerduty.rb
[4]
Here is the code (as it was on 11/17/2014)
#!/usr/bin/env ruby
#
# This handler creates and resolves PagerDuty incidents, refreshing
# stale incident details every 30 minutes
#
# Copyright 2011 Sonian, Inc <chefs@sonian.net>
#
# Released under the same terms as Sensu (the MIT license); see LICENSE
# for details.
#
# Dependencies:
#
#   sensu-plugin >= 1.0.0
#
require 'rubygems' if RUBY_VERSION < '1.9.0'
require 'sensu-handler'
require 'redphone/pagerduty'

class Pagerduty < Sensu::Handler
  def incident_key
    source = @event['check']['source'] || @event['client']['name']
    [source, @event['check']['name']].join('/')
  end

  def handle
    if @event['check']['pager_team']
      api_key = settings['pagerduty'][@event['check']['pager_team']]['api_key']
    else
      api_key = settings['pagerduty']['api_key']
    end
    begin
      timeout(10) do
        response = case @event['action']
                   when 'create'
                     Redphone::Pagerduty.trigger_incident(
                       :service_key => api_key,
                       :incident_key => incident_key,
                       :description => event_summary,
                       :details => @event
                     )
                   when 'resolve'
                     Redphone::Pagerduty.resolve_incident(
                       :service_key => api_key,
                       :incident_key => incident_key
                     )
                   end
        if response['status'] == 'success'
          puts 'pagerduty -- ' + @event['action'].capitalize + 'd incident -- ' + incident_key
        else
          puts 'pagerduty -- failed to ' + @event['action'] + ' incident -- ' + incident_key
        end
      end
    rescue Timeout::Error
      puts 'pagerduty -- timed out while attempting to ' + @event['action'] + ' a incident -- ' + incident_key
    end
  end
end
|
For my first test I am going to create a notifications
folder and use wget to retrieve the code from github
> sudo mkdir -p /etc/sensu/handlers/notifications
> cd /etc/sensu/handlers/notifications/
> sudo wget https://raw.githubusercontent.com/sensu/sensu-community-plugins/master/handlers/notification/pagerduty.rb
|
Make the file executable
> sudo chmod a+x pagerduty.rb
|
It still requires a handler definition and a "pagerduty" JSON setting.
Create the pagerduty handler
> sudo mkdir -p /etc/sensu/conf.d/handlers
> sudo vi /etc/sensu/conf.d/handlers/pagerduty.json
|
Replace the api_key with your own.
{
"handlers": {
"pagerduty": {
"command": "/etc/sensu/handlers/notifications/pagerduty.rb",
"type": "pipe",
"severities": [
"ok",
"critical",
"unknown"
]
}
},
"pagerduty": {
"api_key": "999XXXXXXXXXXb1"
}
}
|
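A typo in this file is easy to miss until Sensu is restarted, so here is a minimal sketch of a sanity check I could run first. The helper name pagerduty_config_problems is hypothetical, and the rules it checks (a handlers.pagerduty.command and a pagerduty.api_key) are only what the handler in this post appears to need, not an official Sensu validation.

```ruby
require 'json'

# Hypothetical helper: returns a list of problems found in the text of a
# pagerduty.json style config. An empty list means nothing obvious is wrong.
def pagerduty_config_problems(json_text)
  config = JSON.parse(json_text)
  return ['top level must be a JSON object'] unless config.is_a?(Hash)

  problems = []
  handler = (config['handlers'] || {})['pagerduty']
  problems << 'missing handlers.pagerduty' unless handler
  problems << 'missing handlers.pagerduty.command' if handler && !handler['command']
  problems << 'missing pagerduty.api_key' unless (config['pagerduty'] || {})['api_key']
  problems
rescue JSON::ParserError => e
  ["invalid JSON: #{e.message}"]
end

# Inline sample with the same shape as the file in this post.
sample = <<~JSON
  {
    "handlers": {
      "pagerduty": {
        "command": "/etc/sensu/handlers/notifications/pagerduty.rb",
        "type": "pipe"
      }
    },
    "pagerduty": { "api_key": "999XXXXXXXXXXb1" }
  }
JSON
p pagerduty_config_problems(sample)
```

To check the real file, the sample string could be swapped for File.read('/etc/sensu/conf.d/handlers/pagerduty.json').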
Then I have to add this handler to a check. In my case I had a check called
check_file.json I had created before, so I will edit that.
> sudo vi /etc/sensu/conf.d/check_file.json
|
{
"checks": {
"check_file": {
"handlers": [
"default",
"hipchat", "pagerduty"
],
"command": "/etc/sensu/plugins/check-file.rb -f /home/patman/test.txt",
"interval": 60,
"occurrences": 3,
"subscribers": [
"check-from-sensu-master",
"client-1",
"client-2",
"aws-client"
]
}
}
}
|
All I did was add the "pagerduty" handler in the
handlers section.
Restart the Sensu Master, and its client, with the following command
> sudo service sensu-server restart && sudo service sensu-api restart && sudo service sensu-client restart
|
To trigger my check_file check I just need to remove a file
from my home directory.
> rm ~/test.txt
|
After 3 occurrences the handler triggers.
And it worked! The pagerduty service was triggered!
I immediately acknowledged the issue.
> touch ~/test.txt
|
Bringing back the file resolves the issue in Sensu and Pagerduty.
That is not what I want Sensu to do. I do not want Sensu to resolve any pagerduty service alarm, only to trigger them. So that means I need to tweak some code.
Here is my tweaked code.
#!/usr/bin/env ruby
require 'rubygems' if RUBY_VERSION < '1.9.0'
require 'sensu-handler'
require 'redphone/pagerduty'

class Pagerduty < Sensu::Handler
  def incident_key
    source = @event['check']['source'] || @event['client']['name']
    [source, @event['check']['name']].join('/')
  end

  def handle
    if @event['check']['pager_team']
      api_key = settings['pagerduty'][@event['check']['pager_team']]['api_key']
    else
      api_key = settings['pagerduty']['api_key']
    end
    begin
      timeout(10) do
        if @event['action'] == 'create'
          response = Redphone::Pagerduty.trigger_incident(
            :service_key => api_key,
            :incident_key => incident_key,
            :description => event_summary,
            :details => @event
          )
          if response['status'] == 'success'
            puts 'pagerduty -- ' + @event['action'].capitalize + 'd incident -- ' + incident_key
          else
            puts 'pagerduty -- failed to ' + @event['action'] + ' incident -- ' + incident_key
          end
        end
      end
    rescue Timeout::Error
      puts 'pagerduty -- timed out while attempting to ' + @event['action'] + ' a incident -- ' + incident_key
    end
  end
end
|
This code worked, it triggers a pagerduty alert but does not
resolve it.
Playing with the handler for a bit and looking at the logs I
saw this message.
{"timestamp":"2014-11-17T19:47:58.930938-0700","level":"info","message":"handler output","handler":{"command":"/etc/sensu/handlers/notifications/pagerduty.rb","type":"pipe","severities":["ok","critical","unknown"],"name":"pagerduty"},"output":"only handling every 30 occurrences: sensu-master/check_file\n"}
|
Only handling every 30 occurrences.
I found this post https://github.com/sensu/sensu/issues/613 [5] which mentions that if you are using sensu-handler this is the expected behavior. The default is to only re-trigger once every 30 minutes after the initial trigger (the initial trigger delay does not count). With my check running every 60 seconds, 30 occurrences works out to those 30 minutes.
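Here is a minimal sketch of how I understand that repeat filter to work. It is my own re-implementation for illustration, not the sensu-handler source; the helper name handle_occurrence? is hypothetical, and the default values (occurrences 1, interval 30, refresh 1800) are what I believe the library uses, so treat them as assumptions.

```ruby
# Hypothetical re-implementation of the repeat filter, for illustration only.
# Returns true when the handler would act on the given occurrence number.
# Assumes interval is a positive number of seconds.
def handle_occurrence?(occurrence, check)
  threshold = (check['occurrences'] || 1).to_i    # failures needed before the first alert
  interval  = (check['interval'] || 30).to_i      # seconds between check runs
  refresh   = (check['refresh'] || 1800).to_i     # seconds between repeat alerts

  return false if occurrence < threshold          # not enough failures yet
  return true  if occurrence == threshold         # the initial trigger

  every = refresh / interval                      # occurrences between repeats
  every.zero? || ((occurrence - threshold) % every).zero?
end

# My check_file check: runs every 60 seconds and needs 3 occurrences.
check = { 'occurrences' => 3, 'interval' => 60 }
p (1..70).select { |n| handle_occurrence?(n, check) }   # prints [3, 33, 63]
```

With those numbers the handler fires on the 3rd occurrence and then on every 30th after it, which matches the "only handling every 30 occurrences" message in the log.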
I ran a little test.
I triggered a pagerduty alert and acknowledged it, then let my Sensu alarm run for 30 minutes. At the 30 minute mark the pagerduty handler triggered again. All it did was add a "Triggered" event to the currently open alert. So no new alert was created, which is exactly the behavior I was looking for.
Next I left the service alert in an acknowledged state and fixed the Sensu check (by creating the file again). I waited for it to resolve, then removed the file again (to trigger the incident again). At this point I still had an open, but acknowledged, incident.
The new trigger does not open a new alert, but just adds another "triggered" event to the currently open alert… Not quite what I want… I need to think this through and play with it.
Looking at pagerduty I can see that it has the same incident key… Maybe if I had a different incident key it would issue a different alert?
I have a second check called check_second_file.json
> sudo vi /etc/sensu/conf.d/check_second_file.json
|
It works the same way as my last check but looks for a different file.
Restart the Sensu Master, and its client, with the following command
> sudo service sensu-server restart && sudo service sensu-api restart && sudo service sensu-client restart
|
If I remove the first file, trigger the alert, and acknowledge it, and then trigger the second Sensu alert, do I get two alerts in pagerduty?
Yes I do! So, all it needs is a unique incident key!
This actually works exactly like I want it to. I want each type of Sensu check to only have one open pagerduty alert at a time. If an alert is open and it triggers again, I want it to be absorbed into the last alert.
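To see why the two checks ended up as separate alerts, here is a small sketch that builds the incident key the same way the handler's incident_key method does, but as a standalone function I can run by hand. The function name incident_key_for and the sample events are made up for illustration.

```ruby
# Standalone version of the handler's incident_key logic.
# The key is "<source or client name>/<check name>".
def incident_key_for(event)
  source = event['check']['source'] || event['client']['name']
  [source, event['check']['name']].join('/')
end

# Made-up events: two failures of the same check, and one of another check.
first  = { 'client' => { 'name' => 'sensu-master' }, 'check' => { 'name' => 'check_file' } }
again  = { 'client' => { 'name' => 'sensu-master' }, 'check' => { 'name' => 'check_file' } }
second = { 'client' => { 'name' => 'sensu-master' }, 'check' => { 'name' => 'check_file_2' } }

puts incident_key_for(first)    # sensu-master/check_file
puts incident_key_for(again)    # sensu-master/check_file (same key, same open alert)
puts incident_key_for(second)   # sensu-master/check_file_2 (new key, new alert)
```

The first key is the same string that shows up at the end of the log message above, so repeats of one check on one client always land on the same key.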
But if you don't want that and you want each one to trigger
a new alert you could put a timestamp in the incident key. The following code does exactly that.
#!/usr/bin/env ruby
require 'rubygems' if RUBY_VERSION < '1.9.0'
require 'sensu-handler'
require 'redphone/pagerduty'

class Pagerduty < Sensu::Handler
  def incident_key
    source = @event['check']['source'] || @event['client']['name']
    [source, @event['check']['name']].join('/') + ("%10.5f" % Time.now.to_f).to_i.to_s
  end

  def handle
    if @event['check']['pager_team']
      api_key = settings['pagerduty'][@event['check']['pager_team']]['api_key']
    else
      api_key = settings['pagerduty']['api_key']
    end
    begin
      timeout(10) do
        if @event['action'] == 'create'
          response = Redphone::Pagerduty.trigger_incident(
            :service_key => api_key,
            :incident_key => incident_key,
            :description => event_summary,
            :details => @event
          )
          if response['status'] == 'success'
            puts 'pagerduty -- ' + @event['action'].capitalize + 'd incident -- ' + incident_key
          else
            puts 'pagerduty -- failed to ' + @event['action'] + ' incident -- ' + incident_key
          end
        end
      end
    rescue Timeout::Error
      puts 'pagerduty -- timed out while attempting to ' + @event['action'] + ' a incident -- ' + incident_key
    end
  end
end
|
One problem with this is that it would trigger a new
incident every 30 minutes when the default handler timer goes off.
To fix that you could override the default timer. Add a "refresh" to your check.
> sudo vi /etc/sensu/conf.d/check_file.json
|
And add a refresh variable.
{
"checks": {
"check_file": {
"handlers": [
"default", "hipchat", "pagerduty"
],
"command": "/etc/sensu/plugins/check-file.rb -f /home/patman/test.txt",
"interval": 60,
"occurrences": 3,
"refresh": 43200,
"subscribers": [
"check-from-sensu-master",
"client-1",
"client-2",
"aws-client"
]
}
}
}
|
43200 seconds is 12 hours.
The handlers for this check will only re-run every 12 hours. That may help out.
But for me, I will probably set this to 3600 (one hour) and not use the epoch timestamp.
Different Pagerduty alerts
What if you have different Sensu checks and you want those checks to trigger different pagerduty service alerts?
Luckily someone thought of that when writing the pagerduty.rb code. You can designate a pager_team in your check and use that pager_team's API key. This makes it easy to have specific pagerduty alerts triggered per Sensu check.
> sudo vi /etc/sensu/conf.d/check_file.json
|
Edit your check and designate a pager_team (you make up the
team name)
{
"checks": {
"check_file": {
"handlers": [
"default",
"hipchat", "pagerduty"
],
"command": "/etc/sensu/plugins/check-file.rb -f /home/patman/test.txt",
"interval": 60,
"occurrences": 3,
"refresh": 3600,
"pager_team": "alert_1",
"subscribers": [
"check-from-sensu-master",
"client-1",
"client-2",
"aws-client"
]
}
}
}
|
Edit the pagerduty.json file adding pager_teams
> sudo vi /etc/sensu/conf.d/handlers/pagerduty.json
|
Replace the api_keys with your own. And use the team names
you created here.
{
"handlers": {
"pagerduty": {
"command": "/etc/sensu/handlers/notifications/pagerduty.rb",
"type": "pipe",
"severities": [
"ok",
"critical",
"unknown"
]
}
},
"pagerduty": {
"api_key": "999XXXXXXXXXXb1",
"alert_1": {
"api_key": "33XXXXXXXXXXXee9"
},
"alert_2": {
"api_key": "999XXXXXXXXXXb1"
}
}
}
|
Now I effectively have a default pagerduty alert and two specific alerts that a check can use if the check designates the pager_team.
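The key lookup can be tried outside of Sensu with a small sketch. Note that this version is deliberately a little different from the handler: the handler would raise an error if a check named a pager_team that is missing from pagerduty.json, while this hypothetical api_key_for helper falls back to the default key instead. That fallback is my own assumption about what would be friendlier, not something the handler does.

```ruby
# Hypothetical helper: picks the API key for a check.
# Uses the pager_team's key when the team exists in the settings,
# otherwise falls back to the default key.
def api_key_for(settings, check)
  pagerduty     = settings['pagerduty']
  team          = check['pager_team']
  team_settings = team ? pagerduty[team] : nil
  (team_settings && team_settings['api_key']) || pagerduty['api_key']
end

# Same shape as the "pagerduty" section of pagerduty.json in this post.
settings = {
  'pagerduty' => {
    'api_key' => '999XXXXXXXXXXb1',
    'alert_1' => { 'api_key' => '33XXXXXXXXXXXee9' },
    'alert_2' => { 'api_key' => '999XXXXXXXXXXb1' }
  }
}

puts api_key_for(settings, { 'pager_team' => 'alert_1' })   # team key
puts api_key_for(settings, {})                              # default key
puts api_key_for(settings, { 'pager_team' => 'no_such' })   # default key (fallback)
```

Whether a silent fallback or a loud failure is better depends on how much you trust your check files; a typo in a team name would go unnoticed with the fallback.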
Restart the Sensu Master, and its client, with the following command
> sudo service sensu-server restart && sudo service sensu-api restart && sudo service sensu-client restart
|
Now I removed my file to trigger my Sensu alert
> rm ~/test.txt
|
After 3 occurrences the handler triggers.
And it worked! The pagerduty service from the pager_team 'alert_1' was triggered! Cool, this is what I need.
Bringing back the file resolves the issue.
> touch ~/test.txt
|
As a test I am going to edit my second Sensu check and have
it use a different alert and see how it works.
> sudo vi /etc/sensu/conf.d/check_second_file.json
|
{
"checks": {
"check_file_2": {
"handlers": [
"default",
"hipchat", "pagerduty"
],
"command": "/etc/sensu/plugins/check-file.rb -f /home/patman/test-2.txt",
"interval": 60,
"occurrences": 3,
"refresh": 3600,
"pager_team" : "alert_2",
"subscribers": [
"client-1",
"client-2",
"aws-client"
]
}
}
}
|
Restart the Sensu Master, and its client, with the following command
> sudo service sensu-server restart && sudo service sensu-api restart && sudo service sensu-client restart
|
Now let me see if I can trigger these two different Sensu alarms and have each one in turn trigger its own pagerduty service alert.
> rm ~/test.txt
|
> rm ~/test-2.txt
|
Perfect! It created two different pagerduty alerts like it should!
A few more tweaks…
I like the Description, but it would be nice to be able to append a message to it.
And looking at the Message itself I see the raw Sensu event (the handler passes the whole @event as :details), which is not really what I want.
So I need to tweak the code a bit.
> sudo vi /etc/sensu/handlers/notifications/pagerduty.rb
|
Here is the code I came up with.
#!/usr/bin/env ruby
require 'rubygems' if RUBY_VERSION < '1.9.0'
require 'sensu-handler'
require 'redphone/pagerduty'

class Pagerduty < Sensu::Handler
  def incident_key
    source = @event['check']['source'] || @event['client']['name']
    [source, @event['check']['name']].join('/')
  end

  def handle
    if @event['check']['pager_team']
      api_key = settings['pagerduty'][@event['check']['pager_team']]['api_key']
    else
      api_key = settings['pagerduty']['api_key']
    end
    if @event['check']['pagerduty_desc']
      description = event_summary + ' ' + @event['check']['pagerduty_desc']
    else
      description = event_summary
    end
    if @event['check']['playbook']
      details = @event['check']['playbook']
    else
      details = @event
    end
    begin
      timeout(10) do
        if @event['action'] == 'create'
          response = Redphone::Pagerduty.trigger_incident(
            :service_key => api_key,
            :incident_key => incident_key,
            :description => description,
            :details => details
          )
          if response['status'] == 'success'
            puts 'pagerduty -- ' + @event['action'].capitalize + 'd incident -- ' + incident_key
          else
            puts 'pagerduty -- failed to ' + @event['action'] + ' incident -- ' + incident_key
          end
        end
      end
    rescue Timeout::Error
      puts 'pagerduty -- timed out while attempting to ' + @event['action'] + ' a incident -- ' + incident_key
    end
  end
end
|
Then edit the check json file
> sudo vi /etc/sensu/conf.d/check_file.json
|
Edit your check and add a pagerduty_desc and a playbook
{
"checks": {
"check_file": {
"handlers": [
"default",
"hipchat", "pagerduty"
],
"command": "/etc/sensu/plugins/check-file.rb -f /home/patman/test.txt",
"interval": 60,
"occurrences": 3,
"refresh": 3600,
"pager_team": "alert_2",
"pagerduty_desc": "This is a test for a message https://www.google.com",
"playbook": "Go get notes on how to fix this at https://www.yahoo.com",
"subscribers": [
"check-from-sensu-master",
"client-1",
"client-2",
"aws-client"
]
}
}
}
|
The pagerduty_desc will be appended to the description. The playbook will replace the message (the incident details) sent to pagerduty. (I am using playbook because I have seen it used by other Sensu handlers; I am not sure if it's in standard use or not.)
If they are not present in the check, the normal description and message will be sent.
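The two rules can be tried on their own with a minimal sketch. The helper names pagerduty_description and pagerduty_details are hypothetical, and the summary string stands in for whatever event_summary returns inside the real handler.

```ruby
# Hypothetical helper: appends the check's pagerduty_desc (if any)
# to the normal summary.
def pagerduty_description(summary, check)
  extra = check['pagerduty_desc']
  extra ? "#{summary} #{extra}" : summary
end

# Hypothetical helper: sends the playbook text as the details when the
# check has one, otherwise the whole event as before.
def pagerduty_details(event)
  event['check']['playbook'] || event
end

# Made-up event with both optional fields set.
event = {
  'client' => { 'name' => 'sensu-master' },
  'check'  => {
    'name'           => 'check_file',
    'pagerduty_desc' => 'This is a test for a message https://www.google.com',
    'playbook'       => 'Go get notes on how to fix this at https://www.yahoo.com'
  }
}

puts pagerduty_description('sensu-master/check_file : file missing', event['check'])
puts pagerduty_details(event)
```

One thing this makes visible: when a playbook is set, the raw event is no longer sent at all, so anything I still want from it has to go into the playbook or the description.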
Restart the Sensu Master, and its client, with the following command
> sudo service sensu-server restart && sudo service sensu-api restart && sudo service sensu-client restart
|
Trigger an alert
> rm ~/test.txt
|
In pagerduty the message is appended and the URL is clickable.
Looking at the incident details the Message shows up just fine and the URL is clickable.
Perfect! That is exactly what I wanted.
I think that is enough for this write-up. I hope it helps someone.
References
[1] How To Integrate Sensu with PagerDuty
Accessed 11/2014
[2] Redphone, the monitoring service ruby library github page
Accessed 11/2014
[3] sensu-community-plugins
Accessed 11/2014
[4] sensu-community-plugins pagerduty.rb
Accessed 11/2014
[5] bugs in multiple handlers #613
Accessed 11/2014