Zabbix Cheat Sheet
https://bestmonitoringtools.com/zabbix-web-monitoring-create-web-scenarios-with-examples/
Triggers
Trigger if problem persists after 1 day
{Unattended Upgrades - Warn when reboot required active:vfs.file.exists[/var/run/reboot-required].sum(#24)}=24
When a reboot is required, the item value is 1.
To trigger only when the issue still exists after 1 day, build the sum over the last 24 (hourly) values; if sum = 24 -> fire trigger.
Other scenarios: rebooted 1 h ago, last value = 0 -> sum is 23 -> not triggered
Reboot required for 23 hours -> sum is 23 -> not triggered
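The sum logic above can be replayed outside Zabbix. The following is a minimal Python sketch, not Zabbix code: the function name reboot_trigger_fires and the list of hourly values (oldest first) are assumptions made for illustration.

```python
def reboot_trigger_fires(hourly_values, window=24):
    """Mimic sum(#24)=24: fire only if the last `window` values are all 1."""
    recent = hourly_values[-window:]
    # With fewer than `window` 0/1 values the sum cannot reach `window`.
    return sum(recent) == window


# Reboot required during the whole last day
print(reboot_trigger_fires([1] * 24))        # True
# Reboot required for 23 hours only
print(reboot_trigger_fires([0] + [1] * 23))  # False
# Rebooted 1 hour ago, last value is 0
print(reboot_trigger_fires([1] * 23 + [0]))  # False
```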
Trigger if both the last and the before-last value are > 0
{myhost:unattended.missing.security.updates.last()}>0 and {myhost:unattended.missing.security.updates.last(#2)}>0
.last(#2) = the second most recent (before-last) item value
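As a rough illustration of what this expression checks, here is a small Python sketch. The function name and the history list (oldest value first) are hypothetical, and returning False when fewer than two values exist is a simplifying assumption, not Zabbix behaviour.

```python
def missing_updates_trigger_fires(values):
    """Mimic last()>0 and last(#2)>0 on an item history (oldest first)."""
    if len(values) < 2:
        return False  # assumption: too little data counts as "no problem"
    last, before_last = values[-1], values[-2]
    return last > 0 and before_last > 0


print(missing_updates_trigger_fires([0, 3, 3]))  # True: both recent values > 0
print(missing_updates_trigger_fires([3, 0, 3]))  # False: before-last value is 0
print(missing_updates_trigger_fires([3, 3, 0]))  # False: last value is 0
```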
Actions
Test a zabbix action or escalation
To test actions create a fake test trigger for your scenario (Host, Hostgroup, Severity).
We'll use the "vm.memory.size" item from the Linux OS template to create a fake trigger.
- Configuration -> Hosts -> myhost.example.com -> Triggers
- Add Trigger
- Name: No more beer in the fridge {HOST.NAME} (Test Trigger)
- Expression: {myhost.example.com:vm.memory.size[available].last()}>1
- Note: This is practically always true, as the available memory is bigger than 1 byte
- Severity: Disaster (or as you need to test your action)
- Enabled: yes
- "Update"
To switch the test trigger off but keep it, change the expression to "<1", or disable the trigger completely.
Understanding and using Zabbix action operations
https://www.zabbix.com/documentation/3.0/manual/config/notifications/action/escalations?s[]=escalation
I found the operations tab quite confusing, especially "Default operation step duration" and the "Step" numbers.
Here is how it works:
- Default operation step duration: e.g. 300 sec (5 min)
- This is the time grid you choose. If you define "300", you operate in a 5-minute grid.
- "Step" numbers
- These are the numbered steps in your time grid. Example:
Step 1 starts immediately, step 2 starts after 5 minutes, step 3 after 10 minutes.
- The second value is the last step at which the operation is performed. "0" means infinitely.
Example: 2-0 means the operation is first performed after 5 minutes and then repeats every 5 minutes forever.
Example: 2-2 means the operation is performed exactly once, 5 minutes after the incident.
Example: 2-4 means the operation is performed 5, 10 and 15 minutes after the incident.
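The step grid can be checked with a few lines of Python. This is only a sketch of the arithmetic described above: operation_times and horizon_steps are made-up names, and horizon_steps merely cuts off the endless "0" case so the result can be printed.

```python
def operation_times(step_from, step_to, step_duration=300, horizon_steps=6):
    """Return the seconds after the incident at which an operation runs.

    step_to=0 means "repeat forever"; horizon_steps caps the list for display.
    """
    last = step_to if step_to != 0 else horizon_steps
    # Step 1 runs at 0 s, step 2 after one duration, step 3 after two, ...
    return [(step - 1) * step_duration for step in range(step_from, last + 1)]


print(operation_times(1, 1))  # [0]
print(operation_times(2, 2))  # [300]
print(operation_times(2, 4))  # [300, 600, 900]
print(operation_times(2, 0))  # [300, 600, 900, 1200, 1500]
```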
Users / Customers
Create a new user with access to some data and alerts
- Create host group with desired hosts
- Create user group with read-only permission for the host group
- Optional: Create web scenarios on a host which belongs to the host group above
- Create user with this user group and
- Media type Email 1-6,00:00-00:15;1-6,00:20-24:00;7,00:00-00:15;7,00:20-04:00;7,04:45-24:00
Warning and above
(No emails between 00:15 and 00:20 on any day, nor during the maintenance window on Sunday 4:00 - 4:45)
- Add user group to Configuration -> Actions -> Email (Warning and above)
Media
Do not send emails during the maintenance window on Sunday 4:00 - 4:45:
- 1-6,00:00-24:00;7,00:00-04:00;7,04:45-24:00
More info: https://www.zabbix.com/documentation/3.0/manual/appendix/time_period
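To verify a period string before saving it, a small Python sketch like the following can help. It is not part of Zabbix: the function names are invented here, it only handles the d-d,hh:mm-hh:mm form used above, and it assumes the end of a period is exclusive.

```python
from datetime import datetime


def to_minutes(hhmm):
    """Convert "hh:mm" (including "24:00") to minutes since midnight."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


def in_time_period(periods, moment):
    """Return True if `moment` lies inside any period of the string."""
    weekday = moment.isoweekday()  # 1 = Monday ... 7 = Sunday, as in Zabbix
    minutes = moment.hour * 60 + moment.minute
    for period in periods.split(";"):
        days, times = period.split(",")
        day_from, _, day_to = days.partition("-")
        day_to = day_to or day_from  # a single day such as "7"
        start, end = times.split("-")
        in_days = int(day_from) <= weekday <= int(day_to)
        # Assumption: the upper limit of a period is not included.
        if in_days and to_minutes(start) <= minutes < to_minutes(end):
            return True
    return False


PERIODS = "1-6,00:00-24:00;7,00:00-04:00;7,04:45-24:00"
# 2024-01-07 is a Sunday, 2024-01-03 a Wednesday.
print(in_time_period(PERIODS, datetime(2024, 1, 7, 4, 30)))  # False
print(in_time_period(PERIODS, datetime(2024, 1, 7, 4, 45)))  # True
print(in_time_period(PERIODS, datetime(2024, 1, 3, 4, 30)))  # True
```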
Create User Parameter
A user parameter lets the Zabbix agent run a command or script on the host and return its output to the Zabbix server.
Example of a website check script:
- vi /etc/zabbix/check_for_debug_output.php
#!/usr/bin/php
<?php
/* Test a webpage for debug output. Return 0 if ok, return 1 if debug output detected */
$url = @$_SERVER['argv'][1];
if (!$url) {
    die('Please give the url to check');
}
$opts = array(
    'http' => array(
        'method' => "GET",
        'header' => "Authorization: Basic " . base64_encode("myuser:mypasswd")
    )
);
$context = stream_context_create($opts);
$html = file_get_contents($url, false, $context);
//var_dump($html);
if (strstr($html, 'sf-dump') || strstr($html, 'xdebug')) {
    die('1');
}
die('0');
- chmod 755 /etc/zabbix/check_for_debug_output.php
Test the script
- /etc/zabbix/check_for_debug_output.php http://www.example.com
Define the zabbix item as a user parameter
- vi /etc/zabbix/zabbix_agentd.conf
UserParameter=website.debugoutput,/etc/zabbix/check_for_debug_output.php http://www.example.com
- service zabbix-agent restart
Test the new zabbix item from the zabbix server
- zabbix_get --tls-connect psk --tls-psk-identity Key1 --tls-psk-file /etc/zabbix/key.psk -s myagenthost.com -p 10050 -k website.debugoutput
Create item and trigger in zabbix
- Configuration -> Hosts -> myagenthost.com -> Items -> Create item
- Name: Check for debug output www.example.com
- Type: Zabbix agent
- Key: website.debugoutput
- Update interval: 60
- -> Save
- Configuration -> Hosts -> myagenthost.com -> Triggers -> Create trigger
- Name: www.example.com has debug output
- Expression: {myagenthost.com:website.debugoutput.last()}=1
- URL: http://www.example.com
- Severity: Warning
- -> Save
- Check for correct data:
- Latest Data -> ...
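The same item and trigger could also be created through the Zabbix JSON-RPC API instead of the web frontend. The sketch below only builds the request bodies for item.create and trigger.create and does not send them. The host ID, interface ID and auth token are hypothetical placeholders, and the "auth" field reflects the request style of older API versions such as 3.0.

```python
import json


def rpc_request(method, params, auth_token, request_id=1):
    """Wrap params in the JSON-RPC 2.0 envelope used by the Zabbix API."""
    return {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "auth": auth_token,
        "id": request_id,
    }


# Hypothetical values: look up the real IDs with host.get / hostinterface.get.
HOST_ID = "10105"
INTERFACE_ID = "5"
AUTH_TOKEN = "token-from-user.login"

item = rpc_request("item.create", {
    "name": "Check for debug output www.example.com",
    "key_": "website.debugoutput",
    "hostid": HOST_ID,
    "interfaceid": INTERFACE_ID,
    "type": 0,        # 0 = Zabbix agent
    "value_type": 3,  # 3 = numeric (unsigned)
    "delay": 60,      # update interval in seconds
}, AUTH_TOKEN)

trigger = rpc_request("trigger.create", {
    "description": "www.example.com has debug output",
    "expression": "{myagenthost.com:website.debugoutput.last()}=1",
    "url": "http://www.example.com",
    "priority": 2,    # 2 = Warning
}, AUTH_TOKEN, request_id=2)

print(json.dumps(item, indent=2))
print(json.dumps(trigger, indent=2))
```

Each body would be sent as an HTTP POST with Content-Type application/json-rpc to the api_jsonrpc.php endpoint of the frontend.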
If you want to add a user parameter which needs root privileges:
- visudo
- zabbix ALL = NOPASSWD: /path/to/script --with params
- Do not forget to use sudo in /etc/zabbix/zabbix_agentd.conf!
UserParameter=xxx,sudo /path/to/script --with params