Quadrata - Creatori di Soluzioni

Today I want to show a problem we discovered on one of our Zabbix servers, involving the Housekeeper process.

The Housekeeper is a periodic process executed by the Zabbix server. It removes outdated information and information deleted by the user.
Most of us know the two parameters in zabbix_server.conf that limit its behavior:

MaxHousekeeperDelete

No more than 'MaxHousekeeperDelete' rows will be deleted per task in one housekeeping cycle.
Most of the time nobody pays much attention to this setting, but today we ran into a major slowdown of a Zabbix server, and the problem was coming from the Housekeeper.
As a result we now understand the logic behind the Housekeeper process much better, and I will try to explain it below.
A few days ago we removed 3 item prototypes from a template that was linked to 60 hosts. On each host every prototype had expanded to about 100 real items, so close to 300 real items per host.
So how many orphaned items did we have? 3 (item prototypes) × 60 (hosts) × 100 (expanded items per prototype) = 18000!
The really strange behaviour is in MaxHousekeeperDelete. If we set MaxHousekeeperDelete=500, Zabbix will try to remove 500 history values per orphaned item.
So what happened?
From its first cycle, the Housekeeper process tried to remove 18000 × 500 = 9,000,000 history values!
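The arithmetic above can be checked with a few lines of Python. This is only a sketch using the figures quoted in this post; the variable names are illustrative and are not Zabbix settings, apart from the value of MaxHousekeeperDelete.

```python
# Figures taken from the text above; the variable names are illustrative.
prototypes = 3               # item prototypes removed from the template
hosts = 60                   # hosts linked to the template
items_per_prototype = 100    # real items created from each prototype on a host
max_housekeeper_delete = 500 # value of MaxHousekeeperDelete

# Every expanded item on every host becomes an orphaned item.
orphaned_items = prototypes * hosts * items_per_prototype

# The limit applies per orphaned item, not per cycle as a whole.
rows_per_cycle = orphaned_items * max_housekeeper_delete

print(orphaned_items)
print(rows_per_cycle)
```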
For example, if we look at the Zabbix server log:
————–
housekeeper [deleted 68 hist/trends, 4522000 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 2649.273207 sec, idle 1 hour(s)]
————–
The “4522000 items” value is the number of records deleted for orphaned items in a single Housekeeper run.
To discuss this strange logic we have opened an official support ticket with Zabbix.
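To keep an eye on this, the counters can be pulled out of the log line with a regular expression. The sketch below assumes the line has exactly the layout shown above (other Zabbix versions may word it differently); the function name parse_housekeeper_line is made up for this example.

```python
import re

# The log line quoted above, without timestamp and PID prefix.
LINE = ("housekeeper [deleted 68 hist/trends, 4522000 items, 0 events, "
        "0 sessions, 0 alarms, 0 audit items in 2649.273207 sec, idle 1 hour(s)]")

# Assumption: the message layout matches the sample line exactly.
PATTERN = re.compile(
    r"deleted (?P<hist_trends>\d+) hist/trends, (?P<items>\d+) items, "
    r"(?P<events>\d+) events, (?P<sessions>\d+) sessions, "
    r"(?P<alarms>\d+) alarms, (?P<audit>\d+) audit items "
    r"in (?P<seconds>[\d.]+) sec"
)

def parse_housekeeper_line(line):
    """Return the housekeeper counters as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    stats = {k: int(v) for k, v in match.groupdict().items() if k != "seconds"}
    stats["seconds"] = float(match.group("seconds"))
    return stats

stats = parse_housekeeper_line(LINE)
print(stats["items"], "records in", round(stats["seconds"] / 60), "minutes")
```

For the sample line this shows that a single run spent roughly 44 minutes deleting records for orphaned items.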

I’m pleased to announce the new Zabbix Certified Specialists of October 2014. If you want to become the next one, please check the official Zabbix training schedule.

Today I found a big problem: my trends table is huge (>250 GB). How can I fix this?
Digging around on the Internet I found an easy solution, and it comes from our Zabbix community 🙂

Today I found a lot of news about our monitoring tool!
I have translated this very good article from a Russian website; all credit goes to http://habrahabr.ru/, thanks so much.

While rolling out Zabbix in our very extensive infrastructure, I needed to monitor the hardware of a fairly large fleet of HP Proliant servers of different models and generations, independently of the operating system and of HP agents. The obvious idea was to do all of this through iLO, but the task proved far less trivial than it first looked. Solving it produced a rather interesting construction that:

Uses discovery, which saves us from configuring anything by hand except the iLO address,
Monitors temperature, fans and power supplies on Proliant servers from generation 5 onward,
Monitors the state of memory and hard drives on Proliant servers from generation 7 onward,
Collects general information for inventory: serial number, model number, firmware version.

Now, how exactly this was done.
It seems simple: iLO can provide data over IPMI, and Zabbix has native support for that protocol. As usual, though, it was only smooth on paper. A closer look at the issue reveals three problems:

Zabbix uses the OpenIPMI library, which has a bug: a connection to iLO succeeds only if it is made under an account with administrator privileges. From a security standpoint this is fundamentally wrong. It can be solved with a patch or update, but that does not eliminate the other problems,
Reading discrete sensors via IPMI is not supported,
And finally, the keys, names and number of sensors differ between server models. Making templates for each model by hand is hardly productive.

For these reasons it was decided to write a separate mechanism for talking to iLO, relying on scripts and third-party IPMI utilities. Perl was chosen as the programming language, and the freeipmi package as the data source. On every iLO under our care, an account with read-only rights was created for monitoring. Logically, the whole construction is divided into two parts:

The discovery script ilo_discovery.pl, which polls iLO for the supported parameters and keys, parses them, and prints them in a format Zabbix understands,
The data retrieval script ipmi_proliant.pl, which returns the value of a specific parameter on request.

I should note right away that I am not a Perl programmer, so I solved the problem with the examples and constructs that were clear to me. The end result was achieved: all of this works successfully.

Detection script

This script prints data in Zabbix discovery format, depending on which class of data was requested: sensors, chassis information, and so forth. This separation follows the logic of the template that is used together with the scripts.

ilo_discovery.pl
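To give an idea of what "Zabbix discovery format" means, here is a minimal Python sketch (not the author's Perl script). It builds the JSON document that a low-level discovery rule expects: an object with a "data" array of macro/value pairs. The macro name {#SENSOR} and the sensor names are assumptions made for this illustration; the real script may use different ones.

```python
import json

def lld_json(sensor_names):
    """Build a Zabbix low-level discovery payload from a list of sensor names."""
    # {#SENSOR} is a hypothetical macro name chosen for this sketch.
    return json.dumps({"data": [{"{#SENSOR}": name} for name in sensor_names]})

# Sensor names here are invented examples, not real iLO output.
print(lld_json(["Temp 1", "Fan 1"]))
```

Each object in "data" becomes one discovered entity, and the template's item prototypes refer to it through the macro.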

Data retrieval script

This script outputs the value of a specific sensor, again depending on which class of data was requested. The data obtained is cached in a text file, so that simultaneous requests do not accidentally DDoS iLO.

ipmi_proliant.pl
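The caching idea can be sketched in Python as follows. This is an illustration of the technique, not a port of ipmi_proliant.pl: the function name read_cached, the 60-second lifetime and the fetch callback are all assumptions made for this example.

```python
import os
import time

def read_cached(path, fetch, max_age=60.0):
    """Return the cached text if it is younger than max_age seconds,
    otherwise call fetch(), store its result in the file and return it."""
    try:
        if time.time() - os.path.getmtime(path) < max_age:
            with open(path) as fh:
                return fh.read()
    except OSError:
        pass  # no cache file yet, or it cannot be read: fall through to fetch()

    data = fetch()  # the single expensive request to iLO
    tmp = f"{path}.{os.getpid()}.tmp"
    with open(tmp, "w") as fh:
        fh.write(data)
    os.replace(tmp, path)  # atomic rename, so readers never see a partial file
    return data
```

With this in place, many items polled within the same minute share one iLO request: only the first call runs fetch(), the others read the text file.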

Monitoring template

Writing the scripts is only half the battle. All of this information still had to be imported into Zabbix properly, and triggers had to be configured. The result of this work is a monitoring template containing discovery rules for all sensors and other data sources, which automatically creates the corresponding triggers and graphs.

Application in practice

To put the above construction into practice you need to:

Download the archive with the scripts and the template, and import the template into Zabbix,
Put the scripts ilo_discovery.pl and ipmi_proliant.pl into the folder specified as ExternalScripts in the Zabbix config, and make them executable,
Download and install FreeIPMI (a FAQ on building it and its dependencies is here ):

Create a user account for Zabbix in iLO and enter its credentials in the scripts ($user and $pass),
Check that FreeIPMI connects to iLO successfully (substitute your own address, username and password):

In the Zabbix front end, for each server we want to poll through iLO, enter the iLO address in the {$ILO} macro (nothing needs to be entered in the address field of the IPMI interface),
Link the iLO monitoring template to the server,
Wait until discovery has run.
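The connection check from the steps above can also be scripted. The sketch below is an assumption about how such a check might look, not the command from the original article: it calls FreeIPMI's ipmi-sensors with the read-only account and reports whether the call succeeded. The address and credentials are placeholders, and the LAN_2_0 driver type is a guess that may need changing for your iLO.

```python
import subprocess

def ipmi_sensors_cmd(host, user, password):
    """Build the ipmi-sensors command line for a read-only IPMI-over-LAN session."""
    return [
        "ipmi-sensors",
        "--hostname", host,
        "--username", user,
        "--password", password,
        "--privilege-level", "USER",   # the read-only account needs no more than this
        "--driver-type", "LAN_2_0",    # assumption: IPMI 2.0 over LAN
    ]

def check_ilo(host, user, password, timeout=30):
    """Return True if ipmi-sensors connects to iLO and exits without error."""
    try:
        result = subprocess.run(
            ipmi_sensors_cmd(host, user, password),
            capture_output=True, text=True, timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # FreeIPMI is not installed, or iLO did not answer in time
    return result.returncode == 0

if __name__ == "__main__":
    # Placeholder values: substitute your own iLO address and credentials.
    print(check_ilo("192.0.2.10", "monitor", "secret"))
```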

The Latest data section for a host monitored through iLO will look something like this:

Graphs of the collected data are included:

Conclusion

This monitoring mechanism has been successfully tested on HP Proliant servers of the DL, ML and BL series, generations 5, 6, 7 and 8. A general recommendation: before applying it, try to update iLO to the latest firmware version. As for the lower-end server line, which has LO100 on board instead of iLO, all of this will work with them too, but some of the information obtained from the higher-end models of the same generation will not be available, because LO100 provides less data than iLO.