Monitor SMART Attributes with Zabbix
In my previous post I set up my pogoplug device as a backup server. Since it was holding somewhat important information, I wanted to monitor the disk that I plugged into the pogoplug device to make sure it’s not failing. This is where the smartmontools software comes into play. The smartmontools package includes a utility called smartctl, which allows you to query the SMART attributes of a hard drive (if the drive supports it).
Compile Zabbix Agentd on Pogoplug
Before I checked the SMART attributes, I had to make sure that I was able to run the zabbix agent on the device. I ran into a post entitled “Install Zabbix Agent on Synology” and it seemed that it was possible. Following the instructions laid out in that post, here is what I did to compile zabbix on the pogoplug device. First install the prerequisite packages needed for the compile:
$ ipkg install gcc make bison flex gconv-modules
Next create a symlink to /opt/include under /usr. If you don’t do this, the configure script won’t be able to find the iconv.h header file. Here are the commands to accomplish that:
$ cd /usr
$ ln -s /opt/include
After that let’s download the source for zabbix:
$ cd /opt/tmp
$ wget http://downloads.sourceforge.net/.../zabbix-2.0.8.tar.gz
Now let’s extract the source:
$ tar xzf zabbix-2.0.8.tar.gz
Next configure the source:
$ cd zabbix-2.0.8
$ ./configure --enable-agent --prefix=/opt/zabbix
After that is done, compile and install the package:
$ make
$ make install
That’s actually it. Now you have zabbix agentd installed under /opt/zabbix.
Configure Zabbix Agentd on Pogoplug
Now that it’s installed, let’s make sure we can start it. First edit the /opt/zabbix/etc/zabbix_agentd.conf file and add/modify the following parameters:
pogo:~# grep -vE '^$|^#' /opt/zabbix/etc/zabbix_agentd.conf
PidFile=/opt/var/run/zabbix_agentd.pid
LogFile=/opt/var/log/zabbix_agentd.log
Server=192.168.1.XX
ListenIP=192.168.1.XX
StartAgents=1
Hostname=pogo.dnsd.me
AllowRoot=1
If you don’t want to run the agent as root, you can add a zabbix user and allow it to write under the /opt/var/run and /opt/var/log directories. This wasn’t really meant to be a multi-user platform, so I decided not to go down that route. Plus, down the line I needed the user running zabbix to be able to execute the smartctl command. On regular systems you can do that with sudo, but I didn’t want to install it. sudo requires the suid bit, which means I would have to mount the filesystem with the suid flag (by default it’s mounted with nosuid). Call me lazy, but I just didn’t feel like it was worth the trouble.
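A quick way to double-check that the settings above actually made it into the file is a small loop over the parameter names. This is only a sketch: the function name check_agent_conf is my own invention, and the list of keys is simply the ones shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of zabbix): report which of the parameters
# from this post are missing or commented out in an agentd config file.
check_agent_conf() {
    conf="$1"
    missing=0
    for key in PidFile LogFile Server ListenIP StartAgents Hostname AllowRoot; do
        # An active setting starts at the beginning of the line: Key=value
        if ! grep -q "^${key}=" "$conf"; then
            echo "missing: $key"
            missing=1
        fi
    done
    return "$missing"
}

# Intended use: check_agent_conf /opt/zabbix/etc/zabbix_agentd.conf
```

The function prints nothing and returns 0 when every parameter is present.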
After the config is in place, run the following:
$ /opt/zabbix/sbin/zabbix_agentd -c /opt/zabbix/etc/zabbix_agentd.conf
The process should start up. You can check using ps:
pogo:~# ps | grep zabb
16100 zabbix 2936 S /opt/zabbix/sbin/zabbix_agentd -c /opt/zabbix/etc/za
16101 zabbix 2936 S /opt/zabbix/sbin/zabbix_agentd -c /opt/zabbix/etc/za
16102 zabbix 2936 S /opt/zabbix/sbin/zabbix_agentd -c /opt/zabbix/etc/za
You can also confirm it’s listening on 10050 with netstat:
pogo:~# netstat -antp | grep zabb
tcp 0 0 192.168.1.104:10050 0.0.0.0:* LISTEN 16100/zabbix_agentd
The last thing to do is to set up an init script. I just copied the dropbear start-up script and modified it to fit my needs:
$ cp /etc/init.d/dropbear.sh /opt/etc/init.d/zabbix-agentd
$ vi /opt/etc/init.d/zabbix-agentd
I really didn’t change much, but just for reference here is the script. Lastly to make sure it starts automatically on boot up, add it to the /etc/init.d/rcS file:
/opt/etc/init.d/zabbix-agentd start
You can check to make sure zabbix agentd starts up without issues by rebooting the pogoplug device.
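For anyone who doesn’t have a dropbear script to start from, a minimal init script could look roughly like the sketch below. To be clear, this is not my actual script. The function names are made up, the paths are the ones from the configuration above, and it assumes the agent records its PID in the PidFile location.

```shell
#!/bin/sh
# Sketch of an init script for the agent; NOT the script used in this post.
# Paths match the configuration shown earlier and can be overridden.
AGENT=${AGENT:-/opt/zabbix/sbin/zabbix_agentd}
CONF=${CONF:-/opt/zabbix/etc/zabbix_agentd.conf}
PIDFILE=${PIDFILE:-/opt/var/run/zabbix_agentd.pid}

start_agent() {
    printf 'Starting zabbix agent: '
    if "$AGENT" -c "$CONF"; then echo "Success"; else echo "Failed"; return 1; fi
}

stop_agent() {
    printf 'Stopping zabbix agent: '
    # Signal the PID recorded in the PidFile, if there is one.
    if [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")" 2>/dev/null; then
        echo "Success"
    else
        echo "Not running"
    fi
}

case "${1:-}" in
    start)   start_agent ;;
    stop)    stop_agent ;;
    restart) stop_agent; sleep 1; start_agent ;;
    "")      : ;;  # no action given: only define the functions
    *)       echo "Usage: $0 {start|stop|restart}" >&2; exit 1 ;;
esac
```

Saved as /opt/etc/init.d/zabbix-agentd and made executable, it would be called with start, stop or restart.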
Add Swap to Pogoplug
As soon as you add the device to your zabbix server you will get a warning about the device missing swap space, so let’s add that. We will need the mkswap command to format our file as swap, so let’s install that first. That utility is part of the util-linux package:
$ ipkg install util-linux
After it’s installed, let’s create the directory where we will store the swap file:
$ mkdir /opt/etc/swap
Now let’s create a 256MB file:
$ dd if=/dev/zero of=/opt/etc/swap/swapfile.img bs=1M count=256
Now let’s format the file as swap:
$ /opt/sbin/mkswap /opt/etc/swap/swapfile.img
Lastly enable the swap file:
$ swapon /opt/etc/swap/swapfile.img
You can confirm with the free command that the swap file is utilized:
pogo:~# free -m
total used free shared buffers cached
Mem: 115 109 6 0 35 46
-/+ buffers/cache: 26 88
Swap: 255 0 255
BTW the free command comes from the procps package. The last thing to do is to enable swap on boot, so edit the /etc/init.d/rcS file and add the following to it:
/sbin/swapon /opt/etc/swap/swapfile.img
Reboot the device to make sure swap is added upon boot up.
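If you would rather not call swapon blindly from rcS, the step can be wrapped in a small guard. The sketch below is an assumption on my part rather than what I ran: swap_is_active and enable_swap are made-up names, and the chmod is an extra precaution that is not part of the steps above.

```shell
#!/bin/sh
# Sketch: only call swapon when the swap file exists and is not active yet.
SWAPFILE=${SWAPFILE:-/opt/etc/swap/swapfile.img}

# swap_is_active FILE [SWAPS_TABLE]
# Succeeds if FILE is listed in the kernel swap table (/proc/swaps by default).
swap_is_active() {
    awk -v f="$1" 'NR > 1 && $1 == f { found = 1 } END { exit !found }' "${2:-/proc/swaps}"
}

enable_swap() {
    [ -f "$SWAPFILE" ] || { echo "no swap file at $SWAPFILE" >&2; return 1; }
    chmod 600 "$SWAPFILE"   # a swap file should not be readable by other users
    swap_is_active "$SWAPFILE" || /sbin/swapon "$SWAPFILE"
}

# Intended use in /etc/init.d/rcS, after defining the functions: enable_swap
```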
Configure Zabbix Agent to Pull SMART Attributes from a Hard Drive
There is a really good guide from zabbix on how to set it up: S.M.A.R.T. HDD Monitoring with Zabbix. Here is the gist of the setup. First create a script that will grab values depending on the drive and attribute that you pass in to it. Here is the script:
$ cat /usr/local/bin/getsv
#!/bin/bash
if [ $# -ne 2 ];
then
    echo "Usage: $0 <device> <parameter>" >&2
    exit 1
fi
# Print the first word of the RAW_VALUE column (field 10) for the given attribute
sudo smartctl -A "$1" | grep "$2" | tr -s ' ' | sed "s/^[[:space:]]*\(.*\)[[:space:]]*$/\1/" | cut -d " " -f 10
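The grep/tr/sed/cut pipeline in getsv can also be collapsed into a single awk call that matches the attribute name exactly, so that asking for one attribute never picks up another whose name merely contains it. This is an alternative sketch, not the script from the zabbix guide, and the function name smart_raw_value is hypothetical.

```shell
#!/bin/sh
# smart_raw_value ATTRIBUTE
# Reads `smartctl -A` output on stdin and prints the first word of the
# RAW_VALUE column (field 10) for the attribute whose name matches exactly.
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $10; exit }'
}

# Intended use (needs smartctl, so not run here):
#   sudo smartctl -A /dev/sda | smart_raw_value Temperature_Celsius
```

awk splits on runs of whitespace by itself, which is why the tr and sed steps are no longer needed.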
If smartmontools is not installed, go ahead and install it:
$ sudo yum install smartmontools
Confirm your drive returns SMART attributes:
$ sudo smartctl -A /dev/sda
smartctl 6.0 2012-10-10 r3643 [i686-linux-3.11.1-200.fc19.i686] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
3 Spin_Up_Time 0x0027 182 182 063 Pre-fail Always - 24757
4 Start_Stop_Count 0x0032 253 253 000 Old_age Always - 235
5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail Always - 0
6 Read_Channel_Margin 0x0001 253 253 100 Pre-fail Offline - 0
7 Seek_Error_Rate 0x000a 253 252 000 Old_age Always - 0
8 Seek_Time_Performance 0x0027 244 233 187 Pre-fail Always - 55830
9 Power_On_Minutes 0x0032 136 136 000 Old_age Always - 260h+36m
10 Spin_Retry_Count 0x002b 253 252 157 Pre-fail Always - 0
11 Calibration_Retry_Count 0x002b 253 252 223 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 252 252 000 Old_age Always - 469
192 Power-Off_Retract_Count 0x0032 253 253 000 Old_age Always - 0
193 Load_Cycle_Count 0x0032 253 253 000 Old_age Always - 0
194 Temperature_Celsius 0x0032 040 253 000 Old_age Always - 44
195 Hardware_ECC_Recovered 0x000a 253 252 000 Old_age Always - 5848
196 Reallocated_Event_Count 0x0008 253 253 000 Old_age Offline - 0
197 Current_Pending_Sector 0x0008 253 253 000 Old_age Offline - 0
198 Offline_Uncorrectable 0x0008 253 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0008 144 001 000 Old_age Offline - 406
200 Multi_Zone_Error_Rate 0x000a 253 252 000 Old_age Always - 0
201 Soft_Read_Error_Rate 0x000a 253 252 000 Old_age Always - 0
202 Data_Address_Mark_Errs 0x000a 253 252 000 Old_age Always - 0
203 Run_Out_Cancel 0x000b 253 252 180 Pre-fail Always - 0
204 Soft_ECC_Correction 0x000a 253 252 000 Old_age Always - 0
205 Thermal_Asperity_Rate 0x000a 253 252 000 Old_age Always - 0
207 Spin_High_Current 0x002a 253 252 000 Old_age Always - 0
208 Spin_Buzz 0x002a 253 252 000 Old_age Always - 0
209 Offline_Seek_Performnce 0x0024 242 242 000 Old_age Offline - 142
210 Unknown_Attribute 0x0032 253 252 000 Old_age Always - 0
211 Unknown_Attribute 0x0032 253 252 000 Old_age Always - 0
212 Unknown_Attribute 0x0032 253 252 000 Old_age Always - 0
Next allow the zabbix user to execute smartctl with sudo without a password. Run the following command:
$ sudo visudo
and add/modify the following in the /etc/sudoers file:
#Defaults requiretty
zabbix ALL= NOPASSWD: /usr/sbin/smartctl
Then run the command manually to make sure it works:
$ sudo su - zabbix -s /bin/bash -c '/usr/local/bin/getsv /dev/sda Temperature_Celsius'
46
That looks good. Now let’s add the corresponding UserParameter entries to the agentd configuration (I just chose two, but you can pick any attribute you desire; this site talks about what all the different values are). Edit the /etc/zabbix/zabbix_agentd.conf file and add the following to it:
$ tail -3 /etc/zabbix_agentd.conf
UserParameter=smart.temp[*],/usr/local/bin/getsv /dev/$1 Temperature_Celsius
UserParameter=smart.reallocated_sec_cnt[*],/usr/local/bin/getsv /dev/$1 Reallocated_Sector_Ct
UserParameter=custom.disks.discovery_perl2,/usr/local/bin/discover_disk.pl
I will follow my previous post (Monitor Disk IO Stats with Zabbix) to set up auto-discovery of the disks and then plot the discovered values. To apply the above UserParameters, restart the zabbix agentd process and try to query the above values from the Zabbix server. So on the client run the following:
$ sudo service zabbix-agent restart
Redirecting to /bin/systemctl restart zabbix-agent.service
Then on the Zabbix server run the following:
$ zabbix_get -s 192.168.1.102 -k smart.temp[sda]
44
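To test both keys in one go, the zabbix_get call can be wrapped in a loop. This is just a sketch: check_smart_keys is a name I made up, and it treats an empty answer or ZBX_NOTSUPPORTED as a failure.

```shell
#!/bin/sh
# check_smart_keys HOST DISK
# Queries each SMART key from this post and flags empty or unsupported answers.
check_smart_keys() {
    host="$1"; disk="$2"; rc=0
    for key in smart.temp smart.reallocated_sec_cnt; do
        value=$(zabbix_get -s "$host" -k "${key}[${disk}]") || value=""
        case "$value" in
            ""|ZBX_NOTSUPPORTED*) echo "FAIL ${key}[${disk}]"; rc=1 ;;
            *)                    echo "OK   ${key}[${disk}] = $value" ;;
        esac
    done
    return "$rc"
}

# Intended use on the Zabbix server: check_smart_keys 192.168.1.102 sda
```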
If the value is returned then it’s all good. At this point you can follow my previous post to add the following to the zabbix front end:
- Regular Expression (we only want /dev/sd* or /dev/hd*; in my previous post I also grabbed sd-cards, /dev/mmcblk1)
- Template
- Discovery Rule (here add the custom.disks.discovery_perl2 as the key since you can’t have duplicates)
- Item Prototypes (for both the temperature and reallocated sector count)
- Graph Prototype (only for the temperature)
- Trigger Prototype (only for the smart.reallocated_sec_cnt)
Add a Trigger Prototype to a Zabbix Discovery Rule
The only new thing that I added was the trigger prototype. The Reallocated_Sector_Ct attribute should be zero all the time unless the drive is failing. So rather than plotting a graph full of zeroes, I decided to add a trigger which will send a warning message if the value is bigger than 0. To do this from the Zabbix Front End go to “Templates” -> Select the Template you had created -> “Discovery Rules” -> Click on the Discovery Rule you had created -> “Trigger prototypes” -> “Create trigger prototype” and then configure it like so:
Now we will get notified if that value is above zero.
Install Smartmontools on Pogoplug
There is a prebuilt smartmontools package:
pogo:~# ipkg list | grep smart
smartmontools - 5.40-3 - Utility programs to control and monitor (SMART) built into most modern ATA and SCSI hard disks.
But when I tried to run it, it would segfault. I ran into a bug report here. Following the instructions laid out in that report, I was able to compile smartmontools on the pogoplug. Here are the commands I ran to accomplish that. First download the latest version:
$ cd /opt/tmp
$ wget http://downloads.sourceforge.net/..../smartmontools-6.2.tar.gz
Then I extracted the source:
$ tar xvzf smartmontools-6.2.tar.gz
Next I configured the source:
$ cd smartmontools-6.2
$ ./configure CXXFLAGS='-g -O2 -fno-toplevel-reorder -Wall -W' --prefix=/opt/smart
Lastly I compiled and installed the software:
$ make
$ make install
The install went through without a hitch. Next I needed to make sure I could get the attributes. I had to specify the -d sat option to get it to work. Here is what it looked like:
pogo:~# /opt/smart/sbin/smartctl -d sat -A /dev/sda
smartctl 6.2 2013-07-26 r3841 [armv5tel-linux-2.6.31.8] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 253 006 Pre-fail Always - 0
3 Spin_Up_Time 0x0003 098 097 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 099 099 020 Old_age Always - 1431
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 075 060 030 Pre-fail Always - 41254693
9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 4017
10 Spin_Retry_Count 0x0013 100 069 034 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 099 099 020 Old_age Always - 1392
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 065 052 045 Old_age Always - 35 (Min/Max 35/48)
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 72
193 Load_Cycle_Count 0x0032 041 041 000 Old_age Always - 118819
194 Temperature_Celsius 0x0022 035 048 000 Old_age Always - 35 (0 12 0 0 0)
195 Hardware_ECC_Recovered 0x001a 056 047 000 Old_age Always - 27286480
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 Data_Address_Mark_Errs 0x0032 100 253 000 Old_age Always - 0
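If you are not sure whether a given enclosure needs -d sat, one option is to try without it first and fall back. The sketch below is an assumption about how one might automate that, not something I ran on the pogoplug: smart_attrs is a made-up name, and it decides success by looking for the ATTRIBUTE_NAME header in the output rather than relying on smartctl’s exit status.

```shell
#!/bin/sh
# Path from this post; override SMARTCTL if smartctl lives elsewhere.
SMARTCTL=${SMARTCTL:-/opt/smart/sbin/smartctl}

# smart_attrs DEVICE
# Tries `smartctl -A` first without and then with `-d sat`, and prints the
# first output that contains an attribute table.
smart_attrs() {
    dev="$1"
    for opts in "" "-d sat"; do
        # $opts is left unquoted on purpose so "-d sat" splits into two words
        out=$($SMARTCTL $opts -A "$dev" 2>/dev/null) || true
        case "$out" in
            *ATTRIBUTE_NAME*) printf '%s\n' "$out"; return 0 ;;
        esac
    done
    echo "no SMART attribute table for $dev" >&2
    return 1
}

# Intended use: smart_attrs /dev/sda
```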
Most of the other OSes had a script to start the smartd process to automatically monitor the hard drive and log anything that fails. By default smartd logs to syslog, but I didn’t have that running on the pogoplug device. So first I configured the file to monitor only the sda device. I edited the /opt/smart/etc/smartd.conf file and added/modified the following:
#DEVICESCAN
/dev/sda -a -d sat
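Now start the smartd process with the following command. The -d flag runs smartd in debug mode, which keeps it in the foreground and prints its messages to standard output instead of syslog, so they can be redirected to a file: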
$ /opt/smart/sbin/smartd -c /opt/smart/etc/smartd.conf -d > /opt/var/log/smart.log &
If you check the log file after some time you should see the following:
pogo:~# tail /opt/var/log/smart.log
Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 38 to 37
Device: /dev/sda [SAT], opened ATA device
Device: /dev/sda [SAT], opened ATA device
Device: /dev/sda [SAT], opened ATA device
Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 63 to 64
Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 37 to 36
Device: /dev/sda [SAT], opened ATA device
Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 64 to 65
Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 36 to 35
Device: /dev/sda [SAT], opened ATA device
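Since almost everything in that log is routine temperature chatter, a small filter makes it easier to spot anything else. This is only a sketch with a made-up function name, and the patterns are just the routine messages shown above.

```shell
#!/bin/sh
# smartd_interesting
# Reads smartd log lines on stdin and drops the routine messages shown above
# (temperature changes and "opened ATA device"), leaving everything else.
smartd_interesting() {
    grep -v -E 'Temperature_Celsius changed|Airflow_Temperature_Cel changed|opened ATA device'
}

# Intended use: smartd_interesting < /opt/var/log/smart.log
```

Note that grep returns a non-zero exit status when nothing is left after filtering, which in this case is the good outcome.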
Enable Smartctl Zabbix Checks on Pogoplug
I didn’t really want to install perl on the pogoplug device, so I ended up making my discovery script look like the following:
pogo:~# cat /opt/local/bin/discover_disk.pl
#!/bin/sh
echo '{"data":[{"{#DISK}":"sda"}]}'
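If you ever need more than one disk without installing perl, the discovery JSON can be generated in plain shell. The following is a sketch under my own assumptions: lld_json and list_disks are hypothetical names, and list_disks assumes the usual /proc/partitions layout with the device name in the fourth column.

```shell
#!/bin/sh
# lld_json
# Reads disk names (one per line) on stdin and prints Zabbix low-level
# discovery JSON with a {#DISK} macro for each name.
lld_json() {
    awk 'BEGIN { printf "{\"data\":[" }
         NF    { printf "%s{\"{#DISK}\":\"%s\"}", sep, $1; sep = "," }
         END   { print "]}" }'
}

# list_disks [PARTITIONS_FILE]
# Prints whole-disk sd*/hd* names, skipping partitions such as sda1.
list_disks() {
    awk 'NR > 2 && $4 ~ /^(sd|hd)[a-z]+$/ { print $4 }' "${1:-/proc/partitions}"
}

# Intended use in the discovery script: list_disks | lld_json
```

The separator variable starts out empty, so commas only appear between entries and the output stays valid JSON.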
My discovery script only reports one disk, which I am okay with: there is only one slot in the pogoplug, so I won’t be adding new drives to it. And here is what my getsv script looked like:
pogo:~# cat /opt/local/bin/getsv
#!/bin/sh
if [ $# -ne 2 ];
then
    echo "Usage: $0 <device> <parameter>" >&2
    exit 1
fi
# Print the first word of the RAW_VALUE column (field 10) for the given attribute
/opt/smart/sbin/smartctl -d sat -A "$1" | grep "$2" | tr -s ' ' | sed "s/^[[:space:]]*\(.*\)[[:space:]]*$/\1/" | cut -d " " -f 10
I didn’t change that much, just removed sudo (since I won’t be using it) and added the full path of the smartctl utility (could’ve taken care of this with symlinks). Then I added the following to /opt/zabbix/etc/zabbix_agentd.conf:
pogo:~# tail -12 /opt/zabbix/etc/zabbix_agentd.conf
UserParameter=custom.vfs.dev.read.ops[*],cat /proc/diskstats | grep $1 | head -1 | awk '{print $$4}'
UserParameter=custom.vfs.dev.read.ms[*],cat /proc/diskstats | grep $1 | head -1 | awk '{print $$7}'
UserParameter=custom.vfs.dev.write.ops[*],cat /proc/diskstats | grep $1 | head -1 | awk '{print $$8}'
UserParameter=custom.vfs.dev.write.ms[*],cat /proc/diskstats | grep $1 | head -1 | awk '{print $$11}'
UserParameter=custom.vfs.dev.io.active[*],cat /proc/diskstats | grep $1 | head -1 | awk '{print $$12}'
UserParameter=custom.vfs.dev.io.ms[*],cat /proc/diskstats | grep $1 | head -1 | awk '{print $$13}'
UserParameter=custom.vfs.dev.read.sectors[*],cat /proc/diskstats | grep $1 | head -1 | awk '{print $$6}'
UserParameter=custom.vfs.dev.write.sectors[*],cat /proc/diskstats | grep $1 | head -1 | awk '{print $$10}'
UserParameter=custom.disks.discovery_perl,/opt/local/bin/discover_disk.pl
UserParameter=smart.temp[*],/opt/local/bin/getsv /dev/$1 Temperature_Celsius
UserParameter=smart.reallocated_sec_cnt[*],/opt/local/bin/getsv /dev/$1 Reallocated_Sector_Ct
UserParameter=custom.disks.discovery_perl2,/opt/local/bin/discover_disk.pl
And that was it. I then restarted the agentd process:
pogo:~# /opt/etc/init.d/zabbix-agentd restart
Stopping zabbix agent: Success (Killed)
Starting zabbix agent: Success
and after some time, I saw the following graph in zabbix: