A couple of months ago, a friend and I decided to resurrect our old mining hardware, as cryptocurrency prices kept climbing.
After the initial setup and some hardware upgrades (mostly new GPUs), I decided it was time to monitor the miners with my Zabbix server. Unfortunately, the only Zabbix template I found for monitoring Nvidia GPUs works with a single GPU. I used it as a conceptual starting point for a new template, featuring:
- low-level discovery of all the graphics cards
- item prototypes for:
  - fan speed
  - total, free and used memory
  - power draw in decawatts (tens of watts, so that it fits nicely in the graphs)
  - temperature
  - utilization
- a graph prototype having the fan speed, power draw and temperature in one graph
- trigger prototypes set at different GPU temperatures
- a BASH script for the low-level discovery
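To give an idea of how the discovery part can work, here is a minimal sketch, not the script from the repository. It assumes `nvidia-smi` is on the PATH and that GPU names contain no double quotes. The macro names `{#GPUINDEX}` and `{#GPUNAME}` and both function names are made up for illustration. The second function shows the watts-to-decawatts arithmetic mentioned in the list above.

```shell
#!/usr/bin/env bash
# Sketch only: the macro names {#GPUINDEX}/{#GPUNAME} and the function names
# are illustrative assumptions, not taken from the published template.

# Turn "index, name" CSV lines read from stdin into Zabbix LLD JSON.
gpu_lld_json() {
    local first=1 index name
    printf '{"data":['
    while IFS=, read -r index name; do
        [ -z "$index" ] && continue      # skip blank lines
        name="${name# }"                 # drop the space after the comma
        [ "$first" -eq 1 ] || printf ','
        first=0
        printf '{"{#GPUINDEX}":"%s","{#GPUNAME}":"%s"}' "$index" "$name"
    done
    printf ']}\n'
}

# Convert a power reading in watts (e.g. "150") to decawatts.
watts_to_decawatts() {
    awk -v w="$1" 'BEGIN { printf "%.2f\n", w / 10 }'
}

# Query the real hardware only when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,name --format=csv,noheader | gpu_lld_json
fi
```

Fed the two lines `0, GeForce GTX 1070` and `1, GeForce GTX 1080 Ti`, `gpu_lld_json` prints `{"data":[{"{#GPUINDEX}":"0","{#GPUNAME}":"GeForce GTX 1070"},{"{#GPUINDEX}":"1","{#GPUNAME}":"GeForce GTX 1080 Ti"}]}`, which Zabbix can expand into one set of items, graphs and triggers per card. Keeping the parsing in a function that reads stdin also makes it easy to try out on a machine without a GPU.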
Events caused by the triggers:
And the graphs:
Here's the template on GitHub: https://github.com/plambe/zabbix-nvidia-smi-multi-gpu
Enjoy!