NVIDIA GPU monitoring plugin for gmond
======================================
Installation instructions:
* First install the Python Bindings for the NVIDIA Management Library:
$ cd nvidia-ml-py-*
$ sudo python setup.py install
For the latest bindings see: http://pypi.python.org/pypi/nvidia-ml-py/
You can install the bindings site-wide or place them in {libdir}/ganglia/python_modules
* Copy python_modules/nvidia.py to {libdir}/ganglia/python_modules
* Copy conf.d/nvidia.pyconf to /etc/ganglia/conf.d
* Copy graph.d/* to {ganglia_webroot}/graph.d/
* A demo of what the GPU graphs look like is available here:
http://ganglia.ddbj.nig.ac.jp/?c=research+month+gpu+queue&h=t135i&m=load_one&r=hour&s=by+name&hc=4&mc=2
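The copy steps above can be sketched as a small Python helper. This is only an
illustration, not part of the plugin: the function names install_plan and
apply_plan are hypothetical, and libdir and ganglia_webroot stand in for the
{libdir} and {ganglia_webroot} placeholders, whose real values depend on your
system.

```python
import shutil
from pathlib import Path


def install_plan(libdir, ganglia_webroot, source_root="."):
    """Return (source, destination) pairs for the copy steps in this README.

    libdir and ganglia_webroot are supplied by the caller because their
    values differ between distributions (assumption: no defaults are safe).
    """
    src = Path(source_root)
    plan = [
        (src / "python_modules" / "nvidia.py",
         Path(libdir) / "ganglia" / "python_modules" / "nvidia.py"),
        (src / "conf.d" / "nvidia.pyconf",
         Path("/etc/ganglia/conf.d") / "nvidia.pyconf"),
    ]
    # graph.d/* expands to whatever regular files exist in the checkout.
    graph_dir = src / "graph.d"
    if graph_dir.is_dir():
        for graph in sorted(graph_dir.iterdir()):
            if graph.is_file():
                plan.append(
                    (graph, Path(ganglia_webroot) / "graph.d" / graph.name))
    return plan


def apply_plan(plan, dry_run=True):
    """Print each copy; perform it only when dry_run is False."""
    for source, destination in plan:
        print(f"{source} -> {destination}")
        if not dry_run:
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, destination)
```

Running apply_plan(install_plan(libdir, webroot)) with the default dry_run=True
only prints what would be copied, which makes it easy to check the target
paths before writing into system directories.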
By default, every metric that the management library can detect for your GPU
is collected. For details on which metrics are supported on which models,
please refer to the NVML documentation.
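The "collect whatever the library can detect" behaviour can be sketched as
follows. This is a minimal illustration and not the plugin's actual code:
collect_supported and nvml_probes are hypothetical names, and the pairing of
metric names with NVML calls is an assumption made for the example. The pynvml
functions used do exist in the nvidia-ml-py bindings, but nvml_probes needs an
NVIDIA driver to run.

```python
def collect_supported(probes):
    """Call every probe and keep only the ones that return a value.

    probes maps a metric name to a zero-argument callable. A probe that
    raises is treated as "not supported on this GPU" and skipped.
    """
    supported = {}
    for name, probe in probes.items():
        try:
            supported[name] = probe()
        except Exception:
            # With pynvml the error would normally be pynvml.NVMLError;
            # a broad catch keeps this sketch free of that dependency.
            continue
    return supported


def nvml_probes(index=0):
    """Build probes for one GPU (assumed name-to-call pairing)."""
    import pynvml  # provided by the nvidia-ml-py package

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(index)
    return {
        "gpu_temp": lambda: pynvml.nvmlDeviceGetTemperature(
            handle, pynvml.NVML_TEMPERATURE_GPU),
        "gpu_fan": lambda: pynvml.nvmlDeviceGetFanSpeed(handle),
        "gpu_power_usage": lambda: pynvml.nvmlDeviceGetPowerUsage(handle),
        "gpu_util": lambda: pynvml.nvmlDeviceGetUtilizationRates(handle).gpu,
    }
```

On a machine with a GPU, collect_supported(nvml_probes(0)) would return only
the metrics that particular model reports; a card without a fan sensor, for
instance, would simply have no gpu_fan entry.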
The following metrics have been implemented:
* gpu_num
* gpu_driver
* gpu_type
* gpu_uuid
* gpu_pci_id
* gpu_mem_total
* gpu_graphics_speed
* gpu_sm_speed
* gpu_mem_speed
* gpu_max_graphics_speed
* gpu_max_sm_speed
* gpu_max_mem_speed
* gpu_temp
* gpu_util
* gpu_mem_util
* gpu_mem_used
* gpu_fan
* gpu_power_usage
* gpu_perf_state
* gpu_ecc_mode
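For readers unfamiliar with gmond Python modules, the sketch below shows how a
single metric from the list above might be exposed through the standard
metric_init / metric_cleanup interface. It is an assumption-laden example, not
an excerpt from nvidia.py: build_descriptor and read_gpu_temp are hypothetical
helpers, the callback returns a fixed value instead of querying NVML, and the
real module may name metrics differently (for example per GPU index).

```python
def build_descriptor(name, callback, value_type, units, fmt, description):
    """Return a metric descriptor dict in the shape gmond expects."""
    return {
        "name": name,
        "call_back": callback,
        "time_max": 90,          # assumed reporting interval ceiling, seconds
        "value_type": value_type,
        "units": units,
        "slope": "both",
        "format": fmt,
        "description": description,
        "groups": "gpu",
    }


def read_gpu_temp(name):
    """Hypothetical callback; a real one would query NVML for the value."""
    return 0


def metric_init(params):
    """Called once by gmond; returns the list of metric descriptors."""
    return [
        build_descriptor("gpu_temp", read_gpu_temp, "uint", "C", "%u",
                         "GPU temperature"),
    ]


def metric_cleanup():
    """Called by gmond on shutdown; nothing to release in this sketch."""
    pass
```

gmond calls metric_init once with the parameters from the .pyconf file, then
invokes each descriptor's call_back with the metric name whenever it needs a
fresh value.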
Version 2:
The following metrics have been implemented:
* gpu_max_graphics_speed
* gpu_max_sm_speed
* gpu_max_mem_speed
* gpu_serial
* gpu_power_man_mode
* gpu_power_man_limit
Version 3:
The following metrics have been implemented:
* gpu_ecc_db_error
* gpu_ecc_sb_error
* gpu_power_violation_report
* gpu_encoder_util
* gpu_decoder_util
* gpu_bar1_memory
* gpu_shutdown_temp
* gpu_slowdown_temp