
Commit 60e7ae9

richardelling authored and RageLtMan committed
Add zpool_influxdb command
A zpool_influxdb command is introduced to ease the collection of zpool
statistics into the InfluxDB time-series database. Examples are given on
how to integrate with the telegraf statistics aggregator, a companion to
influxdb. Finally, a grafana dashboard template is included to show how
pool latency distributions can be visualized in a ZFS + telegraf +
influxdb + grafana environment.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Richard Elling <[email protected]>
Closes openzfs#10786
1 parent b957318 commit 60e7ae9

File tree

20 files changed: +3102 -2 lines

cmd/Makefile.am

Lines changed: 1 addition & 0 deletions
```diff
@@ -1,5 +1,6 @@
 SUBDIRS = zfs zpool zdb zhack zinject zstream zstreamdump ztest
 SUBDIRS += fsck_zfs vdev_id raidz_test zfs_ids_to_path
+SUBDIRS += zpool_influxdb

 if USING_PYTHON
 SUBDIRS += arcstat arc_summary dbufstat
```

cmd/zpool_influxdb/Makefile.am

Lines changed: 11 additions & 0 deletions
```makefile
include $(top_srcdir)/config/Rules.am

bin_PROGRAMS = zpool_influxdb

zpool_influxdb_SOURCES = \
	zpool_influxdb.c

zpool_influxdb_LDADD = \
	$(top_builddir)/lib/libspl/libspl.la \
	$(top_builddir)/lib/libnvpair/libnvpair.la \
	$(top_builddir)/lib/libzfs/libzfs.la
```

cmd/zpool_influxdb/README.md

Lines changed: 294 additions & 0 deletions
# Influxdb Metrics for ZFS Pools
The _zpool_influxdb_ program produces
[influxdb](https://github.com/influxdata/influxdb) line protocol
compatible metrics from zpools. In the UNIX tradition, _zpool_influxdb_
does one thing: read statistics from a pool and print them to
stdout. In many ways, this is a metrics-friendly output of
statistics normally observed via the `zpool` command.

## Usage
When run without arguments, _zpool_influxdb_ runs once, reading data
from all imported pools, and prints to stdout.
```shell
zpool_influxdb [options] [poolname]
```
If no poolname is specified, then all pools are sampled.

| option | short option | description |
|---|---|---|
| --execd | -e | For use with telegraf's `execd` plugin. When [enter] is pressed, the pools are sampled. To exit, use [ctrl+D] |
| --no-histogram | -n | Do not print histogram information |
| --signed-int | -i | Use signed integer data type (default=unsigned) |
| --sum-histogram-buckets | -s | Sum histogram bucket values |
| --tags key=value[,key=value...] | -t | Add tags to data points. No tag sanity checking is performed. |
| --help | -h | Print a short usage message |

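The `--tags` option, for example, appends arbitrary tags to every data point.
A hypothetical invocation that samples a single pool and adds site tags
(the pool name and tag values here are purely illustrative):

```shell
# sample only the pool "tank" and add two hypothetical tags to every point;
# note that no sanity checking is performed on the tag values
zpool_influxdb --tags datacenter=dc1,rack=r12 tank
```
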
#### Histogram Bucket Values
The histogram data collected by ZFS is stored as independent bucket values.
This works well out-of-the-box with an influxdb data source and grafana's
heatmap visualization. The influxdb query for a grafana heatmap
visualization looks like:
```
field(disk_read) last() non_negative_derivative(1s)
```

Another method for storing histogram data sums the values for lower-value
buckets. For example, a latency bucket tagged "le=10" includes the values
in the bucket "le=1".
This method is often used for prometheus histograms.
The `zpool_influxdb --sum-histogram-buckets` option presents the data from ZFS
as summed values.

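As a small worked illustration (bucket boundaries and counts invented): if the
independent buckets report le=1: 5, le=10: 3, and le=100: 2, the summed form
reports le=1: 5, le=10: 8, and le=100: 10, because each bucket also counts
everything below it. Summed buckets are requested on the command line:

```shell
# emit histograms with summed (prometheus-style cumulative) buckets;
# the counts quoted above are invented for illustration only
zpool_influxdb --sum-histogram-buckets
```
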
## Measurements
The following measurements are collected:

| measurement | description | zpool equivalent |
|---|---|---|
| zpool_stats | general size and data | zpool list |
| zpool_scan_stats | scrub, rebuild, and resilver statistics (omitted if no scan has been requested) | zpool status |
| zpool_vdev_stats | per-vdev statistics | zpool iostat -q |
| zpool_io_size | per-vdev I/O size histogram | zpool iostat -r |
| zpool_latency | per-vdev I/O latency histogram | zpool iostat -w |
| zpool_vdev_queue | per-vdev instantaneous queue depth | zpool iostat -q |

### zpool_stats Description
zpool_stats contains top-level summary statistics for the pool.
Performance counters measure the I/Os to the pool's devices.

#### zpool_stats Tags

| label | description |
|---|---|
| name | pool name |
| path | for leaf vdevs, the pathname |
| state | pool state, as shown by _zpool status_ |
| vdev | vdev name (root = entire pool) |

#### zpool_stats Fields

| field | units | description |
|---|---|---|
| alloc | bytes | allocated space |
| free | bytes | unallocated space |
| size | bytes | total pool size |
| read_bytes | bytes | bytes read since pool import |
| read_errors | count | number of read errors |
| read_ops | count | number of read operations |
| write_bytes | bytes | bytes written since pool import |
| write_errors | count | number of write errors |
| write_ops | count | number of write operations |

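For orientation, a zpool_stats data point in influxdb line protocol looks
roughly like the line below. The pool name, state, values, and timestamp are
invented for illustration, and the exact field ordering may differ:

```shell
# measurement,tag=value,... field=value,... timestamp-in-nanoseconds
zpool_stats,name=tank,state=ONLINE,vdev=root alloc=1099511627776u,free=3298534883328u,size=4398046511104u,read_ops=123456u,read_bytes=67890123u,read_errors=0u,write_ops=234567u,write_bytes=78901234u,write_errors=0u 1595033012000000000
```
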
### zpool_scan_stats Description
Once a pool has been scrubbed, resilvered, or rebuilt, the zpool_scan_stats
contain information about the status and performance of the operation.
Otherwise, the zpool_scan_stats do not exist in the kernel, and therefore
cannot be reported by this collector.

#### zpool_scan_stats Tags

| label | description |
|---|---|
| name | pool name |
| function | name of the scan function running or recently completed |
| state | scan state, as shown by _zpool status_ |

#### zpool_scan_stats Fields

| field | units | description |
|---|---|---|
| errors | count | number of errors encountered by scan |
| examined | bytes | total data examined during scan |
| to_examine | bytes | prediction of total bytes to be scanned |
| pass_examined | bytes | data examined during current scan pass |
| issued | bytes | size of I/Os issued to disks |
| pass_issued | bytes | size of I/Os issued to disks for current pass |
| processed | bytes | data reconstructed during scan |
| to_process | bytes | total bytes to be repaired |
| rate | bytes/sec | examination rate |
| start_ts | epoch timestamp | start timestamp for scan |
| pause_ts | epoch timestamp | timestamp for a scan pause request |
| end_ts | epoch timestamp | completion timestamp for scan |
| paused_t | seconds | elapsed time while paused |
| remaining_t | seconds | estimate of time remaining for scan |

### zpool_vdev_stats Description
The ZFS I/O (ZIO) scheduler uses five queues to schedule I/Os to each vdev.
These queues are further divided into active and pending states.
An I/O is pending prior to being issued to the vdev. An active
I/O has been issued to the vdev. The scheduler and its tunable
parameters are described in the
[ZFS documentation for the ZIO Scheduler](https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/ZIO%20Scheduler.html).
The ZIO scheduler reports the queue depths as gauges where the value
represents an instantaneous snapshot of the queue depth at
the sample time. Therefore, it is not unusual to see all zeroes
for an idle pool.

#### zpool_vdev_stats Tags
| label | description |
|---|---|
| name | pool name |
| vdev | vdev name (root = entire pool) |

#### zpool_vdev_stats Fields
| field | units | description |
|---|---|---|
| sync_r_active_queue | entries | synchronous read active queue depth |
| sync_w_active_queue | entries | synchronous write active queue depth |
| async_r_active_queue | entries | asynchronous read active queue depth |
| async_w_active_queue | entries | asynchronous write active queue depth |
| async_scrub_active_queue | entries | asynchronous scrub active queue depth |
| sync_r_pend_queue | entries | synchronous read pending queue depth |
| sync_w_pend_queue | entries | synchronous write pending queue depth |
| async_r_pend_queue | entries | asynchronous read pending queue depth |
| async_w_pend_queue | entries | asynchronous write pending queue depth |
| async_scrub_pend_queue | entries | asynchronous scrub pending queue depth |

### zpool_latency Histogram
ZFS tracks the latency of each I/O in the ZIO pipeline. This latency can
be useful for observing latency-related issues that are not easily observed
using the averaged latency statistics.

The histogram fields show cumulative values from lowest to highest.
The largest bucket is tagged "le=+Inf", representing the total count
of I/Os by type and vdev.

#### zpool_latency Histogram Tags
| label | description |
|---|---|
| le | bucket for histogram, latency is less than or equal to bucket value in seconds |
| name | pool name |
| path | for leaf vdevs, the device path name, otherwise omitted |
| vdev | vdev name (root = entire pool) |

#### zpool_latency Histogram Fields
| field | units | description |
|---|---|---|
| total_read | operations | read operations of all types |
| total_write | operations | write operations of all types |
| disk_read | operations | disk read operations |
| disk_write | operations | disk write operations |
| sync_read | operations | ZIO sync reads |
| sync_write | operations | ZIO sync writes |
| async_read | operations | ZIO async reads |
| async_write | operations | ZIO async writes |
| scrub | operations | ZIO scrub/scan reads |
| trim | operations | ZIO trim (aka unmap) writes |

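A sketch of one latency data point, again with invented values; each `le`
bucket of each vdev becomes its own point, so a single sample produces many
such lines:

```shell
# one point per le bucket per vdev; device names and counts are illustrative
zpool_latency,le=0.000524288,name=tank,path=/dev/sda1,vdev=sda1 total_read=1500u,total_write=2400u,disk_read=1400u,disk_write=2300u,sync_read=900u,sync_write=1600u,async_read=500u,async_write=700u,scrub=100u,trim=0u 1595033012000000000
```
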
### zpool_io_size Histogram
ZFS tracks I/O throughout the ZIO pipeline. The size of each I/O is used
to create a histogram of the size by I/O type and vdev. For example, a
4KiB write to a mirrored pool will show a 4KiB write to the top-level vdev
(root) and a 4KiB write to each of the mirror leaf vdevs.

The ZIO pipeline can aggregate I/O operations. For example, a contiguous
series of writes can be aggregated into a single, larger I/O to the leaf
vdev. The independent I/O operations reflect the logical operations and
the aggregated I/O operations reflect the physical operations.

The histogram fields show cumulative values from lowest to highest.
The largest bucket is tagged "le=+Inf", representing the total count
of I/Os by type and vdev.

Note: trim I/Os can be larger than 16MiB, but the larger sizes are
accounted in the 16MiB bucket.

#### zpool_io_size Histogram Tags
| label | description |
|---|---|
| le | bucket for histogram, I/O size is less than or equal to bucket value in bytes |
| name | pool name |
| path | for leaf vdevs, the device path name, otherwise omitted |
| vdev | vdev name (root = entire pool) |

#### zpool_io_size Histogram Fields
| field | units | description |
|---|---|---|
| sync_read_ind | blocks | independent sync reads |
| sync_write_ind | blocks | independent sync writes |
| async_read_ind | blocks | independent async reads |
| async_write_ind | blocks | independent async writes |
| scrub_read_ind | blocks | independent scrub/scan reads |
| trim_write_ind | blocks | independent trim (aka unmap) writes |
| sync_read_agg | blocks | aggregated sync reads |
| sync_write_agg | blocks | aggregated sync writes |
| async_read_agg | blocks | aggregated async reads |
| async_write_agg | blocks | aggregated async writes |
| scrub_read_agg | blocks | aggregated scrub/scan reads |
| trim_write_agg | blocks | aggregated trim (aka unmap) writes |

#### About unsigned integers
Telegraf v1.6.2 and later support unsigned 64-bit integers, which more
closely match the uint64_t values used by ZFS. By default, zpool_influxdb
uses ZFS' uint64_t values and the influxdb line protocol unsigned integer type.
If you are using an older telegraf or influxdb where unsigned integers are not
available, use the `--signed-int` option.

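The difference shows up in the line protocol type suffix on each field value;
the counter value below is made up for illustration:

```shell
# default: unsigned integer fields ("u" suffix), e.g. read_ops=12345u
zpool_influxdb tank
# with --signed-int: signed integer fields ("i" suffix), e.g. read_ops=12345i
zpool_influxdb --signed-int tank
```
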
## Using _zpool_influxdb_

The simplest method is to use the execd input agent in telegraf. For older
versions of telegraf which lack execd, the exec input agent can be used.
For convenience, one of the sample config files below can be placed in the
telegraf config-directory (often /etc/telegraf/telegraf.d). Telegraf can
be restarted to read the config-directory files.

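For example, on a systemd-based host (the config filename and the use of
systemctl are assumptions, not part of this commit):

```shell
# drop one of the sample configs below into telegraf's config directory
sudo cp zpool_influxdb.conf /etc/telegraf/telegraf.d/
# restart telegraf so it picks up the new input
sudo systemctl restart telegraf
```
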
### Example telegraf execd configuration
```toml
# # Read metrics from zpool_influxdb
[[inputs.execd]]
  # ## default installation location for zpool_influxdb command
  command = ["/usr/bin/zpool_influxdb", "--execd"]

  ## Define how the process is signaled on each collection interval.
  ## Valid values are:
  ##   "none"    : Do not signal anything. (Recommended for service inputs)
  ##               The process must output metrics by itself.
  ##   "STDIN"   : Send a newline on STDIN. (Recommended for gather inputs)
  ##   "SIGHUP"  : Send a HUP signal. Not available on Windows. (not recommended)
  ##   "SIGUSR1" : Send a USR1 signal. Not available on Windows.
  ##   "SIGUSR2" : Send a USR2 signal. Not available on Windows.
  signal = "STDIN"

  ## Delay before the process is restarted after an unexpected termination
  restart_delay = "10s"

  ## Data format to consume.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
  data_format = "influx"
```

### Example telegraf exec configuration
```toml
# # Read metrics from zpool_influxdb
[[inputs.exec]]
  # ## default installation location for zpool_influxdb command
  commands = ["/usr/bin/zpool_influxdb"]
  data_format = "influx"
```

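Either way, a quick sanity check is worthwhile before relying on the data;
the config path below is an example, not something installed by this commit:

```shell
# print a few line protocol samples directly to verify the collector works
zpool_influxdb | head -n 5
# have telegraf run the input once and print the metrics it would forward
telegraf --test --config /etc/telegraf/telegraf.d/zpool_influxdb.conf
```
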
## Caveat Emptor
* Like the _zpool_ command, _zpool_influxdb_ takes a reader
  lock on spa_config for each imported pool. If this lock blocks,
  then the command will also block indefinitely and might be
  unkillable. This is not a normal condition, but can occur if
  there are bugs in the kernel modules.
  For this reason, care should be taken:
  * avoid spawning many of these commands hoping that one might
    finish
  * avoid frequent updates or short sample time
    intervals, because the locks can interfere with the performance
    of other instances of _zpool_ or _zpool_influxdb_

## Other collectors
There are a few other collectors for zpool statistics roaming around
the Internet. Many attempt to screen-scrape `zpool` output in various
ways. The screen-scrape method works poorly for `zpool` output because
of its human-friendly nature. Also, they suffer from the same caveats
as this implementation. This implementation is optimized for directly
collecting the metrics and is much more efficient than the screen-scrapers.

## Feedback Encouraged
Pull requests and issues are greatly appreciated at
https://github.com/openzfs/zfs
cmd/zpool_influxdb/dashboards/README.md

Lines changed: 3 additions & 0 deletions

### Dashboards for zpool_influxdb
This directory contains a collection of dashboards related to ZFS with data
collected from the zpool_influxdb collector.
