# Influxdb Metrics for ZFS Pools
The _zpool_influxdb_ program produces
[influxdb](https://github.com/influxdata/influxdb) line protocol
compatible metrics from zpools. In the UNIX tradition, _zpool_influxdb_
does one thing: read statistics from a pool and print them to
stdout. In many ways, this is a metrics-friendly output of
statistics normally observed via the `zpool` command.

## Usage
When run without arguments, _zpool_influxdb_ runs once, reading data
from all imported pools, and prints to stdout.
```shell
zpool_influxdb [options] [poolname]
```
If no poolname is specified, then all pools are sampled.

| option | short option | description |
|---|---|---|
| --execd | -e | For use with telegraf's `execd` plugin. When [enter] is pressed, the pools are sampled. To exit, use [ctrl+D] |
| --no-histogram | -n | Do not print histogram information |
| --signed-int | -i | Use signed integer data type (default=unsigned) |
| --sum-histogram-buckets | -s | Sum histogram bucket values |
| --tags key=value[,key=value...] | -t | Add tags to data points. No tag sanity checking is performed. |
| --help | -h | Print a short usage message |

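For example, a one-shot sample of a single pool with an extra tag might look
like the following; the pool name `tank` and the `datacenter=dc1` tag are
illustrative:
```shell
# sample only the pool "tank" once, tagging each point with datacenter=dc1
# output is influxdb line protocol on stdout, one point per line:
#   measurement,tag=value,... field=value,... timestamp
zpool_influxdb --tags datacenter=dc1 tank
```
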
#### Histogram Bucket Values
The histogram data collected by ZFS is stored as independent bucket values.
This works well out-of-the-box with an influxdb data source and grafana's
heatmap visualization. The influxdb query for a grafana heatmap
visualization looks like:
```
field(disk_read) last() non_negative_derivative(1s)
```

Another method for storing histogram data sums the values for lower-value
buckets. For example, a latency bucket tagged "le=10" includes the values
in the bucket "le=1".
This method is often used for prometheus histograms.
The `zpool_influxdb --sum-histogram-buckets` option presents the data from ZFS
as summed values.
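
As a hypothetical illustration of the difference, the sketch below converts
independent bucket counts into summed buckets with a running total; the
bucket values are invented for the example:
```shell
# illustrative only: turn independent "le count" pairs into summed buckets
#   independent: le=1 -> 5, le=10 -> 3, le=100 -> 2
#   summed:      le=1 -> 5, le=10 -> 8, le=100 -> 10
printf '1 5\n10 3\n100 2\n' | awk '{ total += $2; print "le=" $1, total }'
```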

## Measurements
The following measurements are collected:

| measurement | description | zpool equivalent |
|---|---|---|
| zpool_stats | general size and data | zpool list |
| zpool_scan_stats | scrub, rebuild, and resilver statistics (omitted if no scan has been requested) | zpool status |
| zpool_vdev_stats | per-vdev statistics | zpool iostat -q |
| zpool_io_size | per-vdev I/O size histogram | zpool iostat -r |
| zpool_latency | per-vdev I/O latency histogram | zpool iostat -w |
| zpool_vdev_queue | per-vdev instantaneous queue depth | zpool iostat -q |

### zpool_stats Description
zpool_stats contains top-level summary statistics for the pool.
Performance counters measure the I/Os to the pool's devices.

#### zpool_stats Tags

| label | description |
|---|---|
| name | pool name |
| path | for leaf vdevs, the pathname |
| state | pool state, as shown by _zpool status_ |
| vdev | vdev name (root = entire pool) |

#### zpool_stats Fields

| field | units | description |
|---|---|---|
| alloc | bytes | allocated space |
| free | bytes | unallocated space |
| size | bytes | total pool size |
| read_bytes | bytes | bytes read since pool import |
| read_errors | count | number of read errors |
| read_ops | count | number of read operations |
| write_bytes | bytes | bytes written since pool import |
| write_errors | count | number of write errors |
| write_ops | count | number of write operations |
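
A quick way to inspect just these pool-level points from the shell is to
filter on the measurement name, which leads each line-protocol line; the
grep pattern below is illustrative:
```shell
# print only the zpool_stats points (the measurement name starts each line)
zpool_influxdb | grep '^zpool_stats,'
```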

### zpool_scan_stats Description
Once a pool has been scrubbed, resilvered, or rebuilt, the zpool_scan_stats
contain information about the status and performance of the operation.
Otherwise, the zpool_scan_stats do not exist in the kernel, and therefore
cannot be reported by this collector.

#### zpool_scan_stats Tags

| label | description |
|---|---|
| name | pool name |
| function | name of the scan function running or recently completed |
| state | scan state, as shown by _zpool status_ |

#### zpool_scan_stats Fields

| field | units | description |
|---|---|---|
| errors | count | number of errors encountered by scan |
| examined | bytes | total data examined during scan |
| to_examine | bytes | prediction of total bytes to be scanned |
| pass_examined | bytes | data examined during current scan pass |
| issued | bytes | size of I/Os issued to disks |
| pass_issued | bytes | size of I/Os issued to disks for current pass |
| processed | bytes | data reconstructed during scan |
| to_process | bytes | total bytes to be repaired |
| rate | bytes/sec | examination rate |
| start_ts | epoch timestamp | start timestamp for scan |
| pause_ts | epoch timestamp | timestamp for a scan pause request |
| end_ts | epoch timestamp | completion timestamp for scan |
| paused_t | seconds | elapsed time while paused |
| remaining_t | seconds | estimate of time remaining for scan |

### zpool_vdev_stats Description
The ZFS I/O (ZIO) scheduler uses five queues to schedule I/Os to each vdev.
These queues are further divided into active and pending states.
An I/O is pending prior to being issued to the vdev. An active
I/O has been issued to the vdev. The scheduler and its tunable
parameters are described in the
[ZFS documentation for the ZIO Scheduler](https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/ZIO%20Scheduler.html).
The ZIO scheduler reports the queue depths as gauges where the value
represents an instantaneous snapshot of the queue depth at
the sample time. Therefore, it is not unusual to see all zeroes
for an idle pool.

#### zpool_vdev_stats Tags
| label | description |
|---|---|
| name | pool name |
| vdev | vdev name (root = entire pool) |

#### zpool_vdev_stats Fields
| field | units | description |
|---|---|---|
| sync_r_active_queue | entries | synchronous read active queue depth |
| sync_w_active_queue | entries | synchronous write active queue depth |
| async_r_active_queue | entries | asynchronous read active queue depth |
| async_w_active_queue | entries | asynchronous write active queue depth |
| async_scrub_active_queue | entries | asynchronous scrub active queue depth |
| sync_r_pend_queue | entries | synchronous read pending queue depth |
| sync_w_pend_queue | entries | synchronous write pending queue depth |
| async_r_pend_queue | entries | asynchronous read pending queue depth |
| async_w_pend_queue | entries | asynchronous write pending queue depth |
| async_scrub_pend_queue | entries | asynchronous scrub pending queue depth |
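
Because these are point-in-time gauges, a single sample of an idle pool
usually shows all zeroes; sampling in a loop gives a better picture (the
interval and grep pattern below are arbitrary):
```shell
# print the queue-depth fields every 5 seconds; the interval is arbitrary
while true; do
    zpool_influxdb | grep '_queue'
    sleep 5
done
```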

### zpool_latency Histogram
ZFS tracks the latency of each I/O in the ZIO pipeline. These histograms
can reveal latency issues that are not easily observed in averaged
latency statistics.

The histogram fields show cumulative values from lowest to highest.
The largest bucket is tagged "le=+Inf", representing the total count
of I/Os by type and vdev.

#### zpool_latency Histogram Tags
| label | description |
|---|---|
| le | bucket for histogram, latency is less than or equal to bucket value in seconds |
| name | pool name |
| path | for leaf vdevs, the device path name, otherwise omitted |
| vdev | vdev name (root = entire pool) |

#### zpool_latency Histogram Fields
| field | units | description |
|---|---|---|
| total_read | operations | read operations of all types |
| total_write | operations | write operations of all types |
| disk_read | operations | disk read operations |
| disk_write | operations | disk write operations |
| sync_read | operations | ZIO sync reads |
| sync_write | operations | ZIO sync writes |
| async_read | operations | ZIO async reads |
| async_write | operations | ZIO async writes |
| scrub | operations | ZIO scrub/scan reads |
| trim | operations | ZIO trim (aka unmap) writes |

### zpool_io_size Histogram
ZFS tracks I/O throughout the ZIO pipeline. The size of each I/O is used
to create a histogram of the size by I/O type and vdev. For example, a
4KiB write to a mirrored pool will show a 4KiB write to the top-level vdev
(root) and a 4KiB write to each of the mirror leaf vdevs.

The ZIO pipeline can aggregate I/O operations. For example, a contiguous
series of writes can be aggregated into a single, larger I/O to the leaf
vdev. The independent I/O operations reflect the logical operations and
the aggregated I/O operations reflect the physical operations.

The histogram fields show cumulative values from lowest to highest.
The largest bucket is tagged "le=+Inf", representing the total count
of I/Os by type and vdev.

Note: trim I/Os can be larger than 16MiB, but the larger sizes are
counted in the 16MiB bucket.

#### zpool_io_size Histogram Tags
| label | description |
|---|---|
| le | bucket for histogram, I/O size is less than or equal to bucket value in bytes |
| name | pool name |
| path | for leaf vdevs, the device path name, otherwise omitted |
| vdev | vdev name (root = entire pool) |

#### zpool_io_size Histogram Fields
| field | units | description |
|---|---|---|
| sync_read_ind | blocks | independent sync reads |
| sync_write_ind | blocks | independent sync writes |
| async_read_ind | blocks | independent async reads |
| async_write_ind | blocks | independent async writes |
| scrub_read_ind | blocks | independent scrub/scan reads |
| trim_write_ind | blocks | independent trim (aka unmap) writes |
| sync_read_agg | blocks | aggregated sync reads |
| sync_write_agg | blocks | aggregated sync writes |
| async_read_agg | blocks | aggregated async reads |
| async_write_agg | blocks | aggregated async writes |
| scrub_read_agg | blocks | aggregated scrub/scan reads |
| trim_write_agg | blocks | aggregated trim (aka unmap) writes |

#### About unsigned integers
Telegraf v1.6.2 and later support unsigned 64-bit integers, which more
closely match the uint64_t values used by ZFS. By default, zpool_influxdb
uses ZFS' uint64_t values and the influxdb line protocol unsigned integer
type. If you are using an older telegraf or influxdb where unsigned
integers are not available, use the `--signed-int` option.
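
In line protocol terms, the difference is only the integer type suffix on
each field value; the numbers below are invented for illustration:
```shell
# default (unsigned) fields end in "u", for example:  size=10737418240u
# with --signed-int, fields end in "i", for example:  size=10737418240i
zpool_influxdb --signed-int
```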

## Using _zpool_influxdb_

The simplest method is to use the execd input plugin in telegraf. For older
versions of telegraf which lack execd, the exec input plugin can be used.
For convenience, one of the sample config files below can be placed in the
telegraf config-directory (often /etc/telegraf/telegraf.d). Telegraf can
then be restarted to read the config-directory files.

### Example telegraf execd configuration
```toml
# # Read metrics from zpool_influxdb
[[inputs.execd]]
# ## default installation location for zpool_influxdb command
  command = ["/usr/bin/zpool_influxdb", "--execd"]

  ## Define how the process is signaled on each collection interval.
  ## Valid values are:
  ##   "none"    : Do not signal anything. (Recommended for service inputs)
  ##               The process must output metrics by itself.
  ##   "STDIN"   : Send a newline on STDIN. (Recommended for gather inputs)
  ##   "SIGHUP"  : Send a HUP signal. Not available on Windows. (not recommended)
  ##   "SIGUSR1" : Send a USR1 signal. Not available on Windows.
  ##   "SIGUSR2" : Send a USR2 signal. Not available on Windows.
  signal = "STDIN"

  ## Delay before the process is restarted after an unexpected termination
  restart_delay = "10s"

  ## Data format to consume.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
  data_format = "influx"
```

### Example telegraf exec configuration
```toml
# # Read metrics from zpool_influxdb
[[inputs.exec]]
# ## default installation location for zpool_influxdb command
  commands = ["/usr/bin/zpool_influxdb"]
  data_format = "influx"
```
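
After dropping one of these files into the config-directory, the pipeline
can be sanity-checked without restarting the service; the paths below are
common defaults and may differ on your system:
```shell
# gather once, print the parsed metrics to stdout, and exit
telegraf --test --config /etc/telegraf/telegraf.conf \
         --config-directory /etc/telegraf/telegraf.d
```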

## Caveat Emptor
* Like the _zpool_ command, _zpool_influxdb_ takes a reader
  lock on spa_config for each imported pool. If this lock blocks,
  then the command will also block indefinitely and might be
  unkillable. This is not a normal condition, but can occur if
  there are bugs in the kernel modules.
  For this reason, care should be taken:
  * avoid spawning many of these commands hoping that one might
    finish
  * avoid frequent updates or short sample time
    intervals, because the locks can interfere with the performance
    of other instances of _zpool_ or _zpool_influxdb_

## Other collectors
There are a few other collectors for zpool statistics roaming around
the Internet. Many attempt to screen-scrape `zpool` output in various
ways. Screen-scraping works poorly because `zpool` output is formatted
for humans rather than parsers, and those collectors suffer from the
same locking caveats as this implementation. This implementation
collects the metrics directly and is much more efficient than the
screen-scrapers.

## Feedback Encouraged
Pull requests and issues are greatly appreciated at
https://github.com/openzfs/zfs