Skip to content

libspl: nice assertion output #16140

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed

Conversation

robn
Copy link
Member

@robn robn commented Apr 28, 2024

Motivation and Context

The output when a userspace SPL assertion fires is pretty anemic, which makes it harder than it needs to be to understand where to look when zdb or ztest blows up. I finally got annoyed enough to do something about it.

Description

See commit messages for detail. In short:

  • set thread names on userspace threads so they're visible in the process table
  • in assertion output, show process and thread id and name
  • when possible, also show stack backtrace in assertions
  • add a lock in the assert function, so multiple simultaneous assertions do not trample output

thread names

Just make its easier to see everything:

pstree:

$ pstree -tp $(pgrep zdb)
zdb(1233469)─┬─{arc_evict}(1233552)
             ├─{arc_prune}(1233551)
             ├─{arc_reap}(1233553)
             ├─{dbu_evict}(1233554)
             ├─{delay_taskq}(1233547)
             ├─{delay_taskq}(1233548)
             ├─{delay_taskq}(1233549)
             ├─{delay_taskq}(1233550)
             ├─{dp_sync_taskq}(1233649)
             ├─{dp_sync_taskq}(1233650)
...

top
Screenshot_2024-04-28_17-27-41

htop
Screenshot_2024-04-28_17-28-04

assertion output

Neatly presents the location and content of the assertion, process & thread name & id and, where possible, stack backtrace. Below shows examples on various platforms (with ASSERT0("oh no") added to top of arc_evict_cb_check(), a convenient thread that starts early):

Typical output on Linux + glibc (using glibc's backtrace()):

$ ./zdb -b lucy
ASSERT at module/zfs/arc.c:4483:arc_evict_cb_check()
"oh no" == 0 (0x7f2ed57f9a30 == 0)
  PID: 1129969   COMM: zdb
  TID: 1130052   NAME: arc_evict
Call trace:
  /home/robn/code/zfs/.libs/libzpool.so.5(+0x2c4ec8) [0x7f2ed57edec8]
  /home/robn/code/zfs/.libs/libzpool.so.5(libspl_assertf+0x130) [0x7f2ed57ee090]
  /home/robn/code/zfs/.libs/libzpool.so.5(+0x844cc) [0x7f2ed55ad4cc]
  /home/robn/code/zfs/.libs/libzpool.so.5(+0x235d3f) [0x7f2ed575ed3f]
  /home/robn/code/zfs/.libs/libzpool.so.5(+0x57ca8) [0x7f2ed5580ca8]
  /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x7f2ed4f17134]
  /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7f2ed4f977dc]
Aborted

glibc's unwinder is not always the most accurate, and often can't resolve some addresses. So if libunwind/libunwind is detected, it will be used instead (disabled with --without-libunwind). Using libunwind 1.6.2 (Debian 12), it looks like:

$ ./zdb -b lucy
ASSERT at module/zfs/arc.c:4483:arc_evict_cb_check()
"oh no" == 0 (0x7f15753f3a30 == 0)
  PID: 1177702   COMM: zdb
  TID: 1177785   NAME: arc_evict
Call trace:
  [0x7f15753e80f0] libspl_assertf+0x130
  [0x7f15751a750c] arc_evict_cb_check+0x3c
  [0x7f1575358d7f] zthr_procedure+0x9f
  [0x7f157517ace8] zk_thread_wrapper+0x18
  [0x7f1574b11134] pthread_condattr_setpshared+0x4d4
  [0x7f1574b917dc] __xmknodat+0x23c
Aborted

Newer libunwind can also inspect the ELF object the symbol lives in. With 1.8.2, it looks like:

$ ./zdb -b lucy
ASSERT at module/zfs/arc.c:4483:arc_evict_cb_check()
"oh no" == 0 (0x7f3ed7d70a30 == 0)
  PID: 1223738   COMM: zdb
  TID: 1223821   NAME: arc_evict
Call trace:
  [0x7f3ed7d65120] libspl_assertf+0x130 (in /home/robn/code/zfs/.libs/libzpool.so.5.0.0 +0x2c5120)
  [0x7f3ed7b2450c] arc_evict_cb_check+0x3c (in /home/robn/code/zfs/.libs/libzpool.so.5.0.0 +0x8450c)
  [0x7f3ed7cd5d7f] zthr_procedure+0x9f (in /home/robn/code/zfs/.libs/libzpool.so.5.0.0 +0x235d7f)
  [0x7f3ed7af7ce8] zk_thread_wrapper+0x18 (in /home/robn/code/zfs/.libs/libzpool.so.5.0.0 +0x57ce8)
  [0x7f3ed748e134] start_thread+0x2c4 (in /usr/lib/x86_64-linux-gnu/libc.so.6 +0x89134)
  [0x7f3ed750e7dc] __clone3+0x2c (in /usr/lib/x86_64-linux-gnu/libc.so.6 +0x1097dc)
Aborted

If neither backtrace() nor libunwind is available, the output will just omit the call trace. On Void 20240314 using musl (which doesn't have backtrace()), it's:

$ ./zdb -b tank
ASSERT at module/zfs/arc.c:4483:arc_evict_cb_check()
"oh no" == 0 (0x7fe0e9910112 == 0)
  PID: 12837     COMM: zdb
  TID: 12919     NAME: arc_evict
Aborted

Meanwhile, FreeBSD 14 does have backtrace(), but in libexecinfo instead (a port of this is sometimes used with musl). We find that and use it just fine:

$ ./zdb -b garden
ASSERT at module/zfs/arc.c:4483:arc_evict_cb_check()
"oh no" == 0 (0x82183d3b2 == 0)
  PID: 96765     COMM: zdb
  TID: 141497    NAME: arc_evict
Call trace:
  0x821b7f238 <libspl_assertf+0x148> at /home/robn/zfs/.libs/libzpool.so.5
  0x821b7f208 <libspl_assertf+0x118> at /home/robn/zfs/.libs/libzpool.so.5
  0x821925a43 <arc_init+0x1b43> at /home/robn/zfs/.libs/libzpool.so.5
  0x821ac5776 <zthr_create_timer+0x146> at /home/robn/zfs/.libs/libzpool.so.5
  0x8218e5d99 <zk_thread_create+0x2d9> at /home/robn/zfs/.libs/libzpool.so.5
Abort trap (core dumped)

Of course, libunwind is nice there. With devel/unwind 20240211:

$ ./zdb -b garden
ASSERT at module/zfs/arc.c:4483:arc_evict_cb_check()
"oh no" == 0 (0x8236844d3 == 0)
  PID: 26150     COMM: zdb
  TID: 142192    NAME: arc_evict
Call trace:
  [0x8239c6308] libspl_assertf+0x118 (in /usr/home/robn/zfs/.libs/libzpool.so.5.0.0 +0x3ce308)
  [0x82376cb43] arc_evict_cb_check+0x33 (in /usr/home/robn/zfs/.libs/libzpool.so.5.0.0 +0x174b43)
  [0x82390c876] zthr_procedure+0x56 (in /usr/home/robn/zfs/.libs/libzpool.so.5.0.0 +0x314876)
  [0x82372ce99] zk_thread_wrapper+0x19 (in /usr/home/robn/zfs/.libs/libzpool.so.5.0.0 +0x134e99)
  [0x82abbca75] +0x19 (in /lib/libthr.so.3 +0xfa75)
Abort trap (core dumped)

How Has This Been Tested?

See above. At least, compile checked against recent glibc, musl, FreeBSD, with multiple versions of libunwind. And of course, this is assertion response, so if nothing blows up then nothing sees it.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@robn robn force-pushed the libspl-nice-asserts branch 2 times, most recently from 197165e to 8be04d4 Compare April 28, 2024 10:03
@robn
Copy link
Member Author

robn commented Apr 28, 2024

Looking into the FreeBSD 13 build failure at the moment. Possibly a bug in config/find_system_library.m4 💀

Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool. It'll be nice to have more useful debug information along with the ASSERT.

It looks like the build failure on FreeBSD just needs to be sorted out.

@robn robn force-pushed the libspl-nice-asserts branch from 8be04d4 to f9cba1b Compare April 30, 2024 00:54
robn added 6 commits April 30, 2024 12:48
The "not found" path is attempting to clear SOMELIB_CFLAGS and
SOMELIB_LIBS by resetting them in AC_SUBST(). However, the second arg to
AC_SUBST is expanded in autoconf with `m4_ifvaln([$2], [[$1]=$2])`,
which is defined as "if the first arg is non-empty". The m4 "empty"
construction is [], therefore, the existing AC_SUBST calls never modify
the variables at all.

The effect of this is that leftovers from the library test can leak out.
At least, if a library header is found in the first stage, but the
library itself is not, -lsomelib is added to SOMELIB_LIBS and further
tests done. If that library is not found, SOMELIB_LIBS will not be
cleared.

For most of our library tests this hasn't been a problem, as they're
either always found properly via pkg-config or set directly, or the
calling test immediately aborts configure. For an optional dependency
however, an apparent "partial" result where the header is found but no
corresponding library causes link errors later.

I think a complete fix should probably not be setting SOMELIB_xxx until
the final result is known, but for now, adjusting the AC_SUBST calls to
explictly set the empty shell string (which is not "empty" to m4) at
least restores the intent.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Arrange for the thread/task name to be set when new threads are created.
This makes them visible in the process table etc.

pthread_setname_np() is generally available in glibc, musl and FreeBSD,
so no test is required.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Makes it much easier to see what thing complained.

Getting thread id, program name and thread name vary wildly between
Linux and FreeBSD, so those are set up in macros. pthread_getname_np()
did not appear in musl until very recently, but the same info has always
been available via prctl(PR_GET_NAME), so we use that instead.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
If multiple threads trip an assertion at the same moment (quite common),
they can be printing at the same time, and their output gets messy.

This adds a simple lock around the whole thing, to prevent a second task
printing assert output before the first has finished.

Additionally, if libspl_assert_ok is not set, abort() is called without
dropping the lock, so that any other asserting tasks will be killed
before starting any output, rather than only getting part-way through.
This is a tradeoff; it's assumed that multiple threads asserting at the
same moment are likely the same fault in different instances of a
thread, and so there won't be any more useful information from the other
tasks anyway.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Adds a check for the backtrace() function. If available, uses it to show
a stack backtrace in the assertion output.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
libunwind seems to do a better job of resolving a symbols than
backtrace(), and is also useful on platforms that don't have backtrace()
(eg musl). If it's available, use it.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
@robn robn force-pushed the libspl-nice-asserts branch from f9cba1b to 4705367 Compare April 30, 2024 02:53
@robn
Copy link
Member Author

robn commented Apr 30, 2024

Last push should take care of it, lets see.

The problem was that FreeBSD has libunwind.h in /usr/include, as part of its standard compiler toolchain (some LLVM/Clang thing), but doesn't have a corresponding -lunwind. find_system_library.m4 wasn't cleaning up LIBUNWIND_LIBS properly after this half-discovery, and so was later failing trying to link a nonexistent -lunwind.

@behlendorf 28b110d fixes this. The commit message explains more. I'm 50/50 on a separate PR; it's not really related to this PR, and it could have a wider impact, but it doesn't come up in any normal build that I can imagine, so I'm not sure if more testing/review is really likely/possible. If you'd prefer separate, let me know!

@behlendorf
Copy link
Contributor

Thanks for running that down. Let's go ahead and include that change as-is with this PR, with can always follow up with additional cleanup/improvements in another PR.

@behlendorf behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label May 1, 2024
@behlendorf behlendorf closed this in 7ac00d3 May 1, 2024
behlendorf pushed a commit that referenced this pull request May 1, 2024
Arrange for the thread/task name to be set when new threads are created.
This makes them visible in the process table etc.

pthread_setname_np() is generally available in glibc, musl and FreeBSD,
so no test is required.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
behlendorf pushed a commit that referenced this pull request May 1, 2024
Makes it much easier to see what thing complained.

Getting thread id, program name and thread name vary wildly between
Linux and FreeBSD, so those are set up in macros. pthread_getname_np()
did not appear in musl until very recently, but the same info has always
been available via prctl(PR_GET_NAME), so we use that instead.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
behlendorf pushed a commit that referenced this pull request May 1, 2024
If multiple threads trip an assertion at the same moment (quite common),
they can be printing at the same time, and their output gets messy.

This adds a simple lock around the whole thing, to prevent a second task
printing assert output before the first has finished.

Additionally, if libspl_assert_ok is not set, abort() is called without
dropping the lock, so that any other asserting tasks will be killed
before starting any output, rather than only getting part-way through.
This is a tradeoff; it's assumed that multiple threads asserting at the
same moment are likely the same fault in different instances of a
thread, and so there won't be any more useful information from the other
tasks anyway.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
behlendorf pushed a commit that referenced this pull request May 1, 2024
Adds a check for the backtrace() function. If available, uses it to show
a stack backtrace in the assertion output.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
behlendorf pushed a commit that referenced this pull request May 1, 2024
libunwind seems to do a better job of resolving a symbols than
backtrace(), and is also useful on platforms that don't have backtrace()
(eg musl). If it's available, use it.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
The "not found" path is attempting to clear SOMELIB_CFLAGS and
SOMELIB_LIBS by resetting them in AC_SUBST(). However, the second arg to
AC_SUBST is expanded in autoconf with `m4_ifvaln([$2], [[$1]=$2])`,
which is defined as "if the first arg is non-empty". The m4 "empty"
construction is [], therefore, the existing AC_SUBST calls never modify
the variables at all.

The effect of this is that leftovers from the library test can leak out.
At least, if a library header is found in the first stage, but the
library itself is not, -lsomelib is added to SOMELIB_LIBS and further
tests done. If that library is not found, SOMELIB_LIBS will not be
cleared.

For most of our library tests this hasn't been a problem, as they're
either always found properly via pkg-config or set directly, or the
calling test immediately aborts configure. For an optional dependency
however, an apparent "partial" result where the header is found but no
corresponding library causes link errors later.

I think a complete fix should probably not be setting SOMELIB_xxx until
the final result is known, but for now, adjusting the AC_SUBST calls to
explictly set the empty shell string (which is not "empty" to m4) at
least restores the intent.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
(cherry picked from commit 7ac00d3)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
Arrange for the thread/task name to be set when new threads are created.
This makes them visible in the process table etc.

pthread_setname_np() is generally available in glibc, musl and FreeBSD,
so no test is required.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
(cherry picked from commit 4429ad9)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
Makes it much easier to see what thing complained.

Getting thread id, program name and thread name vary wildly between
Linux and FreeBSD, so those are set up in macros. pthread_getname_np()
did not appear in musl until very recently, but the same info has always
been available via prctl(PR_GET_NAME), so we use that instead.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
(cherry picked from commit 3948002)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
If multiple threads trip an assertion at the same moment (quite common),
they can be printing at the same time, and their output gets messy.

This adds a simple lock around the whole thing, to prevent a second task
printing assert output before the first has finished.

Additionally, if libspl_assert_ok is not set, abort() is called without
dropping the lock, so that any other asserting tasks will be killed
before starting any output, rather than only getting part-way through.
This is a tradeoff; it's assumed that multiple threads asserting at the
same moment are likely the same fault in different instances of a
thread, and so there won't be any more useful information from the other
tasks anyway.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
(cherry picked from commit dec697a)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
Adds a check for the backtrace() function. If available, uses it to show
a stack backtrace in the assertion output.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
(cherry picked from commit 2152c40)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
libunwind seems to do a better job of resolving a symbols than
backtrace(), and is also useful on platforms that don't have backtrace()
(eg musl). If it's available, use it.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
(cherry picked from commit 051460b)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
The "not found" path is attempting to clear SOMELIB_CFLAGS and
SOMELIB_LIBS by resetting them in AC_SUBST(). However, the second arg to
AC_SUBST is expanded in autoconf with `m4_ifvaln([$2], [[$1]=$2])`,
which is defined as "if the first arg is non-empty". The m4 "empty"
construction is [], therefore, the existing AC_SUBST calls never modify
the variables at all.

The effect of this is that leftovers from the library test can leak out.
At least, if a library header is found in the first stage, but the
library itself is not, -lsomelib is added to SOMELIB_LIBS and further
tests done. If that library is not found, SOMELIB_LIBS will not be
cleared.

For most of our library tests this hasn't been a problem, as they're
either always found properly via pkg-config or set directly, or the
calling test immediately aborts configure. For an optional dependency
however, an apparent "partial" result where the header is found but no
corresponding library causes link errors later.

I think a complete fix should probably not be setting SOMELIB_xxx until
the final result is known, but for now, adjusting the AC_SUBST calls to
explictly set the empty shell string (which is not "empty" to m4) at
least restores the intent.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
(cherry picked from commit 7ac00d3)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
Arrange for the thread/task name to be set when new threads are created.
This makes them visible in the process table etc.

pthread_setname_np() is generally available in glibc, musl and FreeBSD,
so no test is required.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
(cherry picked from commit 4429ad9)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
Makes it much easier to see what thing complained.

Getting thread id, program name and thread name vary wildly between
Linux and FreeBSD, so those are set up in macros. pthread_getname_np()
did not appear in musl until very recently, but the same info has always
been available via prctl(PR_GET_NAME), so we use that instead.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
(cherry picked from commit 3948002)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
If multiple threads trip an assertion at the same moment (quite common),
they can be printing at the same time, and their output gets messy.

This adds a simple lock around the whole thing, to prevent a second task
printing assert output before the first has finished.

Additionally, if libspl_assert_ok is not set, abort() is called without
dropping the lock, so that any other asserting tasks will be killed
before starting any output, rather than only getting part-way through.
This is a tradeoff; it's assumed that multiple threads asserting at the
same moment are likely the same fault in different instances of a
thread, and so there won't be any more useful information from the other
tasks anyway.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
(cherry picked from commit dec697a)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
Adds a check for the backtrace() function. If available, uses it to show
a stack backtrace in the assertion output.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
(cherry picked from commit 2152c40)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
libunwind seems to do a better job of resolving a symbols than
backtrace(), and is also useful on platforms that don't have backtrace()
(eg musl). If it's available, use it.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
(cherry picked from commit 051460b)
@robn robn mentioned this pull request Jul 17, 2024
13 tasks
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
The "not found" path is attempting to clear SOMELIB_CFLAGS and
SOMELIB_LIBS by resetting them in AC_SUBST(). However, the second arg to
AC_SUBST is expanded in autoconf with `m4_ifvaln([$2], [[$1]=$2])`,
which is defined as "if the first arg is non-empty". The m4 "empty"
construction is [], therefore, the existing AC_SUBST calls never modify
the variables at all.

The effect of this is that leftovers from the library test can leak out.
At least, if a library header is found in the first stage, but the
library itself is not, -lsomelib is added to SOMELIB_LIBS and further
tests done. If that library is not found, SOMELIB_LIBS will not be
cleared.

For most of our library tests this hasn't been a problem, as they're
either always found properly via pkg-config or set directly, or the
calling test immediately aborts configure. For an optional dependency
however, an apparent "partial" result where the header is found but no
corresponding library causes link errors later.

I think a complete fix should probably not be setting SOMELIB_xxx until
the final result is known, but for now, adjusting the AC_SUBST calls to
explictly set the empty shell string (which is not "empty" to m4) at
least restores the intent.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
(cherry picked from commit 7ac00d3)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
Arrange for the thread/task name to be set when new threads are created.
This makes them visible in the process table etc.

pthread_setname_np() is generally available in glibc, musl and FreeBSD,
so no test is required.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
(cherry picked from commit 4429ad9)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
Makes it much easier to see what thing complained.

Getting thread id, program name and thread name vary wildly between
Linux and FreeBSD, so those are set up in macros. pthread_getname_np()
did not appear in musl until very recently, but the same info has always
been available via prctl(PR_GET_NAME), so we use that instead.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
(cherry picked from commit 3948002)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
If multiple threads trip an assertion at the same moment (quite common),
they can be printing at the same time, and their output gets messy.

This adds a simple lock around the whole thing, to prevent a second task
printing assert output before the first has finished.

Additionally, if libspl_assert_ok is not set, abort() is called without
dropping the lock, so that any other asserting tasks will be killed
before starting any output, rather than only getting part-way through.
This is a tradeoff; it's assumed that multiple threads asserting at the
same moment are likely the same fault in different instances of a
thread, and so there won't be any more useful information from the other
tasks anyway.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
(cherry picked from commit dec697a)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
Adds a check for the backtrace() function. If available, uses it to show
a stack backtrace in the assertion output.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
(cherry picked from commit 2152c40)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
libunwind seems to do a better job of resolving a symbols than
backtrace(), and is also useful on platforms that don't have backtrace()
(eg musl). If it's available, use it.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
(cherry picked from commit 051460b)
robn added a commit to robn/zfs that referenced this pull request Jul 18, 2024
The "not found" path is attempting to clear SOMELIB_CFLAGS and
SOMELIB_LIBS by resetting them in AC_SUBST(). However, the second arg to
AC_SUBST is expanded in autoconf with `m4_ifvaln([$2], [[$1]=$2])`,
which is defined as "if the first arg is non-empty". The m4 "empty"
construction is [], therefore, the existing AC_SUBST calls never modify
the variables at all.

The effect of this is that leftovers from the library test can leak out.
At least, if a library header is found in the first stage, but the
library itself is not, -lsomelib is added to SOMELIB_LIBS and further
tests done. If that library is not found, SOMELIB_LIBS will not be
cleared.

For most of our library tests this hasn't been a problem, as they're
either always found properly via pkg-config or set directly, or the
calling test immediately aborts configure. For an optional dependency
however, an apparent "partial" result where the header is found but no
corresponding library causes link errors later.

I think a complete fix should probably not be setting SOMELIB_xxx until
the final result is known, but for now, adjusting the AC_SUBST calls to
explictly set the empty shell string (which is not "empty" to m4) at
least restores the intent.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
robn added a commit to robn/zfs that referenced this pull request Jul 18, 2024
Makes it much easier to see what thing complained.

Getting thread id, program name and thread name vary wildly between
Linux and FreeBSD, so those are set up in macros. pthread_getname_np()
did not appear in musl until very recently, but the same info has always
been available via prctl(PR_GET_NAME), so we use that instead.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
robn added a commit to robn/zfs that referenced this pull request Jul 18, 2024
If multiple threads trip an assertion at the same moment (quite common),
they can be printing at the same time, and their output gets messy.

This adds a simple lock around the whole thing, to prevent a second task
printing assert output before the first has finished.

Additionally, if libspl_assert_ok is not set, abort() is called without
dropping the lock, so that any other asserting tasks will be killed
before starting any output, rather than only getting part-way through.
This is a tradeoff; it's assumed that multiple threads asserting at the
same moment are likely the same fault in different instances of a
thread, and so there won't be any more useful information from the other
tasks anyway.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
robn added a commit to robn/zfs that referenced this pull request Jul 18, 2024
Adds a check for the backtrace() function. If available, uses it to show
a stack backtrace in the assertion output.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
robn added a commit to robn/zfs that referenced this pull request Jul 18, 2024
libunwind seems to do a better job of resolving a symbols than
backtrace(), and is also useful on platforms that don't have backtrace()
(eg musl). If it's available, use it.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
The "not found" path is attempting to clear SOMELIB_CFLAGS and
SOMELIB_LIBS by resetting them in AC_SUBST(). However, the second arg to
AC_SUBST is expanded in autoconf with `m4_ifvaln([$2], [[$1]=$2])`,
which is defined as "if the first arg is non-empty". The m4 "empty"
construction is [], therefore, the existing AC_SUBST calls never modify
the variables at all.

The effect of this is that leftovers from the library test can leak out.
At least, if a library header is found in the first stage, but the
library itself is not, -lsomelib is added to SOMELIB_LIBS and further
tests done. If that library is not found, SOMELIB_LIBS will not be
cleared.

For most of our library tests this hasn't been a problem, as they're
either always found properly via pkg-config or set directly, or the
calling test immediately aborts configure. For an optional dependency
however, an apparent "partial" result where the header is found but no
corresponding library causes link errors later.

I think a complete fix should probably not be setting SOMELIB_xxx until
the final result is known, but for now, adjusting the AC_SUBST calls to
explictly set the empty shell string (which is not "empty" to m4) at
least restores the intent.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
Arrange for the thread/task name to be set when new threads are created.
This makes them visible in the process table etc.

pthread_setname_np() is generally available in glibc, musl and FreeBSD,
so no test is required.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
Makes it much easier to see what thing complained.

Getting thread id, program name and thread name vary wildly between
Linux and FreeBSD, so those are set up in macros. pthread_getname_np()
did not appear in musl until very recently, but the same info has always
been available via prctl(PR_GET_NAME), so we use that instead.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
If multiple threads trip an assertion at the same moment (quite common),
they can be printing at the same time, and their output gets messy.

This adds a simple lock around the whole thing, to prevent a second task
printing assert output before the first has finished.

Additionally, if libspl_assert_ok is not set, abort() is called without
dropping the lock, so that any other asserting tasks will be killed
before starting any output, rather than only getting part-way through.
This is a tradeoff; it's assumed that multiple threads asserting at the
same moment are likely the same fault in different instances of a
thread, and so there won't be any more useful information from the other
tasks anyway.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
Adds a check for the backtrace() function. If available, uses it to show
a stack backtrace in the assertion output.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
libunwind seems to do a better job of resolving a symbols than
backtrace(), and is also useful on platforms that don't have backtrace()
(eg musl). If it's available, use it.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants