-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
libexpr: Reduce the size of Value down to 16 bytes (on 64 bit systems) #13407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool stuff. Have only went over briefly. But I am interested in a review from @NaN-git
The following commits will touch this file significantly, so it's better to get the formatting out of the way first.
@@ -50,11 +50,11 @@ void printAmbiguous( | |||
break; | |||
} | |||
case nList: | |||
if (seen && v.listSize() && !seen->insert(v.listElems()).second) | |||
if (seen && v.listSize() && !seen->insert(&v /* FIXME: What to do, pointer to elements is not stable. */).second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previosly this code used a pointer to the listElements (either stored inline or in a big list on the heap). Since now inline values are tagged we have to copy them to the listView
, so the pointer to elements would point to the stack.
For now I've changed this to a Value *
instead (that is actually the same pointer, since the smallList is the first element in a Value structure). This should work fine, but maybe I'm missing something. @roberth do you think this is ok to do?
If there are no objections I'll remove the FIXME.
…alueStorage` This factors out most of the value representation into a mixin class. `finishValue` is now gone for good and replaced with a simple template function `setStorage` which derives the type information/disriminator from the type of the argument. Likewise, reading of the value goes through function template `getStorage`. An empty type `Null` is introduced to make the bijection InternalType <-> C++ type complete.
This shaves off a very significand amount of memory used for evaluation as well as reduces the GC-managed heap. Previously the union discriminator (InternalType) was stored as a separate field in the Value, which takes up whole 8 bytes due to padding needed for member alignment. This effectively wasted 7 whole bytes of memory. Instead of doing that InternalType is instead packed into pointer alignment niches. As it turns out, there's more than enough unused bits there for the bit packing to be effective. See the doxygen comment in the ValueStorage specialization for more details. This does not add any performance overhead, even though we now consistently assert the InternalType in all getters. This can also be made atomic with a double width compare-and-swap instruction on x86_64 (CMPXCHG16B instruction) for parallel evaluation.
Also fixed |
Motivation
This shaves off a very significant (~20% percent reduction in maximum heap size and ~17% in total bytes) amount of memory used for evaluation as well as reduces the GC-managed heap. See the results below. The fraction of collected memory is not affected.
Memory usage comparison
Below are the comparisons for
NIX_SHOW_STATS=1 nix-env --query --available --out-path --file ../nixpkgs --eval-system x86_64-linux > /dev/null
Before
After
So the saving are around
2.45
GiB on a full nixpkgs eval.Previously the union discriminator (
InternalType
) wasstored as a separate field in the Value, which takes up
whole 8 bytes due to padding needed for member alignment.
This effectively wasted 7 whole bytes of memory. Instead
of doing that
InternalType
is instead packed into pointeralignment niches. As it turns out, there's more than enough
unused bits there for the bit packing to be effective.
See the doxygen comment in the
ValueStorage
specializationfor more details.
This does not add any performance overhead, even though
we now consistently assert the
InternalType
in all getters.This can also be made atomic with a double width compare-and-swap
instruction on x86_64 (
CMPXCHG16B
instruction) for parallel evaluation.Context
This is the best we can do with the current data model to address #8621.
The improvement is very significant and should give more leeway to the ever growing nixpkgs.
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.