Description
Overview
Support field projection inside of #[repr(transparent)] wrapper types defined in zerocopy and in the standard library.
Many thanks to @jswrenn, @kupiakos, @djkoloski, @SkiFire13, @cuviper, @danielhenrymantilla, and @DanielKeep for invaluable feedback and input on this design.
Motivation
There are a number of wrapper types - both in zerocopy and in the standard library - whose length and field positions are identical to those of a single, wrapped type, but which modify other aspects of the memory of that type in some way. Examples include:
- Standard library
  - MaybeUninit<T> - layout is identical to T, but any sequence of bytes (including uninitialized bytes) is a valid instance
  - UnsafeCell<T> - layout and bit validity are identical to T, but permits aliased mutation through a shared reference (&)
  - Cell<T> - like UnsafeCell<T>, but limits the allowed mutations to those that can be guaranteed to be sound
- Zerocopy
  - Unalign<T> - identical to T, except with alignment 1
  - ByteArray<T> - identical to T, except with alignment 1, and any sequence of bytes (not including uninitialized bytes) is a valid instance
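The layout claims for the standard library wrappers can be spot-checked with a small sketch. It compares only size and alignment (not bit validity), uses (u8, u16) as an arbitrary example type, and the helper names same_layout and std_wrappers_match_layout are hypothetical.

```rust
use core::cell::{Cell, UnsafeCell};
use core::mem::{align_of, size_of, MaybeUninit};

/// Returns true if `A` and `B` have the same size and alignment.
/// Hypothetical helper: it says nothing about bit validity or aliasing rules.
pub fn same_layout<A, B>() -> bool {
    size_of::<A>() == size_of::<B>() && align_of::<A>() == align_of::<B>()
}

/// Checks the standard library wrappers from the list above against an
/// arbitrary example type.
pub fn std_wrappers_match_layout() -> bool {
    type T = (u8, u16);
    same_layout::<MaybeUninit<T>, T>()
        && same_layout::<UnsafeCell<T>, T>()
        && same_layout::<Cell<T>, T>()
}
```

The zerocopy wrappers are left out of the sketch because they change alignment, so an equal-alignment check would not apply to them.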
Our most important uses of field projection are:
- Field projection inside of the MaybeValid<T> type required by our TryFromBytes design in order to support deriving the TryFromBytes trait on users' types.
- Field projection inside of the Unalign<T> type, which is exposed to users of zerocopy.
Most of the complexity of field projection is not specific to the wrapper type. Thus, by solving field projection in the general case, we avoid having to solve it multiple times for different types (e.g. for MaybeValid and Unalign), and it gives us one central location to encapsulate all of the complexity. In other words, the primary motivations for this design are to support the MaybeValid and Unalign types, but field projection in other types comes for free.
API
User
A user who wishes to perform field projection does so using the project! macro, which can be invoked in a number of ways:
// In these examples, `C<T>` is the container type, and `F` is the field type.
// Supports immutable and mutable references.
let _: &C<F> = project!(&c.f);
let _: &mut C<F> = project!(&mut c.f);
// Supports arbitrary expressions to generate the container reference.
let ident = |x| x;
let _: &C<F> = project!(&(ident(&c)).f);
// Supports chained field accesses.
let _: &C<G> = project!(&c.f.g);
// Supports indexing operations for elements and slices (bounds-checked at runtime).
let _: &C<H> = project!(&c.f.g[0]);
let _: &C<[H]> = project!(&c.f.g[1..3]);
Container author
Container authors implement the Projectable trait:
use core::mem::MaybeUninit;
// If `MaybeUninit` supported unsized types, we could add `?Sized` bounds to `T` and `F`.
unsafe impl<T, F> Projectable<F, MaybeUninit<F>> for MaybeUninit<T> {
    type Inner = T;
}
Design
This design is being prototyped in the field-project branch.
The core of the design is captured in this code snippet, which is explained below:
/// A container which supports field projection of its contained type.
///
/// `F` is the type of a field which can be projected into, and `W` is the wrapped version
/// of that type; when projecting into a field of type `F`, the resulting value will be of type `W`,
/// which is presumed to be equal to the container type instantiated with `F`.
///
/// # Safety
///
/// If `P: Projectable<F, W>`, then the following must hold:
/// - Given `p: *const P` or `p: *mut P`, it is valid to perform `let i = p as
///   *const P::Inner` or `let i = p as *mut P::Inner`. The size of the
///   referents of `p` and `i` must be identical (e.g. as reported by
///   `size_of_val_raw`).
/// - If the following hold:
///   - `p: &P` or `p: &mut P`.
///   - Given an `i: P::Inner` of size `size_of_val(p)`, there exists an `F` at
///     byte range `f` within `i`.
///
///   ...then it is sound to materialize a `&W` or `&mut W` which points to range
///   `f` within `p`.
///
/// Note that this definition holds regardless of whether `P`, `P::Inner`, or
/// `F` are sized or unsized.
pub unsafe trait Projectable<F: ?Sized, W: ?Sized> {
    /// The inner type.
    type Inner: ?Sized;
}

/// Performs field projection on `outer`, projecting into the field of type `F`
/// at the address provided by `inner_to_field`.
///
/// `outer_to_inner` and `field_to_wrapped_field` each perform only a
/// raw pointer cast. These can't be performed inside of `project` because,
/// in that context, the pointer types are generic, and so Rust doesn't know
/// that fat pointer conversions are guaranteed to be valid (e.g., that we're
/// never converting from a thin pointer to a fat pointer or between incompatible
/// fat pointer types). In the context of `project!`, these types are concrete,
/// and so this isn't an issue.
#[doc(hidden)]
#[inline(always)]
pub fn project<P, F, W, OuterToInner, InnerToField, FieldToWrappedField>(
    _unsafe: unsafe_token::UnsafeToken,
    outer: &P,
    outer_to_inner: OuterToInner,
    inner_to_field: InnerToField,
    field_to_wrapped_field: FieldToWrappedField,
) -> &W
where
    P: Projectable<F, W> + ?Sized,
    // NOTE: This bound will be unnecessary once `Unalign` is removed and
    // we support unsized types.
    P::Inner: Sized,
    F: ?Sized,
    W: ?Sized,
    OuterToInner: Fn(*const P) -> *const Unalign<P::Inner>,
    InnerToField: Fn(*const Unalign<P::Inner>) -> *const F,
    FieldToWrappedField: Fn(*const F) -> *const W,
{
    let outer: *const P = outer;
    let inner = outer_to_inner(outer);
    let field = inner_to_field(inner);
    let wrapped_field = field_to_wrapped_field(field);
    unsafe { &*wrapped_field }
}

// `project_mut`, which is the mutable equivalent of `project`, is omitted for brevity

/// Performs field projection.
///
/// Given a wrapper, `w: W<T>`, and a field type in `T`, `f: F`,
/// `project!(&w.f)` returns a reference to a `W<F>`. Any "place expression"
/// on `w` is supported (`w.a.b.c`, `w[0]`, `w[3..5]`, etc). Mutable references
/// are also supported.
///
/// # Safety
///
/// It is unsound to project using a sequence of accesses that invoke
/// [`Deref::deref`] or [`DerefMut::deref_mut`].
#[macro_export]
macro_rules! project {
    // Mutable versions of these matches are omitted for brevity.
    (&$c:ident $($f:tt)*) => {
        $crate::project!(&($c) $($f)*)
    };
    (&($c:expr) $($f:tt)*) => {{
        // This function does nothing, but is unsafe to call, and so has the
        // effect of requiring that the caller only invoke `project!` inside of
        // an `unsafe` block.
        $crate::project::promise_no_deref();
        // We generate an `UnsafeToken` so that `project` can itself be
        // safe, and thus we don't need to wrap the entire call to `project`
        // in `unsafe { ... }`. This, in turn, is done so that the
        // meta-variables `$c` and `$($f)*` are not expanded inside of an
        // `unsafe { ... }`, which would allow safe Rust code to smuggle in
        // unsafe code via a call to `project!` without needing to write the
        // `unsafe` keyword.
        //
        // Note that this doesn't currently provide any benefits - the user
        // still has to write `unsafe` thanks to the call to `promise_no_deref`
        // - but this ensures that a future change in which we are able to
        // make this macro safe will have as small a diff as possible.
        let token = unsafe { $crate::project::unsafe_token::UnsafeToken::new() };
        use ::core::borrow::Borrow as _;
        $crate::project(
            token,
            $c.borrow(),
            |outer| outer as *const _,
            |inner| if false {
                // This branch is never executed, but allows us to ensure that
                // `$($f)*` doesn't contain any unsafe code that isn't wrapped
                // in an `unsafe` block. If it does, then wrapping it in
                // `unsafe` - as we do in the `else` branch - would allow users
                // to write unsafe code without needing to write `unsafe`.
                //
                // The way we accomplish this is to generate a reference from
                // `inner` (which is a raw pointer). That allows us to extract
                // the unsafe operation of converting to a reference and wrap it
                // in `unsafe { ... }` on its own, while leaving the `$($f)*`
                // not wrapped in `unsafe { ... }`. Note that this is NOT sound
                // to execute in the general case, but that's okay because we're
                // in an `if false` branch. For example, if the wrapper type is
                // `#[repr(packed)]`, then `inner_ref` may not be validly
                // aligned, which is unsound.
                let inner_ref = unsafe { &*inner };
                ::core::ptr::addr_of!(inner_ref .0 $($f)*)
            } else {
                unsafe { ::core::ptr::addr_of!((*inner) .0 $($f)* ) }
            },
            |field| field as *const _,
        )
    }};
}

#[doc(hidden)]
#[inline(always)]
pub const unsafe fn promise_no_deref() {}

#[doc(hidden)]
#[repr(packed)]
pub struct Unalign<T>(pub T);

#[doc(hidden)]
pub mod unsafe_token {
    /// A token used to prove that the `unsafe` keyword has been written
    /// somewhere.
    pub struct UnsafeToken(());

    impl UnsafeToken {
        /// Constructs a new `UnsafeToken`.
        ///
        /// # Safety
        ///
        /// The caller is responsible for ensuring that they uphold the safety
        /// invariants of any APIs which consume this token.
        pub unsafe fn new() -> UnsafeToken {
            UnsafeToken(())
        }
    }
}
In order to explain this design, consider the following hypothetical wrapper type:
#[repr(transparent)]
struct Wrapper<T: ?Sized>(T);
unsafe impl<T: ?Sized, F: ?Sized> Projectable<F, Wrapper<F>> for Wrapper<T> {
    type Inner = T;
}
Projectable and projection validity
Projectable can only be implemented by types for which projection is valid. In particular, it must be the case that, if a memory region contains a Wrapper<T>, and T contains a field of type F at a particular offset, it is valid to treat the bytes at the field offset as a Wrapper<F> instead of an F. Roughly speaking, this means that Wrapper must be #[repr(transparent)] or some equivalent repr (like #[repr(C, packed)]). Examples of types which do not support projection are:
- PhantomData<T> - a zero-sized type regardless of T; any byte offset other than 0 is out-of-bounds (see "Potential future improvements" for a caveat)
- #[repr(C)] struct Foo<T> { t: T, u: u8 } - performing field projection would cause the trailing u field to overlap with other parts of T
Thankfully, all of the wrapper types this design is aimed at support projection.
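As a rough illustration of what projection validity permits, the following sketch performs one projection by hand through a #[repr(transparent)] wrapper, without the project! macro. The Pair type and the project_b function are hypothetical, and the wrapper here is restricted to sized types for simplicity.

```rust
use core::ptr::addr_of;

#[repr(transparent)]
pub struct Wrapper<T>(pub T);

// Hypothetical example type; `#[repr(C)]` only makes the layout predictable.
#[repr(C)]
pub struct Pair {
    pub a: u8,
    pub b: u16,
}

/// Projects `&Wrapper<Pair>` into `&Wrapper<u16>` for the field `b`.
pub fn project_b(w: &Wrapper<Pair>) -> &Wrapper<u16> {
    // `Wrapper` is `#[repr(transparent)]`, so a pointer to it is also a
    // valid pointer to the wrapped `Pair`.
    let inner = w as *const Wrapper<Pair> as *const Pair;
    // SAFETY: `inner` comes from a live reference, so it is aligned and
    // in-bounds; `addr_of!` computes the field address without a load.
    let field: *const u16 = unsafe { addr_of!((*inner).b) };
    // SAFETY: `Wrapper<u16>` has the same layout as `u16`, and the field
    // lives as long as the borrow of `w`.
    unsafe { &*(field as *const Wrapper<u16>) }
}
```

The final cast is exactly the step that would be invalid for a type like the Foo<T> example above, whose trailing field would land on top of other bytes of T.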
Field offset computation
One of the core challenges of supporting field projection is to calculate the field's offset. For example, consider projecting &Wrapper<(u8, u16)> into &Wrapper<u16>. Given the byte offset of the u16 field within the (u8, u16) type, field projection is trivial - perform the appropriate pointer offset math, and then materialize a &Wrapper<u16> at the computed memory address. So how do we determine the field offset?
The standard library provides the addr_of! macro for this purpose. It operates on a "place" expression such as:
let tuple: *const (u8, u16) = ...;
let elem: *const u16 = unsafe { addr_of!((*tuple).1) };
Thus, given a &Wrapper<(u8, u16)>, we can convert it to a *const (u8, u16) and then use addr_of! to compute the address at which we should materialize our &Wrapper<u16>.
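To make the offset computation concrete, here is a minimal sketch that recovers a field's byte offset from addr_of!. It uses a hypothetical #[repr(C)] struct rather than a tuple so that the offset is predictable; offset_of_b is likewise a made-up name.

```rust
use core::ptr::addr_of;

// Hypothetical example type. With `#[repr(C)]`, `a` is at offset 0 and
// `b` is placed at the next offset that satisfies `u16`'s alignment.
#[repr(C)]
pub struct Pair {
    pub a: u8,
    pub b: u16,
}

/// Returns the byte offset of `Pair::b`, measured on a live value.
pub fn offset_of_b(p: &Pair) -> usize {
    let base = p as *const Pair;
    // SAFETY: `base` comes from a reference, so it is aligned and in-bounds.
    // `addr_of!` only computes an address; it does not read the field.
    let field = unsafe { addr_of!((*base).b) };
    field as usize - base as usize
}
```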
The role of the inner_to_field argument to the project and project_mut functions is to encapsulate a call to addr_of! or addr_of_mut!. These calls themselves cannot happen inside of project or project_mut because the field access operation must operate on a concrete type, and all types are generic inside of project/project_mut. The inner_to_field argument allows the addr_of!/addr_of_mut! call to be synthesized inside of the project! macro, where the types are concrete.
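The split between a generic function and a concrete closure can be sketched in isolation. In this simplified, hypothetical version (with_field_ptr, Pair, and read_b are illustrative names, and no wrapper type is involved), the generic function cannot name any field, so the caller supplies the field access as a closure written where the type is concrete.

```rust
use core::ptr::addr_of;

// Hypothetical example type.
#[repr(C)]
pub struct Pair {
    pub a: u8,
    pub b: u16,
}

/// Generic over `P` and `F`: nothing in here can write `(*outer).b`, so the
/// field access is delegated to `inner_to_field`.
pub fn with_field_ptr<P, F, G>(outer: &P, inner_to_field: G) -> *const F
where
    G: Fn(*const P) -> *const F,
{
    inner_to_field(outer as *const P)
}

/// Concrete call site: `P = Pair` is known here, so `addr_of!` can name `b`.
pub fn read_b(p: &Pair) -> u16 {
    // SAFETY (closure): `inner` is derived from the reference `p`.
    let field = with_field_ptr(p, |inner| unsafe { addr_of!((*inner).b) });
    // SAFETY: `field` points at `p.b`, which is initialized and borrowed.
    unsafe { *field }
}
```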
Alignment
Currently, the addr_of! macro requires that any dereference in its argument - even one that doesn't result in a load from memory - operates on a properly-aligned pointer. This means that the following naive implementation would be unsound:
let u = Unalign::new((0u8, 0u16));
let u_ptr: *const Unalign<(u8, u16)> = &u;
let inner_ptr = u_ptr as *const (u8, u16);
let u16_ptr = addr_of!((*inner_ptr).1);
Since inner_ptr may not be properly aligned as required by (u8, u16), the *inner_ptr in addr_of! may be unsound.
In order to avoid this problem, we don't operate directly on the Projectable::Inner type. Instead of converting a *const P to a *const P::Inner, we convert it to a *const Unalign<P::Inner> (where Unalign is a #[doc(hidden)] type used only by project!). Then, instead of emitting an addr_of! call like this (which works if inner: *const P::Inner):
addr_of!((*inner) $($f)* )
...we insert a .0 since inner is actually *const Unalign<P::Inner>:
addr_of!((*inner).0 $($f)* )
This solves the alignment problem, but has the unfortunate side effect of preventing us from supporting projection from unsized types. The reason is that Unalign is repr(packed), and repr(packed) types cannot be unsized. Luckily, the behavior of addr_of! may soon change in rust-lang/reference#1387 to permit dereferencing unaligned pointers. If that happens, we can remove the Unalign trick entirely and support unsized projection (this would also depend upon Miri being taught that unaligned dereferences are sound, which is implemented in rust-lang/rust#114330).
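The packed-wrapper trick can be sketched on its own. This is a simplified illustration rather than the macro's actual expansion: Pair and field_b are hypothetical names, and Unalign is redeclared locally so the sketch is self-contained.

```rust
use core::ptr::addr_of;

// Hypothetical example type with an alignment greater than 1.
#[repr(C)]
pub struct Pair {
    pub a: u8,
    pub b: u16,
}

// Local stand-in for the hidden `Unalign` type: packed, so alignment 1.
#[repr(packed)]
pub struct Unalign<T>(pub T);

/// Computes the address of `b` through the packed wrapper.
///
/// # Safety
///
/// `p` must point to `size_of::<Pair>()` in-bounds bytes. No alignment is
/// required, because `Unalign<Pair>` has alignment 1.
pub unsafe fn field_b(p: *const Unalign<Pair>) -> *const u16 {
    // The `.0` step goes through the packed wrapper, so the dereferenced
    // place `*p` never needs `Pair`'s alignment. Nothing is loaded.
    unsafe { addr_of!((*p).0.b) }
}
```

The returned pointer may itself be misaligned for u16, so a caller that needs the value would read it with read_unaligned (or, as in this design, hand back a wrapper type that tolerates the misalignment).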
Projectable's safety invariants
The Projectable trait's safety invariants might seem surprisingly complex to someone familiar with unsafe Rust. See this document for an explanation of why supporting unsized types introduces subtlety that necessitates such complex safety invariants.
Preventing field projection through references or other Deref/DerefMut fields
Currently, project! is unsafe to call. In order to see why, consider the following invocation:
struct Foo {
    a: u8,
    b: u16,
}
let m0: MaybeUninit<Foo> = ...;
let b0: &MaybeUninit<u16> = project!(&m0.b);
let m1: MaybeUninit<&Foo> = ...;
let b1: &MaybeUninit<u16> = project!(&m1.b);
In the first code block, we perform a normal field projection into the type Foo, which works as expected, and is sound. While the contained Foo might be uninitialized, we at least know its layout and field offsets. Thus, based on where m0 lives in memory, we can compute the address of the contained Foo::b field. Since we return it as &MaybeUninit<u16>, we're not exposing any uninitialized memory.
In the second code block, we perform a field projection into the type &Foo. This is unsound. Unlike in the first code block, Foo::b does not live in m1, but instead lives at some memory address which is referred to by m1. Since the contents of m1 might be uninitialized, the address that we need to dereference might be uninitialized. Thus, it's possible that the &MaybeUninit<u16> returned from project! might itself point anywhere in memory (and semantically, the address itself - not its referent - might be uninitialized).
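The hidden load can be observed without any wrapper type. In this sketch (hypothetical names, and fully initialized data so that it is sound to run), the place expression passed to addr_of! auto-dereferences the reference, so the resulting address points into the referent rather than into the container that holds the reference.

```rust
use core::ptr::addr_of;

// Hypothetical example type.
#[repr(C)]
pub struct Foo {
    pub a: u8,
    pub b: u16,
}

/// Computes "the address of `b`" starting from a pointer to a `&Foo`.
///
/// # Safety
///
/// `p` must point to an initialized, valid `&Foo`.
pub unsafe fn addr_through_ref(p: *const &Foo) -> *const u16 {
    // `(*p).b` auto-derefs: it LOADS the reference stored at `p`, then
    // offsets from the loaded address. If `*p` were uninitialized, that
    // load would be undefined behavior.
    unsafe { addr_of!((*p).b) }
}
```

In the MaybeUninit<&Foo> case from the text, the precondition in the safety comment is exactly what cannot be guaranteed.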
Thus, we need to prevent field projection through references like &Foo. References at the top level aren't the only issue - we also need to prevent field projection through references which are nested arbitrarily deep within another type such as (u8, &Foo), etc. Even projection through non-reference fields which implement Deref/DerefMut (e.g., Box) needs to be prevented.
Unfortunately, there doesn't seem to be any good way to statically forbid such projections without significantly worsening the usability of project!, for example by requiring the caller to provide a full copy of the definition of the type at the call site.
For this reason, we've decided to simply make the project! macro unsafe, and put the onus on the caller to avoid projection through references.
Here are some other alternatives we considered and rejected:
- Modify project to take a FnOnce(P::Inner) -> F closure. This closure would never be called, but it would only compile if it was possible to move P::Inner by ownership and extract an owned F. This has a few problems:
  - There doesn't seem to be any way to perform destructuring without being able to name the types (e.g., you can't do let { a, b } = foo; you need to be able to name Foo { a, b })
  - You can't just do inner.a.b.c because that would work behind references or Deref/DerefMut if one of the types behind the reference is Copy
  - This approach precludes unsized projection because both P::Inner and F need to be moved by value
- Once it's stabilized, we could use the offset_of! macro. We can't use it while it's unstable, of course, but it also has other drawbacks: It currently doesn't support unsized types or array indexing, both of which our macro supports.
Solving this problem without requiring project! to be unsafe may require a change to the language such as the addition of an alternative to addr_of! which bans any memory loads in the expression that is its argument.
Why project/project_mut take an UnsafeToken instead of being unsafe fns
An earlier version of this design had project and project_mut as unsafe fns, resulting in macro code like (simplified for brevity):
unsafe {
    $crate::project(
        $c.borrow(),
        |outer| outer as *const _,
        |inner| ::core::ptr::addr_of!((*inner) .0 $($f)* ),
        |field| field as *const _,
    )
}
This has the effect of allowing the caller to pass unsafe code to project! (in $c or $f), and that code will silently be placed inside an unsafe block, permitting the caller to invoke unsafe code without needing to write unsafe themselves. The current version of the design shadows this soundness hole by still requiring an unsafe block in order to call promise_no_deref; however, if a future version of the design were to statically prevent projection-through-deref and thus remove this call, the soundness hole would be uncovered.
An obvious solution to this problem is to simply move the macro variable expansion outside of the unsafe block by assigning to variables:
let c = $c.borrow();
let inner_to_field = |inner| ::core::ptr::addr_of!((*inner) .0 $($f)* );
unsafe {
    $crate::project(
        c,
        |outer| outer as *const _,
        inner_to_field,
        |field| field as *const _,
    )
}
Unfortunately, due to a quirk of rustc's type inference algorithm, this causes type inference to fail. The use of UnsafeToken allows us to avoid both the type inference issue and the unsafe-code-smuggling issue by passing the inner_to_field closure inline and not needing to place it inside an unsafe block.
Credit to @SkiFire13 for pointing out the unsafe-code-smuggling issue and for suggesting the UnsafeToken solution.
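The token pattern can be shown outside the macro. In this simplified sketch, read_via and second_element are hypothetical names: the function that consumes the token is safe to call, yet a caller can only obtain a token by writing unsafe, and the closure argument is not itself wrapped in an unsafe block.

```rust
pub mod unsafe_token {
    /// Proof that the caller wrote `unsafe` somewhere.
    pub struct UnsafeToken(());

    impl UnsafeToken {
        /// # Safety
        ///
        /// The caller must uphold the invariants of whichever API
        /// consumes this token.
        pub unsafe fn new() -> UnsafeToken {
            UnsafeToken(())
        }
    }
}

/// Safe to call, but uncallable without a token. The contract tied to the
/// token here (an assumption of this sketch): `step(ptr)` must be valid to read.
pub fn read_via<T: Copy>(
    _token: unsafe_token::UnsafeToken,
    ptr: *const T,
    step: impl Fn(*const T) -> *const T,
) -> T {
    unsafe { *step(ptr) }
}

pub fn second_element() -> u8 {
    let data = [10u8, 20, 30];
    // Only the token construction sits inside `unsafe { ... }`.
    let token = unsafe { unsafe_token::UnsafeToken::new() };
    // The closure is passed inline, outside of any `unsafe` block, so any
    // unsafe operation written inside it would need its own `unsafe`.
    read_via(token, data.as_ptr(), |p| p.wrapping_add(1))
}
```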
Why does Projectable have type parameters?
Naively, we might expect Projectable to be defined like this:
pub unsafe trait Projectable {
    type Inner: ?Sized;
    type Wrapped<T: ?Sized>: ?Sized;
}
In other words, since we have GATs, we shouldn't need to have Projectable be parametric over every pair of field type and wrapped-version-of-that-field type (i.e., F and W). We should be able to use a Wrapped GAT to express this.
However, this doesn't play nicely with wrapper types that don't support wrapping unsized types. Consider what would happen if we tried to implement Projectable as defined here for MaybeUninit, which doesn't support unsized types:
unsafe impl<T> Projectable for MaybeUninit<T> {
    type Inner = T;
    // The GAT parameter is named `F` here so that it doesn't clash with the impl's `T`.
    type Wrapped<F: ?Sized> = MaybeUninit<F>; // ERROR: MaybeUninit requires that `F: Sized`, which isn't guaranteed
}
Alternatively, we could remove the ?Sized bound on the GAT's type parameter, but then we wouldn't be able to support projection into unsized fields. By contrast, placing F and W in the definition of Projectable allows types to support or not support unsized types as desired:
unsafe impl<T, F> Projectable<F, MaybeUninit<F>> for MaybeUninit<T> { ... }
unsafe impl<T: ?Sized, F: ?Sized> Projectable<F, UnsizedWrapper<F>> for UnsizedWrapper<T> { ... }
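A compile-time check makes the difference between the two impls visible. This sketch redeclares the trait locally; UnsizedWrapper, Tail, assert_projectable, and check_impls are hypothetical names used only for illustration.

```rust
use core::mem::MaybeUninit;

/// Local copy of the trait shape from this design.
pub unsafe trait Projectable<F: ?Sized, W: ?Sized> {
    type Inner: ?Sized;
}

// Sized-only: `MaybeUninit` cannot wrap unsized types.
unsafe impl<T, F> Projectable<F, MaybeUninit<F>> for MaybeUninit<T> {
    type Inner = T;
}

// Hypothetical wrapper that does support unsized types.
#[repr(transparent)]
pub struct UnsizedWrapper<T: ?Sized>(pub T);

unsafe impl<T: ?Sized, F: ?Sized> Projectable<F, UnsizedWrapper<F>> for UnsizedWrapper<T> {
    type Inner = T;
}

// Hypothetical unsized type with a slice as its last field.
pub struct Tail {
    pub len: u8,
    pub bytes: [u8],
}

/// Compiles only if `P: Projectable<F, W>`.
pub fn assert_projectable<P, F, W>()
where
    P: Projectable<F, W> + ?Sized,
    F: ?Sized,
    W: ?Sized,
{
}

/// The real check happens at compile time; the return value only lets a
/// test confirm that this function was reached.
pub fn check_impls() -> bool {
    assert_projectable::<MaybeUninit<(u8, u16)>, u16, MaybeUninit<u16>>();
    assert_projectable::<UnsizedWrapper<Tail>, [u8], UnsizedWrapper<[u8]>>();
    true
}
```

Swapping in MaybeUninit<[u8]> as W would fail to compile, which is the sized-only restriction the type parameters are meant to express.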
Does this design support projection into an UnsafeCell?
We might expect that it'd be valid to support projection into an UnsafeCell:
let cell = UnsafeCell::new((0u8, 1u16));
let one: &UnsafeCell<u16> = project!(&cell.1);
This design could easily support such projection by implementing Projectable for UnsafeCell (and wrapper types like Cell).
As of this writing, it seems that the intention is for this to be sound, but that it's not currently guaranteed. In fact, there was very recently a bug in the standard library that rendered this unsound in practice (it's been fixed).
Thus, while it will likely be officially sound at some point, we need to wait until that decision is made before we can implement Projectable for UnsafeCell and support this.
Miscellaneous
A few other aspects of this design are worth calling out:
- The project and project_mut functions aren't strictly necessary - all of their logic could be inlined inside of project!. However, in a macro context, there's no way to name the concrete types being operated on, and similarly no way to name bounds on those types. There are ways to hack around these limitations to generate code that will only compile in the right circumstances, but it's unwieldy and error-prone. The project and project_mut functions provide a place to encode these types and bounds so that the project! macro can be simpler.
- It would be reasonable to assume that the Projectable::Inner type is unnecessary - if the Projectable type is #[repr(transparent)], then it doesn't matter whether the field offset is calculated from a pointer of type Self or of type Self::Inner. However, the call to addr_of! must happen in a context in which the inner type is known and concrete, which in turn requires that the inner_to_field argument to project/project_mut takes a P::Inner pointer as its argument. The only way to avoid this would be to get rid of project/project_mut entirely, and instead put all logic inside of project!, which is undesirable for the reasons discussed above.
Performance
Testing on Godbolt confirms that generated code is well-optimized (-C opt-level=3):
pub fn project_maybe_uninit_tuple(m: &MaybeUninit<(u8, u16)>) -> &MaybeUninit<u16> {
    unsafe { project!(&m.1) }
}

example::project_maybe_uninit_tuple:
        lea     rax, [rdi + 2]
        ret
Open Questions
- How does this interact with enums/unions?
- Currently, project! uses $c.borrow() to make it so that it works with both borrowed and owned containers. Is there a way to do this that wouldn't conflict if a type had an inherent method called borrow (i.e., make sure Borrow::borrow is always used)? One possibility would be to add our own extension trait that is implemented for all T: Borrow with a method whose name is much less likely to conflict, like __project_borrow.
- What to do about dyn Traits? It is legal to as cast dyn Trait fat pointers, so it's not clear how the safety requirements on Projectable apply to them. It's probably fine in practice because you can't access the fields of a dyn Trait object, and so you couldn't generate a valid call to project!. That said, what about something like MaybeUninit<(u8, dyn Foo)>? What would happen if you tried to project into .1?
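The extension-trait idea from the second question might look roughly like the following. This is an untested design sketch, not part of the prototype: the trait name ProjectBorrow and the demo types are hypothetical, and only the method name __project_borrow comes from the text above.

```rust
use core::borrow::Borrow;

/// Hypothetical extension trait whose method name is unlikely to collide
/// with an inherent method on a user's type.
pub trait ProjectBorrow<B: ?Sized> {
    fn __project_borrow(&self) -> &B;
}

impl<T: Borrow<B> + ?Sized, B: ?Sized> ProjectBorrow<B> for T {
    fn __project_borrow(&self) -> &B {
        // Fully qualified, so an inherent `borrow` can never be selected.
        <T as Borrow<B>>::borrow(self)
    }
}

/// A container with an inherent `borrow` that would shadow `Borrow::borrow`
/// under ordinary method-call syntax.
pub struct HasInherentBorrow(pub u32);

impl HasInherentBorrow {
    pub fn borrow(&self) -> &'static str {
        "inherent"
    }
}

pub fn borrow_despite_inherent_method() -> u32 {
    let c = HasInherentBorrow(7);
    // `c.borrow()` would pick the inherent method and return a `&str`;
    // the extension method always goes through the `Borrow` trait.
    let r: &HasInherentBorrow = c.__project_borrow();
    r.0
}
```

Like Borrow::borrow itself, the call relies on the expected type to pick the target of the borrow, so a macro using it would still need the surrounding types to be inferable.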
Potential future improvements
- Could we support mutable projection into multiple non-overlapping fields at once? Rust lets you do this natively because it can reason about fields not overlapping, but our current design doesn't allow it.
- Can PhantomData<T> be Projectable? Since PhantomData is a ZST, maybe it's fine to materialize references to it which are technically out-of-bounds?
- Could we support projection using methods? E.g., imagine that I have an opaque type and instead of being able to access a field, f, directly, I can call a get_f(&self) -> &F method.
- We might be able to support unsized projection in a generic context if we manually transmute fat pointers using transmute or a union (the latter might be necessary if the type system doesn't know that both pointers are fat and so have the same size) instead of an as cast.
- It might also be in scope to support arbitrary transposition (e.g., W<[T]> -> [W<T>], W<[T; N]> -> [W<T>; N], etc).
Alternatives
- Could we just implement field projection directly where it's needed (namely, for the MaybeValid type as part of the implementation of TryFromBytes, and for the Unalign type) rather than supporting generic field projection? We could, but most of the complexity in this design is inherent to field projection rather than specific to the wrapper type. We would have to solve the same problems twice - once for each type - and again for any future type we wanted to support. Supporting generic field projection is not significantly more complicated, cuts our work roughly in half, and allows us to encapsulate all of the complexity of field projection in a single place.