Simple Arc implementation (without Weak refs) #253

Merged · 6 commits · Jan 18, 2021
6 changes: 6 additions & 0 deletions src/SUMMARY.md
@@ -55,6 +55,12 @@
* [Handling Zero-Sized Types](vec-zsts.md)
* [Final Code](vec-final.md)
* [Implementing Arc and Mutex](arc-and-mutex.md)
* [Arc](arc.md)
* [Layout](arc-layout.md)
* [Base Code](arc-base.md)
* [Cloning](arc-clone.md)
* [Dropping](arc-drop.md)
* [Final Code](arc-final.md)
* [FFI](ffi.md)
* [Beneath `std`](beneath-std.md)
* [#[panic_handler]](panic-handler.md)
2 changes: 1 addition & 1 deletion src/arc-and-mutex.md
@@ -4,4 +4,4 @@ Knowing the theory is all fine and good, but the *best* way to understand
something is to use it. To better understand atomics and interior mutability,
we'll be implementing versions of the standard library's Arc and Mutex types.

TODO: ALL OF THIS OMG
TODO: Mutex
136 changes: 136 additions & 0 deletions src/arc-base.md
@@ -0,0 +1,136 @@
# Base Code

Now that we've decided the layout for our implementation of `Arc`, let's create
some basic code.
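As a reminder, and as an assumption for the snippets below, the layout we settled on in the previous section looks roughly like this (field names match the code in this chapter; the `main` is only there to exercise the types):

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;
use std::sync::atomic::AtomicUsize;

pub struct Arc<T> {
    ptr: NonNull<ArcInner<T>>,
    // Tells the compiler we act as if we own an ArcInner<T>.
    phantom: PhantomData<ArcInner<T>>,
}

pub struct ArcInner<T> {
    rc: AtomicUsize, // the atomic reference count
    data: T,         // the shared value itself
}

fn main() {
    // Exercise the layout: one reference, holding a u32.
    let inner = ArcInner {
        rc: AtomicUsize::new(1),
        data: 42u32,
    };
    assert_eq!(inner.data, 42);
}
```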

## Constructing the Arc

We'll first need a way to construct an `Arc<T>`.

This is pretty simple, as we just need to box the `ArcInner<T>` and get a
`NonNull<ArcInner<T>>` pointer to it.

```rust,ignore
impl<T> Arc<T> {
pub fn new(data: T) -> Arc<T> {
// We start the reference count at 1, as that first reference is the
// current pointer.
let boxed = Box::new(ArcInner {
rc: AtomicUsize::new(1),
data,
});
Arc {
// It is okay to call `.unwrap()` here as we get a pointer from
// `Box::into_raw` which is guaranteed to not be null.
ptr: NonNull::new(Box::into_raw(boxed)).unwrap(),
phantom: PhantomData,
}
}
}
```

## Send and Sync

Since we're building a concurrency primitive, we'll need to be able to send it
across threads. Thus, we can implement the `Send` and `Sync` marker traits. For
more information on these, see [the section on `Send` and
`Sync`](send-and-sync.md).

This is okay because:
* You can get a mutable reference to the value inside an `Arc` only if it is
  the only `Arc` referencing that data (which only happens in `Drop`)
* We use atomics for the shared mutable reference counting

```rust,ignore
unsafe impl<T: Sync + Send> Send for Arc<T> {}
unsafe impl<T: Sync + Send> Sync for Arc<T> {}
```

We need to have the bound `T: Sync + Send` because if we did not provide those
bounds, it would be possible to share values that are thread-unsafe across a
thread boundary via an `Arc`, which could possibly cause data races or
unsoundness.

For example, if those bounds were not present, `Arc<Rc<u32>>` would be `Sync` and
`Send`, meaning that you could clone the `Rc` out of the `Arc` to send it across
a thread (without creating an entirely new `Rc`), which would create data races
because `Rc`'s reference-count updates are not atomic.

## Getting the `ArcInner`

To dereference the `NonNull<ArcInner<T>>` pointer into a `&ArcInner<T>`, we can
call `NonNull::as_ref`. This is unsafe, unlike a typical `as_ref` method, so we
must call it like this:
```rust,ignore
unsafe { self.ptr.as_ref() }
```

We'll be using this snippet a few times in this code (usually with an associated
`let` binding).

This unsafety is okay because while this `Arc` is alive, we're guaranteed that
the inner pointer is valid.
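As a standalone sketch of this pattern (separate from the Arc code itself), here is the `Box::into_raw`/`NonNull::new`/`as_ref` round trip in miniature:

```rust
use std::ptr::NonNull;

fn main() {
    // Box the value and leak it into a raw pointer; we now own the allocation.
    let raw = Box::into_raw(Box::new(42));
    // `Box::into_raw` never returns null, so `unwrap` cannot fail.
    let ptr = NonNull::new(raw).unwrap();
    // Safe here: the allocation is still live and nothing else mutates it.
    let r: &i32 = unsafe { ptr.as_ref() };
    assert_eq!(*r, 42);
    // Reconstitute the Box so the allocation is freed, not leaked.
    unsafe { drop(Box::from_raw(ptr.as_ptr())) };
}
```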

## Deref

Alright. Now we can make `Arc`s (and soon will be able to clone and destroy them correctly), but how do we get
to the data inside?

What we need now is an implementation of `Deref`.

We'll need to import the trait:
```rust,ignore
use std::ops::Deref;
```

And here's the implementation:
```rust,ignore
impl<T> Deref for Arc<T> {
type Target = T;

fn deref(&self) -> &T {
let inner = unsafe { self.ptr.as_ref() };
&inner.data
}
}
```

Pretty simple, eh? This dereferences the `NonNull` pointer to the
`ArcInner<T>`, then takes a reference to the data inside.

## Code

Here's all the code from this section:
```rust,ignore
use std::ops::Deref;

impl<T> Arc<T> {
pub fn new(data: T) -> Arc<T> {
// We start the reference count at 1, as that first reference is the
// current pointer.
let boxed = Box::new(ArcInner {
rc: AtomicUsize::new(1),
data,
});
Arc {
// It is okay to call `.unwrap()` here as we get a pointer from
// `Box::into_raw` which is guaranteed to not be null.
ptr: NonNull::new(Box::into_raw(boxed)).unwrap(),
phantom: PhantomData,
}
}
}

unsafe impl<T: Sync + Send> Send for Arc<T> {}
unsafe impl<T: Sync + Send> Sync for Arc<T> {}


impl<T> Deref for Arc<T> {
type Target = T;

fn deref(&self) -> &T {
let inner = unsafe { self.ptr.as_ref() };
&inner.data
}
}
```
94 changes: 94 additions & 0 deletions src/arc-clone.md
@@ -0,0 +1,94 @@
# Cloning

Now that we've got some basic code set up, we'll need a way to clone the `Arc`.

Basically, we need to:
1. Increment the atomic reference count
2. Construct a new instance of the `Arc` from the inner pointer

First, we need to get access to the `ArcInner`:
```rust,ignore
let inner = unsafe { self.ptr.as_ref() };
```

We can update the atomic reference count as follows:
```rust,ignore
let old_rc = inner.rc.fetch_add(1, Ordering::???);
```

But what ordering should we use here? We don't really have any code that will
need atomic synchronization when cloning, as we do not modify the internal value
while cloning. Thus, we can use a Relaxed ordering here, which establishes no
happens-before relationship but is still atomic. When `Drop`ping the Arc, however,
we'll need to atomically synchronize when decrementing the reference count. This
is described more in [the section on the `Drop` implementation for
`Arc`](arc-drop.md). For more information on atomic relationships and Relaxed
ordering, see [the section on atomics](atomics.md).

Thus, the code becomes this:
```rust,ignore
let old_rc = inner.rc.fetch_add(1, Ordering::Relaxed);
```
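As a quick standalone sanity check (using the standard library's `Arc` only to share the counter between threads), Relaxed read-modify-write operations are still atomic, so concurrent increments are never lost:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    // Start at 1, mirroring a freshly created reference count.
    let rc = Arc::new(AtomicUsize::new(1));
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let rc = Arc::clone(&rc);
            thread::spawn(move || {
                for _ in 0..1000 {
                    // Relaxed is enough: each fetch_add is a single atomic
                    // read-modify-write, so no increments are lost.
                    rc.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(rc.load(Ordering::Relaxed), 1 + 8 * 1000);
}
```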

We'll need to add another import to use `Ordering`:
```rust,ignore
use std::sync::atomic::Ordering;
```

However, we have one problem with this implementation right now. What if someone
decides to `mem::forget` a bunch of Arcs? The code we have written so far (and
will write) assumes that the reference count accurately portrays how many Arcs
are in memory, but with `mem::forget` this is false. Thus, when more and more
Arcs are cloned from this one without them being `Drop`ped and the reference
count being decremented, the count can overflow and wrap around to a small
value! A later `Drop` would then free the data while other Arcs still point to
it, causing use-after-free, which is **INCREDIBLY BAD!**

To handle this, we need to check that the reference count does not go over some
arbitrary value (below `usize::MAX`, as we're storing the reference count as an
`AtomicUsize`), and do *something*.

The standard library's implementation simply aborts the program if the
reference count reaches `isize::MAX` (about half of `usize::MAX`) on any
thread, on the assumption that there are probably not about 2 billion threads
(or about **9 quintillion** on some 64-bit machines) incrementing the reference
count at once. This case is incredibly unlikely in normal code, and if it does
happen, the program is probably degenerate beyond saving, so aborting is
reasonable. This is what we'll do.

It's pretty simple to implement this behaviour:
```rust,ignore
if old_rc >= isize::MAX as usize {
std::process::abort();
}
```

Then, we need to return a new instance of the `Arc`:
```rust,ignore
Self {
ptr: self.ptr,
phantom: PhantomData
}
```

Now, let's wrap this all up inside the `Clone` implementation:
```rust,ignore
use std::sync::atomic::Ordering;

impl<T> Clone for Arc<T> {
fn clone(&self) -> Arc<T> {
let inner = unsafe { self.ptr.as_ref() };
        // Using a relaxed ordering is alright here, as we don't need any
        // atomic synchronization: we're not modifying or accessing the inner
        // data.
let old_rc = inner.rc.fetch_add(1, Ordering::Relaxed);

if old_rc >= isize::MAX as usize {
std::process::abort();
}

Self {
ptr: self.ptr,
phantom: PhantomData,
}
}
}
```
98 changes: 98 additions & 0 deletions src/arc-drop.md
@@ -0,0 +1,98 @@
# Dropping

We now need a way to decrease the reference count and drop the data once the
last reference goes away; otherwise the data will live on the heap forever.

To do this, we can implement `Drop`.

Basically, we need to:
1. Decrement the reference count
2. If we held the last reference to the data, then:
   1. Atomically fence to prevent reordering of the use and deletion of the
      data
   2. Drop the inner data

First, we'll need to get access to the `ArcInner`:
```rust,ignore
let inner = unsafe { self.ptr.as_ref() };
```

Now, we need to decrement the reference count. To streamline our code, we can
also return early if the value returned by `fetch_sub` (the reference count
before the decrement) is not equal to `1` (which happens when we are not the
last reference to the data).
```rust,ignore
if inner.rc.fetch_sub(1, Ordering::Release) != 1 {
return;
}
```
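A tiny standalone check of this early-return logic: `fetch_sub` returns the value the counter held *before* the subtraction, so seeing `1` means we just removed the last reference:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

fn main() {
    // Pretend two references are alive.
    let rc = AtomicUsize::new(2);
    // The first decrement sees the old value 2, so it is not the last one...
    assert_eq!(rc.fetch_sub(1, Ordering::Release), 2);
    // ...and the second sees 1, meaning it just dropped the last reference.
    assert_eq!(rc.fetch_sub(1, Ordering::Release), 1);
    assert_eq!(rc.load(Ordering::Relaxed), 0);
}
```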

We then need to create an atomic fence to prevent reordering of the use of the
data and deletion of the data. As described in [the standard library's
implementation of `Arc`][3]:
**Review comment (Contributor):** and change this to something like:
What atomic ordering should we use here? To know that, we need to consider what happens-before relationships we want to ensure (or alternatively, what data races we want to prevent). `Drop` is special because it's the one place where we mutate the `Arc`'s payload (by dropping and freeing it). This is a potential read-write data race with all the other threads that have been happily reading the payload without any synchronization.

So we need to ensure all those non-atomic accesses have a proper happens-before relationship with us dropping and freeing the payload. To establish happens-before relationships with non-atomic accesses, we need (at least) Acquire-Release semantics.

As a reminder, Acquires ensure non-atomic accesses after them on the same thread stay after them (they happen-before everything that comes after them) and Releases ensure non-atomic accesses before them on the same thread stay before them (everything before them happens-before them).

So we have many threads that look like this:

```text
(A) non-atomic accesses to payload
(B) atomic decrement refcount
```

And a "final" thread that looks like this:

```text
(C) non-atomic accesses to payload
(D) atomic decrement refcount
(E) non-atomic free/drop contents
```

And we want to ensure every thread agrees that everything happens-before E.

One thing that jumps out clearly is that the non-final threads all end with an atomic access (B), and we want to keep everything else (A) before it. That's exactly what a Release does! So it seems we'd like our atomic decrement to be a Release.

```rust,ignore
if self.inner().rc.fetch_sub(1, Ordering::Release) != 1 {
    return;
}
```

However this on its own doesn't work -- our final thread would also use a Release, and that means (E) would be allowed to happen-before (D)! To prevent this we need Release's partner, an Acquire! We could make (D) AcquireRelease (AcqRel), but this would penalize all the other threads performing (B). So instead, we will introduce a separate Acquire that only happens if we're the final thread. And since we've already loaded all the values we need, we can use a fence.

```rust,ignore
if self.inner().rc.fetch_sub(1, Ordering::Release) != 1 {
    return;
}
atomic::fence(Ordering::Acquire);
```

If this helps, you can think of this like a sort of implicit RWLock: every Arc is a ReadGuard which allows unlimited read access until they go away and "Release the lock" (trapping all accesses on that thread before that point). The final thread then upgrades itself to a WriteGuard which "Acquires the lock" (creating a new critical section which strictly happens after everything else).

**Review comment (Member):**

> However this on its own doesn't work -- our final thread would also use a Release, and that means (E) would be allowed to happen-before (D)!

This sounds like for X, Y we always have that one happens-before the other or vice versa... which is not the way happens-before actually works. Happens-before is a partial order, and some events simply are unordered. So I think it'd be better to phrase this accordingly, saying something like "[...] and that means (D) would not be forced to happen-before (E)". Except that's also wrong, program-order is included in happens-before. The actual issue is between (E) and (A), isn't it? We must make (A) happens-before (E), and that's why there needs to be an "acquire" in the "final" thread.

**Review comment (Member):**

> As a reminder, Acquires ensure non-atomic accesses after them on the same thread stay after them (they happen-before everything that comes after them) and Releases ensure non-atomic accesses before them on the same thread stay before them (everything before them happens-before them).

The key thing with an acquire is that the release it reads from (and everything that comes before it) happens-before everything that comes after the acquire. Each acquire is paired with a release, and this pair establishes a happens-before link across threads. Personally, I find this way of thinking about it easier than thinking about the release and the acquire separately. (Also see what I said above: to my knowledge, happens-before includes program-order, so "X happens-before everything that comes after it in the same thread" is true for all X.)

> This fence is needed to prevent reordering of use of the data and deletion of
> the data. Because it is marked `Release`, the decreasing of the reference
> count synchronizes with this `Acquire` fence. This means that use of the data
> happens before decreasing the reference count, which happens before this
> fence, which happens before the deletion of the data.
>
> As explained in the [Boost documentation][1],
>
> > It is important to enforce any possible access to the object in one
> > thread (through an existing reference) to *happen before* deleting
> > the object in a different thread. This is achieved by a "release"
> > operation after dropping a reference (any access to the object
> > through this reference must obviously happened before), and an
> > "acquire" operation before deleting the object.
>
> In particular, while the contents of an Arc are usually immutable, it's
> possible to have interior writes to something like a Mutex<T>. Since a Mutex
> is not acquired when it is deleted, we can't rely on its synchronization logic
> to make writes in thread A visible to a destructor running in thread B.
>
> Also note that the Acquire fence here could probably be replaced with an
> Acquire load, which could improve performance in highly-contended situations.
> See [2].
>
> [1]: https://www.boost.org/doc/libs/1_55_0/doc/html/atomic/usage_examples.html
> [2]: https://github.com/rust-lang/rust/pull/41714
[3]: https://github.com/rust-lang/rust/blob/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/alloc/src/sync.rs#L1440-L1467

To do so, we add the following fence:
```rust,ignore
atomic::fence(Ordering::Acquire);
```

We'll need to import `std::sync::atomic` itself:
```rust,ignore
use std::sync::atomic;
```

Finally, we can drop the data itself. We use `Box::from_raw` to drop the boxed
`ArcInner<T>` and its data. This takes a `*mut T` and not a `NonNull<T>`, so we
must convert using `NonNull::as_ptr`.

```rust,ignore
unsafe { drop(Box::from_raw(self.ptr.as_ptr())); }
```

This is safe as we know we have the last pointer to the `ArcInner` and that its
pointer is valid.
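To convince ourselves that `Box::from_raw` really runs the destructor, here is a standalone sketch (the `Payload` type is made up for illustration) that watches a `Drop` impl fire:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Flag flipped by Payload's destructor so we can observe the drop.
static DROPPED: AtomicBool = AtomicBool::new(false);

struct Payload;

impl Drop for Payload {
    fn drop(&mut self) {
        DROPPED.store(true, Ordering::Relaxed);
    }
}

fn main() {
    // Leak the box into a raw pointer: no destructor runs yet.
    let raw = Box::into_raw(Box::new(Payload));
    assert!(!DROPPED.load(Ordering::Relaxed));
    // Reconstructing the Box and letting it go out of scope frees the
    // allocation and runs Payload's destructor.
    unsafe { drop(Box::from_raw(raw)); }
    assert!(DROPPED.load(Ordering::Relaxed));
}
```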

Now, let's wrap this all up inside the `Drop` implementation:
```rust,ignore
impl<T> Drop for Arc<T> {
fn drop(&mut self) {
let inner = unsafe { self.ptr.as_ref() };
if inner.rc.fetch_sub(1, Ordering::Release) != 1 {
return;
}
// This fence is needed to prevent reordering of the use and deletion
// of the data.
atomic::fence(Ordering::Acquire);
// This is safe as we know we have the last pointer to the `ArcInner`
// and that its pointer is valid.
        unsafe { drop(Box::from_raw(self.ptr.as_ptr())); }
}
}
```
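Putting this `Drop` together with the `new`/`Deref` and `Clone` code from the neighboring sections (and the `Arc`/`ArcInner` struct definitions assumed from the layout section), a minimal end-to-end exercise of our `Arc` might look like this:

```rust
use std::marker::PhantomData;
use std::ops::Deref;
use std::ptr::NonNull;
use std::sync::atomic::{self, AtomicUsize, Ordering};
use std::thread;

pub struct Arc<T> {
    ptr: NonNull<ArcInner<T>>,
    phantom: PhantomData<ArcInner<T>>,
}

pub struct ArcInner<T> {
    rc: AtomicUsize,
    data: T,
}

unsafe impl<T: Sync + Send> Send for Arc<T> {}
unsafe impl<T: Sync + Send> Sync for Arc<T> {}

impl<T> Arc<T> {
    pub fn new(data: T) -> Arc<T> {
        let boxed = Box::new(ArcInner {
            rc: AtomicUsize::new(1),
            data,
        });
        Arc {
            // Box::into_raw never returns null.
            ptr: NonNull::new(Box::into_raw(boxed)).unwrap(),
            phantom: PhantomData,
        }
    }
}

impl<T> Deref for Arc<T> {
    type Target = T;
    fn deref(&self) -> &T {
        let inner = unsafe { self.ptr.as_ref() };
        &inner.data
    }
}

impl<T> Clone for Arc<T> {
    fn clone(&self) -> Arc<T> {
        let inner = unsafe { self.ptr.as_ref() };
        let old_rc = inner.rc.fetch_add(1, Ordering::Relaxed);
        // Guard against refcount overflow (e.g. via mem::forget).
        if old_rc >= isize::MAX as usize {
            std::process::abort();
        }
        Arc { ptr: self.ptr, phantom: PhantomData }
    }
}

impl<T> Drop for Arc<T> {
    fn drop(&mut self) {
        let inner = unsafe { self.ptr.as_ref() };
        if inner.rc.fetch_sub(1, Ordering::Release) != 1 {
            return;
        }
        // Synchronize with every earlier Release decrement before freeing.
        atomic::fence(Ordering::Acquire);
        unsafe { drop(Box::from_raw(self.ptr.as_ptr())); }
    }
}

fn main() {
    let a = Arc::new(vec![1, 2, 3]);
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let a = a.clone();
            thread::spawn(move || a.iter().sum::<i32>())
        })
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), 6);
    }
    assert_eq!(a.len(), 3);
}
```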