Simple Arc implementation (without Weak refs) #253

# Base Code

Now that we've decided the layout for our implementation of `Arc`, let's create
some basic code.

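As a reminder, here's roughly the layout we settled on in the previous section
(a sketch for reference; the field names match the code below):

```rust,ignore
use std::marker::PhantomData;
use std::ptr::NonNull;
use std::sync::atomic::AtomicUsize;

pub struct Arc<T> {
    ptr: NonNull<ArcInner<T>>,
    phantom: PhantomData<ArcInner<T>>,
}

pub struct ArcInner<T> {
    rc: AtomicUsize,
    data: T,
}
```
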
## Constructing the Arc

We'll first need a way to construct an `Arc<T>`.

This is pretty simple, as we just need to box the `ArcInner<T>` and get a
`NonNull<ArcInner<T>>` pointer to it.

```rust,ignore
impl<T> Arc<T> {
    pub fn new(data: T) -> Arc<T> {
        // We start the reference count at 1, as that first reference is the
        // current pointer.
        let boxed = Box::new(ArcInner {
            rc: AtomicUsize::new(1),
            data,
        });
        Arc {
            // It is okay to call `.unwrap()` here as we get a pointer from
            // `Box::into_raw` which is guaranteed to not be null.
            ptr: NonNull::new(Box::into_raw(boxed)).unwrap(),
            phantom: PhantomData,
        }
    }
}
```

## Send and Sync

Since we're building a concurrency primitive, we'll need to be able to send it
across threads. Thus, we can implement the `Send` and `Sync` marker traits. For
more information on these, see [the section on `Send` and
`Sync`](send-and-sync.md).

This is okay because:

* You can only get a mutable reference to the value inside an `Arc` if it is
  the only `Arc` referencing that data (which only happens in `Drop`)
* We use atomics for the shared mutable reference counting

```rust,ignore
unsafe impl<T: Sync + Send> Send for Arc<T> {}
unsafe impl<T: Sync + Send> Sync for Arc<T> {}
```

We need the bound `T: Sync + Send` because without it, it would be possible to
share values that are thread-unsafe across a thread boundary via an `Arc`,
which could cause data races or unsoundness.

For example, if those bounds were not present, `Arc<Rc<u32>>` would be `Sync` or
`Send`, meaning that you could clone the `Rc` out of the `Arc` to send it across
a thread (without creating an entirely new `Rc`), which would create data races
as `Rc` is not thread-safe.

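To make this concrete, here's a sketch (not part of the chapter's code) of the
kind of program the bounds rule out; with them in place, `Arc<Rc<u32>>` is
neither `Send` nor `Sync`, so the `spawn` below fails to compile:

```rust,ignore
use std::rc::Rc;

let shared = Arc::new(Rc::new(0u32));
// `Rc<u32>` is not thread-safe, so our `unsafe impl`s do not apply to
// `Arc<Rc<u32>>`, and moving it into another thread is rejected.
std::thread::spawn(move || {
    let _cloned = Rc::clone(&shared);
});
```
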
## Getting the `ArcInner`

To dereference the `NonNull<ArcInner<T>>` pointer into a `&ArcInner<T>`, we can
call `NonNull::as_ref`. This is unsafe, unlike the typical `as_ref` function, so
we must call it like this:

```rust,ignore
unsafe { self.ptr.as_ref() }
```

We'll be using this snippet a few times in this code (usually with an associated
`let` binding).

This unsafety is okay because while this `Arc` is alive, we're guaranteed that
the inner pointer is valid.

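If the repetition bothers you, this could be factored into a private helper
like the hypothetical `inner` method below; the chapter itself just inlines the
snippet:

```rust,ignore
impl<T> Arc<T> {
    // Hypothetical convenience helper, not part of the chapter's code.
    fn inner(&self) -> &ArcInner<T> {
        // SAFETY: while this `Arc` is alive, `self.ptr` points to a live
        // `ArcInner<T>` allocation.
        unsafe { self.ptr.as_ref() }
    }
}
```
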
## Deref

Alright. Now we can make `Arc`s (and soon will be able to clone and destroy them
correctly), but how do we get to the data inside?

What we need now is an implementation of `Deref`.

We'll need to import the trait:

```rust,ignore
use std::ops::Deref;
```

And here's the implementation:

```rust,ignore
impl<T> Deref for Arc<T> {
    type Target = T;

    fn deref(&self) -> &T {
        let inner = unsafe { self.ptr.as_ref() };
        &inner.data
    }
}
```

Pretty simple, eh? This follows the `NonNull` pointer to the `ArcInner<T>`,
then returns a reference to the data inside.

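As a quick sanity check, here's a usage sketch (assuming the pieces above are
in scope); dereferencing now works the way you'd expect:

```rust,ignore
let arc = Arc::new(10);
assert_eq!(*arc, 10); // `*arc` goes through our `Deref` impl
```
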
## Code

Here's all the code from this section:

```rust,ignore
use std::ops::Deref;

impl<T> Arc<T> {
    pub fn new(data: T) -> Arc<T> {
        // We start the reference count at 1, as that first reference is the
        // current pointer.
        let boxed = Box::new(ArcInner {
            rc: AtomicUsize::new(1),
            data,
        });
        Arc {
            // It is okay to call `.unwrap()` here as we get a pointer from
            // `Box::into_raw` which is guaranteed to not be null.
            ptr: NonNull::new(Box::into_raw(boxed)).unwrap(),
            phantom: PhantomData,
        }
    }
}

unsafe impl<T: Sync + Send> Send for Arc<T> {}
unsafe impl<T: Sync + Send> Sync for Arc<T> {}

impl<T> Deref for Arc<T> {
    type Target = T;

    fn deref(&self) -> &T {
        let inner = unsafe { self.ptr.as_ref() };
        &inner.data
    }
}
```

# Cloning

Now that we've got some basic code set up, we'll need a way to clone the `Arc`.

Basically, we need to:

1. Increment the atomic reference count
2. Construct a new instance of the `Arc` from the inner pointer

First, we need to get access to the `ArcInner`:

```rust,ignore
let inner = unsafe { self.ptr.as_ref() };
```

We can update the atomic reference count as follows:

```rust,ignore
let old_rc = inner.rc.fetch_add(1, Ordering::???);
```

But what ordering should we use here? We don't really have any code that will
need atomic synchronization when cloning, as we do not modify the internal value
while cloning. Thus, we can use a Relaxed ordering here, which implies no
happens-before relationship but is still atomic. When `Drop`ping the Arc,
however, we'll need to atomically synchronize when decrementing the reference
count. This is described more in [the section on the `Drop` implementation for
`Arc`](arc-drop.md). For more information on atomic relationships and Relaxed
ordering, see [the section on atomics](atomics.md).

Thus, the code becomes this:

```rust,ignore
let old_rc = inner.rc.fetch_add(1, Ordering::Relaxed);
```

We'll need to add another import to use `Ordering`:

```rust,ignore
use std::sync::atomic::Ordering;
```

However, we have one problem with this implementation right now. What if someone
decides to `mem::forget` a bunch of Arcs? The code we have written so far (and
will write) assumes that the reference count accurately portrays how many Arcs
are in memory, but with `mem::forget` this is false. Thus, when more and more
Arcs are cloned from this one without them being `Drop`ped and the reference
count being decremented, we can overflow! The count would wrap around to a small
value, allowing the data to be freed while live Arcs still point to it. This
will cause use-after-free, which is **INCREDIBLY BAD!**

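For illustration, here's a sketch of the degenerate pattern we're guarding
against (you'd need an absurd number of iterations, but nothing rules it out):

```rust,ignore
use std::mem;

let arc = Arc::new(0u8);
loop {
    // Each iteration bumps `rc` but leaks the clone, so the count never
    // comes back down; left running, it would eventually overflow.
    mem::forget(arc.clone());
}
```
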
To handle this, we need to check that the reference count does not go over some
arbitrary value (below `usize::MAX`, as we're storing the reference count as an
`AtomicUsize`), and do *something*.

The standard library's implementation decides to just abort the program (as it
is an incredibly unlikely case in normal code and if it happens, the program is
probably incredibly degenerate) if the reference count reaches `isize::MAX`
(about half of `usize::MAX`) on any thread, on the assumption that there are
probably not about 2 billion threads (or about **9 quintillion** on some 64-bit
machines) incrementing the reference count at once. This is what we'll do.

It's pretty simple to implement this behaviour:

```rust,ignore
if old_rc >= isize::MAX as usize {
    std::process::abort();
}
```

Then, we need to return a new instance of the `Arc`:

```rust,ignore
Self {
    ptr: self.ptr,
    phantom: PhantomData,
}
```

Now, let's wrap this all up inside the `Clone` implementation:

```rust,ignore
use std::sync::atomic::Ordering;

impl<T> Clone for Arc<T> {
    fn clone(&self) -> Arc<T> {
        let inner = unsafe { self.ptr.as_ref() };
        // Using a relaxed ordering is alright here, as we don't need any
        // atomic synchronization; we're not modifying or accessing the inner
        // data.
        let old_rc = inner.rc.fetch_add(1, Ordering::Relaxed);

        if old_rc >= isize::MAX as usize {
            std::process::abort();
        }

        Self {
            ptr: self.ptr,
            phantom: PhantomData,
        }
    }
}
```

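As a quick usage sketch (relying on the `Send`/`Sync` impls from the previous
section), cloning now lets two threads share one allocation:

```rust,ignore
use std::thread;

let arc = Arc::new(42);
let arc2 = arc.clone(); // rc: 1 -> 2; both point at the same ArcInner
let handle = thread::spawn(move || {
    assert_eq!(*arc2, 42);
});
assert_eq!(*arc, 42);
handle.join().unwrap(); // `arc2` is dropped on the spawned thread
```
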
# Dropping

We now need a way to decrease the reference count and drop the data once the
count is low enough, otherwise the data will live forever on the heap.

To do this, we can implement `Drop`.

Basically, we need to:

1. Decrement the reference count
2. If we were the last reference to the data, then:
3. Atomically fence the data to prevent reordering of the use and deletion of
   the data
4. Drop the inner data

First, we'll need to get access to the `ArcInner`:

```rust,ignore
let inner = unsafe { self.ptr.as_ref() };
```

Now, we need to decrement the reference count. To streamline our code, we can
also return if the value returned by `fetch_sub` (the value of the reference
count before decrementing it) is not equal to `1` (which happens when we are not
the last reference to the data). Note that the decrement uses a `Release`
ordering; the quote below explains why.

```rust,ignore
if inner.rc.fetch_sub(1, Ordering::Release) != 1 {
    return;
}
```

We then need to create an atomic fence to prevent reordering of the use of the
data and deletion of the data. As described in [the standard library's
implementation of `Arc`][3]:

> This fence is needed to prevent reordering of use of the data and deletion of
> the data. Because it is marked `Release`, the decreasing of the reference
> count synchronizes with this `Acquire` fence. This means that use of the data
> happens before decreasing the reference count, which happens before this
> fence, which happens before the deletion of the data.
>
> As explained in the [Boost documentation][1],
>
> > It is important to enforce any possible access to the object in one
> > thread (through an existing reference) to *happen before* deleting
> > the object in a different thread. This is achieved by a "release"
> > operation after dropping a reference (any access to the object
> > through this reference must obviously happened before), and an
> > "acquire" operation before deleting the object.
>
> In particular, while the contents of an Arc are usually immutable, it's
> possible to have interior writes to something like a Mutex<T>. Since a Mutex
> is not acquired when it is deleted, we can't rely on its synchronization logic
> to make writes in thread A visible to a destructor running in thread B.
>
> Also note that the Acquire fence here could probably be replaced with an
> Acquire load, which could improve performance in highly-contended situations.
> See [2].
>
> [1]: https://www.boost.org/doc/libs/1_55_0/doc/html/atomic/usage_examples.html
> [2]: https://github.com/rust-lang/rust/pull/41714

[3]: https://github.com/rust-lang/rust/blob/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/alloc/src/sync.rs#L1440-L1467

To do this, we do the following:

```rust,ignore
atomic::fence(Ordering::Acquire);
```

We'll need to import `std::sync::atomic` itself:

```rust,ignore
use std::sync::atomic;
```

Finally, we can drop the data itself. We use `Box::from_raw` to drop the boxed
`ArcInner<T>` and its data. This takes a `*mut T` and not a `NonNull<T>`, so we
must convert using `NonNull::as_ptr`.

```rust,ignore
unsafe { Box::from_raw(self.ptr.as_ptr()); }
```

This is safe as we know we have the last pointer to the `ArcInner` and that its
pointer is valid. The reconstructed `Box` is dropped at the end of the
statement, running `T`'s destructor and freeing the heap allocation.

Now, let's wrap this all up inside the `Drop` implementation:

```rust,ignore
impl<T> Drop for Arc<T> {
    fn drop(&mut self) {
        let inner = unsafe { self.ptr.as_ref() };
        if inner.rc.fetch_sub(1, Ordering::Release) != 1 {
            return;
        }
        // This fence is needed to prevent reordering of the use and deletion
        // of the data.
        atomic::fence(Ordering::Acquire);
        // This is safe as we know we have the last pointer to the `ArcInner`
        // and that its pointer is valid.
        unsafe { Box::from_raw(self.ptr.as_ptr()); }
    }
}
```

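To see the whole lifecycle in one place, here's a small usage sketch (the
comments trace the expected counter transitions):

```rust,ignore
let arc = Arc::new(vec![1, 2, 3]);
let arc2 = arc.clone(); // rc: 1 -> 2
drop(arc);              // rc: 2 -> 1; the data stays alive
drop(arc2);             // rc: 1 -> 0; fence, then the ArcInner is freed
```
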