Imitating specialization with OIBITs
Why does the code below compile?
use std::thread; struct Chinchilla(&'static str); fn main() { let chinchilla = Chinchilla("Flora"); thread::spawn(move || { println!("{}", chinchilla.0); }); }
where T: Send
One would be completely sane to ask
why shouldn't that code compile??
- and so in
order to understand why our initial question makes sense, let our
journey begin with a quick look-up on the definition of
std::thread::spawn
:
pub fn spawn<F, T>(f: F) -> JoinHandle<T> where F: FnOnce() -> T, F: Send + 'static, T: Send + 'static, { // ... }
Among all the noise (types, duh!), one bound might catch an astute
reader's attention: F: Send
.
Send
is a trait used to determine which types can travel across so-called
thread boundary - in other words: values of types that
implement the Send
trait can be created on thread A and
then safely moved into thread B and dropped there.
Most of the values (e.g. those of type String
or
Vec<u8>
) are perfectly safe to be moved willy-nilly
from one thread to another, but there exist a few that require extra
care - for instance
Rc
.
Rc
provides simple garbage-collector-like semantics: when
you invoke Rc::clone()
, it doesn't actually clone the
value contained inside it - instead, it keeps track of how many
Rc
-s are alive (with Rc::clone()
increasing
that counter and Rc::drop()
decreasing) and it's only
when that internal counter gets down to zero (i.e. when the last
Rc
gets out of the scope) that the contained value gets
dropped.
Rc
, contrary to Arc
, keeps track of those
remaining instances by using a non-atomic counter -
non-atomic meaning "unsafe to access from many threads at the
same time", and the way Rust protects us from shooting our feet is by
making Rc
not implement Send
:
use std::rc::Rc; use std::thread; fn main() { let value = Rc::new( "c-rustacean is a Rust programmer who likes C better" ); // ok: `Rc::clone()` happens on the same thread as `Rc::new()` let value2 = Rc::clone(&value); thread::spawn(move || { drop(value2); // err }); } // error[E0277]: `Rc<&str>` cannot be sent between threads safely // --> src/main.rs:12:5 // | // 12 | thread::spawn(move || { // | _____^^^^^^^^^^^^^_- // | | | // | | `Rc<&str>` cannot be sent between threads safely // 13 | | drop(value2); // err // 14 | | }); // | |_____- within this `[closure@src/main.rs:12:19: 14:6]` // | // = help: within `[closure@src/main.rs:12:19: 14:6]`, the // trait `Send` is not implemented for `Rc<&str>` // = note: required because it appears within the type // `[closure@src/main.rs:12:19: 14:6`
Though the message might be a bit cryptic (especially if you're not
used to working with multi-threading in Rust), the most important part
stands out, yelling:
Rc<&str> cannot be sent between threads safely
.
In order for our code to work, we would have to use
Send
-implementing Arc<&str>
- a type
that has been designed from ground-up to be multithreading-friendly by
leaning on atomic counters.
OIBITs
So far we've talked about the role Send
plays in
Rust's standard library, but actually the most captivating fact about
this trait is that, contrary to most of the other traits, this one's
implemented automatically by the compiler!
You don't have to take my word for it - check out this code :
use std::fmt::Display; struct Chinchilla(&'static str); fn assert_display(_: impl Display) { // } fn assert_send(_: impl Send) { // } fn main() { assert_display(Chinchilla("Fauna")); assert_send(Chinchilla("Flora")); } // error[E0277]: `Chinchilla` doesn't implement `Display` // --> src/main.rs:15:20 // | // 5 | fn assert_display(_: impl Display) { // | ------- required by this bound // | in `assert_display` // ... // 14 | assert_display(Chinchilla("Flora")); // | ^^^^^^^^^^^^^^^^^^^
Since we didn't provide impl Display for Chinchilla
, our
assert_display(…)
fails - that's reasonable.
But we didn't provide impl Send for Chinchilla
either, so
shouldn't assert_send(…)
fail, too?
As it turns out , some of our Rust's traits are opt-out instead of
opt-in - that is: some traits are implemented automatically for
all types unless you (or, in some cases, compiler) opt-out of
them via impl !Trait for Type
(yeah, that's an
exclamation mark).
Such traits are called OIBITs, standing for
opt-in built-in traits
-
eventually
though that name was found to be confusing and the whole feature was
renamed into auto traits
, so we'll go with the latter
from now on.
Auto traits
Regular traits are opt-in, meaning that they don't apply unless you
explicitly provide an impl Trait for Type
:
trait Foo { // } impl Foo for &str { // } fn test(_: impl Foo) { // } fn main() { test(123); // err: the trait bound ... is not satisfied test("hi!"); // ok }
Auto traits, on the other hand, are opt-out:
#![feature(auto_traits)] #![feature(negative_impls)] auto trait Foo { // } impl !Foo for &str { // } fn test(_: impl Foo) { // } fn main() { test(123); // ok test("hi!"); // err: the trait bound ... is not satisfied }
Since auto traits cannot contain methods or associated items:
auto trait Foo { type Type; // err fn function(&self); // err }
... they function as so-called marker traits.
While regular traits provide behavior (e.g. methods), marker traits determine properties of values of given types.
For instance Send
is a great example of a marker trait,
as it's used to determine whether a value of given type can be sent
into another thread, without providing any behavior on its own (i.e.
Send
exists purely as a compile-time type helper).
We can even see how Send
is defined by taking a look into
the standard library:
pub unsafe auto trait Send { // empty. }
... additionally, in std/alloc/rc.rs
we can find that:
impl<T: ?Sized> !Send for Rc<T> {}
See, no magic!
To finish this section on auto traits, let's just walk through the
most important rule related to this mechanism: for a type to implement
an auto trait, none of its fields must be of type that has been
impl !
-d, i.e.:
#![feature(auto_traits)] #![feature(negative_impls)] auto trait Arbitrary { // } impl !Arbitrary for &str { // } // implements `Arbitrary` struct Yass; // implements `Arbitrary` struct Foo { value: usize, } // doesn't implement `Arbitrary` struct Bar { value: &'static str, } // doesn't implement `Arbitrary` because `value_2` of type `Bar` // doesn't implement `Arbitrary` too struct Zar { value_1: Foo, value_2: Bar, }
(as always, you can find more information in the RFC .)
Specialization
Let's stash all that boring-auto-trait-knowledge somewhere else for now and imagine we're in a Silicon Valley, starting a brand-new start-up - surely the first thing we've gotta do is to invent a brand-new file format: we're too cool for XML and JSON is already the story of the past (No Code, anyone?).
And so we open emacs
, writing lines that will become the
very first three of our monolithic microservice's:
trait Serialize { fn serialize_in_place(&self, buffer: &mut String); }
Seizing the day, let's add a blanket impl for
serialize()
, to make our lives easier during testing:
trait Serialize { fn serialize_in_place(&self, buffer: &mut String); fn serialize(&self) -> String { let mut buffer = String::new(); self.serialize_in_place(&mut buffer); buffer } }
Our investors say that we're going to be crunching lots of booleans, so why don't we start with them:
impl Serialize for bool { fn serialize_in_place(&self, buffer: &mut String) { if *self { buffer.push_str("b(true)"); } else { buffer.push_str("b(false)"); } } } #[test] fn test_bool() { assert_eq!("b(true)", true.serialize()); assert_eq!("b(false)", false.serialize()); }
They were saying something about strings attached
too,
so:
impl Serialize for &str { fn serialize_in_place(&self, buffer: &mut String) { buffer.push_str("s("); buffer.push_str(self); buffer.push_str(")"); } } #[test] fn test_str() { assert_eq!("s(hummus)", "hummus".serialize()); }
Obviously, a single string or a boolean is of no use - we're
professional programmers, so a plethora of
Vec<T>
is more than certain to appear:
impl<T> Serialize for Vec<T> where T: Serialize { fn serialize_in_place(&self, buffer: &mut String) { buffer.push_str("v("); for (item_idx, item) in self.iter().enumerate() { if item_idx > 0 { buffer.push_str(", "); } item.serialize_in_place(buffer); } buffer.push_str(")"); } } #[test] fn test_vec() { assert_eq!( "v(b(true), b(false))", vec![true, false].serialize(), ); assert_eq!( "v(s(foo), s(bar))", vec!["foo", "bar"].serialize(), ); }
Great - our code, though rudimentary, is already able to serialize an
infinite number of types: bool
, &str
,
Vec<bool>
, Vec<&str>
,
Vec<Vec<…>>
and so on.
While venture cash's flowin', we might optimize our format by adding a
dedicated (specialized) impl only for
Vec<bool>
, so that it's serialized in a more
compact way than regular Vec<T>
.
That is: instead of storing
vec![true, true, false, false]
as
v(b(true), b(true), b(false), b(false))
, we could use a
bitset instead: vb(12)
(12_dec
=
1100_bin
).
So let's do just that, simply by adding another impl for
Vec<bool>
:
impl Serialize for Vec<bool> { fn serialize_in_place(&self, buffer: &mut String) { todo!() } } // error[E0119]: conflicting implementations of trait `Serialize` // for type `Vec<bool>`: // --> src/lib.rs:45:1 // | // 29 | impl<T> Serialize for Vec<T> where T: Serialize { // | -------------------------------- first implementation here // ... // 45 | impl Serialize for Vec<bool> { // | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ conflicting implementation // for `Vec<bool>`
... and wait, what's that error message?
The compiler says we've got conflicting implementations:
Serialize
is already implemented for
Vec<T>
and our Vec<bool>
overlaps that implementation, yielding the compiler unable to
tell which code is the one we actually want to invoke.
Nightly Rust offers a dedicated solution to this problem: a feature called specialization .
Specialization allows for methods and associated items to be marked as
default
, making it possible for downstream impls to
override them; in our case, we'd have to mark
Vec<T>
-'s serialize_in_place()
as
default
, like so:
impl<T> Serialize for Vec<T> where T: Serialize { default fn serialize_in_place(&self, buffer: &mut String) { // this is the default implementation for all `Vec`-s } } impl Serialize for Vec<bool> { fn serialize_in_place(&self, buffer: &mut String) { // this is a dedicated implementation only for `Vec<bool>` } }
Specialization is the state of the art solution for this kind of issues, but it's actually not the only one - as the title of this article says: it's, to some degree, possible to imitate specialization with auto traits.
Specialization with auto traits
Since the issue with our current implementation:
impl<T> Serialize for Vec<T> where T: Serialize { /* ... */ }
... is that it overlaps with Vec<bool>
(as
bool: Serialize
is met), what we want to achieve is more
or less:
impl<T> Serialize for Vec<T> where T: Serialize, T != bool { /* ... */ }
Though Rust doesn't support the !=
operator in this
position, a similar outcome can be achieved via auto traits; for
starters, let's create one:
auto trait BlanketVecImpl { // }
... and un-implement it for bool
:
impl !BlanketVecImpl for bool { // }
We can then adjust our previous impl for Vec<T>
to
say:
impl<T> Serialize for Vec<T> where T: Serialize + BlanketVecImpl { /* ... */ }
Voilà - this impl provides a Serialize
for all
Vec<T>
except Vec<bool>
,
which we now can provide manually:
impl Serialize for Vec<bool> { /* ... */ }
The Code
The entire solution leans on two nightly features:
auto_traits
& negative_impls
, and while I
wouldn't recommend to use it in production code, it serves quite an
educational purpose and was a fun piece of code to write:
#![feature(auto_traits)] #![feature(negative_impls)] trait Serialize { fn serialize_in_place(&self, buffer: &mut String); fn serialize(&self) -> String { let mut buffer = String::new(); self.serialize_in_place(&mut buffer); buffer } } mod bool { use super::*; impl Serialize for bool { fn serialize_in_place(&self, buffer: &mut String) { if *self { buffer.push_str("b(true)"); } else { buffer.push_str("b(false)"); } } } #[test] fn test_bool() { assert_eq!("b(true)", true.serialize()); assert_eq!("b(false)", false.serialize()); } } mod str { use super::*; impl Serialize for &str { fn serialize_in_place(&self, buffer: &mut String) { buffer.push_str("s("); buffer.push_str(self); buffer.push_str(")"); } } #[test] fn test_str() { assert_eq!("s(hummus)", "hummus".serialize()); } } mod vec { use super::*; use std::fmt::Write; pub auto trait BlanketVecImpl { // } impl !BlanketVecImpl for bool { // } impl BlanketVecImpl for Vec<bool> { // } impl<T> Serialize for Vec<T> where T: Serialize + BlanketVecImpl { fn serialize_in_place(&self, buffer: &mut String) { buffer.push_str("v("); for (item_idx, item) in self.iter().enumerate() { if item_idx > 0 { buffer.push_str(", "); } item.serialize_in_place(buffer); } buffer.push_str(")"); } } impl Serialize for Vec<bool> { fn serialize_in_place(&self, buffer: &mut String) { let mut bits = 0u8; if self.len() > 8 { unimplemented!("what is this, big-data?"); } for (item_idx, &item) in self.iter().rev().enumerate() { if item { bits |= 1 << item_idx; } } write!(buffer, "vb({})", bits).unwrap(); } } #[test] fn test_vec() { assert_eq!( "vb(12)", vec![true, true, false, false].serialize(), ); assert_eq!( "v(vb(2), vb(1))", vec![vec![true, false], vec![false, true]].serialize(), ); assert_eq!( "v(s(foo), s(bar))", vec!["foo", "bar"].serialize(), ); } }