Learning to Fly: Let's simulate evolution in Rust! (pt 2)

This post is part of the learning-to-fly series:

  1. The Domain
  2. The Neural Network
  3. The Genetic Algorithm
  4. The User Interface

This is the second part of the Learning to Fly series, in which we're coding a simulation of evolution using a neural network and a genetic algorithm.

In this post we'll lay the foundations for our project and implement a basic feed-forward neural network that'll later become a brain; we'll also take a look at many intricacies and idioms you might find in common Rust code, including those inside tests.

Strap your straps, seatbelt your seatbelts, and get ready for some coding!

Prerequisites

We'll be using Rust nightly (2021-03-25) — to avoid unpleasant unpleasantries, don't forget to either execute:

$ rustup default nightly-2021-03-25

... or create a file next to Cargo.toml called rust-toolchain that says:

rust-toolchain
nightly-2021-03-25

Workspaces

Our previous post ended with a cliffhanger:

$ mkdir shorelark

Instead of launching cargo new shorelark, as one usually does, we went with mkdir - it's because we're going to use a feature of Cargo's called workspaces.

Workspaces allow us to split a single project into multiple, standalone crates (separate "subprojects"), which has many advantages - the most interesting one for us being improved compile times.

Let's focus on that last point.

You might've noticed that when you do cargo build, it sometimes prints crate names inline during the Building phase:

    Updating crates.io index
  Downloaded log v0.4.13
  Downloaded serde v1.0.119
  Downloaded thread_local v1.1.0
  Downloaded byteorder v1.4.2
   Compiling libc v0.2.82
   Compiling cfg-if v1.0.0
   Compiling arrayref v0.3.6
   Compiling matches v0.1.8
    Building [>           ] 6/153: byteorder, matches, cc, byte-tools

It means that it's building those crates in parallel.

Usually only some parts of the dependency tree can be built in parallel - if you have crate A that depends on crates B + C, the latter two can be built at the same time, but A has to wait until B and C have both finished compiling.

Furthermore, it's pointless trying to simultaneously compile more crates than the number of CPU cores available, as that would only slow down the process - so Cargo has its work cut out for it, figuring out the optimal schedule.

On the other side of the spectrum, we've got rustc - the actual compiler that Cargo invokes for each crate it has to build; rustc itself is, at the moment, mostly single-threaded.

It means that, practically, if your application depends on some external crates (e.g. from crates.io), those crates will be compiled in parallel, but your application itself (as a single-crate entity) will remain built on a single core.

To internalize the potential issues & gains, let's say we're working on a 100k LOC application - if compiling our app, excluding its dependencies, takes 5 minutes now, then by splitting it into 5 separate 20k LOC crates, we might reduce this time to as low as 1~2 minutes.

(due diligence: that's a good-day scenario, sometimes it might not be possible to split the code, please remember to always consult your doctor before refactoring.)

Granted - in the grand scheme of things, 3 minutes might not seem like much, but let's not forget that it applies to the development process, too.

If you've already had the pleasure of working on a gigantic project, you might know the pain of waiting for the compiler to finish: you've changed some error message, wanted to quickly see how it looks in context and - oh god - do you hear this sound? it's not a military aircraft, it's your computer's fans preparing for some hot & long code munching; it's daunting.

When it comes to Rust, one of the ways to alleviate issues around compile times is to simply split your application into separate crates.

Considering our toy project won't reach 10k LOC, why am I even talking about all those workspaces and crates?

I just think they’re neat.

Also, I consider them to be a good practice: workspaces allow us to introduce clear-cut boundaries between a project's modules, and - when applied within reason - not only make the code cleaner, but force it to be.

Having established that workspaces are not elder magic, let's circle back to our original question: why mkdir instead of cargo new?

Simply because Cargo doesn’t support cargo new --workspace yet.

Structure

Breathe in, breathe out, and let's enter the no man's land:

$ cd shorelark
$ ls -al
total 0
drwxr-xr-x  2 ppp users  40 01-16 19:14 .
drwxrwxrwt 22 ppp users 520 01-16 20:27 ..

Before we're able to write our first line of code, we have to decide on our project's structure.

There are many approaches to organizing workspace-based projects - I'm personally fond of this one, which separates application-like crates from library-like ones:

project
├─ Cargo.toml
├─ app
│  ├─ Cargo.toml
│  └─ src
│      └─ main.rs
└─ libs
   ├─ subcrate-a
   │  ├─ Cargo.toml
   │  └─ src
   │     └─ lib.rs
   └─ subcrate-b
      ├─ Cargo.toml
      └─ src
         └─ lib.rs

Scaffolding such a project is as easy as creating a Cargo.toml manifest with:

[workspace]
members = [
    "libs/*", # look má, wildcards!
]

... and then proceeding with cargo new --lib as usual:

$ mkdir libs
$ cd libs
$ cargo new neural-network --lib

Coding: propagate()

It's time to get down to business.

We'll start top-down, with a structure modelling the entire network - it will provide sort of an entry point to our crate; let's open libs/neural-network/src/lib.rs and write:

pub struct Network;

A neural network's most crucial operation is propagating numbers:

... so:

impl Network {
    pub fn propagate(&self, inputs: Vec<f32>) -> Vec<f32> {
        todo!()
    }
}

Similarly to an ocean filled with water droplets, a network is built from layers:

... so:

pub struct Network {
    layers: Vec<Layer>,
}

struct Layer;

Layers are built from neurons:

... giving us:

struct Layer {
    neurons: Vec<Neuron>,
}

Eventually, neurons contain biases and output weights:

... so:

struct Neuron {
    bias: f32,
    weights: Vec<f32>,
}

Let's see our crude design in its entirety:

pub struct Network {
    layers: Vec<Layer>,
}

struct Layer {
    neurons: Vec<Neuron>,
}

struct Neuron {
    bias: f32,
    weights: Vec<f32>,
}

impl Network {
    pub fn propagate(&self, inputs: Vec<f32>) -> Vec<f32> {
        todo!()
    }
}

Nice.

If you're lucky or perceptive, you might've noticed that only two of our items are public: Network and Network::propagate() - that's because Layer and Neuron will remain implementation details; we won't expose them outside the crate.

Thanks to this approach, we'll be able to introduce changes to our implementation without imposing breaking changes on the downstream crates (i.e. our library's users).

For instance, professional neural networks are usually implemented using matrices (for performance reasons) - if we ever decided to rewrite our network to use matrices, it wouldn't be a breaking change: Network::propagate()'s signature would remain the same, and since users can't access Layer and Neuron, they wouldn't even notice these two structs being gone.

Moving on - since numbers have to be shoved through each layer, we'll need a propagate() in there, too:

impl Layer {
    fn propagate(&self, inputs: Vec<f32>) -> Vec<f32> {
        todo!()
    }
}

Having Layer::propagate(), we can actually go back and implement Network::propagate():

impl Network {
    pub fn propagate(&self, inputs: Vec<f32>) -> Vec<f32> {
        let mut inputs = inputs;

        for layer in &self.layers {
            inputs = layer.propagate(inputs);
        }

        inputs
    }
}

This is quite a satisfying, correct piece of code - but it's also non-idiomatic: we can write it better, more rustic; let's see how!

Ecstasy of Saint Ferris (upon seeing idiomatic code), colorized

First of all, this is called a rebinding (or shadowing):

let mut inputs = inputs;

... and it's unnecessary, because we might as well move this mut into the function's parameters:

pub fn propagate(&self, mut inputs: Vec<f32>) -> Vec<f32> {
    for layer in &self.layers {
        inputs = layer.propagate(inputs);
    }

    inputs
}

There's one more refinement we can apply to our code - this very pattern is known as folding:

for layer in &self.layers {
    inputs = layer.propagate(inputs);
}

... and Rust's standard library provides a dedicated function for it:

pub fn propagate(&self, inputs: Vec<f32>) -> Vec<f32> {
    self.layers
        .iter()
        .fold(inputs, |inputs, layer| layer.propagate(inputs))
}

(one could argue whether our final code is actually more readable or less; while I'm personally fond of the built-in combinators such as .fold(), if you find them obscure - that's fine, too! you do you, i ain't no judge)

Voilà - after all, thanks to the closure, we didn't even need that mut inputs; now you can brag about your code being all functional and Haskell-y.
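If .fold() still feels opaque, here's a standalone sketch of the same pattern, with plain functions standing in for layers (everything below is made up purely for illustration):

```rust
fn main() {
    // Three toy "layers", each transforming its input:
    let layers: Vec<fn(i32) -> i32> = vec![|x| x + 1, |x| x * 2, |x| x - 3];

    // The explicit loop:
    let mut value = 5;
    for layer in &layers {
        value = layer(value);
    }

    // ... and its `.fold()` equivalent - same result:
    let folded = layers.iter().fold(5, |value, layer| layer(value));

    assert_eq!(value, folded);
    assert_eq!(folded, 9); // ((5 + 1) * 2) - 3
}
```

The accumulator plays the role of our inputs vector: it enters the first "layer", and each layer's output becomes the next layer's input.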

Let's move on to neurons - a single neuron accepts many inputs and returns a single output, so:

struct Neuron {
    bias: f32,
    weights: Vec<f32>,
}

impl Neuron {
    fn propagate(&self, inputs: Vec<f32>) -> f32 {
        todo!()
    }
}

As before, we can backtrack to implement Layer::propagate():

struct Layer {
    neurons: Vec<Neuron>,
}

impl Layer {
    fn propagate(&self, inputs: Vec<f32>) -> Vec<f32> {
        let mut outputs = Vec::new();

        for neuron in &self.neurons {
            let output = neuron.propagate(inputs);
            outputs.push(output);
        }

        outputs
    }
}

If we try to compile it, we get our first borrow-checker failure:

error[E0382]: use of moved value: `inputs`
  --> src/lib.rs
   |
   |     fn propagate(&self, inputs: Vec<f32>) -> Vec<f32> {
   |                         ------ move occurs because `inputs` has
   |                                type `Vec<f32>`, which does not
   |                                implement the `Copy` trait
...
   |             let output = neuron.propagate(inputs);
   |                                           ^^^^^^
   |                                value moved here, in previous
   |                                iteration of loop

Obviously, the compiler is right: after invoking neuron.propagate(inputs), we lose ownership of inputs, so we can't possibly use it in the loop's subsequent iterations.

Fortunately, the fix is rather easy and boils down to making Neuron::propagate() work on borrowed values:

impl Layer {
    fn propagate(&self, inputs: Vec<f32>) -> Vec<f32> {
        /* ... */

        for neuron in &self.neurons {
            let output = neuron.propagate(&inputs);
            //                            ^
        }

        /* ... */
    }
}

impl Neuron {
    fn propagate(&self, inputs: &[f32]) -> f32 {
        //                      ^^^^^^
        /* ... */
    }
}

To reiterate, the code we have at the moment is:

impl Layer {
    fn propagate(&self, inputs: Vec<f32>) -> Vec<f32> {
        let mut outputs = Vec::new();

        for neuron in &self.neurons {
            let output = neuron.propagate(&inputs);
            outputs.push(output);
        }

        outputs
    }
}

The way we wrote our algorithm is correct, but inefficient - since we know how many output values we'll have, we can use this information to preallocate our vector:

fn propagate(&self, inputs: Vec<f32>) -> Vec<f32> {
    let mut outputs = Vec::with_capacity(self.neurons.len());

    /* ... */
}

In order to understand preallocation, we've gotta go one abstraction layer below and take a look at how Vec works.

Since Vec doesn't know up-front how many elements it will contain, it starts small and grows as elements get pushed into it: whenever a .push() would exceed the vector's current capacity, the vector has to grow. Growing is a relatively slow operation, because it requires allocating a bigger chunk of memory and moving the vector's entire contents there.

If we know - or we can estimate - a vector's size up-front, we might then create it using Vec::with_capacity(), which accepts a single argument determining how big the returned vector should be; e.g. Vec::with_capacity(10) means that you'll be able to use .push() at least 10 times without causing the returned vector to grow.
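A quick, standalone way of seeing the capacity mechanics in action (note that with_capacity(n) guarantees room for at least n elements - the exact growth strategy is an implementation detail of Vec):

```rust
fn main() {
    // Without preallocation, the vector occasionally has to grow
    // (and thus reallocate) as we push:
    let mut grown = Vec::new();
    for i in 0..10 {
        grown.push(i);
    }

    // With preallocation, all ten pushes fit right away:
    let mut preallocated = Vec::with_capacity(10);
    for i in 0..10 {
        preallocated.push(i);
    }

    // `with_capacity(n)` guarantees room for *at least* `n` elements:
    assert!(preallocated.capacity() >= 10);
    assert_eq!(grown, preallocated);
}
```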

Using with_capacity() might be awkward at times, and so Rust, fulfilling its promise of zero-cost abstractions, allows us to invoke it somewhat transparently:

fn propagate(&self, inputs: Vec<f32>) -> Vec<f32> {
    self.neurons
        .iter()
        .map(|neuron| neuron.propagate(&inputs))
        .collect()
}

There's no explicit Vec::with_capacity() and yet both versions do essentially the same thing (including the preallocation!) - how come?

Well, iterators provide a method called Iterator::size_hint() - it returns an estimate of the iterator's length (a lower bound and, if known, an upper bound). When you call vec.iter(), you get an iterator that knows its exact length, and .collect() uses that information to create a vector that's large enough from the start; no magic!
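We can peek at this machinery ourselves - a tiny standalone sketch:

```rust
fn main() {
    let items = vec![1.0f32, 2.0, 3.0, 4.0];

    // `size_hint()` returns a lower bound and an optional upper bound;
    // for a vector's iterator, both are known exactly:
    assert_eq!(items.iter().size_hint(), (4, Some(4)));

    // ... and `.map()` preserves that hint, so `.collect()` can
    // allocate enough space up-front:
    let doubled: Vec<f32> = items.iter().map(|x| x * 2.0).collect();
    assert!(doubled.capacity() >= 4);
}
```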

As you can see, even though our refactored code is technically more complex (we're using an anonymous function and iterators, after all), it's at least equally performant and way more readable; writing optimized Rust code frequently comes down to using high-level constructs instead of trying to outsmart the compiler with fancy, hand-written loops.

Currently we've got nowhere else to go but to complete Neuron::propagate() - as before, let's start with a crude, superfund-ish version:

impl Neuron {
    fn propagate(&self, inputs: &[f32]) -> f32 {
        let mut output = 0.0;

        for i in 0..inputs.len() {
            output += inputs[i] * self.weights[i];
        }

        output += self.bias;

        if output > 0.0 {
            output
        } else {
            0.0
        }
    }
}

This snippet contains two unidiomatic constructs and one potential bug - let's start with the latter.

Since we're indexing into self.weights using inputs' length, there are three edge cases we have to consider:

  1. Given inputs.len() < self.weights.len(),

  2. Given inputs.len() == self.weights.len(),

  3. Given inputs.len() > self.weights.len().

Our code rests on the assumption that #2 is always true, but it's a silent assumption: we don't enforce it anywhere! If we mistakenly passed fewer or more inputs, we'd get either an invalid result or a crash.

There are at least two ways we could go about improving it:

  1. We could change Neuron::propagate() to return an error message:

    fn propagate(&self, inputs: &[f32]) -> Result<f32, String> {
        if inputs.len() != self.weights.len() {
            return Err(format!(
                "got {} inputs, but {} inputs were expected",
                inputs.len(),
                self.weights.len(),
            ));
        }
    
        /* ... */
    }
    

    ... or, using one of the crates I love the most - thiserror:

    pub type Result<T> = std::result::Result<T, Error>;
    
    #[derive(Debug, Error)]
    pub enum Error {
        #[error(
            "got {got} inputs, but {expected} inputs were expected"
        )]
        MismatchedInputSize {
            got: usize,
            expected: usize,
        },
    }
    
    /* ... */
    
    fn propagate(&self, inputs: &[f32]) -> Result<f32> {
        if inputs.len() != self.weights.len() {
            return Err(Error::MismatchedInputSize {
                got: inputs.len(),
                expected: self.weights.len(),
            });
        }
    
        /* ... */
    }
    
  2. We could use the assert_eq!() / panic!() macros:

    fn propagate(&self, inputs: &[f32]) -> f32 {
        assert_eq!(inputs.len(), self.weights.len());
    
        /* ... */
    }
    

In most cases, the first variant is better, because it allows the caller to catch the error and handle it - in our case though, the error reporting is simply not worth it, because:

  1. If this assertion ever fails, it means that our implementation is most likely wrong and there's nothing users could do on their side to mitigate it.

  2. This is a toy project, we've already got like fifty other ideas hanging in the air tonight, no need to waste our time.

So:

fn propagate(&self, inputs: &[f32]) -> f32 {
    assert_eq!(inputs.len(), self.weights.len());

    let mut output = 0.0;

    for i in 0..inputs.len() {
        output += inputs[i] * self.weights[i];
    }

    output += self.bias;

    if output > 0.0 {
        output
    } else {
        0.0
    }
}

As for the idioms - this one:

if output > 0.0 {
    output
} else {
    0.0
}

... is f32::max() in disguise:

output.max(0.0)

While this one:

let mut output = 0.0;

for i in 0..inputs.len() {
    output += inputs[i] * self.weights[i];
}

... can be simplified first with .zip():

let mut output = 0.0;

for (&input, &weight) in inputs.iter().zip(&self.weights) {
    output += input * weight;
}

... and then using .map() + .sum():

let output = inputs
    .iter()
    .zip(&self.weights)
    .map(|(input, weight)| input * weight)
    .sum::<f32>();
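To double-check that the combinators do the same thing the loop did, here's a tiny standalone version with made-up numbers:

```rust
fn main() {
    // Made-up inputs and weights, just for the sake of the example:
    let inputs = [0.5f32, 1.0, -1.0];
    let weights = [-0.3f32, 0.8, 0.2];

    let output = inputs
        .iter()
        .zip(&weights)
        .map(|(input, weight)| input * weight)
        .sum::<f32>();

    // (0.5 * -0.3) + (1.0 * 0.8) + (-1.0 * 0.2) = 0.45
    assert!((output - 0.45).abs() < f32::EPSILON);
}
```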

Voilà:

impl Neuron {
    fn propagate(&self, inputs: &[f32]) -> f32 {
        let output = inputs
            .iter()
            .zip(&self.weights)
            .map(|(input, weight)| input * weight)
            .sum::<f32>();

        (self.bias + output).max(0.0)
    }
}

It's unquestionably beautiful - but does it work? Can it recognize a cat? Can we use it to predict future Dogecoin prices?

Coding: new()

Up to this point we were so focused on the algorithm that we gave little to no thought to constructors - but how could we ever play with a network we cannot create?

Our first approach for creating a constructor could be a plain, no-op function of:

pub struct Network {
    layers: Vec<Layer>,
}

impl Network {
    pub fn new(layers: Vec<Layer>) -> Self {
        Self { layers }
    }
}

... which won't do in this case, because - as we've already established - we'd like to keep Layer and Neuron outside the public interface.

If you recall our previous post, you might remember that we were talking a lot about random numbers - so if there's one thing I'm certain of, it's that we'll need something along the lines of:

impl Network {
    pub fn random() -> Self {
        todo!()
    }
}

In order to randomize a network, we'll need to know the number of its layers and the number of neurons in each layer - they can all be described with a single vector:

pub fn random(neurons_per_layer: Vec<usize>) -> Self {
    todo!()
}

... or, in a bit more elegant way:

pub struct LayerTopology {
    pub neurons: usize,
}

/* ... */

pub fn random(layers: Vec<LayerTopology>) -> Self {
    todo!()
}

// By the way, notice how creating a separate type allowed us to
// untangle the argument's name to be just `layers`.
//
// Initially we went with `neurons_per_layer`, because `Vec<usize>`
// doesn't provide enough information to tell what this `usize`
// represents - using a separate type makes the intention explicit.

Now, if you look real close at a neural network's layer:

... perhaps you'll notice that it's actually defined by two numbers: its input and output size; does it mean our single-field LayerTopology is wrong? Au contraire!

What we've done is, as I like to call it, exploiting domain knowledge.

Inside a FFNN, all layers are connected consecutively, back-to-back:

... because layer A's output is layer B's input, if we went with:

pub struct LayerTopology {
    pub input_neurons: usize,
    pub output_neurons: usize,
}

... not only would we make our interface unwieldy, but - what's even more gruesome - we'd have to implement an additional validation ensuring that layer[0].output_neurons == layer[1].input_neurons, and so on, are met; pure nonsense.

Noting the simple fact that consecutive layers must have matching inputs & outputs allows us to simplify the code before it's even been written!

As for the implementation, a naïve, unidiomatic approach could be:

pub fn random(layers: Vec<LayerTopology>) -> Self {
    let mut built_layers = Vec::new();

    for i in 0..(layers.len() - 1) {
        let input_size = layers[i].neurons;
        let output_size = layers[i + 1].neurons;

        built_layers.push(Layer::random(
            input_size,
            output_size,
        ));
    }

    Self { layers: built_layers }
}

Now, let's rustify it! Care to guess what happens when you call Network::random(vec![])? (hint: layers.len() - 1 underflows, since layers.len() is an unsigned integer)

pub fn random(layers: Vec<LayerTopology>) -> Self {
    // Network with just one layer is technically doable, but doesn't
    // make much sense:
    assert!(layers.len() > 1);

    /* ... */
}

There, better.

As for the for loop - iterating over adjacent items is another pattern covered by Rust's standard library, via a function called .windows():

pub fn random(layers: Vec<LayerTopology>) -> Self {
    /* ... */

    for adjacent_layers in layers.windows(2) {
        let input_size = adjacent_layers[0].neurons;
        let output_size = adjacent_layers[1].neurons;

        built_layers.push(Layer::random(
            input_size,
            output_size,
        ));
    }

    /* ... */
}
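If .windows() is new to you, here's what it yields on a plain array (a standalone sketch):

```rust
fn main() {
    let numbers = [10, 20, 30, 40];
    let mut windows = numbers.windows(2);

    // `.windows(2)` yields every pair of adjacent elements:
    assert_eq!(windows.next(), Some(&[10, 20][..]));
    assert_eq!(windows.next(), Some(&[20, 30][..]));
    assert_eq!(windows.next(), Some(&[30, 40][..]));
    assert_eq!(windows.next(), None);
}
```

Each window here corresponds to a pair of adjacent layer topologies - exactly what Layer::random() needs.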

In this case, switching to iterators is a no-brainer for me:

pub fn random(layers: Vec<LayerTopology>) -> Self {
    /* ... */

    let layers = layers
        .windows(2)
        .map(|layers| {
            Layer::random(layers[0].neurons, layers[1].neurons)
        })
        .collect();

    Self { layers }
}

And one final touch - when it doesn't make the code awkward, it's a good practice to accept borrowed values instead of owned:

pub fn random(layers: &[LayerTopology]) -> Self {
    /* ... */
}

More often than not, accepting borrowed values doesn't change much inside the function, but makes it more versatile - i.e. with a borrowed array one can now do:

let network = Network::random(&[
    LayerTopology { neurons: 8 },
    LayerTopology { neurons: 15 },
    LayerTopology { neurons: 2 },
]);

... and:

let layers = vec![
    LayerTopology { neurons: 8 },
    LayerTopology { neurons: 15 },
    LayerTopology { neurons: 2 },
];

let network_a = Network::random(&layers);
let network_b = Network::random(&layers);
//                              ^ no need to .clone()

What's next, what's next…​ checks notes…​ ah, Layer::random()!

impl Layer {
    pub fn random(
        input_size: usize,
        output_size: usize,
    ) -> Self {
        let mut neurons = Vec::new();

        for _ in 0..output_size {
            neurons.push(Neuron::random(input_size));
        }

        Self { neurons }
    }
}

Let's skip the pleasantries:

pub fn random(input_size: usize, output_size: usize) -> Self {
    let neurons = (0..output_size)
        .map(|_| Neuron::random(input_size))
        .collect();

    Self { neurons }
}

Finally, the last piece of our chain - Neuron::random():

impl Neuron {
    pub fn random(input_size: usize) -> Self {
        let bias = todo!();

        let weights = (0..input_size)
            .map(|_| todo!())
            .collect();

        Self { bias, weights }
    }
}

Contrary to C++'s or Python's standard libraries, Rust's doesn't provide any pseudo-random number generator (PRNG) - do you know what that means?

it's crate time!

Which crate should we choose? Well, let's see:

When it comes to PRNGs, rand is the de facto standard - an extremely versatile crate that allows us to generate not only pseudo-random numbers, but also other types, such as strings.

In order to fetch rand, we have to add it to our libs/neural-network/Cargo.toml:

# ...

[dependencies]
rand = "0.8"

... and then:

pub fn random(input_size: usize) -> Self {
    let mut rng = rand::thread_rng();

    let bias = rng.gen_range(-1.0..=1.0);

    let weights = (0..input_size)
        .map(|_| rng.gen_range(-1.0..=1.0))
        .collect();

    Self { bias, weights }
}

Neat.

Certainly, rng.gen_range(-1.0..=1.0) predicts Dogecoin prices quite accurately - but is there a way we could ensure our entire network works as intended?

Testing

A pure function is a function that, given the same arguments, always returns the same value.

For instance, this is a pure function:

pub fn add(x: usize, y: usize) -> usize {
    x + y
}

... while this one is not:

pub fn read(path: impl AsRef<Path>) -> String {
    std::fs::read_to_string(path).unwrap()
}

(add(1, 2) will always return 3, while read("file.txt") will return different strings depending on what given file happens to contain at the moment.)

Pure functions are nice, because they can be easily tested in isolation:

// This test always succeeds
// (i.e. this test is *deterministic*)
#[test]
fn test_add() {
    assert_eq!(add(1, 2), 3);
}

// This test might succeed or it might fail, hard to anticipate
// (i.e. this test is *non-deterministic*)
#[test]
fn test_read() {
    assert_eq!(
        read("serials-to-watch.txt"),
        "killing eve",
    );
}

Unfortunately, the way we randomize numbers inside our Neuron::random() makes it impure, which can be easily proven with:

#[test]
fn random_is_pure() {
    let neuron_a = Neuron::random(4);
    let neuron_b = Neuron::random(4);

    // If `Neuron::random()` was pure, then both neurons would always
    // have to be the same:
    assert_eq!(neuron_a, neuron_b);
}

Testing impure functions is hard, because there's not much we can assert reliably:

struct Neuron {
    bias: f32,
    weights: Vec<f32>,
}

impl Neuron {
    pub fn random(input_size: usize) -> Self {
        /* ... */
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    mod random {
        use super::*;

        #[test]
        fn test() {
            let neuron = Neuron::random(4);

            assert!(/* what? */);
        }
    }
}

We might try:

#[test]
fn test() {
    let neuron = Neuron::random(4);

    assert_eq!(neuron.weights.len(), 4);
}

... but that's a development counterpart of a false friend - though it's better than no test at all, it doesn't check much.

On the other hand, making Neuron::random() a pure function seems…​ preposterous? What's the point of randomizing, if the outcome would always remain the same?

Well, the way I usually reconcile both worlds is by looking at the origin of impurity - in our case, it's rand::thread_rng():

pub fn random(input_size: usize) -> Self {
    let mut rng = rand::thread_rng();

    /* ... */
}

If instead of invoking thread_rng(), we accepted a parameter with the randomizer:

pub fn random(
    rng: &mut dyn rand::RngCore,
    input_size: usize,
) -> Self {
    /* ... */
}

... then we could use a fake, predictable PRNG in our tests, while users would simply pass an actual PRNG of their choosing.
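To make the benefit concrete, here's a standalone sketch of the same dependency-injection trick - using a hand-rolled NumberSource trait instead of rand::RngCore, so the example needs no external crates (all names below are made up for illustration):

```rust
// A stand-in for `rand::RngCore`, just for this sketch:
trait NumberSource {
    fn next(&mut self) -> f32;
}

// Production code would wrap a real PRNG; tests use this fake one,
// which simply cycles through a fixed list of values:
struct FakeSource {
    values: Vec<f32>,
    index: usize,
}

impl NumberSource for FakeSource {
    fn next(&mut self) -> f32 {
        let value = self.values[self.index % self.values.len()];
        self.index += 1;
        value
    }
}

struct Neuron {
    bias: f32,
    weights: Vec<f32>,
}

impl Neuron {
    fn random(source: &mut dyn NumberSource, input_size: usize) -> Self {
        let bias = source.next();
        let weights = (0..input_size).map(|_| source.next()).collect();

        Self { bias, weights }
    }
}

fn main() {
    let mut source = FakeSource { values: vec![0.1, 0.2, 0.3], index: 0 };
    let neuron = Neuron::random(&mut source, 2);

    // Entirely predictable, hence trivially testable:
    assert_eq!(neuron.bias, 0.1);
    assert_eq!(neuron.weights, vec![0.2, 0.3]);
}
```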

Because the PRNG returned by rand::thread_rng() can't be manually seeded, we have to make use of another crate - I like rand_chacha (easy to remember, you've probably already memorized it):

# ...

[dev-dependencies]
rand_chacha = "0.3"

... which allows us to do:

use rand::SeedableRng;
use rand_chacha::ChaCha8Rng;

#[test]
fn test() {
    // Because we always use the same seed, our `rng` in here will
    // always return the same set of values
    let mut rng = ChaCha8Rng::from_seed(Default::default());
    let neuron = Neuron::random(&mut rng, 4);

    assert_eq!(neuron.bias, /* ... */);
    assert_eq!(neuron.weights, &[/* ... */]);
}

We don't know which numbers will be returned just yet, but finding out is pretty easy: we'll just start with zeros and then copy-paste numbers from the test's output:

#[test]
fn test() {
    /* ... */

    assert_eq!(neuron.bias, 0.0);
    assert_eq!(neuron.weights, &[0.0, 0.0, 0.0, 0.0]);
}

First cargo test gives us:

thread '...' panicked at 'assertion failed: `(left == right)`
  left: `-0.6255188`,
 right: `0.0`

... so:

#[test]
fn test() {
    /* ... */

    assert_eq!(neuron.bias, -0.6255188);

    /* ... */
}

Another cargo test:

thread '...' panicked at 'assertion failed: `(left == right)`
  left: `[0.67383957, 0.8181262, 0.26284897, 0.5238807]`,
 right: `[0.0, 0.0, 0.0, 0.0]`', src/lib.rs:29:5

... and we end up with:

#[test]
fn test() {
    /* ... */

    assert_eq!(neuron.weights, &[
        0.67383957,
        0.8181262,
        0.26284897,
        0.5238807,
    ]);
}

Notice the numbers are different and that's alright - they are allowed to be different as long as each cargo test consistently works on the same set of numbers (and it does, because we used a PRNG with a constant seed).

Before moving on to another test, there's one touch we really should apply here - it stems from floating-point inaccuracies.

The type we're using, f32, models a 32-bit floating-point number that can represent values of magnitude between ~1.2*10^-38 and ~3.4*10^38 - alas, it cannot represent all of the numbers in that range, just some.

For instance, with f32 you cannot encode exactly 0.15 - it'll always be off by a bit:

fn main() {
    println!("{:.10}", 0.15f32);
    // prints: 0.1500000060
}

... or 0.45:

fn main() {
   println!("{:.10}", 0.45f32);
   // prints: 0.4499999881
}

Usually it doesn't matter, because floating-point numbers were never made to be exact (only fast) - but when it does come up, it hits one like a brick falling from the sky:

#[test]
fn test() {
    assert_eq!(0.45f32, 0.15 + 0.15 + 0.15);
}

thread 'test' panicked at 'assertion failed: `(left == right)`
  left: `0.45`,
 right: `0.45000002`'

To avoid reinventing the wheel, I'll just drop this link: What Every Programmer Should Know About Floating-Point Arithmetic - if you haven't read about floating-point numbers yet, I encourage you to give it a shot!

So, if we shouldn't compare numbers exactly, what can we do? Compare them approximately!

#[test]
fn test() {
    let actual: f32 = 0.1 + 0.2;
    let expected = 0.3;

    assert!((actual - expected).abs() < f32::EPSILON);
}

This is the standard way to compare floats across all programming languages that implement IEEE 754 (so, like, basically all the languages): instead of fishing for an exact result, you simply compare both numbers with some margin of error (also called tolerance).

Because comparing numbers this way is awkward, it's more cushy to either do it via a macro:

macro_rules! assert_almost_eq {
    ($left:expr, $right:expr) => {
        let left: f32 = $left;
        let right: f32 = $right;

        assert!((left - right).abs() < f32::EPSILON);
    }
}

#[test]
fn test() {
    assert_almost_eq!(0.45f32, 0.15 + 0.15 + 0.15);
}

... or - which is the best approach - using a crate such as approx:

#[test]
fn test() {
    approx::assert_relative_eq!(0.45f32, 0.15 + 0.15 + 0.15);
}

I personally like approx, so let's add it to our neural network's Cargo.toml:

# ...

[dev-dependencies]
approx = "0.4"

... and then adjust the tests:

#[test]
fn test() {
    let mut rng = ChaCha8Rng::from_seed(Default::default());
    let neuron = Neuron::random(&mut rng, 4);

    assert_relative_eq!(neuron.bias, -0.6255188);

    assert_relative_eq!(neuron.weights.as_slice(), [
        0.67383957,
        0.8181262,
        0.26284897,
        0.5238807,
    ].as_ref());
}

This covers half of the neuron's functions - thankfully, knowing what we know now, writing a test for Neuron::propagate() should be easy:

#[cfg(test)]
mod tests {
    use super::*;

    mod random {
        use super::*;

        #[test]
        fn test() {
            /* ... */
        }
    }

    mod propagate {
        use super::*;

        #[test]
        fn test() {
            todo!()
        }
    }
}

How can we ensure propagate() works correctly? By computing the expected response manually:

#[test]
fn test() {
    let neuron = Neuron {
        bias: 0.5,
        weights: vec![-0.3, 0.8],
    };

    // Ensures `.max()` (our ReLU) works:
    approx::assert_relative_eq!(
        neuron.propagate(&[-10.0, -10.0]),
        0.0,
    );

    // `0.5` and `1.0` chosen by a fair dice roll:
    approx::assert_relative_eq!(
        neuron.propagate(&[0.5, 1.0]),
        (-0.3 * 0.5) + (0.8 * 1.0) + 0.5,
    );

    // We could've written `1.15` right away, but showing the entire
    // formula makes our intentions clearer
}

From this point, implementing tests for Layer and Network gets pretty straightforward and thus has been left as an exercise for the reader :-)

Closing thoughts

What have we created, exactly?

It might seem that what we've implemented has nothing to do with learning or simulating:

... but that's only because the neural network itself, while being a relatively complex piece of our codebase, doesn't do much on its own; thou shall not worry though, as in the end all the pieces will fit together.

In the meantime, feel free to check out the entire source code at my GitHub repository.

What about rustfmt?

This might come off like an advertisement, but let's give it a shot anyway:

Code formatting is important - doesn't matter if your project's big or small, rustfmt is the way to go!

What about TDD?

Test-driven development is a programming methodology in which you first write tests and only then start to work on the implementation; I use it more often than not, but while writing this article, I've decided that starting with implementation will be more educational than the adventure TDD would take us on.

So if you're wondering whether TDD is worth pursuing in Rust - it totally is! Though, as always - not always :-)

Why is our design inflated?

When you gogoduck for python neural network from scratch, you'll see lots of articles that encapsulate FFNNs in a handful of lines of Python code - compared to them, our design seems inflated; why is that so?

It's because this way we could learn more - we could've coded our network in 1/10th of its current size using nalgebra, or we could've used one of the already-existing crates - but it's not the destination that matters, it's the journey.

What’s next?

To avoid closing our closing thoughts on some cheesy waltz, let's establish what's coming next.

At the moment we've got a working, bare-bones neural network - in the upcoming article, we'll cargo new genetic-algorithm --lib and we'll see what our current implementation is missing in order to make it compatible with genetic algorithms.

The last, fourth post will be about WebAssembly-ing all our crates together to end up with our magnum opus: flying birds.

Until then!