I’ve been learning Rust on and off for quite a some time and every time I pick it up again I learn something new. A few days ago when I came back to it again, I was looking for a educational project to do and I decided to implement the Base64 algorithm in Rust. In this post, I will explain how to encode data to base64 in Rust. Decoding from base64 will be covered in a separate blog post, but it’s worth practicing by implementing it yourself. I’ll share the resources I used at the end of this post.

Rust Book

What is Base64?

Theory

Now, let’s dive into the theoretical aspects of how base64 works. By the end of this section, we’ll be able to manually encode data to base64 using pen and paper.

The steps to encode a set of characters to base64 are as follows:

  1. Split the input, typically a sequence of bytes, into groups of three bytes.
  2. Each group of three bytes corresponds to a total of 24 bits.
  3. Split the 24 bits into four 6-bit groups.
  4. Map each 6-bit group to its corresponding base64 character from the character set: A-Z, a-z, 0-9, +, and /.
  5. If the number of bytes is not divisible by three, add the padding character ‘=’.

Let’s illustrate this process by manually converting the text ‘Rust’ into base64.

  1. Split the input into groups of three bytes:

step one

  1. In the next step, we combine the binary values and split them into four 6-bit groups:

step two

  1. Now, let’s map each group to its corresponding character using the standard base64 character mapping based on RFC4648 (image from Wikipedia):

step three

  1. The base64 output for ‘Rus’ is as follows:

step three

  1. Next, we continue the same steps for the remaining text. In our case, the only character left is ’t’, which would be encoded as follows:

step three

Note that if the last group doesn’t have enough bits, we fill it with 0s until it reaches 6 bits in length.

Adding padding:

So far, our output is UnVzdA, but because it’s not divisible by 3, we need to add the padding character = until it becomes divisible by three.

To calculate how many paddings we need, use the following formula:

r = len(input) % 3
  • if r is equal to 1: we need to add two paddings
  • if r is equal to 2: we need to add one padding
  • if r is equal to 0: no paddings needed

Since the 4 % 3 = 1 we need to add two paddings at the end and the final output would be UnVzdA==.

Now that we understand the manual conversion process, let’s implement these steps in Rust.

Implementation:

We’ll start by creating a function called encode in a file named base64.rs that takes a series of bytes as input and returns the encoded output as a string:

// file base64.rs

fn encode(input: &[u8]) -> String {
    const BASE_CHARS: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    let mut output = Vec::new();
    let input_len = input.len();

    for i in (0..input_len).step_by(3) {
        let a = input.get(i).unwrap();
        let b = input.get(i + 1).unwrap_or(&0);
        let c = input.get(i + 2).unwrap_or(&0);

        let enc1 = (a >> 2) as usize;
        let enc2 = (((a & 0x3) << 4) | (b >> 4)) as usize;
        let enc3 = (((b & 0xf) << 2) | (c >> 6)) as usize;
        let enc4 = (c & 0x3f) as usize;

        output.push(BASE_CHARS[enc1]);
        output.push(BASE_CHARS[enc2]);
       
        output.push(BASE_CHARS[enc3]);
        output.push(BASE_CHARS[enc4]);
    }

    let output_len = output.len();
    let padding_len = match input_len % 3 {
        1 => 2, // Add two padding
        2 => 1, // Add one paddings
        _ => 0, // No paddings needed
    };

    for i in 0..padding_len {
        output[output_len - 1 - i] = b'=';
    }

    String::from_utf8(output).unwrap()
}
  • In the for loop we retrieve the first three byte of the input. Note that the first index is always present but the second and thrid index might be missing. That’s why we used unwrap_or(&0).

For each group, we perform the necessary bitwise operations to obtain the corresponding base64 characters.

  • In the line let enc1 = (a >> 2) as usize;, we use the right shift operator (») to remove the last two bits (MSB) from the variable a. The binary representation of character R is 01010010 and by shifting it two times to the right the result is: 010100

  • in the next line let enc2 = (((a & 0x3) << 4) | (b >> 4)) as usize;, first we extract the last two bits of a by &ing it with 0x3 (11 in binary), then shift it four times to left and combine the value with first four bits of b. The enc2 has the value of 100111

  • Then we have let enc3 = (((b & 0xf) << 2) | (c >> 6)) as usize; which extracts the last four bits of b, shift it two times to right and combine the result with first two bits of c which gives us the values of 010101

  • and finally, let enc4 = (c & 0x3f) as usize; which extracts the remaining bytes from c by &ing it with 0x3f (1111 in binary). The enc4 is equal to 110011 in binary.

  • At the end of the for loop we obtain the corresponding base64 characters from BASE_CHARS and push it to output vector.

We are not done yet, we need to check if the padding is necessary or not:

let output_len = output.len();
    let padding_len = match input_len % 3 {
        1 => 2, // Add two padding
        2 => 1, // Add one paddings
        _ => 0, // No paddings needed
    };

    for i in 0..padding_len {
        output[output_len - 1 - i] = b'=';
    }

This section of the code determines the number of padding characters required and adds them to the end of the output.

Testing

Let’s write some test to test our algorithm.

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_encode() {
        let encoded = encode(b"Rust");
        assert_eq!("UnVzdA==", encoded);
    }

    #[test]
    fn encode_has_padding() {
        let encoded = encode(b"Rust");
        assert!(encoded.ends_with("=="));
    }
    
}

and the result:


running 2 tests
test base64::tests::encode_has_padding ... ok
test base64::tests::test_encode ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Summary

In this post, we achieved two goals simultaneously: understanding the base64 encoding algorithm and gained knowledge about Rust. For me, the step_by function, bitwise operators, and unwrap_or method were particularly new and useful. You can use these resources, or any others you prefer, to learn more about base64 and Rust. It’s a great exercise to try implementing the base64 decoding algorithm.

resources