I’ve been learning Rust on and off for quite some time, and every time I pick it up again I learn something new. A few days ago, when I came back to it, I was looking for an educational project and decided to implement the Base64 algorithm in Rust. In this post, I will explain how to encode data to base64 in Rust. Decoding from base64 will be covered in a separate blog post, but it’s worth practicing by implementing it yourself. I’ll share the resources I used at the end of this post.

Rust Book

What is Base64?

## Theory

Now, let’s dive into the theoretical aspects of how base64 works. By the end of this section, we’ll be able to manually encode data to base64 using pen and paper.

The steps to encode a set of characters to base64 are as follows:

1. Split the input, typically a sequence of bytes, into groups of three bytes.
2. Each group of three bytes corresponds to a total of 24 bits.
3. Split the 24 bits into four 6-bit groups.
4. Map each 6-bit group to its corresponding base64 character from the character set: A-Z, a-z, 0-9, +, and /.
5. If the number of bytes is not divisible by three, add the padding character ‘=’.

Let’s illustrate this process by manually converting the text ‘Rust’ into base64.

1. Split the input into groups of three bytes. The first group is ‘Rus’: `R` = `01010010`, `u` = `01110101`, `s` = `01110011`.
2. In the next step, we combine the binary values and split them into four 6-bit groups: `010100`, `100111`, `010101`, `110011`.
3. Now, let’s map each group to its corresponding character using the standard base64 character mapping based on RFC 4648. The four groups are the indices 20, 39, 21, and 51. [Image: base64 index table, from Wikipedia]
4. The base64 output for ‘Rus’ is `UnVz`.
5. Next, we continue the same steps for the remaining text. In our case, the only character left is ‘t’ (`01110100`), which splits into `011101` and `00`. Note that if the last group doesn’t have enough bits, we fill it with 0s until it reaches 6 bits in length, so the groups become `011101` (index 29, `d`) and `000000` (index 0, `A`).
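The pen-and-paper steps can also be checked with a few lines of Rust. This is only a sketch: the helper name `sextets` and the idea of packing each group into a `u32` are illustrative assumptions, not part of the implementation developed later in this post.

```rust
const ALPHABET: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Splits a group of one to three bytes into 6-bit values.
/// Missing low bits in the last value are zero-filled.
/// (`sextets` is a hypothetical helper name used only for this illustration.)
fn sextets(group: &[u8]) -> Vec<u8> {
    // Pack the bytes into the top of a 24-bit window held in a u32.
    let mut bits: u32 = 0;
    for (i, &byte) in group.iter().enumerate() {
        bits |= (byte as u32) << (16 - 8 * i);
    }
    // n bytes carry 8n bits, which need ceil(8n / 6) six-bit values.
    let count = (group.len() * 8 + 5) / 6;
    (0..count)
        .map(|i| ((bits >> (18 - 6 * i)) & 0x3f) as u8)
        .collect()
}

fn main() {
    for group in b"Rust".chunks(3) {
        let values = sextets(group);
        let binary: Vec<String> = values.iter().map(|v| format!("{:06b}", v)).collect();
        let chars: String = values.iter().map(|&v| ALPHABET[v as usize] as char).collect();
        println!("{:?} -> {:?} -> {}", binary, values, chars);
    }
}
```

For ‘Rus’ the values are 20, 39, 21, and 51, and for the leftover ‘t’ they are 29 and 0, which matches the manual conversion.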

So far, our output is `UnVzdA`, but because the input length (4 bytes) is not divisible by three, we need to add the padding character `=` until the output length becomes a multiple of four.

To calculate how many paddings we need, use the following formula:

```
r = len(input) % 3
```
• if `r` is equal to 1: we need to add two padding characters
• if `r` is equal to 2: we need to add one padding character
• if `r` is equal to 0: no padding is needed

Since `4 % 3 = 1`, we need to add two padding characters at the end, and the final output is `UnVzdA==`.
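The three cases of the formula can be folded into a single expression. The sketch below assumes a hypothetical helper named `padding_len`; it only restates the rule above and is not taken from the final implementation.

```rust
/// Number of `=` characters needed for an input of `len` bytes.
/// (`padding_len` is a hypothetical helper name used for illustration.)
fn padding_len(len: usize) -> usize {
    // len % 3 == 1 -> 2, len % 3 == 2 -> 1, len % 3 == 0 -> 0
    (3 - len % 3) % 3
}

fn main() {
    for len in 0..=6 {
        println!("{} byte(s) -> {} padding character(s)", len, padding_len(len));
    }
}
```

For the four bytes of ‘Rust’ this gives two padding characters, in line with `UnVzdA==`.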

Now that we understand the manual conversion process, let’s implement these steps in Rust.

## Implementation

We’ll start by creating a function called `encode` in a file named `base64.rs` that takes a series of bytes as input and returns the encoded output as a string:

```rust
// file base64.rs

fn encode(input: &[u8]) -> String {
    const BASE_CHARS: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    let mut output = Vec::new();
    let input_len = input.len();

    // Walk the input three bytes at a time.
    for i in (0..input_len).step_by(3) {
        // The first byte of a group always exists; the others default to 0.
        let a = input.get(i).unwrap();
        let b = input.get(i + 1).unwrap_or(&0);
        let c = input.get(i + 2).unwrap_or(&0);

        // Split the 24 bits into four 6-bit indices.
        let enc1 = (a >> 2) as usize;
        let enc2 = (((a & 0x3) << 4) | (b >> 4)) as usize;
        let enc3 = (((b & 0xf) << 2) | (c >> 6)) as usize;
        let enc4 = (c & 0x3f) as usize;

        output.push(BASE_CHARS[enc1]);
        output.push(BASE_CHARS[enc2]);
        output.push(BASE_CHARS[enc3]);
        output.push(BASE_CHARS[enc4]);
    }

    let output_len = output.len();
    let padding_len = match input_len % 3 {
        1 => 2, // Add two paddings
        2 => 1, // Add one padding
        _ => 0, // No paddings needed
    };

    // Overwrite the trailing characters with the padding character.
    for i in 0..padding_len {
        output[output_len - 1 - i] = b'=';
    }

    String::from_utf8(output).unwrap()
}
```
• In the `for` loop, we retrieve the next three bytes of the `input`. Note that the first index is always present, but the second and third might be missing. That’s why we use `unwrap_or(&0)`.

For each group, we perform the necessary bitwise operations to obtain the corresponding base64 characters.

• In the line `let enc1 = (a >> 2) as usize;`, we use the right shift operator (`>>`) to drop the two least significant bits of `a`, which keeps its top six bits. The binary representation of the character `R` is `01010010`, and shifting it right by two gives `010100`.

• In the next line, `let enc2 = (((a & 0x3) << 4) | (b >> 4)) as usize;`, we first extract the last two bits of `a` by `&`ing it with `0x3` (`11` in binary), then shift them left by four and combine the result with the first four bits of `b`. For ‘Rus’, `enc2` has the value `100111`.

• Then we have `let enc3 = (((b & 0xf) << 2) | (c >> 6)) as usize;`, which extracts the last four bits of `b`, shifts them left by two, and combines the result with the first two bits of `c`, which gives us the value `010101`.

• Finally, `let enc4 = (c & 0x3f) as usize;` extracts the remaining six bits of `c` by `&`ing it with `0x3f` (`111111` in binary). `enc4` is equal to `110011` in binary.

• At the end of the `for` loop, we look up the corresponding base64 characters in `BASE_CHARS` and push them to the output vector.
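To double-check the bit arithmetic described in the bullets, the four expressions can be run on their own for the group ‘Rus’. This is a minimal sketch; `split_group` is a hypothetical helper that reuses the same shifts and masks as `encode`, taking plain `u8` values instead of references.

```rust
/// Splits one three-byte group into four 6-bit indices.
/// (`split_group` is a hypothetical helper name used for illustration.)
fn split_group(a: u8, b: u8, c: u8) -> [u8; 4] {
    [
        a >> 2,                      // top six bits of a
        ((a & 0x3) << 4) | (b >> 4), // low two bits of a, then top four bits of b
        ((b & 0xf) << 2) | (c >> 6), // low four bits of b, then top two bits of c
        c & 0x3f,                    // low six bits of c
    ]
}

fn main() {
    let [enc1, enc2, enc3, enc4] = split_group(b'R', b'u', b's');
    assert_eq!(format!("{:06b}", enc1), "010100");
    assert_eq!(format!("{:06b}", enc2), "100111");
    assert_eq!(format!("{:06b}", enc3), "010101");
    assert_eq!(format!("{:06b}", enc4), "110011");
    // prints "010100 100111 010101 110011"
    println!("{:06b} {:06b} {:06b} {:06b}", enc1, enc2, enc3, enc4);
}
```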

We are not done yet. We still need to check whether padding is necessary:

```rust
let output_len = output.len();
let padding_len = match input_len % 3 {
    1 => 2, // Add two paddings
    2 => 1, // Add one padding
    _ => 0, // No paddings needed
};

for i in 0..padding_len {
    output[output_len - 1 - i] = b'=';
}
```

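This section of the code determines the number of padding characters required and overwrites that many characters at the end of the output with `=`. The overwritten characters are always `A`, because they come from the zero bytes substituted by `unwrap_or(&0)`.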

## Testing

Let’s write some tests for our algorithm.

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_encode() {
        let encoded = encode(b"Rust");
        assert_eq!("UnVzdA==", encoded);
    }

    #[test]
    fn test_encode_padding() {
        // The name of this test is illustrative.
        let encoded = encode(b"Rust");
        assert!(encoded.ends_with("=="));
    }
}
```
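For broader coverage, the encoder can also be checked against the test vectors listed in RFC 4648 (section 10). The sketch below repeats `encode` so that it compiles on its own, and it runs the checks from `main` rather than from the `tests` module. The two padding arms in the `match` follow the rule from the Theory section.

```rust
fn encode(input: &[u8]) -> String {
    const BASE_CHARS: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    let mut output = Vec::new();
    let input_len = input.len();

    for i in (0..input_len).step_by(3) {
        let a = input.get(i).unwrap();
        let b = input.get(i + 1).unwrap_or(&0);
        let c = input.get(i + 2).unwrap_or(&0);

        output.push(BASE_CHARS[(a >> 2) as usize]);
        output.push(BASE_CHARS[(((a & 0x3) << 4) | (b >> 4)) as usize]);
        output.push(BASE_CHARS[(((b & 0xf) << 2) | (c >> 6)) as usize]);
        output.push(BASE_CHARS[(c & 0x3f) as usize]);
    }

    let output_len = output.len();
    let padding_len = match input_len % 3 {
        1 => 2,
        2 => 1,
        _ => 0,
    };
    for i in 0..padding_len {
        output[output_len - 1 - i] = b'=';
    }

    String::from_utf8(output).unwrap()
}

fn main() {
    // Test vectors from RFC 4648, section 10.
    let vectors = [
        ("", ""),
        ("f", "Zg=="),
        ("fo", "Zm8="),
        ("foo", "Zm9v"),
        ("foob", "Zm9vYg=="),
        ("fooba", "Zm9vYmE="),
        ("foobar", "Zm9vYmFy"),
    ];
    for (input, expected) in vectors {
        assert_eq!(encode(input.as_bytes()), expected);
    }
    println!("all RFC 4648 vectors passed");
}
```

The empty input is worth including because it exercises the case where the loop never runs and no padding is written.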

Running the two tests in the `tests` module gives:

```
running 2 tests
```

In this post, we achieved two goals simultaneously: understanding the base64 encoding algorithm and gaining knowledge about Rust. For me, the `step_by` function, bitwise operators, and the `unwrap_or` method were particularly new and useful. You can use these resources, or any others you prefer, to learn more about base64 and Rust. It’s a great exercise to try implementing the base64 decoding algorithm.