Rust Macros and Crate Exploration: auto_ops

2020-03-28 // 10 minutes

We're going to dive into the crate auto_ops, which is a fork and successor of impl_ops. But first, if you need a link to the documentation on Rust macros, please refer to this link. I've also found this cheatsheet on the different types of fragment specifiers (used in macros) to be a handy reference.

#[macro_export]
macro_rules! hello {
    () => {
        println!("Hello!");
    };
}

Let's start out with a dead simple macro. We're cheating a little here by using the println! macro within our macro, but you should have used this plenty of times before reading this blog post. As a reminder, #[macro_export] makes the macro available whenever the crate that holds it is brought into scope. Without this annotation, the macro cannot be used from other crates. macro_rules! is Rust's built-in construct for declaring macros, hello is the name of our macro, and you should also note that our macro currently takes no arguments. One last syntactic note: although the macro's definition does not include the exclamation point, you would still call the macro as hello!().
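To make the call syntax concrete, here is a minimal sketch. The greeting! macro and the greetings function are hypothetical names of my own; the macro is a variant of hello! that expands to a String rather than printing, purely so the result can be inspected.

```rust
// A variant of `hello!` that expands to an expression instead of printing.
// `greeting!` is a hypothetical name used only for this sketch.
macro_rules! greeting {
    () => {
        String::from("Hello!")
    };
}

// The definition has no `!`, but every call site does. Parentheses,
// square brackets, and braces are all accepted as call delimiters.
fn greetings() -> [String; 3] {
    [greeting!(), greeting![], greeting! {}]
}
```

By convention, parentheses are used for function-like macros such as this one.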

#[macro_export]
macro_rules! adder {
    ($e1:expr, $e2:expr) => {
        let e1: u32 = $e1;
        let e2: u32 = $e2;
        let total: u32 = e1 + e2;
        println!("{} + {} = {}", e1, e2, total);
    };
}

Let's start ramping up the complexity a bit. Our new macro, adder, now has two parameters - $e1 and $e2. In macro parlance, these are called metavariables, and the part after the colon is the fragment specifier. $e1 and $e2 both use the expr specifier, which is unsurprisingly short for expression. Some of the other fragment specifiers are less easy to guess, so I recommend referring to the metavariables section of the documentation.
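As a small sketch of what the expr specifier buys you, the variant below returns the sum instead of printing it. The names add_two! and add_two_examples are hypothetical and exist only for this illustration.

```rust
// Same shape as the two-argument `adder!`, but the expansion is a block
// expression that evaluates to the sum instead of printing it.
// `add_two!` is a hypothetical name used only for this sketch.
macro_rules! add_two {
    ($e1:expr, $e2:expr) => {{
        let e1: u32 = $e1;
        let e2: u32 = $e2;
        e1 + e2
    }};
}

fn add_two_examples() -> (u32, u32) {
    // Each argument is parsed as one complete expression before it is
    // substituted, so `1 + 2` and `4 * 5` stay grouped inside the body.
    let simple = add_two!(2, 3);
    let nested = add_two!(1 + 2, 4 * 5);
    (simple, nested)
}
```

The double braces matter here: the outer pair delimits the rule's body, and the inner pair makes the expansion a single block expression.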

#[macro_export]
macro_rules! adder {
    ( $($e:expr),* ) => {
        {
            let mut total: u32 = 0;
            $(
                total += $e;
            )*
            total
        }
    };
}

One last contrived example before we dive into the auto_ops code. Here, we've modified our adder macro to take a variable number of input parameters. Notice the * in the parameters? That tells the macro to expect zero or more input parameters. You can also specify that you want one or more by using a + symbol instead of *. The third option is the ? symbol, which marks an optional fragment with zero or one occurrences. You should also notice that the * symbol is used in the body of the macro. There, it expands to one += operation per argument.
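The three repetition operators can be compared side by side in a short sketch. The macro names sum_any!, sum_some!, and plus_optional! are hypothetical and chosen only for this illustration.

```rust
// `*`: zero or more comma-separated expressions.
macro_rules! sum_any {
    ( $($e:expr),* ) => {{
        // With zero arguments nothing mutates `total`, so silence the lint.
        #[allow(unused_mut)]
        let mut total: u32 = 0;
        $( total += $e; )*
        total
    }};
}

// `+`: at least one expression is required; `sum_some!()` would not compile.
macro_rules! sum_some {
    ( $($e:expr),+ ) => {{
        let mut total: u32 = 0;
        $( total += $e; )+
        total
    }};
}

// `?`: the second argument (and its leading comma) may appear at most once.
macro_rules! plus_optional {
    ( $base:expr $(, $extra:expr)? ) => {
        $base $( + $extra )?
    };
}
```

Note that the repetition operator used in the body has to match the one used in the matcher for the same metavariable.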

With the absolute basics under our belt, let's shift our focus back to auto_ops and operator overloading. To briefly recap our problem space, we are going to define a custom struct - Vec3f - and we want to overload operators to perform vector math rather than primitive math. This is fairly straightforward in most languages, but Rust (at least through the current 2018 edition) complicates this matter by treating a borrowed Vec3f - or &Vec3f - as a separate type. Ergo, we have to account for four different combinations of operand types per overloaded operation.
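To see what those four combinations look like when written by hand, here is a sketch. The Vec3f definition (three f32 fields) and the add_components helper are my assumptions; the post never shows the struct itself.

```rust
use std::ops::Add;

// Hypothetical definition; assumed to be three f32 components.
#[derive(Debug, Clone, PartialEq)]
struct Vec3f {
    x: f32,
    y: f32,
    z: f32,
}

// Hypothetical helper so the four impls differ only in how they borrow.
fn add_components(a: &Vec3f, b: &Vec3f) -> Vec3f {
    Vec3f { x: a.x + b.x, y: a.y + b.y, z: a.z + b.z }
}

// Vec3f + Vec3f
impl Add<Vec3f> for Vec3f {
    type Output = Vec3f;
    fn add(self, rhs: Vec3f) -> Vec3f { add_components(&self, &rhs) }
}

// Vec3f + &Vec3f
impl Add<&Vec3f> for Vec3f {
    type Output = Vec3f;
    fn add(self, rhs: &Vec3f) -> Vec3f { add_components(&self, rhs) }
}

// &Vec3f + Vec3f
impl Add<Vec3f> for &Vec3f {
    type Output = Vec3f;
    fn add(self, rhs: Vec3f) -> Vec3f { add_components(self, &rhs) }
}

// &Vec3f + &Vec3f
impl Add<&Vec3f> for &Vec3f {
    type Output = Vec3f;
    fn add(self, rhs: &Vec3f) -> Vec3f { add_components(self, rhs) }
}
```

Multiply that boilerplate by every operator you want to support and the appeal of generating it with a macro becomes clear.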

For our example below, we're just going to consider vector addition where C = A + B such that C.x = A.x + B.x and so forth.

impl_op_ex!(+ |a: &Vec3f, b: &Vec3f| -> Vec3f { 
	Vec3f {
		x: a.x + b.x,
		y: a.y + b.y,
		z: a.z + b.z
	}
});

#[macro_export]
macro_rules! impl_op_ex {
    ($op:tt |$lhs_i:ident : &mut $lhs:path, $rhs_i:ident : &$rhs:path| $body:block) => (
        $crate::_parse_assignment_op!($op, $lhs, &$rhs, lhs, rhs, {|$lhs_i : &mut $lhs, $rhs_i : &$rhs| -> () {$body} (lhs, rhs);});
        $crate::_parse_assignment_op!($op, $lhs, $rhs, lhs, rhs, {|$lhs_i : &mut $lhs, $rhs_i : &$rhs| -> () {$body} (lhs, &rhs);});
    );
    ($op:tt |$lhs_i:ident : &mut $lhs:path, $rhs_i:ident : $rhs:path| $body:block) => (
        $crate::_parse_assignment_op!($op, $lhs, $rhs, lhs, rhs, {|$lhs_i : &mut $lhs, $rhs_i : $rhs| -> () {$body} (lhs, rhs);});
    );
    ($op:tt |$lhs_i:ident : &$lhs:path| -> $out:path $body:block) => (
        $crate::_parse_unary_op!($op, &$lhs, $out, lhs, {|$lhs_i : &$lhs| -> $out {$body} (lhs)});
        $crate::_parse_unary_op!($op, $lhs, $out, lhs, {|$lhs_i : &$lhs| -> $out {$body} (&lhs)});
    );
    ($op:tt |$lhs_i:ident : &$lhs:path, $rhs_i:ident : &$rhs:path| -> $out:path $body:block) => (
        $crate::impl_op!($op |$lhs_i : &$lhs, $rhs_i : &$rhs| -> $out $body);
        $crate::_parse_binary_op!($op, &$lhs, $rhs, $out, lhs, rhs, {|$lhs_i : &$lhs, $rhs_i : &$rhs| -> $out {$body} (lhs, &rhs)});
        $crate::_parse_binary_op!($op, $lhs, &$rhs, $out, lhs, rhs, {|$lhs_i : &$lhs, $rhs_i : &$rhs| -> $out {$body} (&lhs, rhs)});
        $crate::_parse_binary_op!($op, $lhs, $rhs, $out, lhs, rhs, {|$lhs_i : &$lhs, $rhs_i : &$rhs| -> $out {$body} (&lhs, &rhs)});
    );
    ($op:tt |$lhs_i:ident : &$lhs:path, $rhs_i:ident : $rhs:path| -> $out:path $body:block) => (
        $crate::impl_op!($op |$lhs_i : &$lhs, $rhs_i : $rhs| -> $out $body);
        $crate::_parse_binary_op!($op, $lhs, $rhs, $out, lhs, rhs, {|$lhs_i : &$lhs, $rhs_i : $rhs| -> $out {$body} (&lhs, rhs)});
    );
    ($op:tt |$lhs_i:ident : $lhs:path|  -> $out:path $body:block) => (
        $crate::_parse_unary_op!($op, $lhs, $out, lhs, {|$lhs_i : $lhs| -> $out {$body} (lhs)});
    );
    ($op:tt |$lhs_i:ident : $lhs:path, $rhs_i:ident : &$rhs:path| -> $out:path $body:block) => (
        $crate::impl_op!($op |$lhs_i : $lhs, $rhs_i : &$rhs| -> $out $body);
        $crate::_parse_binary_op!($op, $lhs, $rhs, $out, lhs, rhs, {|$lhs_i : $lhs, $rhs_i : &$rhs| -> $out {$body} (lhs, &rhs)});
    );
    ($op:tt |$lhs_i:ident : $lhs:path, $rhs_i:ident : $rhs:path| -> $out:path $body:block) => (
        $crate::impl_op!($op |$lhs_i : $lhs, $rhs_i : $rhs| -> $out $body);
    );
}

In the first code block, we have our call to impl_op_ex!. In the second, we've posted the macro definition from the crate's GitHub repository. The first thing that should stand out to you in the macro definition is that there are multiple implementations (rules, in macro_rules! terms). This is quite handy as it allows us to cover multiple slightly different use cases all within one macro.

While having multiple implementations is helpful, it also obscures which path our macro invocation takes. We can reasonably guess, though, that our vector addition code is eventually going to call _parse_binary_op!, which narrows the paths down to three.

Let's break it down piece by piece. Every implementation starts with the operator $op, which is a tt, or token tree, fragment - this aligns with our "+" operator. The pipe symbols are not macro-specific; they're literal tokens that mimic closure syntax and demarcate the function signature from its block.

If you look at our macro invocation, you'll notice that a : &Vec3f maps to $lhs_i:ident : &$lhs:path - with a similar mapping for the second parameter. Looking at the fragments, both ident and path are new to us. Ident is either an identifier - read: a variable name - or a keyword. Path refers to a TypePath. Put another way, it's the name of a struct. Path becomes involved because a type identifier can be path::to::some::crate::IAmAnIdentifier. Just as path is used for the input parameters, it's also used for the return type. The last piece of the puzzle is the block fragment type - which, as you likely guessed, is just the body of the function.
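The sketch below shows those three fragment types working together. The make_fn! macro, the geometry module, and the add_vecs function are all hypothetical; the matcher copies the shape of the borrowed/borrowed rule but expands to a plain function rather than a trait impl.

```rust
// Hypothetical module so the `path` fragments have more than one segment.
mod geometry {
    #[derive(Debug, PartialEq)]
    pub struct Vec3f {
        pub x: f32,
        pub y: f32,
        pub z: f32,
    }
}

// Hypothetical macro: accepts a name followed by the same closure-like
// shape as the borrowed/borrowed rule, and emits an ordinary function.
macro_rules! make_fn {
    ($name:ident |$lhs_i:ident : &$lhs:path, $rhs_i:ident : &$rhs:path| -> $out:path $body:block) => {
        // ident -> parameter names, path -> types, block -> function body.
        fn $name($lhs_i: &$lhs, $rhs_i: &$rhs) -> $out $body
    };
}

make_fn!(add_vecs |a: &geometry::Vec3f, b: &geometry::Vec3f| -> geometry::Vec3f {
    geometry::Vec3f {
        x: a.x + b.x,
        y: a.y + b.y,
        z: a.z + b.z,
    }
});
```

Because a and b are supplied at the call site as ident fragments, the block passed in can refer to them by name.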

The first part of the body calls impl_op!; however, that ends up calling more functions from impl_op_ex. Since we're not covering every function and the second portion reaches the same function, we'll just cover that. If you switch over to the binary.rs file, we can pick up on the second line's call to _parse_binary_op!.

#[macro_export]
macro_rules! _parse_binary_op {
    (+, $($t:tt)+) => ($crate::_impl_binary_op_internal!(Add, add, $($t)+););
    (-, $($t:tt)+) => ($crate::_impl_binary_op_internal!(Sub, sub, $($t)+););
    (*, $($t:tt)+) => ($crate::_impl_binary_op_internal!(Mul, mul, $($t)+););
    (/, $($t:tt)+) => ($crate::_impl_binary_op_internal!(Div, div, $($t)+););
    (%, $($t:tt)+) => ($crate::_impl_binary_op_internal!(Rem, rem, $($t)+););
    (&, $($t:tt)+) => ($crate::_impl_binary_op_internal!(BitAnd, bitand, $($t)+););
    (|, $($t:tt)+) => ($crate::_impl_binary_op_internal!(BitOr, bitor, $($t)+););
    (^, $($t:tt)+) => ($crate::_impl_binary_op_internal!(BitXor, bitxor, $($t)+););
    (<<, $($t:tt)+) => ($crate::_impl_binary_op_internal!(Shl, shl, $($t)+););
    (>>, $($t:tt)+) => ($crate::_impl_binary_op_internal!(Shr, shr, $($t)+););
}

Remember the + symbol that I mentioned before? And how it means the macro expects one or more? It's being used here as a shorthand declaration of "the rest of the variables". This would be especially useful where the implementations don't take the same number of arguments. However, every option in _impl_binary_op_internal! has the same number of arguments - so this is purely used for brevity. I normally prefer my code - especially my metaprogramming code - to be more explicit. However, I like that this communicates that this function serves as a passthrough to add the $ops_trait and $ops_fn (needed for operator overloading) to the parameter list.
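Here is a stripped-down sketch of that passthrough pattern. Both macro names, parse_op! and apply_op!, are hypothetical, and the final stage does plain arithmetic rather than generating a trait impl.

```rust
// Final stage: receives the prepended name plus the forwarded operands.
// `apply_op!` is a hypothetical name used only for this sketch.
macro_rules! apply_op {
    (add, $a:expr, $b:expr) => { $a + $b };
    (sub, $a:expr, $b:expr) => { $a - $b };
}

// Dispatch stage: matches only the operator token, then forwards everything
// after the comma untouched as a sequence of token trees.
macro_rules! parse_op {
    (+, $($t:tt)+) => { apply_op!(add, $($t)+) };
    (-, $($t:tt)+) => { apply_op!(sub, $($t)+) };
}
```

parse_op! never needs to know how many arguments follow the operator or what they look like; only the final stage parses them as expressions.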

#[macro_export]
macro_rules! _impl_binary_op_internal {
    ($ops_trait:ident, $ops_fn:ident, &$lhs:ty, &$rhs:ty, $out:ty, $lhs_i:ident, $rhs_i:ident, $body:block) => {
        $crate::_impl_binary_op_borrowed_borrowed!(
            $ops_trait, $ops_fn, $lhs, $rhs, $out, $lhs_i, $rhs_i, $body
        );
    };
    ($ops_trait:ident, $ops_fn:ident, &$lhs:ty, $rhs:ty, $out:ty, $lhs_i:ident, $rhs_i:ident, $body:block) => {
        $crate::_impl_binary_op_borrowed_owned!(
            $ops_trait, $ops_fn, $lhs, $rhs, $out, $lhs_i, $rhs_i, $body
        );
    };
    ($ops_trait:ident, $ops_fn:ident, $lhs:ty, &$rhs:ty, $out:ty, $lhs_i:ident, $rhs_i:ident, $body:block) => {
        $crate::_impl_binary_op_owned_borrowed!(
            $ops_trait, $ops_fn, $lhs, $rhs, $out, $lhs_i, $rhs_i, $body
        );
    };
    ($ops_trait:ident, $ops_fn:ident, $lhs:ty, $rhs:ty, $out:ty, $lhs_i:ident, $rhs_i:ident, $body:block) => {
        $crate::_impl_binary_op_owned_owned!(
            $ops_trait, $ops_fn, $lhs, $rhs, $out, $lhs_i, $rhs_i, $body
        );
    };
}
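The order of those four rules matters, since rules are tried from top to bottom and the first match wins. The sketch below uses a hypothetical operand_kinds! macro with the same four matchers, reduced to the two operand types, to report which rule was selected.

```rust
// Hypothetical macro: same borrowed/owned matchers as above, but each rule
// just evaluates to the name of the case it represents.
macro_rules! operand_kinds {
    (&$lhs:ty, &$rhs:ty) => { "borrowed_borrowed" };
    (&$lhs:ty, $rhs:ty) => { "borrowed_owned" };
    ($lhs:ty, &$rhs:ty) => { "owned_borrowed" };
    // Must come last: `$lhs:ty` on its own also matches a reference type
    // such as `&u32`, so this rule would swallow the three cases above.
    ($lhs:ty, $rhs:ty) => { "owned_owned" };
}
```

Peeling off the leading & as a literal token is what lets each rule hand the bare type on to the next macro.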

By now, _impl_binary_op_internal! should be fairly easy to follow. We know that we have four cases to handle depending on whether the left-hand operand and/or the right-hand operand are borrowed. We don't need to dive into all four macro definitions as they're pretty similar. However, let's take a look at _impl_binary_op_owned_borrowed! next to a naive definition of operator overloading for comparison purposes.

#[macro_export]
macro_rules! _impl_binary_op_owned_borrowed {
    ($ops_trait:ident, $ops_fn:ident, $lhs:ty, $rhs:ty, $out:ty, $lhs_i:ident, $rhs_i:ident, $body:block) => {
        impl ::std::ops::$ops_trait<&$rhs> for $lhs {
            type Output = $out;

            fn $ops_fn(self, $rhs_i: &$rhs) -> Self::Output {
                let $lhs_i = self;
                $body
            }
        }
    };
}

impl ::std::ops::Add for Vec3f {
	type Output = Vec3f;

	fn add(self, other: Vec3f) -> Vec3f {
		Vec3f {
			x : self.x + other.x,
			y : self.y + other.y,
			z : self.z + other.z
		}
	}
}

Having the naive version should help you see why we needed the passthrough macro to prepend $ops_trait (Add) and $ops_fn (add). The rest should be a straightforward read: the body gets passed through no matter what so you can define how your custom structs are combined, the input types are defined for every combination of being borrowed or not, and the output type is also a simple passthrough.
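To watch the pieces slot together, the sketch below invokes a local copy of the owned/borrowed rule directly. The name impl_owned_borrowed! and the Vec3f definition are my assumptions; the rule body itself is copied from the listing above with #[macro_export] dropped. In the real crate you would never call this stage by hand.

```rust
// Hypothetical definition; assumed to be three f32 components.
#[derive(Debug, PartialEq)]
struct Vec3f {
    x: f32,
    y: f32,
    z: f32,
}

// Local copy of the owned/borrowed rule under a hypothetical name.
macro_rules! impl_owned_borrowed {
    ($ops_trait:ident, $ops_fn:ident, $lhs:ty, $rhs:ty, $out:ty, $lhs_i:ident, $rhs_i:ident, $body:block) => {
        impl ::std::ops::$ops_trait<&$rhs> for $lhs {
            type Output = $out;

            fn $ops_fn(self, $rhs_i: &$rhs) -> Self::Output {
                let $lhs_i = self;
                $body
            }
        }
    };
}

// Add / add fill in the trait and method names; a and b become the names
// the body uses for the owned left operand and the borrowed right operand.
impl_owned_borrowed!(Add, add, Vec3f, Vec3f, Vec3f, a, b, {
    Vec3f {
        x: a.x + b.x,
        y: a.y + b.y,
        z: a.z + b.z,
    }
});
```

The expansion has the same shape as the naive impl, except that the right operand is &Vec3f and the body uses the caller's variable names.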

Whoo, that was a lot just to get in some operator overloading. I try to avoid a prescriptivist view on what a language should do when I don't have a solution in mind - especially since compilers aren't currently one of my strong suits. However, I do hope that the Rust compiler team finds a more elegant solution to this use case. In most systems programming languages I've encountered, you end up fighting compile times as the project grows. I'd be very curious to see how this macro expansion impacts larger Rust projects.

Hopefully you found this blog post as useful to read as I found it to write. This crate is a clever bit of work, despite my aforementioned concerns about whether macros are the best holistic solution.