Refactoring in Rust: Abstraction with the Newtype Pattern

February 21, 2023

The following piece of code takes a PathBuf and extracts the file name, eventually converting it to an owned String.

let file_name = function_path // => &PathBuf
    .file_name() // => Option<&OsStr>
    .and_then(|os_str| os_str.to_str()) // => Option<&str>
    .unwrap() // => &str
    .to_string(); // => String

There’s a lot going on. We need to convert over several types, from &PathBuf over &OsStr to &str to eventually String. It’s also an operation that might not contain a value, as paths might be directories, not necessarily files. Calling unwrap puts a lot of trust into this piece of code; things might go wrong!

That’s a big setup to get to the desired outcome: A String.

The same piece of code exists in a few other places of your application, which can be problematic:

What if we need to do more changes, e.g. stripping away file name endings? We need to be aware of all occurrences.
What if unwrap() works in certain occasions but suddenly doesn’t in others? Nothing tells us that this piece of code won’t panic.
What if have multiple file names we want to extract? How do we name our variables and bindings?
What if we don’t want to have String as output, but something different?
What if ownership changes and the file name should live as long as function_path?
What if the code needs to diverge in one scenario but not the other? How do you then keep track of your changes?

So many possible things that could change, and the fact that we use this piece of code multiple times in our code bases increases the chance of friction.

This calls for an abstraction!

Let’s put aside the process of getting the desired result and think about the purpose of the owned String. We want to have the name of a file. There are a lot of semantics attached to this concept that goes beyond what a simple String has to offer. And even if we don’t actually offer access to those semantics, we can still offer an abstraction in form of a type that tells us exactly what we’re dealing with.

We introduce the FileName struct, a tuple struct containing an owned String.

struct FileName(String);

FileName has no constructor, nor does it have any direct impl blocks. Its sole purpose is to carry String for us. However, we can create a FileName out of an &PathBuf, using the From trait.

impl From<&PathBuf> for FileName {
    fn from(function_path: &PathBuf) -> Self {
        let path = function_path
            .file_name()
            .and_then(|os_str| os_str.to_str())
            .unwrap()
            .to_string();
        Self(path)
    }
}

This implementation contains the actual conversion from &PathBuf to a String, however, those details are hidden from us. We don’t care about how we get to a FileName, the only thing important for us is that we have a FileName in the end.

I also want to get rid of unwrap(). This is too brittle and can cause panic if somebody uses our new type in the wrong way. We have a couple of options.

Maybe our program doesn’t actually care about existing file names but instead can work with defaults. We can describe this by implementing the Default trait for FileName, and defaulting to it if &PathBuf doesn’t contain a file name for us.

impl Default for FileName {
    fn default() -> Self {
        Self("script.py".to_string())
    }
}
impl From<&PathBuf> for FileName {
    fn from(function_path: &PathBuf) -> Self {
        let path = function_path
            .file_name()
            .and_then(|os_str| os_str.to_str());
        match path {
            Some(path) => Self(path.to_string()),
            None => Self::default()
        }
    }
}

If a real file on disk is absolutely necessary, we should use TryFrom instead of From and prepare for errors.

// An error if PathBuf has no file name for us
#[derive(Debug)]
struct FileNameError;

impl std::error::Error for FileNameError {}

impl std::fmt::Display for FileNameError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "Error: No file name found")
    }
}

// Cautious way to parse file names
impl TryFrom<&PathBuf> for FileName {
    type Error = FileNameError;

    fn try_from(function_path: &PathBuf) -> Result<Self, Self::Error> {
         let path = function_path
            .file_name()
            .and_then(|os_str| os_str.to_str())
            .ok_or(FileNameError {})?
            .to_string();
        Ok(Self(path))
    }
}

In both cases, the API speaks volumes to us. In the first case, we know that a FileName can always be created from a &PathBuf no matter what, in the other case we see from the interface that stuff might break. It’s not only an Option, but an actual Result with an Error, that points at us saying “You better deal with this, this is important!”, and “You wouldn’t unwrap a Result, would you?”. The nice thing is that we defer the decision of what to do with the Result to the point where we are actually converting. This can mean different things in different scenarios.

// Scenario 1: Defaults are OK for us
FileName::try_from(function_path).unwrap_or_default();

// Scenario 2: Bubble up the error and stop control flow
FileName::try_from(function_path)?;

Now that we converted &PathBuf to FileName, we also need some way to get to the String, which will be eventually used.

impl From<FileName> for String {
    fn from(name: FileName) -> Self {
        name.0
    }
}

The From trait indicates a conversion, which means that afterward FileName ceases to exist. That might be desired behavior, but since you’re eventually working with String, you could also think of implementing ToString to indicate that you are always getting a String representation, keeping the original struct intact.

impl ToString for FileName {
    fn to_string(&self) -> String {
        self.0.to_owned()
    }
}

Both implementations tell us a lot about their purpose. We can decide which one is the best for our scenario.

And that’s the final type. This newtype doesn’t contain any methods – yet! – but already tells us a lot about its purpose and its intent:

It can be created from &PathBuf
It owns its contents.
It can be converted to a String.
If you implemented From, you know that this conversion will never fail.
If you implemented TryFrom, you know that this is an error you need to handle.
It has a default value.

Also, we get a lot of flexibility:

Creating a new file name is as easy as calling FileName::from(...). But we stay flexible about what we want to do with the results.
If the contents or even the type of a FileName should change, we have one spot where we can work on that change.
We can name things better: let script_name = FileName::from(...) or let file_id = FileName::from(...).
If we want FileName to contain a reference and not an owned value, all the change that’s necessary to happen, happens within the bounds of our abstraction: Changing the wrapped type, defining lifetime parameters, the conversion to a string, etc.
We find all occurrences of FileName by looking for the type.

And when you create a new FileName, you use fewer lines of code. Which at least helps my attention a lot.

Those are the strengths of the newtype pattern: You create your name and slap a bunch of traits on it to explicitly and intentionally tell what the purpose of this struct is. The API speaks for itself, and it becomes much clearer what we can expect.

November 22, 2024 | Rust Unsafe for work
November 20, 2024 | Rust, Tokio Tokio: Macros
November 19, 2024 | Rust, Tokio Tokio: Channels

Refactoring in Rust: Abstraction with the Newtype Pattern

Related Articles