File Management With The {fs} Package

We go through common file system operations using the {fs} package
Author

Albert Rapp

Published

March 30, 2025

As data scientists, we often have to deal with tedious tasks. One of them is interacting with the file system on our computer or on the remote machine we’re working with. Thankfully, the {fs} package has a bunch of convenience functions that make our lives a whole lot easier.

Let’s check out a few examples. And if videos are more your thing, you can also watch the video version of this blog post on YouTube.

Assemble paths

Check out this data set.

library(tidyverse)
library(fs)
original_tib <- tibble(
  dir = c('some/path/blub', 'bla/here/', 'direct/'),
  file_names = c('file_a.csv', 'file_b.csv', 'file_c.txt')
)
original_tib
## # A tibble: 3 × 2
##   dir            file_names
##   <chr>          <chr>     
## 1 some/path/blub file_a.csv
## 2 bla/here/      file_b.csv
## 3 direct/        file_c.txt

Here, assembling a path in the form directory/file_name.ext can be tricky. Some directories have a trailing / and some don’t, so gluing the pieces together with paste0() or glue::glue() would either drop a separator or double it. Thankfully, the path() function from the {fs} package doesn’t care whether a trailing / is there or not.

original_tib |>
  mutate(path = path(dir, file_names))
## # A tibble: 3 × 3
##   dir            file_names path                     
##   <chr>          <chr>      <fs::path>               
## 1 some/path/blub file_a.csv some/path/blub/file_a.csv
## 2 bla/here/      file_b.csv bla/here/file_b.csv      
## 3 direct/        file_c.txt direct/file_c.txt
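
To see why plain string concatenation is fragile here, the following sketch compares paste0() with path() on the same directories and file names as above. The vectors dirs and files are just illustrative copies of the two tibble columns.

```r
library(fs)

# Illustrative copies of the two columns from original_tib
dirs <- c('some/path/blub', 'bla/here/', 'direct/')
files <- c('file_a.csv', 'file_b.csv', 'file_c.txt')

# Without a separator, the first path loses its slash
no_sep <- paste0(dirs, files)
no_sep[1]
## [1] "some/path/blubfile_a.csv"

# With a separator, directories that already end in / get a double slash
with_sep <- paste0(dirs, '/', files)
with_sep[2]
## [1] "bla/here//file_b.csv"

# path() inserts exactly one separator in every case
fs_paths <- path(dirs, files)
```

Either paste0() variant gets at least one row wrong, which is exactly the case path() takes care of.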

Remove and set extensions

We can even modify file extensions really easily. That’s convenient when we want to take input from CSV files and then turn the data into images using the same file names.

original_tib |>
  mutate(
    path = path(dir, file_names),
    out_path = path_ext_set(path, 'png')
  )
## # A tibble: 3 × 4
##   dir            file_names path                      out_path                 
##   <chr>          <chr>      <fs::path>                <fs::path>               
## 1 some/path/blub file_a.csv some/path/blub/file_a.csv some/path/blub/file_a.png
## 2 bla/here/      file_b.csv bla/here/file_b.csv       bla/here/file_b.png      
## 3 direct/        file_c.txt direct/file_c.txt         direct/file_c.png
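
The other direction works just as well. As a small sketch, path_ext_remove() drops the extension and path_ext() extracts it, while path_file() and path_dir() split a path into its file and directory parts. The path used here is simply the first row from the table above.

```r
library(fs)

p <- path('some/path/blub', 'file_a.csv')

path_ext(p)         # the extension without the dot: "csv"
path_ext_remove(p)  # some/path/blub/file_a
path_file(p)        # file_a.csv
path_dir(p)         # some/path/blub
```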

Get directory information

You can get information on a directory as a tree in the console. Here, I’m using a directory called raw-input inside my working directory to demonstrate that.

dir_tree('raw-input')
## raw-input
## ├── a
## │   └── dat.csv
## ├── b
## │   └── dat.csv
## └── c
##     └── dat.csv

You can also get lots of information on the contents of a directory.

dir_info('raw-input')
## # A tibble: 3 × 18
##   path        type    size permissions modification_time   user  group device_id
##   <fs::path>  <fct>  <fs:> <fs::perms> <dttm>              <chr> <chr>     <dbl>
## 1 raw-input/a direc…    4K rwxrwxr-x   2025-03-29 09:02:24 albe… albe…     66307
## 2 raw-input/b direc…    4K rwxrwxr-x   2025-03-29 09:04:33 albe… albe…     66307
## 3 raw-input/c direc…    4K rwxrwxr-x   2025-03-29 09:04:35 albe… albe…     66307
## # ℹ 10 more variables: hard_links <dbl>, special_device_id <dbl>, inode <dbl>,
## #   block_size <dbl>, blocks <dbl>, flags <int>, generation <dbl>,
## #   access_time <dttm>, change_time <dttm>, birth_time <dttm>
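
Because dir_info() returns a tibble, the usual {dplyr} verbs apply to it. The following is a minimal sketch that builds a throwaway directory (the name info-demo is made up for illustration) and keeps only regular files along with a few columns.

```r
library(fs)
library(dplyr)

# Hypothetical sandbox so the example does not depend on raw-input existing
root <- path(path_temp(), 'info-demo')
dir_create(path(root, c('a', 'b', 'c')))
file_create(path(root, c('a', 'b', 'c'), 'dat.csv'))

# List everything recursively, then keep files and a handful of columns
file_info_tib <- dir_info(root, recurse = TRUE) |>
  filter(type == 'file') |>
  select(path, size, modification_time)

nrow(file_info_tib)
## [1] 3
```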

But in a lot of cases, it will probably suffice to just get the file paths.

dir_ls('raw-input')
## raw-input/a raw-input/b raw-input/c

By default, dir_ls() only lists the top level of a directory. To go into nested structures, you’ll need to set recurse = TRUE.

dir_ls('raw-input', recurse = TRUE)
## raw-input/a         raw-input/a/dat.csv raw-input/b         raw-input/b/dat.csv 
## raw-input/c         raw-input/c/dat.csv
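
dir_ls() can also filter while it lists. As a sketch, the type argument restricts the result to files and the glob argument matches a wildcard pattern. The directory ls-demo below is a made-up sandbox, not part of the original example.

```r
library(fs)

# Hypothetical sandbox with two CSV files and one text file
root <- path(path_temp(), 'ls-demo')
dir_create(path(root, c('a', 'b')))
file_create(path(root, 'a', c('dat.csv', 'notes.txt')))
file_create(path(root, 'b', 'dat.csv'))

# Only regular files, no directories
all_files <- dir_ls(root, recurse = TRUE, type = 'file')

# Only paths that match the wildcard pattern
csv_only <- dir_ls(root, recurse = TRUE, glob = '*.csv')

path_file(csv_only)
```

Here all_files contains the three files and csv_only only the two CSV files, so the directories a and b never show up in either result.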

Iterate over file paths

Usually, you don’t want to stop after finding the desired paths. You want to iterate over them. For this, you can save the output of dir_ls() into a vector and iterate through it using the map() or walk() function. Here, the function I use inside of walk() will

  • load the data using the specified path,
  • create a ggplot from it, and
  • save the image.

The tricky thing here is that I want to save the images in an output directory that has the same structure as the raw-input directory. That’s why I also need to create the necessary paths and directories inside the function.

csv_files <- dir_ls(
  'raw-input',
  recurse = TRUE,
  regexp = '\\.csv$'
)

csv_files |>
  walk(
    \(file_path) {
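      # Load the data and build the plot
      plt <- read_csv(file_path) |>
        ggplot(aes(col_a, col_b)) +
        geom_point(size = 10, col = 'dodgerblue4')

      # Mirror the input path under output/ with a .png extension
      out_path <- file_path |>
        path_ext_set('png') |>
        str_replace('^raw-input', 'output')

      # Create the target directory if needed, then save this plot explicitly
      dir_create(path_dir(out_path))
      ggsave(filename = out_path, plot = plt)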
    }
  )
## Rows: 3 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): col_a, col_b, col_c
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Saving 6 x 4 in image
## Rows: 3 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): col_a, col_b, col_c
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Saving 6 x 4 in image
## Rows: 3 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): col_a, col_b, col_c
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Saving 6 x 4 in image

Splendid. This should have worked and you can now see the output directory and the plots in the file tree.

dir_tree()
## .
## ├── index.qmd
## ├── index.rmarkdown
## ├── output
## │   ├── a
## │   │   └── dat.png
## │   ├── b
## │   │   └── dat.png
## │   └── c
## │       └── dat.png
## └── raw-input
##     ├── a
##     │   └── dat.csv
##     ├── b
##     │   └── dat.csv
##     └── c
##         └── dat.csv
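
The path rewriting inside walk() can also be done with {fs} functions alone. The helper below, to_output_path(), is a hypothetical name introduced for this sketch. It assumes every input path sits under the input root and uses path_rel() in place of the str_replace() call from the code above.

```r
library(fs)

# Hypothetical helper: map a file under in_root to a .png under out_root
to_output_path <- function(file_path, in_root = 'raw-input', out_root = 'output') {
  # Path of the file relative to the input root, e.g. a/dat.csv
  rel <- path_rel(file_path, start = in_root)
  # Swap the extension and re-anchor the path under the output root
  path(out_root, path_ext_set(rel, 'png'))
}

out_path <- to_output_path('raw-input/a/dat.csv')
out_path
## output/a/dat.png
```

The result stays an fs path, so it can go straight into path_dir() and dir_create() without converting back from a plain character string.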
