File Management With The `{fs}` Package

We go through common file system operations using the {fs} package

Author

Albert Rapp

Published

March 30, 2025

As data scientists, we often have to deal with lots of tedious tasks. One of them is interacting with the file system on our own computer or on a remote machine we're working with. Thankfully, the {fs} package has a bunch of convenience functions that make our life a whole lot easier.

Let’s check out a few examples. And if videos are more your thing, you can also watch the video version of this blog post on YouTube.

Assemble paths

Check out this data set.

library(tidyverse)
library(fs)
original_tib <- tibble(
  dir = c('some/path/blub', 'bla/here/', 'direct/'),
  file_names = c('file_a.csv', 'file_b.csv', 'file_c.txt')
)
original_tib
## # A tibble: 3 × 2
##   dir            file_names
##   <chr>          <chr>     
## 1 some/path/blub file_a.csv
## 2 bla/here/      file_b.csv
## 3 direct/        file_c.txt

Here, assembling paths of the form directory/file_name.ext is tricky because some directories have a trailing / and some don't. With paste0() or glue::glue(), you would have to handle both cases yourself. Thankfully, the path() function from the {fs} package joins the pieces correctly either way.

original_tib |>
  mutate(path = path(dir, file_names))
## # A tibble: 3 × 3
##   dir            file_names path                     
##   <chr>          <chr>      <fs::path>               
## 1 some/path/blub file_a.csv some/path/blub/file_a.csv
## 2 bla/here/      file_b.csv bla/here/file_b.csv      
## 3 direct/        file_c.txt direct/file_c.txt
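To see what goes wrong with plain string pasting, here is a small sketch that joins the same directory and file names once with paste0() and once with path(). It also uses path_tidy(), which cleans up a string that was already glued together. The vectors simply mirror the tibble above.

```r
library(fs)

dirs  <- c('some/path/blub', 'bla/here/', 'direct/')
files <- c('file_a.csv', 'file_b.csv', 'file_c.txt')

# Naive concatenation doubles the slash wherever dir already ends in '/'
naive <- paste0(dirs, '/', files)
print(naive)

# path() joins the components and tidies the separators
joined <- path(dirs, files)
print(joined)

# path_tidy() repairs strings that were already pasted together
repaired <- path_tidy(naive)
print(repaired)
```

Using path() from the start is the simpler option, but path_tidy() is handy when the paths come from somewhere else.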

Remove and set extensions

We can also modify file extensions easily. That's convenient when we want to read data from CSV files and then save plots of that data as images under the same file names.

original_tib |>
  mutate(
    path = path(dir, file_names),
    out_path = path_ext_set(path, 'png')
  )
## # A tibble: 3 × 4
##   dir            file_names path                      out_path                 
##   <chr>          <chr>      <fs::path>                <fs::path>               
## 1 some/path/blub file_a.csv some/path/blub/file_a.csv some/path/blub/file_a.png
## 2 bla/here/      file_b.csv bla/here/file_b.csv       bla/here/file_b.png      
## 3 direct/        file_c.txt direct/file_c.txt         direct/file_c.png
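The example above only sets extensions. For the removing part, here is a short sketch of related helpers applied to one of the paths from above: path_ext() extracts the extension, path_ext_remove() drops it, and path_file() keeps only the file name.

```r
library(fs)

p <- path('some/path/blub', 'file_a.csv')

ext      <- path_ext(p)          # extension without the leading dot
stem     <- path_ext_remove(p)   # full path minus the extension
fname    <- path_file(p)         # file name only, directories dropped
new_path <- path_ext_set(p, 'png')

cat(ext, stem, fname, new_path, sep = '\n')
```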

Get directory information

dir_tree() prints the contents of a directory as a tree in the console. Here, I'm using a directory called raw-input inside my working directory to demonstrate that.

dir_tree('raw-input')
## raw-input
## ├── a
## │   └── dat.csv
## ├── b
## │   └── dat.csv
## └── c
##     └── dat.csv
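If you want to follow along without the original files, here is a sketch that rebuilds a raw-input folder with the same layout inside a temporary directory. The CSV contents are made up (three numeric columns named like the ones read_csv() reports further below), so treat them as placeholder data, not the author's files.

```r
library(fs)

# Temporary project root so nothing touches your real working directory
root <- file_temp('fs-demo-')
sub_dirs <- path(root, 'raw-input', c('a', 'b', 'c'))
dir_create(sub_dirs)  # also creates the intermediate folders

# Hypothetical placeholder data: three rows, three numeric columns
dat <- data.frame(col_a = c(1, 2, 3), col_b = c(2, 4, 6), col_c = c(0, 1, 0))
for (d in sub_dirs) {
  write.csv(dat, path(d, 'dat.csv'), row.names = FALSE)
}

dir_tree(path(root, 'raw-input'))
```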

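dir_info() gives you lots of information on the entries of a directory (here, the three subdirectories), such as their type, size, permissions, and modification time.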

dir_info('raw-input')
## # A tibble: 3 × 18
##   path        type    size permissions modification_time   user  group device_id
##   <fs::path>  <fct>  <fs:> <fs::perms> <dttm>              <chr> <chr>     <dbl>
## 1 raw-input/a direc…    4K rwxrwxr-x   2025-03-29 09:02:24 albe… albe…     66307
## 2 raw-input/b direc…    4K rwxrwxr-x   2025-03-29 09:04:33 albe… albe…     66307
## 3 raw-input/c direc…    4K rwxrwxr-x   2025-03-29 09:04:35 albe… albe…     66307
## # ℹ 10 more variables: hard_links <dbl>, special_device_id <dbl>, inode <dbl>,
## #   block_size <dbl>, blocks <dbl>, flags <int>, generation <dbl>,
## #   access_time <dttm>, change_time <dttm>, birth_time <dttm>
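Since dir_info() returns an ordinary tibble, you can trim it with {dplyr} like any other data. Here is a sketch that assumes a small throwaway directory in place of raw-input: recurse = TRUE goes into the subfolders and type = 'file' keeps files only.

```r
library(fs)
library(dplyr)

# Throwaway directory with two empty CSV files in subfolders
root <- file_temp('fs-info-')
dir_create(path(root, c('a', 'b')))
file_create(path(root, c('a', 'b'), 'dat.csv'))

# Keep only files and a few of the 18 columns
info <- dir_info(root, recurse = TRUE, type = 'file') |>
  select(path, size, modification_time)
print(info)
```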

But in a lot of cases, it will probably suffice to just get the file paths.

dir_ls('raw-input')
## raw-input/a raw-input/b raw-input/c

Unlike dir_tree(), dir_ls() only lists the top level by default, so you need recurse = TRUE to go into nested folders.

dir_ls('raw-input', recurse = TRUE)
## raw-input/a         raw-input/a/dat.csv raw-input/b         raw-input/b/dat.csv 
## raw-input/c         raw-input/c/dat.csv
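dir_ls() can also filter while it lists. Here is a sketch with a throwaway directory that mixes CSV and text files: type = 'file' drops the directory entries, and glob = '*.csv' keeps only CSV files (the regexp argument does the same with a regular expression).

```r
library(fs)

# Throwaway directory: two CSV files and one text file in subfolders
root <- file_temp('fs-ls-')
dir_create(path(root, c('a', 'b')))
file_create(path(root, c('a', 'b'), 'dat.csv'))
file_create(path(root, 'a', 'notes.txt'))

# Recursive listing restricted to CSV files, directories excluded
csv_only <- dir_ls(root, recurse = TRUE, type = 'file', glob = '*.csv')
print(csv_only)
```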

Iterate over file paths

Usually, you don't want to stop after finding the desired paths; you want to iterate over them. For this, you can save the output of dir_ls() into a vector and iterate through it using the map() or walk() function from {purrr}. walk() fits here because we only care about the side effect (the saved image files), not about a return value. The function I use inside of walk() loads the data from the given path, creates a ggplot from it, and saves the image.

The tricky part is that I want to save the plots in an output directory that mirrors the structure of the raw-input directory. That's why the function also builds the output path for each file and creates the necessary directories.

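csv_files <- dir_ls(
  'raw-input',
  recurse = TRUE,
  regexp = '\\.csv$'  # keep only paths ending in .csv
)

csv_files |>
  walk(
    \(file_path) {
      # Read one CSV file and turn it into a scatter plot
      plt <- read_csv(file_path) |>
        ggplot(aes(col_a, col_b)) +
        geom_point(size = 10, col = 'dodgerblue4')

      # Same relative location under output/, but with a .png extension
      out_path <- file_path |>
        path_ext_set('png') |>
        str_replace('^raw-input', 'output')

      # Create the target folder (and its parents) before saving
      dir_create(path_dir(out_path))
      ggsave(filename = out_path, plot = plt)
    }
  )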
## Rows: 3 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): col_a, col_b, col_c
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Saving 6 x 4 in image
## Rows: 3 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): col_a, col_b, col_c
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Saving 6 x 4 in image
## Rows: 3 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): col_a, col_b, col_c
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Saving 6 x 4 in image

Splendid. The output directory and its plots now show up in the file tree.

dir_tree()
## .
## ├── index.qmd
## ├── index.rmarkdown
## ├── output
## │   ├── a
## │   │   └── dat.png
## │   ├── b
## │   │   └── dat.png
## │   └── c
## │       └── dat.png
## └── raw-input
##     ├── a
##     │   └── dat.csv
##     ├── b
##     │   └── dat.csv
##     └── c
##         └── dat.csv
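The str_replace('^raw-input', 'output') step only works while every path starts with the literal text raw-input. Here is a sketch of an alternative that stays within {fs}: path_rel() strips the input folder, and path() re-roots the remainder under output. Because it doesn't rely on string matching, it also works with absolute paths from dir_ls(), as long as you pass the same folder as start. The input vector is written out by hand here to mirror the tree above.

```r
library(fs)

# Hand-written stand-in for the dir_ls() result above
csv_files <- path('raw-input', c('a', 'b', 'c'), 'dat.csv')

# Relative to raw-input, re-rooted under output, with a new extension
out_paths <- path('output', path_rel(csv_files, 'raw-input')) |>
  path_ext_set('png')
print(out_paths)
```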

