runif(100) |> round() |> mean()
## [1] 0.46
Comparing pipes: Base-R |> vs {magrittr} %>%
Beginners are sometimes confused by the fact that
- some R users use the native base-R pipe |> and
- others use the {magrittr} pipe %>%.
So in today’s video, I want to compare the two and show you the strengths and weaknesses of each one. Let’s dive in.
Keyboard shortcut
Whatever pipe you use, you should definitely use the RStudio shortcut Ctrl + Shift + M. This is much quicker than typing the pipe out. By default, the shortcut inserts the {magrittr} pipe. But you can change that in the settings (Tools > Global Options > Code, "Use native pipe operator").
Simple function chaining
The big advantage of the base-R pipe is that it can chain functions together whether or not any packages are loaded. It has been part of the language itself since R 4.1.0.
The same doesn’t work with the {magrittr} pipe because I have to load the package first.
runif(100) %>% round() %>% mean()
## Error in runif(100) %>% round() %>% mean(): could not find function "%>%"
But if I load something like the tidyverse, which makes the {magrittr} pipe available, it works fine.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
runif(100) %>% round() %>% mean()
## [1] 0.52
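If you wonder why the base pipe needs no package at all: it is handled by the R parser, not by a function. Here is a small sketch of that idea using only base R (R 4.1.0 or later). The variable names `piped` and `nested` are just for illustration.

```r
# quote() captures the code without running it
piped  <- quote(runif(100) |> round() |> mean())
nested <- quote(mean(round(runif(100))))

# The parser has already rewritten the pipe chain into nested calls
piped
same_code <- identical(piped, nested)
same_code

# The {magrittr} pipe, in contrast, stays an ordinary function call to `%>%`,
# which is why that function has to be found when the code runs
magrittr_head <- as.character(quote(runif(100) %>% round())[[1]])
magrittr_head
```

So `|>` is pure syntax, while `%>%` is a function that has to come from a loaded package.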
Form strictness
The nice thing about the {magrittr} pipe is that it isn’t as strict as the base-R pipe. For example, {magrittr} lets you drop the parentheses of a function call and use just the function name.
runif(100) %>% round # works
runif(100) |> round # error: a function call with () is enforced
## Error: The pipe operator requires a function call as RHS (<text>:2:15)
Standard scenario
I don’t think the strictness is much of a disadvantage, though. In most cases (at least in about 90% of my pipe use cases), you’ll likely use the pipe with something like mutate(), where you specify additional arguments anyway. In that scenario, both pipes work pretty much the same.
dat_with_super_long_name <- tibble(x = 1:3, y = 10:12)
dat_with_super_long_name |>
  mutate(z = x + y)
## # A tibble: 3 × 3
## x y z
## <int> <int> <int>
## 1 1 10 11
## 2 2 11 13
## 3 3 12 15
dat_with_super_long_name %>%
  mutate(z = x + y)
## # A tibble: 3 × 3
## x y z
## <int> <int> <int>
## 1 1 10 11
## 2 2 11 13
## 3 3 12 15
Using a placeholder
Fans of the original {magrittr} pipe will tell you that it’s really cool to use the . operator as a placeholder. Rightfully so, this is a neat feature.
dat_with_super_long_name %>% lm(y ~ x, data = .)
##
## Call:
## lm(formula = y ~ x, data = .)
##
## Coefficients:
## (Intercept) x
## 9 1
Initially, the base-R pipe could not pull off such a stunt. However, since R 4.2.0 it has a placeholder too: the underscore _, which has to be passed as a named argument.
dat_with_super_long_name |> lm(y ~ x, data = _)
##
## Call:
## lm(formula = y ~ x, data = dat_with_super_long_name)
##
## Coefficients:
## (Intercept) x
## 9 1
Using multiple placeholders
At this point, fans of the . operator will shout “The dot operator is even cooler. It can be used multiple times!” And they are absolutely right about that. That’s pretty dope.
And for the unenlightened: By wrapping a subsequent function call into {}, you can use the . operator as many times as you’d like over there. In each instance, . will then represent the data that went into {}.
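A minimal sketch of that trick, assuming {magrittr} is installed. I use a small stand-in data frame `dat` and a simple sum instead of a plot so that the result is easy to check; both are my choices for illustration.

```r
library(magrittr)

# Hypothetical stand-in data
dat <- data.frame(x = 1:3, y = 10:12)

# Inside {} the dot can appear as often as needed,
# and the data is NOT inserted as a first argument automatically
sums <- dat %>% {.$x + .$y}
sums
```

Every `.` inside the braces refers to `dat`, so this computes `dat$x + dat$y`.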
Sadly, the base pipe cannot do such a thing. Its strictness forbids {}.
# Error: { not allowed
dat_with_super_long_name |> {plot(_$x, _$y, cex = 3, lwd = 5)}
## Error: function '{' not supported in RHS call of a pipe (<text>:2:29)
A workaround for that would be to
- define an anonymous function with \(.),
- wrap that into parentheses, and then
- call that function.
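The three steps above can be sketched like this with base R only (R 4.1.0 or later). The stand-in data frame `dat` and the sum are again just my illustration, not the plot from the failing example.

```r
# Hypothetical stand-in data
dat <- data.frame(x = 1:3, y = 10:12)

# \(.) defines an anonymous function whose argument is called `.`,
# the outer parentheses wrap it, and the trailing () calls it
sums <- dat |> (\(.) .$x + .$y)()
sums
```

Here `.` is simply an ordinary argument name, so it can be used as many times as you like inside the function body.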
Shoutout to Isabella Velásquez’s blog post that taught me about this little trick.
Conditional flows
Now, sometimes people like to use if-statements in their pipe chains. By combining the {magrittr} pipe with curly brackets and the . operator, it could look like this.
duplicate_flag <- TRUE
duplicates <- tibble(x = 1:3, z = 21:23)
dat_with_super_long_name %>%
  {if (duplicate_flag) {
    . |> left_join(duplicates, by = 'x')
  } else {
    .
  }
  } %>%
  summarize(across(everything(), mean))
## # A tibble: 1 × 3
## x y z
## <dbl> <dbl> <dbl>
## 1 2 11 22
duplicate_flag <- FALSE
duplicates <- tibble(x = 1:3, z = 21:23)
dat_with_super_long_name %>%
  {if (duplicate_flag) {
    . |> left_join(duplicates, by = 'x')
  } else {
    .
  }
  } %>%
  summarize(across(everything(), mean))
## # A tibble: 1 × 2
## x y
## <dbl> <dbl>
## 1 2 11
In the past, I have written code like this too. Nowadays, though, I try to break out such things into their own functions. Preferably, one with a descriptive function name.
That way,
- the base-R pipe can handle this much better,
- my original chain hopefully stays short, and
- when I outsource the helper functions to a separate script, the function name hopefully still tells me what it does.
left_join_if_duplicate <- function(dat, duplicate_flag) {
  if (duplicate_flag) {
    dat |> left_join(duplicates, by = 'x')
  } else {
    dat
  }
}
duplicate_flag <- TRUE
dat_with_super_long_name |>
  left_join_if_duplicate(duplicate_flag) |>
  summarize(across(everything(), mean))
## # A tibble: 1 × 3
## x y z
## <dbl> <dbl> <dbl>
## 1 2 11 22
left_join_if_duplicate <- function(dat, duplicate_flag) {
  if (duplicate_flag) {
    dat |> left_join(duplicates, by = 'x')
  } else {
    dat
  }
}
duplicate_flag <- FALSE
dat_with_super_long_name |>
  left_join_if_duplicate(duplicate_flag) |>
  summarize(across(everything(), mean))
## # A tibble: 1 × 2
## x y
## <dbl> <dbl>
## 1 2 11
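One possible refinement, in case the helper should live in a separate script: the version above looks up `duplicates` in the global environment. Here is a sketch of a variant that takes the lookup table as an argument instead. The name `left_join_if()` and its arguments are hypothetical, and {dplyr} is assumed to be installed.

```r
library(dplyr)

# Hypothetical helper: join `lookup` onto `dat` only if `condition` is TRUE
left_join_if <- function(dat, condition, lookup, by) {
  if (!condition) {
    return(dat)
  }
  left_join(dat, lookup, by = by)
}

dat    <- tibble(x = 1:3, y = 10:12)
lookup <- tibble(x = 1:3, z = 21:23)

joined    <- dat |> left_join_if(TRUE, lookup, by = "x")
untouched <- dat |> left_join_if(FALSE, lookup, by = "x")

names(joined)
names(untouched)
```

That way the function does not depend on an object that merely happens to exist when the chain runs.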