Programming with the tidyverse
Let’s write some functions with dplyr!
If you want a quick introduction to creating functions with dplyr, I recommend reading the fantastic post Programming with dplyr.
We will be using data from Peru’s 2016 prison population survey.
Importing data
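library(tidyverse) # dplyr (count, mutate) and tidyr (drop_na)
library(haven)     # read_sav() for SPSS .sav files
pob_pen_sel <- read_sav("https://github.com/DanielSotoHurtado/Censo_Nacional_Penitenciario/raw/main/_data/pob_pen_sel.sav", encoding = "UTF-8", user_na = FALSE)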
Exploratory data analysis
So, imagine you want to do some exploratory data analysis. An easy way to avoid mistakes is to write a function that counts each variable’s categories. Let’s call it eda_count_1:
eda_count_1 <- function(data, var) {
  data |>
    count(var, sort = TRUE) |>
    mutate(pct = round((n / sum(n)) * 100, digits = 2)) |>
    drop_na(var)
}
This should be quite straightforward. However, if we call the function with a bare column name, we’ll get an error.
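To see the failure without downloading the survey, here is a minimal sketch on a made-up tibble. The `toy` data and its `sex` column are assumptions for illustration only and do not come from the survey. The function body is the first attempt from above, and the error is captured with `tryCatch()` so the script keeps running.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, not taken from the prison survey
toy <- tibble(sex = c("M", "M", "F", NA, "M"))

# First attempt: `var` is handed to count() without embracing it
eda_count_1 <- function(data, var) {
  data |>
    count(var, sort = TRUE) |>
    mutate(pct = round((n / sum(n)) * 100, digits = 2)) |>
    drop_na(var)
}

# Capture the error object instead of stopping the script
result <- tryCatch(
  eda_count_1(toy, sex),
  error = function(e) e
)

inherits(result, "error")  # did the call fail?
conditionMessage(result)   # the message dplyr reports
```

The call fails because `count()` treats `var` as an expression to evaluate, not as a stand-in for the column the caller supplied, and no object called `sex` exists outside the data frame.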
Curly Curly to the rescue
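The main issue is that the tidyverse makes use of tidy evaluation, a form of non-standard evaluation. Inside the function, count(var) does not know that var stands for the column you passed in. You can get around this by embracing the argument with the curly-curly brackets `{{ }}`: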
eda_count_1 <- function(data, var) {
  data |>
    # make sure to embrace var with 'curly-curly'
    count({{ var }}, sort = TRUE) |>
    mutate(pct = round((n / sum(n)) * 100, digits = 2)) |>
    drop_na({{ var }})
}
And voila!
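As a quick check of the embraced version, here is a sketch on a small made-up tibble. The `toy` data and its `sex` column are hypothetical and stand in for a survey variable; the function is the curly-curly version from above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, not taken from the prison survey
toy <- tibble(sex = c("M", "M", "F", NA, "M"))

eda_count_1 <- function(data, var) {
  data |>
    # embrace var so dplyr uses the column supplied by the caller
    count({{ var }}, sort = TRUE) |>
    mutate(pct = round((n / sum(n)) * 100, digits = 2)) |>
    drop_na({{ var }})
}

res <- eda_count_1(toy, sex)
res
# Two rows remain: "M" (n = 3, pct = 60) and "F" (n = 1, pct = 20)
```

Note that `pct` is computed before `drop_na()` removes the missing category, so the percentages are shares of all rows, including missing values, and need not add up to 100.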
Resources
dplyr
- Programming with dplyr vignette
- Bryan Shalloway Webpage
- Thomas Neitmann Blog
- Stanford Data Lab Open content
- enquo() and bang-bang (!!)