Programming with the tidyverse

Let’s write some functions with dplyr!

If you want a quick introduction to creating functions with dplyr, I recommend reading this fantastic post: Programming with dplyr.

We will be using data from Peru’s 2016 prison population survey.

Importing data

library(haven)  # read_sav()
library(dplyr)  # count(), mutate()
library(tidyr)  # drop_na()

pob_pen_sel <- read_sav(
  "https://github.com/DanielSotoHurtado/Censo_Nacional_Penitenciario/raw/main/_data/pob_pen_sel.sav",
  encoding = "UTF-8",
  user_na = FALSE
)

Exploratory data analysis

So, imagine you want to do some exploratory data analysis. An easy way to avoid mistakes is to create a function that allows us to count each variable’s categories. Let’s call it eda_count_1:

eda_count_1 <- function(data, var) {
  data |>
    count(var, sort = TRUE) |>
    mutate(pct = round((n / sum(n)) * 100, digits = 2)) |>
    drop_na(var)
}

This looks quite straightforward. However, if we call the function with a bare column name, we’ll get an error.
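Here is a minimal sketch of that failure. It uses a small made-up tibble (toy, with a hypothetical column inmate_sex) instead of the survey data, and the function is renamed eda_count_naive only to keep it apart from the corrected version. tryCatch() captures the error so the script keeps running:

```r
library(dplyr)
library(tidyr)

# Made-up data for illustration; not part of the survey
toy <- tibble(inmate_sex = c("M", "M", "F", NA))

# Same body as the first attempt: var is used without embracing
eda_count_naive <- function(data, var) {
  data |>
    count(var, sort = TRUE) |>
    mutate(pct = round((n / sum(n)) * 100, digits = 2)) |>
    drop_na(var)
}

# Capture the error object instead of stopping the script
result <- tryCatch(
  eda_count_naive(toy, inmate_sex),
  error = function(e) e
)

inherits(result, "error")  # TRUE: the call failed
conditionMessage(result)   # exact wording depends on your dplyr version
```

The call fails because toy has no column called var, so R falls back to the function argument and tries to evaluate inmate_sex as a regular object, which does not exist outside the data frame.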

Curly Curly to the rescue

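The main issue is that the tidyverse makes use of tidy evaluation - a form of non-standard evaluation. Inside the function, dplyr looks for a column literally called var rather than the column you passed in. You can work around this by embracing the argument with the curly-curly operator {{ }}: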

eda_count_1 <- function(data, var) {
  data |>
    # make sure to embrace var with 'curly-curly'
    count({{ var }}, sort = TRUE) |>
    mutate(pct = round((n / sum(n)) * 100, digits = 2)) |>
    drop_na({{ var }})
}

And voilà!
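To check the embraced version without downloading the survey file, here is a small sketch on a made-up tibble. The data (toy) and its column inmate_sex are invented for illustration and are not part of the survey:

```r
library(dplyr)
library(tidyr)

# Function with var embraced in curly-curly
eda_count_1 <- function(data, var) {
  data |>
    count({{ var }}, sort = TRUE) |>
    mutate(pct = round((n / sum(n)) * 100, digits = 2)) |>
    drop_na({{ var }})
}

# Made-up data: 5 "M", 3 "F" and 2 missing values
toy <- tibble(inmate_sex = c(rep("M", 5), rep("F", 3), NA, NA))

result <- eda_count_1(toy, inmate_sex)
result
# Two rows remain after drop_na():
#   inmate_sex = "M", n = 5, pct = 50
#   inmate_sex = "F", n = 3, pct = 30
```

Note that pct is computed before the missing values are dropped, so the percentages are shares of all rows, including the NA ones. If you would rather have shares of the non-missing rows only, one option is to move drop_na() before count().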

Resources

dplyr

ggplot