Working with .sav files labels

less than 1 minute read

Published:

Ever wondered how to work with labelled data? Specifically, how to make use of variables metadata from sav files? With the sjlabelled package you can work with ‘value labels’ and add it to your variable’s factor levels. Just use the command sjlabelled::as_label:

Loading libraries

# For importing data an data cleaning
library(haven)
library(tidyverse)
# For label magic
library(sjPlot)
library(sjlabelled)

We will be using data from Peru’s 2016 population prison survey.

Importing data

pob_pen_raw <- read_sav("https://github.com/DanielSotoHurtado/Censo_Nacional_Penitenciario/raw/main/_data/pob_pen_sel.sav", encoding = "UTF-8")

Selecting variables

selected_var <- c('ID_CARATULA', 'DD', 'EST_PENIT', 'GENERO', 'EDAD', 'NACIONALIDAD', 'DELITO_GENERICO',
                    'DELITO_ESPECIFICO', 'P104_1',
                    'P119_1', 'P119_2', 'P119_3', 'P119_4', 'P119_5', 'P119_6',
                    'P109_1','P109_2', 'P126', 'P127', 'P128', 'P129',
                    'P133', 'P135', 'P136',
                    'P216'
                    )
pob_pen_sel <- pob_pen_raw |>
  select(all_of(selected_var))

Cleaning data


Let’s clean our data with dplyr across and sjlabelled as_label. Let’s apply it for all of our selected variables

sel_clean <- pob_pen_sel  |> 
  mutate(across(.cols = all_of(selected_var),
                .fns = ~ as_label(.x)),
         across(.cols = all_of(selected_var),
                .fns = ~ str_replace_all(.x, "\\?", "" )),
         across(.cols = where(is.character) & !c(EDAD),
                .fns = as.factor))

Resources