Describing data: part 4

Lecture 10

2024-06-03

Logistics

  • Project component 2: descriptive statistics

    • All materials are posted (instructions, example, github repos)
    • Due Thursday June 6 11:59pm
    • We have covered everything for parts 1-3. Part 4 is today and tomorrow.
    • Grades/feedback coming this afternoon!

Today

  • Answers to filtering
  • Pipes
  • Working on creating variables

Filtering exercise (ex-5-29): answers!

  • Find this on your computer (no need to clone it again)

Stringing commands together with pipes (|>)

  • Often we need to change data frames in more than one way
  • Example from last week: How does employment status vary by age category?
  • We need to create an age category variable (as we talked about Thursday)
  • But we probably want to filter too: why analyze kids?

Stringing commands together: approach 1

We can do this with the tools we have.

# first we make the new variable
acs12_newagevar <- mutate(acs12, agecat = case_when(age < 14 ~ "child",
                                                    age < 18 ~ "teen",
                                                    age < 67 ~ "adult",
                                                    TRUE ~ "retired"))

# then we filter to remove children
acs12_nokids <- filter(acs12_newagevar, agecat != "child")

# did it work?
table(acs12_nokids$agecat, useNA = "always")

  adult retired    teen    <NA> 
   1264     297      89       0 
  • But it’s kind of ugly: we don’t need to save that middle step (acs12_newagevar)

Stringing commands together: approach 2, with |>

  • The pipe operator, |>, lets us pass the result of one function directly into another one
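  • The | symbol is typed with Shift + the backslash (\) key, which sits below “delete” on most US keyboards (it is not I, not l, not 1)
  • The piped data frame becomes the first argument (the dataset) of mutate/filter, so we leave that argument out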
  • “Take the thing that came before this and give it to the function that comes after this”
acs12_nokids <- acs12 |> # start with acs12
  mutate(agecat = case_when(age < 14 ~ "child", # then add a new variable to it
                            age < 18 ~ "teen",
                            age < 67 ~ "adult",
                            TRUE ~ "retired")) |> 
  filter(agecat != "child") # then filter out the kids

table(acs12_nokids$agecat, useNA = "always")

  adult retired    teen    <NA> 
   1264     297      89       0 
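
To get back to the question we started with (how does employment status vary by age category?), the same pipeline can feed a two-way table. This is only a sketch: toy is a made-up stand-in for acs12, and the employment column name and its values are assumptions for illustration, so check them against the real data before copying this.

```r
library(dplyr)

# Hypothetical toy data standing in for acs12 (all values are made up)
toy <- data.frame(
  age        = c(10, 16, 30, 45, 70, 82),
  employment = c(NA, "not in labor force", "employed",
                 "unemployed", "not in labor force", "employed")
)

toy_nokids <- toy |>                              # start with the toy data
  mutate(agecat = case_when(age < 14 ~ "child",   # then add the age category
                            age < 18 ~ "teen",
                            age < 67 ~ "adult",
                            TRUE ~ "retired")) |>
  filter(agecat != "child")                       # then filter out the kids

# rows are age categories, columns are employment statuses
emp_by_age <- table(toy_nokids$agecat, toy_nokids$employment, useNA = "always")
emp_by_age
```

Reading across a row shows how one age group splits across the employment statuses.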

Another |> example

  • This:
mutate(acs12, agecat = case_when(age < 14 ~ "child",
                                 age < 18 ~ "teen",
                                 age < 67 ~ "adult",
                                 TRUE ~ "retired"))
  • Is the same as this:
acs12 |> 
  mutate(agecat = case_when(age < 14 ~ "child",
                            age < 18 ~ "teen",
                            age < 67 ~ "adult",
                            TRUE ~ "retired"))
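
The rule behind “is the same as” holds for any function, not just mutate: R treats x |> f(y) as f(x, y). Here is a small sketch with base R functions and made-up numbers (the native pipe needs R 4.1 or newer).

```r
values <- c(1, 4, 9)   # made-up numbers for illustration

# nested calls are read inside-out: square root first, then sum
nested <- sum(sqrt(values))

# piped calls are read left to right: take values, then sqrt, then sum
piped <- values |> sqrt() |> sum()

identical(nested, piped)   # TRUE, both are 6

# extra arguments stay inside the parentheses;
# the piped value is slotted in as the first argument
decimals <- c(1.234, 5.678)
identical(round(decimals, digits = 1), decimals |> round(digits = 1))
```

The piped version lists the steps in the order they happen, which is why it gets easier to read as the number of steps grows.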

More on |>

  • We’re just scratching the surface in this class

  • When you need to clean your data or when your analyses are more complex, |> makes your life a lot easier!

  • Sometimes in internet resources more than ~2 years old, you’ll see %>% instead. This older pipe (from the magrittr package) works the same way for everything we do in this class

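  • Annoying to type out? There’s a keyboard shortcut in RStudio

    • on mac, command-shift-M
    • on pc, ctrl-shift-M
    • if the shortcut gives you %>%, turn on “Use native pipe operator” under Tools > Global Options > Code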
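
To check the %>% point above for yourself, here is a minimal sketch comparing the two pipes on a made-up data frame (toy and its values are hypothetical). Loading dplyr also makes %>% available.

```r
library(dplyr)  # dplyr re-exports %>% from the magrittr package

# Hypothetical toy data (values are made up)
toy <- data.frame(age = c(10, 16, 30, 70))

old_pipe <- toy %>% filter(age >= 18)   # older magrittr pipe
new_pipe <- toy |> filter(age >= 18)    # native pipe, R 4.1 or newer

identical(old_pipe, new_pipe)           # TRUE for a simple step like this
```

The two pipes do differ in some advanced uses, but not in the kind of mutate/filter steps shown here.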

Continuing on creating new variables

  • Find your exercise from last Thursday (ex-5-30)

    • No need to clone it again; find where it’s saved on your computer
  • 10-15 minutes to work