class: center, middle, inverse, title-slide

# 03: dplyr, or a dance with data

## robust-tools.djnavarro.net
### danielle navarro

---
layout: true

<div class="my-footer">
<span>
<a href="https://robust-tools.djnavarro.net" target="_blank">robust-tools.djnavarro.net</a>
</span>
</div>

---
class: middle, inverse

## https://rstudio.cloud/project/1006868

---
class: middle, inverse
background-image: url("img/swow_concreteness_fade.jpg")
background-size: cover

.pull-left-narrow[
.huge-blue-number[1]
]
.pull-right-wide[
.larger[.embolden[
Small World of Words
]]
]

---
class: middle
background-image: url("img/swow_concreteness_strongfade.jpg")
background-size: cover

.pull-left[
<img src="img/science_network.png" width="672" />
]
.pull-right[
<br><br>
De Deyne, Navarro, Perfors, Brysbaert & Storms (2019).
<br><br>
*The Small World of Words: English word association norms for over 12,000 cue words*
<br><br>
https://psyarxiv.com/mb93p/
<br>
https://smallworldofwords.org/
]

---
class: inverse
background-image: url("img/swow_concreteness_strongfade.jpg")
background-size: cover

.hand[Import the SWOW data]

--

```r
library(tidyverse)
# the file is tab-separated, hence read_tsv()
swow <- read_tsv(file = "data_swow.csv.zip")
swow <- swow %>% mutate(id = 1:n()) # <- ignore for now
```

--

```
## # A tibble: 483,636 x 6
*##   cue   response    R1     N R1.Strength    id
##   <chr> <chr>    <dbl> <dbl>       <dbl> <int>
## 1 a     one         21    97      0.216      1
## 2 a     the         16    97      0.165      2
## 3 a     b            9    97      0.0928     3
## 4 a     an           4    97      0.0412     4
## 5 a     first        3    97      0.0309     5
## # … with 483,631 more rows
```

---
class: inverse
background-image: url("img/swow_concreteness_strongfade.jpg")
background-size: cover

.hand[Automated name cleaning?]
--

```r
library(janitor)
swow <- clean_names(swow)
```

--

```
## # A tibble: 483,636 x 6
*##   cue   response    r1     n r1_strength    id
##   <chr> <chr>    <dbl> <dbl>       <dbl> <int>
## 1 a     one         21    97      0.216      1
## 2 a     the         16    97      0.165      2
## 3 a     b            9    97      0.0928     3
## 4 a     an           4    97      0.0412     4
## 5 a     first        3    97      0.0309     5
## # … with 483,631 more rows
```

---
class: inverse
background-image: url("img/swow_concreteness_strongfade.jpg")
background-size: cover

.hand[Manual name cleaning]

--

```r
swow <- swow %>%
  rename(n_response = r1, n_total = n, strength = r1_strength)
```

--

```
## # A tibble: 483,636 x 6
*##   cue   response n_response n_total strength    id
##   <chr> <chr>         <dbl>   <dbl>    <dbl> <int>
## 1 a     one              21      97   0.216      1
## 2 a     the              16      97   0.165      2
## 3 a     b                 9      97   0.0928     3
## # … with 483,633 more rows
```

---
class: inverse, middle
background-image: url("img/swow_concreteness_strongfade.jpg")
background-size: cover

.hand[<span style="font-size:60pt">exercise #1</span>]

- open `exercise_dplyr_01.R`
- write your comments at the beginning
- load tidyverse
- load the swow data
- use `rename()` to get new variable names

---
class: middle, inverse
background-image: url("img/filter.jpg")
background-size: cover

.pull-left-narrow[
.huge-blue-number[2]
]
.pull-right-wide[
.larger[.embolden[
Filtering Data
]]
]

---
background-image: url("img/filter_fade.jpg")
background-size: cover

.hand[Filter to keep a subset of the data]

- The SWOW data has 483,636 rows
- Let's extract the cases when `cue == "woman"`

---
background-image: url("img/filter_fade.jpg")
background-size: cover

.hand[Filter to keep a subset of the data]

--

```r
swow %>% filter(cue == "woman")
```

--

```
## # A tibble: 28 x 6
##   cue   response n_response n_total strength     id
##   <chr> <chr>         <dbl>   <dbl>    <dbl>  <int>
## 1 woman man              38     100     0.38 477315
## 2 woman female           22     100     0.22 477316
## 3 woman girl              7     100     0.07 477317
## 4 woman lady              5     100     0.05 477318
## # … with 24 more rows
```

---
background-image: url("img/filter_fade.jpg")
background-size: cover
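.hand[What filter() keeps, on toy data]

A minimal sketch of how `filter()` treats its conditions, using a small hypothetical tibble (`toy`, invented for illustration rather than taken from SWOW). A row is kept only when the condition evaluates to `TRUE`, so a row whose condition evaluates to `NA` is dropped too.

```r
library(dplyr)

# hypothetical toy data, not SWOW values; the last response is missing
toy <- tibble(
  cue      = c("woman", "woman", "man", "man", "woman"),
  response = c("man", "female", "woman", "male", NA)
)

# the three rows where cue == "woman" is TRUE
by_cue <- toy %>% filter(cue == "woman")

# NA != "man" is NA, so the row with a missing response is dropped: 3 rows
not_man <- toy %>% filter(response != "man")

# | keeps rows matching either test; TRUE | NA is TRUE, so the last row stays: 4 rows
either <- toy %>% filter(cue == "woman" | response == "woman")

# %in% never returns NA, so it behaves predictably with missing values: 2 rows
in_set <- toy %>% filter(response %in% c("man", "male"))
```

Note that `!=` silently drops the missing response; wrapping the test as `is.na(response) | response != "man"` would keep it.

---
background-image: url("img/filter_fade.jpg")
background-size: cover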
.hand[Filter to keep a subset of the data]

.pull-left[
```r
woman_fwd <- swow %>%
  filter(cue == "woman")

ggplot(woman_fwd) +
  geom_col(aes(
    x = response,
    y = strength
  ))
```
]

--

.pull-right[
![](index_files/figure-html/unnamed-chunk-4-1.png)<!-- -->
]

---
background-image: url("img/filter_fade.jpg")
background-size: cover

.hand[Filter to keep a subset of the data]

.pull-left[
```r
woman_fwd <- swow %>%
  filter(cue == "woman")

ggplot(woman_fwd) +
  geom_col(aes(
    x = response,
    y = strength
  )) +
  coord_flip()
```
]

.pull-right[
![](index_files/figure-html/unnamed-chunk-5-1.png)<!-- -->
]

---
background-image: url("img/filter_fade.jpg")
background-size: cover

.hand[One filter(), then another filter()]

.pull-left[
```r
woman_fwd <- swow %>%
  filter(cue == "woman") %>%
  filter(n_response > 1)

ggplot(woman_fwd) +
  geom_col(aes(
    x = response,
    y = strength
  )) +
  coord_flip()
```
]

.pull-right[
![](index_files/figure-html/unnamed-chunk-6-1.png)<!-- -->
]

---
background-image: url("img/filter_fade.jpg")
background-size: cover

.hand[Two expressions in one filter()]

.pull-left[
```r
woman_fwd <- swow %>%
  filter(
    cue == "woman",
    n_response > 1
  )

ggplot(woman_fwd) +
  geom_col(aes(
    x = response,
    y = strength
  )) +
  coord_flip()
```
]

.pull-right[
![](index_files/figure-html/unnamed-chunk-7-1.png)<!-- -->
]

---
class: inverse, middle
background-image: url("img/filter_fade.jpg")
background-size: cover

.hand[<span style="font-size:60pt">exercise #2</span>]

- open `exercise_dplyr_02.R`
- create `woman_bck` for "backward associates"
  - i.e. when `"woman"` is the `response`
- only keep cases with at least two responses

---
class: middle, inverse
background-image: url("img/arrange_dark.jpg")
background-size: cover

.pull-left-narrow[
.huge-blue-number[3]
]
.pull-right-wide[
<br>
.larger[.embolden[.plainwhite[
Data Arranging
]]]
]

---
background-image: url("img/arrange_fade.jpg")
background-size: cover

.hand[The forward data are arranged neatly]

```r
swow %>%
  filter(cue == "woman", n_response > 1)
```

```
## # A tibble: 8 x 6
##   cue   response n_response n_total strength     id
##   <chr> <chr>         <dbl>   <dbl>    <dbl>  <int>
## 1 woman man              38     100     0.38 477315
## 2 woman female           22     100     0.22 477316
## 3 woman girl              7     100     0.07 477317
## 4 woman lady              5     100     0.05 477318
## 5 woman beauty            2     100     0.02 477319
## 6 woman me                2     100     0.02 477320
## # … with 2 more rows
```

---
background-image: url("img/arrange_fade.jpg")
background-size: cover

.hand[The backward data are not]

```r
swow %>%
  filter(response == "woman", n_response > 1)
```

```
## # A tibble: 200 x 6
##   cue           response n_response n_total strength    id
##   <chr>         <chr>         <dbl>   <dbl>    <dbl> <int>
## 1 abuse         woman             2     100   0.02    1353
## 2 actress       woman             7     100   0.07    3921
## 3 Amazon        woman             2     100   0.02   12286
## 4 American      woman             2     100   0.02   12941
## 5 apron         woman             2      98   0.0204 19310
## 6 argumentative woman             2     100   0.02   20182
## # … with 194 more rows
```

---
background-image: url("img/arrange_fade.jpg")
background-size: cover

.hand[Let's arrange() them]

```r
swow %>%
  filter(response == "woman", n_response > 1) %>%
  arrange(strength)
```

--

```
## # A tibble: 200 x 6
##   cue           response n_response n_total strength    id
##   <chr>         <chr>         <dbl>   <dbl>    <dbl> <int>
## 1 abuse         woman             2     100     0.02  1353
## 2 Amazon        woman             2     100     0.02 12286
## 3 American      woman             2     100     0.02 12941
## 4 argumentative woman             2     100     0.02 20182
## 5 attract       woman             2     100     0.02 25876
## # … with 195 more rows
```

---
background-image: url("img/arrange_fade.jpg")
background-size: cover

.hand[Let's arrange() them, in descending order]

```r
swow %>%
  filter(response == "woman", n_response > 1) %>%
  arrange(desc(strength))
```

```
## # A tibble: 200 x 6
##   cue      response n_response n_total strength     id
##   <chr>    <chr>         <dbl>   <dbl>    <dbl>  <int>
## 1 man      woman            57      99    0.576 258593
## 2 lady     woman            36     100    0.36  240149
## 3 feminist woman            30      99    0.303 158641
## 4 female   woman            23      99    0.232 158492
## 5 pregnant woman            18     100    0.18  327286
## # … with 195 more rows
```

---
class: inverse, middle
background-image: url("img/arrange_fade.jpg")
background-size: cover

.hand[<span style="font-size:60pt">exercise #3</span>]

- open script `exercise_dplyr_03.R`
- create analogous data sets `man_fwd` and `man_bck`
- make sure all data sets are arranged by descending `strength`

---
class: middle, inverse
background-image: url("img/select_dark.jpg")
background-size: cover

.pull-left-narrow[
.huge-blue-number[4]
]
.pull-right-wide[
<br>
.larger[.embolden[.plainwhite[
Variable<br>
Selection
]]]
]

---
background-image: url("img/select_fade.jpg")
background-size: cover

.hand[Select cue, response and strength]

```r
swow %>%
  filter(response == "woman", n_response > 1) %>%
  arrange(desc(strength)) %>%
* select(cue, response, strength)
```

--

```
## # A tibble: 200 x 3
##   cue      response strength
##   <chr>    <chr>       <dbl>
## 1 man      woman       0.576
## 2 lady     woman       0.36
## 3 feminist woman       0.303
## # … with 197 more rows
```

---
background-image: url("img/select_fade.jpg")
background-size: cover

.hand[Alternative approach...]
```r
swow %>%
  filter(response == "woman", n_response > 1) %>%
  arrange(desc(strength)) %>%
* select(-n_response, -n_total)
```

--

```
## # A tibble: 200 x 4
##   cue      response strength     id
##   <chr>    <chr>       <dbl>  <int>
## 1 man      woman       0.576 258593
## 2 lady     woman       0.36  240149
## 3 feminist woman       0.303 158641
## # … with 197 more rows
```

---
class: inverse, middle
background-image: url("img/select_fade.jpg")
background-size: cover

.hand[<span style="font-size:60pt">exercise #4</span>]

- open script `exercise_dplyr_04.R`
- notice that the `woman_fwd` pipeline uses `select()`
- use `select()` in the pipelines for the other data sets too

---
class: middle, inverse
background-image: url("img/butterfly.jpg")
background-size: cover

.pull-left-narrow[
.huge-blue-number[5]
]
.pull-right-wide[
<br>
.larger[.embolden[
Mutate
]]
]

---
background-image: url("img/butterfly_fade.jpg")
background-size: cover

.hand[Digression: Psychological measurement is hard]

<img src="index_files/figure-html/mutate0-1.png" width="600" height="400" />

---
background-image: url("img/butterfly_fade.jpg")
background-size: cover

.hand[Forward: compete with responses to the **same** cue]

```r
woman_fwd
```

```
## # A tibble: 8 x 4
##   cue   response strength     id
##   <chr> <chr>       <dbl>  <int>
## 1 woman man          0.38 477315
## 2 woman female       0.22 477316
## 3 woman girl         0.07 477317
## 4 woman lady         0.05 477318
## 5 woman beauty       0.02 477319
## 6 woman me           0.02 477320
## # … with 2 more rows
```

---
background-image: url("img/butterfly_fade.jpg")
background-size: cover

.hand[Backward: compete with responses to **other** cues]

```r
woman_bck
```

```
## # A tibble: 200 x 4
##   cue       response strength     id
##   <chr>     <chr>       <dbl>  <int>
## 1 man       woman       0.576 258593
## 2 lady      woman       0.36  240149
## 3 feminist  woman       0.303 158641
## 4 female    woman       0.232 158492
## 5 pregnant  woman       0.18  327286
## 6 housewife woman       0.17  209394
## # … with 194 more rows
```

---
background-image: url("img/butterfly_fade.jpg")
background-size: cover

.hand[Use mutate() to compute new variables]

```r
woman_fwd %>%
  mutate(rank = rank(-strength))
```

```
## # A tibble: 8 x 5
##   cue   response strength     id  rank
##   <chr> <chr>       <dbl>  <int> <dbl>
## 1 woman man          0.38 477315   1
## 2 woman female       0.22 477316   2
## 3 woman girl         0.07 477317   3
## 4 woman lady         0.05 477318   4
## 5 woman beauty       0.02 477319   6.5
## # … with 3 more rows
```

---
background-image: url("img/butterfly_fade.jpg")
background-size: cover

.hand[Use mutate() to compute new variables]

```r
woman_bck %>%
  mutate(rank = rank(-strength))
```

```
## # A tibble: 200 x 5
##   cue      response strength     id  rank
##   <chr>    <chr>       <dbl>  <int> <dbl>
## 1 man      woman       0.576 258593     1
## 2 lady     woman       0.36  240149     2
## 3 feminist woman       0.303 158641     3
## 4 female   woman       0.232 158492     4
## 5 pregnant woman       0.18  327286     5
## # … with 195 more rows
```

---
background-image: url("img/butterfly_fade.jpg")
background-size: cover

.hand[Use mutate() to compute new variables]

```r
woman_bck %>%
  mutate(rank = rank(-strength), type = "backward")
```

```
## # A tibble: 200 x 6
##   cue      response strength     id  rank type
##   <chr>    <chr>       <dbl>  <int> <dbl> <chr>
## 1 man      woman       0.576 258593     1 backward
## 2 lady     woman       0.36  240149     2 backward
## 3 feminist woman       0.303 158641     3 backward
## 4 female   woman       0.232 158492     4 backward
## 5 pregnant woman       0.18  327286     5 backward
## # … with 195 more rows
```

---
background-image: url("img/butterfly_fade.jpg")
background-size: cover

.hand[Which should we use, "rank" or "strength"?]

<img src="index_files/figure-html/mutate6-1.png" width="600" height="400" />

---
class: inverse, middle
background-image: url("img/butterfly_fade.jpg")
background-size: cover

.hand[<span style="font-size:60pt">exercise #5</span>]

- open script `exercise_dplyr_05.R`
- mutate the data sets to include four new variables:
  - `rank`: see the slides
  - `type`: either `forward` or `backward`
  - `word`: either `man` or `woman`
  - `associate`: see script for details!
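
---
background-image: url("img/butterfly_fade.jpg")
background-size: cover

.hand[How rank() handles ties]

The 6.5 in the `woman_fwd` ranks comes from ties: base R's `rank()` gives tied values the average of the positions they occupy. A minimal sketch comparing it with dplyr's `min_rank()` and `row_number()`, assuming eight hypothetical strengths chosen to mirror the `woman_fwd` ranks on the slides (four responses tied at 0.02):

```r
library(dplyr)

# hypothetical strengths mirroring the woman_fwd ranks: four ties at 0.02
strengths <- tibble(strength = c(0.38, 0.22, 0.07, 0.05, 0.02, 0.02, 0.02, 0.02))

ranked <- strengths %>%
  mutate(
    rank_avg = rank(-strength),            # ties share the average position: 6.5
    rank_min = min_rank(desc(strength)),   # ties share the lowest position: 5
    rank_row = row_number(desc(strength))  # ties broken by row order: 5, 6, 7, 8
  )
```

Which tie rule is appropriate depends on the analysis; the averaged rank keeps the sum of ranks the same as with no ties, which is one reason it is base R's default.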
---
class: middle, inverse
background-image: url("img/bind.jpg")
background-size: cover

.pull-left-narrow[
.huge-blue-number[6]
]
.pull-right-wide[
<br>
.larger[.embolden[.plainwhite[
Bind
]]]
]

---
background-image: url("img/bind_fade.jpg")
background-size: cover

.hand[Use bind_rows() to stack data sets vertically]

```r
gender <- bind_rows(woman_fwd, woman_bck, man_fwd, man_bck)
```

--

```
*## # A tibble: 505 x 8
##   cue   response strength     id  rank type    word  associate
## * <chr> <chr>       <dbl>  <int> <dbl> <fct>   <chr> <chr>
## 1 woman man          0.38 477315   1   forward woman man
## 2 woman female       0.22 477316   2   forward woman female
## 3 woman girl         0.07 477317   3   forward woman girl
## 4 woman lady         0.05 477318   4   forward woman lady
## 5 woman beauty       0.02 477319   6.5 forward woman beauty
## 6 woman me           0.02 477320   6.5 forward woman me
## # … with 499 more rows
```

---
background-image: url("img/bind_fade.jpg")
background-size: cover

.hand[Clean up using select()...]

```r
gender <- bind_rows(woman_fwd, woman_bck, man_fwd, man_bck) %>%
  select(id:associate)
```

```
## # A tibble: 505 x 5
##       id  rank type    word  associate
##    <int> <dbl> <fct>   <chr> <chr>
## 1 477315   1   forward woman man
## 2 477316   2   forward woman female
## 3 477317   3   forward woman girl
## 4 477318   4   forward woman lady
## 5 477319   6.5 forward woman beauty
## # … with 500 more rows
```

---
background-image: url("img/bind_fade.jpg")
background-size: cover

.hand[Clean up using select() and filter()]

```r
gender <- bind_rows(woman_fwd, woman_bck, man_fwd, man_bck) %>%
  select(id:associate) %>%
  filter(associate != "man", associate != "woman")
```

```
## # A tibble: 501 x 5
##       id  rank type    word  associate
##    <int> <dbl> <fct>   <chr> <chr>
## 1 477316   2   forward woman female
## 2 477317   3   forward woman girl
## 3 477318   4   forward woman lady
## 4 477319   6.5 forward woman beauty
## # … with 497 more rows
```

---
background-image: url("img/bind_fade.jpg")
background-size: cover

.hand[Check that it worked!]
.pull-left[
```r
gender %>%
  group_by(word, type) %>%
  count()
```
]

--

.pull-right[
```
## # A tibble: 4 x 3
## # Groups:   word, type [4]
##   word  type         n
##   <chr> <fct>    <int>
## 1 man   forward      7
## 2 man   backward   288
## 3 woman forward      7
## 4 woman backward   199
```
]

---
class: inverse, middle
background-image: url("img/bind_fade.jpg")
background-size: cover

.hand[<span style="font-size:60pt">exercise #6</span>]

- open script `exercise_dplyr_06.R`
- take a look at the instructions!

---
class: middle, inverse
background-image: url("img/pivot.jpg")
background-size: cover

.pull-left-narrow[
.huge-blue-number[7]
]
.pull-right-wide[
<br>
.larger[.embolden[.plainwhite[
Pivot
]]]
]

---
background-image: url("img/pivot_fade.jpg")
background-size: cover

.hand[Lovely data]

.pull-left[
```r
love <- read_csv(
  "data_love.csv"
)
```
]

--

.pull-right[
```
## # A tibble: 4 x 3
##   colour heart book
##   <chr>  <chr> <chr>
## 1 blue   💙    📘
## 2 green  💚    📗
## 3 yellow 💛    📒
## 4 orange 🧡    📙
```
]

---
background-image: url("img/pivot_fade.jpg")
background-size: cover

.hand["Pivot" to "longer" data...]

.pull-left[
```r
long_love <- love %>%
  pivot_longer(
    cols = c(heart, book),
    names_to = "object",
    values_to = "emoji"
  )
```
]

--

.pull-right[
```
## # A tibble: 8 x 3
##   colour object emoji
##   <chr>  <chr>  <chr>
## 1 blue   heart  💙
## 2 blue   book   📘
## 3 green  heart  💚
## 4 green  book   📗
## 5 yellow heart  💛
## 6 yellow book   📒
## 7 orange heart  🧡
## 8 orange book   📙
```
]

---
background-image: url("img/pivot_fade.jpg")
background-size: cover

.hand["Pivot" to "wider" data...]
.pull-left[
```r
wide_love <- long_love %>%
  pivot_wider(
    id_cols = colour,
    names_from = object,
    values_from = emoji
  )
```
]

--

.pull-right[
```
## # A tibble: 4 x 3
##   colour heart book
##   <chr>  <chr> <chr>
## 1 blue   💙    📘
## 2 green  💚    📗
## 3 yellow 💛    📒
## 4 orange 🧡    📙
```
]

---
background-image: url("img/pivot_fade.jpg")
background-size: cover

.pull-left[
.hand[Reshape the data]

```r
gender_fwd <- gender %>%
  filter(
    type == "forward"
  ) %>%
  pivot_wider(
    id_cols = associate,
    names_from = word,
    values_from = rank
  )
```
]

--

.pull-right[
```
## # A tibble: 13 x 3
##    associate woman   man
##    <chr>     <dbl> <dbl>
##  1 female      2    NA
##  2 girl        3    NA
##  3 lady        4    NA
##  4 beauty      6.5  NA
##  5 me          6.5  NA
##  6 strong      6.5   6.5
##  7 wife        6.5  NA
##  8 male       NA     2
##  9 human      NA     3
## 10 husband    NA     4
## 11 boy        NA     6.5
## 12 gender     NA     6.5
## 13 person     NA     6.5
```
]

---
background-image: url("img/pivot_fade.jpg")
background-size: cover

.pull-left-wide[
.hand[Reshape the data]

```r
gender_fwd <- gender_fwd %>%
  mutate(
    woman = replace_na(1/woman, 0),
    man = replace_na(1/man, 0),
    diff = woman - man
  ) %>%
  arrange(diff)
```
]

---
background-image: url("img/pivot_fade.jpg")
background-size: cover

.pull-right-wide[
```
## # A tibble: 13 x 4
##    associate woman   man   diff
##    <chr>     <dbl> <dbl>  <dbl>
##  1 male      0     0.5   -0.5
##  2 human     0     0.333 -0.333
##  3 husband   0     0.25  -0.25
##  4 boy       0     0.154 -0.154
##  5 gender    0     0.154 -0.154
##  6 person    0     0.154 -0.154
##  7 strong    0.154 0.154  0
##  8 beauty    0.154 0      0.154
##  9 me        0.154 0      0.154
## 10 wife      0.154 0      0.154
## 11 lady      0.25  0      0.25
## 12 girl      0.333 0      0.333
## 13 female    0.5   0      0.5
```
]

---
background-image: url("img/pivot_fade.jpg")
background-size: cover

.hand[Reshape the data]

.pull-left[
```r
ggplot(
  data = gender_fwd,
  mapping = aes(
    x = associate %>% reorder(diff),
    y = diff
  )) +
  geom_col() +
  coord_flip()
```
]

.pull-right[
![](index_files/figure-html/unnamed-chunk-17-1.png)<!-- -->
]

---
background-image: url("img/pivot_fade.jpg")
background-size: cover
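.hand[Filling the gaps during the pivot]

The `replace_na()` step could plausibly be folded into the reshape: if the ranks are inverted before pivoting, `pivot_wider()` can fill absent word/associate combinations directly through its `values_fill` argument (a scalar `values_fill` assumes tidyr 1.1.0 or later). A minimal sketch on a small hypothetical tibble shaped like `gender`:

```r
library(dplyr)
library(tidyr)

# hypothetical long-format ranks shaped like `gender` (not real SWOW values)
ranks <- tibble(
  associate = c("female", "girl", "strong", "male", "strong"),
  word      = c("woman", "woman", "woman", "man", "man"),
  rank      = c(2, 3, 6.5, 2, 6.5)
)

wide <- ranks %>%
  mutate(inv_rank = 1 / rank) %>%   # invert before reshaping
  pivot_wider(
    id_cols     = associate,        # `rank` is not an id column, so it is dropped
    names_from  = word,
    values_from = inv_rank,
    values_fill = 0                 # absent combinations become 0 instead of NA
  ) %>%
  mutate(diff = woman - man) %>%
  arrange(diff)
```

The result matches the `replace_na(1/woman, 0)` approach, because `1/NA` is `NA` and is then replaced by 0 anyway.

---
background-image: url("img/pivot_fade.jpg")
background-size: cover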
![](index_files/figure-html/unnamed-chunk-18-1.png)<!-- -->

---
class: inverse, middle
background-image: url("img/pivot_fade.jpg")
background-size: cover

.pull-left-narrow[
<br><br><br>
.hand[<span style="font-size:40pt">exercise #7</span>]
]
.pull-right-wide[
<img src="index_files/figure-html/unnamed-chunk-19-1.png" height="600" />
]

---
class: middle, inverse
background-image: url("img/join_dark.jpg")
background-size: cover

.pull-left-narrow[
.huge-blue-number[8]
]
.pull-right-wide[
<br>
.larger[.embolden[.plainwhite[
Join
]]]
]

---
class: middle, inverse
background-image: url("img/join_fade.jpg")
background-size: cover

.hand[Restore all the variables?]

```r
gender
```

```
## # A tibble: 501 x 5
##        id  rank type     word  associate
##     <int> <dbl> <fct>    <chr> <chr>
##  1 477316   2   forward  woman female
##  2 477317   3   forward  woman girl
##  3 477318   4   forward  woman lady
##  4 477319   6.5 forward  woman beauty
##  5 477320   6.5 forward  woman me
##  6 477321   6.5 forward  woman strong
##  7 477322   6.5 forward  woman wife
##  8 240149   2   backward woman lady
##  9 158641   3   backward woman feminist
## 10 158492   4   backward woman female
## # … with 491 more rows
```

---
class: middle, inverse
background-image: url("img/join_fade.jpg")
background-size: cover

.hand[Restore all the variables?]
```r
gender %>%
  left_join(swow, by = "id")
```

```
## # A tibble: 501 x 10
##        id  rank type  word  associate cue   response n_response n_total strength
##  *  <int> <dbl> <fct> <chr> <chr>     <chr> <chr>         <dbl>   <dbl>    <dbl>
##  1 477316   2   forw… woman female    woman female           22     100    0.22
##  2 477317   3   forw… woman girl      woman girl              7     100    0.07
##  3 477318   4   forw… woman lady      woman lady              5     100    0.05
##  4 477319   6.5 forw… woman beauty    woman beauty            2     100    0.02
##  5 477320   6.5 forw… woman me        woman me                2     100    0.02
##  6 477321   6.5 forw… woman strong    woman strong            2     100    0.02
##  7 477322   6.5 forw… woman wife      woman wife              2     100    0.02
##  8 240149   2   back… woman lady      lady  woman            36     100    0.36
##  9 158641   3   back… woman feminist  femi… woman            30      99    0.303
## 10 158492   4   back… woman female    fema… woman            23      99    0.232
## # … with 491 more rows
```

---
class: middle, inverse
background-image: url("img/snow_road2.jpg")
background-size: cover

.pull-right-wide[
<br>
.hand[Want more information?]

R for Data Science covers much of this material<br>
https://r4ds.had.co.nz
<br><br>
Stat 545 is a wonderful resource<br>
https://stat545.com/
]
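
---
class: middle, inverse
background-image: url("img/join_fade.jpg")
background-size: cover

.hand[What left_join() does with unmatched ids]

A minimal sketch of `left_join()` semantics on two small hypothetical tibbles (`slim` and `full`, invented for illustration, not SWOW data). Every row of the left table is kept in its original order. Columns from the right table are added where the key matches and filled with `NA` where it does not.

```r
library(dplyr)

# hypothetical tables: `slim` keeps a few columns plus the id key,
# `full` holds the remaining variables for each id
slim <- tibble(id = c(1L, 2L, 5L), rank = c(1, 2, 3))
full <- tibble(
  id       = c(1L, 2L, 3L),
  cue      = c("woman", "woman", "man"),
  response = c("man", "female", "woman")
)

# id 5 has no match in `full`, so its cue and response are NA;
# id 3 appears only in `full`, so it does not show up at all
joined <- slim %>% left_join(full, by = "id")
```

This is why joining `gender` back onto `swow` by `id` returns exactly 501 rows: each `gender` row matches one SWOW row.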