class: center, middle, inverse, title-slide

# 02<br>say hello to your data
## robust-tools.djnavarro.net
### danielle navarro

---
layout: true

<div class="my-footer">
<span>
<a href="https://robust-tools.djnavarro.net" target="_blank">robust-tools.djnavarro.net</a>
</span>
</div>

---
class: middle
background-image: url("img/thinking_skeleton.jpg")
background-size: cover

.pull-left-narrow[
.huge-blue-number[1]
]
.pull-right-wide[
.larger[
👩‍🔬
]
]

---
class:middle

.pull-left[
![](index_files/figure-html/summary.plot-1.png)<!-- -->
]

.pull-right[
<br><br>
Hayes, Banner, Forrester & Navarro (2019).
<br><br>
*Selective sampling and inductive inference: Drawing inferences based on observed and missing evidence*
<br><br>
https://psyarxiv.com/2m83v/
]

---
class: middle

### https://robust-tools.djnavarro.net/reasoning/

<img src="img/robot1.jpg" width="1707" />

---
class: middle

.hand[Property sampling: the robot only detects plaxium spheres]

<img src="img/data10_property.jpg" width="410" />

---
class: middle

.hand[Category sampling: the robot only tests small spheres]

<img src="img/data10_category.jpg" width="410" />

---
class: middle

.hand[Small sample size: Elicit judgments after two observations]

<img src="img/data2_property.jpg" width="410" />

---
class: middle

.hand[Medium sample size: Elicit judgments after six observations]

<img src="img/data6_property.jpg" width="410" />

---
class: middle

.hand[Large sample size: Elicit judgments after twelve observations]

<img src="img/data12_property.jpg" width="410" />

---
class: middle

.hand[Seven test items that vary in size: Smallest...]

<img src="img/test2.jpg" width="410" />

---
class: middle

.hand[Seven test items that vary in size: Largest...]

<img src="img/test8.jpg" width="410" />

---
class: middle

.pull-left[
![](index_files/figure-html/unnamed-chunk-11-1.png)<!-- -->
]

.pull-right[
<br><br>

- smooth generalisation profiles
- similar for small samples
- different for large samples

<br><br>

- property shows "tightening"
- category does not
]

---
class: middle
background-image: url("img/thinking_skeleton.jpg")
background-size: cover

.pull-left[
![](index_files/figure-html/unnamed-chunk-12-1.png)<!-- -->
]

.pull-right[
<br><br><br>
<br><br><br>
<br>
.hand[
<p style="text-align:right">Exercise #1:<br>discuss this study!</p>
]
]

---
class: middle, inverse
background-image: url("img/bookshelves.jpg")
background-size: cover

.pull-left-narrow[
.huge-blue-number[2]
]
.pull-right-wide[
.larger[
<span style="color:white;font-weight: bold">Reading</span><br>
<span style="color:white;font-weight: bold">your data</span>
]
]

---
class: middle, inverse

## https://rstudio.cloud/project/978818

---
class: inverse

.hand[Reading the data into R]

```r
library(tidyverse)
frames <- read_csv(file = "data_reasoning.csv")
```

--

```
## Parsed with column specification:
## cols(
##   id = col_double(),
##   gender = col_character(),
##   age = col_double(),
##   condition = col_character(),
##   sample_size = col_character(),
##   n_obs = col_double(),
##   test_item = col_double(),
##   response = col_double()
## )
```

---
class: inverse

.hand[Click on "frames" in the environment pane]

<img src="img/view_frames_0.png" width="900" />

---
class: inverse

.hand[Click on "frames" in the environment pane]

<img src="img/view_frames_1.png" width="900" />

---
class: inverse

.hand[Inspecting the data]

```r
print(frames)
```

--

```
## # A tibble: 4,725 x 8
##      id gender   age condition sample_size n_obs test_item
##   <dbl> <chr>  <dbl> <chr>     <chr>       <dbl>     <dbl>
## 1     1 male      36 category  small           2         1
## 2     1 male      36 category  small           2         2
## 3     1 male      36 category  small           2         3
## 4     1 male      36 category  small           2         4
## 5     1 male      36 category  small           2         5
## 6     1 male      36 category  small           2         6
## # … with 4,719 more rows, and 1 more variable:
## #   response <dbl>
```

---
class: inverse

.hand[Inspecting the data]

```r
glimpse(frames)
```

--

```
## Rows: 4,725
## Columns: 8
## $ id          <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ gender      <chr> "male", "male", "male", "male", "mal…
## $ age         <dbl> 36, 36, 36, 36, 36, 36, 36, 36, 36, …
## $ condition   <chr> "category", "category", "category", …
## $ sample_size <chr> "small", "small", "small", "small", …
## $ n_obs       <dbl> 2, 2, 2, 2, 2, 2, 2, 6, 6, 6, 6, 6, …
## $ test_item   <dbl> 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, …
## $ response    <dbl> 8, 7, 6, 6, 5, 6, 3, 9, 7, 5, 6, 4, …
```

---
background-image: url("img/bookshelves.jpg")
background-size: cover
class: middle

.hand[<span style="color:white;font-weight: bold;font-size:64pt">Exercise #2</span>]

---
class: middle, inverse
background-image: url("img/pipes.jpg")
background-size: cover

.pull-left-narrow[
.huge-blue-number[3]
]
.pull-right-wide[
.larger[
The pipe<br>
.embolden[%>%]<br>
]
]

---
background-image: url("img/pipes2.jpg")
background-size: cover

<br><br><br>
.hand[The pipe, %>%]

.pull-left[
- Take the frames data...
- Do one thing...
- Then another...
- And then one more...
]

--

.pull-right[
```{}
frames %>%
  do_one_thing(.) %>%
  then_another(.) %>%
  and_then_one_more(.)
```
]

---
background-image: url("img/pipes2.jpg")
background-size: cover
class: middle

.pull-left-wide[
```{}
data %>%
  tidy() %>%
  describe() %>%
  visualise() %>%
  analyse()
```

- We "pipe" our data through operations
- The data set *flows* through our analysis
]

---
background-image: url("img/pipes2.jpg")
background-size: cover
class: middle

.hand[<span style="font-weight: bold;font-size:64pt">Exercise #3</span>]

---
class: middle, inverse
background-image: url("img/drawers.jpg")
background-size: cover

.pull-left-narrow[
.huge-blue-number[4]
]
.pull-right-wide[
.larger[.embolden[.plainwhite[
Grouped<br>
summary
]]]
]

---
class: middle, inverse
background-image: url("img/drawer2.jpg")
background-size: cover

.pull-left-wide[
.hand[The basic idea]

```r
frames %>%
  group_by( GROUP ) %>%
  summarise( EXPRESSION ) %>%
  ungroup()
```

- use `group_by` to define groups
- use `summarise` to... summarise
- use `ungroup` to remove grouping
]

---
class: inverse
background-image: url("img/drawer2.jpg")
background-size: cover

.hand[Group, summarise...]

```r
frames %>%
  group_by(test_item, sample_size, n_obs, condition) %>%
  summarise(response = mean(response))
```

--

```
## # A tibble: 42 x 5
*## # Groups:   test_item, sample_size, n_obs [21]
*##   test_item sample_size n_obs condition response
##       <dbl> <chr>       <dbl> <chr>        <dbl>
## 1         1 large          12 category      7.60
## 2         1 large          12 property      7.16
## 3         1 medium          6 category      7.32
## 4         1 medium          6 property      6.66
## # … with 38 more rows
```

---
class: inverse
background-image: url("img/drawer2.jpg")
background-size: cover

.hand[Group, summarise, and ungroup]

```r
frames %>%
  group_by(test_item, sample_size, n_obs, condition) %>%
  summarise(response = mean(response)) %>%
* ungroup()
```

```
## # A tibble: 42 x 5
*##   test_item sample_size n_obs condition response
##       <dbl> <chr>       <dbl> <chr>        <dbl>
## 1         1 large          12 category      7.60
## 2         1 large          12 property      7.16
## 3         1 medium          6 category      7.32
## 4         1 medium          6 property      6.66
## # … with 38 more rows
```

---
class: inverse
background-image: url("img/drawer2.jpg")
background-size: cover

.hand[A more realistic example]

.pull-left[
```r
frames %>%
  group_by(test_item) %>%
  summarise(
*   m = mean(response),
*   s = sd(response),
*   n = n()
  ) %>%
  ungroup()
```
]

--

.pull-right[
```
## # A tibble: 7 x 4
##   test_item     m     s     n
## *     <dbl> <dbl> <dbl> <int>
## 1         1  6.77  2.56   675
## 2         2  6.88  2.10   675
## 3         3  5.71  2.41   675
## 4         4  4.48  2.68   675
## 5         5  3.76  2.81   675
## 6         6  3.43  2.99   675
## 7         7  3.26  3.11   675
```
]

???

- these are bad variable names
- i've done it to fit the slides
- don't do it in real life

---
class: inverse
background-image: url("img/drawer2.jpg")
background-size: cover

.hand[A more realistic example]

.pull-left[
```r
*by_item <- frames %>%
  group_by(test_item) %>%
  summarise(
    m = mean(response),
    s = sd(response),
    n = n()
  ) %>%
  ungroup()
```
]

---
class: inverse
background-image: url("img/drawer2.jpg")
background-size: cover

.hand[A more realistic example]

.pull-left[
```r
ggplot(
  data = by_item,
  mapping = aes(
    x = test_item,
    y = m
  )
) +
  geom_point()
```
]

--

.pull-right[
![](index_files/figure-html/unnamed-chunk-23-1.png)<!-- -->
]

---
class: inverse
background-image: url("img/drawer2.jpg")
background-size: cover

.hand[A more realistic example]

.pull-left[
```r
ggplot(
  data = by_item,
  mapping = aes(
    x = test_item,
    y = m,
*   ymin = m - s,
*   ymax = m + s
  )
) +
*geom_pointrange()
```
]

.pull-right[
![](index_files/figure-html/unnamed-chunk-24-1.png)<!-- -->
]

---
class: middle
background-image: url("img/drawer2.jpg")
background-size: cover

.hand[.plainwhite[
<span style="font-weight:bold;font-size:64pt">Exercises #4 and #5</span>
]]

---
class: middle, inverse
background-image: url("img/typewriter.jpg")
background-size: cover

.pull-left-narrow[
.huge-blue-number[5]
]
.pull-right-wide[
.larger[
.plainwhite[Writing<br>data]
]
]

---
class: inverse

.hand[If we read with read_csv...]

```r
read_csv("data_reasoning.csv")
```

<br><br>

--

.hand[... then we write with write_csv!]

```r
write_csv(by_item, "summary_reasoning_by_item.csv")
```

---
class: middle
background-image: url("img/typewriter.jpg")
background-size: cover

.hand[.plainwhite[
<span style="font-weight:bold;font-size:64pt">Exercises #6 and #7</span>
]]

---
class: middle
background-image: url("img/snow_road.jpg")
background-size: cover

.pull-left-narrow[
.huge-blue-number[6]
]
.pull-right-wide[
.larger[
Where<br>
next?
]
]

---
background-image: url("img/snow_road2.jpg")
background-size: cover

.hand[Not in this class, but for future reference...]

- Excel files? `readxl` package
- SPSS, Stata or SAS? `haven` package
- JSON format? `jsonlite` package
- Databases? `dbplyr` package

.pull-right-wide[
<br><br><br>
.hand[Want more information?]

Data import chapter in R for Data Science<br>
https://r4ds.had.co.nz/data-import.html
]
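
---
class: inverse

.hand[Putting it together: a rough sketch]

A minimal sketch of the whole workflow from these slides (group, summarise, write, read back), assuming only that the tidyverse is installed. The tiny `toy` data set, the longer column names and the temporary file are made up for illustration; they stand in for `frames` and `data_reasoning.csv` and are not part of the original study.

```r
library(dplyr)
library(readr)

# hypothetical stand-in for the frames data:
# two test items, three responses each
toy <- tibble(
  test_item = c(1, 1, 1, 2, 2, 2),
  response  = c(8, 7, 9, 3, 4, 5)
)

# group, summarise, ungroup: one row per test item
toy_by_item <- toy %>%
  group_by(test_item) %>%
  summarise(
    mean_response = mean(response),
    sd_response   = sd(response),
    n_responses   = n()
  ) %>%
  ungroup()

# write the summary to a temporary csv file, then read it back in
toy_file <- tempfile(fileext = ".csv")
write_csv(toy_by_item, toy_file)
toy_reread <- read_csv(toy_file)

print(toy_reread)
```

Names like `mean_response` take more room on a slide than `m`, `s` and `n`, but they are much easier to read when you come back to the file later.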