Say hello to your data

# > 02 🤝 > say hello to your data
## 🔗 <a href="https://robust-tools.djnavarro.net">robust-tools.djnavarro.net</a>
### danielle navarro

---

layout: true
 
<div class="my-footer">

<a href="https://robust-tools.djnavarro.net" target="_blank">robust-tools.djnavarro.net</a>

</div>

---

class: middle

.pull-right[
 
Hayes, Banner, Forrester & Navarro (2019). 
 
*Selective sampling and inductive inference: Drawing inferences based on observed and missing evidence* 
https://psyarxiv.com/2m83v/ 
]

---

### https://robust-tools.djnavarro.net/reasoning/

---

.hand[Property sampling: the robot only detects plaxium spheres]
<img src="img/data10_property.jpg" width="410" />

---

class: middle
.hand[Category sampling: the robot only tests small spheres]
<img src="img/data10_category.jpg" width="410" />

---

class: middle
.hand[Small sample size: Elicit judgments after two observations]
<img src="img/data2_property.jpg" width="410" />

---

class: middle
.hand[Medium sample size: Elicit judgments after six observations]
<img src="img/data6_property.jpg" width="410" />

---

class: middle
.hand[Large sample size: Elicit judgments after twelve observations]
<img src="img/data12_property.jpg" width="410" />

---

class: middle
.hand[Seven test items that vary in size: Smallest...]
<img src="img/test2.jpg" width="410" />

---

class: middle
.hand[Seven test items that vary in size: Largest...]
<img src="img/test8.jpg" width="410" />

---

.pull-right[
 
- smooth generalisation profiles
- similar for small samples
- different for large samples
 
- property shows "tightening"
- category does not
]

---


.pull-left-narrow[
 .huge-blue-number[2]
]
.pull-right-wide[
 .larger[
 &nbsp; 📖 
 &nbsp; Reading 
 &nbsp; your data
 ]
]

---

## https://rstudio.cloud/project/978818

---

```r
library(tidyverse)
frames <- read_csv(file = "data_reasoning.csv")
```
--

```
## Parsed with column specification:
## cols(
##   id = col_double(),
##   gender = col_character(),
##   age = col_double(),
##   condition = col_character(),
##   sample_size = col_character(),
##   n_obs = col_double(),
##   test_item = col_double(),
##   response = col_double()
## )
```
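
.hand[The message lists the column types that read_csv guessed]

A minimal sketch of declaring those types up front instead of relying on the guess. The two-row file is a hypothetical stand-in written to a temporary path, and the names `toy_path` and `toy_frames` are illustrative, not part of the course materials.

```r
library(readr)

# Hypothetical miniature file standing in for data_reasoning.csv
toy_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "id,gender,age,condition,sample_size,n_obs,test_item,response",
    "1,male,36,category,small,2,1,8",
    "1,male,36,category,small,2,2,7"
  ),
  toy_path
)

# Declare the column types instead of letting read_csv guess them
toy_frames <- read_csv(
  file = toy_path,
  col_types = cols(
    id = col_double(),
    gender = col_character(),
    age = col_double(),
    condition = col_character(),
    sample_size = col_character(),
    n_obs = col_double(),
    test_item = col_double(),
    response = col_double()
  )
)
```

When `col_types` is supplied, the "Parsed with column specification" message is not printed, and a column that does not match its declared type shows up as a parsing problem rather than being silently read as something else.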

---

class: inverse
.hand[Click on "frames" in the environment pane]
<img src="img/view_frames_0.png" width="900" />

---

class: inverse
.hand[Click on "frames" in the environment pane]
<img src="img/view_frames_1.png" width="900" />

---

```r
print(frames)
```
--

```
## # A tibble: 4,725 x 8
##      id gender   age condition sample_size n_obs test_item
##   <dbl> <chr>  <dbl> <chr>     <chr>       <dbl>     <dbl>
## 1     1 male      36 category  small           2         1
## 2     1 male      36 category  small           2         2
## 3     1 male      36 category  small           2         3
## 4     1 male      36 category  small           2         4
## 5     1 male      36 category  small           2         5
## 6     1 male      36 category  small           2         6
## # … with 4,719 more rows, and 1 more variable:
## #   response <dbl>
```

---

```r
glimpse(frames)
```
--

```
## Rows: 4,725
## Columns: 8
## $ id          <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ gender      <chr> "male", "male", "male", "male", "mal…
## $ age         <dbl> 36, 36, 36, 36, 36, 36, 36, 36, 36, …
## $ condition   <chr> "category", "category", "category", …
## $ sample_size <chr> "small", "small", "small", "small", …
## $ n_obs       <dbl> 2, 2, 2, 2, 2, 2, 2, 6, 6, 6, 6, 6, …
## $ test_item   <dbl> 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, …
## $ response    <dbl> 8, 7, 6, 6, 5, 6, 3, 9, 7, 5, 6, 4, …
```

---

.pull-left-narrow[
 .huge-blue-number[3]
]
.pull-right-wide[
 .larger[
 &nbsp;The pipe 
 &nbsp;.embolden[%&gt;%] 
 ]
]

---

.pull-left[
- Take the frames data...
- Do one thing...
- Then another...
- And then one more...
]
--
.pull-right[
```{}
frames %>%
  do_one_thing(.) %>%
  then_another(.) %>%
  and_then_one_more(.)
```
]
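
.hand[The same steps, nested versus piped]

The placeholder functions above are not real. A small runnable sketch of the same idea with actual dplyr verbs follows. The `toy` data set is hypothetical and only there to make the example self-contained.

```r
library(dplyr)

# Hypothetical toy data, not the real frames data set
toy <- tibble(
  condition = c("category", "property", "category", "property"),
  response  = c(8, 6, 7, 3)
)

# Nested calls: read from the inside out
nested <- arrange(filter(toy, response > 5), response)

# Piped calls: read from top to bottom
piped <- toy %>%
  filter(response > 5) %>%
  arrange(response)

# Both versions should give the same table
all.equal(nested, piped)
```

The pipe passes the left-hand side in as the first argument of the next function, so the explicit `.` in the slide above is optional here.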

---

- We "pipe" our data through operations
- The data set *flows* through our analysis

---

.pull-left-narrow[
 .huge-blue-number[4]
]
.pull-right-wide[
 .larger[.embolden[.plainwhite[
 &nbsp; Grouped 
 &nbsp; summary
 ]]]
]

---

```r
frames %>%
  group_by( GROUP ) %>%
  summarise( EXPRESSION ) %>%
  ungroup()
```

- use `group_by` to define groups
- use `summarise` to... summarise
- use `ungroup` to remove grouping

---
class: inverse

```r
frames %>%
  group_by(test_item, sample_size, n_obs, condition) %>%
  summarise(response = mean(response))
```
--

```
## # A tibble: 42 x 5
*## # Groups:   test_item, sample_size, n_obs [21]
*##   test_item sample_size n_obs condition response
##       <dbl> <chr>       <dbl> <chr>        <dbl>
## 1         1 large          12 category      7.60
## 2         1 large          12 property      7.16
## 3         1 medium          6 category      7.32
## 4         1 medium          6 property      6.66
## # … with 38 more rows
```

---
class: inverse

```r
frames %>%
  group_by(test_item, sample_size, n_obs, condition) %>%
  summarise(response = mean(response)) %>%
* ungroup()
```

```
## # A tibble: 42 x 5
*##   test_item sample_size n_obs condition response
##       <dbl> <chr>       <dbl> <chr>        <dbl>
## 1         1 large          12 category      7.60
## 2         1 large          12 property      7.16
## 3         1 medium          6 category      7.32
## 4         1 medium          6 property      6.66
## # … with 38 more rows
```
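
.hand[Checking which groups are left over]

A minimal sketch of why the final `ungroup()` matters, using `group_vars()` to inspect the grouping. The `toy` data set and the names `still_grouped` and `no_groups` are hypothetical.

```r
library(dplyr)

# Hypothetical toy data, not the real frames data set
toy <- tibble(
  test_item = c(1, 1, 1, 1, 2, 2, 2, 2),
  condition = rep(c("category", "property"), times = 4),
  response  = c(8, 7, 9, 6, 5, 4, 6, 3)
)

still_grouped <- toy %>%
  group_by(test_item, condition) %>%
  summarise(response = mean(response))

# summarise() peels off only the last grouping variable,
# so the result is still grouped by test_item
group_vars(still_grouped)

no_groups <- still_grouped %>%
  ungroup()

# After ungroup() no grouping variables remain
group_vars(no_groups)
```

Recent versions of dplyr also print a message about the remaining groups when `summarise()` is called on data grouped by more than one variable. Any later step on `still_grouped` would run separately within each `test_item`, which is easy to forget.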

---
class: inverse
.pull-left[

```r
frames %>% 
  group_by(test_item) %>%
  summarise(
*   m = mean(response),
*   s = sd(response),
*   n = n()
  ) %>%
  ungroup()
```
]
--
.pull-right[

```
## # A tibble: 7 x 4
##   test_item     m     s     n
## *     <dbl> <dbl> <dbl> <int>
## 1         1  6.77  2.56   675
## 2         2  6.88  2.10   675
## 3         3  5.71  2.41   675
## 4         4  4.48  2.68   675
## 5         5  3.76  2.81   675
## 6         6  3.43  2.99   675
## 7         7  3.26  3.11   675
```
]

???
- these are bad variable names
- i've done it to fit the slides
- don't do it in real life

---
class: inverse

```r
*by_item <- frames %>%
  group_by(test_item) %>%
  summarise(
    m = mean(response),
    s = sd(response),
    n = n()
  ) %>%
  ungroup()
```

---
class: inverse
.pull-left[

```r
ggplot(
  data = by_item,
  mapping = aes(
    x = test_item,
    y = m
  )
) +
  geom_point()
```
]
--
.pull-right[
![](index_files/figure-html/unnamed-chunk-23-1.png)
]

---
class: inverse
.pull-left[

```r
ggplot(
  data = by_item,
  mapping = aes(
    x = test_item,
    y = m,
*   ymin = m - s,
*   ymax = m + s
  )
) +
* geom_pointrange()
```
]
.pull-right[
![](index_files/figure-html/unnamed-chunk-24-1.png)
]

---

.pull-left-narrow[
 .huge-blue-number[5]
]
.pull-right-wide[
 .larger[
 🖋 
 .plainwhite[Writing data]
 ]
]

---

```r
read_csv("data_reasoning.csv")
```

--
.hand[... then we write with write_csv!]

```r
write_csv(by_item, "summary_reasoning_by_item.csv")
```
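
.hand[Checking that what we wrote can be read back]

A minimal round-trip sketch: write a small table, read it back, and look at what changed. The table `toy_summary` is a hypothetical stand-in for `by_item`, and it is written to a temporary file so nothing in the project folder is touched.

```r
library(readr)
library(dplyr)

# Hypothetical summary table standing in for by_item
toy_summary <- tibble(
  test_item = c(1, 2),
  m = c(6.5, 5.25),
  n = c(4L, 4L)
)

# Write to a temporary file so nothing in the project is overwritten
toy_path <- tempfile(fileext = ".csv")
write_csv(toy_summary, toy_path)

# Read it back; cols() guesses the types without printing a message
round_trip <- read_csv(toy_path, col_types = cols())

# The values survive, but the integer column n comes back as a double
glimpse(round_trip)
```

A CSV file stores text only, so column types are guessed again on the way back in. That is worth remembering for columns such as counts or factors.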

---

.hand[Not in this class, but for future reference...]
- Excel files? `readxl` package
- SPSS, Stata or SAS? `haven` package
- JSON format? `jsonlite` package
- Databases? `dbplyr` package

Data import chapter in R for Data Science 
https://r4ds.had.co.nz/data-import.html
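
As a taste of one of these, here is a small sketch that reads JSON with `jsonlite`. The JSON text is a hypothetical example written inline, and it assumes the `jsonlite` package is installed.

```r
library(jsonlite)

# Hypothetical JSON records, written inline for illustration
json_text <- '[
  {"id": 1, "condition": "category", "response": 8},
  {"id": 2, "condition": "property", "response": 6}
]'

# fromJSON() simplifies an array of records into a data frame
toy_json <- fromJSON(json_text)
str(toy_json)
```

The other packages follow the same pattern of one function call per file: `readxl::read_excel()` for Excel and `haven::read_sav()` for SPSS.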