
> 02 🤝
> say hello to your data

🔗 robust-tools.djnavarro.net

danielle navarro

1

👩‍🔬



Hayes, Banner, Forrester & Navarro (2019).

Selective sampling and inductive inference: Drawing inferences based on observed and missing evidence

https://psyarxiv.com/2m83v/

Property sampling: the robot only detects plaxium spheres

Category sampling: the robot only tests small spheres

Small sample size: Elicit judgments after two observations

Medium sample size: Elicit judgments after six observations

Large sample size: Elicit judgments after twelve observations

Seven test items that vary in size: Smallest...

Seven test items that vary in size: Largest...



  • smooth generalisation profiles
  • similar for small samples
  • different for large samples

  • property shows "tightening"
  • category does not

Exercise #1:
discuss this study!

01:00

2

  📖
  Reading
  your data

Reading the data into R

library(tidyverse)
frames <- read_csv(file = "data_reasoning.csv")
## Parsed with column specification:
## cols(
##   id = col_double(),
##   gender = col_character(),
##   age = col_double(),
##   condition = col_character(),
##   sample_size = col_character(),
##   n_obs = col_double(),
##   test_item = col_double(),
##   response = col_double()
## )
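
The column specification printed above can also be written out by hand, so the types are stated rather than guessed. A minimal sketch, assuming a small stand-in file written to a temporary path (the rows in `toy_path` are made up for illustration and are not the real data_reasoning.csv):

```r
library(readr)

# Write a tiny stand-in file so the example runs anywhere.
# These rows are hypothetical; they only mimic a few of the real columns.
toy_path <- tempfile(fileext = ".csv")
write_lines(
  c(
    "id,condition,sample_size,response",
    "1,category,small,8",
    "1,category,small,7",
    "2,property,large,3"
  ),
  toy_path
)

# Declare the column types up front instead of letting read_csv guess them.
toy <- read_csv(
  file = toy_path,
  col_types = cols(
    id = col_double(),
    condition = col_character(),
    sample_size = col_character(),
    response = col_double()
  )
)

print(toy)
```

Supplying `col_types` also silences the "Parsed with column specification" message, because nothing is left for readr to guess.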

Click on "frames" in the environment pane


Inspecting the data

print(frames)
## # A tibble: 4,725 x 8
##      id gender   age condition sample_size n_obs test_item
##   <dbl> <chr>  <dbl> <chr>     <chr>       <dbl>     <dbl>
## 1     1 male      36 category  small           2         1
## 2     1 male      36 category  small           2         2
## 3     1 male      36 category  small           2         3
## 4     1 male      36 category  small           2         4
## 5     1 male      36 category  small           2         5
## 6     1 male      36 category  small           2         6
## # … with 4,719 more rows, and 1 more variable:
## #   response <dbl>

Inspecting the data

glimpse(frames)
## Rows: 4,725
## Columns: 8
## $ id          <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ gender      <chr> "male", "male", "male", "male", "mal…
## $ age         <dbl> 36, 36, 36, 36, 36, 36, 36, 36, 36, …
## $ condition   <chr> "category", "category", "category", …
## $ sample_size <chr> "small", "small", "small", "small", …
## $ n_obs       <dbl> 2, 2, 2, 2, 2, 2, 2, 6, 6, 6, 6, 6, …
## $ test_item   <dbl> 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, …
## $ response    <dbl> 8, 7, 6, 6, 5, 6, 3, 9, 7, 5, 6, 4, …

3

 The pipe
 %>%


The pipe, %>%

  • Take the frames data...
  • Do one thing...
  • Then another...
  • And then one more...
frames %>%
  do_one_thing(.) %>%
  then_another(.) %>%
  and_then_one_more(.)

data %>%
  tidy() %>%
  describe() %>%
  visualise() %>%
  analyse()
  • We "pipe" our data through operations
  • The data set flows through our analysis
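
To see that the pipe only changes how the code reads, not what it computes, here is a small sketch. It assumes a toy vector (the first seven responses shown in the `glimpse()` output) and uses base R functions as stand-ins for the placeholder steps above:

```r
library(dplyr)  # dplyr re-exports the %>% pipe from magrittr

# Toy vector: the first seven responses from the glimpse() output.
responses <- c(8, 7, 6, 6, 5, 6, 3)

# Nested calls have to be read from the inside out...
nested <- round(mean(sqrt(responses)), 2)

# ...while the piped version reads top to bottom, one step per line.
piped <- responses %>%
  sqrt() %>%
  mean() %>%
  round(2)

# Both routes give exactly the same value.
identical(nested, piped)
```

Each `%>%` passes the result on its left as the first argument of the function on its right, so `x %>% round(2)` is just another way of writing `round(x, 2)`.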

4

  Grouped
  summary


The basic idea

frames %>%
  group_by( GROUP ) %>%
  summarise( EXPRESSION ) %>%
  ungroup()
  • use group_by to define groups
  • use summarise to... summarise
  • use ungroup to remove grouping

Group, summarise...

frames %>%
  group_by(test_item, sample_size, n_obs, condition) %>%
  summarise(response = mean(response))
## # A tibble: 42 x 5
## # Groups:   test_item, sample_size, n_obs [21]
##   test_item sample_size n_obs condition response
##       <dbl> <chr>       <dbl> <chr>        <dbl>
## 1         1 large          12 category      7.60
## 2         1 large          12 property      7.16
## 3         1 medium          6 category      7.32
## 4         1 medium          6 property      6.66
## # … with 38 more rows

Group, summarise, and ungroup

frames %>%
  group_by(test_item, sample_size, n_obs, condition) %>%
  summarise(response = mean(response)) %>%
  ungroup()
## # A tibble: 42 x 5
##   test_item sample_size n_obs condition response
##       <dbl> <chr>       <dbl> <chr>        <dbl>
## 1         1 large          12 category      7.60
## 2         1 large          12 property      7.16
## 3         1 medium          6 category      7.32
## 4         1 medium          6 property      6.66
## # … with 38 more rows
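
Why bother with `ungroup()`? After `summarise()`, only the last grouping variable is dropped, so the result is still grouped by the others. A minimal sketch with a made-up `toy` data set: it shows the leftover grouping and, assuming dplyr 1.0.0 or later, the `.groups = "drop"` argument as an alternative to a trailing `ungroup()`:

```r
library(dplyr)

# Hypothetical toy data: two test items crossed with two conditions.
toy <- tibble(
  test_item = c(1, 1, 2, 2, 1, 1, 2, 2),
  condition = rep(c("category", "property"), each = 4),
  response  = c(8, 7, 6, 5, 9, 6, 4, 3)
)

# summarise() peels off only the last grouping variable (condition),
# so this result is still grouped by test_item.
still_grouped <- toy %>%
  group_by(test_item, condition) %>%
  summarise(response = mean(response))

# .groups = "drop" removes all grouping in the same step
# (an assumption here: this argument needs dplyr >= 1.0.0).
ungrouped <- toy %>%
  group_by(test_item, condition) %>%
  summarise(response = mean(response), .groups = "drop")

is_grouped_df(still_grouped)
is_grouped_df(ungrouped)
```

Leftover grouping matters because later verbs such as `mutate()` or `summarise()` would silently operate within each group rather than on the whole table.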

A more realistic example

frames %>%
  group_by(test_item) %>%
  summarise(
    m = mean(response),
    s = sd(response),
    n = n()
  ) %>%
  ungroup()
## # A tibble: 7 x 4
##   test_item     m     s     n
## *     <dbl> <dbl> <dbl> <int>
## 1         1  6.77  2.56   675
## 2         2  6.88  2.10   675
## 3         3  5.71  2.41   675
## 4         4  4.48  2.68   675
## 5         5  3.76  2.81   675
## 6         6  3.43  2.99   675
## 7         7  3.26  3.11   675

  • these are bad variable names
  • i've done it to fit the slides
  • don't do it in real life
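
For comparison, here is roughly what the same summary might look like with descriptive names. This is a sketch on a made-up `toy` data set, and the names `mean_response`, `sd_response` and `n_responses` are only one possible choice:

```r
library(dplyr)

# Hypothetical toy data standing in for frames.
toy <- tibble(
  test_item = c(1, 1, 2, 2),
  response  = c(8, 6, 4, 2)
)

# Same grouped summary as on the slide, but with names that
# still make sense when you come back to the code months later.
by_item_named <- toy %>%
  group_by(test_item) %>%
  summarise(
    mean_response = mean(response),
    sd_response   = sd(response),
    n_responses   = n()
  ) %>%
  ungroup()

print(by_item_named)
```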

A more realistic example

by_item <- frames %>%
  group_by(test_item) %>%
  summarise(
    m = mean(response),
    s = sd(response),
    n = n()
  ) %>%
  ungroup()

A more realistic example

ggplot(
  data = by_item,
  mapping = aes(
    x = test_item,
    y = m
  )
) +
  geom_point()

A more realistic example

ggplot(
  data = by_item,
  mapping = aes(
    x = test_item,
    y = m,
    ymin = m - s,
    ymax = m + s
  )
) +
  geom_pointrange()
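
If you want to try this plot without the data file, the sketch below rebuilds `by_item` by hand from the means and standard deviations printed in the summary table earlier. The `scale_x_continuous()` and `labs()` calls are additions for readability and are not part of the original slides:

```r
library(ggplot2)

# by_item typed in from the summary table shown earlier
# (m = mean response, s = standard deviation).
by_item <- data.frame(
  test_item = 1:7,
  m = c(6.77, 6.88, 5.71, 4.48, 3.76, 3.43, 3.26),
  s = c(2.56, 2.10, 2.41, 2.68, 2.81, 2.99, 3.11)
)

# Points mark the mean; the vertical range spans one SD either side.
p <- ggplot(
  data = by_item,
  mapping = aes(
    x = test_item,
    y = m,
    ymin = m - s,
    ymax = m + s
  )
) +
  geom_pointrange() +
  scale_x_continuous(breaks = 1:7) +  # one tick per test item
  labs(x = "Test item", y = "Mean response (+/- 1 SD)")

print(p)
```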


Exercises #4 and #5

5

🖋
Writing
data

If we read with read_csv...

read_csv("data_reasoning.csv")



... then we write with write_csv!

write_csv(by_item, "summary_reasoning_by_item.csv")
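
A quick way to convince yourself that writing worked is a round trip: write the file, read it back, and compare. A minimal sketch, assuming `by_item` is rebuilt by hand from the summary table and written to a temporary folder rather than your project folder:

```r
library(readr)

# by_item typed in from the summary table shown earlier.
by_item <- data.frame(
  test_item = 1:7,
  m = c(6.77, 6.88, 5.71, 4.48, 3.76, 3.43, 3.26),
  s = c(2.56, 2.10, 2.41, 2.68, 2.81, 2.99, 3.11)
)

# Write to a temporary location so nothing in the project is overwritten.
out_path <- file.path(tempdir(), "summary_reasoning_by_item.csv")
write_csv(by_item, out_path)

# Read it straight back; col_types = cols() lets readr guess the types
# without printing the column specification message.
reread <- read_csv(out_path, col_types = cols())

# The means should survive the trip unchanged.
all.equal(reread$m, by_item$m)
```

One thing a CSV does not preserve is the distinction between integer and double columns, so `test_item` comes back as a double.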

Exercises #6 and #7


6

 Where
 next?


Not in this class, but for future reference...

  • Excel files? readxl package
  • SPSS, Stata or SAS? haven package
  • JSON format? jsonlite package
  • Databases? dbplyr package
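
As a small taste of one of these, the sketch below reads JSON with jsonlite. It parses an inline string (made up for illustration) so that no extra file is needed; with a real file you would pass the file path to `fromJSON()` instead:

```r
library(jsonlite)

# Hypothetical JSON: an array of records, one per row.
json_text <- '[
  {"id": 1, "condition": "category", "response": 8},
  {"id": 2, "condition": "property", "response": 3}
]'

# fromJSON() simplifies an array of records into a data frame.
toy <- fromJSON(json_text)

print(toy)
```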




Want more information?

Data import chapter in R for Data Science
https://r4ds.had.co.nz/data-import.html
