Loading packages and data

#load packages
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
  method         from
  print.tbl_lazy     
  print.tbl_sql      
-- Attaching packages ------------------------------------------------------------------------------------------- tidyverse 1.3.2 --
v ggplot2 3.3.5     v purrr   0.3.4
v tibble  3.1.8     v dplyr   1.0.9
v tidyr   1.1.4     v stringr 1.4.0
v readr   2.1.2     v forcats 0.5.1
-- Conflicts ---------------------------------------------------------------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
d <- read_csv("isbell_2019_background_data.csv")
Rows: 198 Columns: 45
-- Column specification ------------------------------------------------------------------------------------------------------------
Delimiter: ","
chr (12): ID, Home_Country, Gender, Current_Student, Highest_Ed, LangDom_1, LangDom_2, LangDom_3, LangDom_4, LangDom_5, LangDom_...
dbl (33): Birthyear, Lang2_percent, Lang3_percent, Lang4_percent, Lang5_percent, LangOthr_percent, Korean_jth_lang, Age_Start_Ko...
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.

1. Gender in the sample

There are several ways to do this. First up: the table() function.

table(d$Gender)

  F   M 
174  24 

Or use count():

d %>% count(Gender)
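
If you also want proportions next to the counts, one option is to add a mutate() step after count(). This is a minimal sketch, not part of the original solution: `toy` is a small made-up tibble standing in for `d` so the code runs on its own, and `prop` is just an illustrative column name.

```r
library(dplyr)

# Hypothetical stand-in for d; the values are made up for illustration
toy <- tibble(Gender = c("F", "F", "F", "M"))

# count() returns one row per category, with the tally in a column called n
gender_counts <- toy %>%
  count(Gender) %>%
  mutate(prop = n / sum(n))  # share of the sample in each category

gender_counts
```

Because count() returns a tibble rather than a table object, the result can be piped straight into further dplyr steps like this.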

2. Birthyear and age

First, create the age variable. There are two equivalent ways of doing this (you only need one):

d$Age <- 2018 - d$Birthyear

d <- d %>% mutate(Age = 2018 - Birthyear)
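
To convince yourself that the two approaches give the same result, you can compare them on a tiny example. This is only a sketch: `toy`, `base_way`, and `dplyr_way` are made-up names, and the birth years are invented.

```r
library(dplyr)

# Hypothetical stand-in for d with invented birth years
toy <- tibble(Birthyear = c(1990, 1995, 2000))

# Base R: assign a new column with $
base_way <- toy
base_way$Age <- 2018 - base_way$Birthyear

# dplyr: create the column with mutate()
dplyr_way <- toy %>% mutate(Age = 2018 - Birthyear)

# Both columns hold the same values
identical(base_way$Age, dplyr_way$Age)
```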

Summary stats:

d %>% summarise(mean = mean(Age),
                sd = sd(Age),
                median = median(Age),
                min = min(Age),
                max = max(Age))

Time for a histogram. We’ll make it decent-looking with ggplot:

d %>% ggplot(aes(x = Age))+
  geom_histogram(binwidth = 1)+ #binwidth sets the 'width' of each bucket/bar of the histogram
  theme_bw()
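
To see what binwidth does, it can help to draw the same data with two different settings. The sketch below uses invented ages in a made-up data frame (`toy`), and the value 5 is just an arbitrary wider bin chosen for comparison.

```r
library(ggplot2)

# Hypothetical ages, made up for illustration
toy <- data.frame(Age = c(19, 20, 20, 21, 23, 23, 24, 30, 31, 42))

# One bar per year of age
p_narrow <- ggplot(toy, aes(x = Age)) +
  geom_histogram(binwidth = 1) +
  theme_bw()

# One bar per five years: fewer, wider bars that hide some detail
p_wide <- ggplot(toy, aes(x = Age)) +
  geom_histogram(binwidth = 5) +
  theme_bw()

p_narrow
p_wide
```

With age measured in whole years, binwidth = 1 gives one bar per age, which is probably why it works well here.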

3. Home countries

An easy way to list them is unique():

unique(d$Home_Country)
 [1] "Germany"      "China"        "Taiwan"       "Hong Kong"    "Russia"       "El Salvador"  "Sri Lanka"    "Mexico"      
 [9] "Turkmenistan" "Ecuador"      "Vietnam"      "Singapore"    "Malaysia"     "Indonesia"    "Kazahkstan"   "Japan"       
[17] "France"       "Philippines"  "Turkey"       "Iran"         "Thailand"     "Italy"        "Belarus"      "Bangladesh"  
[25] "Ukraine"      "Chile"        "Columbia"     "Brazil"       "Mongolia"     "Kyrgyzstan"   "Uzbekistan"   "Azerbaijan"  
[33] "Peru"         "USA"          "Bermuda"      "Spain"       

There are 36 unique home countries represented in the data!
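
Rather than reading the number off the printed index, you can have R count the unique values for you. Here is a minimal sketch with a made-up vector of countries standing in for d$Home_Country; the variable names are illustrative only.

```r
library(dplyr)

# Hypothetical stand-in for d; values are made up for illustration
toy <- tibble(Home_Country = c("China", "Japan", "China", "USA", "Japan"))

# Base R: count the elements returned by unique()
n_countries_base <- length(unique(toy$Home_Country))

# dplyr: n_distinct() does the same in one call
n_countries_dplyr <- n_distinct(toy$Home_Country)

n_countries_base
n_countries_dplyr
```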

To count them, we’ll use count() again. You could also use table(), but that can get messy with so many countries.

d %>% count(Home_Country)

This output lists every country, so you could find the top 5 by eye. To make that easier, you could send the result to a viewer window by adding %>% view() to the end of that last line of code.

Fancier solution:

d %>% count(Home_Country) %>% arrange(desc(n)) %>% slice_head(n = 5)

What that code did was arrange the output from count(Home_Country) in descending order and then grab just the top 5 rows.
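
Two possible variations on this, sketched with made-up data (`toy` stands in for `d`): count() has a sort argument that does the descending arrange for you, and slice_max() keeps ties by default, which matters if several countries share the cutoff count.

```r
library(dplyr)

# Hypothetical stand-in for d; values are made up for illustration
toy <- tibble(Home_Country = c("China", "China", "China",
                               "Japan", "Japan", "USA", "Peru"))

# sort = TRUE replaces the separate arrange(desc(n)) step
top2 <- toy %>%
  count(Home_Country, sort = TRUE) %>%
  slice_head(n = 2)

# slice_max() keeps tied rows, so it can return more than n rows
top3_with_ties <- toy %>%
  count(Home_Country) %>%
  slice_max(n, n = 3)

top2
top3_with_ties
```

In this toy data, USA and Peru are tied for third place, so slice_max() returns both of them, whereas slice_head() would cut one off arbitrarily.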

4. Self-Assessed Skills

For this, you can copy-paste the summary code from Task 2, swapping Age for each skill variable.

#KorSpk
d %>% summarise(mean = mean(KorSpk),
                sd = sd(KorSpk),
                median = median(KorSpk),
                min = min(KorSpk),
                max = max(KorSpk))

#KorLis
d %>% summarise(mean = mean(KorLis),
                sd = sd(KorLis),
                median = median(KorLis),
                min = min(KorLis),
                max = max(KorLis))
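
If the copy-pasting gets tedious, one alternative is dplyr's across(), which applies the same set of functions to several columns in a single summarise() call. This is a sketch on made-up scores, not the approach the solution itself uses; `toy` and `skill_summary` are illustrative names.

```r
library(dplyr)

# Hypothetical stand-in for d; scores are made up for illustration
toy <- tibble(KorSpk = c(1, 2, 3, 4),
              KorLis = c(2, 3, 4, 5))

# Apply every function in the named list to every selected column.
# Output columns are named <column>_<function>, e.g. KorSpk_mean.
skill_summary <- toy %>%
  summarise(across(c(KorSpk, KorLis),
                   list(mean = mean, sd = sd, median = median,
                        min = min, max = max)))

skill_summary
```

The trade-off is a single wide row instead of one small table per variable, so the copy-paste version may still be easier to read for just two variables.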

Scatterplot time:

d %>% ggplot(aes(x = KorLis, y = KorSpk))+
  geom_point()+
  theme_bw()

This works, but many of the points overlap (you don’t see all 198 points!). One way to fix this is to introduce some “jitter”, a slight random adjustment of each point's position for visualization purposes:

d %>% ggplot(aes(x = KorLis, y = KorSpk))+
  geom_jitter()+
  theme_bw()
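
By default the amount of jitter is chosen for you and changes every time the plot is drawn. If you want more control, you can set the width and height of the jitter and fix a random seed. The sketch below uses made-up ratings in `toy`, and the values 0.2, 0.5, and the seed of 1 are arbitrary choices for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for d; ratings are made up for illustration
toy <- data.frame(KorLis = c(1, 1, 2, 2, 2, 3),
                  KorSpk = c(1, 1, 2, 2, 3, 3))

p <- ggplot(toy, aes(x = KorLis, y = KorSpk)) +
  geom_point(
    # move each point by at most 0.2 units in either direction;
    # the seed makes the plot look the same on every run
    position = position_jitter(width = 0.2, height = 0.2, seed = 1),
    alpha = 0.5  # semi-transparent points make any remaining overlap visible
  ) +
  theme_bw()

p
```

Keeping the jitter small relative to the spacing between rating values means each point still clearly belongs to its original value.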

6. Free Choice

Here’s something interesting… “Motiv_Love” is self-reported motivation to learn Korean due to having a Korean-speaking significant other.

d %>% ggplot(aes(x = Motiv_Love, y = KorSpk))+
  geom_jitter()+
  theme_bw()

Doesn’t look like much of a relationship!
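
If you want a number to go with that visual impression, a correlation coefficient is one option. This is a sketch on invented values (`toy` stands in for `d`). Spearman's rank correlation is shown alongside Pearson's on the assumption that these self-ratings are ordinal, which the data description here does not confirm.

```r
# Hypothetical stand-in for d; values are made up for illustration
toy <- data.frame(Motiv_Love = c(1, 2, 3, 4, 5, 6),
                  KorSpk     = c(3, 5, 2, 6, 4, 3))

# Pearson correlation; use = "complete.obs" drops rows with missing values
r_pearson <- cor(toy$Motiv_Love, toy$KorSpk, use = "complete.obs")

# Spearman rank correlation, often preferred for ordinal ratings
r_spearman <- cor(toy$Motiv_Love, toy$KorSpk,
                  method = "spearman", use = "complete.obs")

r_pearson
r_spearman
```

Values near 0 would agree with the impression from the plot, while values near 1 or -1 would indicate a strong relationship.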
