HW4

Reverse the order of the inputs to the series of dplyr::*_join examples using data on the Nobel laureates in literature, and explain the resulting output.

Load the data, check the data structure, and display the first 6 rows of n.countries and n.winner.

## 'data.frame':    8 obs. of  2 variables:
##  $ Country: Factor w/ 7 levels "Canada","China",..: 3 6 6 7 1 2 4 5
##  $ Year   : int  2014 1950 2017 2016 2013 2012 2015 2011
##   Country Year
## 1  France 2014
## 2      UK 1950
## 3      UK 2017
## 4      US 2016
## 5  Canada 2013
## 6   China 2012
## 'data.frame':    7 obs. of  3 variables:
##  $ Name  : Factor w/ 7 levels "Alice  Munro",..: 6 2 4 3 1 5 7
##  $ Gender: Factor w/ 2 levels "Female","Male": 2 2 2 2 1 2 1
##  $ Year  : int  2014 1950 2017 2016 2013 2012 1938
##                Name Gender Year
## 1   Patrick Modiano   Male 2014
## 2 Bertrand  Russell   Male 1950
## 3    Kazuo Ishiguro   Male 2017
## 4        Bob  Dylan   Male 2016
## 5      Alice  Munro Female 2013
## 6            Mo Yan   Male 2012
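So that the join examples can be rerun without the original files, the two data frames can be rebuilt from the printed str() and head() output above. This is a reconstruction, not the author's loading code. Names are written with single spaces, and the string columns are left as character; add `stringsAsFactors = TRUE` to reproduce the factor columns that str() reports (the pre-4.0 R default).

```r
# Rebuilt from the printed output above (not the original loading code)
n.countries <- data.frame(
  Country = c("France", "UK", "UK", "US", "Canada", "China", "Russia", "Sweden"),
  Year = c(2014L, 1950L, 2017L, 2016L, 2013L, 2012L, 2015L, 2011L)
)
n.winner <- data.frame(
  Name = c("Patrick Modiano", "Bertrand Russell", "Kazuo Ishiguro", "Bob Dylan",
           "Alice Munro", "Mo Yan", "Pearl Buck"),
  Gender = c("Male", "Male", "Male", "Male", "Female", "Male", "Female"),
  Year = c(2014L, 1950L, 2017L, 2016L, 2013L, 2012L, 1938L)
)

# Check the structure and show the first 6 rows of each
str(n.countries)
head(n.countries)
str(n.winner)
head(n.winner)
```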

Mutating joins
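inner_join: Return all rows from n.countries that have a matching Year in n.winner, and all columns from n.countries and n.winner. Russia (2015), Sweden (2011) and Pearl Buck (1938) have no match in the other data frame and are dropped.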

##   Country Year              Name Gender
## 1  France 2014   Patrick Modiano   Male
## 2      UK 1950 Bertrand  Russell   Male
## 3      UK 2017    Kazuo Ishiguro   Male
## 4      US 2016        Bob  Dylan   Male
## 5  Canada 2013      Alice  Munro Female
## 6   China 2012            Mo Yan   Male
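A minimal sketch of the reversal for inner_join, assuming Year is the join key (it is the only column the two data frames share) and using data rebuilt from the printed output. Swapping the inputs keeps the same six matched years; only the column order changes, because the columns of the first argument come first.

```r
library(dplyr)

# Data rebuilt from the printed output above
n.countries <- data.frame(
  Country = c("France", "UK", "UK", "US", "Canada", "China", "Russia", "Sweden"),
  Year = c(2014L, 1950L, 2017L, 2016L, 2013L, 2012L, 2015L, 2011L)
)
n.winner <- data.frame(
  Name = c("Patrick Modiano", "Bertrand Russell", "Kazuo Ishiguro", "Bob Dylan",
           "Alice Munro", "Mo Yan", "Pearl Buck"),
  Gender = c("Male", "Male", "Male", "Male", "Female", "Male", "Female"),
  Year = c(2014L, 1950L, 2017L, 2016L, 2013L, 2012L, 1938L)
)

# Original order: n.countries is x, n.winner is y
countries_first <- inner_join(n.countries, n.winner, by = "Year")
# Reversed order: n.winner is x, n.countries is y
winners_first <- inner_join(n.winner, n.countries, by = "Year")

names(countries_first)  # Country, Year, Name, Gender
names(winners_first)    # Name, Gender, Year, Country
```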

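left_join: Return all rows from n.countries and all columns from n.countries and n.winner. Rows of n.countries with no matching Year in n.winner (Russia 2015, Sweden 2011) get NA for Name and Gender. If there are multiple matches, all combinations of the matches are returned.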

##   Country Year              Name Gender
## 1  France 2014   Patrick Modiano   Male
## 2      UK 1950 Bertrand  Russell   Male
## 3      UK 2017    Kazuo Ishiguro   Male
## 4      US 2016        Bob  Dylan   Male
## 5  Canada 2013      Alice  Munro Female
## 6   China 2012            Mo Yan   Male
## 7  Russia 2015              <NA>   <NA>
## 8  Sweden 2011              <NA>   <NA>
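With the inputs reversed, left_join keeps every row of n.winner instead. The sketch below (data rebuilt from the printed output, Year assumed as the key) should return 7 rows: Pearl Buck (1938) gets NA for Country, while Russia and Sweden disappear because they exist only in the right-hand table. These are the same rows as right_join(n.countries, n.winner), with the columns in a different order.

```r
library(dplyr)

# Data rebuilt from the printed output above
n.countries <- data.frame(
  Country = c("France", "UK", "UK", "US", "Canada", "China", "Russia", "Sweden"),
  Year = c(2014L, 1950L, 2017L, 2016L, 2013L, 2012L, 2015L, 2011L)
)
n.winner <- data.frame(
  Name = c("Patrick Modiano", "Bertrand Russell", "Kazuo Ishiguro", "Bob Dylan",
           "Alice Munro", "Mo Yan", "Pearl Buck"),
  Gender = c("Male", "Male", "Male", "Male", "Female", "Male", "Female"),
  Year = c(2014L, 1950L, 2017L, 2016L, 2013L, 2012L, 1938L)
)

# Reversed order: every row of n.winner is kept
winners_left <- left_join(n.winner, n.countries, by = "Year")
winners_left
```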

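right_join: Return all rows from n.winner and all columns from n.countries and n.winner. Pearl Buck (1938) has no matching Year in n.countries, so her Country is NA. If there are multiple matches, all combinations of the matches are returned.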

##   Country Year              Name Gender
## 1  France 2014   Patrick Modiano   Male
## 2      UK 1950 Bertrand  Russell   Male
## 3      UK 2017    Kazuo Ishiguro   Male
## 4      US 2016        Bob  Dylan   Male
## 5  Canada 2013      Alice  Munro Female
## 6   China 2012            Mo Yan   Male
## 7    <NA> 1938        Pearl Buck Female
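Reversing the inputs to right_join keeps every row of n.countries, so it mirrors the original left_join. In this sketch (data rebuilt from the printed output, Year assumed as the key) the result should have 8 rows, with NA for Name and Gender on Russia and Sweden, and Pearl Buck dropped.

```r
library(dplyr)

# Data rebuilt from the printed output above
n.countries <- data.frame(
  Country = c("France", "UK", "UK", "US", "Canada", "China", "Russia", "Sweden"),
  Year = c(2014L, 1950L, 2017L, 2016L, 2013L, 2012L, 2015L, 2011L)
)
n.winner <- data.frame(
  Name = c("Patrick Modiano", "Bertrand Russell", "Kazuo Ishiguro", "Bob Dylan",
           "Alice Munro", "Mo Yan", "Pearl Buck"),
  Gender = c("Male", "Male", "Male", "Male", "Female", "Male", "Female"),
  Year = c(2014L, 1950L, 2017L, 2016L, 2013L, 2012L, 1938L)
)

# Reversed order: every row of n.countries (now the y argument) is kept
winners_right <- right_join(n.winner, n.countries, by = "Year")
winners_right
```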

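full_join: Return all rows and all columns from both n.countries and n.winner, filling in NA wherever a Year has no match in the other data frame (Russia, Sweden and Pearl Buck).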

##   Country Year              Name Gender
## 1  France 2014   Patrick Modiano   Male
## 2      UK 1950 Bertrand  Russell   Male
## 3      UK 2017    Kazuo Ishiguro   Male
## 4      US 2016        Bob  Dylan   Male
## 5  Canada 2013      Alice  Munro Female
## 6   China 2012            Mo Yan   Male
## 7  Russia 2015              <NA>   <NA>
## 8  Sweden 2011              <NA>   <NA>
## 9    <NA> 1938        Pearl Buck Female
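For full_join, reversing the inputs cannot change which rows appear, since all rows from both sides are kept. A short sketch (data rebuilt from the printed output, Year assumed as the key) shows that the reversed call still returns 9 rows; only the column order (Name, Gender, Year, Country) and the row order differ.

```r
library(dplyr)

# Data rebuilt from the printed output above
n.countries <- data.frame(
  Country = c("France", "UK", "UK", "US", "Canada", "China", "Russia", "Sweden"),
  Year = c(2014L, 1950L, 2017L, 2016L, 2013L, 2012L, 2015L, 2011L)
)
n.winner <- data.frame(
  Name = c("Patrick Modiano", "Bertrand Russell", "Kazuo Ishiguro", "Bob Dylan",
           "Alice Munro", "Mo Yan", "Pearl Buck"),
  Gender = c("Male", "Male", "Male", "Male", "Female", "Male", "Female"),
  Year = c(2014L, 1950L, 2017L, 2016L, 2013L, 2012L, 1938L)
)

# Reversed order: all rows from both data frames, n.winner's columns first
winners_full <- full_join(n.winner, n.countries, by = "Year")
winners_full
```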

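Filtering joins
semi_join: Return all rows from n.countries that have a matching Year in n.winner, keeping only the columns of n.countries. Unlike inner_join, semi_join never duplicates a row of n.countries, even if it matches several rows of n.winner.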

##   Country Year
## 1  France 2014
## 2      UK 1950
## 3      UK 2017
## 4      US 2016
## 5  Canada 2013
## 6   China 2012
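Reversing the inputs to semi_join changes which table is filtered: the result now contains rows of n.winner and only its columns. In this sketch (data rebuilt from the printed output, Year assumed as the key) the six laureates whose Year appears in n.countries remain, and Pearl Buck is filtered out.

```r
library(dplyr)

# Data rebuilt from the printed output above
n.countries <- data.frame(
  Country = c("France", "UK", "UK", "US", "Canada", "China", "Russia", "Sweden"),
  Year = c(2014L, 1950L, 2017L, 2016L, 2013L, 2012L, 2015L, 2011L)
)
n.winner <- data.frame(
  Name = c("Patrick Modiano", "Bertrand Russell", "Kazuo Ishiguro", "Bob Dylan",
           "Alice Munro", "Mo Yan", "Pearl Buck"),
  Gender = c("Male", "Male", "Male", "Male", "Female", "Male", "Female"),
  Year = c(2014L, 1950L, 2017L, 2016L, 2013L, 2012L, 1938L)
)

# Reversed order: rows of n.winner with a matching Year in n.countries
winners_semi <- semi_join(n.winner, n.countries, by = "Year")
winners_semi
```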

anti_join: Return all rows from n.countries that have no matching Year in n.winner, keeping only the columns of n.countries.

##   Country Year
## 1  Russia 2015
## 2  Sweden 2011
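With the inputs reversed, anti_join returns the laureates whose Year has no match in n.countries. Based on the data rebuilt from the printed output (Year assumed as the key), that should be a single row: Pearl Buck, 1938.

```r
library(dplyr)

# Data rebuilt from the printed output above
n.countries <- data.frame(
  Country = c("France", "UK", "UK", "US", "Canada", "China", "Russia", "Sweden"),
  Year = c(2014L, 1950L, 2017L, 2016L, 2013L, 2012L, 2015L, 2011L)
)
n.winner <- data.frame(
  Name = c("Patrick Modiano", "Bertrand Russell", "Kazuo Ishiguro", "Bob Dylan",
           "Alice Munro", "Mo Yan", "Pearl Buck"),
  Gender = c("Male", "Male", "Male", "Male", "Female", "Male", "Female"),
  Year = c(2014L, 1950L, 2017L, 2016L, 2013L, 2012L, 1938L)
)

# Reversed order: rows of n.winner with no matching Year in n.countries
winners_anti <- anti_join(n.winner, n.countries, by = "Year")
winners_anti
```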

HW5

Augment the data object in the ‘SAT’ lecture note with state.division{datasets}. For each of the 9 divisions, find the slope estimate for regressing average SAT scores onto average teacher’s salary. How many of them are of negative signs?

Load the data and display the first 6 rows of dta.

##               V2   V3     V4 V5  V6  V7   V8
## Alabama    4.405 17.2 31.144  8 491 538 1029
## Alaska     8.963 17.6 47.951 47 445 489  934
## Arizona    4.778 19.3 32.175 27 448 496  944
## Arkansas   4.459 17.1 28.934  6 482 523 1005
## California 4.992 24.0 41.078 45 417 485  902
## Colorado   5.443 18.4 34.571 29 462 518  980

Rename the columns according to the variable descriptions:

VARIABLE DESCRIPTIONS:
Columns
1 - 16 Name of state (in quotation marks)
18 - 22 Current expenditure per pupil in average daily attendance in public elementary and secondary schools, 1994-95 (in thousands of dollars)
24 - 27 Average pupil/teacher ratio in public elementary and secondary schools, Fall 1994
29 - 34 Estimated average annual salary of teachers in public elementary and secondary schools, 1994-95 (in thousands of dollars)
36 - 37 Percentage of all eligible students taking the SAT, 1994-95
39 - 41 Average verbal SAT score, 1994-95
43 - 45 Average math SAT score, 1994-95
47 - 50 Average total score on the SAT, 1994-95
Available at http://jse.amstat.org/datasets/sat.txt
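A minimal sketch of the renaming step. The short column names are hypothetical and simply follow the order of the descriptions above; only `salary` is supported by the `x$salary` labels in the slope output at the end of this section. To run offline, the block rebuilds only the six rows printed above. The commented read.table() call is an assumption about how the full file was read, chosen because it yields the V2 to V8 names seen in that output.

```r
# Assumed loading call (needs network access):
#   dta <- read.table("http://jse.amstat.org/datasets/sat.txt", row.names = 1)
# Offline stand-in: the six rows printed above
dta <- data.frame(
  V2 = c(4.405, 8.963, 4.778, 4.459, 4.992, 5.443),
  V3 = c(17.2, 17.6, 19.3, 17.1, 24.0, 18.4),
  V4 = c(31.144, 47.951, 32.175, 28.934, 41.078, 34.571),
  V5 = c(8, 47, 27, 6, 45, 29),
  V6 = c(491, 445, 448, 482, 417, 462),
  V7 = c(538, 489, 496, 523, 485, 518),
  V8 = c(1029, 934, 944, 1005, 902, 980),
  row.names = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado")
)

# Hypothetical short names, in the order of the variable descriptions
names(dta) <- c("expend", "ratio", "salary", "takers", "verbal", "math", "total")

# Sanity check on the mapping: verbal + math should equal the total score
stopifnot(all(dta$verbal + dta$math == dta$total))
head(dta)
```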

Create a new variable holding each state's division from state.division.
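Since sat.txt appears to list the states alphabetically, like state.name, state.division could be attached by position. Matching on the state name is safer, though, because it does not depend on row order. The sketch below does this for the six rows printed earlier; `salary` and `total` are hypothetical column names.

```r
# Six rows printed earlier (hypothetical column names salary and total)
dta <- data.frame(
  salary = c(31.144, 47.951, 32.175, 28.934, 41.078, 34.571),
  total = c(1029, 934, 944, 1005, 902, 980),
  row.names = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado")
)

# Look up each state's division by name (state.name and state.division
# come from the base datasets package)
dta$division <- state.division[match(rownames(dta), state.name)]
dta
```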

Extract the slope estimate from regressing the total SAT score onto salary within each division.
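One way to get a slope per division is to split the data by division and fit `lm(total ~ salary)` within each group. The helper `division_slopes()` is hypothetical and assumes columns named `total`, `salary` and `division`. A division with a single state would give NA, since a line cannot be fitted to one point.

```r
# Hypothetical helper: slope of total on salary within each division
division_slopes <- function(dta) {
  # One data frame per division, dropping divisions with no rows
  by_div <- split(dta, dta$division, drop = TRUE)
  # Fit the regression in each group and keep only the salary coefficient
  vapply(by_div,
         function(d) unname(coef(lm(total ~ salary, data = d))["salary"]),
         numeric(1))
}

# Usage on the full renamed data with a division column:
#   slopes <- division_slopes(dta)
```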


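Give the slope estimates column names, add a column with the division names, and keep only the negative estimates. Five of the nine divisions (New England, East South Central, West South Central, West North Central and Mountain) have a negative slope: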

##         slope                    division
## 1  -0.2067041        New England.x$salary
## 2  -2.6215161 East South Central.x$salary
## 3 -28.1929942 West South Central.x$salary
## 4  -1.3174294 West North Central.x$salary
## 5 -13.7999482           Mountain.x$salary
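To answer the question programmatically, the named vector of per-division slopes can be turned into a data frame with plain division labels, filtered, and counted. The helper `negative_slopes()` is hypothetical. It expects a named numeric vector, such as the result of applying `sapply()` or `vapply()` over the divisions, and skips NA slopes. Calling `nrow()` on its result gives the number of negative slopes.

```r
# Hypothetical helper: keep only negative slopes, with clean division labels
negative_slopes <- function(slopes) {
  out <- data.frame(slope = unname(slopes),
                    division = names(slopes),
                    stringsAsFactors = FALSE)
  # Drop NA slopes (e.g. single-state groups) and keep the negative ones
  out <- out[!is.na(out$slope) & out$slope < 0, ]
  rownames(out) <- NULL
  out
}
```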