Packages

require(dplyr)
Loading required package: dplyr

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union
require(magrittr)
Loading required package: magrittr
require(ggplot2)
Loading required package: ggplot2

Load Data

data <- as_tibble(read.csv(file="http://www.personal.psu.edu/dlp/w540/midtermdata.csv"))

Use the following dplyr verbs to transform data:

select

data_select <- select(data, sex, race)
head(data_select)

filter

data_filter <- filter(data, sex>1)
head(data_filter)

mutate

data_mutate<- mutate(data, 
                     cal1 = sex - race,
                     cal2 = math_verbal / race * 3)
head(data_mutate)

summarize

data_summarize<-summarize(data, mean_math = mean(math_verbal))
data_summarize

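group_by

data_groupby<-group_by(data, race) %>% summarise(mean_verbal=mean(math_verbal), mean_vocconc=mean(vocconc))
data_groupby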

Create the following plots:

a plot of two continuous variables that includes geom_point and geom_smooth geometries;

# mtcars is used here because it provides two continuous variables (wt and mpg)
data(mtcars)
ggplot(mtcars, aes(x=wt, y=mpg)) + geom_point() + geom_smooth()

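a plot of a discrete and a continuous variable using geom_bar and geom_boxplot geometries;

ggplot(data, aes(x=race)) + geom_bar()
ggplot(data, aes(x=as.factor(race), y=math_verbal)) + geom_boxplot()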

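a plot of a single continuous variable using geom_dotplot and geom_histogram geometries.

ggplot(data, aes(x=race, y=math_verbal, fill=as.factor(race))) + geom_dotplot()
ggplot(data, aes(x=race)) + geom_histogram()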

table(data$race)

   1    2    4 
1195  866 2840 

Label the titles and axes of plots with easily read descriptions.

ggplot(data, aes(x=as.factor(sex), y=math_verbal, color=sex)) + 
  geom_boxplot() +
  xlab("Gender") +
  ylab("Score of Math Exam in Mid-term") +
  ggtitle("Class 10-B's Midterm Result by Gender")

table(data$sex)

   1    2 
2434 2467 

Select the appropriate statistical technique to compute linear regressions, crosstabulations, and differences between means.

Conduct and interpret null hypothesis tests

t-tests (H0: μ1 = μ2), crosstabulations of two discrete variables (H0: ρ r x c = 0), and linear regression (H0: ρyx = 0 as well as H0: β = 0).

T-test

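data.male<-filter(data, sex==1)
data.female<-filter(data, sex==2)
data_ttets<-t.test(data.male$math_verbal, data.female$math_verbal)
data_ttets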

    Welch Two Sample t-test

data:  data.male$math_verbal and data.female$math_verbal
t = -1.4204, df = 4880, p-value = 0.1556
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.7863344  0.4450698
sample estimates:
mean of x mean of y 
 47.91873  49.08936 

Crosstabulation

data.crosstabulation <- xtabs(~data$race + data$vocconc)
data.crosstabulation 
         data$vocconc
data$race    0    1
        1  807  388
        2  618  248
        4 1840 1000
summary(data.crosstabulation )
Call: xtabs(formula = ~data$race + data$vocconc)
Number of cases in table: 4901 
Number of factors: 2 
Test for independence of all factors:
    Chisq = 13.488, df = 2, p-value = 0.001178
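
The chi-square statistic reported by summary() can be cross-checked with chisq.test(). The sketch below re-enters the cell counts printed above as a plain matrix, so it runs without downloading the data. The object names counts and chi are illustrative and are not part of the original analysis.

```r
# Cell counts copied from the race x vocconc table printed above.
# matrix() fills column by column: first vocconc = 0, then vocconc = 1.
counts <- matrix(c(807, 618, 1840,
                   388, 248, 1000),
                 nrow = 3,
                 dimnames = list(race = c("1", "2", "4"),
                                 vocconc = c("0", "1")))

# Pearson chi-square test of independence
# (the continuity correction only applies to 2 x 2 tables).
chi <- chisq.test(counts)
chi$statistic   # test statistic, to compare with Chisq in summary()
chi$parameter   # degrees of freedom
chi$p.value     # p-value

# Expected counts under independence, for comparison with the observed table.
round(chi$expected, 1)
```

Because the p-value is below 0.05, the null hypothesis that race and vocconc are independent would be rejected at the 5% level.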

Regression

data.linear<-lm(math_verbal~vocconc, data=data)
data.linear

Call:
lm(formula = math_verbal ~ vocconc, data = data)

Coefficients:
(Intercept)      vocconc  
      49.18        -2.01  
summary(data.linear)

Call:
lm(formula = math_verbal ~ vocconc, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-49.179 -25.091  -1.097  24.689  52.831 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  49.1790     0.5045  97.477   <2e-16 ***
vocconc      -2.0103     0.8732  -2.302   0.0214 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 28.83 on 4899 degrees of freedom
Multiple R-squared:  0.001081,  Adjusted R-squared:  0.0008768 
F-statistic:   5.3 on 1 and 4899 DF,  p-value: 0.02137

Calculate and interpret 95% confidence intervals

around μ1 = μ2 and around β = 0 in t-tests and linear regression analyses, respectively.

Confidence Interval from t-test

data_ttets$conf.int[1:2]
[1] -2.7863344  0.4450698
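
To help interpret this interval, the following sketch rebuilds it by hand from the numbers printed in the Welch t-test output. It assumes the rounded degrees of freedom (4880) and rounded t statistic are close enough to the exact values, and the variable names are illustrative only.

```r
# Values copied from the Welch t-test output above.
mean_x <- 47.91873   # mean of x (sex == 1)
mean_y <- 49.08936   # mean of y (sex == 2)
t_stat <- -1.4204    # rounded t statistic
df     <- 4880       # rounded Welch degrees of freedom (assumption)

diff_means <- mean_x - mean_y            # estimated difference in means
std_error  <- diff_means / t_stat        # standard error implied by t = diff / SE
margin     <- qt(0.975, df) * std_error  # half-width of the 95% interval

ci <- c(lower = diff_means - margin, upper = diff_means + margin)
ci

# A 95% interval contains 0 exactly when the two-sided p-value exceeds 0.05.
contains_zero <- unname(ci["lower"] < 0 && ci["upper"] > 0)
contains_zero
```

The interval includes 0, which agrees with the p-value of 0.1556: the data do not give grounds to reject H0: μ1 = μ2 at the 5% level.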

Confidence Interval from regression

confint(data.linear)
                2.5 %     97.5 %
(Intercept) 48.189963 50.1681310
vocconc     -3.722225 -0.2983817
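
The interval for the vocconc slope can be reproduced as estimate ± critical t value × standard error. This is a minimal sketch using the rounded values from the summary(data.linear) output, so the result matches confint() only to about three decimal places. The variable names are illustrative.

```r
# Values copied from the summary(data.linear) output above.
estimate  <- -2.0103   # slope for vocconc
std_error <- 0.8732    # its standard error
resid_df  <- 4899      # residual degrees of freedom

t_crit <- qt(0.975, resid_df)                    # critical value for a 95% interval
ci <- estimate + c(-1, 1) * t_crit * std_error   # lower and upper bounds
ci
```

The interval excludes 0, which is consistent with the slope's p-value of 0.0214 and leads to rejecting H0: β = 0 at the 5% level.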