# Basic R for Graphics

###  Scott Karr
###  HW4 Test Score Classification
###  11.22.2015
###  HW4 School Expenditures by State
### Packages: ggplot2, data.table

### Perfect Collinearity:
#### See video: <https://www.youtube.com/watch?v=DDRQYKVFoP0>

### Multicollinearity:
#### See video: <https://www.youtube.com/watch?v=O4jDva9B3fw>

### NYC Dept. of Ed. standardized test score data, Grades 3-8:
#### See: <http://schools.nyc.gov/Accountability/data/TestResults/ELAandMathTestResults>

## Setup and Load Data

  1. Load the New York City English Language Arts (ELA) test scores dataset by District, Grade and Year, along with the data.table and ggplot2 packages. Identify variables as numeric or categorical (factors).
```r
## Load Data Frame from website
library(data.table)
library(ggplot2)
# Local copy (alternative):
#ELAByDistrictUrl <- "/Users/scottkarr/Downloads/District ELA Results 2013-2015.csv"
ELAByDistrictUrl <- "https://raw.githubusercontent.com/scottkarr/hw4/master/District%20ELA%20Results%202013-2015.csv"
# stringsAsFactors = TRUE keeps Grade and Category as factors
# (the default changed to FALSE in R 4.0.0)
df_ELAByDistrict <- read.table(file = ELAByDistrictUrl, header = TRUE, sep = ",",
                               stringsAsFactors = TRUE)
dt_ELAByDistrict <- data.table(df_ELAByDistrict)
```
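Step 1 also asks to identify each variable as numeric or categorical. A minimal sketch, assuming a hand-typed stand-in called `ela_sample` (a hypothetical name) that holds a few columns from the first three rows printed by `head()` in the next section; on the real data you would pass `df_ELAByDistrict` to the same function:

```r
# Hypothetical stand-in: a few columns from the first three rows of
# df_ELAByDistrict, typed in by hand from the head() output
ela_sample <- data.frame(
  District = c(1L, 1L, 1L),
  Grade    = factor(c("3", "3", "3")),
  Year     = c(2013L, 2014L, 2015L),
  Category = factor(rep("All Students", 3)),
  PassPct  = c(34.3, 35.4, 37.5)
)

# Label each column "categorical" (factor or character) or "numeric"
classify_vars <- function(df) {
  vapply(df, function(col) {
    if (is.factor(col) || is.character(col)) {
      "categorical"
    } else if (is.numeric(col)) {
      "numeric"
    } else {
      "other"
    }
  }, character(1))
}

var_types <- classify_vars(ela_sample)
print(var_types)
```

Because the rule looks only at storage type, `District` and `Year` come out as numeric even though they act more like labels; wrapping them in `factor()` would reclassify them.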

## Generate Descriptive Statistics

  1. Generate summary-level descriptive statistics:

```r
head(df_ELAByDistrict)
##   District Grade Year     Category Number.Tested Mean.Scale.Score Lvl1Cnt
## 1        1     3 2013 All Students           877              303     310
## 2        1     3 2014 All Students           845              299     333
## 3        1     3 2015 All Students           750              303     264
## 4        1     4 2013 All Students           830              304     238
## 5        1     4 2014 All Students           824              307     222
## 6        1     4 2015 All Students           771              303     226
##   Lvl1Pct Lvl2Cnt Lvl2Pct Lvl3Cnt Lvl3Pct Lvl4Cnt Lvl4Pct PassCnt PassPct
## 1    35.3     266    30.3     224    25.5      77     8.8     301    34.3
## 2    39.4     213    25.2     232    27.5      67     7.9     299    35.4
## 3    35.2     205    27.3     191    25.5      90    12.0     281    37.5
## 4    28.7     306    36.9     169    20.4     117    14.1     286    34.5
## 5    26.9     274    33.3     174    21.1     154    18.7     328    39.8
## 6    29.3     265    34.4     143    18.5     137    17.8     280    36.3
```

```r
str(df_ELAByDistrict)
## 'data.frame':    672 obs. of  16 variables:
##  $ District        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Grade           : Factor w/ 7 levels "3","4","5","6",..: 1 1 1 2 2 2 3 3 3 4 ...
##  $ Year            : int  2013 2014 2015 2013 2014 2015 2013 2014 2015 2013 ...
##  $ Category        : Factor w/ 1 level "All Students": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Number.Tested   : int  877 845 750 830 824 771 773 757 790 875 ...
##  $ Mean.Scale.Score: int  303 299 303 304 307 303 302 300 304 303 ...
##  $ Lvl1Cnt         : int  310 333 264 238 222 226 267 264 251 248 ...
##  $ Lvl1Pct         : num  35.3 39.4 35.2 28.7 26.9 29.3 34.5 34.9 31.8 28.3 ...
##  $ Lvl2Cnt         : int  266 213 205 306 274 265 270 237 228 342 ...
##  $ Lvl2Pct         : num  30.3 25.2 27.3 36.9 33.3 34.4 34.9 31.3 28.9 39.1 ...
##  $ Lvl3Cnt         : int  224 232 191 169 174 143 144 144 160 90 ...
##  $ Lvl3Pct         : num  25.5 27.5 25.5 20.4 21.1 18.5 18.6 19 20.3 10.3 ...
##  $ Lvl4Cnt         : int  77 67 90 117 154 137 92 112 151 195 ...
##  $ Lvl4Pct         : num  8.8 7.9 12 14.1 18.7 17.8 11.9 14.8 19.1 22.3 ...
##  $ PassCnt         : int  301 299 281 286 328 280 236 256 311 285 ...
##  $ PassPct         : num  34.3 35.4 37.5 34.5 39.8 36.3 30.5 33.8 39.4 32.6 ...
```
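Beyond `head()` and `str()`, a small table of summary statistics for the numeric columns is a common next step. A sketch, assuming the six rows printed above are typed in by hand as `ela_head` (a hypothetical name) and only a few columns are kept; on the full data you would use `df_ELAByDistrict` instead:

```r
# Hypothetical stand-in: six rows copied from the head() output above
ela_head <- data.frame(
  Grade            = factor(c("3", "3", "3", "4", "4", "4")),
  Year             = rep(2013:2015, times = 2),
  Mean.Scale.Score = c(303L, 299L, 303L, 304L, 307L, 303L),
  PassPct          = c(34.3, 35.4, 37.5, 34.5, 39.8, 36.3)
)

# Mean, standard deviation, min and max for each numeric column of interest
num_cols <- c("Mean.Scale.Score", "PassPct")
stats_tbl <- sapply(ela_head[num_cols], function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
})
print(round(stats_tbl, 2))

# The same idea grouped by Grade, using base R's aggregate()
by_grade <- aggregate(PassPct ~ Grade, data = ela_head, FUN = mean)
print(by_grade)
```

`sapply()` returns a matrix with one column per variable and one row per statistic, which is easy to round and print as a compact table.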

## Prepare Dataset for Graphics

  1. Index by District

```r
## Set District as the key column (sorts the table by District and enables keyed lookups)
setkey(dt_ELAByDistrict, District)
```
  1. Rename columns as needed.

  2. Filter on 2015 and limit the dataset to District, Grade, Year and the percentage columns (levels 1-4 and passing).

```r
# Rows: Year == 2015; columns: identifiers plus percentages by level
ans2 <- dt_ELAByDistrict[Year == 2015, .(District, Grade, Year, Lvl1Pct, Lvl2Pct, Lvl3Pct, Lvl4Pct, PassPct)]
```
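In data.table's `DT[i, j]` form, `i` filters rows and `j` selects or computes columns (`.()` is data.table's alias for `list()`). A base-R sketch of the same filter with `subset()`, assuming a hand-typed stand-in `ela_head` (a hypothetical name) built from the six rows printed by `head()` earlier, with `Number.Tested` included only to show that it gets dropped:

```r
# Hypothetical stand-in: six rows copied from the head() output
ela_head <- data.frame(
  District      = rep(1L, 6),
  Grade         = factor(c("3", "3", "3", "4", "4", "4")),
  Year          = rep(2013:2015, times = 2),
  Lvl1Pct       = c(35.3, 39.4, 35.2, 28.7, 26.9, 29.3),
  Lvl2Pct       = c(30.3, 25.2, 27.3, 36.9, 33.3, 34.4),
  Lvl3Pct       = c(25.5, 27.5, 25.5, 20.4, 21.1, 18.5),
  Lvl4Pct       = c(8.8, 7.9, 12.0, 14.1, 18.7, 17.8),
  PassPct       = c(34.3, 35.4, 37.5, 34.5, 39.8, 36.3),
  Number.Tested = c(877L, 845L, 750L, 830L, 824L, 771L)
)

# Keep only the 2015 rows and the identifier/percentage columns
keep <- c("District", "Grade", "Year",
          "Lvl1Pct", "Lvl2Pct", "Lvl3Pct", "Lvl4Pct", "PassPct")
ans2_base <- subset(ela_head, Year == 2015, select = keep)
print(ans2_base)
```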
  1. Histogram of 2015 passing percentages across all District/Grade rows. Note the four distinct groupings, which appear to be related to the levels.

```r
# bins = 30 matches ggplot2's default and silences the stat_bin() bin-width message
ggplot(data = ans2) + geom_histogram(aes(x = PassPct), bins = 30)
```
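The bin-width setting controls how `geom_histogram()` buckets the x values. A base-R sketch of the same bucketing with an explicit width of 5 percentage points (an arbitrary choice for illustration), using `hist(plot = FALSE)` on the six `PassPct` values from the `head()` output as a stand-in for `ans2$PassPct`:

```r
# PassPct values from the six head() rows (stand-in for ans2$PassPct)
pass_pct <- c(34.3, 35.4, 37.5, 34.5, 39.8, 36.3)

# Break points 5 units apart; hist() counts values in (a, b] intervals,
# with the first interval closed on the left
breaks <- seq(30, 40, by = 5)
h <- hist(pass_pct, breaks = breaks, plot = FALSE)
print(data.frame(from = head(h$breaks, -1), to = tail(h$breaks, -1), count = h$counts))
```

In ggplot2 the rough equivalent is `geom_histogram(aes(x = PassPct), binwidth = 5)`; ggplot2 may place the bin edges differently unless `boundary` is also set.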

  1. Scatterplot with a superimposed box plot, faceted by Grade. The data compare 2015 ELA level-1 percentages with passing percentages (levels 3 & 4). Note the negative correlation: a student scoring at level 1 cannot pass, and the four level percentages sum to 100%, so one would expect an inverse relationship between passing and level-1 percentages.

```r
# Box plot of Lvl1Pct with points colored by Grade, one panel per Grade
g <- ggplot(ans2, aes(x = PassPct, y = Lvl1Pct)) + geom_boxplot()
g <- g + geom_point(aes(color = Grade)) + facet_wrap(~Grade)
g
```
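The expected inverse relationship follows from the level shares summing to about 100%: since passing covers levels 3 and 4, Lvl1Pct is roughly 100 - Lvl2Pct - PassPct. A sketch checking that identity on the six rows from the `head()` output (hand-typed stand-ins), allowing for the one-decimal rounding in the published percentages:

```r
# Six rows from the head() output (hand-typed stand-in for the real data)
lvl1 <- c(35.3, 39.4, 35.2, 28.7, 26.9, 29.3)
lvl2 <- c(30.3, 25.2, 27.3, 36.9, 33.3, 34.4)
pass <- c(34.3, 35.4, 37.5, 34.5, 39.8, 36.3)

# Level 1 + level 2 + passing (levels 3 and 4) should cover all students
total <- lvl1 + lvl2 + pass
print(total)

# Percentages are rounded to one decimal, so allow a small tolerance
within_rounding <- abs(total - 100) <= 0.15
print(all(within_rounding))

# Rearranged: the level-1 share implied by the other two columns
implied_lvl1 <- 100 - lvl2 - pass
print(round(implied_lvl1 - lvl1, 1))
```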

7a. Scatterplot with a superimposed box plot, faceted by Grade. The data compare 2015 ELA level-2 percentages with passing percentages (levels 3 & 4). Note the lack of a clear correlation between passing and level-2 percentages: passing and level-2 scores are still mutually exclusive, but their percentages aren't correlated. Note, however, that grades with higher percentages of level-2 scores have lower percentages of level-3 scores (compare with the level-3 plot).

7b. Grades 3 & 8 appear to have lower scores (their passing percentages correlate more with level-2 than with level-3 percentages), perhaps because students have less experience with the newer tests. The graphics help identify this pattern, which can be explored further.

```r
# Box plot of Lvl2Pct with points colored by Grade, one panel per Grade
g <- ggplot(ans2, aes(x = PassPct, y = Lvl2Pct)) + geom_boxplot()
g <- g + geom_point(aes(color = Grade)) + facet_wrap(~Grade)
g
```
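The observation about Grades 3 and 8 suggests a follow-up comparison: a per-grade table of mean level-2 and level-3 percentages. A sketch using the six `head()` rows as a hand-typed stand-in (`ela_head` is a hypothetical name); these rows cover only District 1 and Grades 3-4, so they illustrate the mechanics rather than test the Grade 3 vs. Grade 8 pattern:

```r
# Hypothetical stand-in: six rows copied from the head() output
ela_head <- data.frame(
  Grade   = factor(c("3", "3", "3", "4", "4", "4")),
  Lvl2Pct = c(30.3, 25.2, 27.3, 36.9, 33.3, 34.4),
  Lvl3Pct = c(25.5, 27.5, 25.5, 20.4, 21.1, 18.5)
)

# Mean level-2 and level-3 percentages per Grade
lvl_by_grade <- aggregate(cbind(Lvl2Pct, Lvl3Pct) ~ Grade, data = ela_head, FUN = mean)

# Positive gap = more students at level 2 than at level 3
lvl_by_grade$gap <- lvl_by_grade$Lvl2Pct - lvl_by_grade$Lvl3Pct
print(lvl_by_grade)
```

On the real 2015 data, passing `ans2` (with all seven Grade levels) to the same `aggregate()` call would give one row per grade.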

  1. Scatterplot with a superimposed box plot, faceted by Grade. The data compare 2015 ELA level-3 percentages with passing percentages (levels 3 & 4). Note the positive correlation: passing is the sum of levels 3 and 4 (PassPct = Lvl3Pct + Lvl4Pct), so the passing, level-3 and level-4 percentages are collinear. Also note that lower level-2 percentages show up as higher level-3 percentages.

```r
# Box plot of Lvl3Pct with points colored by Grade, one panel per Grade
g <- ggplot(ans2, aes(x = PassPct, y = Lvl3Pct)) + geom_boxplot()
g <- g + geom_point(aes(color = Grade)) + facet_wrap(~Grade)
g
```
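Because passing is defined as level 3 plus level 4, `PassCnt` is an exact linear combination of `Lvl3Cnt` and `Lvl4Cnt`, which is the perfect collinearity mentioned at the top. A sketch, using the counts from the six `head()` rows as a hand-typed stand-in, that confirms the identity and shows what `lm()` does when all three are used as predictors: it cannot separate their effects and reports the redundant column's coefficient as `NA`. Using `Mean.Scale.Score` as the response is an arbitrary choice for illustration:

```r
# Counts from the six head() rows (hand-typed stand-in)
ela_counts <- data.frame(
  Mean.Scale.Score = c(303, 299, 303, 304, 307, 303),
  Lvl3Cnt = c(224, 232, 191, 169, 174, 143),
  Lvl4Cnt = c(77, 67, 90, 117, 154, 137),
  PassCnt = c(301, 299, 281, 286, 328, 280)
)

# Passing count is exactly level 3 + level 4
print(all(ela_counts$PassCnt == ela_counts$Lvl3Cnt + ela_counts$Lvl4Cnt))

# With all three as predictors the design matrix is rank-deficient;
# lm() drops the redundant (last) column and reports its coefficient as NA
fit <- lm(Mean.Scale.Score ~ Lvl3Cnt + Lvl4Cnt + PassCnt, data = ela_counts)
print(coef(fit))
```

Dropping any one of the three columns removes the redundancy; which one to keep depends on the question being asked.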

  1. Scatterplot with a superimposed box plot, faceted by Grade. The data compare 2015 ELA level-4 percentages with passing percentages (levels 3 & 4). Note the positive correlation between passing and level-4 percentages.

```r
# Box plot of Lvl4Pct with points colored by Grade, one panel per Grade
g <- ggplot(ans2, aes(x = PassPct, y = Lvl4Pct)) + geom_boxplot()
g <- g + geom_point(aes(color = Grade)) + facet_wrap(~Grade)
g
```
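The strength of the relationship in each panel can be summarized with a per-grade correlation coefficient. A sketch using `split()` and `cor()`, again on the six `head()` rows as a hand-typed stand-in (`ela_head` is a hypothetical name); unlike `ans2`, these rows span 2013-2015 and have only three points per grade, so the numbers illustrate the mechanics rather than the 2015 pattern:

```r
# Hypothetical stand-in: six rows copied from the head() output
ela_head <- data.frame(
  Grade   = factor(c("3", "3", "3", "4", "4", "4")),
  Lvl4Pct = c(8.8, 7.9, 12.0, 14.1, 18.7, 17.8),
  PassPct = c(34.3, 35.4, 37.5, 34.5, 39.8, 36.3)
)

# Pearson correlation between PassPct and Lvl4Pct within each Grade
cor_by_grade <- sapply(split(ela_head, ela_head$Grade), function(d) {
  cor(d$PassPct, d$Lvl4Pct)
})
print(round(cor_by_grade, 2))
```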

  1. The same level-4 comparison, faceted by Grade, but with points colored by District instead of Grade. Because District is stored as an integer, ggplot2 uses a continuous color scale for it.

```r
# Box plot of Lvl4Pct with points colored by District, one panel per Grade
g <- ggplot(ans2, aes(x = PassPct, y = Lvl4Pct)) + geom_boxplot()
g <- g + geom_point(aes(color = District)) + facet_wrap(~Grade)
g
```
