# Basic R for Graphics
### Scott Karr
### HW4 Test Score Classification
### 11.22.2015
### HW4 School Expenditures by State
### Packages: ggplot2, data.table
### Perfect Collinearity:
#### see video [https://www.youtube.com/watch?v=DDRQYKVFoP0]
### Multicollinearity:
#### see video [https://www.youtube.com/watch?v=O4jDva9B3fw]
### NYC Dept of Ed. standardized test score data Grades 3-8:
#### see: [http://schools.nyc.gov/Accountability/data/TestResults/ELAandMathTestResults]
## Load Data Frame from website
require(data.table)
## Loading required package: data.table
require(ggplot2)
## Loading required package: ggplot2
#ELAByDistrictUrl <- "/Users/scottkarr/Downloads/District ELA Results 2013-2015.csv"
ELAByDistrictUrl <- "https://raw.githubusercontent.com/scottkarr/hw4/master/District%20ELA%20Results%202013-2015.csv"
df_ELAByDistrict <- read.table(file = ELAByDistrictUrl, header = TRUE, sep = ",")
dt_ELAByDistrict <- data.table(df_ELAByDistrict)
head(df_ELAByDistrict)
##   District Grade Year     Category Number.Tested Mean.Scale.Score Lvl1Cnt
## 1        1     3 2013 All Students           877              303     310
## 2        1     3 2014 All Students           845              299     333
## 3        1     3 2015 All Students           750              303     264
## 4        1     4 2013 All Students           830              304     238
## 5        1     4 2014 All Students           824              307     222
## 6        1     4 2015 All Students           771              303     226
##   Lvl1Pct Lvl2Cnt Lvl2Pct Lvl3Cnt Lvl3Pct Lvl4Cnt Lvl4Pct PassCnt PassPct
## 1    35.3     266    30.3     224    25.5      77     8.8     301    34.3
## 2    39.4     213    25.2     232    27.5      67     7.9     299    35.4
## 3    35.2     205    27.3     191    25.5      90    12.0     281    37.5
## 4    28.7     306    36.9     169    20.4     117    14.1     286    34.5
## 5    26.9     274    33.3     174    21.1     154    18.7     328    39.8
## 6    29.3     265    34.4     143    18.5     137    17.8     280    36.3
str(df_ELAByDistrict)
## 'data.frame':    672 obs. of  16 variables:
##  $ District        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Grade           : Factor w/ 7 levels "3","4","5","6",..: 1 1 1 2 2 2 3 3 3 4 ...
##  $ Year            : int  2013 2014 2015 2013 2014 2015 2013 2014 2015 2013 ...
##  $ Category        : Factor w/ 1 level "All Students": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Number.Tested   : int  877 845 750 830 824 771 773 757 790 875 ...
##  $ Mean.Scale.Score: int  303 299 303 304 307 303 302 300 304 303 ...
##  $ Lvl1Cnt         : int  310 333 264 238 222 226 267 264 251 248 ...
##  $ Lvl1Pct         : num  35.3 39.4 35.2 28.7 26.9 29.3 34.5 34.9 31.8 28.3 ...
##  $ Lvl2Cnt         : int  266 213 205 306 274 265 270 237 228 342 ...
##  $ Lvl2Pct         : num  30.3 25.2 27.3 36.9 33.3 34.4 34.9 31.3 28.9 39.1 ...
##  $ Lvl3Cnt         : int  224 232 191 169 174 143 144 144 160 90 ...
##  $ Lvl3Pct         : num  25.5 27.5 25.5 20.4 21.1 18.5 18.6 19 20.3 10.3 ...
##  $ Lvl4Cnt         : int  77 67 90 117 154 137 92 112 151 195 ...
##  $ Lvl4Pct         : num  8.8 7.9 12 14.1 18.7 17.8 11.9 14.8 19.1 22.3 ...
##  $ PassCnt         : int  301 299 281 286 328 280 236 256 311 285 ...
##  $ PassPct         : num  34.3 35.4 37.5 34.5 39.8 36.3 30.5 33.8 39.4 32.6 ...
## Set District as the key column for indexing
setkey(dt_ELAByDistrict, District)
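Once a key is set, data.table sorts the table by that column in place and allows subsetting by key value. A minimal sketch of that behavior follows; the `toy` table and its values are hypothetical and are not taken from the dataset.

```r
library(data.table)

# Hypothetical toy table; values are illustrative only
toy <- data.table(
  District = c(3L, 1L, 2L, 1L),
  Grade    = c("3", "3", "4", "4"),
  PassPct  = c(40.0, 34.3, 51.2, 34.5)
)

# Setting the key reorders the rows by District in place
setkey(toy, District)

# Confirm which column is the key
key(toy)

# Keyed subset: all rows whose District equals 1
district1 <- toy[.(1L)]
district1
```

The same pattern should apply to `dt_ELAByDistrict`, for example `dt_ELAByDistrict[.(1L)]` to pull every row for District 1.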
Rename columns as needed
Filter on Year 2015 and limit the dataset to District, Grade, Year, and the percentage columns (levels 1 through 4 and passing).
ans2 <- dt_ELAByDistrict[ Year == 2015 , .(District, Grade, Year, Lvl1Pct, Lvl2Pct, Lvl3Pct, Lvl4Pct, PassPct)]
ggplot(data = ans2) + geom_histogram(aes(x = PassPct))
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
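The message above appears because `geom_histogram()` falls back to a default number of bins when no width is given. A minimal sketch of setting the bin width explicitly follows; the width of 5 percentage points and the `toy` values are assumptions chosen for illustration, not values from the analysis.

```r
library(ggplot2)

# Hypothetical pass percentages standing in for ans2$PassPct
toy <- data.frame(
  PassPct = c(12.5, 18.0, 22.4, 25.1, 30.3, 34.3, 36.3, 37.5, 41.0, 55.2)
)

# binwidth = 5 groups PassPct into bins 5 percentage points wide
# (an assumed, adjustable choice)
p <- ggplot(toy, aes(x = PassPct)) +
  geom_histogram(binwidth = 5)
p
```

A narrower width shows more detail at the cost of noisier bars, so it is worth trying a few values against the real `ans2` data.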
g <- ggplot(ans2, aes(x = PassPct, y = Lvl1Pct)) + geom_point() + geom_boxplot()
g <- g + geom_point(aes(color = Grade)) + facet_wrap(~Grade)
g
7 a. Scatterplot with a superimposed box plot, grouped by Grade. The data compare 2015 ELA level 2 percentages to passing percentages (level 3 plus level 4). Note the lack of correlation between passing and level 2 percentages: the two categories are mutually exclusive, but their percentages aren't correlated. Note, however, that Grades with higher percentages of level 2 scores have lower percentages of level 3 scores. (Compare to 7.)

7 b. Grades 3 & 8 appear to have lower scores (a higher percentage passing correlates with level 2 than with level 3 scores), perhaps attributable to less experience with the newer tests. The graphics help identify this pattern, which can be explored further.
g <- ggplot(ans2, aes(x = PassPct, y = Lvl2Pct)) + geom_point() + geom_boxplot()
g <- g + geom_point(aes(color = Grade)) + facet_wrap(~Grade)
g
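The visual impression of weak correlation between `PassPct` and `Lvl2Pct` could also be checked numerically. A minimal sketch using `cor()` grouped by Grade follows; the `toy` values are hypothetical and do not reproduce the dataset, so the coefficients it prints say nothing about the real scores.

```r
library(data.table)

# Hypothetical values standing in for ans2
toy <- data.table(
  Grade   = rep(c("3", "4"), each = 4),
  PassPct = c(30, 35, 40, 45, 28, 33, 41, 50),
  Lvl2Pct = c(33, 31, 34, 30, 36, 35, 33, 30)
)

# Pearson correlation between passing and level 2 percentages, one row per Grade
cors <- toy[, .(r = cor(PassPct, Lvl2Pct)), by = Grade]
cors
```

Swapping `toy` for `ans2` would give one coefficient per Grade to set beside the faceted plot.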
g <- ggplot(ans2, aes(x = PassPct, y = Lvl3Pct)) + geom_point() + geom_boxplot()
g <- g + geom_point(aes(color = Grade)) + facet_wrap(~Grade)
g
g <- ggplot(ans2, aes(x = PassPct, y = Lvl4Pct)) + geom_point() + geom_boxplot()
g <- g + geom_point(aes(color = Grade)) + facet_wrap(~Grade)
g
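Since the commentary describes passing as level 3 plus level 4, `PassPct` should be an exact linear combination of `Lvl3Pct` and `Lvl4Pct`, which is the perfect collinearity case named in the header. The sketch below checks this on the six rows printed by `head()` earlier; the tolerance of 0.1 is an assumption to allow for rounding in the published percentages.

```r
# The six rows shown in the head() output, percentage columns only
first6 <- data.frame(
  Lvl3Pct = c(25.5, 27.5, 25.5, 20.4, 21.1, 18.5),
  Lvl4Pct = c(8.8, 7.9, 12.0, 14.1, 18.7, 17.8),
  PassPct = c(34.3, 35.4, 37.5, 34.5, 39.8, 36.3)
)

# Difference between the reported pass rate and the sum of levels 3 and 4
gap <- first6$PassPct - (first6$Lvl3Pct + first6$Lvl4Pct)

# TRUE when every row agrees within the assumed rounding tolerance
all(abs(gap) < 0.1)
```

On these six rows the sums match the reported pass rate, so plots of `PassPct` against `Lvl3Pct` or `Lvl4Pct` partly reflect that built-in relationship.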
g <- ggplot(ans2, aes(x = PassPct, y = Lvl4Pct)) + geom_point() + geom_boxplot()
g <- g + geom_point(aes(color = District)) + facet_wrap(~Grade)
g
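Because `District` is stored as an integer (see the `str()` output), ggplot2 maps it to a continuous color gradient. If one discrete color per district is preferred, one option is to convert it to a factor inside `aes()`. The sketch below uses a hypothetical `toy` table whose values are illustrative only.

```r
library(ggplot2)

# Hypothetical values standing in for ans2
toy <- data.frame(
  District = c(1L, 1L, 2L, 2L, 3L, 3L),
  Grade    = rep(c("3", "4"), times = 3),
  PassPct  = c(34.3, 34.5, 51.2, 48.9, 22.0, 25.4),
  Lvl4Pct  = c(8.8, 14.1, 20.5, 19.0, 4.2, 6.1)
)

# factor(District) switches the color scale from continuous to discrete
p <- ggplot(toy, aes(x = PassPct, y = Lvl4Pct)) +
  geom_point(aes(color = factor(District))) +
  labs(color = "District") +
  facet_wrap(~Grade)
p
```

With many districts a discrete legend gets long, so the continuous gradient in the original plot may still be the more readable choice.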