The societal impact of income inequality is well known. Can school clubs, sports and other after school programs shape a childs future who otherwise would not be exposed to the positive influences additional school activities can bring?
In this data anaylsis we try to answer the question What is the impact of children who are exposed to extra school programs vs children are not exposed to additional programs. This markup will be used to analyze the data and a shiny app will be built to display additional details at this link:
We will use data from City of New York government site from 2011-2012. One dataset will list schools in NYC with their sports programs and the second dataset will show school grades
https://data.cityofnewyork.us/Education/DOE-High-School-Directory-2013-2014/u553-m549/data
Education is one of biggest keys for a child’s success in life. In this project we will try to measure additional school programs impact on a childs life by camparing it with schools graduation and college enrollment rates. The theory is if schools that have more programs have a higher rate of grads and college enrollment then schools that do not, schools with more programs give a child better chance to succeed as an adult.
During the import we replace missing values with N/A by setting na.strings=c("“,”N/A").
The below visualizaitons show the columns that are missing data.
In data prepartion we are renaming DBN columns that were changed during the import. We get a subset of data of interest, rename some columns to shorter names then merge the sports dataset with the grade dataset. Below is a glimpse of the 2 datasets
## Observations: 422
## Variables: 5
## $ DBN <fct> 01M292, 01M448, 01M450, 01M509, 01M539, 01M696, 02M0...
## $ grad2011 <fct> 56.30%, 70.70%, 71.60%, 56.40%, 95.30%, 97.60%, 69.6...
## $ collenroll2011 <fct> 51.90%, 36.30%, 69.20%, 47.70%, 87.00%, 95.70%, 47.1...
## $ grad2012 <fct> 50.70%, 74.40%, 72.70%, 59.50%, 97.50%, 96.60%, 54.8...
## $ collenroll2012 <fct> 40.00%, 67.30%, 55.70%, 47.90%, 82.80%, 96.80%, 30.4...
## Observations: 422
## Variables: 22
## $ DBN <fct> 01M292, 01M448, 01M450, 01M509, 01M539, 01M696, 02M...
## $ BN <fct> M292, M448, M450, M509, M539, M696, M047, M288, M29...
## $ Printed_Name <fct> "Henry Street School for International Studies", "U...
## $ TotStudnts <fct> "388", "331", "636", "367", "1,672", "601", "189", ...
## $ School_Type <fct> NA, NA, NA, NA, NA, NA, NA, CTE School, NA, NA, NA,...
## $ LangClass <fct> "Chinese (Mandarin), Spanish", "Chinese, Spanish", ...
## $ APCourses <fct> "Psychology", "Calculus AB, Chinese Language and Cu...
## $ LeaderSupport <fct> "Math through Card Play, Poetry Club, Drama Club, B...
## $ Sports <fct> "Boxing, CHAMPS, Double Dutch, Table tennis, Track ...
## $ OnlineAP <fct> "Chinese Language and Culture, Spanish Literature",...
## $ OnlineLang <fct> "Chinese (Mandarin), Spanish", NA, "American Sign L...
## $ School_Type2 <fct> NA, NA, NA, NA, NA, NA, NA, CTE School, NA, NA, NA,...
## $ neighborhood <fct> Lower East Side, Lower East Side, East Village, Low...
## $ postalCode <fct> "10,002", "10,002", "10,009", "10,002", "10,002", "...
## $ precinct <int> 7, 7, 9, 7, 7, 7, 13, 18, 7, 18, 5, 18, 18, 7, 7, 1...
## $ school_district <int> 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...
## $ lat <dbl> 40.71348, 40.71223, 40.72985, 40.72055, 40.71949, 4...
## $ lon <dbl> -73.98532, -73.98382, -73.98315, -73.98568, -73.979...
## $ Location <fct> "220 Henry Street\nNew York, NY 10002\n(40.7134809,...
## $ Boro <fct> MANHATTAN, MANHATTAN, MANHATTAN, MANHATTAN, MANHATT...
## $ CouncilDistrict <int> 1, 1, 2, 1, 2, 2, 2, 3, 1, 3, 1, 3, 3, 1, 1, 1, 2, ...
## $ CensusTract <fct> "201", "202", "34", "3,001", "2,201", "1,002", "64"...
To prepare our model we remove % columns from grad and college columns and merge the 2 datasets. We then create sprtscode,apcode,langcode,onlinelangcode and onlineapcode columns to show a 1 for schools with the programs and 0 with schools wihtout the programs. Lastly, we write the new dataframe to a csv for later use in our shiny app dashboard. We end with a glimpse of the newly combined dataset.
## Observations: 422
## Variables: 32
## $ DBN <fct> 01M292, 01M448, 01M450, 01M509, 01M539, 01M696, 02M...
## $ BN <fct> M292, M448, M450, M509, M539, M696, M047, M135, M13...
## $ Printed_Name <fct> "Henry Street School for International Studies", "U...
## $ TotStudnts <fct> "388", "331", "636", "367", "1,672", "601", "189", ...
## $ School_Type <fct> NA, NA, NA, NA, NA, NA, NA, "CTE, New School", "CTE...
## $ LangClass <fct> "Chinese (Mandarin), Spanish", "Chinese, Spanish", ...
## $ APCourses <fct> "Psychology", "Calculus AB, Chinese Language and Cu...
## $ LeaderSupport <fct> "Math through Card Play, Poetry Club, Drama Club, B...
## $ Sports <fct> "Boxing, CHAMPS, Double Dutch, Table tennis, Track ...
## $ OnlineAP <fct> "Chinese Language and Culture, Spanish Literature",...
## $ OnlineLang <fct> "Chinese (Mandarin), Spanish", NA, "American Sign L...
## $ School_Type2 <fct> NA, NA, NA, NA, NA, NA, NA, "CTE, New School", "CTE...
## $ neighborhood <fct> Lower East Side, Lower East Side, East Village, Low...
## $ postalCode <fct> "10,002", "10,002", "10,009", "10,002", "10,002", "...
## $ precinct <int> 7, 7, 9, 7, 7, 7, 13, 18, 5, 18, 7, 18, 5, 18, 18, ...
## $ school_district <int> 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...
## $ lat <dbl> 40.71348, 40.71223, 40.72985, 40.72055, 40.71949, 4...
## $ lon <dbl> -73.98532, -73.98382, -73.98315, -73.98568, -73.979...
## $ Location <fct> "220 Henry Street\nNew York, NY 10002\n(40.7134809,...
## $ Boro <fct> MANHATTAN, MANHATTAN, MANHATTAN, MANHATTAN, MANHATT...
## $ CouncilDistrict <int> 1, 1, 2, 1, 2, 2, 2, 3, 1, 3, 1, 3, 1, 3, 3, 1, 1, ...
## $ CensusTract <fct> "201", "202", "34", "3,001", "2,201", "1,002", "64"...
## $ grad2011 <dbl> 56.3, 70.7, 71.6, 56.4, 95.3, 97.6, 69.6, NA, NA, 8...
## $ collenroll2011 <dbl> 51.9, 36.3, 69.2, 47.7, 87.0, 95.7, 47.1, NA, NA, 6...
## $ grad2012 <dbl> 50.7, 74.4, 72.7, 59.5, 97.5, 96.6, 54.8, NA, NA, 7...
## $ collenroll2012 <dbl> 40.0, 67.3, 55.7, 47.9, 82.8, 96.8, 30.4, NA, NA, 5...
## $ sprtcode <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ apcode <fct> 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, ...
## $ langcode <fct> 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ leadcode <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ onlineapcode <fct> 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, ...
## $ onlinelangcode <fct> 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
The summary below gives summary statistics of each column of the merged dataset as well as a subset summary that splits the sports code of 1 and 0
## vars n mean sd median trimmed mad min max
## DBN* 1 422 211.50 121.97 211.50 211.50 156.41 1.00 422.00
## BN* 2 422 211.50 121.97 211.50 211.50 156.41 1.00 422.00
## Printed_Name* 3 422 211.50 121.97 211.50 211.50 156.41 1.00 422.00
## TotStudnts* 4 406 162.40 85.40 167.50 163.37 100.82 1.00 316.00
## School_Type* 5 84 4.63 2.30 3.00 4.47 2.97 1.00 9.00
## LangClass* 6 392 58.46 17.38 70.00 62.26 0.00 1.00 72.00
## APCourses* 7 304 138.90 74.70 143.50 141.23 97.85 1.00 252.00
## LeaderSupport* 8 420 210.50 121.39 210.50 210.50 155.67 1.00 420.00
## Sports* 9 299 140.70 79.58 137.00 140.63 100.82 1.00 281.00
## OnlineAP* 10 46 21.57 11.72 22.50 21.76 15.57 1.00 40.00
## OnlineLang* 11 52 21.44 9.74 26.00 22.48 7.41 1.00 32.00
## School_Type2* 12 84 4.63 2.30 3.00 4.47 2.97 1.00 9.00
## neighborhood* 13 422 52.75 30.40 52.00 51.93 34.10 1.00 109.00
## postalCode* 14 422 53.07 31.61 51.00 52.11 31.13 1.00 117.00
## precinct 15 422 59.90 33.85 52.00 59.65 39.29 1.00 123.00
## school_district 16 422 13.91 9.22 12.00 13.40 10.38 1.00 32.00
## lat 17 422 40.75 0.08 40.74 40.75 0.11 40.53 40.89
## lon 18 422 -73.92 0.07 -73.93 -73.93 0.07 -74.19 -73.73
## Location* 19 422 137.29 75.07 136.50 138.32 97.85 1.00 258.00
## Boro* 20 420 2.39 1.14 2.00 2.34 1.48 1.00 5.00
## CouncilDistrict 21 420 21.78 14.40 18.00 21.17 17.79 1.00 51.00
## CensusTract* 22 420 101.46 57.15 106.50 101.25 74.87 1.00 198.00
## grad2011 23 310 73.80 14.34 72.60 73.61 16.38 41.20 100.00
## collenroll2011 24 291 53.12 19.31 48.90 51.70 18.24 14.10 100.00
## grad2012 25 337 73.70 15.12 73.50 73.99 15.57 25.50 100.00
## collenroll2012 26 314 53.93 19.09 50.35 52.73 17.94 8.90 100.00
## sprtcode* 27 422 1.71 0.45 2.00 1.76 0.00 1.00 2.00
## apcode* 28 422 1.72 0.45 2.00 1.78 0.00 1.00 2.00
## langcode* 29 422 1.93 0.26 2.00 2.00 0.00 1.00 2.00
## leadcode* 30 422 2.00 0.07 2.00 2.00 0.00 1.00 2.00
## onlineapcode* 31 422 1.11 0.31 1.00 1.01 0.00 1.00 2.00
## onlinelangcode* 32 422 1.12 0.33 1.00 1.03 0.00 1.00 2.00
## range skew kurtosis se
## DBN* 421.00 0.00 -1.21 5.94
## BN* 421.00 0.00 -1.21 5.94
## Printed_Name* 421.00 0.00 -1.21 5.94
## TotStudnts* 315.00 -0.08 -1.00 4.24
## School_Type* 8.00 0.50 -0.92 0.25
## LangClass* 71.00 -1.62 1.82 0.88
## APCourses* 251.00 -0.20 -1.23 4.28
## LeaderSupport* 419.00 0.00 -1.21 5.92
## Sports* 280.00 0.00 -1.15 4.60
## OnlineAP* 39.00 -0.11 -1.30 1.73
## OnlineLang* 31.00 -0.68 -0.99 1.35
## School_Type2* 8.00 0.50 -0.92 0.25
## neighborhood* 108.00 0.23 -0.95 1.48
## postalCode* 116.00 0.19 -0.79 1.54
## precinct 122.00 0.13 -1.08 1.65
## school_district 31.00 0.38 -1.04 0.45
## lat 0.36 -0.11 -0.85 0.00
## lon 0.46 0.05 1.06 0.00
## Location* 257.00 -0.08 -1.19 3.65
## Boro* 4.00 0.31 -0.99 0.06
## CouncilDistrict 50.00 0.28 -1.12 0.70
## CensusTract* 197.00 -0.01 -1.24 2.79
## grad2011 58.80 0.12 -0.88 0.81
## collenroll2011 85.90 0.59 -0.35 1.13
## grad2012 74.50 -0.15 -0.46 0.82
## collenroll2012 91.10 0.53 -0.46 1.08
## sprtcode* 1.00 -0.91 -1.17 0.02
## apcode* 1.00 -0.98 -1.04 0.02
## langcode* 1.00 -3.33 9.09 0.01
## leadcode* 1.00 -14.37 205.02 0.00
## onlineapcode* 1.00 2.50 4.26 0.02
## onlinelangcode* 1.00 2.28 3.23 0.02
## mdata$sprtcode: 0
## grad2011 collenroll2011 grad2012 collenroll2012
## Min. : 45.70 Min. :14.10 Min. : 25.50 Min. : 8.90
## 1st Qu.: 66.00 1st Qu.:43.10 1st Qu.: 64.97 1st Qu.:40.95
## Median : 73.80 Median :51.70 Median : 73.20 Median :51.80
## Mean : 75.05 Mean :54.13 Mean : 74.71 Mean :53.06
## 3rd Qu.: 84.95 3rd Qu.:65.30 3rd Qu.: 85.30 3rd Qu.:63.20
## Max. :100.00 Max. :99.00 Max. :100.00 Max. :98.70
## NA's :28 NA's :32 NA's :21 NA's :27
## ------------------------------------------------------------
## mdata$sprtcode: 1
## grad2011 collenroll2011 grad2012 collenroll2012
## Min. : 41.20 Min. : 18.30 Min. : 35.40 Min. : 19.50
## 1st Qu.: 61.50 1st Qu.: 37.75 1st Qu.: 62.45 1st Qu.: 39.83
## Median : 71.60 Median : 47.80 Median : 73.50 Median : 50.00
## Mean : 73.24 Mean : 52.66 Mean : 73.27 Mean : 54.31
## 3rd Qu.: 85.10 3rd Qu.: 65.75 3rd Qu.: 84.60 3rd Qu.: 67.05
## Max. :100.00 Max. :100.00 Max. :100.00 Max. :100.00
## NA's :84 NA's :99 NA's :64 NA's :81
The plot summaries displays a boxplot of graduation and college enrollment split by sports code. The linear plot compares graduation vs college enrollments and plots the sports codes of 1 and 0. The plot shows the grad and college enrollment have a positive correlation.
We begin with a GLM model of sports code vs all of the college grad columns. The linearl model shows that only college enroll 2012 is significant when sports is the response variable.
## # A tibble: 5 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 2.33 0.865 2.70 0.00700
## 2 grad2011 -0.0158 0.0220 -0.716 0.474
## 3 collenroll2011 -0.0101 0.0141 -0.718 0.473
## 4 grad2012 -0.0226 0.0194 -1.17 0.243
## 5 collenroll2012 0.0343 0.0174 1.97 0.0483
## # A tibble: 1 x 7
## null.deviance df.null logLik AIC BIC deviance df.residual
## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int>
## 1 359. 289 -177. 364. 382. 354. 285
We now use a GLM model with a combination of grad and college enrollment with sports code. In this model sports code is slightly significant.
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 258. 6.61 39.0 5.38e-117
## 2 sprtcode1 -5.04 7.96 -0.633 5.27e- 1
## # A tibble: 1 x 11
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.00139 -0.00208 62.7 0.400 0.527 2 -1611. 3228. 3239.
## # ... with 2 more variables: deviance <dbl>, df.residual <int>
In this next section we switch to a linear model and try to correlate college 2012 enrollment with language, leader, onlineAP, onlineLanguage and sports code. This model says leader code is significant and language code is slightly significant.
## # A tibble: 7 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 88.7 18.9 4.68 0.00000425
## 2 sprtcode1 2.18 2.35 0.929 0.354
## 3 apcode1 4.64 2.89 1.60 0.110
## 4 langcode1 11.4 6.17 1.84 0.0664
## 5 leadcode1 -51.1 20.1 -2.55 0.0114
## 6 onlineapcode1 -2.24 3.90 -0.575 0.566
## 7 onlinelangcode1 0.574 3.96 0.145 0.885
## # A tibble: 1 x 11
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.0344 0.0156 18.9 1.82 0.0938 7 -1366. 2747. 2777.
## # ... with 2 more variables: deviance <dbl>, df.residual <int>
This last model removes all other insignificant predictors and compare college 2012 enroll to leader and language code. This model shows that both leader and language are significant.
## # A tibble: 3 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 88.7 18.9 4.69 0.00000415
## 2 langcode1 12.3 6.08 2.03 0.0437
## 3 leadcode1 -46.8 19.8 -2.36 0.0190
## # A tibble: 1 x 11
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.0235 0.0172 18.9 3.74 0.0247 3 -1367. 2743. 2758.
## # ... with 2 more variables: deviance <dbl>, df.residual <int>
The comparison of graduate to college enroll for 2011 and 2012 shows there is an positive correlation. Both the xyplot and ggplots separte sports , language and leader code. From the visualizaiton you can see sports show a similar trend whether there is sports or no sports. In the language plot, the trend is clear in schools with language class than schools with no language classes. The leader plot show the majority of schools with a positive trend has some type of leader program.
The ggplot confirms the analysis in the xyplot. The sports code plot has a mix of sports and no sports, the language plots show low number of schools with no language classes and leader plot have nearly all schools with a leader program.
These qq density plots hint that in schools where >80% graduate and enroll in college, there is a sports or language or leader program.
With the disovery of the density plot we narrow the histogram plot to schools with >80% students graduates and enroll in college, and split 3 programs. This is one more look at the split of the 3 codes where language and lead code has a higher significance.
Below we setup a subset of schools that have > 80% graduation or college enroll rate and include the location, number of students and program information on the drill down of the map.
This basic modelling showed in NYC in 2011 and 2012 leadership and language courses may have had a significant impact on a child sucess in graduation and applying to college.
As an additional part of this project I created a shiny app to anaylyze more variables in the dataset.
https://apag101.shinyapps.io/final_project/
The shiny app does the following;
Code used in analysis
knitr::opts_chunk$set(
echo = FALSE,
message = FALSE,
warning = FALSE
)
require(knitr)
library(ggplot2)
library(tidyr)
library(MASS)
library(psych)
library(kableExtra)
library(dplyr)
library(faraway)
library(gridExtra)
library(reshape2)
library(leaps)
library(pROC)
library(caret)
library(naniar)
library(pander)
library(pROC)
library(plotly)
library(effects)
library(leaflet)
library(tidyr)
library(tibble)
library(broom)
library(grid)
library(gridExtra)
library(shiny)
schlsprts <- read.csv("DOE_High_School_Directory_2013-2014.csv",na.strings=c("","N/A"),header=TRUE)
schlgrds <- read.csv("2013_-_2014_DOE_High_School_Performance-Directory.csv",na.strings=c("","N/A"),header=TRUE)
vis_miss(schlsprts)
gg_miss_upset(schlsprts)
vis_miss(schlgrds)
gg_miss_upset(schlgrds)
#Set N/A rows for school sports column to NA
#nrow(subset(schlsprts,is.na(schlsprts$School.Sports)))
#is.na(schlsprts$School.Sports)<-schlsprts$School.Sports == "N/A"
#nrow(subset(schlsprts,is.na(schlsprts$School.Sports)))
#Rename DBA Columns
names(schlsprts)[names(schlsprts) == "ï..DBN"] <- "DBN"
names(schlgrds)[names(schlgrds) == "ï..DBN"] <- "DBN"
#Select subset columns and rename
sub_schlsprts<-subset(schlsprts, select = c('DBN','BN','Printed_Name','Total.Student.10.26','School_Type','Language.Classes','Advanced.Placement.Courses','Extra_Leadership.Support','School.Sports','Online.AP.Courses','Online.Language.Courses','School_Type2','neighborhood','postalCode','precinct','school_district','latitude','longitude','Location.1','Borough','Council.District','Census.Tract'))
names(sub_schlsprts)<-c('DBN','BN','Printed_Name','TotStudnts','School_Type','LangClass','APCourses','LeaderSupport','Sports','OnlineAP','OnlineLang','School_Type2','neighborhood','postalCode','precinct','school_district','lat','lon','Location','Boro','CouncilDistrict','CensusTract')
sub_schlgrds<-subset(schlgrds, select = c('DBN','graduation.2010.11','college.enroll.2010.11','graduation.2011.12','college.enroll.2011.12'))
names(sub_schlgrds)<-c('DBN','grad2011','collenroll2011','grad2012','collenroll2012')
glimpse(sub_schlgrds)
glimpse(sub_schlsprts)
#get subset of rows with School sports NA or N/A values
#subset(schlsprts,is.na(schlsprts$School.Sports)|schlsprts$School.Sports == "N/A")
#set sumarry columns from characters to numeric
sub_schlgrds[,-1]<-lapply(sub_schlgrds[,-1], function(x) as.numeric(sub("%","",x)))
#Merge School and sports details with summary data from performance details
mdata<-merge(sub_schlsprts, sub_schlgrds, by="DBN")
#Add column to show if a school as sports programs
mdata$sprtcode<- as.factor(ifelse(is.na(mdata$Sports), 0, 1))
mdata$apcode<- as.factor(ifelse(is.na(mdata$APCourses), 0, 1))
mdata$langcode<- as.factor(ifelse(is.na(mdata$LangClass), 0, 1))
mdata$leadcode<- as.factor(ifelse(is.na(mdata$LeaderSupport), 0, 1))
mdata$onlineapcode<- as.factor(ifelse(is.na(mdata$OnlineAP), 0, 1))
mdata$onlinelangcode<- as.factor(ifelse(is.na(mdata$OnlineLang), 0, 1))
write.csv(mdata,"schooldataupdated.csv")
glimpse(mdata)
describe(mdata)
by(subset(mdata, select =colnames(sub_schlgrds[-1])),mdata$sprtcode,summary)
par(mfrow = c(2,2))
plot(grad2011~sprtcode,mdata)
plot(collenroll2011~grad2011, pch=as.character(sprtcode), mdata)
abline(lm(collenroll2011~grad2011, data = mdata))
plot(grad2012~sprtcode,mdata)
plot(collenroll2011~grad2011, pch=as.character(sprtcode), mdata)
abline(lm(collenroll2011~grad2011, data = mdata))
glmod<-glm(sprtcode~grad2011+collenroll2011+grad2012+collenroll2012,data=mdata,family="binomial"(link="logit"))
tidy(glmod)
glance(glmod)
par(mfrow = c(2,2))
plot(glmod)
glmod<-lm(grad2011+collenroll2011+grad2012+collenroll2012~sprtcode,data=mdata,family="binomial"(link="logit"))
tidy(glmod)
glance(glmod)
par(mfrow = c(2,2))
plot(glmod)
lmod1<-lm(collenroll2012~sprtcode+apcode+langcode+leadcode+onlineapcode+onlinelangcode,data=mdata)
tidy(lmod1)
glance(lmod1)
par(mfrow = c(2,2))
plot(lmod1)
lmod2<-lm(collenroll2012~langcode+leadcode,data=mdata)
tidy(lmod2)
glance(lmod2)
par(mfrow = c(2,2))
plot(lmod2)
xyplot(grad2011 ~ collenroll2011 | sprtcode,mdata, type="l")
xyplot(grad2011 ~ collenroll2011 | langcode,mdata, type="l")
xyplot(grad2011 ~ collenroll2011 | leadcode,mdata, type="l")
g1<-ggplot(mdata, aes(x=grad2011, y=collenroll2011, color = sprtcode)) +geom_point() +stat_smooth(method="lm", se=TRUE)
g2<-ggplot(mdata, aes(x=grad2012, y=collenroll2012, color = sprtcode)) +geom_point() +stat_smooth(method="lm", se=TRUE)
g3<-ggplot(mdata, aes(x=grad2011, y=collenroll2011, color = langcode)) +geom_point() +stat_smooth(method="lm", se=TRUE)
g4<-ggplot(mdata, aes(x=grad2012, y=collenroll2012, color = langcode)) +geom_point() +stat_smooth(method="lm", se=TRUE)
g5<-ggplot(mdata, aes(x=grad2011, y=collenroll2011, color = leadcode)) +geom_point() +stat_smooth(method="lm", se=TRUE)
g6<-ggplot(mdata, aes(x=grad2012, y=collenroll2012, color = leadcode)) +geom_point() +stat_smooth(method="lm", se=TRUE)
grid.arrange(g1, g2, g3, g4, g5, g6, ncol = 2)
q1<-qplot(grad2011, data = mdata, geom = "density",
fill = sprtcode)
q2<-qplot(grad2012, data = mdata, geom = "density",
fill = sprtcode)
q3<-qplot(collenroll2011, data = mdata, geom = "density",
fill = sprtcode)
q4<-qplot(collenroll2012, data = mdata, geom = "density",
fill = sprtcode)
grid.arrange(q1, q2, q3, q4, ncol = 2)
q1<-qplot(grad2011, data = mdata, geom = "density",
fill = langcode)
q2<-qplot(grad2012, data = mdata, geom = "density",
fill = langcode)
q3<-qplot(collenroll2011, data = mdata, geom = "density",
fill = langcode)
q4<-qplot(collenroll2012, data = mdata, geom = "density",
fill = langcode)
grid.arrange(q1, q2, q3, q4, ncol = 2)
q1<-qplot(grad2011, data = mdata, geom = "density",
fill = leadcode)
q2<-qplot(grad2012, data = mdata, geom = "density",
fill = leadcode)
q3<-qplot(collenroll2011, data = mdata, geom = "density",
fill = leadcode)
q4<-qplot(collenroll2012, data = mdata, geom = "density",
fill = leadcode)
grid.arrange(q1, q2, q3, q4, ncol = 2)
q1<-qplot(grad2011, data = subset(mdata,grad2011 > 80), geom = "histogram",fill = sprtcode)+facet_wrap(~ sprtcode)
q2<-qplot(grad2012, data = subset(mdata,grad2012 > 80), geom = "histogram",
fill = sprtcode)+facet_wrap(~ sprtcode)
q3<-qplot(collenroll2011, data = subset(mdata,collenroll2011 > 80), geom = "histogram",
fill = sprtcode)+facet_wrap(~ sprtcode)
q4<-qplot(collenroll2012, data = subset(mdata,collenroll2012 > 80), geom = "histogram",
fill = sprtcode)+facet_wrap(~ sprtcode)
grid.arrange(q1, q2, q3, q4, ncol = 2)
q1<-qplot(grad2011, data = subset(mdata,grad2011 > 80), geom = "histogram",fill = langcode)+facet_wrap(~ langcode)
q2<-qplot(grad2012, data = subset(mdata,grad2012 > 80), geom = "histogram",
fill = langcode)+facet_wrap(~ langcode)
q3<-qplot(collenroll2011, data = subset(mdata,collenroll2011 > 80), geom = "histogram",
fill = langcode)+facet_wrap(~ langcode)
q4<-qplot(collenroll2012, data = subset(mdata,collenroll2012 > 80), geom = "histogram",
fill = langcode)+facet_wrap(~ langcode)
grid.arrange(q1, q2, q3, q4, ncol = 2)
q1<-qplot(grad2011, data = subset(mdata,grad2011 > 80), geom = "histogram",fill = leadcode)+facet_wrap(~ leadcode)
q2<-qplot(grad2012, data = subset(mdata,grad2012 > 80), geom = "histogram",
fill = leadcode)+facet_wrap(~ leadcode)
q3<-qplot(collenroll2011, data = subset(mdata,collenroll2011 > 80), geom = "histogram",
fill = leadcode)+facet_wrap(~ leadcode)
q4<-qplot(collenroll2012, data = subset(mdata,collenroll2012 > 80), geom = "histogram",
fill = leadcode)+facet_wrap(~ leadcode)
grid.arrange(q1, q2, q3, q4, ncol = 2)
#install.packages("leaflet")
mdata80<-subset(mdata,grad2011 > 80|grad2012>80|collenroll2012>80|collenroll2011>80)
#Set Details of data
content <- paste(sep = "<br/>",
"------------------------------",
'<b>School Information:</>',
mdata80$Printed_Name,
mdata80$Boro,
mdata80$Location,
"------------------------------",
'<b>Total Students:</>',
mdata80$TotStudnts,
"------------------------------",
'<b>Sports:</>',
mdata80$Sports,
"------------------------------",
'<b>Language Classes</>',
mdata80$LangClass,
"------------------------------",
'<b>AP Courses</>',
mdata80$APCourses,
"------------------------------",
'<b>Leader Support</>',
mdata80$LeaderSupport)
df = data.frame(Lat = mdata80$lat, Long = mdata80$lon)
m <- leaflet(df, width = 900, height = 900) %>%
setView(lng = mdata80$lon[nrow(mdata80)], lat = mdata80$lat[nrow(mdata80)], zoom = 12)
m %>%
addTiles() %>%
#addCircles()%>%
addMarkers(clusterOptions = markerClusterOptions(),mdata80$lon, mdata80$lat, popup = paste(sep = "<br/>", content))
shinyAppDir(
system.file('..//Final Project/', package="shiny"),
options = list(width = "100%", height = 700)
)