#Assignment-10: Second & Third Part

School Dropout Rate of Population aged 16 to 19 Years in Different Counties of USA

#Introduction

Dropping out is a persistent phenomenon in United States schools. Nationally, the estimated graduation rate is between 70% and 80% on average (Balfanz et al. 2010; Cataldi, Laird, and KewalRamani 2009; Kaufman 2004). However, the rate varies from school to school depending on location and context (Balfanz et al. 2010; Balfanz and Legters 2006; Swanson 2004). Graduation rates of 50% or less have been found in some schools in urban and poor contexts (Balfanz et al. 2010; Balfanz and Legters 2006; Swanson 2004). Demographic factors such as gender, race, and low family socioeconomic status (SES), as well as urban and rural school contexts, have been associated with dropping out of school (Rumberger 1987, 2004).

##Research questions

The purpose of the present study is to answer the following research questions using both spatial and non-spatial approaches.

  1. Do some counties have relatively higher school dropout rates than other counties in the USA?

  2. Are there gender differences in county-level school dropout rates?

  3. Do counties with higher poverty rates have relatively higher dropout rates than counties with lower poverty rates?

  4. Do counties with a predominantly non-White population have higher dropout rates than counties with a predominantly White population?

#Research Method

##Data

County-level data from the American Community Survey (ACS) 2011–2015 (5-Year Estimates) were downloaded from www.socialexplorer.com. The following variables were used in the analysis.

School dropout rate: The total number of school dropouts was divided by the population aged 16 to 19 years and multiplied by 100 to obtain the percentage of school dropouts. Male and female dropout rates were calculated in the same way: the number of male (or female) dropouts was divided by the male (or female) population aged 16 to 19 years and multiplied by 100.

Poverty rate: The number of people living below the poverty level was divided by the total population of each county and multiplied by 100.

Percentage of White race: The number of White people living in each county was divided by the total population of the county and multiplied by 100.
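
As a minimal sketch of the three definitions above, the snippet below computes each rate for two hypothetical counties. The counts and the count column names (`pop_16_19`, `dropouts`, `total_pop`, `below_pov`, `white_pop`) are illustrative assumptions, not ACS values or Social Explorer variable names.

```r
# Hypothetical counts for two counties (illustrative values, not ACS data)
toy <- data.frame(
  county    = c("A", "B"),
  pop_16_19 = c(2000, 500),    # population aged 16 to 19
  dropouts  = c(100, 40),      # school dropouts aged 16 to 19
  total_pop = c(40000, 9000),  # total county population
  below_pov = c(6000, 1800),   # people living below the poverty level
  white_pop = c(30000, 8100)   # White population
)

# Each rate is a count divided by its base population, times 100
toy$percent_drop    <- toy$dropouts  / toy$pop_16_19 * 100
toy$percent_poverty <- toy$below_pov / toy$total_pop * 100
toy$White_prop      <- toy$white_pop / toy$total_pop * 100

print(toy[, c("county", "percent_drop", "percent_poverty", "White_prop")])
```

Expressing every variable as a percentage of its own base population keeps small and large counties comparable.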

#Data Analysis

Both spatial and non-spatial approaches were utilized to analyze and present information relevant to the research questions.

library(tidyverse)
library(sf)
library(tmap)
library(tigris)
library(spdep)
library(data.world)
dropout <- read_csv("poverty_dropout.csv")
head(dropout)
# Derive the analysis variables from the Social Explorer (SE_*) count columns
dropout <- dropout %>% 
  mutate(
    Total_population    = SE_T004_001,
    White_prop          = (SE_T013_002 / SE_T013_001) * 100,
    percent_drop        = (SE_T030_002 / SE_T030_001) * 100,
    male_dropped_out    = SE_T031_002,
    male_percent_drop   = (male_dropped_out / SE_T031_001) * 100,
    female_dropped_out  = SE_T032_002,
    female_percent_drop = (female_dropped_out / SE_T032_001) * 100,
    percent_poverty     = (SE_T115_002 / SE_T115_001) * 100
  )
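# Integer county FIPS code, used as the join key
dropout <- dropout %>% 
  mutate(GEOID = parse_integer(Geo_FIPS))
# The step that creates t_comb_data is not shown in the source.
# Assumption: it joins the tigris county boundaries to the dropout table on GEOID.
counties_sf <- counties(cb = TRUE) %>% 
  st_as_sf() %>% 
  mutate(GEOID = parse_integer(GEOID))
t_comb_data <- left_join(counties_sf, dropout, by = "GEOID")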
# Keep the contiguous USA: drop Alaska (02), Hawaii (15), American Samoa (60),
# Guam (66), Northern Mariana Islands (69), Puerto Rico (72), US Virgin Islands (78)
t_comb_data_sub <- t_comb_data %>% 
  filter(!STATEFP %in% c("02", "15", "60", "66", "69", "72", "78"))
library(tmaptools)
us_states <- t_comb_data_sub %>% 
  aggregate_map(by = "STATEFP")
# Shift each rate by a fixed reference value so the map legends are centred on it
t_comb_data_sub <- t_comb_data_sub %>% 
  mutate(
    maleabove30_drop   = male_percent_drop - 30,
    femaleabove30_drop = female_percent_drop - 30,
    dropout_rate       = percent_drop - 10,
    White_race         = White_prop - 80
  )

#Spatial Presentation of Results

##Differences of dropout rate between counties

# County choropleth of the dropout rate with state borders drawn on top
R <- tm_shape(t_comb_data_sub, projection = 2163) +
  tm_polygons(col = "percent_drop", palette = "RdYlGn", border.col = "grey",
              border.alpha = .4, title = "Dropout Rate") +
  tm_shape(us_states) +
  tm_borders(lwd = .50, col = "black", alpha = 1)
R

##Gender difference in dropout rate between counties

# Side-by-side maps of the male and female dropout rates (each shifted by 30)
tm_shape(t_comb_data_sub, projection = 2163) +
  tm_polygons(c("maleabove30_drop", "femaleabove30_drop"),
              border.col = "grey", border.alpha = .4,
              title = c("Male Dropout Rate", "Female Dropout Rate")) +
  tm_style_natural()

##Proportion of White population and dropout rate

# Polygons show the White share (shifted by 80); bubble size shows the dropout rate
R1 <- tm_shape(t_comb_data_sub, projection = 2163) +
  tm_polygons(col = "White_race") +
  tm_shape(t_comb_data_sub, projection = 2163) +
  tm_bubbles(size = "percent_drop", col = "purple")
R1

##Poverty rate and dropout rate

# Polygons show the poverty rate; bubble size shows the dropout rate
R2 <- tm_shape(t_comb_data_sub, projection = 2163) +
  tm_polygons(col = "percent_poverty") +
  tm_shape(t_comb_data_sub, projection = 2163) +
  tm_bubbles(size = "percent_drop", col = "blue") +
  tm_shape(us_states) +
  tm_borders(lwd = .50, col = "black", alpha = 1)
R2

#Non-Spatial Presentation of Results

##Differences of dropout rate between counties

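# Kernel density of the county dropout rates; plot() draws the figure and returns NULL,
# so its result is not assigned
plot(density(na.omit(t_comb_data_sub$percent_drop)), main = "Total Dropout Rate")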
rug(na.omit(t_comb_data_sub$percent_drop))

##Gender difference in dropout rate between counties

t_comb_data_sub<-t_comb_data_sub %>% 
  mutate(gender_diff=(male_percent_drop-female_percent_drop))
# Density of the male-minus-female gap; plot() returns NULL, so nothing is assigned
plot(density(na.omit(t_comb_data_sub$gender_diff)), main = "Gender Difference in Dropout Rate")
rug(na.omit(t_comb_data_sub$gender_diff))
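
The density plot pools all counties, so it cannot show where the gap is large. As a hedged sketch of one further step, the snippet below averages the county-level gap within each state using base R's `aggregate()`. The data frame `toy` and its values are hypothetical; only the column names `STATEFP`, `male_percent_drop`, `female_percent_drop`, and `gender_diff` follow the code above.

```r
# Hypothetical county-level rates (illustrative values only)
toy <- data.frame(
  STATEFP             = c("01", "01", "04", "04"),
  male_percent_drop   = c(6, 8, 5, NA),
  female_percent_drop = c(4, 5, 6, 3)
)
toy$gender_diff <- toy$male_percent_drop - toy$female_percent_drop

# Mean male-minus-female gap per state; the formula interface drops rows
# with missing values before averaging
state_gap <- aggregate(gender_diff ~ STATEFP, data = toy, FUN = mean)
print(state_gap)
```

This is an unweighted mean of county gaps, so small and large counties count equally; weighting by the population aged 16 to 19 would be a different design choice.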

##Proportion of White population and dropout rate

library(ggplot2)
# Scatter plot of county dropout rate against White share, with a fitted linear trend
g1 <- ggplot(na.omit(t_comb_data_sub), aes(White_prop, percent_drop))
g1 + geom_point() + 
  geom_smooth(method = "lm", se = FALSE) +
  labs(y = "Dropout rate (%)", 
       x = "White population (%)", 
       title = "Proportion of White Population and Dropout Rate")

##Poverty rate and dropout rate

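# Scatter plot of county dropout rate against poverty rate, with a fitted linear trend
g2 <- ggplot(na.omit(t_comb_data_sub), aes(percent_poverty, percent_drop))
g2 + geom_point() + 
  geom_smooth(method = "lm", se = FALSE) +
  labs(y = "Dropout rate (%)", 
       x = "Poverty rate (%)", 
       title = "Poverty and Dropout Rate")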
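
The fitted line in a scatter plot shows the direction of an association but not its size. A minimal sketch of how the association could be quantified is given below, using a Pearson correlation and the same simple linear model that `geom_smooth(method = "lm")` draws. The data frame `toy` and its values are hypothetical and are not results from this analysis.

```r
# Hypothetical county-level values (illustrative only, not ACS results)
toy <- data.frame(
  percent_poverty = c(5, 10, 15, 20, 25),
  percent_drop    = c(2, 3, 5, 6, 9)
)

# Pearson correlation: sign and strength of the linear association
r <- cor(toy$percent_poverty, toy$percent_drop, use = "complete.obs")

# Simple linear regression of dropout rate on poverty rate
fit   <- lm(percent_drop ~ percent_poverty, data = toy)
slope <- unname(coef(fit)["percent_poverty"])

print(r)
print(slope)
```

A positive slope means the dropout rate rises with the poverty rate; `summary(fit)` would additionally report standard errors and p-values.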

#Comparative Analysis of Spatial and Non-Spatial Presentation of Results

  1. Both the spatial and non-spatial analyses suggested that school dropout rates vary between counties of the USA. Although the non-spatial presentation showed that some counties had relatively higher dropout rates than others, it could not show where those counties are located. The spatial presentation provided much richer information in this regard: it readily visualized the relative differences in dropout rates among counties, and it showed variation not only at the county level but also at the state and regional levels. The maps suggested that some states in the Western region had relatively lower school dropout rates than states in the Southeast and Northeast regions. It was also possible to identify the dropout rate of specific counties, states, or regions of interest quickly and with little effort. This kind of information was missing from the non-spatial presentation. The spatial presentation therefore provides information beyond the research questions and may support a better interpretation of the results.

  2. For gender differences in school dropout rates, both presentations suggested that the difference was not the same in all counties: some counties had a larger gender gap than others. However, the non-spatial presentation could not show the gender difference for a specific county, state, or region of interest; obtaining that information would require further analysis steps. The spatial presentation provided it in a compact form that was easy to understand and visualize.

  3. For the association of dropout rates with the proportion of the White population and with the poverty rate, the spatial presentation suggested some association in both cases, but it did not provide a statistical basis for those associations. The non-spatial analysis provided that basis and suggested a negative association between the dropout rate and the proportion of the White population and a positive association between the dropout rate and the poverty rate. On the other hand, the non-spatial presentation gave a generalized picture of the associations rather than a county-specific one; the county-specific picture was obtained from the spatial presentation.

#Conclusion

Both spatial and non-spatial presentations have strengths and weaknesses in presenting information, and they can be used in combination or separately depending on the research questions of interest.