This data set contain research information census from Doctorate recipients by historical major fields. This data requires some data manipulation to transform it from wide to long. What are the fields that have the highest numbers of doctorate recipients in 2022. We only selected all the major fields and excluded any field that summarize all the fields. This is how the data look when enter our data platform.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
library(hrbrthemes)
## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
## Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
## if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
library(kableExtra)
##
## Attaching package: 'kableExtra'
##
## The following object is masked from 'package:dplyr':
##
## group_rows
library(gt)
raw <- read_csv("https://raw.githubusercontent.com/joewarner89/CUNY-607/main/Project%202/nsf24300-tab001-0033.csv",skip = 3)
## New names:
## Rows: 45 Columns: 15
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (15): Field of doctorate, 1992, ...3, 1997, ...5, 2002, ...7, 2007, ...9...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...3`
## • `` -> `...5`
## • `` -> `...7`
## • `` -> `...9`
## • `` -> `...11`
## • `` -> `...13`
## • `` -> `...15`
# read the data set
raw <- data.frame(raw)
head(raw)
## Field.of.doctorate X1992 ...3 X1997 ...5
## 1 <NA> Number Percent Number Percent
## 2 All fields 38,886 100.0 42,539 100.0
## 3 Life sciences 7,172 18.4 8,421 19.8
## 4 Agricultural sciences and natural resources 1,261 3.2 1,212 2.8
## 5 Biological and biomedical sciences 4,799 12.3 5,788 13.6
## 6 Health sciences 1,112 2.9 1,421 3.3
## X2002 ...7 X2007 ...9 X2012 ...11 X2017 ...13 X2022 ...15
## 1 Number Percent Number Percent Number Percent Number Percent Number Percent
## 2 40,031 100.0 48,132 100.0 50,943 100.0 54,552 100.0 57,596 100.0
## 3 8,478 21.2 10,702 22.2 11,964 23.5 12,554 23.0 13,211 22.9
## 4 1,129 2.8 1,321 2.7 1,255 2.5 1,493 2.7 1,434 2.5
## 5 5,695 14.2 7,238 15.0 8,322 16.3 8,566 15.7 9,218 16.0
## 6 1,654 4.1 2,143 4.5 2,387 4.7 2,495 4.6 2,559 4.4
This data set comes with numerical columns which R Studio would have difficulty to tidy so We need to rename the variables. R studio put a X in front of all the number-like columns and dots and numbers for empty Column names. For Examples: If a column name starts with 1992, R would represent that column like X1992.
# deleted unnecessary rows
raw <- raw[-c(1),]
# Rename the variable so there are no numerical variable in the raw data
raw <- raw %>% rename(Doctorate_Field = 1,
y1992 = 2,
y1992_ = 3,
y1997 = 4,
y1997_ = 5,
y2002 = 6,
y2002_ = 7,
y2007 = 8,
y2007_ = 9,
y2012 = 10,
y2012_ = 11,
y2017 = 12,
y2017_ = 13,
y2022 = 14,
y2022_ = 15)
# Transforming the data set
head(raw)
## Doctorate_Field y1992 y1992_ y1997 y1997_
## 2 All fields 38,886 100.0 42,539 100.0
## 3 Life sciences 7,172 18.4 8,421 19.8
## 4 Agricultural sciences and natural resources 1,261 3.2 1,212 2.8
## 5 Biological and biomedical sciences 4,799 12.3 5,788 13.6
## 6 Health sciences 1,112 2.9 1,421 3.3
## 7 Physical sciences and earth sciences 4,517 11.6 4,550 10.7
## y2002 y2002_ y2007 y2007_ y2012 y2012_ y2017 y2017_ y2022 y2022_
## 2 40,031 100.0 48,132 100.0 50,943 100.0 54,552 100.0 57,596 100.0
## 3 8,478 21.2 10,702 22.2 11,964 23.5 12,554 23.0 13,211 22.9
## 4 1,129 2.8 1,321 2.7 1,255 2.5 1,493 2.7 1,434 2.5
## 5 5,695 14.2 7,238 15.0 8,322 16.3 8,566 15.7 9,218 16.0
## 6 1,654 4.1 2,143 4.5 2,387 4.7 2,495 4.6 2,559 4.4
## 7 3,875 9.7 4,956 10.3 5,419 10.6 6,082 11.1 6,649 11.5
Tidy package is useful for grouping certain rows and turn them as column. This package simplifies the process of transforming a data set from making them wide or long or spreading the rows into columns. It is essential to creating year, numbers and percent.
# Creating pivot pramaters var1 and var2
var1 <- c("y1992","y1997","y2002","y2007","y2012","y2017","y2022")
var2 <- c("y1992_","y1997_","y2002_","y2007_","y2012_","y2017_","y2022_")
# Creating 2 data sets to fully transform the problem
# fist data set transform data for year that match all the numbers
rawdata <- pivot_longer(
data = raw,
cols = all_of(var1),
names_to = "year",
values_to = "numbers"
)
# 2nd data set transform data for year that match all the percentage
raw_data1 <- pivot_longer(
data = rawdata,
cols = all_of(var2),
names_to = "year_",
values_to = "percent"
)
raw_data1$year <- str_extract(raw_data1$year,"\\d+")
raw_data1$year_ <- str_extract(raw_data1$year_,"\\d+")
# creating data with the correct alignment
data <- raw_data1 %>% filter(year == year_) %>% select(Doctorate_Field,year,numbers,percent)
data$numbers <- as.numeric(data$numbers)
## Warning: NAs introduced by coercion
data$percent <- as.double(data$percent)
#Subset the data to get all major in 2022
final <- data %>% filter(Doctorate_Field!="All fields",
year == '2022' ) %>% arrange(desc(numbers) )
head(final)
## # A tibble: 6 × 4
## Doctorate_Field year numbers percent
## <chr> <chr> <dbl> <dbl>
## 1 Civil engineering 2022 898 1.6
## 2 Teaching fields 2022 890 1.5
## 3 History 2022 750 1.3
## 4 Education administration 2022 734 1.3
## 5 Political science and government 2022 678 1.2
## 6 Sociology 2022 611 1.1
We want to determine what Doctorate Program have the Highest numbers of recipients in 2022. this questions can help us understand in the near future, what fields that is going to have a higher employment rate.
# Number of PhD Graduates
final %>% head(10) %>% filter(Doctorate_Field!="Other") %>%
ggplot( aes(x=Doctorate_Field, y=numbers) ) +
geom_bar(stat="identity", fill="#69b3a2") +
coord_flip() +
theme_ipsum() +
theme(
panel.grid.minor.y = element_blank(),
panel.grid.major.y = element_blank(),
legend.position="none"
) +
xlab("Top 10 Doctorate Major in 2022") + ggtitle("Highest Doctorate Receipients")+
ylab("Number of Doctorate Recipients by Major in 2022")
# Increase percentage of Graduate compare to previous year in 2022
final %>% head(10) %>% filter(Doctorate_Field!="Other") %>%
ggplot( aes(x=Doctorate_Field, y=percent) ) +
geom_bar(stat="identity", fill="#69b3a2") +
coord_flip() +
theme_ipsum() +
theme(
panel.grid.minor.y = element_blank(),
panel.grid.major.y = element_blank(),
legend.position="none"
) +
xlab("Top 10 Doctorate Majors in 2022") + ggtitle("Highest Doctorate Recipients Graduate prior previous year")+
ylab("Percentage of Doctorate Recipients by Major in 2022")
What we learn form this data from the National Science Foundation is that Teaching fields major and Civil Engineering have the highest number of graduates in 2022 and seem that the numbers of graduates increase every year. This suggests that these majors would be very important in near future. United of America lays out its plan for national infrastructure. According to New York Times, Biden Details $2 Trillion Plan to Rebuild Infrastructure and Reshape the Economy. The president plan is to fix 20000 miles of roads and 10000 bridges. Civil engineering and Teaching Fields will be essential in the upcoming infrastructure project.
According to United Nation World Tourism (UNWTO), Tourism is a social, cultural and economic phenomenon which entails the movement of people to countries or places outside their usual environment for personal or business/professional purposes. These people are called visitors (which may be either tourists or excursionists; residents or non-residents) and tourism has to do with their activities, some of which involve tourism expenditure. This data sets has information about how many visitors travel per country.
This is how the data look:
# load the data from Github
tourist <- read.csv("https://raw.githubusercontent.com/joewarner89/CUNY-607/main/Project%202/OECD%20-%20Tourism%20data.csv", stringsAsFactors = F, skip = 1)
# Delete Empty row
tourist$X <- NULL
# Delete unwanted rows at the bottom of the files
tourist <- tourist[-c(208:213),]
head(tourist)
## Country Variable X2008 X2009
## 1 Australia Total domestic trips 210,753,700 215,845,100
## 2 Overnight visitors (tourists) 72,008,700 67,669,600
## 3 Same-day visitors (excursionists) 138,745,000 148,175,500
## 4 Nights in all types of accommodation 277,865,400 262,235,600
## 5 Hotels and similar establishments 75,646,100 70,740,800
## 6 Other collective establishments 68,603,000 69,302,700
## X2010 X2011 X2012 X2013 X2014 X2015
## 1 225,239,200 233,126,300 248,377,400 240,118,400 260,362,000 269,481,500
## 2 69,296,800 71,894,700 74,472,200 75,796,400 84,480,800 87,523,200
## 3 155,942,400 161,231,600 173,905,200 164,322,000 175,881,200 181,958,300
## 4 265,393,200 270,573,300 281,732,900 282,679,700 310,532,600 317,535,400
## 5 71,929,600 74,367,000 71,529,900 71,977,900 79,604,700 81,120,800
## 6 70,154,400 69,891,400 72,239,000 72,161,500 70,625,100 83,114,200
## X2016 X2017 X2018 X2019 X2020 X2021
## 1 280,324,300 291,796,700 310,166,200 365,796,900 236,705,700 242,543,000
## 2 90,741,700 98,483,900 104,821,900 117,447,700 72,513,800 82,074,100
## 3 189,582,600 193,312,800 205,344,300 248,349,200 164,191,900 160,468,900
## 4 334,798,200 352,085,100 371,527,500 417,906,500 275,403,600 321,108,900
## 5 83,092,300 87,519,400 95,297,500 101,046,800 51,588,600 67,162,100
## 6 90,291,300 94,045,100 94,416,100 111,904,800 79,557,400 90,006,300
This data set has a lot of empty spaces. We are going to fill out
with the country column with the respective country every observation
belong to then creating two data set to create the year and visitors
column. both newly created columns can be constructed using
gather() or pivot_long() depending on how you
select your parameters
# fill the values in country so the right country aligned with his population visitors
tourist <- tourist %>% mutate(Country = as.character(na_if(Country,""))) %>% fill(Country,.direction = 'down')
var_t <- c("X2008","X2009","X2010","X2011","X2012","X2013"
,"X2014","X2015","X2016","X2017","X2018",
"X2019","X2020","X2021")
# creating new variables.
prep <- pivot_longer(
data = tourist,
cols = all_of(var_t),
names_to = "year",
values_to = "visitors"
)
# replace the missing value with o
prep <- prep %>%
mutate(Country = str_replace(Country, "T�rkiye","Turkey"))
prep$visitors[prep$visitors == '..'] <- 0
prep$visitors <- as.numeric(gsub(",", "", prep$visitors))
# re-transform the dataset so
prep2 <- spread(prep,
key = "Variable",
value = "visitors")
## Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if
## `.name_repair` is omitted as of tibble 2.0.0.
## ℹ Using compatibility `.name_repair`.
## ℹ The deprecated feature was likely used in the tidyr package.
## Please report the issue at <https://github.com/tidyverse/tidyr/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
prep2$year <- str_extract(prep2$year,"\\d+")
prep2$V1 <- NULL
### Rename the variables
tourist_final <- prep2 %>% rename(Hotel_Establishments = 3,
Collective_Establishments = 4,
Overnights_visitors = 5,
Private_Accomodation = 6,
SameDay_Accomodation = 7,
Nights_Accomodations = 8,
Total_Domestic_Trips = 9)
tourist_final[is.na(tourist_final)] = 0
tourist_final$Country[tourist_final$Country == 'T<fc>rkiye'] <- 'Turkey'
# look at tourism in 2021
tourist21 <- tourist_final %>% filter(year == 2021)
finaldata <- tourist21 %>% arrange(desc(Hotel_Establishments)) %>% filter(Hotel_Establishments != 0) %>%
select(Country,year,Hotel_Establishments) %>% head(10)
head(finaldata)
## # A tibble: 6 × 3
## Country year Hotel_Establishments
## <chr> <chr> <dbl>
## 1 "United States" 2021 1157502833
## 2 "France" 2021 277538829
## 3 " Saudi Arabia" 2021 204645844
## 4 " Indonesia" 2021 185971368
## 5 "Germany" 2021 148135171
## 6 "Spain" 2021 67637768
Now We are going to look at countries that have a lot visitor booking Hotels.
tour_tbl <- gt(finaldata)
tour_tbl <-
tour_tbl |>
tab_header(
title = md("**Top 10 Touristic Country in the World**"),
subtitle = md("countries with Highest Hotel Accomodation by Tourists")
)
tour_tbl
| Top 10 Touristic Country in the World | ||
| countries with Highest Hotel Accomodation by Tourists | ||
| Country | year | Hotel_Establishments |
|---|---|---|
| United States | 2021 | 1157502833 |
| France | 2021 | 277538829 |
| Saudi Arabia | 2021 | 204645844 |
| Indonesia | 2021 | 185971368 |
| Germany | 2021 | 148135171 |
| Spain | 2021 | 67637768 |
| Australia | 2021 | 67162100 |
| Poland | 2021 | 42968711 |
| Turkey | 2021 | 41654000 |
| Denmark | 2021 | 28989789 |
finaldata %>% ggplot( aes(x=Country, y=Hotel_Establishments)) +
geom_segment( aes(xend=Country, yend=0)) +
geom_point( size=4, color="orange") + scale_y_continuous(trans='log2') +
theme_bw() +
xlab("Top 10 Touristic Countries in the World") + ggtitle("Top 10 Touristic Countries By Hotel Establishments in 2021")+
ylab("Number of visitors by Hotel Establishments in 2021")
## Warning: Transformation introduced infinite values in continuous y-axis
More people have traveled to USA than any other countries. Big Cities in the USA have a lot great establishments such as Hotels, Private Properties and so on. Tourism has a great effect on any country economy. The top 10 countries revealed in the data sets has the most decorative hotels and can the one of the best accommodations in the World.USA and France are considered one of the most popular place to visit.
According to the city of Chicago, This data set shows all school level performance data used to create CPS School Report Cards for the 2011-2012 school year. Metrics are described as follows (also available for download at http://bit.ly/uhbzah): NDA indicates “No Data Available.”
We are going to transform the data set so we can look into which high school perform well and how impacted is the teacher involvement in the school year.
First look of the data set
raw <- read.csv("https://raw.githubusercontent.com/joewarner89/CUNY-607/main/Project%202/schools_by_performance_level.csv", stringsAsFactors = F,sep = ",")
head(raw)
## School.ID Name.of.School
## 1 610539 Marvin Camras Elementary School
## 2 609852 Eliza Chappell Elementary School
## 3 609818 Luther Burbank Elementary School
## 4 609680 Walter Payton College Preparatory High School
## 5 610038 Abraham Lincoln Elementary School
## 6 609749 Northside College Preparatory High School
## Elementary..Middle..or.High.School Street.Address Phone.Number
## 1 ES 3000 N Mango Ave (773) 534-2960
## 2 ES 2135 W Foster Ave (773) 534-2390
## 3 ES 2035 N Mobile Ave (773) 534-3000
## 4 HS 1034 N Wells St (773) 534-0034
## 5 ES 615 W Kemper Pl (773) 534-5720
## 6 HS 5501 N Kedzie Ave (773) 534-3954
## Link
## 1 http://schoolreports.cps.edu/SchoolProgressReport_Eng/Spring2011Eng_610539.pdf
## 2 http://schoolreports.cps.edu/SchoolProgressReport_Eng/Spring2011Eng_609852.pdf
## 3 http://schoolreports.cps.edu/SchoolProgressReport_Eng/Spring2011Eng_609818.pdf
## 4 http://schoolreports.cps.edu/SchoolProgressReport_Eng/Spring2011Eng_609680.pdf
## 5 http://schoolreports.cps.edu/SchoolProgressReport_Eng/Spring2011Eng_610038.pdf
## 6 http://schoolreports.cps.edu/SchoolProgressReport_Eng/Spring2011Eng_609749.pdf
## Network.Manager Adequate.Yearly.Progress.Made.
## 1 Fullerton Elementary Network No
## 2 Ravenswood-Ridge Elementary Network Yes
## 3 Fullerton Elementary Network No
## 4 North-Northwest Side High School Network Yes
## 5 Fullerton Elementary Network No
## 6 North-Northwest Side High School Network Yes
## Track.Schedule CPS.Performance.Policy.Status CPS.Performance.Policy.Level
## 1 Standard Not on Probation Not Enough Data
## 2 Standard Not on Probation Level 1
## 3 Standard Not on Probation Level 2
## 4 Standard Not on Probation Level 1
## 5 Standard Not on Probation Level 1
## 6 Standard Not on Probation Level 1
## Healthy.Schools.Certified. Safety.Icon Safety.Score Family.Involvement.Icon
## 1 No Average 54 Average
## 2 No Strong 70 Strong
## 3 No Weak 37 NDA
## 4 No Very Strong 98 NDA
## 5 Yes Very Strong 99 Very Strong
## 6 No Very Strong 99 NDA
## Family.Involvement.Score Environment.Icon Environment.Score Instruction.Icon
## 1 58 Weak 37 Average
## 2 65 Average 53 Average
## 3 NDA Average 42 Weak
## 4 NDA Very Strong 80 Strong
## 5 99 Strong 74 Strong
## 6 NDA Very Strong 99 Very Strong
## Instruction.Score Leaders.Icon Leaders.Score Teachers.Icon Teachers.Score
## 1 41 Very Strong 83 Very Strong 88
## 2 51 Average 56 Average 48
## 3 34 NDA NDA NDA NDA
## 4 77 NDA NDA NDA NDA
## 5 66 Strong 65 Strong 70
## 6 88 NDA NDA NDA NDA
## Parent.Engagement.Icon Parent.Engagement.Score Parent.Environment.Icon
## 1 Average 51 Strong
## 2 Average 50 Average
## 3 Average 47 Average
## 4 NDA NDA NDA
## 5 Strong 56 Average
## 6 Strong 57 Strong
## Parent.Environment.Score Average.Student.Attendance
## 1 55 95.1
## 2 52 95.1
## 3 49 95.2
## 4 NDA 93.4
## 5 47 96.0
## 6 62 95.7
## Rate.of.Misconducts..per.100.students. Average.Teacher.Attendance
## 1 4.3 0.0
## 2 2.9 96.7
## 3 9.8 95.5
## 4 0.7 96.1
## 5 2.0 96.4
## 6 2.8 96.8
## Individualized.Education.Program.Compliance.Rate Pk.2.Literacy.. Pk.2.Math..
## 1 97.7 47.5 39.9
## 2 98.9 72.6 57
## 3 100.0 60.8 45.2
## 4 100.0 NDA NDA
## 5 95.8 80.1 43.3
## 6 98.4 NDA NDA
## Gr3.5.Grade.Level.Math.. Gr3.5.Grade.Level.Read.. Gr3.5.Keep.Pace.Read..
## 1 30.8 27.8 47.8
## 2 56 54.4 55.2
## 3 42.8 34.9 61.2
## 4 NDA NDA NDA
## 5 89.6 84.9 60.7
## 6 NDA NDA NDA
## Gr3.5.Keep.Pace.Math.. Gr6.8.Grade.Level.Math.. Gr6.8.Grade.Level.Read..
## 1 38.6 41.1 38.5
## 2 69 53.2 63.9
## 3 63.8 41 34.4
## 4 NDA NDA NDA
## 5 62.6 81.9 85.2
## 6 NDA NDA NDA
## Gr6.8.Keep.Pace.Math. Gr6.8.Keep.Pace.Read.. Gr.8.Explore.Math..
## 1 49.2 55.4 15.3
## 2 60 59.8 12.8
## 3 55 54.5 13.1
## 4 NDA NDA NDA
## 5 52 62.4 66.3
## 6 NDA NDA NDA
## Gr.8.Explore.Read.. ISAT.Exceeding.Math.. ISAT.Exceeding.Reading..
## 1 30.5 11.2 12.0
## 2 41 19.6 17.6
## 3 17.2 20.6 11.3
## 4 NDA NA NA
## 5 77.9 69.7 64.4
## 6 NDA NA NA
## ISAT.Value.Add.Math ISAT.Value.Add.Read ISAT.Value.Add.Color.Math
## 1 -1.8 -0.3 Red
## 2 0.4 1.0 Yellow
## 3 -0.2 0.2 Yellow
## 4 NA NA NDA
## 5 0.2 0.9 Yellow
## 6 NA NA NDA
## ISAT.Value.Add.Color.Read Students.Taking..Algebra..
## 1 Yellow NDA
## 2 Green 27.5
## 3 Yellow NDA
## 4 NDA NDA
## 5 Green 67.1
## 6 NDA NDA
## Students.Passing..Algebra.. X9th.Grade.EXPLORE..2009.
## 1 NDA NDA
## 2 63.6 NDA
## 3 NDA NDA
## 4 NDA 21.2
## 5 54.5 NDA
## 6 NDA 22.4
## X9th.Grade.EXPLORE..2010. X10th.Grade.PLAN..2009. X10th.Grade.PLAN..2010.
## 1 NDA NDA NDA
## 2 NDA NDA NDA
## 3 NDA NDA NDA
## 4 21.8 23.1 23.2
## 5 NDA NDA NDA
## 6 22.2 24.5 24.7
## Net.Change.EXPLORE.and.PLAN X11th.Grade.Average.ACT..2011.
## 1 NDA NDA
## 2 NDA NDA
## 3 NDA NDA
## 4 2 27
## 5 NDA NDA
## 6 2.3 28.8
## Net.Change.PLAN.and.ACT College.Eligibility.. Graduation.Rate..
## 1 NDA NDA NDA
## 2 NDA NDA NDA
## 3 NDA NDA NDA
## 4 3.9 96.4 96.9
## 5 NDA NDA NDA
## 6 4.3 98 97.6
## College.Enrollment.Rate.. College.Enrollment..number.of.students.
## 1 NDA 826
## 2 NDA 451
## 3 NDA 1139
## 4 82.4 881
## 5 NDA 813
## 6 90.7 1053
## General.Services.Route Freshman.on.Track.Rate.. X_COORDINATE Y_COORDINATE
## 1 30 NDA 1137482 1919395
## 2 31 NDA 1161017 1934467
## 3 29 NDA 1134123 1913042
## 4 33 90.7 1174485 1907490
## 5 33 NDA 1171699 1915829
## 6 31 95.9 1154091 1936414
#subseting the data set
# We want only high school that their grade benchmark
raw <- raw %>% select(School.ID:Phone.Number,Teachers.Icon,Teachers.Score,
ISAT.Exceeding.Math..:X10th.Grade.PLAN..2010.) %>%
filter(Elementary..Middle..or.High.School == "HS")
# Rename some variable names
raw2 <- raw %>% select(1:19) %>% rename(School_id = 1,
School = 2,
School_Category = 3,
Street_Address = 4,
Phone_Number = 5,
Teachers = 6,
Teachers_Score = 7,
Algebra = Students.Taking..Algebra..,
Algebra_Benchmark = Students.Taking..Algebra..,
Explore_9th_Grade_2009 = X9th.Grade.EXPLORE..2009.,
Explore_9th_Grade_2010 = X9th.Grade.EXPLORE..2010.,
Plan_10th_Grade_2009 = X10th.Grade.PLAN..2009.,
Plan_10th_Grade_2010 = X10th.Grade.PLAN..2010.)
# reselect important rows
raw2 <- raw2 %>% select (1:7,16:19)
# Replace String with Another Stirng
raw2$Teachers[raw2$Teachers == 'NDA'] <- 'NA'
raw2$Teachers_Score[raw2$Teachers_Score == 'NDA'] <- '0'
raw2$Explore_9th_Grade_2009[raw2$Explore_9th_Grade_2009 == 'NDA'] <- 0
raw2$Explore_9th_Grade_2010[raw2$Explore_9th_Grade_2010 == 'NDA'] <- 0
raw2$Plan_10th_Grade_2009[raw2$Plan_10th_Grade_2009 == 'NDA'] <- 0
raw2$Plan_10th_Grade_2010[raw2$Plan_10th_Grade_2010 == 'NDA'] <- 0
head(raw2)
## School_id School
## 1 609680 Walter Payton College Preparatory High School
## 2 609749 Northside College Preparatory High School
## 3 610394 Uplift Community High School
## 4 609737 Friedrich W von Steuben Metropolitan Science High School
## 5 609695 Roald Amundsen High School
## 6 609708 Edwin G Foreman High School
## School_Category Street_Address Phone_Number Teachers Teachers_Score
## 1 HS 1034 N Wells St (773) 534-0034 NA 0
## 2 HS 5501 N Kedzie Ave (773) 534-3954 NA 0
## 3 HS 900 W Wilson Ave (773) 534-2875 Average 53
## 4 HS 5039 N Kimball Ave (773) 534-5100 Weak 29
## 5 HS 5110 N Damen Ave (773) 534-2320 Average 41
## 6 HS 3235 N LeClaire Ave (773) 534-3400 NA 0
## Explore_9th_Grade_2009 Explore_9th_Grade_2010 Plan_10th_Grade_2009
## 1 21.2 21.8 23.1
## 2 22.4 22.2 24.5
## 3 13.6 13.9 14.9
## 4 15.9 15.7 17.3
## 5 13.8 13.9 14.9
## 6 13 12.7 14.1
## Plan_10th_Grade_2010
## 1 23.2
## 2 24.7
## 3 14.7
## 4 16.7
## 5 14.8
## 6 13.7
we are going transform the data set so we can create new variables for Performance Column, along with survey data fields.
# Transforming the data set to Long using the
# performance benchmark for all teachers
temp <- pivot_longer(
data = raw2,
cols = Teachers,
names_to = "Performance",
values_to = "Teaching_Performance"
)
## Spred the performace acroos different columns
temp_prep <- spread(temp, key = "Teaching_Performance",value = "Teachers_Score")
temp_prep$Not_Recorded <- temp_prep$`NA`
temp_prep$`NA` <- NULL
# Replace All na with 0
temp_prep$Average[temp_prep$Average == NA] <- "0"
temp_prep <- temp_prep %>% mutate_at(c('Average','Strong','Very Strong','Weak','Not_Recorded',
'Explore_9th_Grade_2009','Explore_9th_Grade_2010','Plan_10th_Grade_2009',
'Plan_10th_Grade_2010'), as.numeric)
temp_prep$`Very Strong`[temp_prep$`Very Strong` == NA] <- 0
head(temp_prep)
## # A tibble: 6 × 15
## School_id School School_Category Street_Address Phone_Number
## <int> <chr> <chr> <chr> <chr>
## 1 609680 Walter Payton College P… HS "1034 N Wells… (773) 534-0…
## 2 609749 Northside College Prepa… HS "5501 N Kedzi… (773) 534-3…
## 3 610394 Uplift Community High S… HS "900 W Wilson… (773) 534-2…
## 4 609737 Friedrich W von Steuben… HS "5039 N Kimba… (773) 534-5…
## 5 609695 Roald Amundsen High Sch… HS "5110 N Damen… (773) 534-2…
## 6 609708 Edwin G Foreman High Sc… HS "3235 N LeCla… (773) 534-3…
## # ℹ 10 more variables: Explore_9th_Grade_2009 <dbl>,
## # Explore_9th_Grade_2010 <dbl>, Plan_10th_Grade_2009 <dbl>,
## # Plan_10th_Grade_2010 <dbl>, Performance <chr>, Average <dbl>, Strong <dbl>,
## # `Very Strong` <dbl>, Weak <dbl>, Not_Recorded <dbl>
# High School that have the highest average of Student performing well in their test
school <- temp_prep %>% arrange(desc(Explore_9th_Grade_2009)) %>% select(2,3,Performance,Explore_9th_Grade_2009:Plan_10th_Grade_2010,Average:Not_Recorded ) %>%
head(10)
school <- school %>% replace(is.na(.), 0)
head(school)
## # A tibble: 6 × 12
## School School_Category Performance Explore_9th_Grade_2009
## <chr> <chr> <chr> <dbl>
## 1 Northside College Preparat… HS Teachers 22.4
## 2 Walter Payton College Prep… HS Teachers 21.2
## 3 Albert G Lane Technical Hi… HS Teachers 19.1
## 4 Lincoln Park High School HS Teachers 17.7
## 5 Friedrich W von Steuben Me… HS Teachers 15.9
## 6 Charles Allen Prosser Care… HS Teachers 15.5
## # ℹ 8 more variables: Explore_9th_Grade_2010 <dbl>, Plan_10th_Grade_2009 <dbl>,
## # Plan_10th_Grade_2010 <dbl>, Average <dbl>, Strong <dbl>,
## # `Very Strong` <dbl>, Weak <dbl>, Not_Recorded <dbl>
Let look at the schools that performed well in 2009 because of their teacher performances. We are going to check the top school in Niche.com to have an idea why City of Chicago survey range this particular school as one to excel in taking state exam.
# the choosen graph
school %>% ggplot(aes(x = School, y = Explore_9th_Grade_2009)) +
geom_col() +
facet_grid(~ Performance) +
coord_flip() +
ylim(c(0, 90)) +
geom_text(aes(label = round(Explore_9th_Grade_2009, 1)), hjust = -.1) +
labs(x = "Top 10 High School In Chigago Distrist", y = "Highest Test Score Benchmark for year 2009")
According to Nice.com, Northside College Preparatory High School is a
top rated, public, magnet school located in CHICAGO, IL. It has 1,027
students in grades 9-12 with a student-teacher ratio of 16 to 1.
According to state test scores, 94% of students are at least proficient
in math and 95% in reading. These Schools has performing since 2009 to
present day and the involvements of teachers and parents have been one
of the main reasons they do well academically. The picture below shows a
comparison of some of the best High School revealed in the top 10 list.
Schools, C. P. (2018, September 14). NNW -all schools by Performance Level + AYP: City of chicago: Data Portal. Chicago Data Portal. https://data.cityofchicago.org/Education/NNW-all-schools-by-performance-level-AYP/wefw-pbqz
Senior, Freshman, & Junior. (2021a, December 31). Northside College Preparatory High School in Chicago, IL. Niche. https://www.niche.com/k12/northside-college-preparatory-high-school-chicago-il/
Kelly Kang, J. F. (n.d.). Data tables. NSF. https://ncses.nsf.gov/pubs/nsf24300/table/1-3
Tankersley, J. (2021, March 31). Biden details $2 trillion plan to rebuild infrastructure and reshape the economy. The New York Times. https://www.nytimes.com/2021/03/31/business/economy/biden-infrastructure-plan.html#:~:text=677-,Biden%20Details%20%242%20Trillion%20Plan%20to%20Rebuild%20Infrastructure%20and%20Reshape,inequities%20and%20raising%20corporate%20taxes.