The goal of this assignment is to give you practice in preparing different datasets for downstream analysis work. ##Upload dataset cod<-read.csv(C:/Users/hangr/Documents/Acquisition and data management/Tap.csv)
tap<-read.csv("C:/Users/hangr/Documents/Acquisition and data management/Tap.csv")
Tap dataset is an open data dataset found in the opendata website. Tap stands for The New York State Tuition Assistance Program (TAP). It helps eligible New York residents pay tuition at approved schools in New York State. Depending on the academic year in which you begin study, an annual TAP award can be up to $5,165. Because TAP is a grant, it does not have to be paid back. In our analysis we will try to determine how many student take advantages of the stipend and how it is ditributed.
Since the data is already in a structured format we will rename the headers and choose some variables to do the analysis.
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
Select the variables needed to do the analysis
#Rename the columns
names(tap)[1]="Year"
names(tap)[4]="Sector"
names(tap)[6]="Age"
names(tap)[11]="Income"
names(tap)[14]="Count"
names(tap)[16]="Amount"
head(tap)
Let’s get the variable needed for the analysis
tap %>%
select(Year,Sector,Age,Income,Count,Amount)
head(tap)
Get the voulme of dollars distributed every year
amount_year<-tap %>%
group_by(Year) %>%
summarise(Total=sum(Amount)) %>%
arrange(desc(Total))
amount_year
How many people receive the tap every year
count_year<- tap %>%
group_by(Year, Amount) %>%
summarise(total=sum(Count)) %>%
arrange(desc(total))
head(count_year)
Let’s get the highest amount distributed for a year
top_amount<- tap %>%
group_by(Age) %>%
top_n(1, Amount)
top_amount
Under age 22 has received the most amount distributed by TAP.
plots_top<-tail(top_amount,20)
ggplot(plots_top, aes(plots_top$Age, plots_top$Amount)) + geom_bar(stat="identity")
The program has expected give young people without income the opportunity to go to college as illustrated in the graph.