In this assignment, I use the 2016 Health Data from Social Explorer to run a simple comparison between New York and California and find the total healthcare providers in both states. The three healthcare providers covered in the dataset are primary care physicians (PCP), mental health providers (MHP), and dentists.
library(readr)
Health_Providers <- read.csv("/users/sharanbhamra/Desktop/SOC 712/R11590948_SL050.csv")
head(Health_Providers)
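The dataset is loaded from the Desktop into RStudio. The CSV file is read with read.csv(), which is a base R function; the “readr” library loaded above provides the similar read_csv(), but it is not actually needed for this call.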
library(dplyr)
Health_Providers$Geo_STATE <- recode(Health_Providers$Geo_STATE,
                                     '6' = "California",
                                     '36' = "New York")
I recoded the two state FIPS codes in the Geo_STATE variable (6 and 36) so that each row is labeled as California or New York.
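The recoding step can be tried on a small vector before applying it to the full dataset. The sketch below is only an illustration: `toy_states` and `toy_labels` are hypothetical names, and the four values are made up to stand in for the Geo_STATE column.

```r
library(dplyr)

# Hypothetical toy values standing in for Geo_STATE (state FIPS codes)
toy_states <- c(6, 36, 6, 36)

# Backtick-quoted names map each numeric code to a text label
toy_labels <- recode(toy_states, `6` = "California", `36` = "New York")

print(toy_labels)

# Count how many rows fall under each label
print(table(toy_labels))
```

Any code that is not listed in the call would not receive a label, so it is worth checking the table for missing values when the data contain more than two states.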
HealthProviders <- Health_Providers %>%
  rename(County_Name = Geo_NAME,
         State = Geo_STATE,
         PCP = SE_T004_001,
         MHP = SE_T004_002,
         Dentists = SE_T004_003,
         PCP_Rate = SE_NV003_001,
         MHP_Rate = SE_NV003_002,
         Dentists_Rate = SE_NV003_003) %>%
  select(County_Name,
         State,
         PCP_Rate,
         MHP_Rate,
         Dentists_Rate) %>%
  mutate(Total_Providers = PCP_Rate + MHP_Rate + Dentists_Rate)
In the above code, I renamed the variables according to the codebook that was provided. I then selected the variables needed for the analysis and created a new variable called “Total_Providers” by adding the three provider rates, where:
* PCP_Rate = Rate per 100k
* MHP_Rate = Rate per 100k
* Dentists_Rate = Rate per 100k
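One thing to watch with the addition above is missing data: if any of the three rates is NA for a county, the plain sum is NA as well. The sketch below illustrates this on hypothetical rows (the county names and numbers are invented, and `Total_NA_Removed` is a name introduced only for this example); whether the real file contains missing rates is not something this report checks.

```r
library(dplyr)

# Hypothetical county rows; the numbers are made up for illustration
toy <- tibble(
  County_Name   = c("County A", "County B"),
  PCP_Rate      = c(80, 60),
  MHP_Rate      = c(200, NA),
  Dentists_Rate = c(70, 50)
)

toy_totals <- toy %>%
  mutate(
    # Plain addition: the result is NA if any of the three rates is missing
    Total_Providers = PCP_Rate + MHP_Rate + Dentists_Rate,
    # Alternative: skip missing rates when summing across the three columns
    Total_NA_Removed = rowSums(cbind(PCP_Rate, MHP_Rate, Dentists_Rate),
                               na.rm = TRUE)
  )

print(toy_totals)
```

Which behavior is preferable depends on the analysis: dropping a missing rate understates the total for that county, while keeping the NA removes the county from later sums and plots.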
library(ggplot2)
ggplot(data = HealthProviders) +
  geom_col(aes(x = State, y = Total_Providers, fill = State))
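As the graph above shows, California has a higher total than New York State. Note that each bar is the sum of the county-level rates per 100k in that state, not a count of providers. One possible reading is that more people in California than in New York have insurance, although this dataset does not measure insurance coverage, so that remains an assumption.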
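Because geom_col() stacks one segment per row, the height of each bar also depends on how many counties a state has. A rough way to check this is to compare the sum of the county rates with their mean for each state. The sketch below uses hypothetical data (the values and the names `toy` and `state_summary` are invented for illustration, not taken from the Social Explorer file).

```r
library(dplyr)

# Hypothetical data: two counties in one state, three in the other
toy <- tibble(
  State = c("California", "California", "New York", "New York", "New York"),
  Total_Providers = c(300, 500, 400, 200, 600)
)

state_summary <- toy %>%
  group_by(State) %>%
  summarise(
    Counties     = n(),                                   # rows per state
    Sum_Of_Rates = sum(Total_Providers, na.rm = TRUE),    # what a stacked bar shows
    Mean_Rate    = mean(Total_Providers, na.rm = TRUE)    # average county rate
  )

print(state_summary)
```

In this toy case the two states have the same mean rate, yet the state with more counties has the larger sum, so a comparison based on the mean (or on a population-weighted rate, if population is available) may be a fairer summary than the stacked total.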