Assignment 6
1. Mental Health Clinics
a. This data set is a survey of every known healthcare facility that offered mental health services in the United States in 2015. Navigate to https://datafiles.samhsa.gov/study-dataset/nationalmental-health-services-survey-2015-n-mhss-2015-ds0001-nid17098 and select the R download. Look through the codebook PDF for an explanation of certain variables. Upon opening the RDA file, the data set should be loaded into your global environment, where you can then reference it.
View(mh2015_puf)
b. Please create code which lists the State abbreviations without their counts, one abbreviation per State value. It does not have to be in data frame format. A vector is fine.
# unique() returns each State abbreviation once, without counts
unique(mh2015_puf$LST)
# If LST is stored as a factor, levels() lists the same values
levels(mh2015_puf$LST)
c. Filter the data.frame from 1A. We are only interested in the Veterans Administration (VA) medical centers in the mainland United States. Create a listing of counts of these centers by state, including only mainland locations. Alaska, Hawaii, and U.S. territories should be omitted. DC, while not a state, is in the mainland, so it should remain included. Convert this to data.frame().
# Keep only the state and VA-indicator columns, with plain column names
StateVet <- data.frame(LST = mh2015_puf$LST, SRVC113 = mh2015_puf$SRVC113)
# Drop rows whose SRVC113 value contains "No"; unlike -grep(), !grepl() is safe when nothing matches
StateVetOnly <- StateVet[!grepl("No", StateVet$SRVC113), ]
# Drop Alaska and Hawaii (U.S. territories are not removed by this step)
FinalStateVet <- StateVetOnly[!grepl("AK|HI", StateVetOnly$LST), ]
# Count centers per state and store the result as a data.frame
StatesforVetCenter <- as.data.frame(table(as.character(FinalStateVet$LST)))
names(StatesforVetCenter) <- c("States", "TotalHospital")
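The question also asks for U.S. territories to be left out. A minimal sketch of one way to do that, using a toy vector in place of `LST`. The vector `non_mainland` is a hypothetical list of codes and should be checked against the codebook before it is used on the real data:

```r
# Toy stand-in for the LST column (hypothetical values, not taken from the survey)
lst <- c("AL", "AK", "DC", "HI", "PR", "TX", "TX", "GU")

# Assumed set of non-mainland codes; verify against the codebook
non_mainland <- c("AK", "HI", "PR", "GU", "VI", "AS", "MP")

# trimws() guards against padded values; it changes nothing if there is no padding
mainland <- lst[!(trimws(lst) %in% non_mainland)]

# Count facilities per remaining state and convert to a data.frame
mainland_counts <- as.data.frame(table(trimws(mainland)), stringsAsFactors = FALSE)
names(mainland_counts) <- c("States", "TotalHospital")
print(mainland_counts)
```

Matching whole codes with `%in%` avoids chaining one `grep()` call per location, and DC stays in because it is not on the exclusion list.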
d. Create a ggplot barchart of this filtered data set. Vary the bar’s colors by what State it has listed. Give it an appropriately professional title that is centered. Make sure you have informative axis labels. The State axis should be readable, not layered over each other. You’re welcome to have a legend or not.
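library(ggplot2)
ggplot(StatesforVetCenter, aes(x = States, y = TotalHospital, fill = States)) +
  geom_bar(stat = "identity") +
  labs(title = "VA Hospitals", x = "State", y = "Number of VA hospitals") +
  theme(plot.title = element_text(hjust = 0.5), axis.text.x = element_text(angle = 90, hjust = 1))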
2. Cleaning & Bringing in New Features
a. This graph (1D) might be somewhat misleading, as bigger states may have more hospitals, but could be more sparsely located. Read statesize.csv into your R environment. This contains essentially a vector of square miles for each state. In trying to merge it with your data.frame() from 1C, you find that they don’t match. Use paste() on your LST column in 1C to see what the matter is, and write what you observe in a comment.
# The state names need to match in both files for the merge to work and return the correct number of rows.
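A small sketch of why `paste()` helps here, assuming the mismatch is trailing whitespace in the abbreviations (the assignment text does not say so directly). The values below are a toy stand-in, not the survey data:

```r
# Toy factor standing in for LST (hypothetical values with trailing spaces)
lst <- factor(c("AL    ", "DC    ", "TX    "))

# Printing a factor shows the levels without quotes, so the padding is easy to miss
print(lst)

# paste() converts to character, and the quoted output exposes the extra spaces
padded <- paste(lst)
print(padded)

# nchar() confirms each value is longer than a two-letter abbreviation
print(nchar(padded))
```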
b. Correct the problem with the LST column using any method in R that is programmatic and easily understandable. Once you have made these state abbreviations identical to statesize.csv’s Abbrev column, merge the data.frame() from 1C and statesize.csv in order to add size information.
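# Strip padding whitespace (assumed to be the mismatch) so the values match Statesize$Abbrev
StatesforVetCenter$States <- trimws(StatesforVetCenter$States)
# Merge on the abbreviation columns; only states present in both tables are kept
MergeData2 <- merge(StatesforVetCenter, Statesize, by.x = "States", by.y = "Abbrev")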
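The clean-then-merge step can be tried on toy data first. This is a sketch only: `va_counts` and `state_size` are hypothetical stand-ins, the column names follow the text, and the square-mile figures are illustrative round numbers rather than the contents of statesize.csv:

```r
# Toy versions of the two tables (hypothetical values)
va_counts <- data.frame(States = c("AL    ", "DC    ", "TX    "),
                        TotalHospital = c(3, 1, 9),
                        stringsAsFactors = FALSE)
state_size <- data.frame(Abbrev = c("AL", "DC", "TX", "AK"),
                         SqMiles = c(50000, 100, 250000, 600000),
                         stringsAsFactors = FALSE)

# Remove the padding so the key columns hold identical strings
va_counts$States <- trimws(va_counts$States)

# Inner join on the differently named key columns; AK has no match and is dropped
merged <- merge(va_counts, state_size, by.x = "States", by.y = "Abbrev")
print(merged)
```

Leaving `all` at its default of FALSE keeps only the states that appear in both tables, so locations that were filtered out in 1C do not come back as rows of `NA`.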
c. Calculate a new variable in your combined data.frame() which indicates the VA hospitals per thousand square miles.
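A minimal sketch of the calculation on toy data, assuming the merged data.frame has a count column `TotalHospital` and an area column `SqMiles`. The column name `VAPerThousandSqMi` and all values are hypothetical:

```r
# Toy merged table (illustrative numbers only)
merged <- data.frame(States = c("AL", "DC", "TX"),
                     TotalHospital = c(3, 1, 9),
                     SqMiles = c(50000, 100, 250000))

# Hospitals per thousand square miles = count / (area in thousands of square miles)
merged$VAPerThousandSqMi <- merged$TotalHospital / (merged$SqMiles / 1000)
print(merged)
```

Dividing the area by 1000 first keeps the unit explicit: a small location with a single hospital can score far higher than a large state with many.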
d. Create another ggplot which considers the VAs per thousand square miles, rather than just frequency. Make sure the State axis is readable, like before. Change the title and axes as appropriate.
Two charts below
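# y = SqMiles is kept from the original; use the per-thousand-square-miles column from 2c if it is stored under another name
ggplot(Exercise, aes(x = Region, y = SqMiles, fill = Region)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "VA Hospitals Per Square Mile") +
  theme(plot.title = element_text(hjust = 0.5))
ggplot(Exercise, aes(x = State, y = SqMiles, fill = Region)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "VA Hospitals Per Square Mile") +
  theme(plot.title = element_text(hjust = 0.5), axis.text.x = element_text(angle = 90, hjust = 1))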
Regions in the West appear to have the largest number of medical centers per square mile.