INSTRUCTIONS
Follow the instructions below to create charts from the prescriber-info.csv file used in HW1. Submit the .Rmd R notebook on Sakai by the deadline: Wednesday, November 8, at 11:55 pm.
Data processing
For each of these charts, you will need to aggregate the data set before passing it to ggplot2. Use the dplyr package to summarise the data for each question. All the charts should include only data from the continental U.S., so the data will need to be filtered accordingly. You do not need to process the data separately for each question; feel free to aggregate the data as efficiently as possible.
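A minimal sketch of the "aggregate once" idea, assuming a tiny hypothetical data frame (`toy`) with the same column names as prescriber-info.csv; all values are made up. It filters out the non-continental codes once (the same exclusion list used to build `dat1` in this notebook), aggregates at the finest level any chart needs (State × Specialty), and then rolls that table up to one row per State for the state-level charts.

```r
library(dplyr)

# Hypothetical rows mimicking prescriber-info.csv (values are made up)
toy <- data.frame(
  State = c("FL", "FL", "TX", "TX", "AK", "PR"),
  Specialty = c("Family Practice", "Internal Medicine", "Family Practice",
                "Family Practice", "Family Practice", "Internal Medicine"),
  OXYCONTIN = c(10, 5, 7, 3, 4, 2),
  FENTANYL = c(1, 0, 2, 2, 1, 0),
  Opioid.Prescriber = c(1, 0, 1, 1, 1, 0)
)

# Same non-continental / non-state codes that the notebook excludes
non_continental <- c("AA", "AE", "AK", "DC", "GU", "HI", "PR", "VI", "ZZ")

# One aggregation at the finest level any chart needs (State x Specialty)
by_state_specialty <- toy %>%
  filter(!State %in% non_continental) %>%
  group_by(State, Specialty) %>%
  summarise(across(c(OXYCONTIN, FENTANYL, Opioid.Prescriber), sum),
            .groups = "drop")

# Roll the same table up to one row per State for the state-level charts
by_state <- by_state_specialty %>%
  group_by(State) %>%
  summarise(across(c(OXYCONTIN, FENTANYL, Opioid.Prescriber), sum))

by_state
```

The State × Specialty table can feed the boxplot directly, while `by_state` covers the bar chart and the bubble chart, so the raw data is only scanned once.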
Charts
Use the ggplot2 package to create all of the following charts. Every chart should include a chart title, axis labels, and a black-and-white theme. Where appropriate, alter the design elements to improve readability; for example, wrap long axis labels with scale_x_discrete() to prevent overlap.
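A small sketch of label wrapping, assuming hypothetical specialty names and counts. It passes `scales::label_wrap()` (available in scales 1.1.0 and later; the `wrap_format()` helper used in Question 3 behaves the same way) to the `labels` argument of `scale_x_discrete()`, which breaks each label into lines of fewer than about 10 characters.

```r
library(ggplot2)
library(scales)

# Hypothetical category names long enough to collide on the x-axis
toy <- data.frame(
  Specialty = c("Family Practice", "Internal Medicine",
                "Physical Medicine and Rehabilitation"),
  Count = c(30, 25, 10)
)

# label_wrap(10) returns a function that inserts "\n" between words
wrap10 <- label_wrap(10)

p <- ggplot(toy, aes(x = Specialty, y = Count)) +
  geom_col() +
  scale_x_discrete(labels = wrap10) +
  ggtitle("Wrapped axis labels") +
  xlab("Specialty") + ylab("Count") +
  theme_bw()

# Inspect what the labeller does to two of the names
wrap10(c("Family Practice", "Internal Medicine"))
# "Family\nPractice" "Internal\nMedicine"
```

A smaller width (such as the `4` used in Question 3) effectively puts every word on its own line.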
Load data and packages
The following packages should be sufficient to manipulate the data and create the graphs. The original dataset should be named “dat”, so the command to import the data is included for standardization. (1 pt)
library(ggplot2)
library(dplyr)
library(scales)
dat <- read.csv("prescriber-info.csv")

# Exclude non-continental and non-state codes; keep only the columns the charts use
dat1 <- dat %>%
  filter(!State %in% c("AA", "AE", "AK", "DC", "GU", "HI", "PR", "VI", "ZZ")) %>%
  select(State, Specialty, FENTANYL, OXYCONTIN, Opioid.Prescriber)
HW3 Question 1: Sales by State (Bar Chart)
Recreate the Q3 bar chart from the first homework, HW1, but highlight FL. Create a bar chart of the sum of Oxycontin prescriptions by State, with the bars sorted in descending order and the State of FL highlighted. Only continental U.S. entities should be included, so you'll need to preprocess the dataset to exclude the other values and summarize the data at the appropriate level. (8 points)
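# Total Oxycontin prescriptions per state
m.plot <- dat1 %>%
  group_by(State) %>%
  summarise(TotalPrescriptions = sum(OXYCONTIN))

# Sort bars in descending order and fill FL in a contrasting color
m <- ggplot(m.plot, aes(x = reorder(State, -TotalPrescriptions),
                        y = TotalPrescriptions,
                        fill = State == "FL")) +
  geom_col() +
  # Hide the TRUE/FALSE legend; the highlight color speaks for itself
  scale_fill_manual(values = c("TRUE" = "firebrick", "FALSE" = "grey60"),
                    guide = "none") +
  ggtitle("Which state has the highest prescriptions of Oxycontin?") +
  xlab("State") + ylab("Oxycontin Prescriptions") +
  theme_bw() +
  # Rotate the state abbreviations so they do not overlap
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
m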
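To see why `reorder(State, -TotalPrescriptions)` sorts the bars from tallest to shortest, here is a tiny check with hypothetical state totals (the values are made up). `reorder()` orders factor levels by ascending score, so negating the totals puts the largest total first; the logical `State == "FL"` is what the fill scale maps to the highlight color.

```r
# Hypothetical state totals (made-up values)
totals <- data.frame(State = c("CA", "FL", "TX"),
                     TotalPrescriptions = c(10, 30, 20))

# Negating the value makes reorder() list levels from largest to smallest total
ordered_states <- reorder(totals$State, -totals$TotalPrescriptions)

levels(ordered_states)
# "FL" "TX" "CA"

# Logical flag that drives the highlight fill: TRUE only for FL
totals$State == "FL"
# FALSE TRUE FALSE
```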
HW3 Question 2: Oxycontin vs Fentanyl (Bubble Chart)
Recreate the Q4 scatterplot from the first homework, HW1. Compare the number of prescriptions for Oxycontin and Fentanyl, with Oxycontin on the x-axis and Fentanyl on the y-axis; the bubble size and color should both reflect the sum of Opioid Prescribers. (8 points)
dat1 %>%
  group_by(State) %>%
  summarise(Oxycontin = sum(OXYCONTIN),
            Fentanyl = sum(FENTANYL),
            OpioidPrescriber = sum(Opioid.Prescriber)) %>%
  ggplot(aes(x = Oxycontin, y = Fentanyl)) +
  geom_point(aes(size = OpioidPrescriber, colour = OpioidPrescriber)) +
  ggtitle("How do Oxycontin and Fentanyl prescriptions relate to opioid prescribers?") +
  xlab("# Oxycontin Prescriptions") + ylab("# Fentanyl Prescriptions") +
  # Same name on both scales plus guide = "legend" merges size and colour into one legend
  scale_size(name = "Opioid Prescribers") +
  scale_colour_gradient(name = "Opioid Prescribers", low = "lightblue",
                        high = "darkblue", guide = "legend") +
  theme_bw() +
  theme(legend.position = "bottom")
HW3 Question 3: Distribution of Oxycontin by Specialty (Boxplot)
Recreate the Q5 boxplot from the first homework, HW1, without the annotations. Each boxplot shows, for one specialty, the distribution of the per-state sums of Oxycontin prescriptions. Include only specialties that made more than 2,000 Oxycontin prescriptions in total. (8 points)
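dat1 %>%
  # Keep only specialties with more than 2,000 total Oxycontin prescriptions
  group_by(Specialty) %>%
  filter(sum(OXYCONTIN) > 2000) %>%
  # Sum prescriptions for each State x Specialty pair (one value per state)
  group_by(State, Specialty) %>%
  summarise(OXYCONTIN = sum(OXYCONTIN), .groups = "drop") %>%
  ggplot(aes(reorder(Specialty, -OXYCONTIN), OXYCONTIN)) +
  geom_boxplot() +
  scale_y_continuous(breaks = seq(0, 2000, by = 200)) +
  scale_x_discrete(labels = wrap_format(4)) +
  xlab("Specialty") + ylab("Oxycontin Prescriptions per State") +
  ggtitle("Family Practice and Internal Medicine have the most Oxycontin prescriptions") +
  theme_bw()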
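The boxplot pipeline relies on a grouped `filter()`: after `group_by(Specialty)`, `sum(OXYCONTIN)` is computed per specialty, and every row of a qualifying specialty is kept, not just one summary row. A minimal sketch with hypothetical rows (made-up values) shows the two steps: the threshold filter, then the per-state sums that become the points in each box.

```r
library(dplyr)

# Hypothetical prescriber rows (made-up values)
toy <- data.frame(
  State = c("FL", "TX", "FL", "TX", "FL"),
  Specialty = c("Family Practice", "Family Practice", "Dentist",
                "Dentist", "Family Practice"),
  OXYCONTIN = c(1500, 900, 40, 10, 200)
)

# Grouped filter: keeps all rows of specialties whose total exceeds 2000
# (Family Practice totals 2600 and is kept; Dentist totals 50 and is dropped)
kept <- toy %>%
  group_by(Specialty) %>%
  filter(sum(OXYCONTIN) > 2000) %>%
  ungroup()

# Sum per Specialty x State: one value per state feeds each boxplot
per_state <- kept %>%
  group_by(Specialty, State) %>%
  summarise(OXYCONTIN = sum(OXYCONTIN), .groups = "drop")

per_state
```

Summarising by Specialty first and filtering afterwards would lose the per-state detail, which is why the filter runs on the grouped raw rows.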