A5

#Setup

We begin this assignment by loading the midwest data set that ships with the ggplot2 package. We attach the data so that its columns can be referenced directly by name.

library(ggplot2)
View(midwest)
attach(midwest)
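
As a small aside, the sketch below shows a way to check the data without the interactive viewer. It also shows that ggplot() looks up column names in the data frame it is given, so the plots do not depend on attach(). The object name p is our own choice for illustration.

```r
library(ggplot2)

# Inspect the data without opening the interactive viewer
dim(midwest)   # rows (counties) and columns (fields)
head(midwest[, c("state", "percbelowpoverty", "percollege")])

# aes() resolves column names inside the data frame passed to ggplot(),
# so this plot object can be built without calling attach(midwest)
p <- ggplot(midwest, aes(x = percbelowpoverty))
```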

#Graph 1

Create a histogram of the percent of the population below poverty (“percbelowpoverty”) using a binwidth of 5.

To create this graph we call ggplot with midwest as the data source and set the aesthetic to x = percbelowpoverty. The second line is geom_histogram with binwidth = 5, which draws the histogram. Note: each line except the last must end with +, because the lines together form a single plot.

ggplot(midwest, aes(x= percbelowpoverty))+
geom_histogram(binwidth = 5)

#Graph 2

Create a kernel density plot of the percent of the population below poverty (“percbelowpoverty”) and include a fill color of your choosing.

This plot uses the same first line of code as Graph 1, but in the second line we use geom_density to create a kernel density plot of the data. To give it a color, as I did, set the argument fill to the name of a color, for example fill = “blue”.

ggplot(midwest, aes(x= percbelowpoverty))+
geom_density(fill = "blue")

#Graph 3

Create a combined histogram and kernel density plot of the percent of the population below poverty (“percbelowpoverty”). Make sure to define the y-axis as density. For the histogram, use a binwidth of 5, a color of your choosing, and a fill of your choosing. For the kernel density component, set a color, line weight, fill color, and transparency.

In this graph we reuse the two lines of code from Graph 1, adding an outline color and a fill color to the histogram and ..density.. to the aesthetics so that the y-axis shows density instead of counts. Again, a + goes after the first and second lines. The third line is the second line of code from Graph 2 with additional arguments: color sets the outline color of the curve, size sets the thickness of that outline, and alpha sets the level of transparency.

ggplot(midwest, aes(x= percbelowpoverty, ..density..))+
geom_histogram(binwidth = 5, color = "black", fill = "red")+
geom_density(color = "green", size = 1.0, fill = "blue", alpha=.5)

#Graph 4

Create a kernel density plot of the percent of the population below poverty (“percbelowpoverty”) with separate curves for each state. Assign density to the y-axis and use a fill color to differentiate the states. Also provide a transparency so that curves can be more easily visualized where they overlap.

This graph is set up the same as Graph 2 but with changes to the first line. Adding ..density.. to the aesthetics designates density as the y-axis of the graph. To create a separate curve for each state in the data frame, we also set fill = state in the aesthetics. To finish, we add a transparency (alpha) to the second line so overlapping curves stay visible.

ggplot(midwest, aes(x= percbelowpoverty, ..density.., fill=state))+
geom_density(alpha=.5)
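
The ..density.. notation can also be written with after_stat(), which makes the y mapping explicit. The sketch below assumes ggplot2 3.3.0 or later, where after_stat() is available. The object name p4 is our own choice.

```r
library(ggplot2)

# Same plot as above with the y mapping written out explicitly.
# after_stat(density) refers to the density computed by the stat layer.
p4 <- ggplot(midwest, aes(x = percbelowpoverty, y = after_stat(density), fill = state)) +
  geom_density(alpha = 0.5)
p4
```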

#Graph 5

Create a kernel density plot with the same settings as Graph 4; however, only include curves for the states of Indiana and Michigan.

This graph is essentially the same as Graph 4 but with the addition of a filter. First we load dplyr with library(dplyr). Then we create a new data frame to draw from, which we shall call “A”. In the filter we designate state == “IN” | state == “MI”, where | means “or”, so only the rows for Indiana and Michigan are kept. The last two lines of code are the same as in Graph 4, but we now pull from data frame “A” instead of midwest.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

A <- midwest %>% dplyr::filter(state == "IN" | state == "MI")
ggplot(A, aes(x= percbelowpoverty, ..density.., fill= state))+
geom_density(alpha=.5)
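
When more than a couple of states are needed, chaining conditions with | gets long. A minimal sketch of an alternative using %in% is given below. The names keep_states and A2 are our own and only for illustration.

```r
library(ggplot2)
library(dplyr)

# States to keep; %in% tests each row's state against this vector
keep_states <- c("IN", "MI")
A2 <- midwest %>% dplyr::filter(state %in% keep_states)

# Quick check of which states remain and how many counties each has
table(A2$state)
```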

#Graph 6

Create a kernel density plot of percent of the population with a college degree (“percollege”) with density mapped to the y-axis and states differentiated using a fill color. Also provide a transparency so that curves can be more easily visualized where they overlap.

This graph is exactly the same as Graph 4 with the exception of x being changed from percbelowpoverty to percollege.

ggplot(midwest, aes(x= percollege, ..density.., fill= state))+
geom_density(alpha=.5)

#Graph 7

Create a kernel density plot with the same settings as Graph 6; however, only include curves for the states of Indiana and Michigan.

This graph is the same as Graph 6, but uses the same filter and data frame as Graph 5, which again gives us just the data for the two states.

A <- midwest %>% dplyr::filter(state == "IN" | state == "MI")
ggplot(A, aes(x= percollege, ..density.., fill= state))+
geom_density(alpha=.5)

#Graph 8

Create a box plot of percent of the population with a college degree (“percollege”) with different plots for each state. Also, differentiate the states using a fill color.

Now we are going to make a box plot of the percent of the population with a college degree for each state. We use the same ggplot call as before for the first line, but in the aesthetics we set x = state, y = percollege, and fill = state. The second line of code is simply geom_boxplot.

ggplot(midwest, aes(x = state, y = percollege , fill = state))+
geom_boxplot()

#Graph 9

Create a combined violin and box plot of percent of the population with a college degree (“percollege”) with different plots for each state. Differentiate the states using a fill applied to the violin plot. Alter the box plots so that they are all the same color (fill not used to differentiate states) and fit within the associated violin plot.

Now we are going to add a violin plot to our graph. The first line of code is the same as in Graph 8, but the second line is geom_violin. To keep the graph neat, in the third line, geom_boxplot, we set width = 0.1 so each box fits inside its violin, and fill = “gray” so all boxes share one color.

ggplot(midwest, aes(x = state, y = percollege , fill = state))+
geom_violin()+
geom_boxplot(width = 0.1, fill = "gray")

#Graph 10

Create a bar graph of mean percent of the population with professional employment (“percprof”) by state. You will need to summarize the county-level data by state to obtain the state means. Within geom_bar() you will need to set stat equal to “identity.”
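
One possible approach to this graph is sketched below: group the counties by state with dplyr, take the mean of percprof, and plot the result. The names state_means and mean_percprof are our own choices.

```r
library(ggplot2)
library(dplyr)

# Collapse the county-level rows to one mean per state
state_means <- midwest %>%
  group_by(state) %>%
  summarise(mean_percprof = mean(percprof))

# stat = "identity" tells geom_bar() to plot the values as given
# instead of counting rows
ggplot(state_means, aes(x = state, y = mean_percprof)) +
  geom_bar(stat = "identity")
```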

#Graph 11

Create a scatter plot of percent of the population with a college degree (“percollege”) mapped to the x-axis and percent of the population below the poverty line (“percbelowpoverty”) mapped to the y-axis.

Now we are going to create a Scatter Plot. The first line of code is similar to before but x is changed to percollege and y changed to percbelowpoverty. For the second line we now use geom_point to create the scatter plot.

ggplot(midwest, aes(x = percollege, y = percbelowpoverty))+
geom_point()

#Graph 12

Add a loess curve to the graph created in 11.

Now we are going to add a loess curve to the graph. The first two lines of the code remain the same as in Graph 11. We then add a third line, geom_smooth, with the argument method = loess.

ggplot(midwest, aes(x = percollege, y = percbelowpoverty))+
geom_point()+
geom_smooth(method = loess)

## `geom_smooth()` using formula 'y ~ x'

#Graph 13

Create a scatter plot with percent of the population with a college education (“percollege”) mapped to the x-axis, percent of the population below the poverty line (“percbelowpoverty”) mapped to the point color and population density (“popdensity”) mapped to the point size.

For this scatter plot we map the point color to the percent below poverty and the point size to the population density. We achieve this by setting the aesthetics as follows: x = percollege, y = percbelowpoverty, color = percbelowpoverty, and size = popdensity. The second line of code is the same as in Graph 11.

ggplot(midwest, aes(x = percollege,y = percbelowpoverty, color = percbelowpoverty, size = popdensity))+
geom_point()

#Now we switch to the economics data included in the ggplot2 package.

View(economics)
attach(economics)

#Graph 14

Create a new variable of unemployment rate using the number of unemployed people (“unemploy”) and the total population (“pop”). Create a time series line graph with the date (“date”) mapped to the x-axis and your new unemployment rate variable mapped to the y-axis.

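To create this graph we first create a new variable called Unemployment_Rate by dividing unemploy by pop and multiplying by 100 to express it as a percentage. Because economics is attached, this produces a standalone vector with one value per row of the data. Then we use the aesthetics x = date and y = Unemployment_Rate, with the final line being geom_line.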

Unemployment_Rate <- (unemploy/pop)*100
ggplot(economics, aes(x = date, y = Unemployment_Rate))+
geom_line()
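
The new variable can also be stored as a column of the data frame, which keeps it tied to its dates and does not rely on attach(). The sketch below uses dplyr's mutate. The names econ and unemployment_rate are our own.

```r
library(ggplot2)
library(dplyr)

# Add the rate as a new column next to date, unemploy, and pop
econ <- economics %>%
  mutate(unemployment_rate = unemploy / pop * 100)

ggplot(econ, aes(x = date, y = unemployment_rate)) +
  geom_line()
```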

#For the final graphs we now need to switch to the mpg data in the ggplot2 package.

View(mpg)
attach(mpg)

#Graph 15

Create a box plot with the class of the vehicles (“class”) mapped to the x-axis and fill color and the highway fuel economy (“hwy”) mapped to the y-axis.

Now we are going to create a box plot for the “mpg” data. The aesthetics that we designate in the first line are x = class, y = hwy, and fill = class. The second line is geom_boxplot, which draws the box plot requested above.

ggplot(mpg, aes(x = class, y = hwy, fill = class))+
geom_boxplot()

#Graph 16

Reproduce the box plot created in Graph 15, but only include data from 2008.

For this graph we need to create a data frame and filter using dplyr as before. We will call this data frame “C” and set the filter to year == 2008 (year is stored as a number, so no quotes are needed). Then we run the same plotting code as in Graph 15, but on “C”, so only results from 2008 are shown.

C <- mpg %>% dplyr::filter(year == 2008)
ggplot(C, aes(x = class, y = hwy, fill = class))+
geom_boxplot()

#Graph 17

Generate a kernel density plot of city fuel economy (“cty”). Make sure to map density to the y-axis. To differentiate the model year, map the year (“year”) to the fill color. Note that you will need to define it as a factor. Also, provide a transparency so that data can be visualized in overlapping areas.

ggplot(mpg, aes(x = cty, ..density.., fill = factor(year)))+
geom_density(alpha = 0.5)
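
The reason year must be defined as a factor is that it is stored as a number, and a fill color per model year needs a discrete grouping. A minimal sketch of the check and the resulting plot is below. It assumes ggplot2 3.3.0 or later for after_stat(), and the object name p17 is our own.

```r
library(ggplot2)

# year is stored as a number; factor() turns it into discrete levels
class(mpg$year)
levels(factor(mpg$year))

# One density curve per model year, with density on the y-axis
p17 <- ggplot(mpg, aes(x = cty, y = after_stat(density), fill = factor(year))) +
  geom_density(alpha = 0.5)
p17
```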

#Graph 18

Use just models from 2008, create a scatter plot with city fuel economy (“cty”) mapped to the x-axis and highway fuel economy (“hwy”) mapped to the y-axis. Also, add a loess curve.

For this graph we are going to combine the methods used in Graphs 11 and 12 with the filter from Graph 16. We use the filter and its resulting data frame, set the aesthetics x = cty and y = hwy, add geom_point, and finally add geom_smooth with the method set to loess to draw the loess curve on the scatter plot.

C <- mpg %>% dplyr::filter(year == 2008)
ggplot(C, aes(x = cty, y = hwy))+
geom_point()+
geom_smooth(method = loess)

## `geom_smooth()` using formula 'y ~ x'

#Graph 19

Create a box plot for just models from 2008 to compare highway fuel economy (“hwy”) between different drive systems (4-wheel, front-wheel, rear-wheel) using the “drv” field. Assign “drv” to the x-axis and fill color. Map the highway fuel economy (“hwy”) to the y-axis.

Again we are creating a filtered box plot as in Graph 16, changing only the aesthetics to x = drv, y = hwy, and fill = drv. This gives us a box plot of highway fuel economy by drive system for 2008 models.

C <- mpg %>% dplyr::filter(year == 2008)
ggplot(C, aes(x = drv, y = hwy, fill = drv))+
geom_boxplot()

#Graph 20

Add a violin plot to the graph generated in Graph 19 to produce a combined violin and box plot. The fill color for the violin plot should differentiate the drive type (“drv”). The box plots should all have the same fill color and should fit within the associated violin plot.

For this graph we start from Graph 19 and add a violin plot exactly as we did in Graph 9, using geom_violin followed by geom_boxplot with width = 0.1 and fill = “gray”.

C <- mpg %>% dplyr::filter(year == 2008)
ggplot(C, aes(x = drv, y = hwy, fill = drv))+
geom_violin()+
geom_boxplot(width = 0.1, fill = "gray")


Jacob Hartwell

3/2/2021