ggplot2 basics
During ANLY 512 we will be studying the theory and practice of data visualization. We will be using R and the packages within R to assemble data and construct many different types of visualizations. We begin by studying some of the theoretical aspects of visualization. To do that we must appreciate the basic steps in the process of making a visualization.
The objective of this assignment is to complete and explain basic plots before moving on to more complicated ways to graph data.
A couple of tips: remember that there may be pre-processing involved in your graphics, so you may have to do summaries or calculations to prepare. Those should be included in your work.
To ensure accuracy, pay close attention to axes and labels. You will be evaluated based on the accuracy and expository nature of your graphics. Make sure your axis labels are easy to understand and are composed of full words, with units if necessary.
Each question is worth 5 points.
To submit this homework you will create the document in RStudio, using the knitr package (button included in RStudio), and then submit the document to your RPubs account. Once uploaded, you will submit the link to that document on Canvas. Please make sure that this link is hyperlinked and that I can see the visualization and the code required to create it.
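Using data from the nasaweather package, create a scatter plot between wind and pressure, with color being used to distinguish the type of storm.
library(ggplot2)
# data(package = "nasaweather")
# Take storms from nasaweather explicitly; dplyr also ships a data set named storms
sdata = nasaweather::storms
head(sdata)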
## # A tibble: 6 × 11
## name year month day hour lat long pressure wind type seasday
## <chr> <int> <int> <int> <int> <dbl> <dbl> <int> <int> <chr> <int>
## 1 Allison 1995 6 3 0 17.4 -84.3 1005 30 Tropical D… 3
## 2 Allison 1995 6 3 6 18.3 -84.9 1004 30 Tropical D… 3
## 3 Allison 1995 6 3 12 19.3 -85.7 1003 35 Tropical S… 3
## 4 Allison 1995 6 3 18 20.6 -85.8 1001 40 Tropical S… 3
## 5 Allison 1995 6 4 0 22 -86 997 50 Tropical S… 4
## 6 Allison 1995 6 4 6 23.3 -86.3 995 60 Tropical S… 4
summary(sdata)
## name year month day
## Length:2747 Min. :1995 Min. : 6.000 Min. : 1.00
## Class :character 1st Qu.:1995 1st Qu.: 8.000 1st Qu.: 9.00
## Mode :character Median :1997 Median : 9.000 Median :18.00
## Mean :1997 Mean : 8.803 Mean :16.98
## 3rd Qu.:1999 3rd Qu.:10.000 3rd Qu.:25.00
## Max. :2000 Max. :12.000 Max. :31.00
## hour lat long pressure
## Min. : 0.000 Min. : 8.30 Min. :-107.30 Min. : 905.0
## 1st Qu.: 3.500 1st Qu.:17.25 1st Qu.: -77.60 1st Qu.: 980.0
## Median :12.000 Median :25.00 Median : -60.90 Median : 995.0
## Mean : 9.057 Mean :26.67 Mean : -60.87 Mean : 989.8
## 3rd Qu.:18.000 3rd Qu.:33.90 3rd Qu.: -45.80 3rd Qu.:1004.0
## Max. :18.000 Max. :70.70 Max. : 1.00 Max. :1019.0
## wind type seasday
## Min. : 15.00 Length:2747 Min. : 3.0
## 1st Qu.: 35.00 Class :character 1st Qu.: 84.0
## Median : 50.00 Mode :character Median :103.0
## Mean : 54.68 Mean :102.6
## 3rd Qu.: 70.00 3rd Qu.:125.0
## Max. :155.00 Max. :185.0
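# wind is recorded in knots and pressure in millibars
ggplot(sdata, aes(x = wind, y = pressure, color = type)) +
  geom_point(alpha = 0.5) +
  labs(title = "Wind Speed vs. Pressure by Storm Type", x = "Maximum Sustained Wind Speed (knots)", y = "Central Pressure (millibars)", color = "Storm Type") +
  theme_bw()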
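As a rough numeric companion to the scatter plot, the sketch below computes the correlation between wind and pressure for only the six Allison rows printed by head(sdata) above. The vector names are illustrative, and six rows from a single storm are not representative of the full data set.

```r
# Values copied from the six Allison rows shown in the head(sdata) output
allison_wind <- c(30, 30, 35, 40, 50, 60)
allison_pressure <- c(1005, 1004, 1003, 1001, 997, 995)

# Pearson correlation; a negative value means pressure falls as wind rises
r <- cor(allison_wind, allison_pressure)
print(r)
```

On the full data the same check would be cor(sdata$wind, sdata$pressure).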
Use the MLB_teams data in the mdsr package to create an informative data graphic that illustrates the relationship between winning percentage and payroll in context.
mdata = mdsr::MLB_teams
summary(mdata)
## yearID teamID lgID W L
## Min. :2008 Length:210 AA: 0 Min. : 51.00 Min. : 59.00
## 1st Qu.:2009 Class :character AL:100 1st Qu.: 73.00 1st Qu.: 72.00
## Median :2011 Mode :character FL: 0 Median : 81.00 Median : 81.00
## Mean :2011 NA: 0 Mean : 80.99 Mean : 80.99
## 3rd Qu.:2013 NL:110 3rd Qu.: 90.00 3rd Qu.: 89.00
## Max. :2014 PL: 0 Max. :103.00 Max. :111.00
## UA: 0
## WPct attendance normAttend payroll
## Min. :0.3148 Min. :1335076 Min. :0.3106 Min. : 17890700
## 1st Qu.:0.4506 1st Qu.:1940441 1st Qu.:0.4514 1st Qu.: 67325266
## Median :0.5000 Median :2418204 Median :0.5625 Median : 85803966
## Mean :0.5000 Mean :2481715 Mean :0.5773 Mean : 94365324
## 3rd Qu.:0.5556 3rd Qu.:3041615 3rd Qu.:0.7076 3rd Qu.:114741109
## Max. :0.6358 Max. :4298655 Max. :1.0000 Max. :231978886
##
## metroPop name
## Min. : 1572245 Length:210
## 1st Qu.: 2785874 Class :character
## Median : 4541584 Mode :character
## Mean : 6014841
## 3rd Qu.: 6490180
## Max. :20092883
##
# Rescale payroll to millions of dollars and WPct to a percentage for readable axes
ggplot(mdata, aes(x = payroll / 1000000, y = WPct * 100)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Relationship Between MLB Teams' Winning Percentage and Payroll", x = "Payroll ($ in Millions)", y = "Winning Percentage (%)") +
  theme_bw()
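The axis transformations in the plot above are simple rescalings. The sketch below reproduces them on the extreme values reported by summary(mdata). Pairing the minimum W with the maximum L (and the maximum W with the minimum L) is an assumption made for illustration, and the variable names are hypothetical.

```r
# Wins and losses taken from the Min., Median, and Max. rows of summary(mdata)
# (assumed pairing: fewest wins with most losses, and vice versa)
wins <- c(51, 81, 103)
losses <- c(111, 81, 59)

# Winning percentage as a proportion of games played
wpct <- wins / (wins + losses)
print(round(wpct, 4))   # 0.3148 0.5000 0.6358, matching the WPct summary

# Minimum and maximum payroll from summary(mdata), rescaled to millions of dollars
payroll <- c(17890700, 231978886)
payroll_millions <- payroll / 1e6
print(round(payroll_millions, 1))   # 17.9 232.0
```

The computed proportions agree with the Min., Median, and Max. of WPct in the summary, which suggests WPct is wins divided by games played.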
The RailTrail data set from the mosaicData package describes the usage of a rail trail in Western Massachusetts. Use these data to answer the following questions.
Create a scatterplot of volume against the high temperature that day.
Separate the plot into facets by weekday (an indicator of weekend/holiday vs. weekday).
rdata = mosaicData::RailTrail
summary(rdata)
## hightemp lowtemp avgtemp spring
## Min. :41.00 Min. :19.00 Min. :33.00 Min. :0.0000
## 1st Qu.:59.25 1st Qu.:38.00 1st Qu.:48.62 1st Qu.:0.0000
## Median :69.50 Median :44.50 Median :55.25 Median :1.0000
## Mean :68.83 Mean :46.03 Mean :57.43 Mean :0.5889
## 3rd Qu.:77.75 3rd Qu.:53.75 3rd Qu.:64.50 3rd Qu.:1.0000
## Max. :97.00 Max. :72.00 Max. :84.00 Max. :1.0000
## summer fall cloudcover precip
## Min. :0.0000 Min. :0.0000 Min. : 0.000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.: 3.650 1st Qu.:0.00000
## Median :0.0000 Median :0.0000 Median : 6.400 Median :0.00000
## Mean :0.2778 Mean :0.1333 Mean : 5.807 Mean :0.09256
## 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.: 8.475 3rd Qu.:0.02000
## Max. :1.0000 Max. :1.0000 Max. :10.000 Max. :1.49000
## volume weekday dayType
## Min. :129.0 Mode :logical Length:90
## 1st Qu.:291.5 FALSE:28 Class :character
## Median :373.0 TRUE :62 Mode :character
## Mean :375.4
## 3rd Qu.:451.2
## Max. :736.0
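# Plot volume (y) against high temperature (x); hightemp is in degrees Fahrenheit
rtplot = ggplot(rdata, aes(x = hightemp, y = volume)) +
  geom_point() +
  labs(title = "Number of Crossings vs. High Temperature Per Day", x = "High Temperature (°F)", y = "Number of Crossings Per Day") +
  theme_bw()
rtplot
# dayType carries the same weekday vs. weekend split as the logical weekday column
rtplot +
  geom_smooth(method = "lm") +
  facet_wrap(~dayType)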
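The question asks for facets by weekday, which is a logical column, so its panels would be titled TRUE and FALSE. One possible way to get readable panel titles is to derive a labelled factor from it. The sketch below uses a stand-in vector whose counts mirror summary(rdata) (62 TRUE, 28 FALSE); the ordering of the values and the names weekday and day_label are illustrative assumptions.

```r
# Stand-in for rdata$weekday: counts match summary(rdata), ordering is arbitrary
weekday <- rep(c(TRUE, FALSE), times = c(62, 28))

# Map the logical indicator to descriptive labels with a fixed panel order
day_label <- factor(
  ifelse(weekday, "Weekday", "Weekend/Holiday"),
  levels = c("Weekday", "Weekend/Holiday")
)
print(table(day_label))
```

With the real data, the same expression applied to rdata$weekday gives a column that could be added to rdata before the plot is built and then passed to facet_wrap().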
Using data from the nasaweather package, use the geom_path function to plot the path of each tropical storm in the storms data table. Use color to distinguish the storms from one another, and use faceting to plot each year in its own panel.
ggplot(sdata, aes(x = long, y = lat, color = name)) +
  geom_path() +
  facet_wrap(~year) +
  labs(title = "Tropical Storms' Paths by Year", x = "Longitude", y = "Latitude", color = "Storm Name") +
  theme_bw()
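geom_path() connects observations in the order in which they appear in the data, so the rows for each storm need to be in time order for the drawn track to be meaningful. The sketch below illustrates the idea with the six Allison rows from the head(sdata) output, deliberately shuffled and then re-sorted. The data frame names are hypothetical, and whether the full storms table needs this step is not something the output above confirms.

```r
# The six Allison observations shown in head(sdata)
allison <- data.frame(
  name  = "Allison",
  year  = 1995,
  month = 6,
  day   = c(3, 3, 3, 3, 4, 4),
  hour  = c(0, 6, 12, 18, 0, 6),
  lat   = c(17.4, 18.3, 19.3, 20.6, 22, 23.3),
  long  = c(-84.3, -84.9, -85.7, -85.8, -86, -86.3)
)

# Shuffle the rows to mimic data that is not in time order
shuffled <- allison[c(4, 1, 6, 2, 5, 3), ]

# Restore chronological order before handing the data to geom_path()
ordered <- shuffled[order(shuffled$year, shuffled$month, shuffled$day, shuffled$hour), ]
print(ordered[, c("day", "hour", "lat", "long")])
```

Sorting sdata the same way before plotting would guard against zig-zag paths if the rows were ever out of order.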
Use the penguins data set from the palmerpenguins package.
pdata = palmerpenguins::penguins
summary(pdata)
## species island bill_length_mm bill_depth_mm
## Adelie :152 Biscoe :168 Min. :32.10 Min. :13.10
## Chinstrap: 68 Dream :124 1st Qu.:39.23 1st Qu.:15.60
## Gentoo :124 Torgersen: 52 Median :44.45 Median :17.30
## Mean :43.92 Mean :17.15
## 3rd Qu.:48.50 3rd Qu.:18.70
## Max. :59.60 Max. :21.50
## NA's :2 NA's :2
## flipper_length_mm body_mass_g sex year
## Min. :172.0 Min. :2700 female:165 Min. :2007
## 1st Qu.:190.0 1st Qu.:3550 male :168 1st Qu.:2007
## Median :197.0 Median :4050 NA's : 11 Median :2008
## Mean :200.9 Mean :4202 Mean :2008
## 3rd Qu.:213.0 3rd Qu.:4750 3rd Qu.:2009
## Max. :231.0 Max. :6300 Max. :2009
## NA's :2 NA's :2
# One regression line per species, because color is mapped in the base aesthetics
pengplot = ggplot(pdata, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Bill Length vs Bill Depth by Penguin Species", x = "Bill Length (mm)", y = "Bill Depth (mm)", color = "Species") +
  theme_bw()
pengplot
We can see that a penguin's species can be distinguished using the ratio of bill length to bill depth. The positive slopes of the regression lines suggest a positive correlation between bill length and bill depth within each species.
pengplot+
facet_wrap(~species)
When separated, the differences become clearer. Adelie penguins have shorter bills than the other two species. Gentoo penguins have shallower bills than Chinstrap and Adelie penguins. Chinstrap penguins are on the larger side in both bill length and bill depth.
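The ratio of bill length to bill depth mentioned above can be computed directly as a rough check on how well it separates the species. The sketch below is one possible way to do it, assuming the palmerpenguins package is installed; the names measured, bill_ratio, and ratio_by_species are illustrative.

```r
library(palmerpenguins)

pdata <- penguins

# Drop the rows with missing bill measurements (summary(pdata) reports 2 NA's per column)
measured <- pdata[!is.na(pdata$bill_length_mm) & !is.na(pdata$bill_depth_mm), ]

# Ratio of bill length to bill depth for each penguin
measured$bill_ratio <- measured$bill_length_mm / measured$bill_depth_mm

# Median ratio per species
ratio_by_species <- aggregate(bill_ratio ~ species, data = measured, FUN = median)
print(ratio_by_species)
```

If the medians are well separated, that supports using the ratio as a simple way to tell the species apart, although overlap between individual penguins would still need to be checked.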