Using data from the nasaweather package, create a
scatterplot between wind and pressure, with color being used to
distinguish the type of storm.
From the chart below, we can see that wind and pressure have an inverse relationship: higher pressure tends to be associated with lower wind speed, and lower pressure tends to be associated with higher wind speed. Among the four storm types, Hurricane has the highest wind speed and lowest pressure on average, Tropical Depression has the highest pressure and lowest wind speed on average, and Extratropical and Tropical Storm fall in between on both measures.
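library(ggplot2)
library(nasaweather)
# Use nasaweather's storms explicitly: dplyr also ships a data set named storms
storms <- nasaweather::storms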
ggplot(storms, aes(x = wind, y = pressure, color = type)) +
theme_bw() +
geom_point() +
labs(title = "Scatterplot of Wind vs Pressure", x = "Wind", y = "Pressure")

ggsave("scatter1.png")
knitr::include_graphics("scatter1.png")
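The "on average" comparison above can be checked numerically by summarizing wind and pressure by storm type and computing their correlation. A minimal sketch, assuming the nasaweather and dplyr packages are installed; `storm_data`, `storm_summary`, and `wind_pressure_cor` are hypothetical names chosen for illustration.

```r
library(dplyr)

# nasaweather's storms table (dplyr ships a different data set also named storms)
storm_data <- nasaweather::storms

# Mean wind speed and pressure for each storm type, strongest mean wind first
storm_summary <- storm_data %>%
  group_by(type) %>%
  summarise(
    mean_wind = mean(wind, na.rm = TRUE),
    mean_pressure = mean(pressure, na.rm = TRUE),
    n = n()
  ) %>%
  arrange(desc(mean_wind))
print(storm_summary)

# Pearson correlation between wind and pressure; a negative value is
# consistent with the inverse relationship seen in the scatterplot
wind_pressure_cor <- cor(storm_data$wind, storm_data$pressure, use = "complete.obs")
print(wind_pressure_cor)
```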
Use the MLB_teams data in the mdsr package
to create an informative data graphic that illustrates the relationship
between winning percentage and payroll in context.
The scatterplot below plots payroll against winning percentage and overlays a simple linear regression line with a shaded confidence band. Since the slope of the fitted line is positive, there seems to be a positive relationship between winning percentage and payroll: teams with higher payrolls tend to have higher winning percentages. This is an association across teams, not evidence that spending more causes a team to win more (or the reverse).
library(mdsr)
MLB <- MLB_teams
ggplot(MLB, aes(x = WPct, y = payroll)) +
theme_bw() +
geom_point() +
geom_smooth(method = 'lm', se = TRUE) +
labs(title = "Scatterplot of Winning Percentage vs Payroll", x = "Winning Percentage", y = "Payroll")

ggsave("scatter2.png")
knitr::include_graphics("scatter2.png")
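To put a number on the positive slope, one option is to fit the regression with payroll as the explanatory variable and rescale payroll to millions so the coefficient is readable. A minimal sketch, assuming `payroll` in `MLB_teams` is recorded in dollars; `mlb`, `payroll_millions`, `payroll_fit`, and `payroll_slope` are hypothetical names.

```r
library(mdsr)
library(dplyr)

# Assumption: payroll is in dollars; convert to millions of dollars
mlb <- MLB_teams %>%
  mutate(payroll_millions = payroll / 1e6)

# Simple linear regression: winning percentage explained by payroll
payroll_fit <- lm(WPct ~ payroll_millions, data = mlb)
summary(payroll_fit)

# Estimated change in winning percentage per extra $1 million of payroll
payroll_slope <- coef(payroll_fit)[["payroll_millions"]]
print(payroll_slope)
```

The slope describes an association across team-seasons, so it should be read alongside the caveat above about causation.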
The RailTrail data set from the mosaicData
package describes the usage of a rail trail in Western Massachusetts.
Use these data to answer the following questions.
Plot volume against the high temperature that day, separating the data by weekday (an indicator of weekend/holiday vs. weekday).
Based on the scatterplot of volume vs. high temperature, there seems to be a positive relationship between volume and the day's high temperature on both weekdays and weekends. The two fitted lines also have different slopes, suggesting that the strength of this relationship differs between weekdays and weekends.
library(mosaicData)
railtrail <- RailTrail
# "Volume against high temperature": high temperature on x, volume on y
ggplot(railtrail, aes(x = hightemp, y = volume, color = weekday)) +
theme_bw() +
geom_point() +
geom_smooth(method = 'lm', se = FALSE) +
labs(title = "Scatterplot of Volume vs High Temperature", x = "High Temperature", y = "Volume")

ggsave("scatter3.png")
knitr::include_graphics("scatter3.png")
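Rather than judging the slopes by eye, one can fit volume on high temperature separately for each value of `weekday` and compare the coefficients, or fit a single model with an interaction term. A minimal sketch, assuming the mosaicData package is installed; `rail`, `temp_slopes`, and `interaction_fit` are hypothetical names.

```r
library(mosaicData)

rail <- mosaicData::RailTrail

# Fit volume ~ hightemp within each weekday group and keep only the slope
# (change in volume per one-degree increase in the high temperature)
temp_slopes <- sapply(
  split(rail, rail$weekday),
  function(d) coef(lm(volume ~ hightemp, data = d))[["hightemp"]]
)
print(temp_slopes)

# Equivalent single model: the hightemp:weekday coefficient estimates
# the difference between the two group slopes
interaction_fit <- lm(volume ~ hightemp * weekday, data = rail)
summary(interaction_fit)
```

The interaction model also gives a standard error for the slope difference, which helps judge whether the gap between the two lines is larger than noise.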
Using data from the nasaweather package, use the
geom_path function to plot the path of each tropical storm
in the storms data table. Use color to distinguish the
storms from one another, and use faceting to plot each year in its own
panel.
In the graph below, we plot the path of each storm by its longitude and
latitude over time, with a separate panel for each year. It looks like
most storms tend to move from southwest to northeast.
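ggplot(storms, aes(x = long, y = lat)) +
theme_bw() +
geom_path(aes(color = name), show.legend = FALSE) + # one colored path per storm; legend omitted (too many names)
facet_wrap(~year) +
labs(title = "Storm Paths by Year", x = "Longitude", y = "Latitude")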

ggsave("path1.png")
knitr::include_graphics("path1.png")
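Because `geom_path` draws lines without arrowheads, the direction of travel is not visible in the plot itself. One way to check it is to compare each storm's first and last recorded positions after sorting its observations chronologically. A minimal sketch, assuming each storm name is unique within a year; `storm_moves`, `share_north`, and `share_east` are hypothetical names.

```r
library(dplyr)

# Sort observations in time, then compare the first and last position
# of each storm (identified by year + name)
storm_moves <- nasaweather::storms %>%
  arrange(year, name, month, day, hour) %>%
  group_by(year, name) %>%
  summarise(
    d_lat = last(lat) - first(lat),
    d_long = last(long) - first(long),
    .groups = "drop"
  )

# Share of storms that ended further north / further east than they began
share_north <- mean(storm_moves$d_lat > 0)
share_east <- mean(storm_moves$d_long > 0)
print(c(north = share_north, east = share_east))
```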
Using the penguins data set from the
palmerpenguins package.
Create a scatterplot of bill_length_mm against bill_depth_mm
where individual species are colored and a regression line is added to
each species. Add regression lines to all of your facets. What do you
observe about the association of bill depth and bill length?
Based on the first scatterplot, bill length and bill depth are
positively associated within all three species: penguins with longer
bills tend to have deeper bills. It also looks like Chinstrap has the
steepest slope and Adelie the flattest.
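The slope comparison can be checked by fitting bill depth on bill length within each species, matching the axes of Scatterplot 1. A minimal sketch, assuming the palmerpenguins package is installed and dropping rows with missing bill measurements; `peng` and `species_slopes` are hypothetical names.

```r
library(palmerpenguins)

# Keep only the columns needed and drop rows with missing values
peng <- na.omit(palmerpenguins::penguins[, c("species", "bill_length_mm", "bill_depth_mm")])

# Slope of bill depth on bill length within each species
# (mm of extra depth per extra mm of length)
species_slopes <- sapply(
  split(peng, peng$species),
  function(d) coef(lm(bill_depth_mm ~ bill_length_mm, data = d))[["bill_length_mm"]]
)
print(sort(species_slopes))
```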
Repeat the same scatterplot but now separate your plot into
facets by species. How would you summarize the association between bill
depth and bill length?
Scatterplot 2 also shows that bill length and bill depth are positively
associated within each species. In addition, while Adelie and Chinstrap
have a similar range of bill depth, Adelie penguins have shorter bills
than Chinstrap penguins on average. Likewise, Chinstrap and Gentoo have a
similar range of bill length, but Chinstrap penguins have deeper bills
than Gentoo penguins on average.
library(palmerpenguins)
penguins <- palmerpenguins::penguins
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
theme_bw() +
geom_point() +
geom_smooth(method = 'lm', se = FALSE) +
labs(title = "Scatterplot 1 of Bill Length vs Bill Depth", x = "Bill Length", y = "Bill Depth")

ggsave("scatter4.png")
knitr::include_graphics("scatter4.png")
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
theme_bw() +
geom_point(alpha = 0.5, color = "blue") +
geom_smooth(method = 'lm', se = FALSE) +
facet_wrap(~species) +
labs(title = "Scatterplot 2 of Bill Length vs Bill Depth", x = "Bill Length", y = "Bill Depth")

ggsave("scatter5.png")
knitr::include_graphics("scatter5.png")
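The faceted view matters because pooling the species can reverse the sign of the association (Simpson's paradox): Gentoo penguins have long but shallow bills, which pulls the pooled trend downward. A minimal sketch comparing the pooled correlation with the within-species correlations, assuming the palmerpenguins package is installed; `peng`, `pooled_cor`, and `within_cor` are hypothetical names.

```r
library(palmerpenguins)

# Keep only the columns needed and drop rows with missing values
peng <- na.omit(palmerpenguins::penguins[, c("species", "bill_length_mm", "bill_depth_mm")])

# Correlation with all three species pooled together
pooled_cor <- cor(peng$bill_length_mm, peng$bill_depth_mm)

# Correlation computed separately within each species
within_cor <- sapply(
  split(peng, peng$species),
  function(d) cor(d$bill_length_mm, d$bill_depth_mm)
)
print(pooled_cor)
print(within_cor)
```

For these data the pooled correlation should come out negative while each within-species correlation is positive, which is why the per-species lines are the better summary.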