Questions

  1. Using data from the nasaweather package, create a scatter plot between wind and pressure, with color being used to distinguish the type of storm.
library("nasaweather")
library("ggplot2")
head(storms)
## # A tibble: 6 × 11
##   name     year month   day  hour   lat  long pressure  wind type        seasday
##   <chr>   <int> <int> <int> <int> <dbl> <dbl>    <int> <int> <chr>         <int>
## 1 Allison  1995     6     3     0  17.4 -84.3     1005    30 Tropical D…       3
## 2 Allison  1995     6     3     6  18.3 -84.9     1004    30 Tropical D…       3
## 3 Allison  1995     6     3    12  19.3 -85.7     1003    35 Tropical S…       3
## 4 Allison  1995     6     3    18  20.6 -85.8     1001    40 Tropical S…       3
## 5 Allison  1995     6     4     0  22   -86        997    50 Tropical S…       4
## 6 Allison  1995     6     4     6  23.3 -86.3      995    60 Tropical S…       4
summary(storms)
##      name                year          month             day       
##  Length:2747        Min.   :1995   Min.   : 6.000   Min.   : 1.00  
##  Class :character   1st Qu.:1995   1st Qu.: 8.000   1st Qu.: 9.00  
##  Mode  :character   Median :1997   Median : 9.000   Median :18.00  
##                     Mean   :1997   Mean   : 8.803   Mean   :16.98  
##                     3rd Qu.:1999   3rd Qu.:10.000   3rd Qu.:25.00  
##                     Max.   :2000   Max.   :12.000   Max.   :31.00  
##       hour             lat             long            pressure     
##  Min.   : 0.000   Min.   : 8.30   Min.   :-107.30   Min.   : 905.0  
##  1st Qu.: 3.500   1st Qu.:17.25   1st Qu.: -77.60   1st Qu.: 980.0  
##  Median :12.000   Median :25.00   Median : -60.90   Median : 995.0  
##  Mean   : 9.057   Mean   :26.67   Mean   : -60.87   Mean   : 989.8  
##  3rd Qu.:18.000   3rd Qu.:33.90   3rd Qu.: -45.80   3rd Qu.:1004.0  
##  Max.   :18.000   Max.   :70.70   Max.   :   1.00   Max.   :1019.0  
##       wind            type              seasday     
##  Min.   : 15.00   Length:2747        Min.   :  3.0  
##  1st Qu.: 35.00   Class :character   1st Qu.: 84.0  
##  Median : 50.00   Mode  :character   Median :103.0  
##  Mean   : 54.68                      Mean   :102.6  
##  3rd Qu.: 70.00                      3rd Qu.:125.0  
##  Max.   :155.00                      Max.   :185.0
ggplot(storms, aes(x = pressure, y = wind, color = type)) +
  geom_point(alpha = 0.6) +
  labs(title = "Wind and Pressure by Type of Storm",
       x = "Pressure (mb)",
       y = "Wind (knots)",
       color = "Type") +
  theme_classic()

  1. Use the MLB_teams data in the mdsr package to create an informative data graphic that illustrates the relationship between winning percentage and payroll in context.
library("mdsr")
library("ggplot2")
data("MLB_teams")

ggplot(MLB_teams, aes(x = payroll, y = WPct)) +
  geom_point(size = 3, alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Relationship between Winning Percentage and Payroll",
       x = "Payroll (millions of dollars)",
       y = "Winning Percentage") +
  scale_x_continuous(labels = scales::dollar_format(scale = 1e-6)) +
  theme_classic()
## `geom_smooth()` using formula = 'y ~ x'

  1. The RailTrail data set from the mosaicData package describes the usage of a rail trail in Western Massachusetts. Use these data to answer the following questions.
  1. Create a scatterplot of the number of crossings per day (volume) against the high temperature that day (hightemp)
  2. Separate your plot into facets by weekday (an indicator of weekend/holiday vs. weekday)
  3. Add regression lines to the two facets
library("mosaicData")
library("ggplot2")
data("RailTrail")

ggplot(RailTrail, aes(x = hightemp, y = volume)) +
  geom_point(alpha = 0.6) +
  labs(title = "Number of Crossings per Day by High Temperature",
       x = "High Temperature (°F)",
       y = "Number of Crossings per Day") +
  theme_classic()

ggplot(RailTrail, aes(x = hightemp, y = volume)) +
  geom_point(alpha = 0.6) +
  facet_wrap(~ weekday, ncol = 1) +
  labs(title = "Number of Crossings per Day by High Temperature and Weekday",
       x = "High Temperature (°F)",
       y = "Number of Crossings per Day") +
  theme_classic()

ggplot(RailTrail, aes(x = hightemp, y = volume, color = weekday)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ weekday, ncol = 1) +
  labs(title = "Number of Crossings per Day by High Temperature and Weekday",
       x = "High Temperature (°F)",
       y = "Number of Crossings per Day") +
  theme_classic()
## `geom_smooth()` using formula = 'y ~ x'

  1. Using data from the nasaweather package, use the geom_path function to plot the path of each tropical storm in the storms data table. Use color to distinguish the storms from one another, and use faceting to plot each year in its own panel.
library("nasaweather")
library("ggplot2")

data("storms")
storms_df <- as.data.frame(storms)

ggplot(storms_df, aes(x = long, y = lat, color = name)) +
  geom_path() +
  facet_wrap(~ year, ncol = 3) +
  scale_color_discrete(name = "Storm Name") +
  labs(title = "Paths of Tropical Storms",
       x = "Longitude",
       y = "Latitude") +
  theme_classic()

  1. Using the penguins data set from the palmerpenguins package:
  1. Create a scatterplot of bill_length_mm against bill_depth_mm where individual species are colored and a regression line is added to each species. Add regression lines to all of your facets. What do you observe about the association of bill depth and bill length?
  2. Repeat the same scatterplot but now separate your plot into facets by species. How would you summarize the association between bill depth and bill length?
library("palmerpenguins")
library("ggplot2")

data("penguins")

# "bill_length_mm against bill_depth_mm": bill length on the y-axis, bill depth on the x-axis
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm, color = species)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Association between Bill Length and Bill Depth",
       x = "Bill Depth (mm)",
       y = "Bill Length (mm)",
       color = "Species") +
  theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 2 rows containing missing values (`geom_point()`).

# Same axes as above, now with one panel per species
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(aes(color = species)) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ species, ncol = 1) +
  labs(title = "Association between Bill Length and Bill Depth by Species",
       x = "Bill Depth (mm)",
       y = "Bill Length (mm)",
       color = "Species") +
  theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 2 rows containing missing values (`geom_point()`).

# Within each species the regression line slopes upward, so bill length and bill depth are positively associated for Adelie, Chinstrap, and Gentoo penguins alike. The strength of the association differs by species: the Adelie points are the most scattered around their line, while the Chinstrap and Gentoo points cluster more tightly. A straight-line fit cannot show curvature, so these plots say nothing about the shape of the relationship beyond its direction and spread.
# Ignoring species tells a different story. Gentoo penguins have long but shallow bills and Adelie penguins have short but deep ones, so the pooled points trend downward, and a single line fitted to all penguins would have a negative slope (an instance of Simpson's paradox).
# The association is therefore best summarized species by species. The differences between species, which may reflect diet, foraging behavior, or other ecological factors, otherwise mask the positive relationship.
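
Statements about the direction and strength of an association are easier to defend with numbers than by reading them off a plot. The sketch below is one possible check, not part of the original exercise: it computes the Pearson correlation between bill length and bill depth for all penguins together and then within each species. The names `bills`, `overall_r`, and `by_species_r` are introduced here for illustration only.

```r
library("palmerpenguins")

# Keep only the columns of interest and drop rows with a missing measurement
# (the plots above silently drop the same incomplete rows).
bills <- na.omit(penguins[, c("species", "bill_length_mm", "bill_depth_mm")])

# Correlation when species is ignored
overall_r <- cor(bills$bill_length_mm, bills$bill_depth_mm)

# Correlation computed separately within each species
by_species_r <- sapply(split(bills, bills$species), function(d) {
  cor(d$bill_length_mm, d$bill_depth_mm)
})

overall_r
by_species_r
```

A negative `overall_r` alongside positive values in `by_species_r` is the numerical signature of the pattern the plots suggest: the pooled trend runs in the opposite direction to the trend inside every species. Comparing the three per-species values also shows which species has the tightest relationship.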