Questions

  1. Using data from the nasaweather package, create a scatterplot of wind against pressure, using color to distinguish the type of storm.
library(ggplot2)
library(nasaweather)

# Use nasaweather::storms explicitly: dplyr also exports a `storms` data set
ggplot(nasaweather::storms, aes(x = wind, y = pressure, color = type)) +
  geom_point() +
  labs(
    color = 'Type',
    title = 'Scatterplot of Wind and Pressure by Storm Type',
    x = 'Wind (knots)',
    y = 'Pressure (millibars)'
  )

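The scatterplot shows that wind and pressure are negatively related, and that each storm type occupies its own region of the plot. For example, hurricanes have high wind speeds and low pressure, while tropical depressions have low wind speeds and higher pressure.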
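A per-type summary puts numbers behind that reading of the plot. The sketch below is one way to produce it, assuming dplyr is installed. It refers to `nasaweather::storms` explicitly because dplyr ships its own `storms` data set with different columns. Medians are used so that a few extreme readings do not dominate.

```r
library(dplyr)
library(nasaweather)

# Median wind and pressure for each storm type
# (nasaweather::storms, not the dplyr data set of the same name)
storm_summary <- nasaweather::storms %>%
  group_by(type) %>%
  summarise(
    n = n(),
    median_wind = median(wind, na.rm = TRUE),
    median_pressure = median(pressure, na.rm = TRUE)
  )

storm_summary
```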

  2. Use the MLB_teams data in the mdsr package to create an informative data graphic that illustrates the relationship between winning percentage and payroll in context.
library(ggplot2)
library(mdsr)

# WPct is stored as a proportion, so scale it by 100 for a percentage axis
ggplot(MLB_teams, aes(x = WPct * 100, y = payroll / 1000000)) +
  geom_point() +
  geom_smooth(color = "blue") +
  labs(
    title = 'Relationship Between Winning Percentage and Payroll',
    x = 'Winning Percentage (%)',
    y = 'Payroll (Million $)'
  )
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

As the graph above shows, there does not seem to be any clear relationship between winning percentage and payroll.
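A correlation coefficient gives a single number to set beside that visual impression. This is a minimal sketch, assuming `payroll` may contain missing values (hence `use = "complete.obs"`). A value close to 0 would support the reading above, while a larger magnitude would point to a linear association worth a second look.

```r
library(mdsr)

# Pearson correlation between winning percentage and payroll;
# "complete.obs" drops any rows where either value is missing
wpct_payroll_r <- cor(MLB_teams$WPct, MLB_teams$payroll, use = "complete.obs")
wpct_payroll_r
```

Correlation is unaffected by the rescaling used in the plot (multiplying by 100, dividing by 1,000,000), so the raw columns can be used directly.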

  3. The RailTrail data set from the mosaicData package describes the usage of a rail trail in Western Massachusetts. Use these data to answer the following questions.
     a. Create a scatterplot of the number of crossings per day (volume) against the high temperature that day
     b. Separate your plot into facets by weekday (an indicator of weekend/holiday vs. weekday)
     c. Add regression lines to the two facets
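library(ggplot2)
library(mosaicData)

# Part (a): daily crossings against the day's high temperature, with one overall OLS line
ggplot(RailTrail, aes(x = hightemp, y = volume)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Trail Volume against the High Temperature",
    x = "High Temperature (deg. F)",
    y = "Trail Volume (crossings per day)"
  )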
## `geom_smooth()` using formula 'y ~ x'
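The line drawn by `geom_smooth(method = "lm")` is an ordinary least-squares fit, so its intercept and slope can be read off with `lm()` directly. This is a minimal sketch. The slope is the estimated change in daily crossings for each additional degree Fahrenheit of high temperature.

```r
library(mosaicData)

# Same model geom_smooth(method = "lm") draws: volume regressed on hightemp
rail_fit <- lm(volume ~ hightemp, data = RailTrail)

# Intercept and slope (crossings per day per deg. F)
coef(rail_fit)
```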

library(dplyr)

# Parts (b) and (c): one facet per weekday indicator, each with its own regression line
RailTrail %>%
  mutate(weekday_nice = ifelse(weekday, "Weekday", "Weekend/Holiday")) %>%
  ggplot(aes(x = hightemp, y = volume)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE) +
    facet_wrap(~ weekday_nice, nrow = 1) +
    labs(
      title = "Trail Volume against the High Temperature (Weekday vs Weekend/Holiday)",
      x = "High Temperature (deg. F)",
      y = "Trail Volume (crossings per day)"
    )
## `geom_smooth()` using formula 'y ~ x'
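To compare the two facet lines numerically, the OLS intercept and slope can be computed for each group. This sketch reuses the `weekday_nice` recoding from the plot. It computes the slope as cov(x, y) / var(x), which for a single predictor is algebraically the same slope `lm()` returns. The column names `n`, `slope`, and `intercept` are just illustrative choices.

```r
library(dplyr)
library(mosaicData)

# One OLS line per group: slope = cov(x, y) / var(x),
# intercept = mean(y) - slope * mean(x)
rail_fits <- RailTrail %>%
  mutate(weekday_nice = ifelse(weekday, "Weekday", "Weekend/Holiday")) %>%
  group_by(weekday_nice) %>%
  summarise(
    n = n(),
    slope = cov(hightemp, volume) / var(hightemp),
    intercept = mean(volume) - slope * mean(hightemp)
  )

rail_fits
```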

  4. Using data from the nasaweather package, use the geom_path function to plot the path of each tropical storm in the storms data table. Use color to distinguish the storms from one another, and use faceting to plot each year in its own panel.

Using geom_path() with one facet for each year from 1995 to 2000, we can trace the path of each storm.

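library(ggplot2)
library(nasaweather)

# Longitude on x and latitude on y, so the panels read like a map;
# nasaweather::storms avoids the dplyr data set of the same name
ggplot(nasaweather::storms, aes(x = long, y = lat)) +
  geom_path(aes(color = name)) +
  facet_wrap(~year, nrow = 3) +
  labs(
    title = "The Path of Storms (1995 - 2000)",
    color = "Name",
    x = "Longitude",
    y = "Latitude"
  )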
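`geom_path()` joins observations in the order they appear in the data (unlike `geom_line()`, which sorts by x), so each track is only correct if that storm's rows are in time order. This defensive sketch makes the ordering explicit. It assumes the `month`, `day`, and `hour` columns of `nasaweather::storms`, and the sorted result can be passed to the same `ggplot()` call in place of the raw data. It also counts how many distinct storms (paths) land in each yearly panel.

```r
library(dplyr)
library(nasaweather)

# Sort each storm's observations chronologically before drawing paths
storms_ordered <- nasaweather::storms %>%
  arrange(year, name, month, day, hour)

# Number of distinct storms (paths) drawn in each yearly panel
storms_per_year <- storms_ordered %>%
  distinct(year, name) %>%
  count(year)

storms_per_year
```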

  5. Using the penguins data set from the palmerpenguins package:
     a. Create a scatterplot of bill_length_mm against bill_depth_mm where individual species are colored and a regression line is added to each species. Add regression lines to all of your facets. What do you observe about the association of bill depth and bill length?
     b. Repeat the same scatterplot but now separate your plot into facets by species. How would you summarize the association between bill depth and bill length?

By separating species, we can see that the three species have similar slopes. In other words, within each species there is a positive linear relationship between bill length and bill depth. However, the ranges of bill length and bill depth differ by species, and the facets by species make this even clearer. For example, Gentoo penguins tend to have smaller bill depth but larger bill length, while Adelie penguins tend to have smaller bill length but greater bill depth.

# a
library(ggplot2)
library(palmerpenguins)

ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  # color = species is inherited from ggplot(), so each species gets its own line
  geom_smooth(method = "lm") +
  labs(
    color = 'Species',
    title = 'Scatterplot Between Bill Length and Depth By Species',
    x = 'Bill Length (mm)',
    y = 'Bill Depth (mm)'
  )
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).

# b
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  # One fixed line color; the facets already separate the species
  geom_smooth(method = "lm", color = "blue") +
  facet_wrap(~species) +
  labs(
    color = 'Species',
    title = 'Scatterplot Between Bill Length and Depth, Faceted by Species',
    x = 'Bill Length (mm)',
    y = 'Bill Depth (mm)'
  )
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Removed 2 rows containing missing values (geom_point).
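Fitting each species separately is one way to check the claim that the species share similar slopes. This is a minimal sketch, assuming dplyr is installed. It drops the two rows with missing bill measurements (the rows ggplot2 warns about above) and also fits one pooled line that ignores species. For this data set, the pooled slope comes out negative even though every within-species slope is positive, an instance of Simpson's paradox. That reversal is why coloring or faceting by species matters here.

```r
library(dplyr)
library(palmerpenguins)

# Drop the rows with missing bill measurements
bills <- penguins %>%
  filter(!is.na(bill_length_mm), !is.na(bill_depth_mm))

# OLS slope of bill depth on bill length within each species
species_slopes <- bills %>%
  group_by(species) %>%
  summarise(
    n = n(),
    slope = cov(bill_length_mm, bill_depth_mm) / var(bill_length_mm)
  )

# Slope from a single regression that ignores species
pooled_slope <- coef(lm(bill_depth_mm ~ bill_length_mm, data = bills))[["bill_length_mm"]]

species_slopes
pooled_slope
```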