Directions

During ANLY 512 we will be studying the theory and practice of data visualization. We will be using R and the packages within R to assemble data and construct many different types of visualizations. We begin by studying some of the theoretical aspects of visualization. To do that we must appreciate the basic steps in the process of making a visualization.

The objective of this assignment is to complete and explain basic plots before moving on to more complicated ways to graph data.

Each question is worth 5 points.

To submit this homework, create the document in RStudio using the knitr package (the Knit button is built into RStudio), then publish it to your RPubs account. Once it is uploaded, submit the link to the document on Canvas. Please make sure that the link is hyperlinked and that I can see both the visualization and the code required to create it.

Questions

  1. Using data from the nasaweather package, create a scatterplot between wind and pressure, with color being used to distinguish the type of storm.
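library(ggplot2)      # plotting
library(nasaweather)  # provides the storms data

# Refer to the nasaweather table explicitly; dplyr also exports a `storms` table, which has no `type` column
nasa <- nasaweather::storms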

plot1 <- ggplot(nasa,aes(x=wind, y=pressure, color = type))+
          geom_point()+
          labs(title = 'Wind Vs. Pressure')
plot1

  2. Use the MLB_teams data in the mdsr package to create an informative data graphic that illustrates the relationship between winning percentage and payroll in context.
library(mdsr)  # provides the MLB_teams data

mlb <- MLB_teams

plot2 <- ggplot(mlb,aes(x = WPct,y=payroll)) +
          theme_bw() +
          geom_point() + 
          labs(title= 'Relationship between Winning Percentage and Payroll',
               x = 'Winning %',
               y = 'Payroll')+
          geom_smooth(method = 'lm')

plot2

  3. The RailTrail data set from the mosaicData package describes the usage of a rail trail in Western Massachusetts. Use these data to answer the following questions.
    a. Create a scatterplot of the number of crossings per day (volume) against the high temperature that day.
    b. Separate your plot into facets by weekday (an indicator of weekend/holiday vs. weekday).
    c. Add regression lines to the two facets.
library(mosaicData)  # provides the RailTrail data

rail <- RailTrail
# "volume against hightemp": crossings (the response) go on y, temperature on x
plot3 <- ggplot(rail, aes(x = hightemp, y = volume)) +
          geom_point() +
          labs(title = 'Volume Vs. High Temp',
               x = 'High Temp (F)',
               y = 'Number Of Crossings per Day')
plot4 <- ggplot(rail, aes(x = hightemp, y = volume)) +
        geom_point() +
        geom_smooth(method = 'lm') +
        facet_wrap(~dayType, ncol = 2) +
        labs(title = 'Volume Vs. High Temp by weekday/weekend',
             x = "High Temp (F)",
             y = "Number Of Crossings per Day")
#a.
plot3

#b & c.
plot4
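
The regression lines drawn in each facet can also be inspected numerically. The following is a minimal sketch, assuming the mosaicData package is installed; the names `fits` and `slopes` are introduced here for illustration only. It fits a separate simple linear model of volume on hightemp for each level of dayType, which is the same kind of fit that geom_smooth(method = 'lm') draws per panel.

```r
library(mosaicData)  # provides RailTrail

# Fit volume ~ hightemp separately for each level of dayType
fits <- lapply(split(RailTrail, RailTrail$dayType),
               function(d) coef(lm(volume ~ hightemp, data = d)))

# One row per day type; columns hold the intercept and the hightemp slope
slopes <- do.call(rbind, fits)
slopes
```

Comparing the hightemp column across the two rows shows how strongly trail usage responds to temperature on each type of day.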

  4. Using data from the nasaweather package, use the geom_path function to plot the path of each tropical storm in the storms data table. Use color to distinguish the storms from one another, and use faceting to plot each year in its own panel.
plot5<- ggplot(nasa,aes(x=long,y=lat))+
        geom_path(aes(color = name))+
        facet_wrap(~year)+
        labs(title = "Path of Tropical Storms",
        col = "Storm Names",
        x = "Longitude",
        y = "Latitude")

plot5

  5. Use the penguins data set from the palmerpenguins package to answer the following questions.
    a. Create a scatterplot of bill_length_mm against bill_depth_mm where individual species are colored and a regression line is added to each species. Add regression lines to all of your facets. What do you observe about the association of bill depth and bill length?
    b. Repeat the same scatterplot but now separate your plot into facets by species. How would you summarize the association between bill depth and bill length?
library(palmerpenguins)  # provides the penguins data
penguins
## # A tibble: 344 x 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ... with 334 more rows, and 2 more variables: sex <fct>, year <int>
#a. Within each of the three species, bill depth and bill length are positively correlated.
plot6 <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
        geom_point() +
        geom_smooth(method = 'lm') +  # the trailing `+` attaches labs() to the plot
        labs(title = "Scatterplot of Bill Length and Bill Depth by Species",
             x = "Bill Length (mm)",
             y = "Bill Depth (mm)")
plot6

#b.
plot7 <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
        geom_point() +
        facet_wrap(~species) +
        geom_smooth(method = 'lm') +  # the trailing `+` attaches labs() to the plot
        labs(title = "Scatterplot of Bill Length and Bill Depth faceted by Species",
             x = "Bill Length (mm)",
             y = "Bill Depth (mm)")
plot7
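
The association seen in the two plots can be checked numerically. This is a rough sketch, assuming the palmerpenguins package is installed; the names `pen`, `overall`, and `by_species` are introduced here for illustration. It computes the Pearson correlation between bill length and bill depth once for all penguins pooled together and once within each species. In this data set the pooled correlation is negative while every within-species correlation is positive, which is why coloring or faceting by species matters for the interpretation.

```r
library(palmerpenguins)  # provides penguins

# Keep only rows where both bill measurements are present
keep <- complete.cases(penguins[, c("bill_length_mm", "bill_depth_mm")])
pen <- penguins[keep, ]

# Correlation ignoring species
overall <- cor(pen$bill_length_mm, pen$bill_depth_mm)

# Correlation computed separately within each species
by_species <- sapply(split(pen, pen$species),
                     function(d) cor(d$bill_length_mm, d$bill_depth_mm))

overall
by_species
```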