Submission: Complete your solutions in the provided R Markdown file, named midterm-<your-name>.Rmd (replace <your-name> with your own name). Submit both the .Rmd file and the rendered HTML/PDF by the deadline.
Permitted Resources: You may use R help, class materials, your own previous code, and the following packages: tidyverse, openintro, and nycflights13 (and their dependencies). No Internet searches, AI tools, or outside communication (except with the instructor) are allowed.
Late Policy: No late submissions.
Code Requirements: Use only the allowed packages. For each question, include relevant code and concise written answers. Use graphs/tables as instructed.
Graph Formatting: Polish your graphs (for example, with informative titles, axis labels, and legends).
Support each answer with code, graphs, or results unless noted otherwise.
mpg data set
After loading the tidyverse library, a data set named mpg is available to explore. The following questions are based on this data set.
a) Create a new variable mpg_overall, the average of city (cty) and highway (hwy) fuel economy in miles per gallon. Then create a histogram of this new variable with bins of width 2 covering 20-22, 22-24, and so on.
b) Create a graph to study the relationship between drive train type (drv) and mpg_overall.
c) Create a table showing which car class has the highest mean mpg_overall.
d) Create a proper graph to study the combined effect of year and cyl on mpg_overall. Treat year and cyl as categorical variables in your graph.
flights data set
For the following tasks, use the flights data set from the nycflights13 package.
a) For flights departing from JFK, which day in November 2013 has the largest average arrival delay? Create a table to answer the question.
b) Create a new variable cancel_flight whose value is Cancelled if the departure time or arrival time is NA, and Not Cancelled otherwise.
c) Create a density graph that compares the distribution of distance for cancelled and non-cancelled flights.
d) How many unique flight routes are there in the data set? Each unique combination of an origin airport and a destination airport (such as EWR to ORD) counts as one route. Create a table to answer the question.
e) Add distance as a column to the table you created in d). Hint: go back to the original flights data set and rebuild the table with distance included. Then create a histogram of distance for the route table.
f) Which route has the highest flight cancellation rate? Create a table to answer the question.
flights data set (bonus)
The following questions also use the flights data set. Each correctly answered question earns 5% in bonus points.
a) Create a proper graph showing the cancellation rate for each airline. Which airline has the lowest cancellation rate?
b) If multiple airlines fly the same route, they can be considered competitors. Which route is the most competitive (served by the largest number of carriers)? List all of them in a table.