I, Tianlu Sui, hereby state that I have not gained information in any way not allowed by the exam rules during this exam, and that all work is my own.
# load required packages here
library(tidyverse)     # also attaches dplyr and ggplot2 (ggplot2 provides the mpg data set)
library(openintro)
library(nycflights13)  # provides the flights data set
mpg data set

After loading the tidyverse library, a data set named mpg is ready to
explore. The following questions are based on this data set.
mpg_overall which is the
average of city and highway fuel economy in miles per gallon. Then
create a histogram of this new variable with each bin covering the values
20-22, 22-24, etc.
?mpg
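# Enter code here.
mpg %>%
  mutate(mpg_overall = (hwy + cty) / 2) %>%
  ggplot(aes(x = mpg_overall)) +
  # boundary = 20 puts the bin edges at ..., 20, 22, 24, ... so each bar spans 2 mpg
  geom_histogram(binwidth = 2, boundary = 20)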
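With binwidth = 2 alone, ggplot2 does not anchor the bin edges at 20 (for this width they typically fall on odd numbers such as 19, 21, 23), which is why an explicit boundary matters for the 20-22, 22-24 grouping. A minimal base-R sketch of the same grouping, assuming a small hypothetical vector of overall mpg values rather than the real mpg data, uses cut() with explicit breaks:

```r
# hypothetical overall-mpg values, for illustration only
mpg_overall <- c(20.5, 21, 22, 23.5, 25)

# explicit edges at 20, 22, 24, 26; right = FALSE makes each bin [a, b)
bins <- cut(mpg_overall, breaks = seq(20, 26, by = 2), right = FALSE)

# count how many values fall in each 2-mpg bin
counts <- table(bins)
print(counts)
```

Note that geom_histogram() uses closed = "right" by default, so its bins are (20, 22] rather than [20, 22); passing closed = "left" switches to the left-closed convention used in this sketch.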
mpg_overall.
# Enter code here.
mpg %>%
  mutate(mpg_overall = (hwy + cty) / 2) %>%
  ggplot(aes(x = drv, y = mpg_overall)) +
  geom_boxplot()
Answer: Front-wheel drive (f) vehicles achieve the highest overall miles
per gallon, while rear-wheel drive (r) and four-wheel drive (4) vehicles
are broadly comparable to each other. However, the median fuel efficiency
of four-wheel drive vehicles is lower than that of rear-wheel drive vehicles.
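When a boxplot comparison is close, the medians (the middle line of each box) can be checked numerically. A minimal base-R sketch, assuming a small hypothetical data frame in place of mpg but using the same drive codes f, r and 4:

```r
# hypothetical toy data: drive type with city and highway mpg
toy <- data.frame(
  drv = c("f", "f", "f", "r", "r", "4", "4", "4"),
  cty = c(21, 19, 18, 15, 16, 14, 13, 15),
  hwy = c(29, 27, 26, 23, 24, 17, 17, 20)
)
toy$mpg_overall <- (toy$hwy + toy$cty) / 2

# median overall mpg per drive type, the value drawn as each box's middle line
med_by_drv <- tapply(toy$mpg_overall, toy$drv, median)
print(sort(med_by_drv, decreasing = TRUE))
```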
mpg_overall.
# Enter code here.
mpg %>%
  mutate(mpg_overall = (hwy + cty) / 2) %>%
  # group by class only: also grouping by mpg_overall would average each value with itself
  group_by(class) %>%
  summarise(avg_mpg_overall = mean(mpg_overall, na.rm = TRUE)) %>%
  arrange(desc(avg_mpg_overall))
Answer:subcompact
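Grouping by a variable that is also being averaged collapses each group to a single value, so the "average" is just that value. A base-R sketch on a hypothetical four-row data frame contrasts grouping by class alone with grouping by class and mpg_overall together:

```r
toy <- data.frame(
  class       = c("compact", "compact", "suv", "suv"),
  mpg_overall = c(24, 26, 15, 17)
)

# grouping by class only: one average per class, ranked from highest to lowest
by_class <- aggregate(mpg_overall ~ class, data = toy, FUN = mean)
ranked   <- by_class[order(by_class$mpg_overall, decreasing = TRUE), ]
print(ranked)

# grouping by class AND mpg_overall: each group holds a single distinct value,
# so every "average" simply equals that value
by_both <- aggregate(toy["mpg_overall"],
                     by = list(class = toy$class, value = toy$mpg_overall),
                     FUN = mean)
print(by_both)
```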
year and cyl to mpg_overall. You
shall treat year and cyl as categorical
variables in your graph.
table(mpg$cyl)
##
##  4  5  6  8
## 81  4 79 70
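# Enter code here.
mpg %>%
  mutate(mpg_overall = (hwy + cty) / 2,
         year = factor(year),    # treat year as categorical
         cyl = factor(cyl)) %>%  # treat cyl as categorical
  ggplot(aes(x = mpg_overall)) +
  geom_histogram(binwidth = 2, boundary = 20) +
  facet_grid(cyl ~ year)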
Answer: Five-cylinder vehicles appear only in 2008, and only in small
numbers (4 cars in total). Four-cylinder models mostly achieve an overall
fuel economy of 20-30 miles per gallon, six-cylinder models 15-25 miles per
gallon, and eight-cylinder models 10-20 miles per gallon. In general, the
more cylinders a vehicle has, the higher its fuel consumption. Comparing
years, vehicles with the same number of cylinders from 2008 are more
fuel-efficient than those from 1999.
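facet_grid() always treats its faceting variables as discrete, but if year or cyl were mapped to an aesthetic such as fill, ggplot2 would treat a numeric column as continuous; factor() avoids that. A minimal sketch on a hypothetical five-row data frame shows the conversion and a cylinder-by-year cross-tabulation, which is also a quick way to confirm which combinations exist at all (for example, that five-cylinder cars occur in only one year):

```r
# hypothetical rows; in mpg, year and cyl are stored as numbers
toy <- data.frame(
  year = c(1999, 1999, 2008, 2008, 2008),
  cyl  = c(4, 6, 4, 5, 8)
)

# convert to factors so they are treated as categories, not continuous values
toy$year <- factor(toy$year)
toy$cyl  <- factor(toy$cyl)

# cross-tabulate cylinders by year; zero cells mark combinations that never occur
counts <- table(cyl = toy$cyl, year = toy$year)
print(counts)
```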
flights data set

For the following tasks, use the flights data set from the
nycflights13 package.
flights
# Enter code here.
flights %>%
  # JFK departures in November that have a recorded arrival delay
  filter(origin == "JFK", month == 11, !is.na(arr_delay)) %>%
  group_by(day) %>%
  summarise(
    total_arr_delay = sum(arr_delay),
    flights = n(),
    avg_arr_delay = total_arr_delay / flights
  ) %>%
  arrange(desc(avg_arr_delay))
Answer: November 27
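The total-divided-by-count in the summarise() call is the per-day mean of the non-missing delays, and filtering the NA rows first matters because a single NA turns a sum into NA. A base-R sketch on hypothetical delays for three days computes the same quantity with rowsum() and table():

```r
# hypothetical arrival delays in minutes; NA = no recorded arrival delay
toy <- data.frame(
  day       = c(1, 1, 2, 2, 2, 3),
  arr_delay = c(10, NA, 5, 15, -5, 30)
)

# one NA makes the whole total NA
total_with_na <- sum(toy$arr_delay)

kept   <- toy[!is.na(toy$arr_delay), ]
totals <- rowsum(kept$arr_delay, kept$day)[, 1]  # total delay per day
counts <- as.vector(table(kept$day))             # flights per day
avg_by_day <- totals / counts

print(avg_by_day)
names(which.max(avg_by_day))  # day with the highest average delay
```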
cancel_flight which is
Cancelled if the departure time or arrival time is
NA, otherwise Not Cancelled.
# Enter code here.
# flag a flight as cancelled when its departure or arrival time is missing
my_flights <- mutate(flights, cancel_flight = ifelse(is.na(dep_time) | is.na(arr_time), "cancelled", "non-cancelled"))
my_flights
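The flag relies on is.na() combined with the vectorised | operator; a comparison such as dep_time == NA would return NA for every element instead of TRUE/FALSE. Under this rule a flight with a departure time but no arrival time is also counted as cancelled. A sketch on hypothetical HHMM times:

```r
# hypothetical times in HHMM form; NA means the time was never recorded
dep_time <- c(517, NA, 1020, 1115)
arr_time <- c(830, NA, NA, 1405)

# comparing with == NA yields NA everywhere, so is.na() is required
all(is.na(dep_time == NA))

cancel_flight <- ifelse(is.na(dep_time) | is.na(arr_time), "cancelled", "non-cancelled")
print(cancel_flight)
```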
Answer:
distance between cancelled flights and non-cancelled
flights.
# Enter code here.
my_flights %>%
  ggplot(aes(x = distance, fill = cancel_flight)) +
  geom_density(adjust = 2, alpha = 0.5)
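In geom_density(), adjust multiplies the automatically chosen bandwidth, so adjust = 2 produces smoother curves, while alpha = 0.5 keeps the overlapping fills visible. Base R's stats::density() has the same adjust parameter; a sketch on simulated, hypothetical distances shows the doubling:

```r
set.seed(1)
# simulated, hypothetical flight distances in miles
distance <- c(rnorm(50, mean = 700, sd = 150), rnorm(50, mean = 1800, sd = 300))

d1 <- density(distance)              # default bandwidth rule (bw.nrd0)
d2 <- density(distance, adjust = 2)  # bandwidth doubled, giving a smoother curve

c(default = d1$bw, adjusted = d2$bw)
```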
# Enter code here.
flights %>%
  distinct(origin, dest) %>%
  arrange(origin, dest)
Answer:224 routes
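distinct(origin, dest) keeps one row per ordered origin-destination pair, so the number of rows is the number of routes. The same idea in base R, on hypothetical rows where JFK-LAX appears twice:

```r
toy <- data.frame(
  origin = c("JFK", "JFK", "EWR", "JFK", "LGA"),
  dest   = c("LAX", "LAX", "LAX", "SFO", "ATL")
)

# drop duplicate (origin, dest) pairs, then sort by origin and destination
routes <- unique(toy[, c("origin", "dest")])
routes <- routes[order(routes$origin, routes$dest), ]

nrow(routes)  # number of distinct routes
```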
distance as a column to the table you created in
d). Hint: You should go back to the original flights data
set and reconstruct the table with distance included. Create a histogram
of distance for the route table.
# Enter code here.
# one row per distinct (origin, dest, distance) combination
flights_route <- flights %>%
  distinct(origin, dest, distance)
ggplot(flights_route, aes(x = distance)) +
  geom_histogram(binwidth = 200)
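Because distinct(origin, dest, distance) treats distance as part of the key, a route recorded with two different distances would appear twice, and the route table could then have more rows than the route count found earlier. A base-R sketch that checks for such routes, on hypothetical data (AAA and BBB are placeholder airport codes):

```r
# hypothetical records; AAA and BBB are placeholder airport codes
toy <- data.frame(
  origin   = c("JFK", "JFK", "EWR", "EWR"),
  dest     = c("AAA", "AAA", "BBB", "BBB"),
  distance = c(1000, 1000, 1500, 1501)
)

# one row per distinct (origin, dest, distance), like distinct() in the code above
route_table <- unique(toy)

# number of distinct distances recorded for each origin-destination pair
n_dist <- aggregate(distance ~ origin + dest, data = route_table, FUN = length)
multi  <- n_dist[n_dist$distance > 1, ]
print(multi)
```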
# Enter code here.
my_flights %>%
  group_by(origin, dest) %>%
  summarise(
    total_flights = n(),
    cancelled_flights = sum(cancel_flight == "cancelled"),
    cancel_rate = cancelled_flights / total_flights
  ) %>%
  arrange(desc(cancel_rate))
Answer:EWR–LGA
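The cancellation rate is the share of a route's flights flagged "cancelled"; the mean of a logical vector gives the same proportion directly. A sketch on hypothetical flags for a single route:

```r
# hypothetical cancellation flags for one route
cancel_flight <- c("cancelled", "non-cancelled", "non-cancelled", "cancelled")

# cancelled count divided by total count, as in the summarise() call
rate_ratio <- sum(cancel_flight == "cancelled") / length(cancel_flight)

# equivalent: the mean of a logical vector is the proportion of TRUE values
rate_mean <- mean(cancel_flight == "cancelled")

c(rate_ratio, rate_mean)
```

Sorting by the rate alone puts routes with very few flights at the top (a route whose only flight was cancelled has a rate of 1), so it can be worth reading total_flights alongside cancel_rate.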
flights data set

The following questions also use the flights data set.
Each question is worth 5% bonus points if answered correctly.
# Enter code here.
carrier_cancel <- my_flights %>%
  group_by(carrier) %>%
  summarise(
    total_flights = n(),
    cancelled_flights = sum(cancel_flight == "cancelled"),
    cancel_rate = cancelled_flights / total_flights
  )
ggplot(carrier_cancel, aes(x = carrier, y = cancel_rate)) +
  geom_col()
Answer:HAHA
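geom_col() draws the bars in the alphabetical order of the carrier codes; ordering them by cancel_rate makes the highest and lowest rates easier to read off. A base-R sketch of that reordering with reorder(), on hypothetical carriers C1 to C3:

```r
# hypothetical carrier codes and cancellation rates
carrier_cancel <- data.frame(
  carrier     = c("C1", "C2", "C3"),
  cancel_rate = c(0.03, 0.01, 0.05)
)

# order the factor levels by cancellation rate (ascending)
carrier_cancel$carrier <- reorder(carrier_cancel$carrier, carrier_cancel$cancel_rate)
levels(carrier_cancel$carrier)

# carrier with the lowest rate
as.character(carrier_cancel$carrier[which.min(carrier_cancel$cancel_rate)])
```

In the plot itself the same idea can be written as aes(x = reorder(carrier, cancel_rate), y = cancel_rate).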
# Enter code here.
flights %>%
  group_by(origin, dest) %>%
  summarise(total_carrier = n()) %>%
  arrange(desc(total_carrier))
Answer:JFK–LAX
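In the code above, total_carrier = n() counts rows, that is flights, per route rather than carriers. The question text is not preserved here, so which count is intended is unclear; if it is the number of distinct carriers serving each route, dplyr's n_distinct(carrier) would be the matching call, and the two counts can rank routes differently. A base-R sketch on hypothetical data (AAA/BBB and C1/C2 are placeholders) contrasts them:

```r
# hypothetical flights; AAA/BBB are placeholder destinations, C1/C2 placeholder carriers
toy <- data.frame(
  origin  = c("JFK", "JFK", "JFK", "EWR"),
  dest    = c("AAA", "AAA", "AAA", "BBB"),
  carrier = c("C1", "C1", "C2", "C1")
)
route <- paste(toy$origin, toy$dest, sep = "-")

# rows (flights) per route: what n() counts
flights_per_route <- tapply(toy$carrier, route, length)

# distinct carriers per route: what n_distinct(carrier) would count
carriers_per_route <- tapply(toy$carrier, route, function(x) length(unique(x)))

print(flights_per_route)
print(carriers_per_route)
```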