Submit your report as a PDF file or an HTML link knitted from R Markdown, along with the .Rmd file.
Organize your report clearly by task and question, using different levels of headers.
For each question, include the question itself, the code/result/graph to answer the question, and your answer in plain language.
Polish the details of your graphs so that they are comfortable to read.
If you use any functions not taught in class, be prepared to explain them during class.
You are not allowed to use AI in any form.
Working with the data set flights in the package nycflights13, answer the following questions by performing necessary data transformation/visualization.
(a) Create a histogram of arrival delays (excluding NAs) for all flights in June and July. Summarize your findings.
(b) Create a smooth line graph of arrival delays vs. departure delays for all flights departing from EWR on the first day of each month. Summarize your findings.
(c) Among the flights that actually departed, find those with the shortest travel distance. What are their origin and destination airports?
(d) Create a new categorical variable with two labels. Flights with a travel distance shorter than 500 miles are labeled “short-distance”; all others are labeled “long-distance”. Create a bar plot to compare the number of flights in each category. Summarize your findings.
(e) Find the destination airport that has the longest average departure delay by creating a graph.
(f) Answer the question in (e) without creating a graph.
(g) Find the carriers with the highest and the lowest average flight speed across all their flights in the data set.
(h) (Bonus, self-study required) Find the weekday (Monday through Sunday) on which flights had the longest average departure delay.
For the following questions, analyze the data set seattlepets in the package openintro. Read the help document and make sure that you understand the basic information about the data set before analysis.
(a) How many species are there in the data set? What are they?
(b) What are the most popular primary breeds for cats and dogs, respectively?
(c) What are the three most common pet names in Seattle?
(d) What are the ten most common pet names for cats? What are the ten most common pet names for dogs? Write code to print the names and their frequencies.
(e) Excluding “NA”, how many names appear more than 100 times in the data set?
(f) Among all names that appear more than 100 times in the data set, which has the highest “cat_to_dog” ratio? Which has the lowest? The “cat_to_dog” ratio is computed as follows: if a name appears 200 times, of which 150 are for cats and 50 are for dogs, the ratio is 150/50 = 3.
(g) (Bonus) Pose a question of your own about this data set and answer it with analysis or visualization.
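
The “cat_to_dog” ratio defined in the question about names appearing more than 100 times can be written compactly. This is only a notational sketch: the symbols n_cat and n_dog are introduced here for illustration and are not variables in the data set. They stand for the number of cats and the number of dogs that carry a given name.

```latex
\[
\text{cat\_to\_dog}(\text{name})
  = \frac{n_{\text{cat}}(\text{name})}{n_{\text{dog}}(\text{name})},
\qquad
\text{e.g. } \frac{150}{50} = 3 .
\]
```

A ratio above 1 means the name is given to more cats than dogs, and a ratio below 1 means the opposite.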