Loading Libraries:
knitr::opts_chunk$set(warning = FALSE, fig.align = "center", out.width = "85%",
message = FALSE, cache = TRUE)
library(openintro)
library(tidyverse)
library(nycflights13)
library(maps)
Global Theme Set Up:
my_theme <- theme(
plot.title = element_text(size = rel(1.8), hjust = 0.5,
margin = margin(10,15,10,10)),
axis.title = element_text(size = rel(1.5), hjust = 0.5),
axis.title.x = element_text(margin = margin(10,5,5,5)),
axis.title.y = element_text(margin = margin(5,10,5,5)),
axis.text = element_text(size = rel(1.1)))
Lab 1:
Explain how you understand the diagram above.
Answer: The diagram represents a relational database
schema for flights
data, illustrating the relationships
between multiple tables. At the center is the flights
table, which contains essential flight details such as
year
, month
, day
,
hour
, flight
, origin
,
dest
, tailnum
, and carrier
. The
airports
table stores airport information and is linked to
the flights table through the origin
and dest
columns, which reference the faa
(airport code). The
planes
table contains aircraft details and connects via the
tailnum
. Additionally, the airlines
table
provides airline names and is related to the flights
table
through the carrier
column. Lastly, the
weather
table stores meteorological data and links to the
flights
table using year
, month
,
day
, hour
, and origin
, allowing
analysis of weather conditions affecting flights. These relationships
indicate that the schema is designed to analyze flight performance,
delays, and external influences by integrating flights
,
weather
, airport
, and airline
data.
Lab 2:
Answer whether there exists a primary key in the mpg data set
with codes.
Code:
sum(duplicated(mpg))
## [1] 9
Answer: There is no primary key in mpg
data set.
Lab 3:
Identify the following relationship to be one-to-many,
one-to-one, or many-to-many
1. origi
n in flights
and
faa
in airports
Answer: Many to one.
2. The variable student_name
in a table
student information
for the college and a table
grade score
of our course.
Answer: The relationship between student_name and
grade score depends on the structure of the table. If student_name
contains unique values, the relationship is one-to-many, meaning each
student can have multiple grade scores. However, if each student has
only one grade score recorded, the relationship is one-to-one. In cases
where there are multiple students with the same name, such as Anna or
Linda, and each student has multiple scores across different subjects,
the relationship becomes many-to-many.
Lab 4:
Add airports
information to the data set
flights2
for both origin and destination. Rename your
columns to differentiate information for dest
and
origin
.
Code:
flights2 <- flights %>%
select(year:day, hour, origin, dest, tailnum, carrier)
flights2 %>%
left_join(airports, c("origin" = "faa"))
## # A tibble: 336,776 × 15
## year month day hour origin dest tailnum carrier name lat lon alt
## <int> <int> <int> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 2013 1 1 5 EWR IAH N14228 UA Newar… 40.7 -74.2 18
## 2 2013 1 1 5 LGA IAH N24211 UA La Gu… 40.8 -73.9 22
## 3 2013 1 1 5 JFK MIA N619AA AA John … 40.6 -73.8 13
## 4 2013 1 1 5 JFK BQN N804JB B6 John … 40.6 -73.8 13
## 5 2013 1 1 6 LGA ATL N668DN DL La Gu… 40.8 -73.9 22
## 6 2013 1 1 5 EWR ORD N39463 UA Newar… 40.7 -74.2 18
## 7 2013 1 1 6 EWR FLL N516JB B6 Newar… 40.7 -74.2 18
## 8 2013 1 1 6 LGA IAD N829AS EV La Gu… 40.8 -73.9 22
## 9 2013 1 1 6 JFK MCO N593JB B6 John … 40.6 -73.8 13
## 10 2013 1 1 6 LGA ORD N3ALAA AA La Gu… 40.8 -73.9 22
## # ℹ 336,766 more rows
## # ℹ 3 more variables: tz <dbl>, dst <chr>, tzone <chr>
flights2 %>%
left_join(airports, c("dest" = "faa")) -> flights_airport
flights_airport %>%
rename("daylight_saving" = `dst`, "airport_name" = `name`,
"departure_airport" = `origin`, "arrival_airport" = `dest`) %>%
glimpse()
## Rows: 336,776
## Columns: 15
## $ year <int> 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013…
## $ month <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ day <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ hour <dbl> 5, 5, 5, 5, 6, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 6, 6…
## $ departure_airport <chr> "EWR", "LGA", "JFK", "JFK", "LGA", "EWR", "EWR", "LG…
## $ arrival_airport <chr> "IAH", "IAH", "MIA", "BQN", "ATL", "ORD", "FLL", "IA…
## $ tailnum <chr> "N14228", "N24211", "N619AA", "N804JB", "N668DN", "N…
## $ carrier <chr> "UA", "UA", "AA", "B6", "DL", "UA", "B6", "EV", "B6"…
## $ airport_name <chr> "George Bush Intercontinental", "George Bush Interco…
## $ lat <dbl> 29.98443, 29.98443, 25.79325, NA, 33.63672, 41.97860…
## $ lon <dbl> -95.34144, -95.34144, -80.29056, NA, -84.42807, -87.…
## $ alt <dbl> 97, 97, 8, NA, 1026, 668, 9, 313, 96, 668, 19, 26, 1…
## $ tz <dbl> -6, -6, -5, NA, -5, -6, -5, -5, -5, -6, -5, -5, -8, …
## $ daylight_saving <chr> "A", "A", "A", NA, "A", "A", "A", "A", "A", "A", "A"…
## $ tzone <chr> "America/Chicago", "America/Chicago", "America/New_Y…
Lab 5:
Create a airport map similar to the one above with the color
representing the number of flights per day from NYC to each
airport.
Code:
flights_per_day <- flights %>%
group_by(dest) %>%
summarise(avg_flights_per_day = n()/365) %>%
left_join(airports, by = c("dest" = "faa")) %>%
print()
## # A tibble: 105 × 9
## dest avg_flights_per_day name lat lon alt tz dst tzone
## <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 ABQ 0.696 Albuquerque I… 35.0 -107. 5355 -7 A Amer…
## 2 ACK 0.726 Nantucket Mem 41.3 -70.1 48 -5 A Amer…
## 3 ALB 1.20 Albany Intl 42.7 -73.8 285 -5 A Amer…
## 4 ANC 0.0219 Ted Stevens A… 61.2 -150. 152 -9 A Amer…
## 5 ATL 47.2 Hartsfield Ja… 33.6 -84.4 1026 -5 A Amer…
## 6 AUS 6.68 Austin Bergst… 30.2 -97.7 542 -6 A Amer…
## 7 AVL 0.753 Asheville Reg… 35.4 -82.5 2165 -5 A Amer…
## 8 BDL 1.21 Bradley Intl 41.9 -72.7 173 -5 A Amer…
## 9 BGR 1.03 Bangor Intl 44.8 -68.8 192 -5 A Amer…
## 10 BHM 0.814 Birmingham In… 33.6 -86.8 644 -6 A Amer…
## # ℹ 95 more rows
ggplot(flights_per_day, aes(x = lon, y = lat)) +
borders("state") +
geom_point(aes(color = avg_flights_per_day)) +
scale_color_gradient( low = "palegreen", high = "rosybrown1") +
coord_quickmap() +
labs(title = "How Many Flights Departs From NYC?",
x = "longitude",
y = "latitude") +
theme(plot.title = element_text(color = "palegreen4"),
axis.title = element_text(color = "rosybrown4")) +
my_theme