Load the CSV file FlightsWithAirlines.csv containing composite flight and airline information objects into a data frame called df.flights. Do not load the text (strings) attributes (columns) as factors, so use stringsAsFactors = FALSE as a parameter in your function that loads the data; load them as text. Do not add any path as part of the file name or in your code. Load the file from its URL – do not load from your computer. To get the URL for the CSV, right-click on the link and then select “Copy Link Address” or a similar menu option for your browser; do not click on the link as that will cause the browser to attempt to download and display the file. Do not echo the code into your knitted document.
df.flights <- read.csv(url("https://s3.us-east-2.amazonaws.com/artificium.us/datasets/FlightsWithAirlines.csv"), stringsAsFactors = FALSE)
Note: with R 4.0.0 and later, read.csv has been modified so that stringsAsFactors = FALSE is a default value, making string columns load automatically as text and not factors, but it is still good practice to include it for code clarity and compatibility, particularly if code is run with older versions of R!
Produce the following markdown in your document and embed code within the markdown so it is dynamic. Do not echo the code into the knitted document that produces the markdown; we just want the text (this is the essence of “literate programming”). You may add any additional code chunks to make calculations, but do not echo them in your knitted document. The highlighted numbers are calculated (you cannot “hard code” them as the data might change the next time you knit).You do not need to highlight them in your markdown, although you may.
# Calculate basic dataset properties
num_rows <- nrow(df.flights) # Get total number of rows
num_cols <- ncol(df.flights) # Get total number of columns
# Calculate statistics on flight delays
avg_delay <-round(mean(df.flights$dep_delay, na.rm=TRUE)) # Calculate mean delay, rounded to nearest min (to a whole number)
quantity_delayed_flights <- sum(df.flights$dep_delay > 60, na.rm=TRUE) # Count flights with delays longer than 60 minutes
The dataset contains 18 observations (rows) and 14 columns. Each row is information about a flight, including departure and departure delay information. The average mean departure delay is 70 minutes. A total of 3 flights experienced a delay of more than an hour.
Add a new column ‘isNight’ to the data frame with a value of T or F, depending whether the actual departure time was at night. Set to “T” if the time was after 7pm and before 5am. The time in the data file is in 24-hour form. Hint: Look up how to use the ifelse function. Use the dep_hr, dep_min, and dep_delay columns. You may simplify the question to only use the dep_hr column, but you will only get 50% of the credit
# Step 1: Convert scheduled departure times to total minutes since midnight so factors have same units for calculation
total_minutes_scheduled <- df.flights$dep_hr * 60 + df.flights$dep_min # Multiply hours by 60 to get minutes then add departure minutes
# Step 2: Add delay to total scheduled minutes since midnight to get actual departure time in minutes
total_minutes_actual <- total_minutes_scheduled + df.flights$dep_delay # Scheduled time plus delay time equals actual time of departure in minutes
# Step 3: Ensure time is expressed as minutes within a 24-hour day (0-1439). Use modulo 1440 to manage times past midnight, for example, 1455 minutes (12:15 AM the next day) becomes 0:15 for determining day/night flight, and the day when actual time > 1439, just identifying day or night.
total_minutes_actual <- total_minutes_actual %% 1440
# Step 4: Extract hour component from total minutes for night identification.
actual_hours <- total_minutes_actual %/% 60 # Use integer division to ensure whole number
# Step 5: Create isNight column in df.flights based on the actual hours calculation in step 4. Night is defined as 1900-0500 (7pm-5am)
df.flights$isNight <- ifelse(actual_hours >=19 | actual_hours < 5, TRUE, FALSE) # this vectorized conditional function returns TRUE if the hour is greater than or equal to 19, and less than 5, and returns FALSE if conditions are not met
And finally, print the dataframe to ensure the new column is there and is correct. You may just use the print() function or experiment with the kable() function.
#Printing the dataframe with kable() yields a more visually appealing table with proper column headers
library(knitr) # Load the knitr library to use kable()
kable(df.flights, caption = "Flight Data with Night Status")
| year | month | day | dep_hr | dep_min | dep_delay | carrier | airline | country | flight | equip | tailnum | origin | dest | isNight |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2022 | 9 | 18 | 2 | 38 | 0 | UA | United | USA | 1545 | B737-8 | N14228 | EWR | IAH | TRUE |
| 2022 | 2 | 14 | 12 | 25 | 13 | AA | American Airlines | USA | 441 | B777 | N24211 | MIA | GRU | FALSE |
| 2022 | 1 | 7 | 13 | 56 | 9 | AA | American Airlines | USA | 1141 | A321 | N619AA | JFK | MIA | FALSE |
| 2022 | 6 | 7 | 22 | 56 | 7 | B6 | JetBlue | USA | 725 | A321 | N804JB | JFK | BQN | TRUE |
| 2022 | 1 | 30 | 24 | 2 | 14 | DL | Delta Airlines | USA | 461 | B757-2 | N668DN | LGA | ATL | TRUE |
| 2022 | 5 | 2 | 11 | 2 | 49 | UA | United | USA | 1696 | B737-MAX | N39463 | EWR | ORD | FALSE |
| 2022 | 5 | 5 | 5 | 50 | 13 | B6 | JetBlue | USA | 507 | A321 | N516JB | EWR | FLL | FALSE |
| 2022 | 10 | 24 | 20 | 23 | 50 | NK | Spirit Airways | USA | 5708 | A321 | N829AS | BOS | PBI | TRUE |
| 2022 | 6 | 16 | 5 | 31 | 30 | LH | Lufthansa | Germany | 411 | B747-4 | N593JB | CLT | MUC | FALSE |
| 2022 | 8 | 18 | 2 | 38 | 20 | UA | United | USA | 1545 | B737-8 | N14228 | EWR | IAH | TRUE |
| 2022 | 3 | 14 | 12 | 25 | 91 | AA | American Airlines | USA | 441 | B777 | N24211 | MIA | GRU | FALSE |
| 2022 | 2 | 7 | 13 | 56 | 0 | AA | American Airlines | USA | 1141 | A321 | N619AA | JFK | MIA | FALSE |
| 2022 | 5 | 7 | 22 | 56 | 2 | B6 | JetBlue | USA | 725 | A321 | N804JB | JFK | BQN | TRUE |
| 2023 | 2 | 30 | 24 | 2 | 0 | DL | Delta Airlines | USA | 461 | B757-2 | N668DN | LGA | ATL | TRUE |
| 2023 | 6 | 2 | 11 | 2 | 68 | UA | United | USA | 1696 | B737-MAX | N39463 | EWR | ORD | FALSE |
| 2023 | 6 | 5 | 5 | 50 | 850 | B6 | JetBlue | USA | 507 | A321 | N516JB | EWR | FLL | TRUE |
| 2023 | 11 | 24 | 20 | 23 | 20 | NK | Spirit Airways | USA | 5708 | A321 | N829AS | BOS | PBI | TRUE |
| 2024 | 7 | 16 | 5 | 31 | 18 | LH | Lufthansa | Germany | 411 | B747-4 | N593JB | CLT | MUC | FALSE |