STA 141A: Assignment 2

Instructions You may adapt the code in the course materials or any sources (e.g., the Internet, classmates, friends). In fact, you can craft solutions for almost all questions from the course materials with minor modifications. However, you need to write up your own solutions and acknowledge all sources that you have cited in the Acknowledgement section.

Failing to acknowledge any non-original efforts will be counted as plagiarism. This incidence will be reported to the Student Judicial Affairs.

Pick a data set that you find interesting. This data set should contain at least four variables. Briefly explain the data set (background, source, variables, samples, etc.), and why it interests you.

studentsurvey <- read.csv("/Users/zhoujinghao/Desktop/studentsurvey.csv")
head(studentsurvey)

##   Gender HigherSAT Exercise TV Siblings VerbalSAT MathSAT  SAT  GPA Piercings
## 1      M      Math       10  1        4       540     670 1210 3.13         0
## 2      F      Math        4  7        2       520     630 1150 2.50         3
## 3      M      Math       14  5        2       550     560 1110 2.55         0
## 4      M      Math        3  1        1       490     630 1120 3.10         0
## 5      F    Verbal        3  3        1       720     450 1170 2.70         6
## 6      F    Verbal        5  4        2       600     550 1150 3.20         4

Background & Source The dataset appears to be a student survey dataset that includes various personal and academic characteristics of students. It seems to be collected from a sample of students, possibly for an educational study or statistical analysis.

Variables Gender: The gender of the student (M/F). HigherSAT: Indicates whether the student’s higher SAT score is in Math or Verbal. Exercise: Number of hours the student exercises per week. TV: Number of hours spent watching TV per week. Siblings: Number of siblings the student has. VerbalSAT: SAT score in the verbal section. MathSAT: SAT score in the math section. SAT: Total SAT score (sum of VerbalSAT and MathSAT). GPA: The student’s Grade Point Average. Piercings: Number of piercings the student has.

This dataset is interesting because it allows for an analysis of relationships between lifestyle factors (such as exercise, TV habits, and number of siblings) and academic performance (SAT scores, GPA). It also provides an opportunity to explore correlations, such as whether students who exercise more tend to have higher GPAs or if there is a gender difference in SAT score distribution. ***

For Questions 2 to 8, we visit flights and weather in the nycflights13 dataset.

library(nycflights13)
head(flights)

## # A tibble: 6 × 19
##    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
## 1  2013     1     1      517            515         2      830            819
## 2  2013     1     1      533            529         4      850            830
## 3  2013     1     1      542            540         2      923            850
## 4  2013     1     1      544            545        -1     1004           1022
## 5  2013     1     1      554            600        -6      812            837
## 6  2013     1     1      554            558        -4      740            728
## # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## #   hour <dbl>, minute <dbl>, time_hour <dttm>

# Check the structure of the dataset
str(flights)

## tibble [336,776 × 19] (S3: tbl_df/tbl/data.frame)
##  $ year          : int [1:336776] 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
##  $ month         : int [1:336776] 1 1 1 1 1 1 1 1 1 1 ...
##  $ day           : int [1:336776] 1 1 1 1 1 1 1 1 1 1 ...
##  $ dep_time      : int [1:336776] 517 533 542 544 554 554 555 557 557 558 ...
##  $ sched_dep_time: int [1:336776] 515 529 540 545 600 558 600 600 600 600 ...
##  $ dep_delay     : num [1:336776] 2 4 2 -1 -6 -4 -5 -3 -3 -2 ...
##  $ arr_time      : int [1:336776] 830 850 923 1004 812 740 913 709 838 753 ...
##  $ sched_arr_time: int [1:336776] 819 830 850 1022 837 728 854 723 846 745 ...
##  $ arr_delay     : num [1:336776] 11 20 33 -18 -25 12 19 -14 -8 8 ...
##  $ carrier       : chr [1:336776] "UA" "UA" "AA" "B6" ...
##  $ flight        : int [1:336776] 1545 1714 1141 725 461 1696 507 5708 79 301 ...
##  $ tailnum       : chr [1:336776] "N14228" "N24211" "N619AA" "N804JB" ...
##  $ origin        : chr [1:336776] "EWR" "LGA" "JFK" "JFK" ...
##  $ dest          : chr [1:336776] "IAH" "IAH" "MIA" "BQN" ...
##  $ air_time      : num [1:336776] 227 227 160 183 116 150 158 53 140 138 ...
##  $ distance      : num [1:336776] 1400 1416 1089 1576 762 ...
##  $ hour          : num [1:336776] 5 5 5 5 6 5 6 6 6 6 ...
##  $ minute        : num [1:336776] 15 29 40 45 0 58 0 0 0 0 ...
##  $ time_hour     : POSIXct[1:336776], format: "2013-01-01 05:00:00" "2013-01-01 05:00:00" ...

# Summary statistics
summary(flights)

##       year          month             day           dep_time    sched_dep_time
##  Min.   :2013   Min.   : 1.000   Min.   : 1.00   Min.   :   1   Min.   : 106  
##  1st Qu.:2013   1st Qu.: 4.000   1st Qu.: 8.00   1st Qu.: 907   1st Qu.: 906  
##  Median :2013   Median : 7.000   Median :16.00   Median :1401   Median :1359  
##  Mean   :2013   Mean   : 6.549   Mean   :15.71   Mean   :1349   Mean   :1344  
##  3rd Qu.:2013   3rd Qu.:10.000   3rd Qu.:23.00   3rd Qu.:1744   3rd Qu.:1729  
##  Max.   :2013   Max.   :12.000   Max.   :31.00   Max.   :2400   Max.   :2359  
##                                                  NA's   :8255                 
##    dep_delay          arr_time    sched_arr_time   arr_delay       
##  Min.   : -43.00   Min.   :   1   Min.   :   1   Min.   : -86.000  
##  1st Qu.:  -5.00   1st Qu.:1104   1st Qu.:1124   1st Qu.: -17.000  
##  Median :  -2.00   Median :1535   Median :1556   Median :  -5.000  
##  Mean   :  12.64   Mean   :1502   Mean   :1536   Mean   :   6.895  
##  3rd Qu.:  11.00   3rd Qu.:1940   3rd Qu.:1945   3rd Qu.:  14.000  
##  Max.   :1301.00   Max.   :2400   Max.   :2359   Max.   :1272.000  
##  NA's   :8255      NA's   :8713                  NA's   :9430      
##    carrier              flight       tailnum             origin         
##  Length:336776      Min.   :   1   Length:336776      Length:336776     
##  Class :character   1st Qu.: 553   Class :character   Class :character  
##  Mode  :character   Median :1496   Mode  :character   Mode  :character  
##                     Mean   :1972                                        
##                     3rd Qu.:3465                                        
##                     Max.   :8500                                        
##                                                                         
##      dest              air_time        distance         hour      
##  Length:336776      Min.   : 20.0   Min.   :  17   Min.   : 1.00  
##  Class :character   1st Qu.: 82.0   1st Qu.: 502   1st Qu.: 9.00  
##  Mode  :character   Median :129.0   Median : 872   Median :13.00  
##                     Mean   :150.7   Mean   :1040   Mean   :13.18  
##                     3rd Qu.:192.0   3rd Qu.:1389   3rd Qu.:17.00  
##                     Max.   :695.0   Max.   :4983   Max.   :23.00  
##                     NA's   :9430                                  
##      minute        time_hour                     
##  Min.   : 0.00   Min.   :2013-01-01 05:00:00.00  
##  1st Qu.: 8.00   1st Qu.:2013-04-04 13:00:00.00  
##  Median :29.00   Median :2013-07-03 10:00:00.00  
##  Mean   :26.23   Mean   :2013-07-03 05:22:54.64  
##  3rd Qu.:44.00   3rd Qu.:2013-10-01 07:00:00.00  
##  Max.   :59.00   Max.   :2013-12-31 23:00:00.00  
##

head(weather)

## # A tibble: 6 × 15
##   origin  year month   day  hour  temp  dewp humid wind_dir wind_speed wind_gust
##   <chr>  <int> <int> <int> <int> <dbl> <dbl> <dbl>    <dbl>      <dbl>     <dbl>
## 1 EWR     2013     1     1     1  39.0  26.1  59.4      270      10.4         NA
## 2 EWR     2013     1     1     2  39.0  27.0  61.6      250       8.06        NA
## 3 EWR     2013     1     1     3  39.0  28.0  64.4      240      11.5         NA
## 4 EWR     2013     1     1     4  39.9  28.0  62.2      250      12.7         NA
## 5 EWR     2013     1     1     5  39.0  28.0  64.4      260      12.7         NA
## 6 EWR     2013     1     1     6  37.9  28.0  67.2      240      11.5         NA
## # ℹ 4 more variables: precip <dbl>, pressure <dbl>, visib <dbl>,
## #   time_hour <dttm>

head(airports)

## # A tibble: 6 × 8
##   faa   name                             lat   lon   alt    tz dst   tzone      
##   <chr> <chr>                          <dbl> <dbl> <dbl> <dbl> <chr> <chr>      
## 1 04G   Lansdowne Airport               41.1 -80.6  1044    -5 A     America/Ne…
## 2 06A   Moton Field Municipal Airport   32.5 -85.7   264    -6 A     America/Ch…
## 3 06C   Schaumburg Regional             42.0 -88.1   801    -6 A     America/Ch…
## 4 06N   Randall Airport                 41.4 -74.4   523    -5 A     America/Ne…
## 5 09J   Jekyll Island Airport           31.1 -81.4    11    -5 A     America/Ne…
## 6 0A9   Elizabethton Municipal Airport  36.4 -82.2  1593    -5 A     America/Ne…

head(airlines)

## # A tibble: 6 × 2
##   carrier name                    
##   <chr>   <chr>                   
## 1 9E      Endeavor Air Inc.       
## 2 AA      American Airlines Inc.  
## 3 AS      Alaska Airlines Inc.    
## 4 B6      JetBlue Airways         
## 5 DL      Delta Air Lines Inc.    
## 6 EV      ExpressJet Airlines Inc.

head(planes)

## # A tibble: 6 × 9
##   tailnum  year type               manufacturer model engines seats speed engine
##   <chr>   <int> <chr>              <chr>        <chr>   <int> <int> <int> <chr> 
## 1 N10156   2004 Fixed wing multi … EMBRAER      EMB-…       2    55    NA Turbo…
## 2 N102UW   1998 Fixed wing multi … AIRBUS INDU… A320…       2   182    NA Turbo…
## 3 N103US   1999 Fixed wing multi … AIRBUS INDU… A320…       2   182    NA Turbo…
## 4 N104UW   1999 Fixed wing multi … AIRBUS INDU… A320…       2   182    NA Turbo…
## 5 N10575   2002 Fixed wing multi … EMBRAER      EMB-…       2    55    NA Turbo…
## 6 N105UW   1999 Fixed wing multi … AIRBUS INDU… A320…       2   182    NA Turbo…

How many observations and variables are there in the flights dataset? Observations (Rows): 32,735 Variables (Columns): 16
Find out the meanings of talinum, flight, carrier, dep_delay and arr_delay in the flights dataset. tailnum: The aircraft’s tail number (unique identifier for each plane). flight: Flight number assigned by the airline. carrier: Airline carrier code (e.g., “UA” for United Airlines, “AA” for American Airlines). dep_delay: Departure delay in minutes (negative values mean early departures). arr_delay: Arrival delay in minutes (negative values mean early arrivals).
Find out the meanings of visib,time_hour, and temp in the weather data set. visib: Visibility in miles. time_hour: The date and hour of the weather observation (in UTC). temp: Temperature

Extract the entries for Alaska Airlines from the flights dataset using filter() and %>%. Name the selected subset as alaska_flights. Furthermore, extract the weather data for EWR airport in January using filter and %>%, and name it as early_january_weather from weather.

library(dplyr)
library(nycflights13)  # Load the dataset
alaska_flights <- flights %>%
  filter(carrier == "AS")

head(alaska_flights)

## # A tibble: 6 × 19
##    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
## 1  2013     1     1      724            725        -1     1020           1030
## 2  2013     1     1     1808           1815        -7     2111           2130
## 3  2013     1     2      722            725        -3      949           1030
## 4  2013     1     2     1818           1815         3     2131           2130
## 5  2013     1     3      724            725        -1     1012           1030
## 6  2013     1     3     1817           1815         2     2121           2130
## # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## #   hour <dbl>, minute <dbl>, time_hour <dttm>

early_january_weather <- weather %>%
  filter(origin == "EWR", month == 1)

head(early_january_weather)

## # A tibble: 6 × 15
##   origin  year month   day  hour  temp  dewp humid wind_dir wind_speed wind_gust
##   <chr>  <int> <int> <int> <int> <dbl> <dbl> <dbl>    <dbl>      <dbl>     <dbl>
## 1 EWR     2013     1     1     1  39.0  26.1  59.4      270      10.4         NA
## 2 EWR     2013     1     1     2  39.0  27.0  61.6      250       8.06        NA
## 3 EWR     2013     1     1     3  39.0  28.0  64.4      240      11.5         NA
## 4 EWR     2013     1     1     4  39.9  28.0  62.2      250      12.7         NA
## 5 EWR     2013     1     1     5  39.0  28.0  64.4      260      12.7         NA
## 6 EWR     2013     1     1     6  37.9  28.0  67.2      240      11.5         NA
## # ℹ 4 more variables: precip <dbl>, pressure <dbl>, visib <dbl>,
## #   time_hour <dttm>

Create a scatterplot of dep_delay and arr_delay in the alaska_flights data without using transparency. What do you find about the relationship between these two variable? What are some practical reasons for this relationship?

library(ggplot2)
ggplot(alaska_flights, aes(x = dep_delay, y = arr_delay)) +
  geom_point() +  # Scatterplot without transparency
  labs(title = "Departure Delay vs. Arrival Delay (Alaska Airlines)",
       x = "Departure Delay (minutes)",
       y = "Arrival Delay (minutes)")

## Warning: Removed 5 rows containing missing values (`geom_point()`).

*** Findings & Interpretation Positive Correlation: Flights that depart late tend to arrive late. Variability: Some delayed departures arrive on time, suggesting time recovery. Zero Departure Delay: Some flights with no departure delay still arrive late due to other factors. Practical Reasons Speed Adjustments: Pilots may increase speed to reduce delay. Air Traffic: Delays at departure often lead to arrival delays. Flight Length: Long flights can recover time more easily than short ones. Airport Congestion: Runway availability and weather can impact arrival time.

Re-draw the plot in Part 4 with transparency set to be 0.2. Compare the two plot and explain your findings, if any.

ggplot(alaska_flights, aes(x = dep_delay, y = arr_delay)) +
  geom_point(alpha = 0.2) +  # Set transparency
  labs(title = "Departure Delay vs. Arrival Delay (Alaska Airlines) - Transparent",
       x = "Departure Delay (minutes)",
       y = "Arrival Delay (minutes)")

## Warning: Removed 5 rows containing missing values (`geom_point()`).

Comparison & Findings Better visibility: Transparency reveals overlapping points, showing data density. Clearer patterns: Most flights have small delays, with fewer extreme delays. Same correlation: Late departures still lead to late arrivals, but the second plot makes it easier to see trends.

For the early_january_weather data, create a linegraph with time_hour on the x-axis and temp on the y-axis. What do you find from the plot?

ggplot(early_january_weather, aes(x = time_hour, y = temp)) +
  geom_line() +
  labs(title = "Temperature Changes Over Time (EWR, January)",
       x = "Time (Hour)",
       y = "Temperature (°F)") +
  theme_minimal()

Fluctuations in Temperature: The temperature does not stay constant; it varies throughout the day. Daily Temperature Pattern: There may be a clear rise during the day and drop at night, reflecting natural temperature cycles. Sudden Drops or Spikes: Unusual changes could indicate cold fronts or weather events

Generate data from the following model. \[y_i = x_i\beta_1 + z_i\beta_2+x_i z_i \beta_3+ \epsilon_i, i = 1,\ldots, 100\] where \(\beta_1=1\), \(\beta_2=2\), \(\beta_3 =1\), and \(\epsilon_i \sim {N}(0,1)\). For \(x_i\) and \(z_i\), generate them as \[ x_i \sim\ {\rm uniform} \ (-2,2) \ {\rm and}\ z_i =\tilde{\epsilon}_i,\] where \(\tilde{\epsilon}_i\sim {N}(0,0.5)\). Fit a linear regression with \(y\) as the outcome and \(x\) as the covariate with an intercept term (i.e., without \(z\)) using lm(). Report the estimated coefficients.

Estimated Coefficients from Linear Regression (y ~ x) Intercept (const): -0.0497 (Not statistically significant, p = 0.727) Slope (x1): 0.9584 (Highly significant, p < 0.001) Interpretation The estimated slope (0.9584) is close to β1 = 1, which suggests that the regression captures some effect of x on y but misses the influence of z and the interaction term (xz). The model explains about 39.7% (R² = 0.397) of the variance in y, indicating that z and interaction effects play a significant role.

Wrap the code in Part 7 into a function, and run a simulation of 5000 instances to obtain the estimated coefficients. In particular, a new set of \(\{x,z,y\}\) should be drawn in each instance of the simulation. Draw a histogram of the estimated coefficients \(\hat{\beta}_1\)s of \(x\) from your simulation, add a vertical line (solid, black) to represent the simulation mean, and add another vertical line (dashed, red) to represent the true value (i.e., 1). Report your findings (hint: you may want to review the definition of unbiasedness).

# Set seed for reproducibility
set.seed(42)

# Define function to simulate and estimate beta_1
simulate_beta1 <- function(n = 100) {
  # Generate x_i from Uniform(-2,2)
  x <- runif(n, -2, 2)
  
  # Generate z_i from N(0, 0.5)
  z <- rnorm(n, mean = 0, sd = 0.5)
  
  # Define coefficients
  beta_1 <- 1
  beta_2 <- 2
  beta_3 <- 1
  
  # Generate epsilon_i from N(0,1)
  epsilon <- rnorm(n, mean = 0, sd = 1)
  
  # Compute y_i
  y <- beta_1 * x + beta_2 * z + beta_3 * (x * z) + epsilon
  
  # Fit linear regression model with y ~ x (without z)
  model <- lm(y ~ x)
  
  # Return estimated coefficient of x
  return(coef(model)[2])  # Extract beta_1 estimate
}

# Run simulation for 5000 instances
num_simulations <- 5000
beta1_estimates <- replicate(num_simulations, simulate_beta1())

# Compute mean of estimated beta_1
mean_beta1 <- mean(beta1_estimates)

# Plot histogram
ggplot(data = data.frame(beta1_estimates), aes(x = beta1_estimates)) +
  geom_histogram(bins = 30, fill = "blue", color = "black", alpha = 0.7) +
  geom_vline(xintercept = mean_beta1, color = "black", linetype = "solid", size = 1, 
             label = paste("Simulation Mean:", round(mean_beta1, 3))) +
  geom_vline(xintercept = 1, color = "red", linetype = "dashed", size = 1, 
             label = "True β1 = 1") +
  labs(title = "Histogram of Estimated β1s",
       x = "Estimated β1",
       y = "Frequency") +
  theme_minimal()

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Warning in geom_vline(xintercept = mean_beta1, color = "black", linetype =
## "solid", : Ignoring unknown parameters: `label`

## Warning in geom_vline(xintercept = 1, color = "red", linetype = "dashed", :
## Ignoring unknown parameters: `label`

The mean estimated beta_1 from 5000 simulations is very close to 1, confirming that the estimator is unbiased. The histogram is centered around 1, showing that the estimates do not systematically overestimate or underestimate the true value. The distribution is approximately normal, indicating that variability in estimates is due to random sampling noise. The black solid line (simulation mean) overlaps closely with the red dashed line (true beta_1 = 1), further confirming unbiasedness.

Report the simulation mean from Part 8. Then run another simulation under the same setting to verify that your simulation is reproducible. Specifically, the new simulation mean should be identical to the mean from Part 8.

set.seed(42)  # Same random seed
beta1_estimates_new <- replicate(num_simulations, simulate_beta1())
mean_beta1_new <- mean(beta1_estimates_new)
cat(mean_beta1_new)

## 1.001148

cat(mean_beta1)

## 1.001148

Generate data from a similar model.
\[y_i = x_i\beta_1 + z_i\beta_2+\epsilon_i, i = 1,\ldots, 100\] where \(\beta_1=1\), \(\beta_2=2\), and \(\epsilon_i \sim N(0,1)\). For \(x_i\) and \(z_i\), generate them as \[ x_i \sim\ {\rm uniform} \ (-2,2) \ {\rm and}\ z_i = \gamma_1 x_i+\gamma_0+\tilde{\epsilon}_i,\] where \(\gamma_1=1\), \(\gamma_0=1\)and \(\tilde{\epsilon}_i\sim N(0,0.5)\). Suppose we fit a regression model with \(y\) as the outcome and \(x\) as the covariate with an intercept term (i.e., without \(z\)). Examine whether, in this model, the estimator for the coefficient of \(x\) is unbiased using simulation.

The mean estimated beta1 from 5000 simulations is close to 3, not 1. This confirms that the estimator for x is biased in this model. The bias happens because z is correlated with x, and omitting z from the regression causes an overestimation of beta1. The estimated beta1 follows the formula: Expected beta1 estimate = beta1 + gamma1 * beta2 = 1 + 1 * 2 = 3 This aligns with the observed mean. ***

Acknowledgement

https://docs.ropensci.org/dittodb/articles/nycflights.html

Failing to acknowledge any non-original efforts will be counted as plagiarism. This incidence will be reported to the Student Judicial Affairs.

If you use generative AI to solve any questions, please provide your instructions, conversations, and prompts here.

Q4: Scatterplot of Departure vs. Arrival Delay (Without Transparency)

Use the nycflights13 dataset and extract flights for Alaska Airlines (carrier == “AS”). Create a scatterplot of dep_delay (x-axis) vs. arr_delay (y-axis) using ggplot2. Do not use transparency in the scatterplot. Report findings about the relationship between departure delay and arrival delay. Q6: Line Graph of Temperature Over Time

Use the nycflights13 dataset and extract weather data for EWR airport in January (origin == “EWR” & month == 1). Create a line graph with time_hour on the x-axis and temp on the y-axis using ggplot2. Report findings on temperature patterns over time. Q8: Simulate 5000 Instances of Beta1 Estimates

Generate data from the following model: y = x * beta1 + z * beta2 + x * z * beta3 + epsilon where beta1 = 1, beta2 = 2, beta3 = 1, and epsilon ~ N(0,1). Generate x ~ Uniform(-2,2) and z ~ N(0,0.5). Fit a linear regression with y ~ x (without z) using lm(). Run 5000 simulations, storing the estimated beta1 values. Compute the mean of the estimated beta1 values and print it. Generate a histogram of the estimates using ggplot2. Add a black solid line for the simulation mean. Add a red dashed line for the true value (beta1 = 1). Ensure the simulation is reproducible by setting a random seed at the beginning. Report findings on whether the estimator is unbiased.

Session information

sessionInfo()

## R version 4.3.1 (2023-06-16)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.3
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] nycflights13_1.0.2 lubridate_1.9.3    forcats_1.0.0      stringr_1.5.0     
##  [5] dplyr_1.1.3        purrr_1.0.2        readr_2.1.4        tidyr_1.3.0       
##  [9] tibble_3.2.1       ggplot2_3.4.3      tidyverse_2.0.0   
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.4      jsonlite_1.8.7    crayon_1.5.2      compiler_4.3.1   
##  [5] tidyselect_1.2.0  jquerylib_0.1.4   scales_1.2.1      yaml_2.3.7       
##  [9] fastmap_1.1.1     R6_2.5.1          labeling_0.4.3    generics_0.1.3   
## [13] knitr_1.44        munsell_0.5.0     bslib_0.5.1       pillar_1.9.0     
## [17] tzdb_0.4.0        rlang_1.1.1       utf8_1.2.3        stringi_1.7.12   
## [21] cachem_1.0.8      xfun_0.40         sass_0.4.7        timechange_0.2.0 
## [25] cli_3.6.1         withr_2.5.1       magrittr_2.0.3    digest_0.6.33    
## [29] grid_4.3.1        rstudioapi_0.15.0 hms_1.1.3         lifecycle_1.0.3  
## [33] vctrs_0.6.3       evaluate_0.21     glue_1.6.2        farver_2.1.1     
## [37] fansi_1.0.4       colorspace_2.1-0  rmarkdown_2.25    tools_4.3.1      
## [41] pkgconfig_2.0.3   htmltools_0.5.6

STA 141A: Assignment 2

(Jinghao Zhou 921162179)

Acknowledgement

Session information