discussion_week11

# load required packages
library(ggplot2)
library(dplyr)

# load airquality data
data("airquality")

# fit a linear regression model
lm_model <- lm(Ozone ~ Solar.R, data = airquality)

# print summary of the model
summary(lm_model)

## 
## Call:
## lm(formula = Ozone ~ Solar.R, data = airquality)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -48.292 -21.361  -8.864  16.373 119.136 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 18.59873    6.74790   2.756 0.006856 ** 
## Solar.R      0.12717    0.03278   3.880 0.000179 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 31.33 on 109 degrees of freedom
##   (42 observations deleted due to missingness)
## Multiple R-squared:  0.1213, Adjusted R-squared:  0.1133 
## F-statistic: 15.05 on 1 and 109 DF,  p-value: 0.0001793
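The summary() printout is meant for reading. The same quantities can also be pulled out of the fitted object for further use. Below is a minimal sketch using base R's coef(), confint(), and predict(). The Solar.R value of 200 langleys is an arbitrary illustration and was not chosen in the original analysis.

```r
# refit the same model so this chunk runs on its own
data("airquality")
lm_model <- lm(Ozone ~ Solar.R, data = airquality)

# estimated intercept (ppb) and slope (ppb per langley)
est <- coef(lm_model)
print(est)

# 95% confidence intervals for both coefficients
ci <- confint(lm_model, level = 0.95)
print(ci)

# predicted mean ozone at a hypothetical solar radiation of 200 langleys,
# with a 95% confidence interval for that mean
new_obs <- data.frame(Solar.R = 200)
pred <- predict(lm_model, newdata = new_obs, interval = "confidence")
print(pred)
```

The slope's confidence interval lies entirely above zero. This agrees with the small p-value for Solar.R in the coefficient table.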

# scatter plot of ozone vs. solar radiation with the fitted least-squares line
ggplot(airquality, aes(x = Solar.R, y = Ozone)) +
  geom_point() +                       # scatter plot
  geom_smooth(method = "lm", se = FALSE) +   # regression line
  labs(title = "Air Quality in New York City (May-September 1973)",
       x = "Solar Radiation (lang)", y = "Ozone Concentration (ppb)") +
  theme_bw()  # white background

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 42 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 42 rows containing missing values (`geom_point()`).
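Both plotting warnings have the same cause as the "(42 observations deleted due to missingness)" line in the model summary: rows where Ozone or Solar.R is NA. The sketch below counts those rows with base R. It also shows that the residual degrees of freedom equal the number of complete rows minus the two estimated coefficients.

```r
data("airquality")
vars <- airquality[, c("Ozone", "Solar.R")]

# number of missing values in each column
na_per_column <- colSums(is.na(vars))
print(na_per_column)

# rows usable by lm() and ggplot2 are those with both values present
n_complete <- sum(complete.cases(vars))
n_dropped <- nrow(vars) - n_complete
cat("complete rows:", n_complete, " dropped rows:", n_dropped, "\n")

# residual df = complete rows minus the 2 estimated coefficients
cat("residual df:", n_complete - 2, "\n")
```

The two per-column counts add up to more than the number of dropped rows. This is because a couple of rows are missing both values.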

# calculate the average ozone concentration for each month
# (airquality has one reading per day, so the daily readings are averaged by month)
monthly_avg <- airquality %>%
  mutate(Month_Name = factor(month.abb[Month], levels = month.abb[5:9])) %>%  # May..Sep in calendar order
  group_by(Month_Name) %>%
  summarize(Avg_Ozone = mean(Ozone, na.rm = TRUE))  # skip days with missing ozone

# create a bar graph of the monthly average ozone concentration
ggplot(monthly_avg, aes(x = Month_Name, y = Avg_Ozone)) +
  geom_col() +   # bar height = the computed average, not a stacked sum of daily values
  labs(title = "Monthly Average Ozone Concentration in New York City (May-September 1973)",
       x = "Month", y = "Average Ozone Concentration (ppb)") +
  theme_bw()  # white background
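Because airquality holds exactly one reading per calendar day, grouping by date leaves each value unchanged. An average only summarizes the data once days are pooled into a coarser unit such as the month. The following quick cross-check uses base R's tapply() instead of dplyr. The month labels rely on R's built-in English constant month.abb.

```r
data("airquality")

# build a date for every row and confirm that no date repeats
dates <- as.Date(paste(1973, airquality$Month, airquality$Day, sep = "-"))
cat("rows:", nrow(airquality), " duplicated dates:", sum(duplicated(dates)), "\n")

# mean ozone for each month (5 = May ... 9 = September), ignoring NAs
means_by_num <- tapply(airquality$Ozone, airquality$Month, mean, na.rm = TRUE)
monthly_means <- setNames(as.numeric(means_by_num),
                          month.abb[as.integer(names(means_by_num))])
print(round(monthly_means, 1))
```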

In conclusion, this regression analysis of the airquality dataset in R demonstrates the basic principles and usefulness of linear regression. Fitting a simple linear model of Ozone on Solar.R quantifies the relationship between a predictor and a response variable. Each additional langley of solar radiation is associated with an increase of about 0.127 ppb in mean ozone (p = 0.00018). The summary() function reports the key statistics needed to interpret the model, such as the coefficients, their standard errors, and the R-squared value. Here, the multiple R-squared of 0.121 shows that solar radiation alone explains only about 12% of the variation in ozone, leaving most of it unexplained. In addition, 42 observations with missing values were dropped from the fit. Ultimately, regression analysis is a powerful tool for data analysis and modeling, and it is widely used in many fields, including economics, the social sciences, and engineering.
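The following sketch makes two of the statistics reported by summary() less of a black box by recomputing them by hand. The t value is each estimate divided by its standard error. The multiple R-squared is one minus the ratio of the residual sum of squares to the total sum of squares, computed only over the rows actually used in the fit. Both results are then checked against summary().

```r
data("airquality")
lm_model <- lm(Ozone ~ Solar.R, data = airquality)
fit_summary <- summary(lm_model)

# coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(fit_summary)
t_manual <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

# R-squared from the response values used in the fit (rows with NAs already dropped)
y <- model.response(model.frame(lm_model))
ss_res <- sum(residuals(lm_model)^2)
ss_tot <- sum((y - mean(y))^2)
r2_manual <- 1 - ss_res / ss_tot

print(cbind(t_manual, t_summary = coef_table[, "t value"]))
cat("R-squared (manual):", r2_manual, " (summary):", fit_summary$r.squared, "\n")
```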


karmaGyatso

2023-04-12