The “airquality” dataset built into base R contains daily air quality measurements for New York from May to September 1973, making it time-series data. It contains six variables: mean ozone (parts per billion), solar radiation (langleys), wind speed (mph), maximum daily temperature (F), month, and day of the month. Ozone concentration is the primary variable; higher ozone indicates worse air quality.
The “mtcars” dataset built into base R contains road-test results for cars, taken from the 1974 Motor Trend US magazine, and is cross-sectional. It contains 32 observations with 11 numeric variables for different characteristics such as weight (1000 lbs), number of cylinders, gross horsepower, rear axle ratio, etc. The main variable of interest is miles per gallon (mpg), as it represents fuel efficiency.
3a.
“airquality” is a time series dataset, as it contains daily observations over a specific period of time (May-September 1973).
data(airquality)
summary(airquality$Ozone)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1.00 18.00 31.50 42.13 63.25 168.00 37
data(airquality)
time <- 1:nrow(airquality)
plot(time, airquality$Ozone, type = "l", xlab = "Time (days)", ylab = "Ozone (ppb)", main = "1973 New York Ozone Levels Over Time")
New York's air quality is a single unit observed repeatedly over time, so the dataset follows one unit across time rather than comparing multiple units. If it instead followed the same set of cities over multiple time periods, it would be panel data; if it drew a different sample of cities in each period, it would be a pooled cross section.
ggplot2 attempt
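library(ggplot2)
data(airquality)
# Store the day index as a column so aes() finds it in the data frame
airquality$time <- seq_len(nrow(airquality))
ggplot(airquality, aes(x = time, y = Ozone)) +
  geom_line() +
  labs(title = "1973 New York Ozone Levels Over Time", x = "Time (days)", y = "Ozone (ppb)")
Ozone concentration peaks in the later part of the series, in late August. This makes sense, since ozone tends to rise during the hotter summer months.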
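The timing of the peak can be checked directly instead of being read off the plot. The following is a minimal sketch using only base R; the name `peak_row` is introduced here for illustration and is not part of the original analysis.

```r
data(airquality)

# Row index of the highest ozone reading (which.max discards NA values)
peak_row <- which.max(airquality$Ozone)

# Ozone value, month, and day of the month for that reading
airquality[peak_row, c("Ozone", "Month", "Day")]

# Mean ozone by month, ignoring missing readings
tapply(airquality$Ozone, airquality$Month, mean, na.rm = TRUE)
```

The monthly means give a rough check on whether the late-summer peak reflects a general seasonal rise or a single extreme day.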
3b.
“mtcars” contains cross-sectional data because it measures individual cars at a single point in time.
data(mtcars)
summary(mtcars)
mpg cyl disp hp
Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
Median :19.20 Median :6.000 Median :196.3 Median :123.0
Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
drat wt qsec vs
Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
Median :3.695 Median :3.325 Median :17.71 Median :0.0000
Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
am gear carb
Min. :0.0000 Min. :3.000 Min. :1.000
1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
Median :0.0000 Median :4.000 Median :2.000
Mean :0.4062 Mean :3.688 Mean :2.812
3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
Max. :1.0000 Max. :5.000 Max. :8.000
data(mtcars)
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per Gallon", main = "Cars MPG vs Weight")
Each point on the scatterplot represents an individual observation, so the fuel efficiency of each car can be compared. As weight increases, miles per gallon tends to decrease, aligning with the intuition that heavier cars are less fuel efficient.
ggplot2 attempt
library(ggplot2)
data(mtcars)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Cars MPG vs Weight", x = "Weight (1000 lbs)", y = "Miles per Gallon")
The scatterplot shows the relationship between a car's weight and its mpg. Since the observations do not span different time periods, the data is cross-sectional.
II Slope Parameter
1a. Covariance measures how two variables move together: whether above-average values of one tend to occur with above-average values of the other (positive covariance) or with below-average values (negative covariance). It describes association, not the effect of changing one variable on the other. In the mtcars example, the covariance between mpg and weight is negative, because heavier cars tend to have lower mpg.
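As a rough illustration, the sign of the covariance between mpg and weight can be computed in base R, both with `cov()` and from the definition. The variable names `cov_mpg_wt` and `cov_manual` are introduced here for illustration only.

```r
data(mtcars)

# Sample covariance between fuel efficiency and weight
cov_mpg_wt <- cov(mtcars$mpg, mtcars$wt)

# Same quantity from the definition:
# sum of products of deviations from the means, divided by (n - 1)
n <- nrow(mtcars)
cov_manual <- sum((mtcars$mpg - mean(mtcars$mpg)) *
                  (mtcars$wt - mean(mtcars$wt))) / (n - 1)

cov_mpg_wt                         # negative: heavier cars tend to have lower mpg
all.equal(cov_mpg_wt, cov_manual)  # the two calculations agree
```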
1b. Variance measures how far a variable's values typically lie from its mean; it is the average squared deviation from the mean. Graphically, it is the spread of the points around the mean, not around the best-fit line. If there is high variance, the values in a dataset are more spread out.
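The definition can be sketched for car weight in mtcars, assuming the sample variance (divisor n - 1), which is what base R's `var()` computes. The names `var_manual` and `var_builtin` are illustrative.

```r
data(mtcars)
wt <- mtcars$wt

# Sample variance from the definition:
# sum of squared deviations from the mean, divided by (n - 1)
var_manual <- sum((wt - mean(wt))^2) / (length(wt) - 1)

# Built-in equivalent
var_builtin <- var(wt)

all.equal(var_manual, var_builtin)  # the two calculations agree
```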
2. Dividing the covariance of y and x by the variance of x gives the slope coefficient from a simple linear regression of y on x. The slope represents the average change in y associated with a one-unit increase in x. Covariance captures the direction of the relationship, but its magnitude depends on the units and variability of x, so it is essentially an unscaled version of the slope. Dividing by the variance of x acts as a normalizing factor: the result is measured in units of y per unit of x.
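This relationship can be checked with mtcars, taking mpg as y and weight as x. The sketch below compares the ratio against the slope from `lm()`, then rescales weight from 1000 lbs to lbs to show how the covariance and the slope respond to a change of units. The names `slope_ratio`, `slope_lm`, `wt_lbs`, and `slope_lbs` are introduced for illustration.

```r
data(mtcars)

# Slope as covariance of (y, x) divided by variance of x
slope_ratio <- cov(mtcars$mpg, mtcars$wt) / var(mtcars$wt)

# Slope from a simple linear regression of mpg on weight
fit <- lm(mpg ~ wt, data = mtcars)
slope_lm <- unname(coef(fit)["wt"])

all.equal(slope_ratio, slope_lm)  # the two slopes agree

# Rescale weight from 1000 lbs to lbs
wt_lbs <- mtcars$wt * 1000

# The covariance grows by a factor of 1000 with the units of x ...
cov(mtcars$mpg, wt_lbs) / cov(mtcars$mpg, mtcars$wt)

# ... while the slope becomes mpg per lb, i.e. 1000 times smaller
slope_lbs <- cov(mtcars$mpg, wt_lbs) / var(wt_lbs)
all.equal(slope_lbs * 1000, slope_ratio)
```

The rescaling step shows why covariance alone is hard to interpret: its size changes with the units of x, whereas the slope always reads as change in y per one unit of x in whatever units are used.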