This dataset shows the relationship between year and miles flown
The dataset covers the years 1937 to 1960
2025-03-17
Print out the first 6 rows of the data
data("airmiles")
?airmiles # find out about the data
years <- 1937:1960 # the dataset contains the years from the range 1937 to 1960
df <- data.frame(years = years, miles = as.numeric(airmiles)) # create the dataframe for the airmiles
head(df)
##   years miles
## 1  1937   412
## 2  1938   480
## 3  1939   683
## 4  1940  1052
## 5  1941  1385
## 6  1942  1418
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     412    1580    6431   10528   17532   30514
Important notes:
- The minimum miles flown in a year was 412
- The maximum miles flown in a year was 30,514
- The mean miles flown per year across the dataset was 10,528
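The figures in the notes above can be reproduced with `summary()`. The code that produced the summary output is not shown in the report, so the sketch below is an assumption about how it was generated, not the original chunk. The names `miles_summary`, `min_miles`, `max_miles`, and `mean_miles` are introduced here for illustration only.

```r
# Rebuild the data frame used above (airmiles ships with base R)
data("airmiles")
df <- data.frame(years = 1937:1960, miles = as.numeric(airmiles))

# Minimum, quartiles, median, mean, and maximum of yearly miles flown
miles_summary <- summary(df$miles)
print(miles_summary)

# The three values called out in the notes, rounded to whole miles
min_miles  <- min(df$miles)
max_miles  <- max(df$miles)
mean_miles <- round(mean(df$miles))
```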
The simple linear regression model for this dataset:
\[ y = \beta_0 + \beta_1 x + \epsilon \]
The meaning of each term in the formula:
- \(x\) is the year
- \(y\) is the total miles flown
- \(\beta_0\) is the intercept, the predicted miles flown when the year is 0 (an extrapolation well outside the 1937 to 1960 range, so it has no practical meaning here)
- \(\beta_1\) is the slope, the expected change in miles flown for each one-year increase
- \(\epsilon\) is the random error, the difference between the actual miles flown and the value the regression line gives for that year
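The terms defined above can be estimated with `lm()`. This is a minimal sketch that assumes the model is fitted by ordinary least squares on the `df` data frame built earlier; the names `fit`, `beta_0`, and `beta_1` are illustrative and do not come from the original report.

```r
# Rebuild the data frame so the example runs on its own
data("airmiles")
df <- data.frame(years = 1937:1960, miles = as.numeric(airmiles))

# Fit miles = beta_0 + beta_1 * years by ordinary least squares
fit <- lm(miles ~ years, data = df)

# Extract the two estimated coefficients by name
beta_0 <- coef(fit)[["(Intercept)"]]  # estimated intercept (miles at year 0)
beta_1 <- coef(fit)[["years"]]        # estimated change in miles per year
print(coef(fit))
```

The estimated intercept comes out negative, which illustrates why the value at year 0 should not be read as a real quantity of miles flown.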
For example, if the predicted miles flown for a year were 500 but the actual miles flown were 700, the error for that year would be:
\[ \epsilon = 700 - 500 = 200 \]
A value of 200 means the prediction for that year was 200 miles too low
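The same actual-minus-predicted calculation can be applied to every year in the dataset. The sketch below assumes a least-squares fit with `lm()`; the column names `predicted` and `residual` and the variable `same_as_builtin` are introduced here for illustration.

```r
# Rebuild the data and the fitted model so the example runs on its own
data("airmiles")
df <- data.frame(years = 1937:1960, miles = as.numeric(airmiles))
fit <- lm(miles ~ years, data = df)

# Error for each year = actual miles - predicted miles
df$predicted <- as.numeric(fitted(fit))
df$residual  <- df$miles - df$predicted

# residuals() computes the same values directly from the model
same_as_builtin <- isTRUE(all.equal(df$residual, as.numeric(residuals(fit))))
head(df)
```

A positive residual means the line under-predicted that year, as in the 200-mile example, and a negative residual means it over-predicted.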
The blue line on the graph is the regression line, the straight line that best fits the data
The dots are the actual values from the dataset
## `geom_smooth()` using formula = 'y ~ x'
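The plot itself is not included in the text, so the following is only a sketch of how a graph like the one described could be drawn. It assumes the ggplot2 package is installed and that the line was drawn with `geom_smooth(method = "lm")`, which is consistent with the message shown above but is not confirmed by the report. The axis labels and the name `p` are illustrative.

```r
library(ggplot2)  # assumed to be installed; not part of base R

# Rebuild the data frame so the example runs on its own
data("airmiles")
df <- data.frame(years = 1937:1960, miles = as.numeric(airmiles))

p <- ggplot(df, aes(x = years, y = miles)) +
  geom_point() +                 # dots: actual values from the dataset
  geom_smooth(method = "lm") +   # regression line, blue by default
  labs(x = "Year", y = "Miles flown")
print(p)
```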