11-12-2023

Introduction

This presentation explores the trend in internet usage over time using the WWWusage dataset.

The WWWusage dataset, which ships with base R, is a time series of 100 observations recording the number of users connected to the Internet through a server each minute.

Linear regression is a statistical method for modeling the relationship between a dependent variable and one or more independent variables. Here the dependent variable is the number of users and the single independent variable is time.

Loading and Exploring the Dataset

data("WWWusage")
head(WWWusage)
## [1] 88 84 85 85 84 85
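
Before plotting, it can help to confirm what kind of object `WWWusage` is. The following is a minimal sketch using only base R; the variable names (`usage_class`, `n_obs`) are illustrative and do not come from the original slides.

```r
# Load the built-in dataset (part of the 'datasets' package shipped with R)
data("WWWusage")

# Inspect the object's type and size
usage_class <- class(WWWusage)   # class of the object
n_obs <- length(WWWusage)        # number of observations

print(usage_class)
print(n_obs)

# Five-number summary plus the mean of the user counts
summary(WWWusage)
```

The object is of class `ts` rather than a plain numeric vector, which matters when it is passed to plotting functions that expect numeric columns.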

Data Visualization Code

library(ggplot2)

# Convert the ts object to a plain numeric vector so ggplot2 can pick a scale without a warning
usage_data <- data.frame(time = seq_along(WWWusage), usage = as.numeric(WWWusage))
ggplot(usage_data, aes(x = time, y = usage)) +
  geom_line() +
  labs(title = "Internet Usage Over Time",
       x = "Time (minutes)",
       y = "Number of Users") +
  theme_minimal()

Data Visualization

[Figure: line plot of the number of users against time in minutes, titled "Internet Usage Over Time".]

Performing Linear Regression

## 
## Call:
## lm(formula = usage ~ time, data = usage_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -66.003 -23.686   7.702  29.579  62.146 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 105.8309     7.2395  14.618  < 2e-16 ***
## time          0.6188     0.1245   4.972 2.82e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 35.93 on 98 degrees of freedom
## Multiple R-squared:  0.2014, Adjusted R-squared:  0.1933 
## F-statistic: 24.72 on 1 and 98 DF,  p-value: 2.823e-06
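
The code that produced this summary is not shown on the slide. A minimal sketch that fits the same formula as the `Call:` line, assuming `usage_data` is built from a time index and the user counts, could look like this (the names `model` and `coefs` are illustrative):

```r
# Rebuild the data frame used throughout the slides
data("WWWusage")
usage_data <- data.frame(time = seq_along(WWWusage),
                         usage = as.numeric(WWWusage))

# Fit usage as a linear function of time (same formula as in the Call line)
model <- lm(usage ~ time, data = usage_data)
summary(model)

# Extract the estimated intercept and slope
coefs <- coef(model)
print(coefs)
```

Reading the output: the slope estimate of 0.6188 corresponds to roughly 0.62 additional connected users per minute on average, and the Multiple R-squared of 0.2014 indicates that a straight line accounts for only about 20% of the variance in usage.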

Regression Line on Plot

[Figure: plot of internet usage over time with a fitted regression line added by geom_smooth(), formula y ~ x.]
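
The plotting code for this slide is not shown. One plausible way to overlay a least-squares line, sketched here under the assumption that the base plot is the same line chart as before, is `geom_smooth(method = "lm")`. The `se = FALSE` argument and the plot title are illustrative choices, not taken from the original.

```r
library(ggplot2)

# Rebuild the data frame; as.numeric() drops the ts class
data("WWWusage")
usage_data <- data.frame(time = seq_along(WWWusage),
                         usage = as.numeric(WWWusage))

# Line plot of the series with a least-squares line overlaid;
# se = FALSE hides the confidence band (an illustrative choice)
p <- ggplot(usage_data, aes(x = time, y = usage)) +
  geom_line() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Internet Usage Over Time with Regression Line",
       x = "Time (minutes)",
       y = "Number of Users") +
  theme_minimal()

print(p)
```

Passing `formula = y ~ x` explicitly avoids the informational message that `geom_smooth()` prints when it has to choose the formula itself.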

Interactive 3D Plot

Mathematical Notation - Linear Regression Model

Simple linear regression model:

\[ Y = \beta_0 + \beta_1X + \epsilon \]

where \(Y\) is the dependent variable (here, the number of users), \(X\) is the independent variable (here, time in minutes), \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\epsilon\) is the error term.
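
To connect the notation to the data, the coefficients can be estimated by hand. The sketch below uses the standard closed-form least-squares estimators for simple linear regression (these formulas are not stated on the slide); the names `beta0_hat`, `beta1_hat`, and the choice of time = 50 are illustrative.

```r
data("WWWusage")
x <- seq_along(WWWusage)     # time index 1..n
y <- as.numeric(WWWusage)    # number of users

# Closed-form least-squares estimates for simple linear regression:
# slope = sum of cross-deviations / sum of squared x-deviations
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# intercept makes the fitted line pass through the point of means
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Fitted value at a chosen time point (time = 50 is an arbitrary illustration)
y_hat_50 <- beta0_hat + beta1_hat * 50

print(c(intercept = beta0_hat, slope = beta1_hat))
print(y_hat_50)
```

These estimates should agree with the `Estimate` column reported by `lm()` for the same data.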

Mathematical Notation - Correlation Coefficient

The Pearson correlation coefficient \(r\) of paired observations \((x_i, y_i)\), with sample means \(\bar{x}\) and \(\bar{y}\), is defined as:

\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \]
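
As a check on the formula, the following sketch computes \(r\) term by term for time and usage and compares it with R's built-in `cor()`. The variable names are illustrative.

```r
data("WWWusage")
x <- seq_along(WWWusage)     # time index 1..n
y <- as.numeric(WWWusage)    # number of users

# Deviations from the sample means
dx <- x - mean(x)
dy <- y - mean(y)

# Correlation computed directly from the definition
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in Pearson correlation for comparison
r_builtin <- cor(x, y)

# In simple linear regression, r squared equals the model's R-squared
r_squared <- r_manual^2

print(c(manual = r_manual, builtin = r_builtin, r_squared = r_squared))
```

For a regression with a single predictor, \(r^2\) equals the Multiple R-squared, so the reported value of 0.2014 corresponds to \(r \approx 0.449\), positive because the slope is positive.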