We are going to create a simple linear regression model. Let's get started by loading the required libraries.
library(datarium)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.6 v purrr 0.3.4
## v tibble 3.1.7 v dplyr 1.0.9
## v tidyr 1.2.0 v stringr 1.4.0
## v readr 2.1.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggplot2)
We will be using the marketing data set from the datarium package. Let's now load our data set. After loading it, we use head() to get a view of the first few rows.
data("marketing", package = "datarium")
head(marketing, 5)
## youtube facebook newspaper sales
## 1 276.12 45.36 83.04 26.52
## 2 53.40 47.16 54.12 12.48
## 3 20.64 55.08 83.16 11.16
## 4 181.80 49.56 70.20 22.20
## 5 216.96 12.96 70.08 15.48
From the data we loaded, we will use a linear model to describe the relationship between sales and youtube. We first visualize the data we will use to build the model, using the ggplot2 package.
ggplot(marketing, aes(x = youtube, y = sales)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
The plot shows the relationship between sales and youtube. One important assumption of linear regression is that the relationship between the outcome and predictor variables is linear and additive, so it is worth checking that the points follow a roughly straight trend.
Now let's check the correlation between our variables.
cor(marketing$sales, marketing$youtube)
## [1] 0.7822244
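To make the number above less of a black box, here is a minimal sketch of what cor() computes by default, the Pearson correlation coefficient. It uses only the five rows printed by head() earlier, typed in by hand, so the result will not match the 0.78 obtained on the full data set. The variable names are illustrative.

```r
# The five rows shown by head(marketing, 5), typed in by hand
youtube <- c(276.12, 53.40, 20.64, 181.80, 216.96)
sales   <- c(26.52, 12.48, 11.16, 22.20, 15.48)

# Deviations of each variable from its mean
dx <- youtube - mean(youtube)
dy <- sales - mean(sales)

# Pearson correlation: joint variation scaled by the spread of each variable
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# The built-in function should agree on the same five rows
r_builtin <- cor(youtube, sales)
print(c(manual = r_manual, builtin = r_builtin))
```

A value close to 1 or -1 indicates a strong linear association, while a value near 0 indicates little linear association.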
Now let's create our linear regression model.
model <- lm(sales ~ youtube, data = marketing)
model
##
## Call:
## lm(formula = sales ~ youtube, data = marketing)
##
## Coefficients:
## (Intercept) youtube
## 8.43911 0.04754
Simple linear regression tries to find the best line to predict sales on the basis of the youtube advertising budget. The linear model equation can be written as follows: sales = b0 + b1 * youtube, where b0 is the intercept and b1 is the slope. The R function lm() determines these beta coefficients. The results above show the intercept and the beta coefficient for the youtube variable, so the estimated regression line equation can be written as follows: sales = 8.44 + 0.048 * youtube.
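As a rough illustration of how the estimated equation is used, the sketch below plugs a few youtube budgets into the regression line. The coefficients are copied by hand from the printed lm() output, and predict_sales() is a hypothetical helper written for this example, not part of any package.

```r
# Coefficients copied from the printed lm() output
b0 <- 8.43911   # intercept: expected sales when the youtube budget is 0
b1 <- 0.04754   # slope: change in sales per one-unit increase in youtube

# Hypothetical helper: apply the estimated regression line
predict_sales <- function(youtube) {
  b0 + b1 * youtube
}

# Budgets to try; 276.12 is the youtube value in the first row shown by head()
budgets <- c(0, 100, 276.12)
predicted <- predict_sales(budgets)
print(data.frame(youtube = budgets, predicted_sales = predicted))
```

For the first row of the data the line predicts sales of about 21.57, against an observed value of 26.52. The gap between the two is that observation's residual. With the fitted model object available, predict(model, newdata = data.frame(youtube = budgets)) should give essentially the same values, up to rounding of the coefficients.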
#Let's add the regression line to the scatter plot
ggplot(marketing, aes(x = youtube, y = sales)) +
  geom_point() +
  stat_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'
#Let's assess our model
summary(model)
##
## Call:
## lm(formula = sales ~ youtube, data = marketing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.0632 -2.3454 -0.2295 2.4805 8.6548
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.439112 0.549412 15.36 <2e-16 ***
## youtube 0.047537 0.002691 17.67 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.91 on 198 degrees of freedom
## Multiple R-squared: 0.6119, Adjusted R-squared: 0.6099
## F-statistic: 312.1 on 1 and 198 DF, p-value: < 2.2e-16
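To help read the summary, the sketch below recomputes a few of its entries from numbers already printed in this post. All values are copied by hand from the summary() and cor() output, so small rounding differences are expected. The 95% confidence level is an assumption chosen for illustration.

```r
# Values copied from the summary(model) and cor() output
estimate  <- 0.047537   # slope for youtube
std_error <- 0.002691   # standard error of the slope
df_resid  <- 198        # residual degrees of freedom
r         <- 0.7822244  # correlation between sales and youtube

# t value: the estimate divided by its standard error
t_value <- estimate / std_error

# In simple linear regression, R-squared equals the squared correlation
r_squared <- r^2

# With a single predictor, the F-statistic is the square of the slope's t value
f_stat <- t_value^2

# Approximate 95% confidence interval for the slope
margin <- qt(0.975, df = df_resid) * std_error
ci <- c(lower = estimate - margin, upper = estimate + margin)

print(round(c(t = t_value, r2 = r_squared, F = f_stat), 4))
print(round(ci, 4))
```

The recomputed values line up with the t value (17.67), Multiple R-squared (0.6119), and F-statistic (312.1) reported in the summary. In other words, the youtube budget accounts for roughly 61% of the variation in sales. On the fitted model object, confint(model) returns the confidence intervals directly.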
Congrats! You have just created your first linear regression model in R.