Dataset - This tutorial uses two datasets: anscombe, which ships with R, and an advertising dataset read from a CSV file hosted on GitHub.

# Scatter plot of the third Anscombe pair, with its fitted regression line
plot(y3 ~ x3, data = anscombe, pch = 16)
abline(lm(y3 ~ x3, data = anscombe), col = "grey20")

Question 1: Is there a relationship between x and y? If so, what does the relationship look like?

Yes, there is a positive correlation between x and y: as x increases, y tends to increase, so the points follow an upward-sloping line.
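
A correlation coefficient and a quick residual check can back up a visual reading like this one. The sketch below uses only base R and the built-in anscombe data; the names fit, odd_row, and trimmed are illustrative and not part of the original tutorial.

```r
# Quantify the linear association seen in the scatter plot.
r_all <- cor(anscombe$x3, anscombe$y3)

# Fit the same line that abline() draws and find the point farthest from it.
fit <- lm(y3 ~ x3, data = anscombe)
odd_row <- unname(which.max(abs(resid(fit))))

# Recompute the correlation without that point to see how much it matters.
trimmed <- anscombe[-odd_row, ]
r_trimmed <- cor(trimmed$x3, trimmed$y3)

print(c(all_points = r_all, without_farthest_point = r_trimmed))
anscombe[odd_row, c("x3", "y3")]
```

If the two correlations differ noticeably, a single point is pulling on the fitted line, which is worth mentioning alongside the overall upward trend.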

library(readr)
ad_sales <- read_csv('https://raw.githubusercontent.com/utjimmyx/regression/master/advertising.csv')
## New names:
## Rows: 200 Columns: 6
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," dbl
## (6): ...1, X1, TV, radio, newspaper, sales
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
plot(sales ~ TV, data = ad_sales)

plot(sales ~ radio, data = ad_sales)

Question 2: Is there a relationship between TV advertising and Sales? If so, what does the relationship look like?

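Yes. The relationship is positive: sales tend to rise as TV advertising increases. The points are not tightly clustered around a single line, so the relationship is far from perfect, but it is not weak either; the regression later in this tutorial reports an R-squared of about 0.61.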

Question 3: Can you plot the relationship between radio advertising and Sales? If so, what does the relationship look like?

Yes. Running plot(sales ~ radio, data = ad_sales) plots the relationship between the two. As with TV advertising, the relationship is positive, but the points are widely scattered, so radio spending alone does not pin down sales precisely.
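
Words like "weak" and "strong" are easier to defend with a number. A minimal sketch, assuming a hypothetical helper called pair_cor (not part of the tutorial or of any package), computes the Pearson correlation for two named columns. It is demonstrated on the built-in anscombe data so that it runs without downloading anything; the commented lines show how it could be called on ad_sales.

```r
# Hypothetical helper: Pearson correlation between two columns of a data frame.
pair_cor <- function(data, x, y) {
  cor(data[[x]], data[[y]], use = "complete.obs")
}

# Runs as-is on a built-in dataset.
pair_cor(anscombe, "x1", "y1")

# With ad_sales loaded as above, the two plots could be compared like this:
# pair_cor(ad_sales, "TV", "sales")
# pair_cor(ad_sales, "radio", "sales")
```

Values near 1 or -1 indicate a tight linear pattern and values near 0 a loose one, so comparing the two results shows which advertising channel tracks sales more closely.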

Question 4: Three things you learned from this tutorial

1. I learned how to analyze the relationship between two variables after displaying them in a scatter plot.
2. I learned that bivariate analysis is used to identify the type of relationship between two variables.
3. I learned that bivariate analysis is different from a simple two-sample analysis. In bivariate data, every X value is paired with a Y value. The pairing does not guarantee a relationship, but it is what lets us check whether, and how, the two variables are related.

install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
install.packages("broom")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
install.packages("ggpubr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(broom)
library(ggpubr)
summary(ad_sales)
##       ...1              X1               TV             radio       
##  Min.   :  1.00   Min.   :  1.00   Min.   :  0.70   Min.   : 0.000  
##  1st Qu.: 50.75   1st Qu.: 50.75   1st Qu.: 74.38   1st Qu.: 9.975  
##  Median :100.50   Median :100.50   Median :149.75   Median :22.900  
##  Mean   :100.50   Mean   :100.50   Mean   :147.04   Mean   :23.264  
##  3rd Qu.:150.25   3rd Qu.:150.25   3rd Qu.:218.82   3rd Qu.:36.525  
##  Max.   :200.00   Max.   :200.00   Max.   :296.40   Max.   :49.600  
##    newspaper          sales      
##  Min.   :  0.30   Min.   : 1.60  
##  1st Qu.: 12.75   1st Qu.:10.38  
##  Median : 25.75   Median :12.90  
##  Mean   : 30.55   Mean   :14.02  
##  3rd Qu.: 45.10   3rd Qu.:17.40  
##  Max.   :114.00   Max.   :27.00
lm(sales ~ TV, data = ad_sales)
## 
## Call:
## lm(formula = sales ~ TV, data = ad_sales)
## 
## Coefficients:
## (Intercept)           TV  
##     7.03259      0.04754

Fit the linear model and store the result in ‘model’

model <- lm(sales ~ TV, data = ad_sales)
summary(model)
## 
## Call:
## lm(formula = sales ~ TV, data = ad_sales)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.3860 -1.9545 -0.1913  2.0671  7.2124 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.032594   0.457843   15.36   <2e-16 ***
## TV          0.047537   0.002691   17.67   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.259 on 198 degrees of freedom
## Multiple R-squared:  0.6119, Adjusted R-squared:  0.6099 
## F-statistic: 312.1 on 1 and 198 DF,  p-value: < 2.2e-16

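After running a regression of sales on TV, I found that the intercept is 7.03. This means that 7.03 is the predicted value of Y (sales) when X (TV) = 0. The slope is 0.0475 (about 0.05): for every one-unit increase in TV, predicted sales increase by about 0.0475. Both coefficients are statistically significant (p < 2e-16), and the R-squared of 0.61 means that TV advertising explains about 61% of the variation in sales.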
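
The fitted line can be turned into a prediction by hand. The sketch below copies the coefficients and R-squared from the summary(model) output above; the TV value of 100 is an arbitrary illustration, not a figure from the tutorial.

```r
# Values copied from the summary(model) output.
intercept <- 7.032594
slope <- 0.047537
slope_se <- 0.002691
r_squared <- 0.6119

# Predicted sales at an illustrative TV value of 100.
tv_value <- 100
predicted_sales <- intercept + slope * tv_value
predicted_sales

# The t value in the table is the estimate divided by its standard error.
slope / slope_se

# With one predictor and a positive slope, the correlation is sqrt(R-squared).
sqrt(r_squared)
```

With the fitted model still in memory, predict(model, newdata = data.frame(TV = 100)) should give the same prediction up to rounding.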

References

Bivariate Analysis Definition & Example. https://www.statisticshowto.com/bivariate-analysis/

https://www.sciencedirect.com/topics/mathematics/bivariate-data

https://statisticsbyjim.com/graphs/scatterplots/