Objective

The objective of this tutorial is to explain how bivariate analysis works. Marketers can use this analysis to make decisions about pricing, advertising, and promotion strategies, among others.

Bivariate analysis is one of the simplest forms of statistical analysis. It is generally used to find out whether there is a relationship between two sets of values (two variables), conventionally labeled X and Y (statisticshowto.com).
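As a minimal sketch of the idea, the strength of a linear relationship between two variables can be summarized with the Pearson correlation coefficient. The example assumes the built-in anscombe data that this tutorial uses later; the object names x, y, and r are illustrative only.

```r
# Two variables from the built-in anscombe data
x <- anscombe$x3
y <- anscombe$y3

# Pearson correlation: ranges from -1 to 1, and the sign gives the direction
r <- cor(x, y)
r

# Test of the null hypothesis that the true correlation is zero
cor.test(x, y)
```

For a simple linear regression with a single predictor, the square of r equals the Multiple R-squared that summary() reports for the fitted model.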

https://www.qualtrics.com/experience-management/research/research-design/

Question 1A: What percentage of marketers use consumer research to drive decisions? How can we bridge the gap when such a small portion of marketers use consumer research to inform their decision-making?

Fewer than 40% of marketers use consumer research to drive decisions. Bridging the gap requires marketers and businesses to identify which research methods apply to the decisions they want to make and the problems they want to solve. A better overall approach to research design will put marketers on the right path.

Dataset - This tutorial uses two datasets: anscombe, which is built into R, and advertising.csv, which is read from GitHub.

plot(y3 ~ x3, data = anscombe, pch = 16)
abline(lm(y3 ~ x3, anscombe), col = "grey20")

fit <- lm(y3 ~ x3, anscombe)
summary(fit)
## 
## Call:
## lm(formula = y3 ~ x3, data = anscombe)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1586 -0.6146 -0.2303  0.1540  3.2411 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   3.0025     1.1245   2.670  0.02562 * 
## x3            0.4997     0.1179   4.239  0.00218 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.236 on 9 degrees of freedom
## Multiple R-squared:  0.6663, Adjusted R-squared:  0.6292 
## F-statistic: 17.97 on 1 and 9 DF,  p-value: 0.002176

Question 1B: Is there a relationship between x and y? If so, what does the relationship look like?

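Yes. The scatterplot and the fitted simple linear regression show a positive linear relationship between x3 and y3. The estimated slope is 0.4997 (p = 0.00218), and the model explains about 67% of the variance in y3 (R-squared = 0.6663). Here y3 is the response variable and x3 is the predictor variable. The largest residual (3.2411) is far from the rest, which suggests that one observation sits well above the fitted line.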
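One way to check whether a straight line describes every point equally well is to inspect the residuals. The sketch below refits the same model and reports the observation with the largest absolute residual; the object names res and worst are illustrative, not part of the original analysis.

```r
# Refit the same model so the example runs on its own
fit <- lm(y3 ~ x3, data = anscombe)

# Residual = observed y3 minus the value predicted by the line
res <- resid(fit)

# Row with the largest absolute residual
worst <- which.max(abs(res))
anscombe[worst, c("x3", "y3")]
res[worst]

# Residuals against fitted values; a point far from zero stands out
plot(fitted(fit), res, pch = 16, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

A single point with a large residual can pull the fitted line toward itself, so it is worth looking at before interpreting the slope.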

library(readr)
ad_sales <- read_csv('https://raw.githubusercontent.com/utjimmyx/regression/master/advertising.csv')
## New names:
## Rows: 200 Columns: 6
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," dbl
## (6): ...1, X1, TV, radio, newspaper, sales
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
plot(sales ~ TV, data = ad_sales)

fit <- lm(sales ~ TV, data = ad_sales)
summary(fit)
## 
## Call:
## lm(formula = sales ~ TV, data = ad_sales)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.3860 -1.9545 -0.1913  2.0671  7.2124 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.032594   0.457843   15.36   <2e-16 ***
## TV          0.047537   0.002691   17.67   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.259 on 198 degrees of freedom
## Multiple R-squared:  0.6119, Adjusted R-squared:  0.6099 
## F-statistic: 312.1 on 1 and 198 DF,  p-value: < 2.2e-16

Question 2: Is there a relationship between TV advertising and Sales? If so, what does the relationship look like?

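Yes. The plot shows a positive linear relationship between TV advertising (predictor) and sales (response), and the regression confirms that it is statistically significant: the estimated slope is 0.047537 (p < 2e-16), and the model explains about 61% of the variance in sales (R-squared = 0.6119).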
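To see what the fitted line implies, the reported coefficients can be plugged into the regression equation sales = intercept + slope * TV. The sketch below copies the two estimates from the summary output above; the TV value of 100 is a hypothetical input chosen only for illustration.

```r
# Coefficients copied from the summary(fit) output for sales ~ TV
intercept <- 7.032594
slope_tv  <- 0.047537

# Hypothetical TV advertising value (an assumption, not taken from the data)
tv_value <- 100

# Point prediction from the fitted line
predicted_sales <- intercept + slope_tv * tv_value
predicted_sales
```

With the fitted model still in memory, predict(fit, newdata = data.frame(TV = 100)) should give the same point estimate up to rounding of the printed coefficients.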

library(readr)
ad_sales <- read_csv('https://raw.githubusercontent.com/utjimmyx/regression/master/advertising.csv')
## New names:
## Rows: 200 Columns: 6
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," dbl
## (6): ...1, X1, TV, radio, newspaper, sales
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
plot(sales ~ radio, data = ad_sales)

fit <- lm(sales ~ radio, data = ad_sales)
summary(fit)
## 
## Call:
## lm(formula = sales ~ radio, data = ad_sales)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.7305  -2.1324   0.7707   2.7775   8.1810 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.31164    0.56290  16.542   <2e-16 ***
## radio        0.20250    0.02041   9.921   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.275 on 198 degrees of freedom
## Multiple R-squared:  0.332,  Adjusted R-squared:  0.3287 
## F-statistic: 98.42 on 1 and 198 DF,  p-value: < 2.2e-16

Question 3: Can you plot the relationship between radio advertising and Sales? If so, what does the relationship look like?

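Yes. The plot shows a positive relationship, and the regression slope is statistically significant (0.20250, p < 2e-16). However, radio advertising explains less of the variation in sales than TV advertising does: R-squared is 0.332 for radio compared with 0.6119 for TV.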
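The two models can be compared on a common scale by converting each R-squared into a correlation coefficient. The sketch below uses only the R-squared values printed in the two summary outputs above; the object names are illustrative.

```r
# R-squared values copied from the two summary() outputs in this tutorial
r2_tv    <- 0.6119
r2_radio <- 0.332

# Both slopes are positive, so each correlation is the positive square root
r_tv    <- sqrt(r2_tv)
r_radio <- sqrt(r2_radio)

# Side-by-side comparison, rounded for readability
round(c(TV = r_tv, radio = r_radio), 2)
```

This gives correlations of roughly 0.78 for TV and 0.58 for radio, so both relationships are positive but the one with TV is stronger.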

Question 4A: Three things you learned from this tutorial

First, I learned how R Markdown can pull data from other sources for use within RStudio. Second, I learned how to make plots within R Markdown and how to interpret the syntax of each code chunk. Finally, I am learning how to communicate a research analysis more clearly.

Question 4B: Are there any other kinds of exploratory analysis you can perform using R? If so, please include your analysis and results in your final report.

Yes. The histogram below shows the frequency distribution of radio advertising.

library(readr)
library(ggplot2)
ggplot(ad_sales, aes(x=radio)) +
  geom_histogram(binwidth=10, fill="grey20")

Question 5: Visit the avocado dataset available in week 2, raise one question you could possibly answer with the data, and explain how a stakeholder could benefit from your proposed analysis.

avocado.csv contains seven variables, some qualitative and others quantitative. Because average_price and type are both recorded, one question I could answer is whether the average price of conventional avocados differs from that of organic avocados. This analysis could help a stakeholder decide how to price each type of avocado.
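The proposed comparison could be sketched as follows. Because avocado.csv is not loaded in this report, the example builds a tiny data frame with made-up prices purely to show the shape of the code; only the column names average_price and type come from the dataset description above, and the real analysis would read the file instead.

```r
# Made-up illustrative rows, NOT values from avocado.csv
avocado <- data.frame(
  type          = rep(c("conventional", "organic"), each = 3),
  average_price = c(1.10, 1.05, 1.20, 1.60, 1.55, 1.75)
)

# Mean price for each type
price_by_type <- aggregate(average_price ~ type, data = avocado, FUN = mean)
price_by_type

# Visual comparison of the two price distributions
boxplot(average_price ~ type, data = avocado,
        xlab = "Type", ylab = "Average price")
```

With the full dataset, t.test(average_price ~ type, data = avocado) would be one way to test whether the difference in means is statistically significant.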

References

Bivariate Analysis Definition & Example https://www.statisticshowto.com/bivariate-analysis/

https://www.sciencedirect.com/topics/mathematics/bivariate-data