R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

Note: this analysis was performed using the open-source software R and RStudio.

# install the packages only if they are not already available
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

# load the customer segmentation data (the CSV is expected in the working directory)
display <- read.csv("customer_segmentation.csv")


# quick base-R scatter plot of the raw data
plot(display$Age, display$Energy_Drink_Frequency,
     xlab = "Age", ylab = "Energy_Drink_Frequency")

ggplot(display, aes(x = Age,
                    y = Energy_Drink_Frequency)) +
  geom_point() +
  # add a least-squares trend line; stating the formula avoids the default message
  geom_smooth(method = "lm", formula = y ~ x)
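`geom_smooth(method = "lm")` draws the ordinary least-squares line, the same fit `lm()` produces for one predictor. With a single predictor the estimates have a closed form: the slope is `cov(x, y) / var(x)` and the intercept is `mean(y)` minus the slope times `mean(x)`. The sketch below checks this on a tiny made-up pair of vectors (`x` and `y` are hypothetical toy values, not columns of customer_segmentation.csv), so it runs without the data file.

```r
# Hypothetical toy data, made up for illustration (NOT from customer_segmentation.csv)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Closed-form least-squares estimates for one predictor
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# lm() fits the same line; coef() returns (intercept, slope) in that order
fit <- lm(y ~ x)

c(intercept = intercept, slope = slope)
coef(fit)
```

Both calls return the same two numbers, which is why the trend line in the plot and the `lm()` model fitted next describe the same relationship.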

Build a simple linear regression model of Energy_Drink_Frequency with a single predictor, Age.

lm_mod1 <- lm(Energy_Drink_Frequency ~ Age, data = display)
# look at the model with the summary function
summary(lm_mod1)
## 
## Call:
## lm(formula = Energy_Drink_Frequency ~ Age, data = display)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6361 -0.6337 -0.3020  0.6980  2.6980 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.9703     0.7403   4.012 0.000745 ***
## Age          -0.3342     0.2789  -1.198 0.245526    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.223 on 19 degrees of freedom
## Multiple R-squared:  0.07027,    Adjusted R-squared:  0.02133 
## F-statistic: 1.436 on 1 and 19 DF,  p-value: 0.2455
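The numbers in this summary are linked: each t value is the estimate divided by its standard error, the two-sided p-value comes from a t distribution with the residual degrees of freedom, with a single predictor the overall F statistic is the square of the slope's t value, and adjusted R-squared rescales R-squared by (n - 1) / (n - k - 1) for k predictors. The sketch below recomputes them from the rounded values printed above (n = 21 is inferred from the 19 residual degrees of freedom), so tiny rounding differences from the printed output are expected. The Age p-value of about 0.25 is well above the conventional 0.05 cutoff.

```r
# Values copied from the summary(lm_mod1) output above (rounded as printed)
estimate <- -0.3342   # Age coefficient
std_err  <-  0.2789   # its standard error
df_resid <- 19        # residual degrees of freedom
r2       <-  0.07027  # multiple R-squared
n_pred   <- 1         # number of predictors (Age)
n        <- df_resid + n_pred + 1   # inferred sample size: 21

# t statistic: estimate divided by its standard error
t_val <- estimate / std_err

# two-sided p-value from a t distribution with df_resid degrees of freedom
p_val <- 2 * pt(-abs(t_val), df = df_resid)

# with a single predictor, the overall F statistic equals t squared
f_val <- t_val^2

# adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - n_pred - 1)

round(c(t = t_val, p = p_val, F = f_val, adj_R2 = adj_r2), 4)
```

On the fitted object itself, `summary(lm_mod1)$coefficients`, `summary(lm_mod1)$fstatistic` and `summary(lm_mod1)$adj.r.squared` return the unrounded versions of these values.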

Build a multiple regression model of Energy_Drink_Frequency with two predictors, Age and Coffee_Frequency.
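Written as an equation, the two-predictor model fitted below is the following, where the β coefficients are estimated by least squares and ε_i is the error term for observation i. Each slope is the expected change in Energy_Drink_Frequency for a one-unit change in that predictor, holding the other predictor fixed.

$$
\text{Energy\_Drink\_Frequency}_i = \beta_0 + \beta_1 \, \text{Age}_i + \beta_2 \, \text{Coffee\_Frequency}_i + \varepsilon_i
$$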

lm_mod2 <- lm(Energy_Drink_Frequency ~ Age + Coffee_Frequency, data = display)
# look at the model with the summary function
summary(lm_mod2)
## 
## Call:
## lm(formula = Energy_Drink_Frequency ~ Age + Coffee_Frequency, 
##     data = display)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4541 -0.8020 -0.3341  0.5738  2.3900 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)        2.6383     0.8398   3.141  0.00564 **
## Age               -0.4040     0.2924  -1.382  0.18391   
## Coffee_Frequency   0.1560     0.1817   0.858  0.40201   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.232 on 18 degrees of freedom
## Multiple R-squared:  0.1068, Adjusted R-squared:  0.00758 
## F-statistic: 1.076 on 2 and 18 DF,  p-value: 0.3618
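Adding Coffee_Frequency raises R-squared from 0.07027 to 0.1068 but lowers adjusted R-squared from 0.02133 to 0.00758. A nested-model (partial) F test asks whether the added predictor improves the fit more than chance would; with the fitted objects this is `anova(lm_mod1, lm_mod2)`. The sketch below recomputes the test from the rounded R-squared values printed above so it runs without the CSV. It relies on both models using the same 21 observations, which the residual degrees of freedom (19 and 18) are consistent with.

```r
# Values copied from the two summary() outputs above (rounded as printed)
r2_mod1  <- 0.07027   # R-squared, Age only
r2_mod2  <- 0.1068    # R-squared, Age + Coffee_Frequency
df_mod2  <- 18        # residual degrees of freedom of the larger model
n_extra  <- 1         # number of predictors added in lm_mod2
t_coffee <- 0.858     # t value of Coffee_Frequency in lm_mod2

# partial F statistic for adding Coffee_Frequency to lm_mod1
f_partial <- ((r2_mod2 - r2_mod1) / n_extra) / ((1 - r2_mod2) / df_mod2)

# upper-tail p-value from the F distribution with (n_extra, df_mod2) df
p_partial <- pf(f_partial, n_extra, df_mod2, lower.tail = FALSE)

# when exactly one predictor is added, the partial F equals its t value squared
round(c(F = f_partial, t_squared = t_coffee^2, p = p_partial), 3)

# With the actual data the equivalent call is: anova(lm_mod1, lm_mod2)
```

Because only one predictor is added, this test gives the same p-value (about 0.40) as the Coffee_Frequency row of the coefficient table.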

Ref: Xu, Zhenning “Jimmy”. Marketing Research using R. https://bookdown.org/utjimmyx/marketing_research/basic-regression-analysis.html

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani (2021). “Linear Regression.” An Introduction to Statistical Learning, with Applications in R. https://web.stanford.edu/~hastie/ISLR2/ISLRv2_website.pdf