This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
Note: this analysis was performed using the open-source software R and RStudio.
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
display <- read.csv("customer_segmentation.csv")
# plot the data to see what it looks like
plot(display$Age, display$Energy_Drink_Frequency)
ggplot(display, aes(x = Age,
                    y = Energy_Drink_Frequency)) +
  geom_point() +
  geom_smooth(method = "lm") # add trendline using linear model
## `geom_smooth()` using formula = 'y ~ x'
lm_mod1 <- lm(Energy_Drink_Frequency ~ Age, data = display)
# look at our model with summary function
summary(lm_mod1)
##
## Call:
## lm(formula = Energy_Drink_Frequency ~ Age, data = display)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.6361 -0.6337 -0.3020  0.6980  2.6980
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   2.9703     0.7403   4.012 0.000745 ***
## Age          -0.3342     0.2789  -1.198 0.245526
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.223 on 19 degrees of freedom
## Multiple R-squared: 0.07027, Adjusted R-squared: 0.02133
## F-statistic: 1.436 on 1 and 19 DF, p-value: 0.2455
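The figures in this summary are tied together by a few simple formulas, and recomputing them by hand is a useful sanity check when reading regression output. The sketch below uses only numbers copied from the printed summary (no data file is needed); the variable names are illustrative, and the number of observations is inferred from the residual degrees of freedom rather than stated in the output.

```r
# Values copied from the summary(lm_mod1) output
est_age <- -0.3342   # slope estimate for Age
se_age  <- 0.2789    # standard error of that slope
df_res  <- 19        # residual degrees of freedom
r2      <- 0.07027   # multiple R-squared

p <- 1               # number of predictors in the model
n <- df_res + p + 1  # inferred sample size: residual df + estimated coefficients

# t value is the estimate divided by its standard error
t_age <- est_age / se_age

# two-sided p-value from the t distribution
p_age <- 2 * pt(-abs(t_age), df = df_res)

# adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# 95% confidence interval for the Age slope
ci_age <- est_age + c(-1, 1) * qt(0.975, df = df_res) * se_age

round(c(t = t_age, p = p_age, adj_r2 = adj_r2), 4)
round(ci_age, 3)
```

The recomputed values should agree with the printed summary up to rounding. The confidence interval for the Age slope includes zero, which is consistent with the non-significant p-value: these data do not show a clear linear relationship between Age and Energy_Drink_Frequency.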
lm_mod2 <- lm(Energy_Drink_Frequency ~ Age + Coffee_Frequency, data = display)
# look at our model with summary function
summary(lm_mod2)
##
## Call:
## lm(formula = Energy_Drink_Frequency ~ Age + Coffee_Frequency,
## data = display)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.4541 -0.8020 -0.3341  0.5738  2.3900
##
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)        2.6383     0.8398   3.141  0.00564 **
## Age               -0.4040     0.2924  -1.382  0.18391
## Coffee_Frequency   0.1560     0.1817   0.858  0.40201
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.232 on 18 degrees of freedom
## Multiple R-squared: 0.1068, Adjusted R-squared: 0.00758
## F-statistic: 1.076 on 2 and 18 DF, p-value: 0.3618
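One way to judge whether adding Coffee_Frequency improved the model is a partial F-test between the two nested models. The sketch below reconstructs that test from the two printed R-squared values alone; the variable names are illustrative, and because the inputs are rounded the result is approximate. With the data loaded, `anova(lm_mod1, lm_mod2)` would be the usual way to run the same comparison.

```r
# Values copied from the two model summaries
r2_small <- 0.07027  # multiple R-squared, Age only
r2_big   <- 0.1068   # multiple R-squared, Age + Coffee_Frequency
df_big   <- 18       # residual degrees of freedom of the larger model
q        <- 1        # number of predictors added to the smaller model

# partial F statistic: gain in R-squared per added predictor,
# relative to the unexplained variance per residual df in the larger model
f_partial <- ((r2_big - r2_small) / q) / ((1 - r2_big) / df_big)
p_partial <- pf(f_partial, df1 = q, df2 = df_big, lower.tail = FALSE)

# with a single added predictor, the partial F equals that predictor's t value squared
t_coffee <- 0.858

round(c(F = f_partial, t_squared = t_coffee^2, p = p_partial), 3)
```

The partial F and the squared t value for Coffee_Frequency should match up to rounding, and the p-value should be close to the 0.40201 reported for that coefficient. Together with the drop in adjusted R-squared from 0.02133 to 0.00758, this suggests the extra predictor does not improve the fit for these data.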
Ref: Zhenning “Jimmy” Xu, Marketing Research using R. https://bookdown.org/utjimmyx/marketing_research/basic-regression-analysis.html
Linear Regression, in An Introduction to Statistical Learning, with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, 2021. https://web.stanford.edu/~hastie/ISLR2/ISLRv2_website.pdf