14/12/2025Capstone Project: Analyzing Customer Behavior and Revenue Trends in a Telecom Company
Project Overview
This project focuses on understanding customer behavior and revenue trends using descriptive statistics, hypothesis testing, and linear regression. Students will analyze telecom company data to identify factors influencing customer spending, contract types, and service usage patterns
Project Objectives
Data Understanding: Explore the dataset and define key business questions.
Data Cleaning & Preparation: Handle missing values and transform data for analysis.
Descriptive Statistics & Exploratory Data Analysis (EDA): Summarize the dataset using statistical measures.
Hypothesis Testing: Test key assumptions about customer behavior and revenue.
Regression Analysis: Build and interpret linear regression models to explain customer spending trends.
Insights & Recommendations: Provide data-driven business recommendations*
Dataset
Use the Telco Customer Churn dataset (available on Kaggle) or generate a synthetic dataset with the following columns:
CustomerID – Unique customer identifier
Gender – Male/Female
SeniorCitizen – 1 for senior citizens, 0 otherwise
Partner – Whether the customer has a partner (Yes/No)
Dependents – Whether the customer has dependents (Yes/No)
Tenure – Number of months the customer has been with the company
Contract – Type of contract (Month-to-month, One-year, Two-year)
PaymentMethod – Payment method (e.g., Electronic check, Credit card)
MonthlyCharges – Monthly bill amount ($)
TotalCharges – Total amount billed to the customer ($)
#Project Workflow & Deliverables……………………………. Understanding the Business Context & Data Define key questions: Do contract types influence customer spending? Are senior citizens more likely to spend less? What factors impact total customer spending?*
1. Load the dataset and inspect the structure.
setwd("~/R training")
Telco_data<-read.csv("Telco_data.csv")
head(Telco_data)
customerID gender SeniorCitizen Partner Dependents tenure PhoneService
1 7590-VHVEG Female 0 Yes No 1 No
2 5575-GNVDE Male 0 No No 34 Yes
3 3668-QPYBK Male 0 No No 2 Yes
4 7795-CFOCW Male 0 No No 45 No
5 9237-HQITU Female 0 No No 2 Yes
6 9305-CDSKC Female 0 No No 8 Yes
MultipleLines InternetService OnlineSecurity OnlineBackup DeviceProtection
1 No phone service DSL No Yes No
2 No DSL Yes No Yes
3 No DSL Yes Yes No
4 No phone service DSL Yes No Yes
5 No Fiber optic No No No
6 Yes Fiber optic No No Yes
TechSupport StreamingTV StreamingMovies Contract PaperlessBilling
1 No No No Month-to-month Yes
2 No No No One year No
3 No No No Month-to-month Yes
4 Yes No No One year No
5 No No No Month-to-month Yes
6 No Yes Yes Month-to-month Yes
PaymentMethod MonthlyCharges TotalCharges Churn
1 Electronic check 29.85 29.85 No
2 Mailed check 56.95 1889.50 No
3 Mailed check 53.85 108.15 Yes
4 Bank transfer (automatic) 42.30 1840.75 No
5 Electronic check 70.70 151.65 Yes
6 Electronic check 99.65 820.50 Yes
str(Telco_data)
'data.frame': 7043 obs. of 21 variables:
$ customerID : chr "7590-VHVEG" "5575-GNVDE" "3668-QPYBK" "7795-CFOCW" ...
$ gender : chr "Female" "Male" "Male" "Male" ...
$ SeniorCitizen : int 0 0 0 0 0 0 0 0 0 0 ...
$ Partner : chr "Yes" "No" "No" "No" ...
$ Dependents : chr "No" "No" "No" "No" ...
$ tenure : int 1 34 2 45 2 8 22 10 28 62 ...
$ PhoneService : chr "No" "Yes" "Yes" "No" ...
$ MultipleLines : chr "No phone service" "No" "No" "No phone service" ...
$ InternetService : chr "DSL" "DSL" "DSL" "DSL" ...
$ OnlineSecurity : chr "No" "Yes" "Yes" "Yes" ...
$ OnlineBackup : chr "Yes" "No" "Yes" "No" ...
$ DeviceProtection: chr "No" "Yes" "No" "Yes" ...
$ TechSupport : chr "No" "No" "No" "Yes" ...
$ StreamingTV : chr "No" "No" "No" "No" ...
$ StreamingMovies : chr "No" "No" "No" "No" ...
$ Contract : chr "Month-to-month" "One year" "Month-to-month" "One year" ...
$ PaperlessBilling: chr "Yes" "No" "Yes" "No" ...
$ PaymentMethod : chr "Electronic check" "Mailed check" "Mailed check" "Bank transfer (automatic)" ...
$ MonthlyCharges : num 29.9 57 53.9 42.3 70.7 ...
$ TotalCharges : num 29.9 1889.5 108.2 1840.8 151.7 ...
$ Churn : chr "No" "No" "Yes" "No" ...
2. Data Cleaning & Preparation…………………………
Finding count of missing values in TotalCharges from the Telco data
missing_values<-sum(is.na(Telco_data$TotalCharges)) # sum of missing values
missing_values
[1] 11
View(missing_values)#view the missing values
Loading the Mice package and library for missing values handling
library(mice)
library(tidyverse)
md.pattern(Telco_data) #View the distribution of missing data
customerID gender SeniorCitizen Partner Dependents tenure PhoneService
7032 1 1 1 1 1 1 1
11 1 1 1 1 1 1 1
0 0 0 0 0 0 0
MultipleLines InternetService OnlineSecurity OnlineBackup DeviceProtection
7032 1 1 1 1 1
11 1 1 1 1 1
0 0 0 0 0
TechSupport StreamingTV StreamingMovies Contract PaperlessBilling
7032 1 1 1 1 1
11 1 1 1 1 1
0 0 0 0 0
PaymentMethod MonthlyCharges Churn TotalCharges
7032 1 1 1 1 0
11 1 1 1 0 1
0 0 0 11 11
Loading the VIM package for visualizationof missing data
library(VIM)
missing_values_plot<-aggr(Telco_data, col=c("red"),
numbers=TRUE,
sortvars=TRUE,
labels=names(Telco_data),
cex.axis=.7,
gap=3,
ylab=c("Histogram of missing data", "Pattern"))#Display pattern and Ditribution of missing values in the telco data
Selecting data imputaion method to use from the mice library
methods(mice) #checking for methods to use in imputation of missing values. In this case i am going to use the method of random forests
[1] mice.impute.2l.bin mice.impute.2l.lmer
[3] mice.impute.2l.norm mice.impute.2l.pan
[5] mice.impute.2lonly.mean mice.impute.2lonly.norm
[7] mice.impute.2lonly.pmm mice.impute.cart
[9] mice.impute.jomoImpute mice.impute.lasso.logreg
[11] mice.impute.lasso.norm mice.impute.lasso.select.logreg
[13] mice.impute.lasso.select.norm mice.impute.lda
[15] mice.impute.logreg mice.impute.logreg.boot
[17] mice.impute.mean mice.impute.midastouch
[19] mice.impute.mnar.logreg mice.impute.mnar.norm
[21] mice.impute.mpmm mice.impute.norm
[23] mice.impute.norm.boot mice.impute.norm.nob
[25] mice.impute.norm.predict mice.impute.panImpute
[27] mice.impute.passive mice.impute.pmm
[29] mice.impute.polr mice.impute.polyreg
[31] mice.impute.quadratic mice.impute.rf
[33] mice.impute.ri mice.impute.sample
[35] mice.mids mice.theme
see '?methods' for accessing help and source code
imputed_data<-mice(Telco_data, m=5, method='rf')
iter imp variable
1 1 TotalCharges
1 2 TotalCharges
1 3 TotalCharges
1 4 TotalCharges
1 5 TotalCharges
2 1 TotalCharges
2 2 TotalCharges
2 3 TotalCharges
2 4 TotalCharges
2 5 TotalCharges
3 1 TotalCharges
3 2 TotalCharges
3 3 TotalCharges
3 4 TotalCharges
3 5 TotalCharges
4 1 TotalCharges
4 2 TotalCharges
4 3 TotalCharges
4 4 TotalCharges
4 5 TotalCharges
5 1 TotalCharges
5 2 TotalCharges
5 3 TotalCharges
5 4 TotalCharges
5 5 TotalCharges
summary(imputed_data)
Class: mids
Number of multiple imputations: 5
Imputation methods:
customerID gender SeniorCitizen Partner
"" "" "" ""
Dependents tenure PhoneService MultipleLines
"" "" "" ""
InternetService OnlineSecurity OnlineBackup DeviceProtection
"" "" "" ""
TechSupport StreamingTV StreamingMovies Contract
"" "" "" ""
PaperlessBilling PaymentMethod MonthlyCharges TotalCharges
"" "" "" "rf"
Churn
""
PredictorMatrix:
customerID gender SeniorCitizen Partner Dependents tenure
customerID 0 0 1 0 0 1
gender 0 0 1 0 0 1
SeniorCitizen 0 0 0 0 0 1
Partner 0 0 1 0 0 1
Dependents 0 0 1 0 0 1
tenure 0 0 1 0 0 0
PhoneService MultipleLines InternetService OnlineSecurity
customerID 0 0 0 0
gender 0 0 0 0
SeniorCitizen 0 0 0 0
Partner 0 0 0 0
Dependents 0 0 0 0
tenure 0 0 0 0
OnlineBackup DeviceProtection TechSupport StreamingTV
customerID 0 0 0 0
gender 0 0 0 0
SeniorCitizen 0 0 0 0
Partner 0 0 0 0
Dependents 0 0 0 0
tenure 0 0 0 0
StreamingMovies Contract PaperlessBilling PaymentMethod
customerID 0 0 0 0
gender 0 0 0 0
SeniorCitizen 0 0 0 0
Partner 0 0 0 0
Dependents 0 0 0 0
tenure 0 0 0 0
MonthlyCharges TotalCharges Churn
customerID 1 1 0
gender 1 1 0
SeniorCitizen 1 1 0
Partner 1 1 0
Dependents 1 1 0
tenure 1 1 0
Number of logged events: 17
it im dep meth out
1 0 0 constant customerID
2 0 0 constant gender
3 0 0 constant Partner
4 0 0 constant Dependents
5 0 0 constant PhoneService
6 0 0 constant MultipleLines
imputed_data$imp$TotalCharges
1 2 3 4 5
489 595.05 399.45 1777.60 1924.10 554.05
754 19.90 1893.50 393.45 542.40 70.75
937 1240.25 6241.35 3205.60 20.25 1879.25
1083 70.45 19.45 70.80 226.95 369.05
1341 169.45 2773.90 1028.90 3505.10 4719.75
3332 4055.50 535.05 1433.80 289.30 168.65
3827 19.20 193.80 1734.50 94.50 2444.25
4381 20.05 7348.80 295.65 201.10 330.80
5219 1240.15 3231.05 696.80 332.65 318.90
6671 50.15 2391.15 4855.35 1060.60 112.75
6755 45.55 918.70 993.15 270.70 318.90
finished_imputed_data<-complete(imputed_data,1)
head(finished_imputed_data)
customerID gender SeniorCitizen Partner Dependents tenure PhoneService
1 7590-VHVEG Female 0 Yes No 1 No
2 5575-GNVDE Male 0 No No 34 Yes
3 3668-QPYBK Male 0 No No 2 Yes
4 7795-CFOCW Male 0 No No 45 No
5 9237-HQITU Female 0 No No 2 Yes
6 9305-CDSKC Female 0 No No 8 Yes
MultipleLines InternetService OnlineSecurity OnlineBackup DeviceProtection
1 No phone service DSL No Yes No
2 No DSL Yes No Yes
3 No DSL Yes Yes No
4 No phone service DSL Yes No Yes
5 No Fiber optic No No No
6 Yes Fiber optic No No Yes
TechSupport StreamingTV StreamingMovies Contract PaperlessBilling
1 No No No Month-to-month Yes
2 No No No One year No
3 No No No Month-to-month Yes
4 Yes No No One year No
5 No No No Month-to-month Yes
6 No Yes Yes Month-to-month Yes
PaymentMethod MonthlyCharges TotalCharges Churn
1 Electronic check 29.85 29.85 No
2 Mailed check 56.95 1889.50 No
3 Mailed check 53.85 108.15 Yes
4 Bank transfer (automatic) 42.30 1840.75 No
5 Electronic check 70.70 151.65 Yes
6 Electronic check 99.65 820.50 Yes
sapply(finished_imputed_data, function(x)sum(is.na(x))) # Checking for each column for missing values and we are good to go
customerID gender SeniorCitizen Partner
0 0 0 0
Dependents tenure PhoneService MultipleLines
0 0 0 0
InternetService OnlineSecurity OnlineBackup DeviceProtection
0 0 0 0
TechSupport StreamingTV StreamingMovies Contract
0 0 0 0
PaperlessBilling PaymentMethod MonthlyCharges TotalCharges
0 0 0 0
Churn
0
sum(is.na(finished_imputed_data)) #checking again if missing values are present
[1] 0
Convert categorical variables (e.g., Contract, Gender) into numerical
str(finished_imputed_data$gender, finished_imputed_data$Contract) #View the structure of the data
chr [1:7043] "Female" "Male" "Male" "Male" "Female" "Female" "Male" ...
finished_imputed_data1 <- finished_imputed_data %>%
mutate(
gender = as.numeric(factor(gender)),
Contract = as.numeric(factor(Contract))
)
str(finished_imputed_data1$Contract)
num [1:7043] 1 2 1 2 1 1 1 1 1 2 ...
str(finished_imputed_data1$gender) #View the structure of the data ie gender and contract
num [1:7043] 1 2 2 2 1 1 2 1 1 2 ...
Checking for outliers in the data in MonthlyCharges and TotalCharges
boxplot(finished_imputed_data$MonthlyCharges, col='red', main = "income Distribution")
*checking for outliers using a plot of the boxplot
boxplot(finished_imputed_data$TotalCharges, col='blue', main = "income Distribution")
3. Exploratory Data Analysis (EDA)………………………
Visualize customer distribution across different Contract types.
glimpse(finished_imputed_data$Contract)
chr [1:7043] "Month-to-month" "One year" "Month-to-month" "One year" ...
View(finished_imputed_data)
contract_counts<-table(finished_imputed_data$Contract)
contract_counts
Month-to-month One year Two year
3875 1473 1695
Barplot
barplot(contract_counts,
col = "skyblue",
main = "Customer Distribution by Contract Type",
ylab = "Number of Customers",
xlab = "Contract Type")
Piechart
contract_counts <- table(finished_imputed_data$Contract)
pct <- round(100 * contract_counts / sum(contract_counts), 1)
pie_labels <- paste(names(contract_counts), "-", pct, "%")
pie(contract_counts,
labels = pie_labels,
col = c("tomato", "lightgreen", "lightblue"),
main = "Contract Type Distribution")
Using ggplot2 to plot a bar plot
library(ggplot2)
ggplot(finished_imputed_data, aes(x = Contract, fill = Contract)) +
geom_bar() +
labs(title = "Customer Distribution by Contract Type",
x = "Contract Type", y = "Number of Customers") +
theme_minimal() +
scale_fill_brewer(palette = "Set2")
The same for horizontal bar
ggplot(finished_imputed_data, aes(x = Contract, fill = Contract)) +
geom_bar() +
coord_flip() +
labs(title = "Customer Distribution by Contract Type (Horizontal)",
x = "Contract Type", y = "Number of Customers") +
theme_classic()
Compare MonthlyCharges and TotalCharges for different customer groups. Comparison by Gender MonthlyCharges vs Gender
ggplot(finished_imputed_data, aes(x = gender, y = MonthlyCharges, fill = gender)) +
geom_boxplot() +
labs(title = "Monthly Charges by Gender", y = "Monthly Charges") +
theme_minimal()
TotalCharges vs Gender
ggplot(finished_imputed_data, aes(x = gender, y = TotalCharges, fill = gender)) +
geom_boxplot() +
labs(title = "Total Charges by Gender", y = "Total Charges") +
theme_minimal()
Comparison by InternetService MonthlyCharges vs InternetService
ggplot(finished_imputed_data, aes(x = InternetService, y = MonthlyCharges, fill = InternetService)) +
geom_boxplot() +
labs(title = "Monthly Charges by Internet Service", y = "Monthly Charges") +
theme_minimal()
TotalCharges vs InternetService
ggplot(finished_imputed_data, aes(x = InternetService, y = TotalCharges, fill = InternetService)) +
geom_boxplot() +
labs(title = "Total Charges by Internet Service", y = "Total Charges") +
theme_minimal()
Comparison by MultipleLines MonthlyCharges vs MultipleLines
ggplot(finished_imputed_data, aes(x = MultipleLines, y = MonthlyCharges, fill = MultipleLines)) +
geom_boxplot() +
labs(title = "Monthly Charges by Multiple Lines", y = "Monthly Charges") +
theme_minimal()
TotalCharges vs MultipleLines
ggplot(finished_imputed_data, aes(x = MultipleLines, y = TotalCharges, fill = MultipleLines)) +
geom_boxplot() +
labs(title = "Total Charges by Multiple Lines", y = "Total Charges") +
theme_minimal()
Check correlations between numerical variables
names(finished_imputed_data)
[1] "customerID" "gender" "SeniorCitizen" "Partner"
[5] "Dependents" "tenure" "PhoneService" "MultipleLines"
[9] "InternetService" "OnlineSecurity" "OnlineBackup" "DeviceProtection"
[13] "TechSupport" "StreamingTV" "StreamingMovies" "Contract"
[17] "PaperlessBilling" "PaymentMethod" "MonthlyCharges" "TotalCharges"
[21] "Churn"
str(finished_imputed_data)
'data.frame': 7043 obs. of 21 variables:
$ customerID : chr "7590-VHVEG" "5575-GNVDE" "3668-QPYBK" "7795-CFOCW" ...
$ gender : chr "Female" "Male" "Male" "Male" ...
$ SeniorCitizen : int 0 0 0 0 0 0 0 0 0 0 ...
$ Partner : chr "Yes" "No" "No" "No" ...
$ Dependents : chr "No" "No" "No" "No" ...
$ tenure : int 1 34 2 45 2 8 22 10 28 62 ...
$ PhoneService : chr "No" "Yes" "Yes" "No" ...
$ MultipleLines : chr "No phone service" "No" "No" "No phone service" ...
$ InternetService : chr "DSL" "DSL" "DSL" "DSL" ...
$ OnlineSecurity : chr "No" "Yes" "Yes" "Yes" ...
$ OnlineBackup : chr "Yes" "No" "Yes" "No" ...
$ DeviceProtection: chr "No" "Yes" "No" "Yes" ...
$ TechSupport : chr "No" "No" "No" "Yes" ...
$ StreamingTV : chr "No" "No" "No" "No" ...
$ StreamingMovies : chr "No" "No" "No" "No" ...
$ Contract : chr "Month-to-month" "One year" "Month-to-month" "One year" ...
$ PaperlessBilling: chr "Yes" "No" "Yes" "No" ...
$ PaymentMethod : chr "Electronic check" "Mailed check" "Mailed check" "Bank transfer (automatic)" ...
$ MonthlyCharges : num 29.9 57 53.9 42.3 70.7 ...
$ TotalCharges : num 29.9 1889.5 108.2 1840.8 151.7 ...
$ Churn : chr "No" "No" "Yes" "No" ...
str(finished_imputed_data$TotalCharges) #checking again for the structure of Total charges
num [1:7043] 29.9 1889.5 108.2 1840.8 151.7 ...
numeric_vars <- finished_imputed_data[, c("tenure", "MonthlyCharges", "TotalCharges")] #select numerical variable columns from the data
Correlation matrix of the observations
cor_matrix <- cor(numeric_vars, use = "complete.obs") # handles NAs
print(cor_matrix) # Compute correlation matrix for the variables
tenure MonthlyCharges TotalCharges
tenure 1.0000000 0.2478999 0.8257214
MonthlyCharges 0.2478999 1.0000000 0.6508208
TotalCharges 0.8257214 0.6508208 1.0000000
#Insights from correlation matrix
#Tenure and TotalCharges are highly correlated (makes sense: longer stay → higher total)
#MonthlyCharges and TotalCharges are moderately correlated
#Tenure and MonthlyCharges have a weak correlation
Plot of the correlations matrix
library(corrplot)
corrplot(cor_matrix, method = "circle", type = "upper",
title = "Correlation Matrix", mar = c(0,0,1,0)) #Plot correlation matrix
Use boxplots and histograms to understand spending behavior. Histograms of spending
#for monthly charges
hist(finished_imputed_data$MonthlyCharges,
col = "skyblue",
main = "Distribution of Monthly Charges",
xlab = "Monthly Charges",
breaks = 30) #Histogram of mnothly spending
#for total charges
hist(finished_imputed_data$TotalCharges,
col = "lightgreen",
main = "Distribution of Total Charges",
xlab = "Total Charges",
breaks = 30) #Histogram of Total charges
Boxplots to Spot Spending Patterns (Outliers & Group Differences) Monthly Charges by Churn
boxplot(MonthlyCharges ~ Churn, data = finished_imputed_data,
col = c("lightblue", "tomato"),
main = "Monthly Charges by Churn Status",
xlab = "Churn", ylab = "Monthly Charges")
Monthly charges by internet service
boxplot(MonthlyCharges ~ InternetService, data = finished_imputed_data,
col = c("plum", "lightblue", "salmon"),
main = "Monthly Charges by Internet Service",
xlab = "Internet Service", ylab = "Monthly Charges")
##Total Charges by Contract type
boxplot(TotalCharges ~ Contract, data = finished_imputed_data,
col = c("lightyellow", "orange", "lightgreen"),
main = "Total Charges by Contract Type",
xlab = "Contract Type", ylab = "Total Charges")
4. Hypothesis Testing…………………………………. Perform hypothesis tests to validate business assumptions. Hypothesis 1: Do customers with long-term contracts have higher total spending?
Null Hypothesis (H₀): No difference in TotalCharges among contract types. Alternative Hypothesis (H₁): Customers with longer contracts have significantly higher TotalCharges.
finished_imputed_data$Contract <- as.factor(finished_imputed_data$Contract) #Change contract variable into factor
anova_model <- aov(TotalCharges ~ Contract, data = finished_imputed_data) ##We use one way ANOVA since has more than 2 categories
summary(anova_model) ##### We reject the null hypothesis.There is a statistically significant difference in TotalCharges across contract types. This supports the assumption that customers with long-term contracts tend to spend more over time.
Df Sum Sq Mean Sq F value Pr(>F)
Contract 2 7.520e+09 3.76e+09 923.9 <2e-16 ***
Residuals 7040 2.865e+10 4.07e+06
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(anova_model) #To see which contract types differ
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = TotalCharges ~ Contract, data = finished_imputed_data)
$Contract
diff lwr upr p adj
One year-Month-to-month 1664.2102 1519.4561 1808.9644 0
Two year-Month-to-month 2341.3880 2203.6736 2479.1025 0
Two year-One year 677.1778 508.7248 845.6309 0
#Interpretations
#All pairwise comparisons (Month-to-month vs One year, Month-to-month vs Two year, One year vs Two year) show statistically significant differences in TotalCharges.
#Customers with longer-term contracts (One year, Two year) generally have higher total spending than those with a Month-to-month contract.
#The Two year contract has the highest spending, followed by the One year contract, and the Month-to-month contract has the lowest total charges.
Hypothesis 2: Is there a significant difference in spending between male and female customers?
Null Hypothesis (H₀): No difference in MonthlyCharges between male and female customers
Alternative Hypothesis (H₁): Males and females have different spending behaviors Test to use: Independent t-test
t_test_result <- t.test(MonthlyCharges ~ gender, data = finished_imputed_data)
t_test_result
Welch Two Sample t-test
data: MonthlyCharges by gender
t = 1.2227, df = 7038.9, p-value = 0.2215
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
-0.5289426 2.2824640
sample estimates:
mean in group Female mean in group Male
65.20424 64.32748
#Since the p-value (0.2215) is greater than 0.05, we fail to reject the null hypothesis. This means that there is no statistically significant difference in MonthlyCharges between male and female customers in your dataset.
#The confidence interval further supports this conclusion, as it contains zero, suggesting that the true difference in means could be zero.
5. Regression Analysis………………………………….
Building a Linear Regression Model to Predict MonthlyCharges
Define MonthlyCharges as the dependent variable
Use Tenure, Contract, and Payment Method as independent variables
linear_model <- lm(MonthlyCharges ~ tenure + Contract + PaymentMethod, data = finished_imputed_data)
summary(linear_model)
Call:
lm(formula = MonthlyCharges ~ tenure + Contract + PaymentMethod,
data = finished_imputed_data)
Residuals:
Min 1Q Median 3Q Max
-72.154 -20.211 3.787 19.016 76.913
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 54.445739 0.894149 60.891 <2e-16 ***
tenure 0.551490 0.018054 30.547 <2e-16 ***
ContractOne year -11.885419 0.912707 -13.022 <2e-16 ***
ContractTwo year -22.775408 1.047304 -21.747 <2e-16 ***
PaymentMethodCredit card (automatic) 0.006174 0.935446 0.007 0.995
PaymentMethodElectronic check 11.288247 0.896915 12.586 <2e-16 ***
PaymentMethodMailed check -14.685846 0.973991 -15.078 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 25.89 on 7036 degrees of freedom
Multiple R-squared: 0.2605, Adjusted R-squared: 0.2599
F-statistic: 413.1 on 6 and 7036 DF, p-value: < 2.2e-16
Tenure has a positive and significant effect on MonthlyCharges. As tenure increases, customers tend to pay higher monthly charges.
Customers with longer contracts (one or two years) tend to spend less per month compared to month-to-month contract customers.
Payment method plays a significant role in spending behavior:
Electronic check users tend to spend more per month.
Mailed check users tend to spend less per month.
Credit card (automatic) users don’t show a significant difference in spending compared to other methods.
Model Explanation: The model explains about 26.05% of the variance in MonthlyCharges, which is a moderate level of explanatory power. More features could be added or other techniques (like regularization) could be explored to improve predictive accuracy.