Categorical Variable Vs Categorical Variable
plot8 <- ggplot(df, aes(Partner, fill = Churn)) +
geom_bar() +
labs(title = "Partner Vs Churn",
x = "Does the Customer have a Partner(Yes or No)?",
y = "Count")
plot9 <- ggplot(df, aes(Dependents, fill = Churn)) +
geom_bar(position = 'fill') +
labs(title = "Dependents Status Vs Churn",
x = "Does the Customer have Dependents (Yes or No)?",
y = "Proportion")
plot10 <- ggplot(df, aes(gender, fill = Churn)) +
geom_bar() +
labs(title = "Gender Vs Churn",
x = "Gender",
y = "Count")
plot8

The number of churned customers is higher among customers who do
not have a partner.

The share of churned customers is higher among customers who do
not have dependents.

Churn rates are about the same for both genders.
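
The comparisons above can be checked numerically rather than read off the bars. The following is a minimal sketch, not part of the original analysis: `toy` is a made-up stand-in for `df`, and `churn_rate` is a hypothetical helper that returns the share of churned customers within each level of a categorical column. It assumes `Churn` takes the values "Yes" and "No".

```r
# Toy stand-in for df: hypothetical values, same column names as above.
toy <- data.frame(
  Partner = c("Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes"),
  Churn   = c("No",  "No",  "Yes", "No", "Yes", "Yes", "No", "No")
)

# Hypothetical helper: share of churned customers within each level of `var`.
churn_rate <- function(data, var) {
  counts <- table(data[[var]], data[["Churn"]])
  # margin = 1 normalises each row, so the shares within a level sum to 1
  prop.table(counts, margin = 1)[, "Yes"]
}

rates <- churn_rate(toy, "Partner")
print(round(rates, 2))
```

On the real data the same call would be `churn_rate(df, "Partner")`, and likewise for `"Dependents"` and `"gender"`.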
#grid.arrange(plot8, plot9, plot10, ncol=3)
plot11 <- ggplot(df, aes(PaperlessBilling, fill = Churn)) +
geom_bar() +
labs(title = "Paperless Billing Status",
x = "Does the Customer Use Paperless Billing?",
y = "Count")
plot12 <- ggplot(df, aes(PaymentMethod, fill = Churn)) +
geom_bar() +
labs(title = "Payment Method",
x = "What Payment Method does the Customer Use?",
y = "Count") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
plot11

Overall, more than 4000 customers have subscribed to paperless
billing, and this category also has the higher count of churned
customers.

#grid.arrange(plot11, plot12, ncol=2)
Electronic check is the most common payment method. The other three
methods are used by roughly equal shares of customers. The highest
churn count is also observed among customers who pay by electronic
check.
ggplot(df, aes(SeniorCitizen, fill = PaymentMethod)) +
geom_bar(position = 'fill') +
labs(title = "Payment Method of Senior Citizens",
x = "Is the Customer a Senior Citizen?",
y = "Fraction")

- The stacked bar chart shows that a larger share of senior citizens
pay by electronic check than non-senior customers.
- Mailed check is used more by non-senior customers than by senior
customers. The other two methods are about equally popular in both
groups.
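
The fractions that `position = 'fill'` draws are row proportions of a two-way table, so they can be reproduced directly. This is a small sketch on made-up data: the `toy` data frame, its values, and the "Yes"/"No" coding of `SeniorCitizen` are assumptions for illustration only.

```r
# Toy stand-in for df: hypothetical values, same column names as above.
toy <- data.frame(
  SeniorCitizen = c("No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"),
  PaymentMethod = c("Electronic check", "Mailed check", "Mailed check",
                    "Credit card (automatic)",
                    "Electronic check", "Electronic check",
                    "Bank transfer (automatic)", "Mailed check")
)

# Rows are SeniorCitizen groups, columns are payment methods.
counts <- table(toy$SeniorCitizen, toy$PaymentMethod)

# margin = 1 divides each row by its total, matching the filled bars:
# every SeniorCitizen group sums to 1.
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

Replacing `toy` with `df` gives the payment-method shares within each senior-citizen group of the real data.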
Categorical Variable Vs Continuous Variable
#Monthly Charge vs Churn
plot13 <- ggplot(data = df, aes(MonthlyCharges, color = Churn))+
geom_freqpoly(binwidth = 5, size = 1)+
labs(title = "Churn Count vs MonthlyCharges",
x = "Monthly Charges",
y = "Churn Count")
#Total Charge vs Churn
plot14 <- ggplot(data = df, aes(TotalCharges, color = Churn))+
geom_freqpoly(binwidth = 200, size = 1)+
labs(title = "Churn Count vs TotalCharges",
x = "Total Charges",
y = "Churn Count")
#tenure vs churn
plot15 <- ggplot(data = df, aes(tenure, colour = Churn))+
geom_freqpoly(binwidth = 5, size = 1)+
labs(title = "Churn Count vs tenure",
x = "tenure",
y = "Churn Count")
plot13

The univariate histogram of monthly charges showed that the largest
group of customers pays less than 25 per month. The same pattern holds
for customers who stay on the network, but not for customers who have
churned: most churned customers had monthly charges between 75 and
100.
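
The band comparison described above can also be tabulated. The sketch below is illustrative only: `toy` holds made-up values, and the 25-unit bands with an upper limit of 125 are assumptions that should be adjusted to the actual range of `MonthlyCharges`.

```r
# Toy stand-in for df: hypothetical values, same column names as above.
toy <- data.frame(
  MonthlyCharges = c(20, 22, 24, 55, 80, 85, 95, 99, 21, 78),
  Churn          = c("No", "No", "No", "No", "Yes",
                     "Yes", "Yes", "No", "Yes", "No")
)

# Cut the charges into bands; intervals are open on the left and
# closed on the right, e.g. (75,100].
bands <- cut(toy$MonthlyCharges, breaks = seq(0, 125, by = 25))

# Customer counts per band, split by churn status.
band_counts <- table(bands, toy$Churn)
print(band_counts)
```

This is a coarser version of what `geom_freqpoly(binwidth = 5)` plots: one count per bin and churn group.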

The total charges curves follow the same pattern for churned and
current customers. Although churned customers are fewer in the dataset
overall, the counts of customers with total charges below about 250
are about the same in both groups.

#grid.arrange(plot13, plot14, plot15, ncol=3)
ggplot(df, aes(x=gender, y=TotalCharges)) +
geom_boxplot() +
labs(title = "Gender vs Total Charges",
x = "Gender",
y = "Total Charge($)")

The distributions of total charges are similar for males and females.
In both groups the median total charge is about 1350, and 25% of
customers have total charges above about 3750.
ggplot(df, aes(x=gender, y=MonthlyCharges)) + geom_boxplot() +
labs(title = "Gender vs Monthly Charges",
x = "Gender",
y = "Monthly Charge($)")

The distributions of monthly charges are similar for males and
females. In both groups the median monthly charge is about 71, and 25%
of customers have a monthly charge above about 90.
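
The medians and upper quartiles quoted for the two box plots can be computed per group instead of being read from the chart. This is a minimal sketch on made-up data: `toy` and the helper `charge_quartiles` are hypothetical, and `na.rm = TRUE` is included on the assumption that a charge column may contain missing values.

```r
# Toy stand-in for df: hypothetical values, same column names as above.
toy <- data.frame(
  gender         = rep(c("Female", "Male"), each = 5),
  TotalCharges   = c(100, 500, 1400, 3000, 5000,
                     200, 600, 1300, 3500, NA),
  MonthlyCharges = c(20, 50, 70, 90, 110,
                     25, 45, 72, 88, 105)
)

# Hypothetical helper: lower quartile, median and upper quartile of
# `column` within each gender, ignoring missing values.
charge_quartiles <- function(data, column) {
  tapply(data[[column]], data[["gender"]], quantile,
         probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
}

total_q   <- charge_quartiles(toy, "TotalCharges")
monthly_q <- charge_quartiles(toy, "MonthlyCharges")
print(total_q)
print(monthly_q)
```

With the real data, `charge_quartiles(df, "TotalCharges")` should give values close to the box edges and median lines in the plots.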
Continuous Variable Vs Continuous Variable
#scatter plots
ggplot(df, aes(x=MonthlyCharges, y=TotalCharges)) +
geom_point(size=1)+
labs(title = "MonthlyCharges vs TotalCharges",
x = "MonthlyCharges",
y = "TotalCharges")

As the monthly charge increases, the total charge also increases, as
expected. The scatterplot suggests a moderately strong positive
relationship between the two variables.
ggplot(df, aes(x=tenure, y=TotalCharges)) +
geom_point(size=1)+
labs(title = "TotalCharges vs tenure",
x = "tenure",
y = "TotalCharges")

As tenure increases, the total charge also increases according to
the scatterplot. A moderately strong positive relationship can be
identified between the two variables.
ggplot(df, aes(x=tenure, y=MonthlyCharges)) + geom_point(size=1)+
labs(title = "MonthlyCharges vs tenure",
x = "tenure",
y = "MonthlyCharges")

No relationship between the monthly charge and tenure can be
identified from the scatterplot.
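
The strength of the three relationships read from the scatterplots can be quantified with Pearson correlation coefficients. The sketch below uses made-up data: in `toy`, `TotalCharges` is constructed as `tenure * MonthlyCharges` purely for illustration, and one value is set to missing to show how `use = "pairwise.complete.obs"` handles gaps.

```r
# Toy stand-in for df: hypothetical values, same column names as above.
toy <- data.frame(
  tenure         = c(1, 5, 12, 24, 36, 48, 60, 72),
  MonthlyCharges = c(70, 20, 95, 50, 80, 25, 100, 60)
)
# Illustrative construction only; not a claim about the real data.
toy$TotalCharges <- toy$tenure * toy$MonthlyCharges
toy$TotalCharges[2] <- NA

# Pairwise Pearson correlations; each pair uses the rows where both
# values are present.
cors <- cor(toy[, c("tenure", "MonthlyCharges", "TotalCharges")],
            use = "pairwise.complete.obs")
print(round(cors, 2))
```

Running the same `cor()` call on `df` gives one coefficient per scatterplot, which makes descriptions such as "moderately strong" or "no relationship" easier to verify.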