Prof. Lianxi Zhou
Jeffery J. Unrau (7307572)
Dustin Pinch (6754550)
Rob Mantini (6282024)
install.packages("readr")
install.packages("dplyr")
install.packages("tidyr")
install.packages("ggplot2")
library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)
Car1<-read_csv("c:/Users/dwp13/Desktop/School/MKTG 3P98/Car_Survey_1.csv")
str(Car1)
head(Car1,n=10)
Car2<-read_csv("c:/Users/dwp13/Desktop/School/MKTG 3P98/Car_Survey_2.csv")
str(Car2)
head(Car2,n=10)
names(Car2)[1]<-c("Resp")
head(Car2,n=1)
Car_Total<-merge(Car1,Car2,by="Resp")
str(Car_Total)
summary(Car_Total)
Car_Total<-Car_Total %>% mutate(across(where(is.numeric), ~replace(.,is.na(.), mean(.,na.rm = TRUE))))
na_Car_Total<-summarise_all(Car_Total, ~sum(is.na(.)))
print(na_Car_Total)
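The merge and mean-imputation steps above can be checked on a small example. This is a minimal sketch in base R, assuming two toy data frames with made-up values (`survey_1`, `survey_2`, and `impute_mean` are hypothetical names, not part of the survey files).

```r
# Toy stand-ins for the two survey files (hypothetical values, not the real data)
survey_1 <- data.frame(Resp = c("R1", "R2", "R3"), Att_2 = c(5, NA, 3))
survey_2 <- data.frame(Resp = c("R1", "R2", "R3"), MPG = c(22, 30, NA))

# The key argument is lowercase `by`; R argument names are case-sensitive
combined <- merge(survey_1, survey_2, by = "Resp")

# Replace missing values in numeric columns with that column's mean
impute_mean <- function(col) {
  if (is.numeric(col)) replace(col, is.na(col), mean(col, na.rm = TRUE)) else col
}
combined[] <- lapply(combined, impute_mean)

# Count remaining missing values per column
na_counts <- colSums(is.na(combined))
print(combined)
print(na_counts)
```

Mean imputation keeps every respondent in the data but pulls the imputed values toward the centre, so it can understate the spread of a variable.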
# Pie Chart for Global Market Share
slices <- c(360,208,210,271)
lbls <- c("American", "Asian", "European", "Middle Eastern")
pct <- round(slices/sum(slices)*100)
lbls <- paste(lbls, pct)
lbls <- paste(lbls,"%",sep="") # add % to labels
pie(slices, labels = lbls, main="Market Share by Region")
# Bar chart for Global Model Count
ggplot(Car_Total, aes(x = Model)) +
geom_bar(fill = "skyblue", color = "black") +
geom_text(stat = 'count', aes(label = after_stat(count)),vjust = -.5 ) +
labs(title = "Count of Car Models in Global Market",
x = "Car Model",
y = "Count") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Most Popular Brands
Toyota (27.8%)
Ford (19.2%)
Chrysler (16.1%)
Most Popular Model: Chrysler Jeep (16.1%)
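Shares like the ones listed above could be computed directly from the Model column rather than read off the chart. This is a minimal sketch with made-up model names, and it assumes the brand is the first word of the model name, which may not hold for the real data.

```r
# Hypothetical model column (made-up values standing in for Car_Total$Model)
models <- c("Toyota Corolla", "Ford Explorer", "Chrysler Jeep",
            "Chrysler Jeep", "Toyota Highlander")

# Assumption: the brand is the first word of the model name
brands <- sub(" .*", "", models)

# Percentage share of each brand and each model, rounded to one decimal place
brand_share <- round(prop.table(table(brands)) * 100, 1)
model_share <- round(prop.table(table(models)) * 100, 1)

print(sort(brand_share, decreasing = TRUE))
print(sort(model_share, decreasing = TRUE))
```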
# Filter data for the Asian region
asian_car_data <- subset(Car_Total, Region == "Asian")
# Bar chart
ggplot(asian_car_data, aes(x = Model)) +
geom_bar(fill = "skyblue", color = "black") +
geom_text(stat = 'count', aes(label = after_stat(count)),vjust = -.5 ) +
labs(title = "Count of Car Models in Asian Market",
x = "Car Model",
y = "Count") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Most Popular Brand: Toyota (49%)
Most Popular Model: Toyota Highlander & Corolla (24.5%)
# Filter data for the American region
american_car_data <- subset(Car_Total, Region == "American")
# Bar chart
ggplot(american_car_data, aes(x = Model)) +
geom_bar(fill = "skyblue", color = "black") +
geom_text(stat = 'count', aes(label = after_stat(count)),vjust = -.5 ) +
labs(title = "Count of Car Models in American Market",
x = "Car Model",
y = "Count") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Most Popular Brand: Toyota (28%)
Most Popular Model: Chrysler Jeep (15%)
# Filter data for the European region
european_car_data <- subset(Car_Total, Region == "European")
# Bar chart
ggplot(european_car_data, aes(x = Model)) +
geom_bar(fill = "skyblue", color = "black") +
geom_text(stat = 'count', aes(label = after_stat(count)),vjust = -.5 ) +
labs(title = "Count of Car Models in European Market",
x = "Car Model",
y = "Count") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Most Popular Brand: Toyota (38%)
Most Popular Model: Toyota Highlander (23%)
# Filter data for the Middle Eastern region
middleEastern_car_data <- subset(Car_Total, Region == "Middle Eastern")
# Bar chart
ggplot(middleEastern_car_data, aes(x = Model)) +
geom_bar(fill = "skyblue", color = "black") +
geom_text(stat = 'count', aes(label = after_stat(count)),vjust = -.5 ) +
labs(title = "Count of Car Models in Middle Eastern Market",
x = "Car Model",
y = "Count") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Most Popular Brand: Chrysler (28%)
Most Popular Model: Chrysler Jeep (28%)
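The regional comparisons above repeat the same steps for each region. As a rough sketch, a single cross-tabulation can report the most popular model and the within-region shares for every region at once. The data frame `cars` below holds made-up values and only stands in for Car_Total.

```r
# Hypothetical respondents (made-up values standing in for Car_Total)
cars <- data.frame(
  Region = c("Asian", "Asian", "Asian", "European", "European", "European"),
  Model  = c("Toyota Corolla", "Toyota Corolla", "Ford Explorer",
             "Chrysler Jeep", "Chrysler Jeep", "Toyota Corolla")
)

# Rows are regions, columns are models, cells are respondent counts
counts <- table(cars$Region, cars$Model)

# Convert counts to within-region percentages
shares <- round(prop.table(counts, margin = 1) * 100, 1)

# For each region, pick the model with the largest count
# (which.max returns the first model when there is a tie)
top_model <- colnames(counts)[apply(counts, 1, which.max)]
names(top_model) <- rownames(counts)

print(shares)
print(top_model)
```

Ties matter here: a result such as "Highlander & Corolla" would show only the first of the tied models, so the shares table should be checked alongside `top_model`.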
# Variable comparison 1: combine C_cost and H_cost into a total cost per mile (city and highway). Find the totals for each model, then compare
# the total costs to Att_2. Reason: Att_2 measures whether the respondent believes the car was a good idea to purchase. It is likely that costs
# affect the respondent's perception of whether the car was a good purchase.
# Find the correlation between Att_2 and total cost in cents per mile (city and highway). Assume respondents drive city and highway equally on average.
city_cost <- 0.50
hwy_cost <-0.50
Car_Total$Total_milage_cost <- (Car_Total$C_cost*city_cost) + (Car_Total$H_Cost*hwy_cost)
head(Car_Total$Total_milage_cost, n=10)
# cor() returns a single number, so it is kept in its own variable rather than a Car_Total column
Cost_to_attitude_correlation <- cor(Car_Total$Att_2, Car_Total$Total_milage_cost, use = "complete.obs")
print(Cost_to_attitude_correlation)
ggplot(data=Car_Total, mapping = aes(x=Total_milage_cost, y=Att_2))+ geom_point()+geom_smooth(method = "lm")
There is not a strong correlation between Att_2 and the total mileage cost, which is a weighted average of city and highway cost with equal weights of 0.50 each.
The new variable was compared with Att_2 to understand whether mileage cost has an impact on Att_2.
The result suggests there is not a strong relationship between mileage cost and attitude toward the vehicle. Respondents may not be sensitive to mileage cost when judging whether the car was a good purchase.
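One way to put a number on "not a strong correlation" is to report the coefficient together with a significance test. The sketch below uses made-up costs and ratings (the vectors `c_cost`, `h_cost`, and `att_2` are hypothetical, not taken from the survey) and the same 50/50 weighting assumption.

```r
# Hypothetical per-mile costs and attitude ratings (made-up values)
c_cost <- c(12, 15, 9, 20, 14, 11)
h_cost <- c(8, 10, 7, 13, 9, 8)
att_2  <- c(6, 5, 6, 4, 6, 5)

# Equal weights reflect the assumption of a 50/50 city/highway split
city_weight <- 0.50
hwy_weight  <- 0.50
total_cost  <- c_cost * city_weight + h_cost * hwy_weight

# cor() gives the Pearson coefficient; cor.test() adds a p-value and confidence interval
r <- cor(total_cost, att_2, use = "complete.obs")
test <- cor.test(total_cost, att_2)

print(r)
print(test$p.value)
print(test$conf.int)
```

Changing the two weights shows how sensitive the result is to the equal-driving assumption.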
# Would respondents recommend their car model to others? This uses CatWOM_1, the categorical version of WOM_1, which measures whether consumers would recommend their vehicle to someone else. A stacked bar graph shows the low, medium, and high levels.
ggplot(Car_Survey_Breakdown, aes(x=Model, fill = CatWOM_1))+geom_bar(position = "stack")+scale_fill_manual(values = c("low" = "red",
"medium"="yellow","high"="green"))+labs(x="Models", y="Rating",fill="CatWOM_1")+theme_minimal()
A large portion of our buyers would recommend their model to others. This is a positive result, considering many other brands don’t have the same level of positive response.
It is likely that our brand has a positive image.
Because these are WOM responses, it is likely that people see our brand as a go-to choice when someone needs a car.
# Compares models by the gender of the driver. Potentially shows which cars each gender tends to prefer.
library(ggplot2)
ggplot(Car_Total, aes(x=Model, fill=Gender)) + geom_bar(position="dodge") +scale_fill_manual(values=c("Male"="lightblue", "Female"="pink"), na.value = "black")
Based on the graph, most of the respondents who own Toyota models are female, with almost double the number of females to males.
Among large vehicles and SUVs, owners of every make except Ford are predominantly female.
For marketing campaigns, it would be ideal to target women more than men.
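The "almost double" reading of the graph can be checked with a cross-tabulation of model by gender. This is a minimal sketch on made-up respondents. The data frame `cars` is hypothetical and only mirrors the Model and Gender columns.

```r
# Hypothetical respondents (made-up values standing in for Car_Total)
cars <- data.frame(
  Model  = c("Toyota Corolla", "Toyota Corolla", "Toyota Corolla",
             "Ford Explorer", "Ford Explorer"),
  Gender = c("Female", "Female", "Male", "Male", NA)
)

# useNA = "ifany" keeps respondents with a missing gender visible in the table
gender_counts <- table(cars$Model, cars$Gender, useNA = "ifany")
print(gender_counts)

# Female-to-male ratio per model; a value near 2 means roughly double
ratio <- gender_counts[, "Female"] / gender_counts[, "Male"]
print(ratio)
```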
#What is the education level of our customers?
Car_Survey_Breakdown <- Car_Survey_Breakdown %>% mutate(Education = case_when(Education == 1 ~ "Diploma", Education == 2 ~ "Bachelors", Education == 3 ~ "Masters"))
ggplot(Car_Survey_Breakdown, aes(x=Model, fill=Education))+geom_bar(position="stack") +scale_fill_manual(values=c("Diploma"="lightblue", "Bachelors"="blue","Masters" = "navy")) +
labs(x="Model", y="Count", fill="Education")+theme_minimal()
A large portion of the respondents have a bachelor's degree.
The Highlander has a much more educated demographic.
#what is the age range of the market?
ggplot(Car_Survey_Breakdown, aes(x=Model, fill=CatAge))+geom_bar(position="stack") +scale_fill_manual(values=c("Z"="pink", "M"="red","X" = "lightblue", "B"="blue")) +
labs(x="Model", y="Count", fill="CatAge")+theme_minimal()
# Z = Gen Z, M = Millennials, X = Gen X, B = Baby Boomers
# Create a new column "Efficiency" based on MPG
Car_Total <- Car_Total %>%
mutate(Efficiency = case_when(
MPG <= 20 ~ "Low",
MPG > 20 & MPG <= 30 ~ "Moderate",
MPG > 30 ~ "High",
TRUE ~ NA_character_
))
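The same three bands can also be produced with base R's `cut()`, which makes the boundary handling explicit. This is a sketch on made-up MPG values, not the survey data. Note that `cut()` returns a factor, whereas `case_when()` above returns character values.

```r
# Hypothetical MPG values, including both boundaries and a missing value
mpg <- c(15, 20, 20.5, 30, 31, NA)

# With right = TRUE the intervals are (-Inf, 20], (20, 30], (30, Inf],
# which matches the case_when() rules: <= 20, > 20 & <= 30, > 30
efficiency <- cut(mpg,
                  breaks = c(-Inf, 20, 30, Inf),
                  labels = c("Low", "Moderate", "High"),
                  right = TRUE)

print(data.frame(mpg, efficiency))
```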
# Create a new column "Performance_Score"
Car_Total <- Car_Total %>%
mutate(Performance_Score = (MPG / max(MPG, na.rm = TRUE)) * 100 - (Cyl / max(Cyl, na.rm = TRUE)) * 50)
# Plot Performance Score vs. MPG
plot(Car_Total$MPG, Car_Total$Performance_Score, main = "Performance Score vs. MPG", xlab = "MPG", ylab = "Performance Score")
Ultimately, the linear relationship between MPG and Performance Score suggests that manufacturers are producing high-performance vehicles that not only offer superior engine power but also achieve better fuel economy, reflecting advancements in automotive engineering and technology.
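Because Performance_Score is computed from MPG, part of the linear pattern in the plot follows from the formula itself. A small worked example with made-up values (three hypothetical cars with the same cylinder count) illustrates this.

```r
# Hypothetical cars (made-up values): same cylinder count, different MPG
mpg <- c(20, 30, 40)
cyl <- c(4, 4, 4)

# Same formula as Performance_Score above
score <- (mpg / max(mpg)) * 100 - (cyl / max(cyl)) * 50
print(score)

# With Cyl held fixed, the score is an exact linear function of MPG
fit <- lm(score ~ mpg)
print(coef(fit))
```

Here each extra mile per gallon adds 100 / max(MPG) points to the score. Any scatter around the line in the real plot would come from differences in Cyl.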
# Convert Education to a factor
Car_Total$Education <- factor(Car_Total$Education, levels = c(1, 2, 3), labels = c("High School", "Bachelors", "Masters"))
# Plot Performance Score vs. Education
plot(Car_Total$Education, Car_Total$Performance_Score, main = "Performance Score vs. Education", xlab = "Education", ylab = "Performance Score")
# Create a ggplot showing Education vs. MPG
ggplot(Car_Total, aes(x = Education, y = MPG)) +
geom_boxplot() +
labs(title = "MPG by Education Level", x = "Education", y = "MPG") +
theme_minimal()
The linear relationship between MPG and Performance Score, where an increase in Performance Score correlates with an increase in MPG, suggests that higher performance, as measured by the Performance Score, is associated with better fuel efficiency.
This relationship implies that certain cars with higher Performance Scores tend to achieve better fuel efficiency compared to those with lower Performance Scores. This could be due to several factors: