World’s Best-Selling Phones’ Sales

Outline


1-Executive Summary


2-Introduction

Context:

Explore our dataset of the Top 120 best-selling mobile phones, featuring detailed information on manufacturers, models, form factors, and release years. Track sales in millions to uncover key trends and shifts in mobile technology over time. This concise overview highlights the evolution of consumer preferences and the impact of major brands in the market.


Methodology

Our methodology involves collecting data on the Top 120 best-selling mobile phones, including details on manufacturers, models, form factors, release years, and units sold. We performed data wrangling to clean and organize this information. Exploratory Data Analysis (EDA) was conducted with visualizations to uncover patterns and trends. Finally, predictive analysis using regression models was applied to forecast future sales trends and assess the impact of various factors on phone sales.
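
As a rough illustration of the regression step mentioned above, the sketch below fits a linear model to a small made-up data frame. The column names (Year, Smartphone, UnitsSold) and all values are hypothetical placeholders rather than figures from the dataset, and a plain lm() fit is only one possible choice of model.

```r
# Hypothetical toy data: values are illustrative only, not taken from the dataset
toy <- data.frame(
  Year       = c(2004, 2006, 2008, 2010, 2012, 2014, 2016, 2018),
  Smartphone = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE),
  UnitsSold  = c(30, 25, 20, 40, 55, 70, 80, 95)
)

# Fit a linear regression of units sold on release year and smartphone status
fit <- lm(UnitsSold ~ Year + Smartphone, data = toy)

# Inspect the estimated coefficients
print(coef(fit))

# Predict units sold for a hypothetical smartphone released in 2020
new_phone <- data.frame(Year = 2020, Smartphone = TRUE)
predicted <- predict(fit, newdata = new_phone)
print(predicted)
```

The coefficients show how the fitted line attributes sales to each factor; with real data, the fit would need to be checked (residuals, sample size) before any forecast is trusted.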


A-Data Collection

The dataset is a flat CSV file containing the Top 120 best-selling mobile phones.

You can download the file here: https://www.kaggle.com/datasets/muhammadroshaanriaz/global-best-selling-phone-sales/data

Code to read data:

# Use forward slashes in the file path
data <- read.csv("Your Path", header = TRUE, sep = ",")
 

B-Perform Data Wrangling

We preprocess the collected data to handle missing values, outliers, and inconsistencies. This step ensures that our data is clean, organized, and ready for analysis.
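
The checks described above might look roughly like the sketch below. It runs on a small made-up data frame (the values and the Model, Year, and UnitsSold columns are hypothetical), and the 1.5 * IQR rule for outliers is an assumed convention, not necessarily the one used in this analysis.

```r
# Hypothetical toy data standing in for the phone sales table
toy <- data.frame(
  Model     = c("A1", "A2", "A2", "B1", "C1", "D1"),
  Year      = c(2010, 2012, 2012, 2015, NA, 2018),
  UnitsSold = c(10, 12, 12, 15, 11, 200)
)

# Count missing values per column
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)

# Drop exact duplicate rows
toy_unique <- toy[!duplicated(toy), ]

# Flag outliers with the 1.5 * IQR rule (an assumed convention)
q <- unname(quantile(toy_unique$UnitsSold, probs = c(0.25, 0.75)))
iqr <- q[2] - q[1]
is_outlier <- toy_unique$UnitsSold < q[1] - 1.5 * iqr |
  toy_unique$UnitsSold > q[2] + 1.5 * iqr
print(toy_unique[is_outlier, ])
```

In a best-seller list, a flagged value may simply be a genuine blockbuster, so flagged rows are better reviewed than dropped automatically.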


Libraries used:
library(Hmisc)
library(openxlsx)
library(tidyverse)
library(dplyr)
library(ggplot2)
library(here)
library(janitor)
library(skimr)
library(SimDesign)
library(readr)
library(RColorBrewer)  # For color palettes
library(gridExtra)
library(ggrepel)  # For repelling labels
library(plotly)  # For interactive charts (plot_ly, ggplotly)
library(htmlwidgets)
library(broom)


Cleaning data, checking missing values, renaming columns, and checking the data structure:

The data is already clean and ready for analysis.

# To see the data frame column type
str(data)
glimpse(data)

# Rename the column 'UnitsSold' to 'Unit Sold Per Million'
data <- data %>%
  rename("Unit Sold Per Million" = "UnitsSold")

# Calculate the number of missing values for each column
missing_values_per_column <- colSums(is.na(data))
print(missing_values_per_column)
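
A note on column names: the code in this report refers to columns such as `Form.Factor` and `Smartphone.`. That is consistent with how read.csv() sanitizes headers by default (check.names = TRUE passes them through make.names()). The sketch below assumes raw headers like "Form Factor" and "Smartphone?"; this is an assumption about the file, so it is worth checking names(data) on the actual CSV.

```r
# Assumed raw CSV headers (hypothetical; check the actual file)
raw_headers <- c("Manufacturer", "Model", "Form Factor", "Smartphone?", "Year")

# read.csv() with the default check.names = TRUE passes headers through make.names()
clean_headers <- make.names(raw_headers)
print(clean_headers)

# Small in-memory CSV showing the same effect end to end
csv_text <- "Form Factor,Smartphone?\nBar,FALSE\nTouchscreen,TRUE"
toy <- read.csv(text = csv_text)
print(names(toy))
```

Passing check.names = FALSE to read.csv() would keep the original headers, at the cost of needing backticks around them in later code.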

C-Perform Exploratory Data Analysis and Visualization (Using Interactive Charts)

For exploratory data analysis (EDA), we began by visualizing key metrics to uncover underlying patterns and trends. This involved creating various plots to analyze the distribution of data and relationships between variables. Tools like ggplot2 in R were used to generate insightful visualizations that highlighted sales trends, market shifts, and form factor popularity. These visualizations helped us better understand the dataset and guided further analysis.

Note: Most of these charts are interactive. Hover over a chart to see more details, and click a factor in the legend to filter the chart to specific elements.

# Aggregate the data by 'Manufacturer'
manufacturer_sales <- aggregate(data[["Unit Sold Per Million"]], 
                                by = list(Manufacturer = data[["Manufacturer"]]), 
                                sum)

# Rename columns of the aggregated data
colnames(manufacturer_sales) <- c("Manufacturer", "Unit Sold Per Million")

# Order the results by 'Unit Sold Per Million'
manufacturer_sales <- manufacturer_sales[order(-manufacturer_sales$`Unit Sold Per Million`), ]

# Print the results
print(manufacturer_sales)

# Plot the results
ggplot(manufacturer_sales, aes(x = reorder(Manufacturer, -`Unit Sold Per Million`), 
                               y = `Unit Sold Per Million`)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Total Units Sold by Manufacturer", 
       x = "Manufacturer", 
       y = "Units Sold (Million)")


# Calculate total units sold by manufacturer
manufacturer_sales <- data %>%
  group_by(Manufacturer) %>%
  summarise(Unit_Sold_Per_Million = sum(`Unit Sold Per Million`)) %>%
  arrange(desc(Unit_Sold_Per_Million))  # Sort in descending order

# Calculate percentages
total_units <- sum(manufacturer_sales$Unit_Sold_Per_Million)
manufacturer_sales <- manufacturer_sales %>%
  mutate(Percentage = Unit_Sold_Per_Million / total_units * 100)

# Define a color palette for fantasy colors
color_palette <- c(
  '#FF6347', '#FF4500', '#FFD700', '#32CD32', '#4169E1', '#8A2BE2',
  '#FF1493', '#00FA9A', '#D2691E', '#DC143C', '#B22222', '#4B0082',
  '#7FFF00', '#00CED1', '#FF69B4', '#8B4513'
)

# Create interactive bar chart with Plotly
fig15 <- plot_ly(
  data = manufacturer_sales,
  x = ~Manufacturer,
  y = ~Unit_Sold_Per_Million,
  type = 'bar',
  text = ~paste('Units Sold: ', Unit_Sold_Per_Million, '<br>Percentage: ', round(Percentage, 1), '%'),
  textposition = 'outside',
  textfont = list(size = 18, color = 'black'),  # Increase text size and set color
  marker = list(
    color = color_palette[1:nrow(manufacturer_sales)]  # Apply fantasy colors
  ),
  hoverinfo = 'text',  # Use text for hover information
  color = ~Manufacturer  # Ensure different colors for each manufacturer
) %>%
  layout(
    title = list(
      text = 'Total Units Sold by Manufacturer',
      font = list(size = 22, color = '#4A4A4A')  # Title font size and color
    ),
    xaxis = list(
      title = 'Manufacturer',
      tickangle = 45,
      tickfont = list(size = 18)  # X-axis tick font size
    ),
    yaxis = list(
      title = 'Units Sold (Million)',
      tickfont = list(size = 18)  # Y-axis tick font size
    ),
    margin = list(b = 150),  # Adjust bottom margin for x-axis labels
    barmode = 'group',
    hoverlabel = list(
      bgcolor = 'white',
      font = list(size = 20)  # Font size in hover labels
    ),
    legend = list(
      title = list(text = 'Manufacturer'),
      orientation = 'h',
      x = 0.5,
      xanchor = 'center',
      y = -0.2,  # Position legend below the plot
      font = list(size = 18, color = 'black')  # Increase font size and set color to black
    )
  ) %>%
  # Update x-axis to ensure correct order
  layout(
    xaxis = list(
      categoryorder = 'total descending'  # Sort categories based on total values
    )
  )

# Display the plot
fig15

# Aggregate the data by 'Year'
yearly_sales <- aggregate(data[["Unit Sold Per Million"]], 
                          by = list(Year = data[["Year"]]), 
                          sum)

# Rename columns of the aggregated data
colnames(yearly_sales) <- c("Year", "Total Units Sold (Million)")

# Order the results by 'Year'
yearly_sales <- yearly_sales[order(yearly_sales$Year), ]

# Print the results
print(yearly_sales)

# Plot the results
ggplot(yearly_sales, aes(x = Year, y = `Total Units Sold (Million)`)) +
  geom_line(color = "red") +
  geom_point() +
  labs(title = "Total Units Sold by Year", 
       x = "Year", 
       y = "Units Sold (Million)")



# Aggregate the data by 'Year' and 'Manufacturer'
yearly_manufacturer_sales <- data %>%
  group_by(Year, Manufacturer) %>%
  summarise(Total_Units_Sold = sum(`Unit Sold Per Million`), .groups = 'drop')

# Plot the results using facet_wrap
ggplot(yearly_manufacturer_sales, aes(x = Year, y = Total_Units_Sold, color = Manufacturer, group = Manufacturer)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ Manufacturer) +
  labs(title = "Total Units Sold by Manufacturer Over Years", 
       x = "Year", 
       y = "Units Sold (Million)",
       color = "Manufacturer") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))



# Filter data for Apple Manufacturer
apple_data <- data %>%
  filter(Manufacturer == "Apple")

# Aggregate the data by 'Year' and 'Model'
apple_model_sales <- apple_data %>%
  group_by(Year, Model) %>%
  summarise(Total_Units_Sold = sum(`Unit Sold Per Million`), .groups = 'drop')

# Plot the results with improved colors, line types, and labels
ggplot(apple_model_sales, aes(x = Year, y = Total_Units_Sold, color = Model, group = Model)) +
  geom_line(size = 1) +
  geom_point(size = 3) +
  geom_text(aes(label = Total_Units_Sold), vjust = -0.5, size = 3, check_overlap = TRUE) +
  labs(title = "Performance of Apple Models Over Years", 
       x = "Year", 
       y = "Units Sold (Million)",
       color = "Model",
       linetype = "Model") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title = element_text(size = 12),
        legend.title = element_text(size = 10),
        legend.text = element_text(size = 8)) +
  scale_color_brewer(palette = "Set1") +
  scale_linetype_manual(values = c("solid", "dashed", "dotted", "dotdash"))


# Filter data for the specified manufacturers
selected_manufacturers <- c("Google", "HTC", "LeTV", "Palm", "Research in Motion (RIM)")
filtered_data <- data %>%
  filter(Manufacturer %in% selected_manufacturers)

# Ensure 'Year' is numeric
filtered_data$Year <- as.numeric(as.character(filtered_data$Year))

# Aggregate the data by 'Year' and 'Model'
manufacturer_model_sales <- filtered_data %>%
  group_by(Year, Manufacturer, Model) %>%
  summarise(Total_Units_Sold = sum(`Unit Sold Per Million`), .groups = 'drop')

# Plot the results using facet_wrap and include model names with adjusted text size and angle
ggplot(manufacturer_model_sales, aes(x = Year, y = Total_Units_Sold, color = Model, group = Model)) +
  geom_line(size = 1) +
  geom_point(size = 3) +
  geom_text(aes(label = Model), size = 4, angle = 45, hjust = 1, vjust = 1.5, check_overlap = TRUE) +  # Adjust text size and angle
  facet_wrap(~ Manufacturer, scales = "free_y") +  # Create a separate panel for each manufacturer
  labs(title = "For One Time (Trend)", 
       x = "Year", 
       y = "Units Sold (Million)",
       color = "Model") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title = element_text(size = 12),
        legend.title = element_text(size = 10),
        legend.text = element_text(size = 8),
        strip.text = element_text(size = 12)) +  # Adjust facet labels size if needed
  scale_color_brewer(palette = "Set1") +
  scale_x_continuous(limits = c(1995, 2025), breaks = seq(1995, 2025, by = 5)) 



# Filter data for the specified manufacturers
selected_manufacturers <- c("Huawei", "Oppo", "Sony Ericsson")
filtered_data <- data %>%
  filter(Manufacturer %in% selected_manufacturers)

# Ensure 'Year' is numeric
filtered_data$Year <- as.numeric(as.character(filtered_data$Year))

# Aggregate the data by 'Year' and 'Model'
manufacturer_model_sales <- filtered_data %>%
  group_by(Year, Manufacturer, Model) %>%
  summarise(Total_Units_Sold = sum(`Unit Sold Per Million`), .groups = 'drop')

# Plot the results using facet_wrap and include model names with adjusted text size and angle
ggplot(manufacturer_model_sales, aes(x = Year, y = Total_Units_Sold, color = Model, group = Model)) +
  geom_line(size = 1) +
  geom_point(size = 3) +
  geom_text(aes(label = Model), size = 4, angle = 45, hjust = 1, vjust = 1.5, check_overlap = TRUE) +  # Adjust text size and angle
  facet_wrap(~ Manufacturer, scales = "free_y") +  # Create a separate panel for each manufacturer
  labs(title = " Succeed Short Term plan < 5 Years ", 
       x = "Year", 
       y = "Units Sold (Million)",
       color = "Model") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title = element_text(size = 12),
        legend.title = element_text(size = 10),
        legend.text = element_text(size = 8),
        strip.text = element_text(size = 12)) +  # Adjust facet labels size if needed
  scale_color_brewer(palette = "Set1") +
  scale_x_continuous(limits = c(1995, 2025), breaks = seq(1995, 2025, by = 5))


# Filter data for the specified manufacturers
selected_manufacturers <- c("Apple", "Nokia", "Samsung", "LG", "Motorola", "Xiaomi")
filtered_data <- data %>%
  filter(Manufacturer %in% selected_manufacturers)

# Ensure 'Year' is numeric
filtered_data$Year <- as.numeric(as.character(filtered_data$Year))

# Aggregate the data by 'Year' and 'Model'
manufacturer_model_sales <- filtered_data %>%
  group_by(Year, Manufacturer, Model) %>%
  summarise(Total_Units_Sold = sum(`Unit Sold Per Million`), .groups = 'drop')

# Plot the results using facet_wrap without the legend
ggplot(manufacturer_model_sales, aes(x = Year, y = Total_Units_Sold, color = Model, group = Model)) +
  geom_line(size = 1) +
  geom_point(size = 2) +  # Adjust point size if needed
  facet_wrap(~ Manufacturer, scales = "free_y") +  # Create a separate panel for each manufacturer
  labs(title = "For Long Term Plan", 
       x = "Year", 
       y = "Units Sold (Million)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title = element_text(size = 12),
        legend.position = "none",  # Hide the legend
        strip.text = element_text(size = 12)) +  # Adjust facet labels size if needed
  scale_x_continuous(limits = c(1995, 2025), breaks = seq(1995, 2025, by = 5))

# Filter data for the specified manufacturers
selected_manufacturers <- c("Apple", "Nokia", "Samsung")
filtered_data <- data %>%
  filter(Manufacturer %in% selected_manufacturers)

# Ensure 'Year' is numeric
filtered_data$Year <- as.numeric(as.character(filtered_data$Year))

# Aggregate the data by 'Year' and 'Model'
manufacturer_model_sales <- filtered_data %>%
  group_by(Year, Manufacturer, Model) %>%
  summarise(Total_Units_Sold = sum(`Unit Sold Per Million`), .groups = 'drop')

# Plot the results using facet_wrap without the legend
ggplot(manufacturer_model_sales, aes(x = Year, y = Total_Units_Sold, color = Model, group = Model)) +
  geom_line(size = 1) +
  geom_point(size = 2) +  # Adjust point size if needed
  facet_wrap(~ Manufacturer, scales = "free_y") +  # Create a separate panel for each manufacturer
  labs(title = "For The Best Long Term Plan", 
       x = "Year", 
       y = "Units Sold (Million)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title = element_text(size = 12),
        legend.position = "none",  # Hide the legend
        strip.text = element_text(size = 12)) +  # Adjust facet labels size if needed
  scale_x_continuous(limits = c(1995, 2025), breaks = seq(1995, 2025, by = 5))

# Aggregate data by Form Factor
form_factor_sales <- data %>%
  group_by(`Form.Factor`) %>%
  summarise(Total_Units_Sold = sum(`Unit Sold Per Million`), .groups = 'drop')

# Reorder the rows based on Total Units Sold
form_factor_sales <- form_factor_sales %>%
  arrange(desc(Total_Units_Sold))

# Define a color palette from RColorBrewer
color_palette <- brewer.pal(n = length(unique(form_factor_sales$`Form.Factor`)), name = "Set3")

# Print the aggregated data
print(form_factor_sales)

# Plotting the results with enhanced styling and grid lines removed
ggplot(form_factor_sales, aes(x = reorder(`Form.Factor`, -Total_Units_Sold), y = Total_Units_Sold, fill = `Form.Factor`)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = color_palette) +  # Apply the color palette
  theme_minimal(base_size = 14) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 12, color = "black"),
    axis.title = element_text(size = 14, color = "black"),
    legend.title = element_text(size = 14, face = "bold", color = "black"),
    legend.text = element_text(size = 12, color = "black"),
    strip.text = element_text(size = 14, face = "bold", color = "black"),
    panel.grid.major = element_blank(),  # Remove major grid lines
    panel.grid.minor = element_blank(),  # Remove minor grid lines
    plot.title = element_text(face = "bold", size = 16, color = "darkblue", hjust = 0.5),  # Center the title
    legend.background = element_rect(fill = "lightgray", color = "black")
  ) +
  labs(
    title = "Total Units Sold by Form Factor", 
    x = "Form Factor", 
    y = "Units Sold (Million)"
  )


# Aggregate data by Form Factor
form_factor_sales <- data %>%
  group_by(`Form.Factor`) %>%
  summarise(Total_Units_Sold = sum(`Unit Sold Per Million`), .groups = 'drop')

# Reorder the rows based on Total Units Sold
form_factor_sales <- form_factor_sales %>%
  arrange(desc(Total_Units_Sold))

# Calculate the percentage of each form factor
total_units <- sum(form_factor_sales$Total_Units_Sold)
form_factor_sales <- form_factor_sales %>%
  mutate(Percentage = Total_Units_Sold / total_units * 100)

# Define a color palette from RColorBrewer
color_palette <- brewer.pal(n = length(unique(form_factor_sales$`Form.Factor`)), name = "Set3")

# Create the interactive bar chart with Plotly
fig14 <- plot_ly(
  data = form_factor_sales,
  x = ~reorder(`Form.Factor`, -Total_Units_Sold),
  y = ~Total_Units_Sold,
  type = 'bar',
  color = ~`Form.Factor`,
  colors = color_palette,
  text = ~paste('Total Units Sold: ', Total_Units_Sold, '<br>Percentage: ', round(Percentage, 1), '%'),
  textposition = 'outside',
  textfont = list(color = 'black'),  # Set font color for text on bars
  hoverinfo = 'text',  # Display text on hover
  showlegend = TRUE
) %>%
  layout(
    title = 'Total Units Sold Per Million by Form Factor',
    xaxis = list(
      title = 'Form Factor',
      tickangle = 45,
      showgrid = FALSE  
    ),
    yaxis = list(
      title = 'Units Sold (Million)',
      showgrid = FALSE  # Remove y-axis gridlines
    ),
    margin = list(b = 120),  # Adjust bottom margin for x-axis labels
    legend = list(
      title = list(text = 'Form Factor'),
      font = list(size = 14, color = 'black')
    ),
    annotations = list(
      list(
        text = 'Hover over bars to see details',
        x = 0.5,
        y = -0.15,
        xref = 'paper',
        yref = 'paper',
        showarrow = FALSE,
        font = list(size = 16, color = 'black'),
        align = 'center'
      )
    )
  )

# Display the interactive plot
fig14

# Filter data for 'Bar' and 'TouchScreen' form factors
filtered_data <- data %>%
  filter(`Form.Factor` %in% c("Bar", "Touchscreen"))

# Ensure 'Year' is numeric
filtered_data$Year <- as.numeric(as.character(filtered_data$Year))

# Aggregate the data by 'Year' and 'Form.Factor'
yearly_sales <- filtered_data %>%
  group_by(Year, `Form.Factor`) %>%
  summarise(Total_Units_Sold = sum(`Unit Sold Per Million`), .groups = 'drop')

# Plotting the results
ggplot(yearly_sales, aes(x = Year, y = Total_Units_Sold, color = `Form.Factor`, linetype = `Form.Factor`)) +
  geom_line(size = 1) +  # Line for each form factor
  geom_point(size = 3) +  # Points on the line
  scale_color_manual(values = c("Bar" = "blue", "Touchscreen" = "red")) +  # Custom colors for each form factor
  scale_linetype_manual(values = c("Bar" = "solid", "Touchscreen" = "dashed")) +  # Custom line types
  theme_minimal(base_size = 14) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 12, color = "black"),
    axis.title = element_text(size = 14, color = "black"),
    legend.title = element_text(size = 14, face = "bold", color = "black"),
    legend.text = element_text(size = 12, color = "black"),
    plot.title = element_text(face = "bold", size = 16, color = "darkblue", hjust = 0.5)
  ) +
  labs(
    title = "Yearly Units Sold for Bar and TouchScreen Form Factors", 
    x = "Year", 
    y = "Total Units Sold (Million)",
    color = "Form Factor",
    linetype = "Form Factor"
  )


# Filter data for 'Bar' and 'TouchScreen' form factors
filtered_data <- data %>%
  filter(`Form.Factor` %in% c("Bar", "Touchscreen"))

# Ensure 'Year' is numeric
filtered_data$Year <- as.numeric(as.character(filtered_data$Year))

# Aggregate the data by 'Year' and 'Form.Factor'
yearly_sales <- filtered_data %>%
  group_by(Year, `Form.Factor`) %>%
  summarise(Total_Units_Sold = sum(`Unit Sold Per Million`), .groups = 'drop')

# Create a ggplot object
p <- ggplot(yearly_sales, aes(x = Year, y = Total_Units_Sold, color = `Form.Factor`, linetype = `Form.Factor`)) +
  geom_line(size = 1) +  # Line for each form factor
  geom_point(size = 3) +  # Points on the line
  scale_color_manual(values = c("Bar" = "blue", "Touchscreen" = "red")) +  # Custom colors for each form factor
  scale_linetype_manual(values = c("Bar" = "solid", "Touchscreen" = "dashed")) +  # Custom line types
  theme_minimal(base_size = 14) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 12, color = "black"),
    axis.title = element_text(size = 14, color = "black"),
    legend.title = element_text(size = 14, face = "bold", color = "black"),
    legend.text = element_text(size = 12, color = "black"),
    plot.title = element_text(face = "bold", size = 16, color = "darkblue", hjust = 0.5),
    panel.grid.major = element_blank(),  # Remove major grid lines
    panel.grid.minor = element_blank(),  # Remove minor grid lines
    panel.background = element_rect(fill = "whitesmoke")  # Set background color
  ) +
  labs(
    title = "Yearly Units Sold for Bar and TouchScreen Form Factors", 
    x = "Year", 
    y = "Total Units Sold (Million)",
    color = "Form Factor",
    linetype = "Form Factor"
  )

# Convert ggplot object to a plotly object for interactivity
fig13 <- ggplotly(p)

# Display the interactive plot
fig13

# Filter data to include both smartphones and non-smartphones
filtered_data <- data %>%
  filter(!is.na(Smartphone.))

# Ensure 'Year' is numeric
filtered_data$Year <- as.numeric(as.character(filtered_data$Year))

# Convert 'Smartphone.' to a factor with labels
filtered_data$Smartphone. <- factor(filtered_data$Smartphone., levels = c(FALSE, TRUE), labels = c("Non-Smartphone", "Smartphone"))

# Perform t-test
t_test_result <- t.test(`Unit Sold Per Million` ~ Smartphone., data = filtered_data)

# Display t-test results
print(t_test_result)

# Create interactive box plot with Plotly
fig12 <- plot_ly(
  data = filtered_data,
  x = ~Smartphone.,
  y = ~`Unit Sold Per Million`,
  type = 'box',
  color = ~Smartphone.,
  colors = c("Non-Smartphone" = "#FF6347", "Smartphone" = "#4682B4"),  # Tomato and Steel Blue colors
  boxmean = TRUE,  # Show mean
  text = ~paste('Units Sold: ', `Unit Sold Per Million`, '<br>Percentage: ', round(`Unit Sold Per Million` / sum(`Unit Sold Per Million`) * 100, 1), '%'),
  hoverinfo = 'x+y+text'
) %>%
  layout(
    title = 'Units Sold by Smartphone Status',
    xaxis = list(
      title = 'Smartphone Status',
      tickvals = c("Non-Smartphone", "Smartphone"),
      ticktext = c("Non-Smartphone", "Smartphone")
    ),
    yaxis = list(
      title = 'Units Sold (Million)',
      zeroline = FALSE
    ),
    plot_bgcolor = 'lightgray',  # Light gray plot background
    paper_bgcolor = 'white',     # White paper background
    font = list(
      family = "Arial, sans-serif",
      size = 14,
      color = "black"
    ),
    boxmode = 'group'
  )

# Display the plot
fig12

# Filter data to include both smartphones and non-smartphones
filtered_data <- data %>%
  filter(!is.na(Smartphone.))

# Ensure 'Year' is numeric
filtered_data$Year <- as.numeric(as.character(filtered_data$Year))

# Aggregate the data by year and smartphone status
yearly_sales <- filtered_data %>%
  group_by(Year, Smartphone.) %>%
  summarise(Total_Units_Sold = sum(`Unit Sold Per Million`), .groups = 'drop')

# Convert Smartphone status to factor for better plot labeling
yearly_sales$Smartphone. <- factor(yearly_sales$Smartphone., labels = c("Non-Smartphone", "Smartphone"))

# Create ggplot object
p <- ggplot(yearly_sales, aes(x = Year, y = Total_Units_Sold, color = Smartphone., linetype = Smartphone.)) +
  geom_line(size = 1.2) +  # Line for each smartphone status
  geom_point(size = 3) +  # Points on the line
  scale_color_manual(values = c("Non-Smartphone" = "#FF6347", "Smartphone" = "#4682B4")) +  # Tomato and Steel Blue colors
  scale_linetype_manual(values = c("Non-Smartphone" = "solid", "Smartphone" = "dashed")) +  # Custom line types
  labs(
    title = "Trends in Units Sold Over Years by Smartphone Status", 
    x = "Year", 
    y = "Total Units Sold (Million)",
    color = "Smartphone Status",
    linetype = "Smartphone Status"
  ) +
  theme_minimal(base_size = 16) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 14, color = "black"),
    axis.text.y = element_text(size = 14, color = "black"),
    axis.title = element_text(size = 16, color = "black"),
    plot.title = element_text(face = "bold", size = 18, color = "darkblue", hjust = 0.5),
    legend.title = element_text(size = 14, face = "bold", color = "black"),
    legend.text = element_text(size = 12, color = "black"),
    panel.grid.major = element_blank(),  # Remove major gridlines
    panel.grid.minor = element_blank(),  # Remove minor gridlines
    panel.background = element_rect(fill = "white", color = NA),  # Simple white background
    plot.background = element_rect(fill = "lightgray", color = NA)  # Light gray plot background
  )

# Convert ggplot object to plotly for interactivity
fig11 <- ggplotly(p)

# Display the interactive plot
fig11

# Filter data for the selected manufacturers
selected_manufacturers <- c("Apple", "Nokia", "Samsung")
filtered_data <- data %>%
  filter(Manufacturer %in% selected_manufacturers)

# Ensure 'Year' is numeric
filtered_data$Year <- as.numeric(as.character(filtered_data$Year))

# Aggregate data by Manufacturer, Smartphone status, and Form Factor
aggregated_data <- filtered_data %>%
  group_by(Manufacturer, Smartphone., Form.Factor) %>%
  summarise(Total_Units_Sold = sum(`Unit Sold Per Million`), .groups = 'drop')

# Convert Smartphone status to a factor for better labeling
aggregated_data$Smartphone. <- factor(aggregated_data$Smartphone., labels = c("Non-Smartphone", "Smartphone"))

# Plotting the results
p <- ggplot(aggregated_data, aes(x = Manufacturer, y = Total_Units_Sold, fill = interaction(Smartphone., Form.Factor))) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    title = "Total Units Sold by Manufacturer, Smartphone Status, and Form Factor",
    x = "Manufacturer",
    y = "Total Units Sold (Million)",
    fill = "Smartphone Status and Form Factor"
  ) +
  scale_fill_manual(values = c("Smartphone.Bar" = "#FF6347", "Smartphone.Touchscreen" = "#4682B4", "Non-Smartphone.Bar" = "#32CD32", "Non-Smartphone.Touchscreen" = "#FFD700")) +  # Custom colors
  theme_minimal(base_size = 16) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 14, color = "black"),
    axis.text.y = element_text(size = 14, color = "black"),
    axis.title = element_text(size = 16, color = "black"),
    plot.title = element_text(face = "bold", size = 18, color = "darkblue", hjust = 0.5),
    legend.title = element_text(size = 14, face = "bold", color = "black"),
    legend.text = element_text(size = 12, color = "black"),
    panel.grid.major = element_blank(),  # Remove major gridlines
    panel.grid.minor = element_blank(),  # Remove minor gridlines
    panel.background = element_rect(fill = "white", color = NA),  # Simple white background
    plot.background = element_rect(fill = "lightgray", color = NA)  # Light gray plot background
  )

# Convert ggplot object to plotly for interactivity
fig10 <- ggplotly(p)

# Display the interactive plot
fig10
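
The filter, convert, and aggregate steps recur in several of the chunks above. As one possible refactoring, they could be wrapped in a single helper. The sketch below is an illustration only: `sales_by_year_model()` is a hypothetical function name, the toy data frame uses made-up values, and it assumes the raw `UnitsSold` column name.

```r
# Hypothetical helper: keep the chosen manufacturers, make Year numeric,
# and total the units sold per year, manufacturer, and model
sales_by_year_model <- function(df, manufacturers) {
  subset_df <- df[df$Manufacturer %in% manufacturers, ]
  subset_df$Year <- as.numeric(as.character(subset_df$Year))
  aggregate(UnitsSold ~ Year + Manufacturer + Model, data = subset_df, FUN = sum)
}

# Toy data with made-up values, for illustration only
toy <- data.frame(
  Manufacturer = c("Apple", "Apple", "Nokia", "HTC"),
  Model        = c("X", "X", "Y", "Z"),
  Year         = c("2014", "2014", "2003", "2010"),
  UnitsSold    = c(10, 5, 20, 3)
)

result <- sales_by_year_model(toy, c("Apple", "Nokia"))
print(result)
```

Each plotting chunk could then call the helper with its own manufacturer list instead of repeating the same three steps.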

1-Total Units Sold Per Manufacturer

# Load data from Google Drive
url <- "https://drive.google.com/uc?id=1W9UYyKDAV3UZWPppYs7N8R9ne5ws5iZ3&export=download"
data <- read.csv(url)


# Convert 'Year' column to numeric (just in case it's needed later)
data$Year <- as.numeric(as.character(data$Year))
# Aggregate the data by 'Manufacturer'
manufacturer_sales <- aggregate(data[["UnitsSold"]], 
                                by = list(Manufacturer = data[["Manufacturer"]]), 
                                sum)

# Rename columns of the aggregated data
colnames(manufacturer_sales) <- c("Manufacturer", "Unit_Sold_Per_Million")

# Replace "Research in Motion (RIM)" with "RIM" while the column is still
# character; assigning a value that is not an existing factor level yields NA
manufacturer_sales$Manufacturer <- as.character(manufacturer_sales$Manufacturer)
manufacturer_sales$Manufacturer[manufacturer_sales$Manufacturer == "Research in Motion (RIM)"] <- "RIM"

# Reorder the Manufacturer factor levels based on Unit_Sold_Per_Million
manufacturer_sales$Manufacturer <- factor(
  manufacturer_sales$Manufacturer,
  levels = manufacturer_sales$Manufacturer[order(-manufacturer_sales$Unit_Sold_Per_Million)]
)
# Print the results
print(manufacturer_sales)
##     Manufacturer Unit_Sold_Per_Million
## 1          Apple                1669.3
## 2         Google                   2.1
## 3            HTC                  16.0
## 4         Huawei                 113.8
## 5           LeTV                   3.0
## 6             LG                  92.0
## 7       Motorola                 323.0
## 8          Nokia                2374.5
## 9           Oppo                  16.7
## 10          Palm                   2.0
## 11           RIM                  15.0
## 12       Samsung                 994.5
## 13 Sony Ericsson                  45.0
## 14        Xiaomi                  99.1
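# p2 is not defined in the code shown; this bar chart is an assumed
# reconstruction based on the ggplot bar chart used earlier
p2 <- ggplot(manufacturer_sales, aes(x = Manufacturer, y = Unit_Sold_Per_Million)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Total Units Sold by Manufacturer", x = "Manufacturer", y = "Units Sold (Million)")
# Convert ggplot to an interactive plot using plotly
fig1 <- ggplotly(p2, width = 900, height = 600)
# Display the interactive plot
fig1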
Observations:

2-Total Units Sold Over Years

# Aggregate the data by 'Year'
yearly_total_sales <- data %>%
  group_by(Year) %>%
  summarise(Total_Units_Sold = sum(UnitsSold), .groups = 'drop')
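# p2 is not defined in the code shown; this line chart is an assumed
# reconstruction based on the yearly ggplot line chart used earlier
p2 <- ggplot(yearly_total_sales, aes(x = Year, y = Total_Units_Sold)) +
  geom_line(color = "red") +
  geom_point() +
  labs(title = "Total Units Sold by Year", x = "Year", y = "Units Sold (Million)")
# Convert ggplot to an interactive plot using plotly
fig2 <- ggplotly(p2, width = 900, height = 600)
# Display the interactive plot
fig2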











Observations:

3-Units Sold by Each Manufacturer Over Years

  # Filter data for the specified manufacturers
selected_manufacturers <- c("Google", "HTC", "LeTV", "Palm", "Research in Motion (RIM)")
filtered_data <- data %>%
  filter(Manufacturer %in% selected_manufacturers)

# Ensure 'Year' is numeric
filtered_data$Year <- as.numeric(as.character(filtered_data$Year))

# Aggregate the data by 'Year' and 'Model'
manufacturer_model_sales <- filtered_data %>%
  group_by(Year, Manufacturer, Model) %>%
  summarise(Total_Units_Sold = sum(UnitsSold), .groups = 'drop')  # Column is 'UnitsSold' in the reloaded data

# Plot the results using facet_wrap and include model names with adjusted text size and angle
ggplot(manufacturer_model_sales, aes(x = Year, y = Total_Units_Sold, color = Model, group = Model)) +
  geom_line(size = 1) +
  geom_point(size = 3) +
  geom_text(aes(label = Model), size = 4, angle = 45, hjust = 1, vjust = 1.5, check_overlap = TRUE) +  # Adjust text size and angle
  facet_wrap(~ Manufacturer, scales = "free_y") +  # Create a separate panel for each manufacturer
  labs(title = "For One Time (Trend)", 
       x = "Year", 
       y = "Units Sold (Million)",
       color = "Model") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title = element_text(size = 12),
        legend.title = element_text(size = 10),
        legend.text = element_text(size = 8),
        strip.text = element_text(size = 12)) +  # Adjust facet labels size if needed
  scale_color_brewer(palette = "Set1") +
  scale_x_continuous(limits = c(1995, 2025), breaks = seq(1995, 2025, by = 5)) 
# Convert the plot above to an interactive plotly object and display it
fig3 <- ggplotly(last_plot())
fig3
Observations:

We will now classify and decompose market strategies for all manufacturers into four distinct groups:


4-One-Time Trend

Manufacturers with only one model among the top sellers, showing a one-time trend in their sales performance.
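
One way to identify such manufacturers, sketched under assumptions, is to count the distinct models per manufacturer and keep those with exactly one. The toy data below is made up for illustration and does not reproduce the dataset.

```r
# Toy data with made-up values, for illustration only
toy <- data.frame(
  Manufacturer = c("Palm", "HTC", "Nokia", "Nokia", "Apple", "Apple"),
  Model        = c("P1", "H1", "N1", "N2", "A1", "A2"),
  stringsAsFactors = FALSE
)

# Count distinct models per manufacturer
models_per_manufacturer <- tapply(toy$Model, toy$Manufacturer,
                                  function(m) length(unique(m)))
print(models_per_manufacturer)

# Manufacturers with exactly one model in the list
one_time <- names(models_per_manufacturer)[models_per_manufacturer == 1]
print(one_time)
```

The resulting vector could be passed to a filter on Manufacturer in place of a hand-written list of names.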

  # Filter data for the specified manufacturers
selected_manufacturers <- c("Huawei", "Oppo", "Sony Ericsson")
filtered_data <- data %>%
  filter(Manufacturer %in% selected_manufacturers)

# Ensure 'Year' is numeric
filtered_data$Year <- as.numeric(as.character(filtered_data$Year))

# Aggregate the data by 'Year' and 'Model'
manufacturer_model_sales <- filtered_data %>%
  group_by(Year, Manufacturer, Model) %>%
  summarise(Total_Units_Sold = sum(UnitsSold), .groups = 'drop')  # Column is 'UnitsSold' in the reloaded data

ggplot(manufacturer_model_sales, aes(x = Year, y = Total_Units_Sold, color = Model, group = Model)) +
  geom_line(linewidth = 1) +  # 'size' for lines is deprecated since ggplot2 3.4.0
  geom_point(size = 3) +
  geom_text(aes(label = Model), size = 4, angle = 45, hjust = 1, vjust = 1.5, check_overlap = TRUE) +  # Adjust text size and angle
  facet_wrap(~ Manufacturer, scales = "free_y") +  # Create a separate panel for each manufacturer
  labs(title = "Succeed Short Term plan < 5 Years", 
       x = "Year", 
       y = "Units Sold (Million)",
       color = "Model") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title = element_text(size = 12),
        legend.title = element_text(size = 10),
        legend.text = element_text(size = 8),
        strip.text = element_text(size = 12)) +  # Adjust facet labels size if needed
  scale_color_brewer(palette = "Set1") +
  scale_x_continuous(limits = c(1995, 2025), breaks = seq(1995, 2025, by = 5))
fig4
Observations:

5- Short Term Plan

Classification: Short-Term Plan - This data highlights manufacturers that achieved top sales with multiple models over a period of less than 5 years. Specifically:

Show Code
  # Filter data for the specified manufacturers
selected_manufacturers <- c("Apple", "Nokia", "Samsung", "LG", "Motorola", "Xiaomi")
filtered_data <- data %>%
  filter(Manufacturer %in% selected_manufacturers)

# Ensure 'Year' is numeric
filtered_data$Year <- as.numeric(as.character(filtered_data$Year))

# Aggregate the data by 'Year' and 'Model'
manufacturer_model_sales <- filtered_data %>%
  group_by(Year, Manufacturer, Model) %>%
  summarise(Total_Units_Sold = sum(`Unit Sold Per Million`), .groups = 'drop')

# Plot the results using facet_wrap without the legend
ggplot(manufacturer_model_sales, aes(x = Year, y = Total_Units_Sold, color = Model, group = Model)) +
  geom_line(linewidth = 1) +  # 'size' for lines is deprecated since ggplot2 3.4.0
  geom_point(size = 2) +  # Adjust point size if needed
  facet_wrap(~ Manufacturer, scales = "free_y") +  # Create a separate panel for each manufacturer
  labs(title = "For Long Term Plan", 
       x = "Year", 
       y = "Units Sold (Million)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title = element_text(size = 12),
        legend.position = "none",  # Hide the legend
        strip.text = element_text(size = 12)) +  # Adjust facet labels size if needed
  scale_x_continuous(limits = c(1995, 2025), breaks = seq(1995, 2025, by = 5))
  
fig5
Observations:

6- Long Term Plan

Classification: Long-Term Plan - This data highlights manufacturers that achieved top sales with multiple models over a period of more than 5 years. Specifically:

Show Code
  # Filter data for the specified manufacturers
selected_manufacturers <- c("Apple", "Nokia", "Samsung")
filtered_data <- data %>%
  filter(Manufacturer %in% selected_manufacturers)

# Ensure 'Year' is numeric
filtered_data$Year <- as.numeric(as.character(filtered_data$Year))

# Aggregate the data by 'Year' and 'Model'
manufacturer_model_sales <- filtered_data %>%
  group_by(Year, Manufacturer, Model) %>%
  summarise(Total_Units_Sold = sum(`Unit Sold Per Million`), .groups = 'drop')

# Plot the results using facet_wrap without the legend
ggplot(manufacturer_model_sales, aes(x = Year, y = Total_Units_Sold, color = Model, group = Model)) +
  geom_line(linewidth = 1) +  # 'size' for lines is deprecated since ggplot2 3.4.0
  geom_point(size = 2) +  # Adjust point size if needed
  facet_wrap(~ Manufacturer, scales = "free_y") +  # Create a separate panel for each manufacturer
  labs(title = "For The Best Long Term Plan", 
       x = "Year", 
       y = "Units Sold (Million)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title = element_text(size = 12),
        legend.position = "none",  # Hide the legend
        strip.text = element_text(size = 12)) +  # Adjust facet labels size if needed
  scale_x_continuous(limits = c(1995, 2025), breaks = seq(1995, 2025, by = 5))
fig6
Observations:

7- Best Market Strategic plan

Classification: Best Market Strategy - This data highlights Nokia and Samsung as manufacturers with the most effective market strategies, achieving top sales with multiple models over an extended period. Specifically:

Show Code
  # Filter data for the specified manufacturers
selected_manufacturers <- c("Apple", "Nokia", "Samsung")
filtered_data <- data %>%
  filter(Manufacturer %in% selected_manufacturers)

# Ensure 'Year' is numeric
filtered_data$Year <- as.numeric(as.character(filtered_data$Year))

# Aggregate the data by 'Year' and 'Model'
manufacturer_model_sales <- filtered_data %>%
  group_by(Year, Manufacturer, Model) %>%
  summarise(Total_Units_Sold = sum(`Unit Sold Per Million`), .groups = 'drop')

# Plot the results using facet_wrap without the legend
ggplot(manufacturer_model_sales, aes(x = Year, y = Total_Units_Sold, color = Model, group = Model)) +
  geom_line(linewidth = 1) +  # 'size' for lines is deprecated since ggplot2 3.4.0
  geom_point(size = 2) +  # Adjust point size if needed
  facet_wrap(~ Manufacturer, scales = "free_y") +  # Create a separate panel for each manufacturer
  labs(title = "For The Best Long Term Plan", 
       x = "Year", 
       y = "Units Sold (Million)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title = element_text(size = 12),
        legend.position = "none",  # Hide the legend
        strip.text = element_text(size = 12)) +  # Adjust facet labels size if needed
  scale_x_continuous(limits = c(1995, 2025), breaks = seq(1995, 2025, by = 5))
fig7
Observations:



Let’s examine why Nokia’s sales dropped and how Apple and Samsung managed to stay stable. This will help identify strategies to improve market stability for other manufacturers.


8-Total Units Sold by Form Factor

This plot examines Total Units Sold by Form Factor to determine if different form factors impact sales performance.

Show Code
library(RColorBrewer)  # Make sure to include RColorBrewer for color palettes
# Aggregate data by Form Factor
form_factor_sales <- data %>%
  group_by(`Form.Factor`) %>%
  summarise(Total_Units_Sold = sum(UnitsSold), .groups = 'drop')

# Reorder the rows based on Total Units Sold
form_factor_sales <- form_factor_sales %>%
  arrange(desc(Total_Units_Sold))

# Calculate the percentage of each form factor
total_units <- sum(form_factor_sales$Total_Units_Sold)
form_factor_sales <- form_factor_sales %>%
  mutate(Percentage = Total_Units_Sold / total_units * 100)
## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()
fig8

Observations:

9-Yearly units sold for Bar and Touchscreen

Let’s dive deeper to examine how the sales of bar and touchscreen phones have evolved over the years to better understand their impact on market trends.

Show Code
# Filter data for 'Bar' and 'Touchscreen' form factors
filtered_data <- data %>%
  filter(`Form.Factor` %in% c("Bar", "Touchscreen"))

# Ensure 'Year' is numeric
filtered_data$Year <- as.numeric(as.character(filtered_data$Year))
# Aggregate the data by 'Year' and 'Form.Factor'
yearly_sales <- filtered_data %>%
  group_by(Year, `Form.Factor`) %>%
  summarise(Total_Units_Sold = sum(UnitsSold), .groups = 'drop')
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
fig9


Observations:
  1. Bar Form Factor Trends:
    • Significant increase in sales from 1999 to 2007.
    • Sales peaked in 2005, then declined sharply after 2007.
    • Failed to achieve high sales in later years.
  2. Touchscreen Form Factor Trends:
    • Steady rise in sales from 2008 to 2019.
    • Peak in 2019, followed by a gradual decline.
    • Consistently high sales figures compared to other form factors.
  3. Market Impact:
    • Nokia’s primary focus on bar phones aligns with the decline in bar phone sales, contributing to its decreased market performance.
    • Apple’s exclusive focus on touchscreens supports its sustained success and alignment with consumer preferences.
    • Samsung’s diverse portfolio across form factors allowed it to adapt and remain strong in the market.
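Peak-year statements like those in the observations above can be checked directly from a yearly aggregate. The following is a minimal sketch on a hypothetical table shaped like `yearly_sales`; the figures are placeholders and not the report's values, and the output column names `Peak_Year` and `Peak_Units` are introduced here for illustration.

```r
library(dplyr)

# Hypothetical yearly totals; the numbers are placeholders, not the report's data
yearly_sales_toy <- data.frame(
  Year             = c(2003, 2005, 2007, 2010, 2015, 2019),
  Form.Factor      = c("Bar", "Bar", "Bar", "Touchscreen", "Touchscreen", "Touchscreen"),
  Total_Units_Sold = c(100, 300, 150, 80, 200, 260)
)

# Keep the single best year for each form factor
peak_years <- yearly_sales_toy %>%
  group_by(Form.Factor) %>%
  slice_max(Total_Units_Sold, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Form.Factor, Peak_Year = Year, Peak_Units = Total_Units_Sold)

print(peak_years)
```

Running the same steps on the real `yearly_sales` object would show the peak year per form factor without reading it off the chart.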

10-Trend in Units Sold Over the Years by Smartphone Status

Next, we’ll explore the trends in units sold over the years, categorized by smartphone status.

Show Code
# Filter data to include both smartphones and non-smartphones
filtered_data <- data %>%
  filter(!is.na(Smartphone.))

# Ensure 'Year' is numeric
filtered_data$Year <- as.numeric(as.character(filtered_data$Year))

# Convert 'Smartphone.' to a factor with labels
filtered_data$Smartphone. <- factor(filtered_data$Smartphone., levels = c(FALSE, TRUE), labels = c("Non-Smartphone", "Smartphone"))
# Perform t-test
t_test_result <- t.test(UnitsSold ~ Smartphone., data = filtered_data)
# Display t-test results in R Markdown
print(t_test_result)
## 
##  Welch Two Sample t-test
## 
## data:  UnitsSold by Smartphone.
## t = 2.9613, df = 49.125, p-value = 0.004708
## alternative hypothesis: true difference in means between group Non-Smartphone and group Smartphone is not equal to 0
## 95 percent confidence interval:
##  12.17153 63.55956
## sample estimates:
## mean in group Non-Smartphone     mean in group Smartphone 
##                     74.87143                     37.00588
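To read a Welch t-test result like the one printed above, it helps to pull the pieces out of the returned object. The sketch below uses a hypothetical toy data frame (the sales values and the `Status` column name are made up for illustration) and only base R. With the formula interface, the reported confidence interval is for the mean of the first factor level minus the mean of the second.

```r
# Hypothetical unit sales (millions) for two groups; values are illustrative only
toy <- data.frame(
  UnitsSold = c(80, 95, 60, 120, 70, 30, 45, 25, 50, 35),
  Status    = factor(rep(c("Non-Smartphone", "Smartphone"), each = 5))
)

# t.test() runs the Welch version by default (var.equal = FALSE)
res <- t.test(UnitsSold ~ Status, data = toy)

group_means <- unname(res$estimate)          # means in factor-level order
mean_diff   <- group_means[1] - group_means[2]
ci          <- as.numeric(res$conf.int)      # interval for mean_diff

cat("Difference in means:", mean_diff, "\n")
cat("95% CI:", round(ci[1], 2), "to", round(ci[2], 2), "\n")
cat("p-value:", signif(res$p.value, 3), "\n")
```

In the report's own output, the difference between the two group means is about 37.9 million units per model, and the interval from 12.17 to 63.56 excludes zero, which is consistent with the reported p-value of 0.0047.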
# Filter data to include both smartphones and non-smartphones
filtered_data <- data %>%
  filter(!is.na(Smartphone.))

# Ensure 'Year' is numeric
filtered_data$Year <- as.numeric(as.character(filtered_data$Year))
# Aggregate the data by year and smartphone status
yearly_sales <- filtered_data %>%
  group_by(Year, Smartphone.) %>%
  summarise(Total_Units_Sold = sum(UnitsSold), .groups = 'drop')
# Convert Smartphone status to factor for better plot labeling
yearly_sales$Smartphone. <- factor(yearly_sales$Smartphone., labels = c("Non-Smartphone", "Smartphone"))
# Convert ggplot to an interactive plot using plotly
fig10
fig11

Observations:

11-Units sold By Manufacturer (Smartphone status - Form Factor)

Next, we will analyze the plot showing units sold by manufacturer, segmented by smartphone status and form factor. This will help us understand how different manufacturers perform across various form factors and whether their smartphone status influences their sales performance.

Show Code
# Load data from Google Drive
url <- "https://drive.google.com/uc?id=1W9UYyKDAV3UZWPppYs7N8R9ne5ws5iZ3&export=download"
data <- read.csv(url)

# Filter data for selected manufacturers
selected_manufacturers <- c("Apple", "Nokia", "Samsung")
filtered_data <- data %>%
  filter(Manufacturer %in% selected_manufacturers)

# Convert 'Year' column to numeric
filtered_data$Year <- as.numeric(as.character(filtered_data$Year))
# Aggregate data by manufacturer, smartphone status, and form factor
aggregated_data <- filtered_data %>%
  group_by(Manufacturer, Smartphone., Form.Factor) %>%
  summarise(Total_Units_Sold = sum(UnitsSold), .groups = 'drop')

# Convert 'Smartphone.' column to factor for better display
aggregated_data$Smartphone. <- factor(aggregated_data$Smartphone., labels = c("Non-Smartphone", "Smartphone"))
fig12

Observations:
  1. Apple’s strongest sales are in touchscreen smartphones, indicating a clear and focused strategy on this type.
  2. Nokia’s highest sales are in non-smartphone bar models, suggesting a targeted approach towards simpler devices.
  3. Samsung has a mixed strategy, achieving significant sales in both smartphones with touchscreens and non-smartphone bar models.
  4. The most effective market play appears to be in the smartphone with touchscreen category, as it leads the sales across manufacturers.

12-Total Units Sold per Manufacturer

This pie chart illustrates the total units sold for each manufacturer, showcasing the percentage of overall sales contributed by each one. It provides a clear visual representation of how sales are distributed among different manufacturers.

Show Code
# Calculate the number of models per manufacturer that achieved a top rank
model_count_by_manufacturer <- data %>%
  group_by(Manufacturer) %>%
  summarise(Number_of_Models_with_Top_Rank = n()) %>%
  mutate(Percentage = Number_of_Models_with_Top_Rank / sum(Number_of_Models_with_Top_Rank) * 100)

# Calculate the total units sold per manufacturer
units_sold_by_manufacturer <- data %>%
  group_by(Manufacturer) %>%
  summarise(Total_Units_Sold = sum(UnitsSold)) %>%
  mutate(Percentage = Total_Units_Sold / sum(Total_Units_Sold) * 100)

library(plotly)  # Provides plot_ly() and layout(); not in the library list above

# Define a color palette for the manufacturers
manufacturer_colors <- c("#FFEB3B", "#81D4FA", "#A5D6A7", "#FFD700", "#FFB74D", "#4FC3F7", "#C8E6C9")

# Pie chart for total units sold with hover information
pie_chart_units <- plot_ly(
  units_sold_by_manufacturer,
  labels = ~Manufacturer,
  values = ~Total_Units_Sold,
  type = 'pie',
  text = ~paste("Total Units Sold: ", Total_Units_Sold, "<br>Percentage: ", round(Percentage, 2), "%"),
  hoverinfo = 'text',
  textinfo = 'label+percent',
  insidetextorientation = 'radial',
  marker = list(colors = manufacturer_colors, line = list(color = '#FFFFFF', width = 1))
) %>%
  layout(
    title = list(
      text = 'Total Units Sold by Manufacturer',
      font = list(size = 16, color = "#333333")
    ),
    showlegend = TRUE,
    legend = list(
      font = list(size = 12, color = "#333333")
    )
  )
pie_chart_units

Observations:

Show Code
# Count the number of models per manufacturer
model_count_by_manufacturer <- data %>%
  group_by(Manufacturer) %>%
  summarise(Number_of_Models_with_Top_Rank = n()) %>%
  mutate(Percentage = Number_of_Models_with_Top_Rank / sum(Number_of_Models_with_Top_Rank) * 100)

# Define interactive colors for the chart
manufacturer_colors <- c("#FFEB3B", "#81D4FA", "#A5D6A7", "#FFD700", "#FFB74D", "#4FC3F7", "#C8E6C9")

# Create a doughnut chart
doughnut_chart <- plot_ly(
  model_count_by_manufacturer,
  labels = ~Manufacturer,
  values = ~Number_of_Models_with_Top_Rank,
  type = 'pie',
  textinfo = 'label+percent',
  insidetextorientation = 'radial',
  hole = 0.4,  # Create a "doughnut" effect by adding a hole in the middle
  marker = list(colors = manufacturer_colors, line = list(color = '#FFFFFF', width = 1))
) %>%
  layout(
    title = list(
      text = 'Number of Models with Top Rank for Each Manufacturer',
      font = list(size = 16, color = "#333333", family = "Arial")
    ),
    showlegend = TRUE,
    legend = list(
      font = list(size = 12, color = "#333333"),
      orientation = "h",  # Horizontal orientation for legend
      xanchor = "center",  # Center the legend horizontally
      yanchor = "top",     # Anchor the legend to the top
      x = 0.5,             # Position the legend in the middle of the chart
      y = -0.1,            # Position the legend just below the chart
      traceorder = "normal",  # Display items in the order they appear in the data
      itemclick = "toggleothers"  # Click on a legend item to toggle the visibility of others
    )
  )
# Display the doughnut chart
doughnut_chart

Observations:

Perform Predictive Analysis Using Regression Models

We conduct predictive analysis using simple and multiple regression models to forecast sales performance. The simple model assesses the impact of manufacturer alone, while the multiple regression model incorporates additional factors such as smartphone status, form factor, and year for a more detailed prediction.
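The report does not show how the two models were fitted, so the following is only a sketch of what a manufacturer-only model and a model with the additional predictors could look like using `lm()`. The toy data frame, its values, and the exact formulas are assumptions made for illustration; only the column names follow those used elsewhere in this report.

```r
# Toy data standing in for the report's data; all values are made up
set.seed(42)
toy <- data.frame(
  Manufacturer = rep(c("Apple", "Nokia", "Samsung"), times = 8),
  Year         = 2000:2023
)
toy$Smartphone. <- toy$Year >= 2010
toy$Form.Factor <- ifelse(toy$Year >= 2008, "Touchscreen", "Bar")
toy$UnitsSold   <- 50 + 10 * (toy$Manufacturer == "Nokia") -
  1.5 * (toy$Year - 2000) + rnorm(nrow(toy), sd = 5)

# Simple model: manufacturer only (assumed formula)
simple_model <- lm(UnitsSold ~ Manufacturer, data = toy)

# Multiple model: manufacturer plus smartphone status, form factor, and year (assumed formula)
multiple_model <- lm(UnitsSold ~ Manufacturer + Smartphone. + Form.Factor + Year, data = toy)

r2_simple   <- summary(simple_model)$r.squared
r2_multiple <- summary(multiple_model)$r.squared
cat("R-squared, simple:  ", round(r2_simple, 3), "\n")
cat("R-squared, multiple:", round(r2_multiple, 3), "\n")
```

Because the simple model is nested inside the multiple one, the in-sample R-squared of the larger model can never be lower. Adjusted R-squared (`summary(model)$adj.r.squared`) or a held-out test set gives a fairer comparison.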

Show Code

# Define a function to calculate RMSE, R-squared, and MAPE manually
calculate_metrics <- function(model, data) {
  # Predictions from the model
  predictions <- predict(model, newdata = data)

  # Actual values
  actuals <- data$UnitsSold

  # Calculate RMSE
  rmse_value <- sqrt(mean((actuals - predictions)^2))

  # Calculate R-squared
  residual_sum_of_squares <- sum((actuals - predictions)^2)
  total_sum_of_squares <- sum((actuals - mean(actuals))^2)
  r_squared_value <- 1 - (residual_sum_of_squares / total_sum_of_squares)

  # Calculate MAPE
  mape_value <- mean(abs((actuals - predictions) / actuals)) * 100

  # Return metrics as a named vector
  return(c(RMSE = rmse_value, R_squared = r_squared_value, MAPE = mape_value))
}

# Create a data frame to store metrics
metrics_summary <- data.frame(
  Model = c("Simple Regression", "Multiple Regression"),
  RMSE = numeric(2),
  R_squared = numeric(2),
  MAPE = numeric(2),
  stringsAsFactors = FALSE
)

# Calculate metrics for the simple model
simple_metrics <- calculate_metrics(simple_model, data1)
metrics_summary[1, 2:4] <- simple_metrics

# Calculate metrics for the multiple model
multiple_metrics <- calculate_metrics(multiple_model, data1)
metrics_summary[2, 2:4] <- multiple_metrics

fig13

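The simple regression model has an RMSE of 44.44, an R-squared of 0.41, and a MAPE of 163.14%. The multiple regression model performs better on all three metrics, with an RMSE of 37.64, an R-squared of 0.58, and a MAPE of 107.23%, so adding smartphone status, form factor, and year explains more of the variation in sales than manufacturer alone. Both MAPE values exceed 100%, however, which means that predictions are off by more than the actual sales figure on average, so neither model is accurate enough for precise forecasts.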
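For reference, the three metrics computed by `calculate_metrics` can be written as formulas. The notation is introduced here and is not part of the original report: $y_i$ is the actual units sold for model $i$, $\hat{y}_i$ is the prediction, $\bar{y}$ is the mean of the actual values, and $n$ is the number of observations.

```latex
\begin{align*}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \\
R^2 &= 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \\
\mathrm{MAPE} &= \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|
\end{align*}
```

Because MAPE divides each error by the actual value $y_i$, phones with small sales figures can inflate it sharply, which is one plausible reason it is so much larger than the RMSE would suggest.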


Conclusion

Appendix