Report

LA

Aditya gupta and Aayan Khan


Problem Statement

Develop a script in R to create a funnel chart illustrating user drop-off on an e-commerce website using real dataset.


What we will do?

  1. Load required libraries

  2. Load dataset

  3. Perform exploratory data analysis

  4. Create funnel data

  5. Create funnel chart

  6. Analyze results


Step 1: Load Required Libraries

In this step, we initialize the R environment by loading specific software packages.

  • ggplot2: The primary engine for creating the map and charts.

  • dplyr: A tool for “data plumbing” used to filter, group, and calculate revenue.

  • maps: Provides the spatial coordinates (longitude and latitude) for world countries.

  • readr: Used to import your data.csv file quickly and accurately .

library(ggplot2) 
Warning: package 'ggplot2' was built under R version 4.5.3
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(maps)
Warning: package 'maps' was built under R version 4.5.3

Step 2: Load Dataset

We bring your external CSV file into R’s memory as an object named data. Using head(data) allows us to verify that columns like QuantityUnitPrice, and Country loaded correctly.

# Load the actual sales data
data <- read.csv("data.csv")

# Verify the column names exist
names(data)
[1] "InvoiceNo"   "StockCode"   "Description" "Quantity"    "InvoiceDate"
[6] "UnitPrice"   "CustomerID"  "Country"    

Step 3: Exploratory Data Analysis

This is the “cleaning” phase .

  • Cleaning: We filter out transactions where Quantity or UnitPrice are zero or negative to ensure the revenue calculation is realistic.

  • Calculation: We create a new Revenue column by multiplying Quantity and UnitPrice .

  • Aggregation: We use group_by(Country) and summarise() to get one total revenue figure for each country rather than thousands of individual rows.

library(dplyr)

# Check if columns exist before filtering
if("Quantity" %in% names(data) & "UnitPrice" %in% names(data)) {
  data_clean <- data %>%
    filter(Quantity > 0, UnitPrice > 0) %>%
    mutate(Revenue = Quantity * UnitPrice)
  
  # Group by country for the map
  country_revenue <- data_clean %>%
    group_by(Country) %>%
    summarise(TotalRevenue = sum(Revenue)) %>%
    mutate(Country = tolower(Country))
  
  head(country_revenue)
} else {
  print("Error: Columns 'Quantity' or 'UnitPrice' not found in the dataset.")
}
# A tibble: 6 × 2
  Country   TotalRevenue
  <chr>            <dbl>
1 australia      138521.
2 austria         10199.
3 bahrain           754.
4 belgium         41196.
5 brazil           1144.
6 canada           3666.

Step 4: Create Funnel Data

Before we can map the data, R needs to know the “shapes” of the countries . We use map_data("world") to create a table of thousands of points that outline global borders. We convert all country names to lowercase to ensure they match our sales data perfectly .

world_map <- map_data("world")
world_map$region <- tolower(world_map$region)

# This will display the output for Step 4
head(world_map)
       long      lat group order region subregion
1 -69.89912 12.45200     1     1  aruba      <NA>
2 -69.89571 12.42300     1     2  aruba      <NA>
3 -69.94219 12.43853     1     3  aruba      <NA>
4 -70.00415 12.50049     1     4  aruba      <NA>
5 -70.06612 12.54697     1     5  aruba      <NA>
6 -70.05088 12.59707     1     6  aruba      <NA>

Step 5: Merge Data with Map

This is the most critical technical step. We use a left_join to attach our calculated sales figures to the map coordinates. By matching on the “region” (country name), we tell R exactly which “shape” on the map should represent which “revenue” value .

# Synchronize country names for the merge
country_revenue$region <- country_revenue$Country
country_revenue$region[country_revenue$region == "united kingdom"] <- "uk"
country_revenue$region[country_revenue$region == "usa"] <- "usa"

# Merge the datasets
merged_data <- left_join(world_map, country_revenue, by = "region")

# This will display the output for Step 5
head(merged_data)
       long      lat group order region subregion Country TotalRevenue
1 -69.89912 12.45200     1     1  aruba      <NA>    <NA>           NA
2 -69.89571 12.42300     1     2  aruba      <NA>    <NA>           NA
3 -69.94219 12.43853     1     3  aruba      <NA>    <NA>           NA
4 -70.00415 12.50049     1     4  aruba      <NA>    <NA>           NA
5 -70.06612 12.54697     1     5  aruba      <NA>    <NA>           NA
6 -70.05088 12.59707     1     6  aruba      <NA>    <NA>           NA

Step 6: Create Choropleth Map

The final step is the visualization.

  • geom_polygon: Draws the country shapes using the coordinates from Step 4.

  • aes(fill = TotalRevenue): Tells R to color each country based on its sales volume.

  • scale_fill_gradient: Defines the color scheme (e.g., light blue for low sales, dark blue for high sales) .

  • labs(): Adds the professional finishing touches like the title and legend labels .

ggplot(merged_data, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = TotalRevenue), color = "white", linewidth = 0.1) + # Changed 'size' to 'linewidth'
  scale_fill_gradient(low = "lightblue", high = "darkblue", na.value = "grey90") +
  theme_minimal() +
  labs(title = "Global E-commerce Revenue (2010-2011)",
       x = "Longitude", y = "Latitude",
       fill = "Revenue ($)")

Global Revenue Distribution Map

LA-1

Problem Statement

Develop a script in R to create a funnel chart illustrating user drop-off on an e-commerce website using real dataset.

What we will do?

  1. Load required libraries

  2. Load dataset

  3. Perform exploratory data analysis

  4. Create funnel data

  5. Create funnel chart

  6. Analyze results

Step 1: Load Required Libraries

In this step , we load required Libraries

  1. ggplot2: The primary engine for creating the map and charts.

  2. dplyr: A tool for “data plumbing” used to filter, group, and calculate revenue.

  3. maps: Provides the spatial coordinates (longitude and latitude) for world countries.

  4. readr: Used to import your data.csv file quickly and accurately .

Step 2: Load Dataset

We bring your external CSV file into R’s memory as an object named data. Using head(data) allows us to verify that columns like QuantityUnitPrice, and Country loaded correctly.

Load the actual sales data

data <- read.csv(“data.csv”)

Verify the column names exist

names(data)

Step 3: Exploratory Data Analysis

This is the “cleaning” phase .

  • Cleaning: We filter out transactions where Quantity or UnitPrice are zero or negative to ensure the revenue calculation is realistic.

  • Calculation: We create a new Revenue column by multiplying Quantity and UnitPrice .

  • Aggregation: We use group_by(Country) and summarise() to get one total revenue figure for each country rather than thousands of individual rows.

Step 4: Create Funnel Data

Before we can map the data, R needs to know the “shapes” of the countries . We use map_data("world") to create a table of thousands of points that outline global borders. We convert all country names to lowercase to ensure they match our sales data perfectly .

world_map <- map_data(“world”)

world_map\(region <- tolower(world_map\)region)

This will display the output for Step 4

head(world_map)

Step 5: Merge Data with Map

This is the most critical technical step. We use a left_join to a  left_join to attach our calculated sales figures to the map coordinates. By matching on the “region” (country name), we tell R exactly which “shape” on the map should represent which “revenue” value .

Synchronize country names for the merge

country_revenue\(region <- country_revenue\)Country country_revenue\(region[country_revenue\)region == “united kingdom”] <- “uk” country_revenue\(region[country_revenue\)region == “usa”] <- “usa”

Merge the datasets

merged_data <- left_join(world_map, country_revenue, by = “region”)

This will display the output for Step 5

head(merged_data)

Step 6: Create Choropleth Map

The final step is the visualization.

  1. geom_polygon: Draws the country shapes using the coordinates from Step 4.

  2. aes(fill = TotalRevenue): Tells R to color each country based on its sales volume.

  3. scale_fill_gradient: Defines the color scheme (e.g., light blue for low sales, dark blue for high sales) .

  4. labs(): Adds the professional finishing touches like the title and legend labels .