library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
df <- read.csv('Auto Sales data.csv')
Frame 1 - How many sales were gathered from each country?
countrySales <- df |>
group_by(COUNTRY) |>
summarize(TOTALSALES = sum(SALES))
countrySales
## # A tibble: 19 × 2
## COUNTRY TOTALSALES
## <chr> <dbl>
## 1 Australia 630623.
## 2 Austria 202063.
## 3 Belgium 108413.
## 4 Canada 224079.
## 5 Denmark 245637.
## 6 Finland 329582.
## 7 France 1110917.
## 8 Germany 220472.
## 9 Ireland 57756.
## 10 Italy 374674.
## 11 Japan 188168.
## 12 Norway 307464.
## 13 Philippines 94016.
## 14 Singapore 288488.
## 15 Spain 1215687.
## 16 Sweden 210014.
## 17 Switzerland 117714.
## 18 UK 478880.
## 19 USA 3355576.
countrySales <- countrySales |>
tail(10)
ggplot(
data = countrySales,
mapping = aes(x = COUNTRY, y = TOTALSALES)
) +
geom_bar(
stat = "identity",
) +
labs(
title = "Total Sales by Country"
)
It makes sense that USA would spend the most on automobiles. I wasn’t expecting that much of a lead, though.
Frame 2 - There’s a lot of products, but which is the most and least expensive…?
df |>
group_by(PRODUCTLINE) |>
summarize(CHEAPEST = min(PRICEEACH), COSLIEST = max(PRICEEACH))
## # A tibble: 7 × 3
## PRODUCTLINE CHEAPEST COSLIEST
## <chr> <dbl> <dbl>
## 1 Classic Cars 28.3 249.
## 2 Motorcycles 32.6 230.
## 3 Planes 40.2 233.
## 4 Ships 43.7 173.
## 5 Trains 36.1 199.
## 6 Trucks and Buses 35.7 233.
## 7 Vintage Cars 26.9 253.
Frame 3 - How about the size of orders between companies?
companySales <- df |>
group_by(CUSTOMERNAME, DEALSIZE) |>
count(DEALSIZE, name = "DEALS") |>
ungroup() |>
complete(CUSTOMERNAME, DEALSIZE, fill = list(DEALS = 0, freq = 0))
companySales <- companySales[order(companySales$DEALS),]
## companySales <- companySales[order(companySales$DEALS, decreasing = TRUE)]
companySales
## # A tibble: 267 × 3
## CUSTOMERNAME DEALSIZE DEALS
## <chr> <chr> <int>
## 1 Atelier graphique Large 0
## 2 Auto-Moto Classics Inc. Large 0
## 3 Bavarian Collectables Imports, Co. Large 0
## 4 Boards & Toys Co. Large 0
## 5 Cambridge Collectables Co. Large 0
## 6 Double Decker Gift Stores, Ltd Large 0
## 7 Enaco Distributors Large 0
## 8 Gift Ideas Corp. Large 0
## 9 Gifts4AllAges.com Large 0
## 10 giftsbymail.co.uk Large 0
## # ℹ 257 more rows
There’s 267 rows and 3 different sizes of deal… 267 / 3 = 89 different customers on record.
21 of them have not placed Large orders! That’s nearly a quarter of the roster… if we were to choose a customer at random, there’s a (21/89 ~23.6%) chance they don’t have a Large order on record.
I noticed that they tend to have a fittingly large value in the Sales column (>7000 units or so?). Perhaps most of them are from countries with lower total sales.
How good is the connection between the total sales of a country and the sizes of orders a company there has placed?
comboFrame <- df |>
group_by(COUNTRY, PRODUCTLINE) |>
count(PRODUCTLINE, name = "ORDERS") |>
ungroup() |>
complete(COUNTRY, PRODUCTLINE, fill = list(ORDERS = 0, freq = 0))
## comboFrame <- comboFrame[order(comboFrame$ORDERS),]
comboFrame
## # A tibble: 133 × 3
## COUNTRY PRODUCTLINE ORDERS
## <chr> <chr> <int>
## 1 Australia Classic Cars 53
## 2 Australia Motorcycles 26
## 3 Australia Planes 25
## 4 Australia Ships 2
## 5 Australia Trains 1
## 6 Australia Trucks and Buses 20
## 7 Australia Vintage Cars 58
## 8 Austria Classic Cars 25
## 9 Austria Motorcycles 5
## 10 Austria Planes 6
## # ℹ 123 more rows
Similar to the last group-by data frame, I sorted this one with the lowest values of its continuous variable first. A couple combinations are not present, like Austria not having Trains shipped, and the Philippines not having ever purchased 3 of the 7 types of products…
And then there’s Switzerland, only having purchased Classic Cars (31 of them). They have one of the lowest total sales, and it’s just for one type of product.
On the other end of the spectrum, there’s Spain with 120 orders of Classic Card. USA, France, Spain and Australia appear frequently on the last 10 rows, too!
comboFrame <- comboFrame |>
head(42)
ggplot(
data = comboFrame,
mapping = aes(x = COUNTRY, y = ORDERS, fill = PRODUCTLINE)
) +
geom_bar(
stat = "identity",
position = "dodge"
)