The goal of this project is to analyze online sales in five countries (Australia, Belgium, France, Germany, and the United Kingdom) to support expansion and investment decisions.
The sample used for this project is 3% of the online sales in those countries in 2010, randomly selected from the full dataset.
It is important in this project to understand the features of the data and to identify the best location for expansion and future investment.
I will compare sales in each country by adding a column that holds the total amount of each sale.
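The 3% random sample described above could be drawn along the following lines. This is only a sketch: the original sampling code is not part of this report, so the `full_data` frame, its size, and the seed are hypothetical stand-ins.

```r
# Hypothetical stand-in for the full dataset (the real one is not shown here).
full_data <- data.frame(
  InvoiceNo = seq_len(1000),
  Country   = rep(c("Australia", "Belgium", "France", "Germany", "United Kingdom"),
                  times = 200)
)

set.seed(2010)  # arbitrary seed, used only so the draw is reproducible

# Pick 3% of the row indices without replacement, then subset the frame.
n_sample     <- round(0.03 * nrow(full_data))
sample_rows  <- sample(nrow(full_data), size = n_sample)
sales_sample <- full_data[sample_rows, ]

nrow(sales_sample)
```

Fixing the seed means the same rows are selected on every run, which keeps the later summaries and plots reproducible.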
Load the libraries
library(ggplot2)
First we get the data and show its summary, including the means and medians.
url <- "https://raw.githubusercontent.com/akarimhammoud/RbridgeFinalProjectnlineRetail/master/Online%20Retail.csv"
OnlineSales <- read.csv(file= url, header=TRUE, sep=",")
summary(OnlineSales)
## InvoiceNo StockCode Description Quantity
## Length:513 Length:513 Length:513 Min. : -7.00
## Class :character Class :character Class :character 1st Qu.: 4.00
## Mode :character Mode :character Mode :character Median : 8.00
## Mean : 16.78
## 3rd Qu.: 12.00
## Max. :432.00
## InvoiceDate UnitPrice CustomerID Country
## Length:513 Min. : 0.000 Min. :12395 Length:513
## Class :character 1st Qu.: 1.250 1st Qu.:12567 Class :character
## Mode :character Median : 2.100 Median :12686 Mode :character
## Mean : 3.665 Mean :14015
## 3rd Qu.: 4.250 3rd Qu.:15311
## Max. :42.950 Max. :18074
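The summary reports a minimum Quantity of -7 and a minimum UnitPrice of 0, so some rows may not be ordinary sales. A minimal sketch of how such rows could be counted and set aside, using a small made-up data frame (`toy` and its values are hypothetical). Whether negative quantities represent returns or cancellations is an assumption that would need checking against the data source.

```r
# Hypothetical rows that mimic the ranges seen in the summary above.
toy <- data.frame(
  Quantity  = c(6, -7, 8, 2, 12),
  UnitPrice = c(2.55, 3.39, 0, 7.65, 1.25)
)

# Count the suspicious rows.
n_negative_qty <- sum(toy$Quantity < 0)
n_zero_price   <- sum(toy$UnitPrice == 0)

# Keep only rows with a positive quantity and a positive price.
clean <- toy[toy$Quantity > 0 & toy$UnitPrice > 0, ]

c(negative_qty = n_negative_qty, zero_price = n_zero_price, kept = nrow(clean))
```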
head(OnlineSales)
## InvoiceNo StockCode Description Quantity InvoiceDate
## 1 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 12/1/10 8:26
## 2 536365 71053 WHITE METAL LANTERN 6 12/1/10 8:26
## 3 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 12/1/10 8:26
## 4 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 12/1/10 8:26
## 5 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 12/1/10 8:26
## 6 536365 22752 SET 7 BABUSHKA NESTING BOXES 2 12/1/10 8:26
## UnitPrice CustomerID Country
## 1 2.55 17850 United Kingdom
## 2 3.39 17850 United Kingdom
## 3 2.75 17850 United Kingdom
## 4 3.39 17850 United Kingdom
## 5 3.39 17850 United Kingdom
## 6 7.65 17850 United Kingdom
str(OnlineSales)
## 'data.frame': 513 obs. of 8 variables:
## $ InvoiceNo : chr "536365" "536365" "536365" "536365" ...
## $ StockCode : chr "85123A" "71053" "84406B" "84029G" ...
## $ Description: chr "WHITE HANGING HEART T-LIGHT HOLDER" "WHITE METAL LANTERN" "CREAM CUPID HEARTS COAT HANGER" "KNITTED UNION FLAG HOT WATER BOTTLE" ...
## $ Quantity : int 6 6 8 6 6 2 6 6 6 32 ...
## $ InvoiceDate: chr "12/1/10 8:26" "12/1/10 8:26" "12/1/10 8:26" "12/1/10 8:26" ...
## $ UnitPrice : num 2.55 3.39 2.75 3.39 3.39 7.65 4.25 1.85 1.85 1.69 ...
## $ CustomerID : int 17850 17850 17850 17850 17850 17850 17850 17850 17850 13047 ...
## $ Country : chr "United Kingdom" "United Kingdom" "United Kingdom" "United Kingdom" ...
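`str()` reports InvoiceDate as a character column. If the dates are needed later, for example to filter by month, they could be parsed as sketched below. The format string assumes the month/day/two-digit-year layout seen in the rows above (`12/1/10 8:26`), and UTC is only a placeholder time zone; the third value is made up for illustration.

```r
# First two values copied from the output above; the third is hypothetical.
raw_dates <- c("12/1/10 8:26", "12/1/10 8:26", "12/5/10 14:03")

# %m/%d/%y = month/day/two-digit year, %H:%M = 24-hour time.
parsed <- as.POSIXct(raw_dates, format = "%m/%d/%y %H:%M", tz = "UTC")

format(parsed, "%Y-%m-%d %H:%M")
```

Any value that does not match the format comes back as `NA`, so `anyNA(parsed)` is a quick check that the assumed layout holds for the whole column.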
Create a new data frame called “mySets” with only five columns, renaming the “Description” column to “Details” and the “InvoiceDate” column to “Date”.
mySets <- OnlineSales[ c("Description", "Quantity", "InvoiceDate", "UnitPrice", "Country")]
colnames(mySets) <- c("Details", "Quantity", "Date", "UnitPrice", "Country")
head(mySets)
## Details Quantity Date UnitPrice
## 1 WHITE HANGING HEART T-LIGHT HOLDER 6 12/1/10 8:26 2.55
## 2 WHITE METAL LANTERN 6 12/1/10 8:26 3.39
## 3 CREAM CUPID HEARTS COAT HANGER 8 12/1/10 8:26 2.75
## 4 KNITTED UNION FLAG HOT WATER BOTTLE 6 12/1/10 8:26 3.39
## 5 RED WOOLLY HOTTIE WHITE HEART. 6 12/1/10 8:26 3.39
## 6 SET 7 BABUSHKA NESTING BOXES 2 12/1/10 8:26 7.65
## Country
## 1 United Kingdom
## 2 United Kingdom
## 3 United Kingdom
## 4 United Kingdom
## 5 United Kingdom
## 6 United Kingdom
Replace ‘United Kingdom’ with ‘UK’ in the Country column.
# fixed = TRUE matches the text literally instead of as a regular expression
mySets$Country <- sub("United Kingdom", "UK", mySets$Country, fixed = TRUE)
head(mySets)
## Details Quantity Date UnitPrice Country
## 1 WHITE HANGING HEART T-LIGHT HOLDER 6 12/1/10 8:26 2.55 UK
## 2 WHITE METAL LANTERN 6 12/1/10 8:26 3.39 UK
## 3 CREAM CUPID HEARTS COAT HANGER 8 12/1/10 8:26 2.75 UK
## 4 KNITTED UNION FLAG HOT WATER BOTTLE 6 12/1/10 8:26 3.39 UK
## 5 RED WOOLLY HOTTIE WHITE HEART. 6 12/1/10 8:26 3.39 UK
## 6 SET 7 BABUSHKA NESTING BOXES 2 12/1/10 8:26 7.65 UK
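A small illustration of the replacement step on a made-up vector (the `countries` values are hypothetical). With `fixed = TRUE` the pattern is treated as a literal string rather than a regular expression. Assigning by exact match is an alternative that only touches values equal to the full name.

```r
# Hypothetical country labels.
countries <- c("United Kingdom", "France", "United Kingdom", "Germany")

# Option 1: literal substring replacement.
short <- sub("United Kingdom", "UK", countries, fixed = TRUE)

# Option 2: replace only values that equal the full name exactly.
short2 <- countries
short2[short2 == "United Kingdom"] <- "UK"

identical(short, short2)
```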
Here I add a new column called “Amount” that holds the total amount of each sale, calculated by multiplying the UnitPrice column by the Quantity column.
mySets["Amount"] <- mySets$Quantity * mySets$UnitPrice
head(mySets)
## Details Quantity Date UnitPrice Country
## 1 WHITE HANGING HEART T-LIGHT HOLDER 6 12/1/10 8:26 2.55 UK
## 2 WHITE METAL LANTERN 6 12/1/10 8:26 3.39 UK
## 3 CREAM CUPID HEARTS COAT HANGER 8 12/1/10 8:26 2.75 UK
## 4 KNITTED UNION FLAG HOT WATER BOTTLE 6 12/1/10 8:26 3.39 UK
## 5 RED WOOLLY HOTTIE WHITE HEART. 6 12/1/10 8:26 3.39 UK
## 6 SET 7 BABUSHKA NESTING BOXES 2 12/1/10 8:26 7.65 UK
## Amount
## 1 15.30
## 2 20.34
## 3 22.00
## 4 20.34
## 5 20.34
## 6 15.30
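With the Amount column in place, sales can be totalled per country, which is the comparison the introduction sets out to make. A minimal sketch with base R's `aggregate()` and `table()` on a made-up frame (`toy` and its numbers are hypothetical, not results from the dataset):

```r
# Hypothetical sales rows.
toy <- data.frame(
  Country   = c("UK", "UK", "France", "Germany", "France"),
  Quantity  = c(6, 8, 2, 12, 4),
  UnitPrice = c(2.55, 2.75, 7.65, 1.25, 3.00)
)
toy$Amount <- toy$Quantity * toy$UnitPrice

# Total amount per country.
totals <- aggregate(Amount ~ Country, data = toy, FUN = sum)

# Number of sale lines per country.
counts <- table(toy$Country)

# Countries ordered from highest to lowest total.
totals[order(-totals$Amount), ]
```

On the real data the same two calls would be applied to `mySets` instead of `toy`.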
We project that the total amount of sales will increase by at least 10% next year, so we create a new column, “AmountNextYear”, equal to “Amount” multiplied by 1.10.
mySets$AmountNextYear <- mySets$Amount * 1.10
head(mySets)
## Details Quantity Date UnitPrice Country
## 1 WHITE HANGING HEART T-LIGHT HOLDER 6 12/1/10 8:26 2.55 UK
## 2 WHITE METAL LANTERN 6 12/1/10 8:26 3.39 UK
## 3 CREAM CUPID HEARTS COAT HANGER 8 12/1/10 8:26 2.75 UK
## 4 KNITTED UNION FLAG HOT WATER BOTTLE 6 12/1/10 8:26 3.39 UK
## 5 RED WOOLLY HOTTIE WHITE HEART. 6 12/1/10 8:26 3.39 UK
## 6 SET 7 BABUSHKA NESTING BOXES 2 12/1/10 8:26 7.65 UK
## Amount AmountNextYear
## 1 15.30 16.830
## 2 20.34 22.374
## 3 22.00 24.200
## 4 20.34 22.374
## 5 20.34 22.374
## 6 15.30 16.830
Boxplot of the spread of the Amount spent on online purchases per country. People appear willing to spend relatively higher amounts per purchase in the UK.
# fill is mapped to Country: Amount is continuous and cannot colour one box per country
ggplot(mySets, aes(x = Country, y = Amount, fill = Country)) +
  geom_boxplot() +
  ggtitle("Boxplot of Amount spreads of online purchases per Country.") +
  theme_classic() +
  xlab("Countries")
Boxplot of the spread of the Unit Price of online purchases per country.
# fill is mapped to Country: UnitPrice is continuous and cannot colour one box per country
ggplot(mySets, aes(x = Country, y = UnitPrice, fill = Country)) +
  geom_boxplot() +
  ggtitle("Boxplot of Unit Price spreads of online purchases per Country.") +
  theme_classic() +
  xlab("Country")
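The numbers behind a boxplot can also be inspected directly, which makes the countries easier to compare than reading the plot by eye. A sketch that computes the minimum, quartiles, and maximum per country with `split()` and `quantile()` on made-up values (the `toy` frame is hypothetical):

```r
# Hypothetical unit prices for two countries.
toy <- data.frame(
  Country   = rep(c("Australia", "UK"), each = 5),
  UnitPrice = c(1, 2, 3, 4, 10,  1, 1, 2, 2, 4)
)

# One column per country, one row per quantile (0%, 25%, 50%, 75%, 100%).
stats <- sapply(split(toy$UnitPrice, toy$Country), quantile,
                probs = c(0, 0.25, 0.5, 0.75, 1))

stats
```

The `50%` row holds the medians, which correspond to the centre lines of the boxes.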
Using a histogram, we check the frequency of the unit prices of the items sold in those countries. The majority of the items have unit prices below $10.
hist(mySets$UnitPrice, breaks= 10, xlim = c(0, 50), ylim = c(0, 500), xlab = "UnitPrice", main = "R Histogram \nUnit Price", col = "red")
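The share of items priced below 10 can be computed directly instead of being read off the histogram. A sketch on made-up prices (`prices` is hypothetical); on the real data the same expression would be applied to `mySets$UnitPrice`.

```r
# Hypothetical unit prices.
prices <- c(2.55, 3.39, 2.75, 7.65, 4.25, 1.85, 12.75, 42.95, 0.85, 9.95)

# The mean of a logical vector is the proportion of TRUE values.
share_under_10 <- mean(prices < 10)

share_under_10
```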
Now to make a Density Plot
hist(mySets$UnitPrice, freq = FALSE, main = "Density Plot of the Unit Prices in the Study")
Now we add a normal distribution curve for the unit prices, along with an axis label and a fill colour.
hist(mySets$UnitPrice, freq = FALSE, xlab = "Unit Price", main = "Density Plot of the Unit Prices in this Study", col="lightblue")
curve(dnorm(x, mean=mean(mySets$UnitPrice), sd=sd(mySets$UnitPrice)), add=TRUE, col="darkred", lwd=2)
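A normal curve assumes the prices are roughly symmetric, which a right-skewed price distribution may not satisfy. A kernel density estimate makes no such assumption. A sketch using base R's `density()` on made-up prices (`prices` is hypothetical, and starting the grid at 0 is a choice made here because prices cannot be negative):

```r
# Hypothetical, right-skewed unit prices.
prices <- c(0.85, 1.25, 1.65, 1.85, 2.10, 2.55, 2.95, 3.39,
            4.25, 7.65, 12.75, 42.95)

# Kernel density estimate evaluated from 0 upwards.
kde <- density(prices, from = 0)

# Histogram on the density scale with the estimate drawn on top.
hist(prices, freq = FALSE, xlab = "Unit Price",
     main = "Unit Price with kernel density estimate", col = "lightblue")
lines(kde, col = "darkred", lwd = 2)
```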
Now I want to check how much people are willing to spend online, using a histogram built with ggplot2.
A <- ggplot(mySets, aes(x=Amount))
B <- A + geom_histogram(binwidth = 1, color='red',fill='pink', alpha = 0.4)
C <- B + xlab('Amount of sales') + ylab('Count')
print(C + ggtitle("Count of the Total Amount of sales"))
Now I want to check the density of the Amounts spent on the internet in 2010.
ggplot(data = mySets) + geom_density(aes(x = Amount), fill = "grey50")
Now I want to check the Amounts spent in each of the five countries on the list, using ggplot with a line geometry.
ggplot(mySets, aes(x = Country, y = Amount)) + geom_line()
Scatter plot using Country and Amount variables
ggplot(mySets, aes(x = Country, y = Amount))+ geom_point()
Combined line and scatter plot using the Country, Amount, and Unit Price variables
graph <- ggplot(mySets, aes(x= Country, y = Amount)) + geom_line(color = "red") + geom_point()
graph <- graph + geom_line(aes(x = Country, y = UnitPrice), color = "green")
graph
Scatter plot of the Amounts spent in each of the five countries, using geom_point and geom_smooth with the axes swapped.
ggplot(mySets, aes(x = Amount, y = Country)) + geom_point(na.rm=TRUE)+geom_smooth(method=lm,se=FALSE, na.rm=TRUE)
## `geom_smooth()` using formula 'y ~ x'
After analyzing the online sales in five countries (Australia, Belgium, France, Germany, and the United Kingdom), it is clear that in 2010 people in the UK were more willing to buy online than people in any other country in the study. The majority of the items sold online have unit prices below $10, which suggests that items under 10 dollars are the most likely to sell online, although in Australia the unit prices of online purchases are a little higher than in the other countries. Finally, I advise expanding in the UK, the country with the highest number of online sales, by offering more low-priced products. Investing in the other countries would require more programs that encourage buyers to purchase products online.