Online_Retail <- read.csv('C:/Users/laasy/Documents/Fall 2023/Intro to Statistics in R/Datasets for Final Project/OnlineRetail.csv')
summary(Online_Retail)
## InvoiceNo StockCode Description Quantity
## Length:541909 Length:541909 Length:541909 Min. :-80995.00
## Class :character Class :character Class :character 1st Qu.: 1.00
## Mode :character Mode :character Mode :character Median : 3.00
## Mean : 9.55
## 3rd Qu.: 10.00
## Max. : 80995.00
##
## InvoiceDate UnitPrice CustomerID Country
## Length:541909 Min. :-11062.06 Min. :12346 Length:541909
## Class :character 1st Qu.: 1.25 1st Qu.:13953 Class :character
## Mode :character Median : 2.08 Median :15152 Mode :character
## Mean : 4.61 Mean :15288
## 3rd Qu.: 4.13 3rd Qu.:16791
## Max. : 38970.00 Max. :18287
## NA's :135080
The main purpose of this Online Retail analysis is to examine order patterns by country.
This is a transnational data set that contains all transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer. The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers.
The dataset is multivariate, sequential, and time-series in nature: it consists of multiple interrelated variables, and every transaction carries a date and time, which makes it suitable for analysis of temporal patterns and trends.
The raw data includes 541909 observations of 8 variables.
Attribute information:
InvoiceNo: Invoice number. Nominal, a 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter ‘c’, it indicates a cancellation.
StockCode: Product (item) code. Nominal, a 5-digit integral number uniquely assigned to each distinct product.
Description: Product (item) name. Nominal.
Quantity: The quantity of each product (item) per transaction. Numeric.
InvoiceDate: Invoice date and time. Numeric, the day and time when each transaction was generated (read in as a character column in the summary above).
UnitPrice: Unit price. Numeric, product price per unit in sterling.
CustomerID: Customer number. Nominal, a 5-digit integral number uniquely assigned to each customer.
Country: Country name. Nominal, the name of the country where each customer resides.
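As a minimal sketch of how the InvoiceNo rule above could be applied, the chunk below flags cancellations on a small made-up data frame. The `toy` data frame and the `IsCancellation` column are hypothetical names used only for illustration, and the invoice numbers are invented. Ignoring case is an assumption: the description says ‘c’, but a file may store the prefix in upper case.

```r
# Hypothetical sample rows; values are invented for illustration only
toy <- data.frame(
  InvoiceNo = c("536365", "C536379", "536381", "c536391"),
  Quantity  = c(6, -1, 10, -12),
  stringsAsFactors = FALSE
)

# Treat an invoice as a cancellation when its number starts with "c" or "C"
toy$IsCancellation <- startsWith(toupper(toy$InvoiceNo), "C")

# Count cancelled vs. regular invoice lines
cancellation_counts <- table(toy$IsCancellation)
print(cancellation_counts)
```

The same expression could be applied to `Online_Retail$InvoiceNo`, provided that column is read in as character.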
# standard deviation of Unit Price
std_dev_tp <- sd(Online_Retail$UnitPrice, na.rm = TRUE)
print(std_dev_tp)
## [1] 96.75985
# variance of Unit Price
var_tp <- var(Online_Retail$UnitPrice, na.rm= TRUE)
print(var_tp)
## [1] 9362.469
# total Quantity across all rows
Quantity_sum <- sum(Online_Retail$Quantity)
print(Quantity_sum)
## [1] 5176450
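The standard deviation and variance printed above describe the same spread: the variance is the square of the standard deviation (96.75985^2 is approximately 9362.469). The small sketch below checks this relationship on made-up prices and shows why `na.rm = TRUE` matters. The `prices` vector is hypothetical.

```r
# Hypothetical unit prices with one missing value
prices <- c(2.55, 3.39, 2.75, NA, 7.65, 4.25)

sd_prices  <- sd(prices, na.rm = TRUE)
var_prices <- var(prices, na.rm = TRUE)

# The variance equals the squared standard deviation (up to rounding)
print(all.equal(sd_prices^2, var_prices))

# Without na.rm = TRUE a single missing value makes the result NA
print(sd(prices))
```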
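# Scatter plot for Customer ID vs Stock Code
# Note: StockCode is a character column, so plot() coerces it to numeric
# and any code containing a letter becomes NA (hence the warning below)
library(ggplot2)
plot(Online_Retail$CustomerID, Online_Retail$StockCode, main = "Scatter plot for Customer ID vs Stock Code", xlab = "Customer ID", ylab = "Stock Code")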
## Warning in xy.coords(x, y, xlabel, ylabel, log): NAs introduced by coercion
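The warning above arises because `StockCode` is stored as text and `plot()` converts it to numbers, turning any code that contains a letter into `NA`. The sketch below reproduces the effect on invented codes and shows one possible workaround, mapping each distinct code to an integer index through `factor()`. The `codes` vector is hypothetical, and the resulting index is only a plotting position with no numeric meaning.

```r
# Hypothetical stock codes: some purely numeric, some with a letter suffix
codes <- c("71053", "85123A", "22752", "84406B", "71053")

# Direct conversion: codes containing letters become NA
# (the coercion warning is suppressed here on purpose)
as_numbers <- suppressWarnings(as.numeric(codes))
print(as_numbers)

# Possible workaround: map each distinct code to an integer position
code_index <- as.integer(factor(codes))
print(code_index)
```

With this encoding every row keeps a y value, so no points are dropped from the plot because of their stock code.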
# Histogram for Quantity
hist(Online_Retail$Quantity, breaks = 20, col = "red", main = "Histogram of Quantity", xlab = "Quantity")
# Mean Customer ID by Country bar plot
data_frame_ot <- Online_Retail
nrow(data_frame_ot)
## [1] 541909
# mean() is called without na.rm = TRUE, so a country with any missing CustomerID yields NA
result <- aggregate(data_frame_ot$CustomerID, by = list(data_frame_ot$Country), mean)
result
## Group.1 x
## 1 Australia 12464.66
## 2 Austria 12521.45
## 3 Bahrain NA
## 4 Belgium 12430.30
## 5 Brazil 12769.00
## 6 Canada 17321.08
## 7 Channel Islands 14888.15
## 8 Cyprus 12404.95
## 9 Czech Republic 12781.00
## 10 Denmark 12536.59
## 11 EIRE NA
## 12 European Community 15108.00
## 13 Finland 12517.01
## 14 France NA
## 15 Germany 12646.14
## 16 Greece 13757.42
## 17 Hong Kong NA
## 18 Iceland 12347.00
## 19 Israel NA
## 20 Italy 12648.40
## 21 Japan 12757.80
## 22 Lebanon 12764.00
## 23 Lithuania 15332.00
## 24 Malta 16996.03
## 25 Netherlands 14420.30
## 26 Norway 12437.98
## 27 Poland 12733.06
## 28 Portugal NA
## 29 RSA 12446.00
## 30 Saudi Arabia 12565.00
## 31 Singapore 12744.00
## 32 Spain 12905.37
## 33 Sweden 14697.15
## 34 Switzerland NA
## 35 United Arab Emirates 14984.59
## 36 United Kingdom NA
## 37 Unspecified NA
## 38 USA 12618.85
# Bars are countries (x axis); bar heights are the mean Customer ID (y axis)
barplot(result$x, names.arg = result$Group.1, xlab = "Country", ylab = "Mean Customer ID", col = rainbow(6),
main = "Customer ID vs Country", border = "black")
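Several countries in the table above show `NA` because `mean()` returns `NA` whenever a group contains a missing `CustomerID`. The sketch below, on invented rows, shows how passing `na.rm = TRUE` through `aggregate()` avoids this. It also shows an alternative that may be easier to interpret than averaging identifiers: counting distinct customers per country. The `toy` data frame and the result names are hypothetical.

```r
# Hypothetical rows; one United Kingdom transaction has no CustomerID
toy <- data.frame(
  Country    = c("France", "France", "United Kingdom", "United Kingdom"),
  CustomerID = c(12583, 12583, 17850, NA)
)

# Extra arguments to aggregate() are forwarded to FUN, so na.rm reaches mean()
mean_id <- aggregate(toy$CustomerID, by = list(Country = toy$Country),
                     FUN = mean, na.rm = TRUE)
print(mean_id)

# Alternative: number of distinct non-missing customers per country
customers <- aggregate(toy$CustomerID, by = list(Country = toy$Country),
                       FUN = function(ids) length(unique(ids[!is.na(ids)])))
print(customers)
```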
# Pie chart for Quantity
pie(table(Online_Retail$Quantity), main="Pie chart for Quantity")
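Because `table(Online_Retail$Quantity)` produces one slice per distinct quantity, the pie chart above can end up with a very large number of thin slices. One possible refinement, sketched here on invented values, is to group quantities into a few ranges with `cut()` before plotting. The `quantities` vector, the break points, and the labels are assumptions chosen for illustration.

```r
# Hypothetical quantities, including one negative value (a return)
quantities <- c(-2, 1, 1, 3, 6, 12, 12, 24, 48, 100)

# Group into ranges; intervals are closed on the right, e.g. (0, 5]
quantity_band <- cut(
  quantities,
  breaks = c(-Inf, 0, 5, 20, Inf),
  labels = c("<= 0", "1-5", "6-20", "> 20")
)

band_counts <- table(quantity_band)
print(band_counts)

pie(band_counts, main = "Pie chart for Quantity (grouped)")
```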