R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

Online_Retail <- read.csv('C:/Users/laasy/Documents/Fall 2023/Intro to Statistics in R/Datasets for Final Project/OnlineRetail.csv')
summary(Online_Retail)
##   InvoiceNo          StockCode         Description           Quantity        
##  Length:541909      Length:541909      Length:541909      Min.   :-80995.00  
##  Class :character   Class :character   Class :character   1st Qu.:     1.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :     3.00  
##                                                           Mean   :     9.55  
##                                                           3rd Qu.:    10.00  
##                                                           Max.   : 80995.00  
##                                                                              
##  InvoiceDate          UnitPrice           CustomerID       Country         
##  Length:541909      Min.   :-11062.06   Min.   :12346    Length:541909     
##  Class :character   1st Qu.:     1.25   1st Qu.:13953    Class :character  
##  Mode  :character   Median :     2.08   Median :15152    Mode  :character  
##                     Mean   :     4.61   Mean   :15288                      
##                     3rd Qu.:     4.13   3rd Qu.:16791                      
##                     Max.   : 38970.00   Max.   :18287                      
##                                         NA's   :135080

Goals of Online Retail:

The main purpose of this Online Retail analysis is to examine the pattern of orders with respect to each country.

Data Documentation:

This is a transnational data set that contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer. The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers.

The dataset is multivariate, sequential, and time-series in nature: it consists of multiple interrelated variables recorded with a timestamp for each transaction, which makes it suitable for analyzing temporal patterns and trends.

The raw data includes 541909 observations of 8 variables.

Attribute information:

InvoiceNo: Invoice number. Nominal, a 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter ‘c’, it indicates a cancellation.
StockCode: Product (item) code. Nominal, a 5-digit integral number uniquely assigned to each distinct product.
Description: Product (item) name. Nominal.
Quantity: The quantity of each product (item) per transaction. Numeric.
InvoiceDate: Invoice date and time. Numeric, the day and time when each transaction was generated.
UnitPrice: Unit price. Numeric, product price per unit in sterling.
CustomerID: Customer number. Nominal, a 5-digit integral number uniquely assigned to each customer.
Country: Country name. Nominal, the name of the country where each customer resides.
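The attribute descriptions suggest two preparation steps before any per-country analysis: flagging cancellations from the InvoiceNo prefix and parsing InvoiceDate into a date-time. The following is a minimal sketch on a small made-up data frame. The rows and the `%m/%d/%Y %H:%M` format string are assumptions, not taken from the file, so the actual format in OnlineRetail.csv should be checked first.

```r
# Hypothetical rows that mimic the documented columns (not real data)
toy <- data.frame(
  InvoiceNo   = c("536365", "C536379", "536381"),
  InvoiceDate = c("12/1/2010 8:26", "12/1/2010 9:41", "12/1/2010 9:45"),
  stringsAsFactors = FALSE
)

# A cancellation is an invoice number starting with the letter c (either case)
toy$IsCancellation <- grepl("^[cC]", toy$InvoiceNo)

# Parse the timestamp; the format string is an assumption about the file
toy$InvoiceDateTime <- as.POSIXct(toy$InvoiceDate,
                                  format = "%m/%d/%Y %H:%M", tz = "UTC")

print(toy)
```

If the format string does not match the file, `as.POSIXct` returns NA for the affected rows, so checking `anyNA(toy$InvoiceDateTime)` is a quick sanity test.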

Aggregate functions

# standard deviation of Unit Price
std_dev_tp <- sd(Online_Retail$UnitPrice, na.rm = TRUE)
print(std_dev_tp)
## [1] 96.75985
# variance of Unit Price
var_tp <- var(Online_Retail$UnitPrice, na.rm = TRUE)
print(var_tp)
## [1] 9362.469
#sum of total Quantity
Quantity_sum <- sum(Online_Retail$Quantity)
print(Quantity_sum)
## [1] 5176450
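# Scatter plot for Customer ID vs Stock Code
# (StockCode is a character column, so plot() coerces it to numeric;
#  codes that contain letters become NA, which triggers the warning)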
library(ggplot2)
plot(Online_Retail$CustomerID,Online_Retail$StockCode, main = "Scatter plot for Customer ID vs Stock Code", xlab="Customer ID", ylab="Stock Code")
## Warning in xy.coords(x, y, xlabel, ylabel, log): NAs introduced by coercion
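The coercion warning arises because StockCode is stored as character, and any code that is not purely numeric cannot be converted to a number. The sketch below reproduces the effect on hypothetical stock codes (not taken from the file) and shows one possible alternative, mapping each code to an integer level with `factor()`, so that no points are silently dropped from a plot.

```r
# Hypothetical stock codes (not taken from the file); some contain letters
stock_code <- c("85123A", "71053", "84406B", "71053")

# Direct coercion turns non-numeric codes into NA (the warning is
# suppressed here only to keep the example output clean)
coerced <- suppressWarnings(as.numeric(stock_code))
print(coerced)

# One alternative: treat the code as a category and use its integer level
code_index <- as.integer(factor(stock_code))
print(code_index)
```

The integer levels carry no numeric meaning, so a plot built on them only shows which customers bought which products, not any ordering among the products.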

# Histogram for Quantity
hist(Online_Retail$Quantity, breaks = 20, col = "red", main = "Histogram of Quantity", xlab = "Quantity")
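The summary output shows Quantity ranging from -80995 to 80995 with a median of 3, so 20 equal-width bins place almost every observation in the bins around zero. One possible way to make the shape visible is to plot only positive quantities up to a high percentile. The sketch below uses simulated values as a stand-in for `Online_Retail$Quantity`; the simulated data and the 99th-percentile cutoff are illustrative assumptions.

```r
# Simulated quantities standing in for Online_Retail$Quantity (assumption)
set.seed(1)
quantity <- c(rpois(1000, lambda = 5) + 1, -80995, 80995)

# Keep positive quantities at or below the 99th percentile of the positives
positive <- quantity[quantity > 0]
cutoff   <- quantile(positive, probs = 0.99)
trimmed  <- positive[positive <= cutoff]

hist(trimmed, breaks = 20, col = "red",
     main = "Histogram of Quantity (positive, up to 99th percentile)",
     xlab = "Quantity")
```

Trimming changes what the histogram describes, so the title should state the filter that was applied.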

# Customer ID vs Country  bar plot
data_frame_ot <- Online_Retail
nrow(data_frame_ot)
## [1] 541909
result <- aggregate(data_frame_ot$CustomerID,by=list(data_frame_ot$Country), mean)
result
##                 Group.1        x
## 1             Australia 12464.66
## 2               Austria 12521.45
## 3               Bahrain       NA
## 4               Belgium 12430.30
## 5                Brazil 12769.00
## 6                Canada 17321.08
## 7       Channel Islands 14888.15
## 8                Cyprus 12404.95
## 9        Czech Republic 12781.00
## 10              Denmark 12536.59
## 11                 EIRE       NA
## 12   European Community 15108.00
## 13              Finland 12517.01
## 14               France       NA
## 15              Germany 12646.14
## 16               Greece 13757.42
## 17            Hong Kong       NA
## 18              Iceland 12347.00
## 19               Israel       NA
## 20                Italy 12648.40
## 21                Japan 12757.80
## 22              Lebanon 12764.00
## 23            Lithuania 15332.00
## 24                Malta 16996.03
## 25          Netherlands 14420.30
## 26               Norway 12437.98
## 27               Poland 12733.06
## 28             Portugal       NA
## 29                  RSA 12446.00
## 30         Saudi Arabia 12565.00
## 31            Singapore 12744.00
## 32                Spain 12905.37
## 33               Sweden 14697.15
## 34          Switzerland       NA
## 35 United Arab Emirates 14984.59
## 36       United Kingdom       NA
## 37          Unspecified       NA
## 38                  USA 12618.85
barplot(result$x, names.arg=result$Group.1, xlab="Country", ylab="Mean Customer ID", col=rainbow(6),
        main="Customer ID vs Country",border="black")
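The NA entries in the aggregated result appear because `mean()` returns NA whenever a group contains a missing CustomerID. `aggregate()` forwards extra arguments to the summary function, so passing `na.rm = TRUE` avoids this. Since CustomerID is an identifier rather than a measurement, counting distinct customers per country may also be more informative than averaging IDs. Both ideas are sketched below on hypothetical rows (not real data).

```r
# Hypothetical rows (not real data); one customer ID is missing
toy <- data.frame(
  Country    = c("France", "France", "France", "Spain", "Spain"),
  CustomerID = c(12583, NA, 12583, 12557, 12540)
)

# Extra arguments to aggregate() are forwarded to FUN, so na.rm = TRUE
# drops the missing IDs instead of returning NA for the whole group
mean_id <- aggregate(toy$CustomerID, by = list(Country = toy$Country),
                     FUN = mean, na.rm = TRUE)
print(mean_id)

# Possibly more informative: number of distinct known customers per country
# (the formula interface omits rows with NA by default)
n_customers <- aggregate(CustomerID ~ Country, data = toy,
                         FUN = function(x) length(unique(x)))
print(n_customers)
```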

# Pie chart for Quantity
pie(table(Online_Retail$Quantity), main="Pie chart for Quantity")
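Because `table()` creates one slice per distinct Quantity value, the pie chart above has as many slices as there are distinct quantities. Grouping the values into a few ranges with `cut()` before tabulating is one way to keep the chart readable. The quantities and break points below are hypothetical and chosen only for illustration.

```r
# Hypothetical quantities (not real data)
quantity <- c(1, 2, 3, 6, 10, 12, 24, 48, 100, -1)

# Group quantities into a few labelled ranges; the break points are an
# illustrative choice, not something the dataset prescribes
bins <- cut(quantity,
            breaks = c(-Inf, 0, 5, 20, Inf),
            labels = c("<= 0", "1-5", "6-20", "> 20"))
counts <- table(bins)
print(counts)

pie(counts, main = "Pie chart for Quantity (binned)")
```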