HW2

Task 1

Read in the ities.csv datafile as a dataframe object, df.

df <- read.csv('ities.csv')

No descriptive answer is needed here.

Task 2

(2 points) Display the count of rows and columns in the dataframe using an appropriate R function. Below the output, identify the count of rows and the count of columns.

nrow(df)

## [1] 438128

ncol(df)

## [1] 13

This dataframe has 438128 rows and 13 columns.
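As a compact alternative, dim() returns both counts in a single call. The sketch below is only an illustration on a small made-up dataframe: toy is a hypothetical stand-in for df, with a few values borrowed from the Price and Quantity columns.

```r
# 'toy' is a hypothetical stand-in for df (three rows, two columns)
toy <- data.frame(
  Price    = c(66.22, 2.88, 12.02),
  Quantity = c(1L, 1L, 2L)
)

# dim() returns c(number of rows, number of columns)
dims <- dim(toy)
n_rows <- dims[1]
n_cols <- dims[2]

cat("This dataframe has", n_rows, "rows and", n_cols, "columns.\n")
```

Run against the real df, the same call should agree with the nrow(df) and ncol(df) results above.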

Task 3

(3 points) Use the appropriate R function to display the structure (i.e., number of rows, columns, column names, column data type, some values from each column) of the dataframe, df. Below the output, briefly summarize two main points about the dataframe structure.

summary(df)

##      Date           OperationType      CashierName          LineItem        
##  Length:438128      Length:438128      Length:438128      Length:438128     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   Department          Category         RegisterName       StoreNumber       
##  Length:438128      Length:438128      Length:438128      Length:438128     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  TransactionNumber  CustomerCode           Price             Quantity      
##  Length:438128      Length:438128      Min.   :-5740.51   Min.   :  1.000  
##  Class :character   Class :character   1st Qu.:    4.50   1st Qu.:  1.000  
##  Mode  :character   Mode  :character   Median :   11.29   Median :  1.000  
##                                        Mean   :   14.36   Mean   :  1.177  
##                                        3rd Qu.:   14.68   3rd Qu.:  1.000  
##                                        Max.   :21449.97   Max.   :815.000  
##                                        NA's   :12                          
##     TotalDue       
##  Min.   :-5740.51  
##  1st Qu.:    4.50  
##  Median :   11.80  
##  Mean   :   15.25  
##  3rd Qu.:   15.04  
##  Max.   :21449.97  
##  NA's   :12

str(df)

## 'data.frame':    438128 obs. of  13 variables:
##  $ Date             : chr  "7/18/2016" "7/18/2016" "7/18/2016" "7/18/2016" ...
##  $ OperationType    : chr  "SALE" "SALE" "SALE" "SALE" ...
##  $ CashierName      : chr  "Wallace Kuiper" "Wallace Kuiper" "Wallace Kuiper" "Wallace Kuiper" ...
##  $ LineItem         : chr  "Salmon and Wheat Bran Salad" "Fountain Drink" "Beef and Squash Kabob" "Salmon and Wheat Bran Salad" ...
##  $ Department       : chr  "Entrees" "Beverage" "Kabobs" "Salad" ...
##  $ Category         : chr  "Salmon and Wheat Bran Salad" "Fountain" "Beef" "general" ...
##  $ RegisterName     : chr  "RT149" "RT149" "RT149" "RT149" ...
##  $ StoreNumber      : chr  "AZ23501305" "AZ23501289" "AZ23501367" "AZ23501633" ...
##  $ TransactionNumber: chr  "002XIIC146121" "002XIIC146121" "00PG9FL135736" "00Z3B4R37335" ...
##  $ CustomerCode     : chr  "CWM11331L8O" "CWM11331L8O" "CWM11331L8O" "CWM11331L8O" ...
##  $ Price            : num  66.22 2.88 12.02 18.43 18.43 ...
##  $ Quantity         : int  1 1 2 1 1 1 1 1 1 1 ...
##  $ TotalDue         : num  66.22 2.88 24.04 18.43 18.43 ...

This summary shows that the middle half of the prices falls between the first quartile of $4.50 and the third quartile of $14.68. However, the data also includes negative price values, which reflect money going out to purchase services such as “catering.”

Task 4

(6 points) Is every transaction summarized in one row of the dataframe? Include a code chunk with code that will display some kind of evidence (e.g., number of rows and number of unique transaction numbers) to support your conclusion. Below the code chunk, clearly indicate how the output of your code supports your decision.

summary.default() shows each column and the type of data it holds.

# Show the length, class, and mode of each column
summary.default(df)

##                   Length Class  Mode     
## Date              438128 -none- character
## OperationType     438128 -none- character
## CashierName       438128 -none- character
## LineItem          438128 -none- character
## Department        438128 -none- character
## Category          438128 -none- character
## RegisterName      438128 -none- character
## StoreNumber       438128 -none- character
## TransactionNumber 438128 -none- character
## CustomerCode      438128 -none- character
## Price             438128 -none- numeric  
## Quantity          438128 -none- numeric  
## TotalDue          438128 -none- numeric

length(unique(df))

## [1] 13

nrow(df)

## [1] 438128

ncol(df)

## [1] 13

nrow(df)*ncol(df)

## [1] 5695664

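# This display summarizes the type of data in each column. The dataframe has 438,128 rows and 13 columns, so 5,695,664 is the number of cells (rows x columns), not the number of transactions. length(unique(df)) returns 13 because length() of a dataframe counts its columns, so it does not count unique transactions either. The str() output in Task 3 shows the first two rows sharing TransactionNumber "002XIIC146121", so a transaction can span more than one row; comparing nrow(df) with length(unique(df$TransactionNumber)) would confirm this for the full dataset.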
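The row-versus-transaction comparison can be sketched as follows. This is a minimal illustration, not the result for the full dataset: toy is a hypothetical four-row dataframe built from the first TransactionNumber and LineItem values printed by str(df) in Task 3.

```r
# 'toy' is a hypothetical stand-in for df, using the first four values shown by str(df)
toy <- data.frame(
  TransactionNumber = c("002XIIC146121", "002XIIC146121",
                        "00PG9FL135736", "00Z3B4R37335"),
  LineItem = c("Salmon and Wheat Bran Salad", "Fountain Drink",
               "Beef and Squash Kabob", "Salmon and Wheat Bran Salad"),
  stringsAsFactors = FALSE
)

# Compare the number of rows with the number of distinct transaction numbers
n_rows <- nrow(toy)
n_transactions <- length(unique(toy$TransactionNumber))

# Count the rows per transaction and keep the transactions that span several rows
rows_per_transaction <- table(toy$TransactionNumber)
multi_row <- names(rows_per_transaction[rows_per_transaction > 1])

cat("Rows:", n_rows, " Unique transactions:", n_transactions, "\n")
cat("Transactions on more than one row:", multi_row, "\n")
```

If the number of unique transaction numbers is smaller than the number of rows, at least one transaction is spread over several rows, so each row is a line item rather than a whole transaction.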

Task 5

(3 points) Display the summaries of the Price, Quantity and TotalDue columns. Below the output, provide a brief interpretation of the output for each column.

summary(df[, c("Price", "Quantity", "TotalDue")])

##      Price             Quantity          TotalDue       
##  Min.   :-5740.51   Min.   :  1.000   Min.   :-5740.51  
##  1st Qu.:    4.50   1st Qu.:  1.000   1st Qu.:    4.50  
##  Median :   11.29   Median :  1.000   Median :   11.80  
##  Mean   :   14.36   Mean   :  1.177   Mean   :   15.25  
##  3rd Qu.:   14.68   3rd Qu.:  1.000   3rd Qu.:   15.04  
##  Max.   :21449.97   Max.   :815.000   Max.   :21449.97  
##  NA's   :12                            NA's   :12

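# Price ranges from -$5,740.51 to $21,449.97, but the middle half of the values is clustered between $4.50 and $14.68; 12 values are missing.
# Quantity ranges from 1 to 815, but the first quartile, median, and third quartile are all 1, so at least 75% of rows have a quantity of 1.
# TotalDue closely tracks Price (same minimum, maximum, first quartile, and count of missing values), which is expected when most rows have a quantity of 1.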
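The relationship between the three columns can be checked directly. The sketch below assumes, based on the rows printed by str(df), that TotalDue equals Price times Quantity; toy is a hypothetical dataframe holding only the first five rows of those columns, so its numbers differ from the full-data summaries.

```r
# 'toy' is a hypothetical stand-in for df: the first five rows shown by str(df)
toy <- data.frame(
  Price    = c(66.22, 2.88, 12.02, 18.43, 18.43),
  Quantity = c(1L, 1L, 2L, 1L, 1L),
  TotalDue = c(66.22, 2.88, 24.04, 18.43, 18.43)
)

# Summarize only the three numeric columns
print(summary(toy[, c("Price", "Quantity", "TotalDue")]))

# Assumption being checked: TotalDue = Price * Quantity (up to floating-point tolerance)
total_matches <- isTRUE(all.equal(toy$Price * toy$Quantity, toy$TotalDue))

# Share of rows with a quantity of exactly 1
share_qty_one <- mean(toy$Quantity == 1)

cat("TotalDue equals Price * Quantity:", total_matches, "\n")
cat("Share of rows with Quantity == 1:", share_qty_one, "\n")
```

On the real data, mean(df$Quantity == 1) would give the exact share of single-item rows that the quartiles only bound from below.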

Task 6

(6 points) Display the boxplots of the log values for the Price, Quantity and TotalDue columns. Below the output, provide a brief description of three insights that you see in the boxplots. As part of your description, indicate how the output from task 5 relates to the boxplots in this task.

boxplot(log10(df$Price))

## Warning in boxplot(log10(df$Price)): NaNs produced

boxplot(log10(df$Quantity))

boxplot(log10(df$TotalDue))

## Warning in boxplot(log10(df$TotalDue)): NaNs produced

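The boxplots for Price and TotalDue are similar, which reflects what was noted in Task 5: the two columns have nearly the same quartiles. The boxplot for Quantity is also consistent with Task 5. It shows a wide range with significant outliers (up to 815), but the overwhelming majority of quantities are 1. Because log10(1) = 0 and the first quartile, median, and third quartile are all 1, the box collapses into a solid line at the bottom of the plot. Finally, the "NaNs produced" warnings for Price and TotalDue come from the negative values seen in Task 5, which have no logarithm and are therefore left out of those two boxplots.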
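Both effects can be reproduced on made-up vectors. This is a sketch under stated assumptions, not the actual data: price and quantity are hypothetical values chosen to mimic the columns (one negative price, one missing price, mostly quantities of 1), and plot = FALSE is used so that boxplot() returns its statistics instead of drawing.

```r
# Hypothetical values that mimic the Price and Quantity columns
price    <- c(66.22, 2.88, 12.02, 18.43, -5740.51, NA)
quantity <- c(rep(1, 8), 2, 815)

# Keep only positive, non-missing prices so that log10() produces no NaNs
keep <- !is.na(price) & price > 0
log_price <- log10(price[keep])
n_dropped <- sum(!keep)

# Boxplot statistics for log10(Quantity); drop plot = FALSE to draw the plot
bp <- boxplot(log10(quantity), plot = FALSE)

cat("Prices dropped before taking logs:", n_dropped, "\n")
cat("Whiskers, hinges, and median of log10(Quantity):", bp$stats[, 1], "\n")
cat("Outliers on the log scale:", round(bp$out, 3), "\n")
```

All five boxplot statistics for the quantities are 0 here, which is why the box appears as a single line, while the larger quantities show up only as outlier points.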

Miguel Alampay

8/27/2024
