Task 1

Read the ities.csv data file into a dataframe object, df.

df <- read.csv("ities.csv")

No descriptive answer is needed here.

Task 2

(2 points) Display the count of rows and columns in the dataframe using an appropriate R function. Below the output, identify the count of rows and the count of columns.

dim(df)
## [1] 438128     13

This dataframe has 438128 rows and 13 columns.

Task 3

(3 points) Use the appropriate R function to display the structure (i.e., number of rows, columns, column names, column data type, some values from each column) of the dataframe, df. Below the output, briefly summarize two main points about the dataframe structure.

str(df)
## 'data.frame':    438128 obs. of  13 variables:
##  $ Date             : chr  "7/18/2016" "7/18/2016" "7/18/2016" "7/18/2016" ...
##  $ OperationType    : chr  "SALE" "SALE" "SALE" "SALE" ...
##  $ CashierName      : chr  "Wallace Kuiper" "Wallace Kuiper" "Wallace Kuiper" "Wallace Kuiper" ...
##  $ LineItem         : chr  "Salmon and Wheat Bran Salad" "Fountain Drink" "Beef and Squash Kabob" "Salmon and Wheat Bran Salad" ...
##  $ Department       : chr  "Entrees" "Beverage" "Kabobs" "Salad" ...
##  $ Category         : chr  "Salmon and Wheat Bran Salad" "Fountain" "Beef" "general" ...
##  $ RegisterName     : chr  "RT149" "RT149" "RT149" "RT149" ...
##  $ StoreNumber      : chr  "AZ23501305" "AZ23501289" "AZ23501367" "AZ23501633" ...
##  $ TransactionNumber: chr  "002XIIC146121" "002XIIC146121" "00PG9FL135736" "00Z3B4R37335" ...
##  $ CustomerCode     : chr  "CWM11331L8O" "CWM11331L8O" "CWM11331L8O" "CWM11331L8O" ...
##  $ Price            : num  66.22 2.88 12.02 18.43 18.43 ...
##  $ Quantity         : int  1 1 2 1 1 1 1 1 1 1 ...
##  $ TotalDue         : num  66.22 2.88 24.04 18.43 18.43 ...
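One thing the str() output makes visible is that Date is stored as character text (chr) rather than as a date type. The following is a minimal sketch of how such text could be parsed, assuming every value follows the month/day/year pattern seen in the sample values. The vector `date_text` is a stand-in for df$Date, and its second value is made up for illustration.

```r
# Sample values in the month/day/year text format shown by str(df).
# The second value is hypothetical and is not taken from ities.csv.
date_text <- c("7/18/2016", "7/19/2016")

# Parse the text into Date objects; %m/%d/%Y means month/day/four-digit year
date_parsed <- as.Date(date_text, format = "%m/%d/%Y")

# Once parsed, the values behave as dates, e.g. differences are in days
print(class(date_parsed))
print(date_parsed)
print(diff(date_parsed))
```

On the real data, the same call would be applied to df$Date. Any value that does not match the assumed format would come back as NA, so it is worth checking for NAs afterwards.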

Task 4

(6 points) Is every transaction summarized in one row of the dataframe? Include a code chunk with code that will display some kind of evidence (e.g., number of rows and number of unique transaction numbers) to support your conclusion. Below the code chunk, clearly indicate how the output of your code supports your decision.

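# Total number of rows in the dataframe
num_rows <- nrow(df)
# Number of distinct transaction numbers
num_unique_transactions <- length(unique(df$TransactionNumber))
# Show both counts side by side for comparison
c(num_rows, num_unique_transactions)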
## [1] 438128 161053
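The two counts differ (438128 rows versus 161053 unique transaction numbers), so at least some transaction numbers must appear on more than one row. A minimal sketch of how rows per transaction could be inspected is shown below. It uses a small made-up dataframe, `toy`, whose values are hypothetical and only mimic two of the column names in df.

```r
# Toy data (hypothetical): three rows, two of which share a transaction number
toy <- data.frame(
  TransactionNumber = c("T001", "T001", "T002"),
  LineItem = c("Fountain Drink", "Beef and Squash Kabob", "Fountain Drink")
)

# Count how many rows each transaction number occupies
rows_per_transaction <- table(toy$TransactionNumber)

# TRUE if at least one transaction spans more than one row
any_multi_row <- any(rows_per_transaction > 1)

# Average number of rows per transaction
mean_rows <- nrow(toy) / length(unique(toy$TransactionNumber))

print(rows_per_transaction)
print(any_multi_row)
print(mean_rows)
```

Applied to df, the same ratio is 438128 / 161053, or roughly 2.7 rows per transaction on average, which supports the conclusion that a transaction is not summarized in a single row.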

Task 5

(3 points) Display the summaries of the Price, Quantity and TotalDue columns. Below the output, provide a brief interpretation of the output for each column.

summary(df$Price)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
## -5740.51     4.50    11.29    14.36    14.68 21449.97       12
summary(df$Quantity)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   1.000   1.177   1.000 815.000
summary(df$TotalDue)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
## -5740.51     4.50    11.80    15.25    15.04 21449.97       12
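The summaries show negative minimums and 12 NA values for both Price and TotalDue, while Quantity has no NA values and a minimum of 1. A minimal sketch of how the affected values could be counted is shown below. The vector `price` is a hypothetical stand-in for df$Price, not real data.

```r
# Toy prices (hypothetical): one negative value, one zero, one missing value
price <- c(4.50, -5.00, NA, 11.29, 0)

# Number of missing values (reported as NA's by summary())
n_missing <- sum(is.na(price))

# Number of negative values; na.rm = TRUE skips the missing entry
n_negative <- sum(price < 0, na.rm = TRUE)

# The median is less affected by extreme values than the mean
median_price <- median(price, na.rm = TRUE)

print(c(missing = n_missing, negative = n_negative))
print(median_price)
```

In the actual output, the mean of Price (14.36) sits above the median (11.29) and the maximum (21449.97) is far above the third quartile (14.68), which suggests a small number of very large values pulling the mean upward.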

Task 6

(6 points) Display the boxplots of the log values for the Price, Quantity and TotalDue columns. Below the output, provide a brief description of three insights that you see in the boxplots. As part of your description, indicate how the output from task 5 relates to the boxplots in this task.

boxplot(log(df$Price), main="Log of Price", ylab="Log(Price)")
## Warning in log(df$Price): NaNs produced

boxplot(log(df$Quantity), main="Log of Quantity", ylab="Log(Quantity)")

boxplot(log(df$TotalDue), main="Log of TotalDue", ylab="Log(TotalDue)")
## Warning in log(df$TotalDue): NaNs produced
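The "NaNs produced" warnings arise because log() is undefined for negative numbers, and Task 5 shows that Price and TotalDue both have negative minimums. Quantity has a minimum of 1, so log(df$Quantity) raises no warning. A minimal sketch of one way to handle this is shown below: keep only strictly positive values before taking logs. The vector `price` is hypothetical and stands in for df$Price. Dropping non-positive values is an assumption about how to treat them, not the assignment's prescribed method.

```r
# Toy prices (hypothetical): positive values plus one negative, one zero, one NA
price <- c(66.22, 2.88, 12.02, -5.00, 0, NA)

# log() of a negative number is NaN (with a warning); log(0) is -Inf
log_negative <- suppressWarnings(log(-5.00))
log_zero <- log(0)

# Keep only strictly positive, non-missing values before taking logs
positive_price <- price[!is.na(price) & price > 0]
log_price <- log(positive_price)

# The boxplot is now drawn from finite values only, with no warning
boxplot(log_price, main = "Log of Price (positive values only)", ylab = "Log(Price)")
```

The Task 5 output also explains the shape of the Quantity boxplot: the first quartile, median, and third quartile of Quantity are all 1, and log(1) is 0, so the box collapses to a line at 0 and larger quantities appear as outliers above it.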