Task 1

Read in the ities.csv datafile as a dataframe object, df.

# Read the file once; TRUE/FALSE are spelled out because T and F can be reassigned.
data.frame <- read.csv('ities.csv', stringsAsFactors = FALSE, header = TRUE)
## Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,
## : EOF within quoted string

No descriptive answer is needed here.
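
The `EOF within quoted string` warning above usually indicates that an unmatched quote character in the file caused `read.csv` to treat the text after it as part of one quoted field, so the row count reported later may be lower than the number of records in the file. Below is a minimal sketch of one way to check for this. It uses a small temporary file with made-up contents; the file, its values, and the names `tmp`, `n_records`, and `no_quote` are illustrative assumptions, not part of ities.csv.

```r
# Write a tiny CSV in which the second line has an unmatched double quote.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,item", '1,"Kabob', "2,Salad", "3,Drink"), tmp)

# Number of data records actually present in the file (lines minus header).
n_records <- length(readLines(tmp)) - 1

# Re-read with quote processing switched off, so a stray quote cannot
# swallow the lines that follow it.
no_quote <- read.csv(tmp, quote = "", stringsAsFactors = FALSE)

cat("Records in file:", n_records, "\n")
cat("Rows read with quote = \"\":", nrow(no_quote), "\n")

# Remove the temporary file.
unlink(tmp)
```

Turning quotes off also splits any legitimately quoted field that contains a comma, so comparing the two counts is a diagnostic step rather than a guaranteed fix.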

Task 2

(2 points) Display the count of rows and columns in the dataframe using an appropriate R function. Below the output, identify the count of rows and the count of columns.

dim(data.frame)
## [1] 15755    13
# Reuse the dataframe loaded in Task 1 instead of reading the file again.
rows <- nrow(data.frame)
columns <- ncol(data.frame)

cat("This dataframe has", rows, "rows and", columns, "columns.")
## This dataframe has 15755 rows and 13 columns.

The dataframe has 15755 rows and 13 columns.

Task 3

(3 points) Use the appropriate R function to display the structure (i.e., number of rows, columns, column names, column data type, some values from each column) of the dataframe, df. Below the output, briefly summarize two main points about the dataframe structure.

str(data.frame)
## 'data.frame':    15755 obs. of  13 variables:
##  $ Date             : chr  "7/18/2016" "7/18/2016" "7/18/2016" "7/18/2016" ...
##  $ OperationType    : chr  "SALE" "SALE" "SALE" "SALE" ...
##  $ CashierName      : chr  "Wallace Kuiper" "Wallace Kuiper" "Wallace Kuiper" "Wallace Kuiper" ...
##  $ LineItem         : chr  "Salmon and Wheat Bran Salad" "Fountain Drink" "Beef and Squash Kabob" "Salmon and Wheat Bran Salad" ...
##  $ Department       : chr  "Entrees" "Beverage" "Kabobs" "Salad" ...
##  $ Category         : chr  "Salmon and Wheat Bran Salad" "Fountain" "Beef" "general" ...
##  $ RegisterName     : chr  "RT149" "RT149" "RT149" "RT149" ...
##  $ StoreNumber      : chr  "AZ23501305" "AZ23501289" "AZ23501367" "AZ23501633" ...
##  $ TransactionNumber: chr  "002XIIC146121" "002XIIC146121" "00PG9FL135736" "00Z3B4R37335" ...
##  $ CustomerCode     : chr  "CWM11331L8O" "CWM11331L8O" "CWM11331L8O" "CWM11331L8O" ...
##  $ Price            : num  66.22 2.88 12.02 18.43 18.43 ...
##  $ Quantity         : int  1 1 2 1 1 1 1 1 1 1 ...
##  $ TotalDue         : num  66.22 2.88 24.04 18.43 18.43 ...
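
The structure output shows 15755 observations of 13 variables: ten character columns, two numeric columns (Price and TotalDue), and one integer column (Quantity). One consequence is that Date is stored as text rather than as a date. Below is a small sketch of how such values could be converted, assuming the month/day/year layout seen in the sample values holds for the whole column. The vector `date_text` is a hypothetical stand-in for the column.

```r
# Sample value copied from the str() output above, repeated for illustration.
date_text <- c("7/18/2016", "7/18/2016")

# Parse month/day/year text into Date objects.
parsed <- as.Date(date_text, format = "%m/%d/%Y")

class(parsed)   # the vector is now of class "Date"
format(parsed)  # year-month-day text representation
```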

Task 4

(6 points) Is every transaction summarized in one row of the dataframe? Include a code chunk with code that will display some kind of evidence (e.g., number of rows and number of unique transaction numbers) to support your conclusion. Below the code chunk, clearly indicate how the output of your code supports your decision.

head(data.frame$TransactionNumber)
## [1] "002XIIC146121" "002XIIC146121" "00PG9FL135736" "00Z3B4R37335" 
## [5] "00Z3B4R37335"  "006LUOW47310"
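
The `head()` output already shows repeated transaction numbers, which suggests that a transaction can span several rows. Below is a sketch of the row-versus-unique-count comparison the task describes, run on only the six values printed above rather than on the full column. The names `txn`, `n_rows`, and `n_unique` are illustrative.

```r
# The six transaction numbers shown in the head() output.
txn <- c("002XIIC146121", "002XIIC146121", "00PG9FL135736",
         "00Z3B4R37335", "00Z3B4R37335", "006LUOW47310")

# Total rows versus distinct transaction numbers.
n_rows <- length(txn)
n_unique <- length(unique(txn))
cat("Rows:", n_rows, " Unique transaction numbers:", n_unique, "\n")

# Number of rows recorded for each transaction number.
table(txn)
```

Applied to the full column, `nrow(data.frame)` and `length(unique(data.frame$TransactionNumber))` would give the same comparison; fewer unique numbers than rows means at least some transactions occupy more than one row.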

Task 5

(3 points) Display the summaries of the Price, Quantity and TotalDue columns. Below the output, provide a brief interpretation of the output for each column.

# Summary of the Price column
summary(data.frame$Price)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -298.73    4.50   12.02   14.83   14.68 5484.24      13
# Summary of the Quantity column
summary(data.frame$Quantity)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   1.000   1.000   1.184   1.000 125.000       1
# Summary of the TotalDue column
summary(data.frame$TotalDue)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -298.73    4.50   12.02   15.93   14.72 5484.24      13
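
The mean of TotalDue (15.93) is above the mean of Price (14.83) while their quartiles are nearly identical, which would be consistent with TotalDue being Price times Quantity and most quantities being 1. Below is a sketch that checks this relationship on the first five rows shown in the `str()` output only; whether it holds for every row is an assumption. The vector names are illustrative.

```r
# First five values of each column, copied from the str() output in Task 3.
price <- c(66.22, 2.88, 12.02, 18.43, 18.43)
quantity <- c(1L, 1L, 2L, 1L, 1L)
total_due <- c(66.22, 2.88, 24.04, 18.43, 18.43)

# Compare with a tolerance rather than ==, since these are floating-point values.
matches <- isTRUE(all.equal(price * quantity, total_due))
matches
```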

Task 6

(6 points) Display the boxplots of the log values for the Price, Quantity and TotalDue columns. Below the output, provide a brief description of three insights that you see in the boxplots. As part of your description, indicate how the output from task 5 relates to the boxplots in this task.

# Boxplot of log10(Price)
boxplot(log10(data.frame$Price))
## Warning in boxplot(log10(data.frame$Price)): NaNs produced

# Boxplot of log10(Quantity)
boxplot(log10(data.frame$Quantity))

# Boxplot of log10(TotalDue)
boxplot(log10(data.frame$TotalDue))
## Warning in boxplot(log10(data.frame$TotalDue)): NaNs produced
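
The `NaNs produced` warnings arise because `log10()` is undefined for negative numbers, and the task 5 summaries show a negative minimum (-298.73) for both Price and TotalDue. Quantity has a minimum of 1, so it produces no warning. Below is a sketch of one way to make the handling explicit by keeping only positive, non-missing values before taking logs. The vector is a hypothetical sample built from the summary values, and dropping non-positive values is an assumption about how such rows should be treated.

```r
# Hypothetical sample: min, quartiles, and max from the Price summary, plus one NA.
price <- c(-298.73, 4.50, 12.02, 14.68, 5484.24, NA)

# Keep only values for which log10() is defined and finite
# (negative values give NaN, zero gives -Inf).
positive <- price[!is.na(price) & price > 0]
log_price <- log10(positive)

# How many values were excluded from the plot.
n_dropped <- length(price) - length(positive)
cat("Values dropped before plotting:", n_dropped, "\n")

boxplot(log_price, main = "log10(Price), positive values only")
```

Reporting the number of dropped values alongside the plot keeps the link to the task 5 summaries visible, since the negative and missing entries counted there are exactly the ones the log scale cannot show.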