Notebook Instructions


Load Packages in R/RStudio

We are going to use tidyverse, a collection of R packages designed for data science.

Loading required package: tidyverse
-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
v ggplot2 2.2.1     v purrr   0.2.4
v tibble  1.4.2     v dplyr   0.7.4
v tidyr   0.8.0     v stringr 1.2.0
v readr   1.1.1     v forcats 0.2.0
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
Loading required package: gridExtra

Attaching package: 'gridExtra'

The following object is masked from 'package:dplyr':

    combine

Task 1: Quantitative Analysis


1A) Read the csv file into R Studio and display the dataset.

  • Name your dataset ‘mydata’ so it is easy to work with.

  • Commands: read_csv() head() max() min() var() sd()

Extract the assigned features (columns) to perform some analytics.

mydata <- read_csv("data/Advertising.csv")
Missing column names filled in: 'X1' [1]
Parsed with column specification:
cols(
  X1 = col_integer(),
  TV = col_double(),
  radio = col_double(),
  newspaper = col_double(),
  sales = col_double()
)
head(mydata)

Change the variable name “X1” to case_number using the function rename()

  • mydata <- rename(mydata, "NEW_VAR_NAME" = "OLD_VAR_NAME")
mydata <- rename(mydata, "case_number" = "X1")
Error: `X1` contains unknown variables

1B) Find the range ( difference between min and max ), min, max, standard deviation and variance for each assigned feature ( Use separate chunks for each feature ). Compare each feature and note any significant differences

SALES

sales <- mydata$sales
#variable_max
sales_max <- max(mydata$sales)
sales_max
[1] 27
#variable_min
sales_min <- min(mydata$sales)
sales_min
[1] 1.6
#variable_Range max-min
sales_Range <- sales_max - sales_min
sales_Range
[1] 25.4
#variable_mean 
sales_mean <- mean(mydata$sales)
sales_mean
[1] 14.0225
#variable_sd Standard Deviation
sales_sd <- sd(mydata$sales)
sales_sd
[1] 5.217457
#variable_variance
sales_variance <- var(mydata$sales)
sales_variance
[1] 27.22185

TV

tv <- mydata$TV
#variable_max
tv_max <- max(mydata$TV)
tv_max
[1] 296.4
#variable_min
tv_min <- min(mydata$TV)
tv_min
[1] 0.7
#variable_Range max-min
tv_Range <- tv_max - tv_min
tv_Range
[1] 295.7
#variable_mean 
tv_mean <- mean(mydata$TV)
tv_mean
[1] 147.0425
#variable_sd Standard Deviation
tv_sd <- sd(mydata$TV)
tv_sd
[1] 85.85424
#variable_variance
tv_variance <- var(mydata$TV)
tv_variance
[1] 7370.95

Radio

radio <- mydata$radio
#variable_max
radio_max <- max(mydata$radio)
radio_max
[1] 49.6
#variable_min
radio_min <- min(mydata$radio)
radio_min
[1] 0
#variable_Range max-min
radio_Range <- radio_max - radio_min
radio_Range
[1] 49.6
#variable_mean 
radio_mean <- mean(mydata$radio)
radio_mean
[1] 23.264
#variable_sd Standard Deviation
radio_sd <- sd(mydata$radio)
radio_sd
[1] 14.84681
#variable_variance
radio_variance <- var(mydata$radio)
radio_variance
[1] 220.4277

newspaper

newspaper <- mydata$newspaper
#variable_max
newspaper_max <- max(mydata$newspaper)
newspaper_max
[1] 114
#variable_min
newspaper_min <- min(mydata$newspaper)
newspaper_min
[1] 0.3
#variable_Range max-min
newspaper_Range <- newspaper_max - newspaper_min
newspaper_Range
[1] 113.7
#variable_mean 
newspaper_mean <- mean(mydata$newspaper)
newspaper_mean
[1] 30.554
#variable_sd Standard Deviation
newspaper_sd <- sd(mydata$newspaper)
newspaper_sd
[1] 21.77862
#variable_variance
newspaper_variance <- var(mydata$newspaper)
newspaper_variance
[1] 474.3083

1C) Use the summary() function on the whole dataset to get a general description of the data. Note any differences between features.

summary(mydata)
  case_number           TV             radio          newspaper          sales      
 Min.   :  1.00   Min.   :  0.70   Min.   : 0.000   Min.   :  0.30   Min.   : 1.60  
 1st Qu.: 50.75   1st Qu.: 74.38   1st Qu.: 9.975   1st Qu.: 12.75   1st Qu.:10.38  
 Median :100.50   Median :149.75   Median :22.900   Median : 25.75   Median :12.90  
 Mean   :100.50   Mean   :147.04   Mean   :23.264   Mean   : 30.55   Mean   :14.02  
 3rd Qu.:150.25   3rd Qu.:218.82   3rd Qu.:36.525   3rd Qu.: 45.10   3rd Qu.:17.40  
 Max.   :200.00   Max.   :296.40   Max.   :49.600   Max.   :114.00   Max.   :27.00  

Are there any outliers? If not, explain the lack of outliers. If so, explain what the outliers represent and how many records are outliers. (Use code from notebook-03 to find outliers.)

quantile(sales, na.rm = TRUE)
    0%    25%    50%    75%   100% 
 1.600 10.375 12.900 17.400 27.000 
quantile(tv, na.rm = TRUE)
     0%     25%     50%     75%    100% 
  0.700  74.375 149.750 218.825 296.400 
quantile(radio, na.rm = TRUE)
    0%    25%    50%    75%   100% 
 0.000  9.975 22.900 36.525 49.600 
quantile(newspaper, na.rm = TRUE)
    0%    25%    50%    75%   100% 
  0.30  12.75  25.75  45.10 114.00 
lowerqsales = quantile(sales, na.rm = TRUE)[2]
lowerqtv = quantile(tv, na.rm = TRUE)[2]
lowerqradio = quantile(radio, na.rm = TRUE)[2]
lowerqnewspaper = quantile(newspaper, na.rm = TRUE)[2]
upperqsales = quantile(sales, na.rm = TRUE)[4]
upperqtv = quantile(tv, na.rm = TRUE)[4]
upperqradio = quantile(radio, na.rm = TRUE)[4]
upperqnewspaper = quantile(newspaper, na.rm = TRUE)[4]
salesiqr = upperqsales - lowerqsales
salesiqr
  75% 
7.025 
tviqr = upperqtv - lowerqtv
tviqr
   75% 
144.45 
radioiqr = upperqradio - lowerqradio
radioiqr
  75% 
26.55 
newspaperiqr = upperqnewspaper - lowerqnewspaper
newspaperiqr
  75% 
32.35 
upper_threshold_sales = (salesiqr * 1.5) + upperqsales
upper_threshold_sales
    75% 
27.9375 
upper_threshold_tv = (tviqr * 1.5) + upperqtv
upper_threshold_tv
  75% 
435.5 
upper_threshold_radio = (radioiqr * 1.5) + upperqradio
upper_threshold_radio
  75% 
76.35 
upper_threshold_newspaper = (newspaperiqr * 1.5) + upperqnewspaper
upper_threshold_newspaper
   75% 
93.625 
lower_threshold_sales = lowerqsales - (salesiqr * 1.5) 
lower_threshold_sales
    25% 
-0.1625 
lower_threshold_tv = lowerqtv - (tviqr * 1.5)
lower_threshold_tv
   25% 
-142.3 
lower_threshold_radio = lowerqradio - (radioiqr * 1.5)
lower_threshold_radio
   25% 
-29.85 
lower_threshold_newspaper = lowerqnewspaper - (newspaperiqr * 1.5)
lower_threshold_newspaper
    25% 
-35.775 
sales[ sales > upper_threshold_sales][1:10]
 [1] NA NA NA NA NA NA NA NA NA NA
radio[ radio > upper_threshold_radio][1:10]
 [1] NA NA NA NA NA NA NA NA NA NA
tv[ tv > upper_threshold_tv][1:10]
 [1] NA NA NA NA NA NA NA NA NA NA
newspaper[ newspaper > upper_threshold_newspaper][1:10]
 [1] 114.0 100.9    NA    NA    NA    NA    NA    NA    NA    NA
sales[ sales < lower_threshold_sales][1:10]
 [1] NA NA NA NA NA NA NA NA NA NA
tv[ tv < lower_threshold_tv][1:10]
 [1] NA NA NA NA NA NA NA NA NA NA
radio[ radio < lower_threshold_radio][1:10]
 [1] NA NA NA NA NA NA NA NA NA NA
newspaper[ newspaper < lower_threshold_newspaper][1:10]
 [1] NA NA NA NA NA NA NA NA NA NA
mydata[ sales > upper_threshold_sales, ]
mydata[ tv > upper_threshold_tv, ]
mydata[ radio > upper_threshold_radio, ]
mydata[ newspaper > upper_threshold_newspaper, ]
mydata[ sales < lower_threshold_sales, ]
mydata[ tv < lower_threshold_tv, ]
mydata[ radio < lower_threshold_radio, ]
mydata[ newspaper < lower_threshold_newspaper, ]
count(mydata[ sales > upper_threshold_sales, ])
count(mydata[ tv > upper_threshold_tv, ])
count(mydata[ radio > upper_threshold_radio, ])
count(mydata[ newspaper > upper_threshold_newspaper, ])
count(mydata[ sales < lower_threshold_sales, ])
count(mydata[ tv < lower_threshold_tv, ])
count(mydata[ radio < lower_threshold_radio, ])
count(mydata[ newspaper < lower_threshold_newspaper, ])

There are no outliers for sales, TV, or radio. However, newspaper has 2 outliers above its upper threshold of 93.625 (the values 114.0 and 100.9). These are 2 records whose newspaper values lie far above the majority of the other points.

1D) Write a general description of the dataset using the statistics found in the steps above. Use the min,max range to compare the features, note any significant differences.

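TV had the highest maximum (296.4), followed by newspaper (114), radio (49.6), and sales (27). Radio had the lowest minimum (0), followed by newspaper (0.3), TV (0.7), and sales (1.6). TV therefore also has the widest range (295.7, compared with 113.7 for newspaper, 49.6 for radio, and 25.4 for sales). This suggests that TV received the most advertising spending, which makes sense because it is an easy way to reach people today.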


Task 2: Qualitative Analysis


2A) Plot each assigned feature on the y-axis, with case_number on the x-axis. Use the given commands to create each plot, then create a grid to plot all features. Note any trends/patterns in the data.

  • Commands: VARIABLE_plot <- ggplot(data = mydata, aes(x = VARIABLE, y = VARIABLE)) + geom_point()
  • Commands: grid.arrange(VARIABLE_plot1, VARIABLE_plot2, VARIABLE_plot3, VARIABLE_plot4, ncol=2)
sales_plot <- ggplot(data = mydata, aes(x = case_number, y = sales)) + geom_point()
tv_plot <- ggplot(data = mydata, aes(x = case_number, y = TV)) + geom_point()
radio_plot <- ggplot(data = mydata, aes(x = case_number, y = radio)) + geom_point()
newspaper_plot <- ggplot(data = mydata, aes(x = case_number, y = newspaper)) + geom_point()
grid.arrange(sales_plot, tv_plot, radio_plot, newspaper_plot, ncol=2)

  • When looking at these plots it is hard to see a particular trend.
  • One way to observe any possible trend in the sales data would be to re-order the data from low to high.
  • The 200 monthly observations are in no particular chronological order.
  • The case numbers are independent sequentially generated numbers. Since each case is independent, we can reorder them.

2B) Re-order sales from low to high, and save the re-ordered data in a new set. As the sales data is re-ordered, the associated fields in the other columns follow.

  • Commands: newdata <- mydata[ order(mydata$VARIABLE), ]
# Re-order the rows by sales (low to high)
newdata = mydata[ order(mydata$sales), ]
# Extract case_number from the newdata
case_number <- newdata$case_number
head(newdata)

Extract the variables from the new data

# new_VARIABLE = newdata$VARIABLE
new_sales = newdata$sales
new_tv = newdata$TV
new_radio = newdata$radio
new_newspaper = newdata$newspaper

It is much easier to see the increasing trend in sales and TV with the new order. Radio has a bit more of an identifiable pattern, but newspaper does not.

Task 3: Standardized Z-Value


3A) Create a histogram of the assigned feature z-scores. Describe the output and note any relevant values.

  • Command: z_score = ( VARIABLE - mean(VARIABLE) ) / sd(VARIABLE)
  • Commands: qplot( x = VARIABLE ,geom="histogram", binwidth = 0.3)
z_score_sales = ( sales - mean(sales) ) / sd(sales)
z_score_tv = ( tv - mean(tv) ) / sd(tv)
z_score_radio = ( radio - mean(radio) ) / sd(radio)
z_score_newspaper = ( newspaper - mean(newspaper) ) / sd(newspaper)
qplot( x = z_score_sales ,geom="histogram", binwidth = 0.3)

qplot( x = z_score_tv ,geom="histogram", binwidth = 0.3)

qplot( x = z_score_radio ,geom="histogram", binwidth = 0.3)

qplot( x = z_score_newspaper ,geom="histogram", binwidth = 0.3)

Sales is the closest to a normal distribution. Radio and TV both seem to vary throughout the graph. Newspaper is right-skewed, which makes sense since it was the only variable with outliers.

3B) Given a sales value of $26700, calculate the corresponding z-value or z-score.

  • Command: z_score = ( VARIABLE - mean(VARIABLE) ) / sd(VARIABLE)
# $26,700 is entered as 26.7, assuming sales are recorded in thousands of dollars
x = 26.7
z_score_calculatecor = ( x - mean(sales) ) / sd(sales)
z_score_calculatecor
[1] 2.429824

3C) Based on the z-value, how would you rate a $26700 sales value: poor, average, good, or very good performance? Explain your logic.

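I would rate this as very good performance. The z-score of about 2.43 means that $26,700 is roughly 2.4 standard deviations above the mean sales value (14.02), and it is close to the maximum of 27 in the dataset. A positive z-score only shows that a value is above average; by itself it says nothing about profit.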

---
title: "Descriptive Analytics"
author: "Ashley Krenz"
date: "February 26, 2018"
output:
  html_notebook: default
  pdf_document: default
subtitle: CME Group Foundation Business Analytics Lab
---

-------------

## Notebook Instructions

-------------

* For your assignment you may be using a different dataset than what is included here. 

* Always read carefully the instructions on Sakai.  

* Tasks/questions to be completed/answered are highlighted in larger bolded fonts and numbered according to their section.

### Load Packages in R/RStudio 

We are going to use tidyverse, a collection of R packages designed for data science. 

* Info: https://www.tidyverse.org/

```{r, echo=FALSE}

# Here we are checking if the package is installed
if(!require(tidyverse)){
  
  # If the package is not in the system then it will be installed
  install.packages("tidyverse", dependencies = TRUE)
  
  # Here we are loading the package
  library(tidyverse)
}

if(!require(gridExtra)){
  
  # If the package is not in the system then it will be installed
  install.packages("gridExtra", dependencies = TRUE)
  
  # Here we are loading the package
  library(gridExtra)
}

```
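
The chunk above repeats the same check, install, and load steps for each package. As a minimal sketch (not part of the assignment), the check can be factored into a small helper. The name `missing_packages` is hypothetical, and base packages are used in place of tidyverse and gridExtra so the example runs on any R installation.

```r
# Hypothetical helper: report which packages from `pkgs` are not installed.
# requireNamespace() checks availability without attaching the package.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("stats", "utils")   # base packages, used here so the sketch runs anywhere
to_install <- missing_packages(needed)
to_install

# In the notebook, `needed` would be c("tidyverse", "gridExtra"), followed by:
# if (length(to_install) > 0) install.packages(to_install)
# invisible(lapply(needed, library, character.only = TRUE))
```

Checking first and installing only what is missing avoids re-downloading packages each time the notebook is run.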

-------------

## Task 1: Quantitative Analysis

-------------

### 1A) Read the csv file into R Studio and display the dataset. 

* Name your dataset 'mydata' so it is easy to work with.

* Commands: read_csv() head() max() min() var() sd()

#### Extract the assigned features (columns) to perform some analytics. 

```{r} 
mydata <- read_csv("data/Advertising.csv")
```
```{r}
head(mydata)
```


#### Change the variable name "X1" to case_number using the function rename()

* mydata <- rename(mydata, "NEW_VAR_NAME" = "OLD_VAR_NAME")

```{r} 
mydata <- rename(mydata, "case_number" = "X1")
```
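
Running the rename chunk a second time fails, because the column `X1` no longer exists once it has been renamed. One way around this, sketched here in base R on a made-up data frame, is to rename only when the old name is still present. The helper name `rename_if_present` and the `toy` values are assumptions for illustration.

```r
# Toy stand-in for the imported data (values are made up for illustration)
toy <- data.frame(X1 = 1:3, sales = c(22.1, 10.4, 9.3))

# Hypothetical helper: rename `old` to `new` only if `old` is still present
rename_if_present <- function(df, old, new) {
  names(df)[names(df) == old] <- new
  df
}

toy <- rename_if_present(toy, "X1", "case_number")
toy <- rename_if_present(toy, "X1", "case_number")  # second call changes nothing
names(toy)
```

Because the second call finds no column named `X1`, it leaves the data frame unchanged instead of raising an error.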

### 1B) Find the range ( difference between min and max ), min, max, standard deviation and variance for each assigned feature ( Use separate chunks for each feature ). Compare each feature and note any significant differences

**SALES**
```{r}
sales <- mydata$sales
#variable_max
sales_max <- max(mydata$sales)
sales_max
#variable_min
sales_min <- min(mydata$sales)
sales_min
#variable_Range max-min
sales_Range <- sales_max - sales_min
sales_Range
#variable_mean 
sales_mean <- mean(mydata$sales)
sales_mean
#variable_sd Standard Deviation
sales_sd <- sd(mydata$sales)
sales_sd
#variable_variance
sales_variance <- var(mydata$sales)
sales_variance
```

**TV**
```{r}
tv <- mydata$TV
#variable_max
tv_max <- max(mydata$TV)
tv_max
#variable_min
tv_min <- min(mydata$TV)
tv_min
#variable_Range max-min
tv_Range <- tv_max - tv_min
tv_Range
#variable_mean 
tv_mean <- mean(mydata$TV)
tv_mean
#variable_sd Standard Deviation
tv_sd <- sd(mydata$TV)
tv_sd
#variable_variance
tv_variance <- var(mydata$TV)
tv_variance
```

**Radio**
```{r}
radio <- mydata$radio
#variable_max
radio_max <- max(mydata$radio)
radio_max
#variable_min
radio_min <- min(mydata$radio)
radio_min
#variable_Range max-min
radio_Range <- radio_max - radio_min
radio_Range
#variable_mean 
radio_mean <- mean(mydata$radio)
radio_mean
#variable_sd Standard Deviation
radio_sd <- sd(mydata$radio)
radio_sd
#variable_variance
radio_variance <- var(mydata$radio)
radio_variance
```

**newspaper**
```{r}
newspaper <- mydata$newspaper
#variable_max
newspaper_max <- max(mydata$newspaper)
newspaper_max
#variable_min
newspaper_min <- min(mydata$newspaper)
newspaper_min
#variable_Range max-min
newspaper_Range <- newspaper_max - newspaper_min
newspaper_Range
#variable_mean 
newspaper_mean <- mean(mydata$newspaper)
newspaper_mean
#variable_sd Standard Deviation
newspaper_sd <- sd(mydata$newspaper)
newspaper_sd
#variable_variance
newspaper_variance <- var(mydata$newspaper)
newspaper_variance
```
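
The four chunks above repeat the same calculations for each feature. As a compact alternative sketch, the same statistics can be collected in one table with `sapply()`. The helper `describe_feature` and the `toy` values are assumptions for illustration. Note that base R's `range()` returns the minimum and maximum rather than their difference, so the difference is computed explicitly.

```r
# Toy data with the same column names as the notebook (values are made up)
toy <- data.frame(
  TV        = c(230.1, 44.5, 17.2, 151.5),
  radio     = c(37.8, 39.3, 45.9, 41.3),
  newspaper = c(69.2, 45.1, 69.3, 58.5),
  sales     = c(22.1, 10.4, 9.3, 18.5)
)

# Hypothetical helper: the statistics from 1B for one numeric vector
describe_feature <- function(x) {
  c(min = min(x), max = max(x), range = max(x) - min(x),
    mean = mean(x), sd = sd(x), variance = var(x))
}

# One column of results per feature, one row per statistic
stats_table <- sapply(toy, describe_feature)
round(stats_table, 2)
```

Placing the features side by side makes it easier to compare their ranges and spreads than scrolling through separate chunks.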

### 1C) Use the summary() function on the whole dataset to get a general description of the data. Note any differences between features.

```{r}
summary(mydata)
```

#### Are there any outliers? If not, explain the lack of outliers. If so, explain what the outliers represent and how many records are outliers. (Use code from notebook-03 to find outliers.)



```{r}
quantile(sales, na.rm = TRUE)
quantile(tv, na.rm = TRUE)
quantile(radio, na.rm = TRUE)
quantile(newspaper, na.rm = TRUE)
```

```{r}
lowerqsales = quantile(sales, na.rm = TRUE)[2]
lowerqtv = quantile(tv, na.rm = TRUE)[2]
lowerqradio = quantile(radio, na.rm = TRUE)[2]
lowerqnewspaper = quantile(newspaper, na.rm = TRUE)[2]
```

```{r}
upperqsales = quantile(sales, na.rm = TRUE)[4]
upperqtv = quantile(tv, na.rm = TRUE)[4]
upperqradio = quantile(radio, na.rm = TRUE)[4]
upperqnewspaper = quantile(newspaper, na.rm = TRUE)[4]
```

```{r}
salesiqr = upperqsales - lowerqsales
salesiqr
tviqr = upperqtv - lowerqtv
tviqr
radioiqr = upperqradio - lowerqradio
radioiqr
newspaperiqr = upperqnewspaper - lowerqnewspaper
newspaperiqr
```

```{r}
upper_threshold_sales = (salesiqr * 1.5) + upperqsales
upper_threshold_sales
upper_threshold_tv = (tviqr * 1.5) + upperqtv
upper_threshold_tv
upper_threshold_radio = (radioiqr * 1.5) + upperqradio
upper_threshold_radio
upper_threshold_newspaper = (newspaperiqr * 1.5) + upperqnewspaper
upper_threshold_newspaper

lower_threshold_sales = lowerqsales - (salesiqr * 1.5) 
lower_threshold_sales
lower_threshold_tv = lowerqtv - (tviqr * 1.5)
lower_threshold_tv
lower_threshold_radio = lowerqradio - (radioiqr * 1.5)
lower_threshold_radio
lower_threshold_newspaper = lowerqnewspaper - (newspaperiqr * 1.5)
lower_threshold_newspaper
```

```{r}
sales[ sales > upper_threshold_sales][1:10]
radio[ radio > upper_threshold_radio][1:10]
tv[ tv > upper_threshold_tv][1:10]
newspaper[ newspaper > upper_threshold_newspaper][1:10]

sales[ sales < lower_threshold_sales][1:10]
tv[ tv < lower_threshold_tv][1:10]
radio[ radio < lower_threshold_radio][1:10]
newspaper[ newspaper < lower_threshold_newspaper][1:10]
```

```{r}
mydata[ sales > upper_threshold_sales, ]
mydata[ tv > upper_threshold_tv, ]
mydata[ radio > upper_threshold_radio, ]
mydata[ newspaper > upper_threshold_newspaper, ]
```

```{r}
mydata[ sales < lower_threshold_sales, ]
mydata[ tv < lower_threshold_tv, ]
mydata[ radio < lower_threshold_radio, ]
mydata[ newspaper < lower_threshold_newspaper, ]
```


```{r}
count(mydata[ sales > upper_threshold_sales, ])
```

```{r}
count(mydata[ tv > upper_threshold_tv, ])
```
```{r}
count(mydata[ radio > upper_threshold_radio, ])
```
```{r}
count(mydata[ newspaper > upper_threshold_newspaper, ])
```


```{r}
count(mydata[ sales < lower_threshold_sales, ])
```
```{r}
count(mydata[ tv < lower_threshold_tv, ])
```
```{r}
count(mydata[ radio < lower_threshold_radio, ])
```

```{r}
count(mydata[ newspaper < lower_threshold_newspaper, ])
```

There are no outliers for sales, TV, or radio. However, newspaper has 2 outliers above its upper threshold of 93.625 (the values 114.0 and 100.9). These are 2 records whose newspaper values lie far above the majority of the other points.
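
The outlier search above applies the 1.5 × IQR rule separately to each feature. A minimal sketch of the same rule as a reusable function is shown below. The name `iqr_outliers` and the `toy` vector are assumptions for illustration. The last lines show why taking `[1:10]` of a short result prints `NA`.

```r
# Hypothetical helper: values of x outside the 1.5 * IQR fences
iqr_outliers <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - 1.5 * iqr
  upper <- q[2] + 1.5 * iqr
  x[!is.na(x) & (x < lower | x > upper)]
}

toy <- c(1, 2, 3, 4, 5, 100)     # made-up values with one obvious outlier
found <- iqr_outliers(toy)
found                            # the outlying values themselves
length(found)                    # how many records are outliers

# Indexing past the end of a vector pads with NA, which is why
# x[condition][1:10] prints NA when fewer than 10 values match
padded <- found[1:3]
padded
```

Returning only the matching values avoids the `NA` padding, and `length()` gives the outlier count directly.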

### 1D) Write a general description of the dataset using the statistics found in the steps above. Use the min,max range to compare the features, note any significant differences.

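TV had the highest maximum (296.4), followed by newspaper (114), radio (49.6), and sales (27). Radio had the lowest minimum (0), followed by newspaper (0.3), TV (0.7), and sales (1.6). TV therefore also has the widest range (295.7, compared with 113.7 for newspaper, 49.6 for radio, and 25.4 for sales). This suggests that TV received the most advertising spending, which makes sense because it is an easy way to reach people today. 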

-------------

## Task 2: Qualitative Analysis

-------------


### 2A) Plot each assigned feature on the y-axis, with case_number on the x-axis. Use the given commands to create each plot, then create a grid to plot all features. Note any trends/patterns in the data.

* Commands: VARIABLE_plot <- ggplot(data = mydata, aes(x = VARIABLE, y = VARIABLE)) + geom_point()
* Commands: grid.arrange(VARIABLE_plot1, VARIABLE_plot2, VARIABLE_plot3, VARIABLE_plot4, ncol=2)

```{r}
sales_plot <- ggplot(data = mydata, aes(x = case_number, y = sales)) + geom_point()
tv_plot <- ggplot(data = mydata, aes(x = case_number, y = TV)) + geom_point()
radio_plot <- ggplot(data = mydata, aes(x = case_number, y = radio)) + geom_point()
newspaper_plot <- ggplot(data = mydata, aes(x = case_number, y = newspaper)) + geom_point()
grid.arrange(sales_plot, tv_plot, radio_plot, newspaper_plot, ncol=2)
```


* When looking at these plots it is hard to see a particular trend. 
* One way to observe any possible trend in the sales data would be to re-order the data from low to high. 
* The 200 monthly observations are in no particular chronological order. 
* The case numbers are independent sequentially generated numbers. Since each case is independent, we can reorder them. 


### 2B) Re-order sales from low to high, and save the re-ordered data in a new set. As the sales data is re-ordered, the associated fields in the other columns follow.

* Commands: newdata <- mydata[ order(mydata$VARIABLE), ]

```{r}

# Re-order the rows by sales (low to high)
newdata = mydata[ order(mydata$sales), ]
# Extract case_number from the newdata
case_number <- newdata$case_number

head(newdata)
```
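
To illustrate how `order()` keeps each row intact while sorting, here is a small sketch on made-up data. The `toy` values and the `sales_rank` column are assumptions for illustration and are not part of the assignment.

```r
# Toy data (values are made up for illustration)
toy <- data.frame(
  case_number = 1:5,
  TV    = c(230.1, 44.5, 17.2, 151.5, 180.8),
  sales = c(22.1, 10.4, 9.3, 18.5, 12.9)
)

# order() returns the row positions that sort sales from low to high;
# using them as row indices moves every column together
ordered_toy <- toy[order(toy$sales), ]

# Hypothetical extra column: position of each row in the sorted data,
# usable as the x-axis instead of case_number[order(case_number)]
ordered_toy$sales_rank <- seq_len(nrow(ordered_toy))
ordered_toy
```

Since the case numbers run from 1 to the number of rows, sorting them again simply reproduces that sequence, so an explicit rank column gives the same x-axis in a more readable form.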


#### Extract the variables from the new data

```{r}
# new_VARIABLE = newdata$VARIABLE
new_sales = newdata$sales
new_tv = newdata$TV
new_radio = newdata$radio
new_newspaper = newdata$newspaper
```

### 2C) Repeat the 4 graphs with the newdata to spot any trends. Note your observations on what the new plots are revealing in terms of trending relationship. 

* Commands: VARIABLE_plot <- ggplot(data = mydata, aes(x = VARIABLE, y = VARIABLE)) + geom_point()
* Commands: For x variable in the plot use: aes(x = case_number[order(case_number)])
* Commands: grid.arrange(VARIABLE_plot1, VARIABLE_plot2, VARIABLE_plot3, VARIABLE_plot4, ncol=2)

```{r}
newsales_plot <- ggplot(data = newdata, aes(x = case_number[order(case_number)], y = new_sales)) + geom_point()
newtv_plot <- ggplot(data = newdata, aes(x = case_number[order(case_number)], y = new_tv)) + geom_point()
newradio_plot <- ggplot(data = newdata, aes(x = case_number[order(case_number)], y = new_radio)) + geom_point()
newnewspaper_plot <- ggplot(data = newdata, aes(x = case_number[order(case_number)], y = new_newspaper)) + geom_point()
grid.arrange(newsales_plot, newtv_plot, newradio_plot, newnewspaper_plot, ncol=2)
```

It is much easier to see the increasing trend in sales and TV with the new order. Radio has a bit more of an identifiable pattern, but newspaper does not. 

----------

## Task 3: Standardized Z-Value

----------


### 3A) Create a histogram of the assigned feature z-scores. Describe the output and note any relevant values.

* Command: z_score = ( VARIABLE - mean(VARIABLE) ) / sd(VARIABLE)
* Commands: qplot( x = VARIABLE ,geom="histogram", binwidth = 0.3)

```{r}
z_score_sales = ( sales - mean(sales) ) / sd(sales)
z_score_tv = ( tv - mean(tv) ) / sd(tv)
z_score_radio = ( radio - mean(radio) ) / sd(radio)
z_score_newspaper = ( newspaper - mean(newspaper) ) / sd(newspaper)
```

```{r}
qplot( x = z_score_sales ,geom="histogram", binwidth = 0.3)
```
```{r}
qplot( x = z_score_tv ,geom="histogram", binwidth = 0.3)
```
```{r}
qplot( x = z_score_radio ,geom="histogram", binwidth = 0.3)
```
```{r}
qplot( x = z_score_newspaper ,geom="histogram", binwidth = 0.3)
```

Sales is the closest to a normal distribution. Radio and TV both seem to vary throughout the graph. Newspaper is right-skewed, which makes sense since it was the only variable with outliers.  

### 3B) Given a sales value of $26700, calculate the corresponding z-value or z-score. 

* Command: z_score = ( VARIABLE - mean(VARIABLE) ) / sd(VARIABLE)

```{r}
# $26,700 is entered as 26.7, assuming sales are recorded in thousands of dollars
x = 26.7
z_score_calculatecor = ( x - mean(sales) ) / sd(sales)
z_score_calculatecor
```
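
As a sketch of the z-score formula used above, the helper below standardizes a vector and checks the result against base R's `scale()`. The name `z_score` and the `toy` values are assumptions for illustration. The final lines repeat the 3B calculation using the sales mean (14.0225) and standard deviation (5.217457) reported in 1B.

```r
# Hypothetical helper: standardize values of x against a reference vector
z_score <- function(x, reference = x) {
  (x - mean(reference)) / sd(reference)
}

toy <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up values
z_toy <- z_score(toy)
round(z_toy, 3)

# scale() performs the same centering and scaling; as.numeric() drops
# the matrix shape and attributes it attaches
z_scaled <- as.numeric(scale(toy))

# Single value against the sales mean and sd reported earlier in the notebook
z_single <- (26.7 - 14.0225) / 5.217457
z_single
```

A standardized vector always has mean 0 and standard deviation 1, which is what makes z-scores comparable across features measured on different scales.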


### 3C) Based on the z-value, how would you rate a $26700 sales value: poor, average, good, or very good performance? Explain your logic. 

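I would rate this as very good performance. The z-score of about 2.43 means that $26,700 is roughly 2.4 standard deviations above the mean sales value (14.02), and it is close to the maximum of 27 in the dataset. A positive z-score only shows that a value is above average; by itself it says nothing about profit. 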
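
One way to put a z-score in context is to ask what share of values would fall below it. The sketch below uses `pnorm()`, which assumes sales are approximately normally distributed; that is an approximation, not something the data guarantees. The rating bands passed to `cut()` are hypothetical and chosen only for illustration.

```r
z <- 2.429824                      # z-score computed in 3B

# Share of a standard normal distribution that lies below z.
# This assumes sales are approximately normal, which is only an approximation.
share_below <- pnorm(z)
round(share_below, 3)

# Hypothetical rating bands, chosen only for illustration
rating <- cut(z, breaks = c(-Inf, -1, 1, 2, Inf),
              labels = c("poor", "average", "good", "very good"))
as.character(rating)
```

Under these assumed bands, any value more than 2 standard deviations above the mean falls in the top category.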


