Data sets: http://vincentarelbundock.github.io/Rdatasets/ (click on the csv index for a list): datasets.csv

"Stat2Data","AppleStock","Daily Price and Volume of Apple Stock",66,4,0,0,1,0,3,"https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/Stat2Data/AppleStock.csv","https://raw.github.com/vincentarelbundock/Rdatasets/master/doc/Stat2Data/AppleStock.html"

1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes:

library(readr)
apple.data <- read_delim(file = "AppleStock.csv", delim = ",")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
##   X1 = col_double(),
##   Date = col_character(),
##   Price = col_double(),
##   Change = col_double(),
##   Volume = col_double()
## )
apple.data 
## # A tibble: 66 x 5
##       X1 Date      Price Change Volume
##    <dbl> <chr>     <dbl>  <dbl>  <dbl>
##  1     1 7/21/2016  99.4  NA      32.7
##  2     2 7/22/2016  98.7  -0.77   28.2
##  3     3 7/25/2016  97.3  -1.32   40.3
##  4     4 7/26/2016  96.7  -0.67   53.5
##  5     5 7/27/2016 103.    6.28   92.1
##  6     6 7/28/2016 104.    1.39   38.8
##  7     7 7/29/2016 104.   -0.13   27.7
##  8     8 8/1/2016  106.    1.84   38.0
##  9     9 8/2/2016  104.   -1.57   33.8
## 10    10 8/3/2016  106.    1.31   30.1
## # … with 56 more rows
summary(apple.data)
##        X1            Date               Price            Change       
##  Min.   : 1.00   Length:66          Min.   : 96.67   Min.   :-2.8400  
##  1st Qu.:17.25   Class :character   1st Qu.:106.75   1st Qu.:-0.4600  
##  Median :33.50   Mode  :character   Median :108.97   Median : 0.0000  
##  Mean   :33.50                      Mean   :109.75   Mean   : 0.2677  
##  3rd Qu.:49.75                      3rd Qu.:113.58   3rd Qu.: 0.8600  
##  Max.   :66.00                      Max.   :117.63   Max.   : 6.2800  
##                                                      NA's   :1        
##      Volume      
##  Min.   : 18.65  
##  1st Qu.: 25.03  
##  Median : 29.66  
##  Mean   : 35.72  
##  3rd Qu.: 37.89  
##  Max.   :111.19  
## 

-> Mean and Median of Attribute Price:

round(mean(apple.data$Price), digits=2)
## [1] 109.75
round(median(apple.data$Price), digits=2)
## [1] 108.97

-> Mean and Median of Attribute Volume:

round(mean(apple.data$Volume), digits=2)
## [1] 35.72
round(median(apple.data$Volume), digits=2)
## [1] 29.66
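
The summary output reports one NA in Change (the first row has no previous day to compare against), so calling mean() on that column without dropping missing values would return NA. The following is a minimal sketch of computing both statistics for every numeric column at once. It uses a hand-typed sample of the first five rows rather than the full file, and the names sample.data and stats are illustrative only:

```r
# First five rows of the data set, typed in by hand for illustration
sample.data <- data.frame(
  Price  = c(99.43, 98.66, 97.34, 96.67, 102.95),
  Change = c(NA, -0.77, -1.32, -0.67, 6.28),
  Volume = c(32.690, 28.218, 40.291, 53.455, 92.144)
)

# Without na.rm = TRUE a single NA makes the whole result NA
mean(sample.data$Change)

# Mean and median of every column, ignoring missing values
stats <- sapply(sample.data, function(col) {
  c(mean = mean(col, na.rm = TRUE), median = median(col, na.rm = TRUE))
})
round(stats, 2)
```

The result is a matrix with one row per statistic and one column per attribute, which is easier to scan than separate calls per column.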

2. Create a new data frame with a subset of the columns and rows. Make sure to rename it:

-> Creating a subset containing only the August 2016 rows and selected columns:

apple.data.august2016 <- subset(apple.data, grepl("^8", Date) & grepl("2016$", Date), select = c("X1", "Date", "Price", "Change"))

apple.data.august2016
## # A tibble: 23 x 4
##       X1 Date      Price Change
##    <dbl> <chr>     <dbl>  <dbl>
##  1     8 8/1/2016   106.   1.84
##  2     9 8/2/2016   104.  -1.57
##  3    10 8/3/2016   106.   1.31
##  4    11 8/4/2016   106.   0.08
##  5    12 8/5/2016   107.   1.61
##  6    13 8/8/2016   108.   0.89
##  7    14 8/9/2016   109.   0.44
##  8    15 8/10/2016  108   -0.81
##  9    16 8/11/2016  108.  -0.07
## 10    17 8/12/2016  108.   0.25
## # … with 13 more rows
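
The regular-expression filter works because Date is stored as text in m/d/Y form. An alternative, sketched here under the assumption that every value follows that format, is to parse the strings into Date objects and filter on year and month. The vector below is a hand-picked sample, and the names parsed and is.august2016 are illustrative:

```r
# A few Date strings in the same m/d/Y format as the data set
dates <- c("7/29/2016", "8/1/2016", "8/31/2016", "9/1/2016", "10/3/2016")

# Convert the character strings to Date objects
parsed <- as.Date(dates, format = "%m/%d/%Y")

# TRUE for entries that fall in August 2016
is.august2016 <- format(parsed, "%Y-%m") == "2016-08"
dates[is.august2016]
```

The same logical vector could be passed as the row condition to subset().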

3. Create new column names for the new data frame:

-> Renaming the columns:

# names<- assigns by position, so the new names are listed in column order
names(apple.data.august2016) <- c("Id", "MarketDate", "StockPrice", "PriceChange")
apple.data.august2016
## # A tibble: 23 x 4
##       Id MarketDate StockPrice PriceChange
##    <dbl> <chr>           <dbl>       <dbl>
##  1     8 8/1/2016         106.        1.84
##  2     9 8/2/2016         104.       -1.57
##  3    10 8/3/2016         106.        1.31
##  4    11 8/4/2016         106.        0.08
##  5    12 8/5/2016         107.        1.61
##  6    13 8/8/2016         108.        0.89
##  7    14 8/9/2016         109.        0.44
##  8    15 8/10/2016        108        -0.81
##  9    16 8/11/2016        108.       -0.07
## 10    17 8/12/2016        108.        0.25
## # … with 13 more rows
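
Assigning to names() replaces the column names by position, so the result depends on the column order. A possible alternative, sketched here with a two-row stand-in for the subset, is to rename through an old-name-to-new-name lookup. The names df, lookup and hits are illustrative:

```r
# Small stand-in for the August subset
df <- data.frame(
  X1 = 8:9,
  Date = c("8/1/2016", "8/2/2016"),
  Price = c(106.05, 104.48),
  Change = c(1.84, -1.57)
)

# Old name -> new name
lookup <- c(X1 = "Id", Date = "MarketDate", Price = "StockPrice", Change = "PriceChange")

# Rename only the columns that appear in the lookup, whatever their order
hits <- names(df) %in% names(lookup)
names(df)[hits] <- unname(lookup[names(df)[hits]])
names(df)
```

Columns that are missing from the lookup keep their original names.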

4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare:

-> Use of summary on the new data set:

summary(apple.data.august2016)
##        Id        MarketDate          StockPrice     PriceChange      
##  Min.   : 8.0   Length:23          Min.   :104.5   Min.   :-1.57000  
##  1st Qu.:13.5   Class :character   1st Qu.:106.5   1st Qu.:-0.31000  
##  Median :19.0   Mode  :character   Median :108.0   Median : 0.00000  
##  Mean   :19.0                      Mean   :107.6   Mean   : 0.09217  
##  3rd Qu.:24.5                      3rd Qu.:108.8   3rd Qu.: 0.34500  
##  Max.   :30.0                      Max.   :109.5   Max.   : 1.84000
round(mean(apple.data.august2016$StockPrice), digits=2)
## [1] 107.63
round(median(apple.data.august2016$StockPrice), digits=2)
## [1] 108
round(mean(apple.data.august2016$PriceChange), digits=2)
## [1] 0.09
round(median(apple.data.august2016$PriceChange), digits=2)
## [1] 0
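
To make the comparison asked for in this step easier to read, the statistics for the full data and for the subset can be stacked into one table. This is only a sketch: it uses the first ten Price values of the data set instead of all 66, and the helper name center is hypothetical:

```r
# Hypothetical helper: mean and median of one numeric vector
center <- function(x) {
  c(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE))
}

# First ten Price values of the data set (7/21/2016 to 8/3/2016)
full <- c(99.43, 98.66, 97.34, 96.67, 102.95,
          104.34, 104.21, 106.05, 104.48, 105.79)

# The last three of those values belong to August
august <- full[8:10]

# One row per data set, one column per statistic
comparison <- rbind(full = center(full), subset = center(august))
round(comparison, 2)
```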

5. For at least 3 values in a column, rename them so that every occurrence of each value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 would show as “excellent”:

-> Adding a new Column having character values:

apple.data.august2016$ChangeIndicator <- ifelse(apple.data.august2016$PriceChange > 0, "I", ifelse(apple.data.august2016$PriceChange < 0, "D", "S"))

apple.data.august2016
## # A tibble: 23 x 5
##       Id MarketDate StockPrice PriceChange ChangeIndicator
##    <dbl> <chr>           <dbl>       <dbl> <chr>          
##  1     8 8/1/2016         106.        1.84 I              
##  2     9 8/2/2016         104.       -1.57 D              
##  3    10 8/3/2016         106.        1.31 I              
##  4    11 8/4/2016         106.        0.08 I              
##  5    12 8/5/2016         107.        1.61 I              
##  6    13 8/8/2016         108.        0.89 I              
##  7    14 8/9/2016         109.        0.44 I              
##  8    15 8/10/2016        108        -0.81 D              
##  9    16 8/11/2016        108.       -0.07 D              
## 10    17 8/12/2016        108.        0.25 I              
## # … with 13 more rows

-> Column Value Renaming:

# Anchor each pattern so only a whole-cell code is replaced, not a letter inside an already renamed label
apple.data.august2016$ChangeIndicator <- gsub("^I$", "Increase", apple.data.august2016$ChangeIndicator)
apple.data.august2016$ChangeIndicator <- gsub("^D$", "Decrease", apple.data.august2016$ChangeIndicator)
apple.data.august2016$ChangeIndicator <- gsub("^S$", "Same", apple.data.august2016$ChangeIndicator)

apple.data.august2016
## # A tibble: 23 x 5
##       Id MarketDate StockPrice PriceChange ChangeIndicator
##    <dbl> <chr>           <dbl>       <dbl> <chr>          
##  1     8 8/1/2016         106.        1.84 Increase       
##  2     9 8/2/2016         104.       -1.57 Decrease       
##  3    10 8/3/2016         106.        1.31 Increase       
##  4    11 8/4/2016         106.        0.08 Increase       
##  5    12 8/5/2016         107.        1.61 Increase       
##  6    13 8/8/2016         108.        0.89 Increase       
##  7    14 8/9/2016         109.        0.44 Increase       
##  8    15 8/10/2016        108        -0.81 Decrease       
##  9    16 8/11/2016        108.       -0.07 Decrease       
## 10    17 8/12/2016        108.        0.25 Increase       
## # … with 13 more rows
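
Because gsub() matches substrings, a chain of replacements can interfere with itself when one label contains another code. A minimal alternative sketch is to recode all values in one step through a named lookup vector. The sample codes and the names labels and recoded are illustrative:

```r
# Single-letter codes, as produced by the ifelse() step
codes <- c("I", "D", "I", "S", "D")

# Code -> label
labels <- c(I = "Increase", D = "Decrease", S = "Same")

# Index the lookup by the codes; unname() drops the leftover code names
recoded <- unname(labels[codes])
recoded
```

Any code that is missing from the lookup would come back as NA, which makes unexpected values easy to spot.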

6. Display enough rows to see examples of all of steps 1-5 above:

-> Data:

as.data.frame(apple.data)
##    X1       Date  Price Change  Volume
## 1   1  7/21/2016  99.43     NA  32.690
## 2   2  7/22/2016  98.66  -0.77  28.218
## 3   3  7/25/2016  97.34  -1.32  40.291
## 4   4  7/26/2016  96.67  -0.67  53.455
## 5   5  7/27/2016 102.95   6.28  92.144
## 6   6  7/28/2016 104.34   1.39  38.772
## 7   7  7/29/2016 104.21  -0.13  27.698
## 8   8   8/1/2016 106.05   1.84  38.019
## 9   9   8/2/2016 104.48  -1.57  33.770
## 10 10   8/3/2016 105.79   1.31  30.148
## 11 11   8/4/2016 105.87   0.08  26.782
## 12 12   8/5/2016 107.48   1.61  39.547
## 13 13   8/8/2016 108.37   0.89  28.010
## 14 14   8/9/2016 108.81   0.44  26.296
## 15 15  8/10/2016 108.00  -0.81  23.840
## 16 16  8/11/2016 107.93  -0.07  27.460
## 17 17  8/12/2016 108.18   0.25  18.649
## 18 18  8/15/2016 109.48   1.30  25.704
## 19 19  8/16/2016 109.38  -0.10  33.755
## 20 20  8/17/2016 109.22  -0.16  25.329
## 21 21  8/18/2016 109.08  -0.14  21.918
## 22 22  8/19/2016 109.08   0.00  25.109
## 23 23  8/22/2016 108.08   0.00  25.784
## 24 24  8/23/2016 108.85   0.00  21.237
## 25 25  8/24/2016 108.03  -0.82  23.606
## 26 26  8/25/2016 107.57  -0.46  25.002
## 27 27  8/26/2016 106.94  -0.63  27.744
## 28 28  8/29/2016 106.82  -0.12  24.900
## 29 29  8/30/2016 106.00  -0.82  24.818
## 30 30  8/31/2016 106.10   0.10  31.639
## 31 31   9/1/2016 106.73   0.63  26.675
## 32 32   9/2/2016 107.73   1.00  26.394
## 33 33   9/6/2016 107.70  -0.03  26.645
## 34 34   9/7/2016 108.36   0.66  42.250
## 35 35   9/8/2016 105.52  -2.84  52.955
## 36 36   9/9/2016 103.13  -2.39  46.462
## 37 37  9/12/2016 105.44   2.31  45.115
## 38 38  9/13/2016 107.95   2.51  62.080
## 39 39  9/14/2016 111.77   3.82 111.187
## 40 40  9/15/2016 115.57   3.80  90.398
## 41 41  9/16/2016 114.92  -0.65  79.764
## 42 42  9/19/2016 113.58  -1.34  46.937
## 43 43  9/20/2016 113.57  -0.01  34.494
## 44 44  9/21/2016 113.55  -0.02  35.952
## 45 45  9/22/2016 114.62   1.07  31.048
## 46 46  9/23/2016 112.71  -1.91  52.411
## 47 47  9/26/2016 112.88   0.17  29.800
## 48 48  9/27/2016 113.09   0.21  24.587
## 49 49  9/28/2016 113.95   0.86  29.608
## 50 50  9/29/2016 112.18  -1.77  35.850
## 51 51  9/30/2016 113.05   0.87  36.341
## 52 52  10/3/2016 112.52  -0.53  21.635
## 53 53  10/4/2016 113.00   0.48  29.707
## 54 54  10/5/2016 113.05   0.05  21.400
## 55 55  10/6/2016 113.89   0.84  28.509
## 56 56  10/7/2016 114.06   0.17  24.336
## 57 57 10/10/2016 116.05   1.99  36.088
## 58 58 10/11/2016 116.30   0.25  63.963
## 59 59 10/12/2016 117.34   1.04  37.513
## 60 60 10/13/2016 116.98  -0.36  35.042
## 61 61 10/14/2016 117.63   0.65  35.626
## 62 62 10/17/2016 117.55  -0.08  23.584
## 63 63 10/18/2016 117.47  -0.08  24.308
## 64 64 10/19/2016 117.12  -0.35  19.977
## 65 65 10/20/2016 117.06  -0.06  24.100
## 66 66 10/21/2016 116.60  -0.46  22.528
as.data.frame(apple.data.august2016)
##    Id MarketDate StockPrice PriceChange ChangeIndicator
## 1   8   8/1/2016     106.05        1.84        Increase
## 2   9   8/2/2016     104.48       -1.57        Decrease
## 3  10   8/3/2016     105.79        1.31        Increase
## 4  11   8/4/2016     105.87        0.08        Increase
## 5  12   8/5/2016     107.48        1.61        Increase
## 6  13   8/8/2016     108.37        0.89        Increase
## 7  14   8/9/2016     108.81        0.44        Increase
## 8  15  8/10/2016     108.00       -0.81        Decrease
## 9  16  8/11/2016     107.93       -0.07        Decrease
## 10 17  8/12/2016     108.18        0.25        Increase
## 11 18  8/15/2016     109.48        1.30        Increase
## 12 19  8/16/2016     109.38       -0.10        Decrease
## 13 20  8/17/2016     109.22       -0.16        Decrease
## 14 21  8/18/2016     109.08       -0.14        Decrease
## 15 22  8/19/2016     109.08        0.00            Same
## 16 23  8/22/2016     108.08        0.00            Same
## 17 24  8/23/2016     108.85        0.00            Same
## 18 25  8/24/2016     108.03       -0.82        Decrease
## 19 26  8/25/2016     107.57       -0.46        Decrease
## 20 27  8/26/2016     106.94       -0.63        Decrease
## 21 28  8/29/2016     106.82       -0.12        Decrease
## 22 29  8/30/2016     106.00       -0.82        Decrease
## 23 30  8/31/2016     106.10        0.10        Increase

7. BONUS – place the original .csv file on GitHub and have R read it from the link. This will be a very useful skill as you progress in your data science education and career:

-> Accessing the csv file via GitHub:

theURL <- "https://raw.githubusercontent.com/kamathvk1982/CunyBridgeR/master/AppleStock.csv"
git.apple.data <- read_delim(file = theURL, delim = ",")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
##   X1 = col_double(),
##   Date = col_character(),
##   Price = col_double(),
##   Change = col_double(),
##   Volume = col_double()
## )
git.apple.data 
## # A tibble: 66 x 5
##       X1 Date      Price Change Volume
##    <dbl> <chr>     <dbl>  <dbl>  <dbl>
##  1     1 7/21/2016  99.4  NA      32.7
##  2     2 7/22/2016  98.7  -0.77   28.2
##  3     3 7/25/2016  97.3  -1.32   40.3
##  4     4 7/26/2016  96.7  -0.67   53.5
##  5     5 7/27/2016 103.    6.28   92.1
##  6     6 7/28/2016 104.    1.39   38.8
##  7     7 7/29/2016 104.   -0.13   27.7
##  8     8 8/1/2016  106.    1.84   38.0
##  9     9 8/2/2016  104.   -1.57   33.8
## 10    10 8/3/2016  106.    1.31   30.1
## # … with 56 more rows
summary(git.apple.data)
##        X1            Date               Price            Change       
##  Min.   : 1.00   Length:66          Min.   : 96.67   Min.   :-2.8400  
##  1st Qu.:17.25   Class :character   1st Qu.:106.75   1st Qu.:-0.4600  
##  Median :33.50   Mode  :character   Median :108.97   Median : 0.0000  
##  Mean   :33.50                      Mean   :109.75   Mean   : 0.2677  
##  3rd Qu.:49.75                      3rd Qu.:113.58   3rd Qu.: 0.8600  
##  Max.   :66.00                      Max.   :117.63   Max.   : 6.2800  
##                                                      NA's   :1        
##      Volume      
##  Min.   : 18.65  
##  1st Qu.: 25.03  
##  Median : 29.66  
##  Mean   : 35.72  
##  3rd Qu.: 37.89  
##  Max.   :111.19  
##
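
Reading from a URL can fail when the network or the repository is unavailable. The sketch below wraps base R's read.csv(), which accepts a URL as well as a local path, in tryCatch() so a failure returns NULL instead of stopping the script. The helper name safe.read.csv is hypothetical, and the demonstration uses a temporary local file with made-up contents in the same layout rather than the GitHub link:

```r
# Hypothetical helper: read a CSV from a path or URL, returning NULL on failure
safe.read.csv <- function(location) {
  tryCatch(
    read.csv(location, stringsAsFactors = FALSE),
    error = function(e) {
      message("Could not read ", location, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Demonstration with a temporary local file instead of the network
tmp <- tempfile(fileext = ".csv")
writeLines(c('"","Date","Price"',
             '"1","7/21/2016",99.43',
             '"2","7/22/2016",98.66'), tmp)
local.copy <- safe.read.csv(tmp)
names(local.copy)  # read.csv names the unnamed first column "X"

# A location that does not exist yields NULL rather than an error
missing.copy <- safe.read.csv(file.path(tempdir(), "no-such-file.csv"))
is.null(missing.copy)
```

Passing theURL to the helper instead of tmp would read the GitHub copy in the same way.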