r-assignment2

Load required libraries

library(plyr)

Load the .csv data from its raw GitHub URL

theUrl <- "https://raw.githubusercontent.com/maelillien/bridge-R/master/acme.csv"
acme_data <- read.csv(file = theUrl, header = TRUE, sep = ",")
head(acme_data)

##   X month    market      acme
## 1 1  1/86 -0.061134  0.030160
## 2 2  2/86  0.008220 -0.165457
## 3 3  3/86 -0.007381  0.080137
## 4 4  4/86 -0.067561 -0.109917
## 5 5  5/86 -0.006238 -0.114853
## 6 6  6/86 -0.044251 -0.099254
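The month column shows up as a factor in the summaries further down, which is what read.csv produced by default before R 4.0.0. On newer versions the default is stringsAsFactors = FALSE, so month would be read as character instead. The sketch below is a minimal illustration using a hand-typed copy of the first three rows printed above (csv_text and acme_head are names introduced only for this example), read from a string rather than the URL so that it runs offline.

```r
# Hand-typed copy of the first three rows shown by head(acme_data);
# the "X" header is assumed to match the column name R printed above.
csv_text <- "X,month,market,acme
1,1/86,-0.061134,0.030160
2,2/86,0.008220,-0.165457
3,3/86,-0.007381,0.080137"

# Ask for factors explicitly so the result does not depend on the R version
acme_head <- read.csv(text = csv_text, header = TRUE, sep = ",",
                      stringsAsFactors = TRUE)

# Inspect column types: month should be a factor, market and acme numeric
str(acme_head)
```

Passing stringsAsFactors explicitly in the real read.csv call would make the output reproducible across R versions.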

1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.

summary(acme_data)

##        X             month        market              acme         
##  Min.   : 1.00   1/86   : 1   Min.   :-0.26208   Min.   :-0.28480  
##  1st Qu.:15.75   1/87   : 1   1st Qu.:-0.07901   1st Qu.:-0.13305  
##  Median :30.50   1/88   : 1   Median :-0.04487   Median :-0.08999  
##  Mean   :30.50   1/89   : 1   Mean   :-0.05117   Mean   :-0.06897  
##  3rd Qu.:45.25   1/90   : 1   3rd Qu.:-0.01159   3rd Qu.:-0.03149  
##  Max.   :60.00   10/86  : 1   Max.   : 0.07340   Max.   : 0.24262  
##                  (Other):54

mean(acme_data$market)

## [1] -0.0511683

median(acme_data$market)

## [1] -0.0448665

mean(acme_data$acme)

## [1] -0.06896925

median(acme_data$acme)

## [1] -0.0899915
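The four calls above can also be collapsed into one. The following is a minimal sketch, assuming only the six rows printed by head(acme_data) earlier (acme_head and stats are names introduced for this example), so the numbers it prints will differ from the full-data values above.

```r
# Six rows copied from the head(acme_data) output
acme_head <- data.frame(
  month  = c("1/86", "2/86", "3/86", "4/86", "5/86", "6/86"),
  market = c(-0.061134, 0.008220, -0.007381, -0.067561, -0.006238, -0.044251),
  acme   = c(0.030160, -0.165457, 0.080137, -0.109917, -0.114853, -0.099254)
)

# Apply mean and median to each numeric column; the result is a matrix
# with one row per statistic and one column per attribute
stats <- sapply(acme_head[c("market", "acme")],
                function(x) c(mean = mean(x), median = median(x)))
print(stats)
```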

2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.

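Select the rows where the market return is strictly between -10% and 10%, keeping only the month, market and acme columns. The result is stored under the new name sub_data.

sub_data <- subset(acme_data, market < 0.1 & market > -0.1, select = c(month, market, acme))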
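The same selection can be written with bracket indexing. The sketch below is an illustration on the six rows printed by head(acme_data); because all six fall inside the 10% band, a tighter 5% band (the hypothetical variable band) is used here purely so that some rows are actually dropped.

```r
# Six rows copied from the head(acme_data) output
acme_head <- data.frame(
  month  = c("1/86", "2/86", "3/86", "4/86", "5/86", "6/86"),
  market = c(-0.061134, 0.008220, -0.007381, -0.067561, -0.006238, -0.044251),
  acme   = c(0.030160, -0.165457, 0.080137, -0.109917, -0.114853, -0.099254)
)

band <- 0.05  # illustrative threshold, not the 0.1 used in the assignment

# Logical row filter plus a character vector of column names
keep <- acme_head$market < band & acme_head$market > -band
by_index <- acme_head[keep, c("month", "market", "acme")]

# The subset() form, for comparison
by_subset <- subset(acme_head, market < band & market > -band,
                    select = c(month, market, acme))

print(by_index)
print(all.equal(by_index, by_subset))
```

One difference worth knowing: subset() silently drops rows where the condition is NA, whereas bracket indexing returns them as rows of NA values. That does not matter for this sample, which has no missing returns.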

3. Create new column names for the new data frame.

sub_data <- rename(sub_data, c("month"="period", "market"="market_return", "acme"="acme_return"))
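The rename() call above comes from plyr. If loading plyr is not desirable, the same mapping can be applied in base R. This is a minimal sketch on a two-row stand-in data frame (sub_head, new_names and hits are names introduced only for this example).

```r
# Two rows copied from the head(acme_data) output, without the X column
sub_head <- data.frame(
  month  = c("1/86", "2/86"),
  market = c(-0.061134, 0.008220),
  acme   = c(0.030160, -0.165457)
)

# Old name -> new name, the same mapping passed to plyr::rename above
new_names <- c(month = "period", market = "market_return", acme = "acme_return")

# Replace only the columns that appear in the mapping
hits <- names(sub_head) %in% names(new_names)
names(sub_head)[hits] <- unname(new_names[names(sub_head)[hits]])

print(names(sub_head))
```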

4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.

summary(sub_data)

##      period   market_return        acme_return      
##  1/86   : 1   Min.   :-0.093729   Min.   :-0.19083  
##  1/87   : 1   1st Qu.:-0.061572   1st Qu.:-0.11377  
##  1/88   : 1   Median :-0.039495   Median :-0.07622  
##  1/89   : 1   Mean   :-0.034522   Mean   :-0.04591  
##  10/86  : 1   3rd Qu.:-0.007761   3rd Qu.: 0.01421  
##  10/88  : 1   Max.   : 0.073396   Max.   : 0.24262  
##  (Other):44

mean(sub_data$market_return)

## [1] -0.03452238

median(sub_data$market_return)

## [1] -0.039495

mean(sub_data$acme_return)

## [1] -0.0459094

median(sub_data$acme_return)

## [1] -0.0762195
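To make the comparison explicit, the sketch below lines up the means and medians printed above for the full data and for the subset. The values are typed in from the output shown in this document rather than recomputed, and comparison is a name introduced only for this example.

```r
# Values copied from the printed output for acme_data and sub_data
comparison <- data.frame(
  statistic = c("mean market", "median market", "mean acme", "median acme"),
  full      = c(-0.0511683, -0.0448665, -0.06896925, -0.0899915),
  filtered  = c(-0.03452238, -0.039495, -0.0459094, -0.0762195)
)

# Positive change means the statistic moved up after filtering
comparison$change <- comparison$filtered - comparison$full
print(comparison)
```

All four statistics are less negative in the subset. This is consistent with the two summaries: the largest market return in the full data (0.07340) is already below 10%, so the filter can only have removed months with market returns of -10% or lower. That accounts for the drop from 60 rows to 50.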

5. For at least 3 values in a column please rename so that every value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 would show as “excellent”. (rename values in a column, regex)

I am constructing an arbitrary recoding in which market returns of 5% or more, between -5% and 5%, and of -5% or less are assigned the codes 1, 0 and -1, respectively.

A separate column, filter, is added so that each condition is always evaluated against the original market_return values. Recoding in place would let a later condition match a value that an earlier step had already replaced, producing unintended results.

# Start every row at 0, the code for returns inside the -5% to 5% band
sub_data$filter <- 0

sub_data$filter[sub_data$market_return < 0.05 & sub_data$market_return > -0.05] <- 0
sub_data$filter[sub_data$market_return >= 0.05] <- 1
sub_data$filter[sub_data$market_return <= -0.05] <- -1
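The three-way recoding can also be written as one vectorised expression, which avoids any dependence on the order of the assignments. The sketch below is an illustration on a small vector: the first four values are market returns that appear in this data set, and the last two are hypothetical boundary values added only to show how exactly 5% and exactly -5% are treated.

```r
# Four returns taken from the data, plus two hypothetical boundary cases
market_return <- c(-0.061134, 0.008220, -0.047539, 0.073396, 0.05, -0.05)

# 1 for returns of 5% or more, -1 for -5% or less, 0 otherwise
band_code <- ifelse(market_return >= 0.05, 1,
                    ifelse(market_return <= -0.05, -1, 0))

print(data.frame(market_return, band_code))
```

Each condition reads from market_return and writes to a new vector, so no step can be affected by an earlier one.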

6. Display enough rows to see examples of all of steps 1-5 above.

head(sub_data, 15)

##    period market_return acme_return filter
## 1    1/86     -0.061134    0.030160     -1
## 2    2/86      0.008220   -0.165457      0
## 3    3/86     -0.007381    0.080137      0
## 4    4/86     -0.067561   -0.109917     -1
## 5    5/86     -0.006238   -0.114853      0
## 6    6/86     -0.044251   -0.099254      0
## 8    8/86      0.030226    0.073445      0
## 10  10/86      0.001319    0.034776      0
## 11  11/86     -0.033679   -0.063375      0
## 12  12/86     -0.072795   -0.058735     -1
## 13   1/87      0.073396    0.050214      1
## 14   2/87     -0.011618    0.111165      0
## 15   3/87     -0.026852   -0.127492      0
## 16   4/87     -0.040356    0.054522      0
## 17   5/87     -0.047539   -0.072918      0

7. BONUS – place the original .csv in a github file and have R read from the link. This will be a very useful skill as you progress in your data science education and career.

See the second R chunk above, where the raw URL is used to import the data from GitHub.