January 7, 2020

Gehad Gad

CUNY MSDS Winter Bridge

Assignment: 2

#Read data into R.

StockMarket <- read.csv(file="https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/datasets/EuStockMarkets.csv")

Question 1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.

summary(StockMarket)

#summary(StockMarket)
#        X               DAX            SMI            CAC            FTSE
#  Min.   :   1.0   Min.   :1402   Min.   :1587   Min.   :1611   Min.   :2281
#  1st Qu.: 465.8   1st Qu.:1744   1st Qu.:2166   1st Qu.:1875   1st Qu.:2843
#  Median : 930.5   Median :2141   Median :2796   Median :1992   Median :3247
#  Mean   : 930.5   Mean   :2531   Mean   :3376   Mean   :2228   Mean   :3566
#  3rd Qu.:1395.2   3rd Qu.:2722   3rd Qu.:3812   3rd Qu.:2274   3rd Qu.:3994
#  Max.   :1860.0   Max.   :6186   Max.   :8412   Max.   :4388   Max.   :6179

mean(StockMarket$CAC)

#[1] 2227.828

mean(StockMarket$FTSE)

#[1] 3565.643

median(StockMarket$CAC)

#[1] 1992.3

median(StockMarket$FTSE)

#[1] 3246.6
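
The four calls above can also be collapsed into one. The following is a minimal sketch, assuming the `EuStockMarkets` data set that ships with base R holds the same values as the CSV (so no download is needed); the names `stock` and `center_stats` are illustrative and not part of the assignment.

```r
# Assumption: use the built-in EuStockMarkets time series instead of the CSV.
stock <- as.data.frame(EuStockMarkets)

# Compute the mean and median of every index column in a single pass.
center_stats <- sapply(stock, function(col) c(mean = mean(col), median = median(col)))

# Rows are the statistics, columns are the indices (DAX, SMI, CAC, FTSE).
print(round(center_stats, 1))
```

Indexing the result, for example `center_stats["mean", "CAC"]`, returns a single statistic without repeating the call.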

Question 2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.

NewStockMarket <- data.frame(StockMarket[1:200, 2:5])

Question 3. Create new column names for the new data frame.

names(NewStockMarket) <- c("DPI","SWISS","FRE","FINA")
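
Renaming by position works here, but it silently mislabels columns if their order ever changes. As a hedged alternative, the sketch below maps old names to new ones explicitly. It assumes the built-in `EuStockMarkets` data set in place of the CSV, and `new_names` and `subset_stock` are illustrative names.

```r
# Assumption: use the built-in EuStockMarkets time series instead of the CSV.
stock <- as.data.frame(EuStockMarkets)

# Map each original column name to its new name.
new_names <- c(DAX = "DPI", SMI = "SWISS", CAC = "FRE", FTSE = "FINA")

# Keep the first 200 rows and only the mapped columns.
subset_stock <- stock[1:200, names(new_names)]

# Look up the new name for each column by its old name.
names(subset_stock) <- unname(new_names[names(subset_stock)])

print(names(subset_stock))
```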

Question 4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.

summary(NewStockMarket)

#      DPI            SWISS           FRE            FINA
#  Min.   :1502   Min.   :1587   Min.   :1634   Min.   :2345
#  1st Qu.:1589   1st Qu.:1679   1st Qu.:1765   1st Qu.:2475
#  Median :1625   Median :1718   Median :1847   Median :2544
#  Mean   :1633   Mean   :1720   Mean   :1830   Mean   :2532
#  3rd Qu.:1679   3rd Qu.:1753   3rd Qu.:1876   3rd Qu.:2581
#  Max.   :1764   Max.   :1850   Max.   :1994   Max.   :2680

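#Comparison: in the full data set the mean is well above the median (CAC: 2227.828 vs. 1992.3; FTSE: 3565.643 vs. 3246.6).
#In the 200-row subset the two are close, and the median is slightly higher (FRE: 1830.005 vs. 1846.85; FINA: 2531.749 vs. 2544.15).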

mean(NewStockMarket$FRE)

#[1] 1830.005

mean(NewStockMarket$FINA)

#[1] 2531.749

median(NewStockMarket$FRE)

#[1] 1846.85

median(NewStockMarket$FINA)

#[1] 2544.15
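
To make the comparison easier to read, the full-data and subset statistics can be placed side by side. This is a minimal sketch, assuming the built-in `EuStockMarkets` data set matches the CSV. It keeps the original column names (CAC and FTSE), and the helper `summarise_center` is hypothetical, not a base R function.

```r
# Assumption: use the built-in EuStockMarkets time series instead of the CSV.
stock <- as.data.frame(EuStockMarkets)
subset_stock <- stock[1:200, ]

# Hypothetical helper: one row per column, with its mean and median.
summarise_center <- function(df, cols) {
  t(sapply(df[cols], function(col) c(mean = mean(col), median = median(col))))
}

cols <- c("CAC", "FTSE")
full <- summarise_center(stock, cols)
first200 <- summarise_center(subset_stock, cols)

# Combine both summaries into one table for a direct comparison.
comparison <- data.frame(
  full_mean = full[, "mean"],
  full_median = full[, "median"],
  first200_mean = first200[, "mean"],
  first200_median = first200[, "median"]
)

print(round(comparison, 2))
```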

Question 5. For at least 3 values in a column please rename so that every value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 would show as “excellent”.

# Creating a logical index of the FINA values below 2500 and the replacement label.

below_2500 <- NewStockMarket$FINA < 2500
replacement_label <- "Medium"

# Replacing values lower than 2500 with "Medium" (this converts the FINA column to character).

NewStockMarket$FINA[below_2500] <- replacement_label
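
Assigning a text label into a numeric column converts the whole column to character, so later calls such as `mean()` no longer work on it. One hedged alternative is to keep the numbers and store the labels in a separate column. The sketch below assumes the built-in `EuStockMarkets` data set in place of the CSV. The column name `FTSE_level` and the label "High" are illustrative choices, not part of the assignment.

```r
# Assumption: use the built-in EuStockMarkets time series instead of the CSV.
stock <- as.data.frame(EuStockMarkets)[1:200, ]

# Label each FTSE value without overwriting the numeric column.
stock$FTSE_level <- ifelse(stock$FTSE < 2500, "Medium", "High")

# Count how many rows received each label.
print(table(stock$FTSE_level))

# The original column is still numeric, so summaries keep working.
print(mean(stock$FTSE))
```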

Question 6. Display enough rows to see examples of all of steps 1-5 above.

#Slicing 13 rows from the original data.

StockMarket[1:13, ]

#Slicing 13 rows from the subset to display the changes.

NewStockMarket[1:13, ]

head(NewStockMarket, 13) #Displaying the first 13 rows

tail(NewStockMarket, 13) #Displaying the last 13 rows

Question 7. BONUS – place the original .csv in a GitHub file and have R read from the link. This will be a very useful skill as you progress in your data science education and career.

StockMarket_Github <- read.csv(file="https://github.com/GehadGad/R.bridge-assignment-2-/raw/master/EuStockMarkets.csv")