January 7, 2020
Gehad Gad
CUNY MSDS Winter Bridge
Assignment: 2
#Read data into R.
StockMarket <- read.csv(file = "https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/datasets/EuStockMarkets.csv")
Question 1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.
summary(StockMarket)
#        X               DAX            SMI            CAC            FTSE
#  Min.   :   1.0   Min.   :1402   Min.   :1587   Min.   :1611   Min.   :2281
#  1st Qu.: 465.8   1st Qu.:1744   1st Qu.:2166   1st Qu.:1875   1st Qu.:2843
#  Median : 930.5   Median :2141   Median :2796   Median :1992   Median :3247
#  Mean   : 930.5   Mean   :2531   Mean   :3376   Mean   :2228   Mean   :3566
#  3rd Qu.:1395.2   3rd Qu.:2722   3rd Qu.:3812   3rd Qu.:2274   3rd Qu.:3994
#  Max.   :1860.0   Max.   :6186   Max.   :8412   Max.   :4388   Max.   :6179
mean(StockMarket$CAC)
#[1] 2227.828
mean(StockMarket$FTSE)
#[1] 3565.643
median(StockMarket$CAC)
#[1] 1992.3
median(StockMarket$FTSE)
#[1] 3246.6
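The four separate calls above can also be collapsed into one step. The following is a minimal sketch, not part of the original submission. It assumes R's built-in `EuStockMarkets` dataset (the `datasets` package object that the Rdatasets CSV is exported from) so that it runs without a network connection; the names `full` and `stats` are illustrative.

```r
# Assumption: the built-in EuStockMarkets series holds the same values as the CSV.
full <- as.data.frame(EuStockMarkets)

# Compute the mean and median of the two chosen attributes in one call.
# The result is a matrix with rows "mean"/"median" and columns "CAC"/"FTSE".
stats <- sapply(full[c("CAC", "FTSE")],
                function(col) c(mean = mean(col), median = median(col)))
print(round(stats, 3))
```

Unlike the CSV, the built-in object has no `X` row-index column, so the columns are selected by name rather than by position.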
Question 2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.
# Keep the first 200 rows and columns 2 to 5 (DAX, SMI, CAC, FTSE), dropping the row-index column X.
NewStockMarket <- StockMarket[1:200, 2:5]
Question 3. Create new column names for the new data frame.
names(NewStockMarket) <- c("DPI","SWISS","FRE","FINA")
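Selecting and renaming columns by position works here, but it silently picks the wrong columns if the file layout changes. As a hedged alternative, the sketch below does the same subset and rename by column name. It assumes the built-in `EuStockMarkets` dataset as a stand-in for the CSV, and `full`, `new_names`, and `subset_df` are hypothetical names.

```r
# Assumption: built-in EuStockMarkets stands in for the downloaded CSV.
full <- as.data.frame(EuStockMarkets)

# Map each original column name to its new name.
new_names <- c(DAX = "DPI", SMI = "SWISS", CAC = "FRE", FTSE = "FINA")

# Take the first 200 rows of the named columns, then apply the new names.
subset_df <- full[1:200, names(new_names)]
names(subset_df) <- unname(new_names)

str(subset_df)
```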
Question 4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.
summary(NewStockMarket)
#       DPI            SWISS           FRE            FINA
#  Min.   :1502   Min.   :1587   Min.   :1634   Min.   :2345
#  1st Qu.:1589   1st Qu.:1679   1st Qu.:1765   1st Qu.:2475
#  Median :1625   Median :1718   Median :1847   Median :2544
#  Mean   :1633   Mean   :1720   Mean   :1830   Mean   :2532
#  3rd Qu.:1679   3rd Qu.:1753   3rd Qu.:1876   3rd Qu.:2581
#  Max.   :1764   Max.   :1850   Max.   :1994   Max.   :2680
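#Comparison: the first 200 rows have much lower means and medians than the full data set
#(CAC/FRE mean 1830.005 vs. 2227.828; FTSE/FINA mean 2531.749 vs. 3565.643).
#In the subset the mean and median are close, with the median slightly above the mean;
#in the full data set the mean is well above the median.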
mean(NewStockMarket$FRE)
#[1] 1830.005
mean(NewStockMarket$FINA)
#[1] 2531.749
median(NewStockMarket$FRE)
#[1] 1846.85
median(NewStockMarket$FINA)
#[1] 2544.15
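To make the comparison easier to read, the full-data and subset statistics can be placed side by side in one table. This is only a sketch under stated assumptions: it uses the built-in `EuStockMarkets` dataset in place of the CSV, keeps the original column names (`CAC`, `FTSE`) rather than the renamed ones, and the names `full`, `first200`, and `comparison` are illustrative.

```r
# Assumption: built-in EuStockMarkets stands in for the downloaded CSV.
full <- as.data.frame(EuStockMarkets)
first200 <- full[1:200, ]          # same rows as the subset used above
cols <- c("CAC", "FTSE")           # FRE and FINA after renaming

# One row per attribute, one column per statistic and data set.
comparison <- data.frame(
  mean_full     = sapply(full[cols], mean),
  mean_subset   = sapply(first200[cols], mean),
  median_full   = sapply(full[cols], median),
  median_subset = sapply(first200[cols], median)
)
print(round(comparison, 2))
```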
Question 5. For at least 3 values in a column please rename so that every value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 would show as “excellent”.
# Replace every FINA value below 2500 with the label "Medium".
# Note: assigning a string converts the whole FINA column from numeric to character.
NewStockMarket$FINA[NewStockMarket$FINA < 2500] <- "Medium"
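The question asks for every value in the column to be renamed using at least three labels. One possible way to do that is `cut()`, which assigns each number to a labelled interval. The sketch below is an illustration, not the original answer: the break points (2500 and 2600) and the labels "Low", "Medium", and "High" are assumptions chosen for the example, and it works on the built-in `EuStockMarkets` dataset so it runs offline.

```r
# Assumption: built-in EuStockMarkets stands in for the downloaded CSV.
full <- as.data.frame(EuStockMarkets)
ftse <- full$FTSE[1:200]           # the FINA column before any replacement

# Hypothetical thresholds: <= 2500 is "Low", (2500, 2600] is "Medium", > 2600 is "High".
ftse_label <- cut(ftse,
                  breaks = c(-Inf, 2500, 2600, Inf),
                  labels = c("Low", "Medium", "High"))

# Count how many rows fall under each label.
print(table(ftse_label))
```

Storing the labels in a separate factor keeps the numeric column intact, so `mean()` and `median()` still work on the original values.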
Question 6. Display enough rows to see examples of all of steps 1-5 above.
#Slicing 13 rows from the original data.
StockMarket [1:13, ]
#Slicing 13 rows from the subset to display the changes.
NewStockMarket [1:13, ]
head(NewStockMarket, 13) #Displaying the first 13 rows of the subset.
tail(NewStockMarket, 13) #Displaying the last 13 rows of the subset.
Question 7. BONUS – place the original .csv in a github file and have R read from the link. This will be a very useful skill as you progress in your data science education and career.
StockMarket_Github <- read.csv(file="https://github.com/GehadGad/R.bridge-assignment-2-/raw/master/EuStockMarkets.csv")
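The write-up does not say how the CSV was prepared before it was uploaded. As a hedged sketch of one way to do it, the code below writes the built-in `EuStockMarkets` data to a temporary CSV and reads it back, which is the same round trip a file goes through on its way to GitHub. The names `full`, `csv_path`, and `round_trip` are illustrative, and a temporary file is used instead of a URL so the example runs without network access.

```r
# Assumption: built-in EuStockMarkets stands in for the original CSV contents.
full <- as.data.frame(EuStockMarkets)

# write.csv() stores row names as an unnamed first column by default.
csv_path <- tempfile(fileext = ".csv")
write.csv(full, csv_path)

# read.csv() names that unnamed first column "X".
round_trip <- read.csv(csv_path)
print(names(round_trip))

unlink(csv_path)                   # remove the temporary file
```

This also shows where the `X` column in the downloaded data comes from: it is the row index written out by `write.csv()`.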