R Bridge Course 2 Week 2 Assignment

# read the data
dt <- read.csv("iris.csv", stringsAsFactors = FALSE)
  1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.
summary(dt)
##        X           Sepal.Length    Sepal.Width     Petal.Length  
##  Min.   :  1.00   Min.   :4.300   Min.   :2.000   Min.   :1.000  
##  1st Qu.: 38.25   1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600  
##  Median : 75.50   Median :5.800   Median :3.000   Median :4.350  
##  Mean   : 75.50   Mean   :5.843   Mean   :3.057   Mean   :3.758  
##  3rd Qu.:112.75   3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100  
##  Max.   :150.00   Max.   :7.900   Max.   :4.400   Max.   :6.900  
##   Petal.Width      Species         
##  Min.   :0.100   Length:150        
##  1st Qu.:0.300   Class :character  
##  Median :1.300   Mode  :character  
##  Mean   :1.199                     
##  3rd Qu.:1.800                     
##  Max.   :2.500
mean(dt$Sepal.Length)
## [1] 5.843333
mean(dt$Petal.Width)
## [1] 1.199333
median(dt$Sepal.Length)
## [1] 5.8
median(dt$Petal.Width)
## [1] 1.3
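The four calls above can also be collapsed into one pass over every numeric column. This is only a minimal sketch: it assumes the built-in `iris` data set as a stand-in for `iris.csv`, and the names `dt_demo`, `num_cols` and `stats` are made up for illustration.

```r
# Assumption: the built-in iris data stands in for the CSV read above.
dt_demo <- iris

# Keep only the numeric columns, then compute both statistics per column.
num_cols <- dt_demo[sapply(dt_demo, is.numeric)]
stats <- sapply(num_cols, function(x) c(mean = mean(x), median = median(x)))

# One column per attribute, one row per statistic.
round(stats, 3)
```

The result is a small matrix, so a single value can be picked out by name, for example `stats["median", "Petal.Width"]`.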
  2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.
# keep rows 1-20, 60 and 80, and columns 1, 2 and 5 (X, Sepal.Length, Petal.Width)
subsetdt <- dt[c(1:20, 60, 80), c(1:2, 5)]
  3. Create a new column for the new data frame.
subsetdt$SL.type <- ifelse(subsetdt$Sepal.Length > 5, "High Sepal Length", "Low Sepal Length")
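As a quick check of how the threshold behaves, the sketch below applies the same rule to a few made-up sepal lengths and compares it with `cut()`, which is one possible alternative when more than two bins are needed. The vector `sl` and the other names are illustrative assumptions, not part of the assignment data.

```r
# Made-up sepal lengths for illustration only.
sl <- c(4.6, 5.0, 5.1, 5.8)

# Same rule as above: strictly greater than 5 counts as "High".
sl_type <- ifelse(sl > 5, "High Sepal Length", "Low Sepal Length")

# cut() with right-closed intervals (-Inf, 5] and (5, Inf] gives the same split.
sl_type_cut <- cut(sl,
                   breaks = c(-Inf, 5, Inf),
                   labels = c("Low Sepal Length", "High Sepal Length"))

# Count how many values fall into each group.
table(sl_type)
```

Note that a value of exactly 5.0 lands in the "Low Sepal Length" group under both versions.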
  4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.
summary(subsetdt)
##        X          Sepal.Length    Petal.Width       SL.type         
##  Min.   : 1.00   Min.   :4.300   Min.   :0.1000   Length:22         
##  1st Qu.: 6.25   1st Qu.:4.800   1st Qu.:0.2000   Class :character  
##  Median :11.50   Median :5.050   Median :0.2000   Mode  :character  
##  Mean   :15.91   Mean   :5.073   Mean   :0.3227                     
##  3rd Qu.:16.75   3rd Qu.:5.400   3rd Qu.:0.3000                     
##  Max.   :80.00   Max.   :5.800   Max.   :1.4000
mean(subsetdt$Sepal.Length)
## [1] 5.072727
mean(subsetdt$Petal.Width)
## [1] 0.3227273
median(subsetdt$Sepal.Length)
## [1] 5.05
median(subsetdt$Petal.Width)
## [1] 0.2
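To make the comparison easier to read, the two sets of statistics can be placed side by side in one small table. This is a sketch under stated assumptions: the built-in `iris` data stands in for `iris.csv`, the same rows (1-20, 60 and 80) are selected, and the names `full`, `sub`, `cols` and `comparison` are hypothetical.

```r
# Assumptions: built-in iris stands in for iris.csv; same rows as subsetdt.
full <- iris
sub  <- iris[c(1:20, 60, 80), ]

# The two attributes compared in this assignment.
cols <- c("Sepal.Length", "Petal.Width")

# One row per attribute, with full-data and subset statistics in adjacent columns.
comparison <- data.frame(
  attribute   = cols,
  mean_full   = sapply(cols, function(v) mean(full[[v]])),
  mean_sub    = sapply(cols, function(v) mean(sub[[v]])),
  median_full = sapply(cols, function(v) median(full[[v]])),
  median_sub  = sapply(cols, function(v) median(sub[[v]])),
  row.names   = NULL
)
comparison
```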

The mean sepal length (5.07 vs. 5.84) and the mean petal width (0.32 vs. 1.20) are both lower in the subset than in the full data set.

The median sepal length (5.05 vs. 5.8) and the median petal width (0.2 vs. 1.3) are also lower in the subset than in the full data set.

  5. For at least 3 values in a column, please rename so that every value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 would show as “excellent”.

subsetdt$Species <- dt[rownames(subsetdt), "Species"]  # Species was not among the subset columns
subsetdt[subsetdt$Species == "setosa", "Species"] <- "Setosa"
  6. Display enough rows to see examples of all of steps 1-5 above.
head(subsetdt)
##   X Sepal.Length Petal.Width           SL.type Species
## 1 1          5.1         0.2 High Sepal Length  Setosa
## 2 2          4.9         0.2  Low Sepal Length  Setosa
## 3 3          4.7         0.2  Low Sepal Length  Setosa
## 4 4          4.6         0.2  Low Sepal Length  Setosa
## 5 5          5.0         0.2  Low Sepal Length  Setosa
## 6 6          5.4         0.4 High Sepal Length  Setosa
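When every distinct value in a column needs a new label, a named lookup vector is one way to rename them all in a single step instead of one assignment per value. The sketch below is illustrative only: the `species` vector and the `lookup` mapping are made-up examples, not the assignment's data.

```r
# Illustrative values; in practice this would be a column such as dt$Species.
species <- c("setosa", "versicolor", "virginica", "setosa")

# Old value on the left (as the name), new value on the right.
lookup <- c(setosa = "Setosa",
            versicolor = "Versicolor",
            virginica = "Virginica")

# Index the lookup by the old values, then drop the names from the result.
renamed <- unname(lookup[species])
renamed
```

Any value missing from `lookup` would come back as `NA`, which makes unmapped values easy to spot.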
  7. Bonus - place the original .csv in a GitHub file and have R read from the link. This will be a very useful skill as you progress in your data science education and career.
# replace the URL with the raw link to your own GitHub file
url <- "https://raw.githubusercontent.com/jonygeta/iris.csv/master/iris.xls?token=Ar4SZICZcfUNSmQjkodz_m1d77rxWU3zks5cMbLEwA%3D%3D"
dt <- read.csv(url)
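`read.csv()` accepts a local path or a complete URL in the same argument, so the read can be rehearsed offline before pointing it at GitHub. The sketch below uses a temporary file as a stand-in for the download and the built-in `iris` data as a stand-in for `iris.csv`; both are assumptions, as are the names `tmp` and `dt_check`.

```r
# Stand-in for the hosted file: write the built-in iris data to a temporary CSV.
tmp <- tempfile(fileext = ".csv")
write.csv(iris, tmp)  # row names are written as an unnamed first column

# The same call works with a URL string in place of the local path.
dt_check <- read.csv(tmp, stringsAsFactors = FALSE)

# The unnamed row-name column comes back under the name "X".
names(dt_check)
dim(dt_check)
```

Checking `names()` and `dim()` right after the read is a cheap way to confirm that the link returned the expected table rather than an error page.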

Submit .rmd, .csv and link to RPubs.