R Markdown

  1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.
data(iris)
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
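# Copy iris. Naming the argument ID= prefixes every column with "ID.";
# stringsAsFactors=FALSE has no effect here because Species is already a factor.
df <- data.frame(ID = iris, stringsAsFactors = FALSE)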

Petal length mean is:

mean(df$ID.Petal.Length)
## [1] 3.758

Petal width mean is:

mean(df$ID.Petal.Width)
## [1] 1.199333

Petal length median is:

median(df$ID.Petal.Length)
## [1] 4.35

Petal width median is:

median(df$ID.Petal.Width)
## [1] 1.3
  2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.
flower_subset <- df[1:3, ]  # first three rows, all five columns
print(flower_subset)
##   ID.Sepal.Length ID.Sepal.Width ID.Petal.Length ID.Petal.Width ID.Species
## 1             5.1            3.5             1.4            0.2     setosa
## 2             4.9            3.0             1.4            0.2     setosa
## 3             4.7            3.2             1.3            0.2     setosa
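The task asks for a subset of both rows and columns, while df[1:3, ] keeps all five columns. A minimal sketch of selecting both at once, using the original iris column names; the names petal_subset and setosa_petals are hypothetical:

```r
data(iris)

# Rows 1-3 and only the petal columns plus Species, selected by name
petal_subset <- iris[1:3, c("Petal.Length", "Petal.Width", "Species")]
print(petal_subset)

# The same idea with a logical row filter instead of row numbers:
# every setosa row, keeping only the two petal columns
setosa_petals <- subset(iris, Species == "setosa",
                        select = c(Petal.Length, Petal.Width))
print(dim(setosa_petals))
```

Selecting columns by name rather than by position keeps the code correct even if the column order changes.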
  3. Create new column names for the new data frame.
colnames(flower_subset) <- c('subsetSepal_length', 'subsetSepal_width',
                             'subsetPetal_length', 'subsetPetal_width', 'flowerType')
print(flower_subset)
##   subsetSepal_length subsetSepal_width subsetPetal_length subsetPetal_width
## 1                5.1               3.5                1.4               0.2
## 2                4.9               3.0                1.4               0.2
## 3                4.7               3.2                1.3               0.2
##   flowerType
## 1     setosa
## 2     setosa
## 3     setosa
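Assigning a full vector to colnames() renames by position, so the labels silently land on the wrong columns if the order ever changes. A minimal sketch of renaming a single column by its current name instead, shown on a fresh three-row copy of iris (the variable name renamed is hypothetical):

```r
data(iris)

renamed <- iris[1:3, ]

# Find the column currently called "Species" and rename only that one;
# all other column names are left untouched
names(renamed)[names(renamed) == "Species"] <- "flowerType"

print(names(renamed))
```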
  4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.
summary_subset <- summary(flower_subset)

subsetPetallengthMean <- mean(flower_subset$subsetPetal_length)

paste("Subset petal length mean is", subsetPetallengthMean)
## [1] "Subset petal length mean is 1.36666666666667"
subsetPetalwidthMean <- mean(flower_subset$subsetPetal_width)

paste("Subset petal width mean is", subsetPetalwidthMean)
## [1] "Subset petal width mean is 0.2"
subsetPetallengthMedian <- median(flower_subset$subsetPetal_length)

paste("Subset petal length median is", subsetPetallengthMedian)
## [1] "Subset petal length median is 1.4"
subsetPetalwidthMedian <- median(flower_subset$subsetPetal_width)

paste("Subset petal width median is", subsetPetalwidthMedian)
## [1] "Subset petal width median is 0.2"
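Compared with the full data set, the subset's petal statistics are much smaller: mean petal length 1.367 versus 3.758, median petal length 1.4 versus 4.35, mean petal width 0.2 versus 1.199, and median petal width 0.2 versus 1.3. This is expected, because the first three rows are all setosa, the species with the shortest and narrowest petals. Note also that summary_subset is stored but never printed. A minimal sketch that prints the subset summary and puts the statistics side by side; the helper stats_for and the other variable names are hypothetical:

```r
data(iris)

subset_rows <- iris[1:3, ]  # same rows as flower_subset

# Display the overview that summary_subset only stored
print(summary(subset_rows))

# Hypothetical helper: mean and median of one numeric vector
stats_for <- function(x) c(mean(x), median(x))

comparison <- data.frame(
  full_length   = stats_for(iris$Petal.Length),
  subset_length = stats_for(subset_rows$Petal.Length),
  full_width    = stats_for(iris$Petal.Width),
  subset_width  = stats_for(subset_rows$Petal.Width),
  row.names     = c("mean", "median")
)
print(round(comparison, 3))

# Per-species means show why a setosa-only subset is so much smaller
print(tapply(iris$Petal.Length, iris$Species, mean))
```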
print(flower_subset["flowerType"])
##   flowerType
## 1     setosa
## 2     setosa
## 3     setosa
print(flower_subset)
##   subsetSepal_length subsetSepal_width subsetPetal_length subsetPetal_width
## 1                5.1               3.5                1.4               0.2
## 2                4.9               3.0                1.4               0.2
## 3                4.7               3.2                1.3               0.2
##   flowerType
## 1     setosa
## 2     setosa
## 3     setosa
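Before replacing values, it helps to check each column's class. str() or sapply(..., class) shows that the renamed Species column is still a factor, and levels() lists the only values it can hold. A minimal sketch on a fresh three-row copy of iris with the fifth column renamed as in flower_subset (the name check is hypothetical):

```r
data(iris)

check <- iris[1:3, ]
names(check)[5] <- "flowerType"

# Compact overview of every column's type and first values
str(check)

# One class per column, as a named character vector
col_classes <- sapply(check, class)
print(col_classes)

# Row subsetting keeps all original factor levels
print(levels(check$flowerType))
```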
# Try to replace every 'setosa' value in flower_subset with 'setosas!'
flower_subset[flower_subset== 'setosa'] <- 'setosas!'
## Warning in `[<-.factor`(`*tmp*`, thisvar, value = "setosas!"): invalid factor
## level, NA generated
print(flower_subset)
##   subsetSepal_length subsetSepal_width subsetPetal_length subsetPetal_width
## 1                5.1               3.5                1.4               0.2
## 2                4.9               3.0                1.4               0.2
## 3                4.7               3.2                1.3               0.2
##   flowerType
## 1       <NA>
## 2       <NA>
## 3       <NA>

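The code above gives the warning "invalid factor level, NA generated" because flowerType is still a factor. I set stringsAsFactors=FALSE, but that argument only stops data.frame() from converting character columns into factors; iris$Species is already a factor, so it was copied unchanged. A factor can only hold values from its levels (setosa, versicolor, virginica), so assigning 'setosas!' produced NA instead. Converting the column with as.character() first, or renaming the factor level itself, avoids the problem.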
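A minimal sketch of the two usual fixes, rebuilt from iris so it runs on its own: convert flowerType to character before replacing, or keep the factor and rename its level. The names fixed_chr and fixed_fct are illustrative.

```r
data(iris)

# Rebuild the same three-row subset with the same column names as flower_subset
flower_subset <- iris[1:3, ]
colnames(flower_subset) <- c('subsetSepal_length', 'subsetSepal_width',
                             'subsetPetal_length', 'subsetPetal_width', 'flowerType')

# Option 1: turn the factor into a plain character column, then replace values.
# Only cells equal to 'setosa' change; the numeric columns are untouched.
fixed_chr <- flower_subset
fixed_chr$flowerType <- as.character(fixed_chr$flowerType)
fixed_chr[fixed_chr == 'setosa'] <- 'setosas!'
print(fixed_chr)

# Option 2: keep the factor and rename the level itself, so every
# 'setosa' entry becomes 'setosas!' without generating NA
fixed_fct <- flower_subset
levels(fixed_fct$flowerType)[levels(fixed_fct$flowerType) == 'setosa'] <- 'setosas!'
print(fixed_fct)
```

Option 1 is the simpler choice when the column is only used as text labels; option 2 keeps the factor, which matters if later code relies on its levels (for example, plotting or modelling by species).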