Setup

toothData <- read.table(file='https://raw.githubusercontent.com/brian-cuny/rassignment2/master/ToothGrowth.csv', header=TRUE, sep=',', stringsAsFactors=FALSE)
rbind(head(toothData), tail(toothData))
##     X  len supp dose
## 1   1  4.2   VC  0.5
## 2   2 11.5   VC  0.5
## 3   3  7.3   VC  0.5
## 4   4  5.8   VC  0.5
## 5   5  6.4   VC  0.5
## 6   6 10.0   VC  0.5
## 55 55 24.8   OJ  2.0
## 56 56 30.9   OJ  2.0
## 57 57 26.4   OJ  2.0
## 58 58 27.3   OJ  2.0
## 59 59 29.4   OJ  2.0
## 60 60 23.0   OJ  2.0

Question 1 - Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.

Only one attribute, len (tooth length), has a mean and median that are meaningful, so I split the data into two groups by the value of the ‘supp’ column and report the mean and median of len for each group.

summary(toothData)
##        X              len            supp                dose      
##  Min.   : 1.00   Min.   : 4.20   Length:60          Min.   :0.500  
##  1st Qu.:15.75   1st Qu.:13.07   Class :character   1st Qu.:0.500  
##  Median :30.50   Median :19.25   Mode  :character   Median :1.000  
##  Mean   :30.50   Mean   :18.81                      Mean   :1.167  
##  3rd Qu.:45.25   3rd Qu.:25.27                      3rd Qu.:2.000  
##  Max.   :60.00   Max.   :33.90                      Max.   :2.000
ojData <- toothData[toothData$supp == 'OJ',]
vcData <- toothData[toothData$supp == 'VC',]
ojMean <- mean(ojData$len)
vcMean <- mean(vcData$len)
ojMedian <- median(ojData$len)
vcMedian <- median(vcData$len)
cat(sprintf('VC: The mean guinea pig tooth length is: %s and the median is %s', vcMean, vcMedian),'\n',sprintf('OJ: The mean guinea pig tooth length is: %s and the median is %s', ojMean, ojMedian))
## VC: The mean guinea pig tooth length is: 16.9633333333333 and the median is 16.5 
##  OJ: The mean guinea pig tooth length is: 20.6633333333333 and the median is 22.7
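
The split-then-summarise steps above can also be sketched with tapply, which groups one column by another in a single call. This is only an illustration under an assumption: it uses R's built-in datasets::ToothGrowth as a stand-in for the CSV, on the belief that the two hold the same len, supp and dose values (the built-in copy has no X column and stores supp as a factor). The names suppMeans and suppMedians are made up for this sketch.

```r
# Assumption: the built-in data set stands in for the CSV read in Setup.
toothData <- datasets::ToothGrowth

# Mean and median tooth length for each supplement, without manual splitting.
suppMeans   <- tapply(toothData$len, toothData$supp, mean)
suppMedians <- tapply(toothData$len, toothData$supp, median)

# One row per supplement, one column per statistic.
print(cbind(mean = suppMeans, median = suppMedians))
```

The result is indexed by supplement name, so suppMeans[['VC']] plays the role of vcMean above.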

Question 2 - Create a new data frame with a subset of the columns and rows. Make sure to rename it.

I removed the redundant X column, which only repeats the row number, and dropped all rows with a dose of 2.

betterToothData <- toothData[toothData$dose != 2,2:4]
rbind(head(betterToothData), tail(betterToothData))
##     len supp dose
## 1   4.2   VC  0.5
## 2  11.5   VC  0.5
## 3   7.3   VC  0.5
## 4   5.8   VC  0.5
## 5   6.4   VC  0.5
## 6  10.0   VC  0.5
## 45 20.0   OJ  1.0
## 46 25.2   OJ  1.0
## 47 25.8   OJ  1.0
## 48 21.2   OJ  1.0
## 49 14.5   OJ  1.0
## 50 27.3   OJ  1.0
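
The same row and column selection can be sketched with subset, which drops X by name rather than relying on the column positions 2:4. As an assumption, the frame is rebuilt from the built-in datasets::ToothGrowth with an added X column that numbers the rows, to mimic the shape of the CSV.

```r
# Assumption: mimic the CSV by adding a row-number column X to the built-in data.
toothData <- cbind(X = seq_len(nrow(datasets::ToothGrowth)), datasets::ToothGrowth)

# Keep rows whose dose is not 2 and drop the X column by name.
betterToothData <- subset(toothData, dose != 2, select = -X)

str(betterToothData)
```

Selecting by name keeps working if the column order of the source file ever changes, which positional indexing would not.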

Question 3 - Create new column names for the new data frame.

names(betterToothData) <- c('Tooth_Length', 'Supplement', 'Dose')
head(betterToothData)
##   Tooth_Length Supplement Dose
## 1          4.2         VC  0.5
## 2         11.5         VC  0.5
## 3          7.3         VC  0.5
## 4          5.8         VC  0.5
## 5          6.4         VC  0.5
## 6         10.0         VC  0.5
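
Assigning to names() as above depends on the columns being in a known order. A minimal sketch of a name-based alternative follows, assuming the built-in datasets::ToothGrowth as a stand-in for the data; the lookup vector newNames is a hypothetical helper introduced only for this example.

```r
# Assumption: the built-in data set stands in for the subset built in Question 2.
betterToothData <- datasets::ToothGrowth

# Map each old column name to its new name, then look the names up.
newNames <- c(len = 'Tooth_Length', supp = 'Supplement', dose = 'Dose')
names(betterToothData) <- unname(newNames[names(betterToothData)])

head(betterToothData)
```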

Question 4 - Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.

summary(betterToothData)
##   Tooth_Length     Supplement             Dose     
##  Min.   : 4.200   Length:40          Min.   :0.50  
##  1st Qu.: 9.925   Class :character   1st Qu.:0.50  
##  Median :15.200   Mode  :character   Median :0.75  
##  Mean   :15.170                      Mean   :0.75  
##  3rd Qu.:19.775                      3rd Qu.:1.00  
##  Max.   :27.300                      Max.   :1.00
ojBetterData <- betterToothData[betterToothData$Supplement == 'OJ',]
vcBetterData <- betterToothData[betterToothData$Supplement == 'VC',]
ojBetterMean <- mean(ojBetterData$Tooth_Length)
vcBetterMean <- mean(vcBetterData$Tooth_Length)
ojBetterMedian <- median(ojBetterData$Tooth_Length)
vcBetterMedian <- median(vcBetterData$Tooth_Length)
cat(sprintf('VC: The new mean guinea pig tooth length is: %s and the median is %s', vcBetterMean, vcBetterMedian),'\n',sprintf('VC: Old mean: %s vs. New mean: %s and Old median: %s vs. New Median: %s', vcMean, vcBetterMean, vcMedian, vcBetterMedian))
## VC: The new mean guinea pig tooth length is: 12.375 and the median is 12.55 
##  VC: Old mean: 16.9633333333333 vs. New mean: 12.375 and Old median: 16.5 vs. New Median: 12.55
cat(sprintf('OJ: The new mean guinea pig tooth length is: %s and the median is %s', ojBetterMean, ojBetterMedian),'\n',sprintf('OJ: Old mean: %s vs. New mean: %s and Old median: %s vs. New Median: %s', ojMean, ojBetterMean, ojMedian, ojBetterMedian))
## OJ: The new mean guinea pig tooth length is: 17.965 and the median is 18.65 
##  OJ: Old mean: 20.6633333333333 vs. New mean: 17.965 and Old median: 22.7 vs. New Median: 18.65
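
Both the mean and the median fall for each supplement once the dose-2 rows are gone, which is consistent with the longest teeth sitting in the removed rows. The comparison can be sketched as one small table instead of two printed sentences. This assumes the built-in datasets::ToothGrowth matches the CSV; lowDose and comparison are names invented for the sketch.

```r
# Assumption: the built-in data set stands in for the CSV read in Setup.
toothData <- datasets::ToothGrowth
lowDose   <- toothData[toothData$dose != 2, ]

# tapply returns results in factor-level order, matching levels(toothData$supp).
comparison <- data.frame(
  supp      = levels(toothData$supp),
  oldMean   = as.vector(tapply(toothData$len, toothData$supp, mean)),
  newMean   = as.vector(tapply(lowDose$len, lowDose$supp, mean)),
  oldMedian = as.vector(tapply(toothData$len, toothData$supp, median)),
  newMedian = as.vector(tapply(lowDose$len, lowDose$supp, median))
)

print(comparison)
```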

Question 5 - For at least 3 values in a column, please rename them so that every value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 would show as “excellent”.

betterToothData$Supplement[betterToothData$Supplement == 'VC' & betterToothData$Dose == 0.5] <- 'Lo Vitamin C'
betterToothData$Supplement[betterToothData$Supplement == 'VC' & betterToothData$Dose == 1] <- 'Hi Vitamin C'
betterToothData$Supplement[betterToothData$Supplement == 'OJ' & betterToothData$Dose == 0.5] <- 'Lo Orange Juice'
betterToothData$Supplement[betterToothData$Supplement == 'OJ' & betterToothData$Dose == 1] <- 'Hi Orange Juice'
rbind(head(betterToothData), tail(betterToothData))
##    Tooth_Length      Supplement Dose
## 1           4.2    Lo Vitamin C  0.5
## 2          11.5    Lo Vitamin C  0.5
## 3           7.3    Lo Vitamin C  0.5
## 4           5.8    Lo Vitamin C  0.5
## 5           6.4    Lo Vitamin C  0.5
## 6          10.0    Lo Vitamin C  0.5
## 45         20.0 Hi Orange Juice  1.0
## 46         25.2 Hi Orange Juice  1.0
## 47         25.8 Hi Orange Juice  1.0
## 48         21.2 Hi Orange Juice  1.0
## 49         14.5 Hi Orange Juice  1.0
## 50         27.3 Hi Orange Juice  1.0
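
The four assignments above can also be sketched as one vectorised step that builds each label from its two parts. This assumes the built-in datasets::ToothGrowth as a stand-in for the CSV, with supp converted from a factor to character to match stringsAsFactors=FALSE; doseLevel and suppLabel are hypothetical helper names.

```r
# Assumption: rebuild the Question 2-3 data frame from the built-in data set.
betterToothData <- subset(datasets::ToothGrowth, dose != 2)
names(betterToothData) <- c('Tooth_Length', 'Supplement', 'Dose')
betterToothData$Supplement <- as.character(betterToothData$Supplement)

# Build the dose part and the supplement part of each label, then join them.
doseLevel <- ifelse(betterToothData$Dose == 0.5, 'Lo', 'Hi')
suppLabel <- ifelse(betterToothData$Supplement == 'VC', 'Vitamin C', 'Orange Juice')
betterToothData$Supplement <- paste(doseLevel, suppLabel)

# Count the rows that received each new label.
table(betterToothData$Supplement)
```

Because both parts are computed before Supplement is overwritten, the result does not depend on the order in which the labels are assigned.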