1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.
library(RCurl)
## Loading required package: bitops
library(ggplot2)
library(plyr)
library(reshape2)
library(extrafont)
## Registering fonts with R
# Download the raw CSV text from GitHub, then parse it into a data frame
weightLoss.data <- getURL("https://raw.githubusercontent.com/ann2014/CUNY/master/WeightLoss.csv")
weightLoss.data <- read.csv(text = weightLoss.data)
head(weightLoss.data)
##   X   group wl1 wl2 wl3 se1 se2 se3
## 1 1 Control   4   3   3  14  13  15
## 2 2 Control   4   4   3  13  14  17
## 3 3 Control   4   3   1  17  12  16
## 4 4 Control   3   2   1  11  11  12
## 5 5 Control   5   3   2  16  15  14
## 6 6 Control   6   5   4  17  18  18
summary(weightLoss.data)
##        X             group         wl1             wl2       
##  Min.   : 1.00   Control:12   Min.   :3.000   Min.   :2.000  
##  1st Qu.: 9.25   Diet   :12   1st Qu.:4.000   1st Qu.:3.000  
##  Median :17.50   DietEx :10   Median :5.000   Median :4.000  
##  Mean   :17.50                Mean   :5.294   Mean   :4.353  
##  3rd Qu.:25.75                3rd Qu.:6.000   3rd Qu.:5.000  
##  Max.   :34.00                Max.   :9.000   Max.   :9.000  
##       wl3             se1             se2             se3       
##  Min.   :1.000   Min.   :11.00   Min.   :11.00   Min.   :11.00  
##  1st Qu.:1.000   1st Qu.:13.00   1st Qu.:12.00   1st Qu.:15.00  
##  Median :2.000   Median :15.00   Median :14.00   Median :17.00  
##  Mean   :2.176   Mean   :14.91   Mean   :13.82   Mean   :16.21  
##  3rd Qu.:3.000   3rd Qu.:16.00   3rd Qu.:15.00   3rd Qu.:18.00  
##  Max.   :4.000   Max.   :19.00   Max.   :19.00   Max.   :19.00
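summary() reports the mean and median among other statistics. The step asks to display them, so a minimal sketch that computes only those two for `wl1` and `se1` might look like this. To keep the snippet runnable offline, `excerpt` is a hypothetical stand-in re-typed from the six rows printed by `head(weightLoss.data)`. Its numbers therefore differ from the full-data summary. Applied to `weightLoss.data[c("wl1", "se1")]`, the same `sapply()` call should reproduce the Mean/Median rows shown above (5.294/5.000 and 14.91/15.00).

```r
# Hypothetical excerpt: the first six rows printed by head(weightLoss.data),
# re-typed so this snippet runs without a network connection
excerpt <- data.frame(
  wl1 = c(4, 4, 4, 3, 5, 6),
  se1 = c(14, 13, 17, 11, 16, 17)
)

# One column per attribute, one row each for mean and median
stats <- sapply(excerpt[c("wl1", "se1")],
                function(x) c(mean = mean(x), median = median(x)))
print(round(stats, 3))
```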
2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.
str(weightLoss.data)
## 'data.frame':    34 obs. of  8 variables:
##  $ X    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ group: Factor w/ 3 levels "Control","Diet",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ wl1  : int  4 4 4 3 5 6 6 5 5 3 ...
##  $ wl2  : int  3 4 3 2 3 5 5 4 4 3 ...
##  $ wl3  : int  3 3 1 1 2 4 4 1 1 2 ...
##  $ se1  : int  14 13 17 11 16 17 17 13 14 14 ...
##  $ se2  : int  13 14 12 11 15 18 16 15 14 15 ...
##  $ se3  : int  15 17 16 12 14 18 19 15 15 13 ...
# Drop the row-index column X; all 34 rows are kept
new.df <- subset(weightLoss.data, select = -X)
head(new.df)
##     group wl1 wl2 wl3 se1 se2 se3
## 1 Control   4   3   3  14  13  15
## 2 Control   4   4   3  13  14  17
## 3 Control   4   3   1  17  12  16
## 4 Control   3   2   1  11  11  12
## 5 Control   5   3   2  16  15  14
## 6 Control   6   5   4  17  18  18
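The step asks for a subset of rows as well as columns. The call above only removes `X`. `subset()` also takes a logical row condition, so both can be done in one call. This is a sketch on a hypothetical six-row excerpt re-typed from the output above. The cut-off `wl1 >= 5` and the name `high.wl1` are illustrative choices, not part of the original analysis.

```r
# Hypothetical excerpt: first six rows of weightLoss.data as printed above
excerpt <- data.frame(
  X = 1:6,
  group = rep("Control", 6),
  wl1 = c(4, 4, 4, 3, 5, 6),
  se1 = c(14, 13, 17, 11, 16, 17)
)

# Rows: month-1 weight loss of at least 5 (illustrative threshold)
# Columns: group, wl1 and se1 only (X is dropped)
high.wl1 <- subset(excerpt, wl1 >= 5, select = c(group, wl1, se1))
print(high.wl1)
```

On the full data the same pattern would be `subset(weightLoss.data, wl1 >= 5, select = c(group, wl1, se1))`.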
3. Create new column names for the new data frame.
# Rename the weight-loss (wl) and self-esteem (se) columns by position
names(new.df)[2:4] <- c("WeightLoss_month1", "WeightLoss_month2", "WeightLoss_month3")
names(new.df)[5:7] <- c("SelfEsteem_month1", "SelfEsteem_month2", "SelfEsteem_month3")
head(new.df)
##     group WeightLoss_month1 WeightLoss_month2 WeightLoss_month3
## 1 Control                 4                 3                 3
## 2 Control                 4                 4                 3
## 3 Control                 4                 3                 1
## 4 Control                 3                 2                 1
## 5 Control                 5                 3                 2
## 6 Control                 6                 5                 4
##   SelfEsteem_month1 SelfEsteem_month2 SelfEsteem_month3
## 1                14                13                15
## 2                13                14                17
## 3                17                12                16
## 4                11                11                12
## 5                16                15                14
## 6                17                18                18
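Renaming by position breaks silently if the column order ever changes. A slightly more defensive sketch looks up the old names with `match()` and builds the new ones with `paste0()`. The one-row data frame `df` is hypothetical and uses the values of row 1 above.

```r
# Hypothetical one-row data frame with the same layout as new.df before renaming
df <- data.frame(group = "Control", wl1 = 4, wl2 = 3, wl3 = 3,
                 se1 = 14, se2 = 13, se3 = 15)

old.names <- c("wl1", "wl2", "wl3", "se1", "se2", "se3")
new.names <- c(paste0("WeightLoss_month", 1:3),
               paste0("SelfEsteem_month", 1:3))

# match() finds each old name's position, so column order does not matter
names(df)[match(old.names, names(df))] <- new.names
print(names(df))
```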
4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.
summary(new.df)
##      group    WeightLoss_month1 WeightLoss_month2 WeightLoss_month3
##  Control:12   Min.   :3.000     Min.   :2.000     Min.   :1.000    
##  Diet   :12   1st Qu.:4.000     1st Qu.:3.000     1st Qu.:1.000    
##  DietEx :10   Median :5.000     Median :4.000     Median :2.000    
##               Mean   :5.294     Mean   :4.353     Mean   :2.176    
##               3rd Qu.:6.000     3rd Qu.:5.000     3rd Qu.:3.000    
##               Max.   :9.000     Max.   :9.000     Max.   :4.000    
##  SelfEsteem_month1 SelfEsteem_month2 SelfEsteem_month3
##  Min.   :11.00     Min.   :11.00     Min.   :11.00    
##  1st Qu.:13.00     1st Qu.:12.00     1st Qu.:15.00    
##  Median :15.00     Median :14.00     Median :17.00    
##  Mean   :14.91     Mean   :13.82     Mean   :16.21    
##  3rd Qu.:16.00     3rd Qu.:15.00     3rd Qu.:18.00    
##  Max.   :19.00     Max.   :19.00     Max.   :19.00
summary(weightLoss.data)
##        X             group         wl1             wl2       
##  Min.   : 1.00   Control:12   Min.   :3.000   Min.   :2.000  
##  1st Qu.: 9.25   Diet   :12   1st Qu.:4.000   1st Qu.:3.000  
##  Median :17.50   DietEx :10   Median :5.000   Median :4.000  
##  Mean   :17.50                Mean   :5.294   Mean   :4.353  
##  3rd Qu.:25.75                3rd Qu.:6.000   3rd Qu.:5.000  
##  Max.   :34.00                Max.   :9.000   Max.   :9.000  
##       wl3             se1             se2             se3       
##  Min.   :1.000   Min.   :11.00   Min.   :11.00   Min.   :11.00  
##  1st Qu.:1.000   1st Qu.:13.00   1st Qu.:12.00   1st Qu.:15.00  
##  Median :2.000   Median :15.00   Median :14.00   Median :17.00  
##  Mean   :2.176   Mean   :14.91   Mean   :13.82   Mean   :16.21  
##  3rd Qu.:3.000   3rd Qu.:16.00   3rd Qu.:15.00   3rd Qu.:18.00  
##  Max.   :4.000   Max.   :19.00   Max.   :19.00   Max.   :19.00
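The two summaries report the same Mean and Median for `wl1`/`WeightLoss_month1` (5.294 and 5.000) and for `se1`/`SelfEsteem_month1` (14.91 and 15.00). That is expected, because `new.df` only dropped `X` and renamed columns, and no rows were removed. A sketch that puts the comparison side by side is below. It runs on a hypothetical six-row excerpt, so its numbers are not the full-data values.

```r
# Hypothetical excerpt standing in for weightLoss.data (first six rows above)
orig <- data.frame(X = 1:6, group = rep("Control", 6),
                   wl1 = c(4, 4, 4, 3, 5, 6),
                   se1 = c(14, 13, 17, 11, 16, 17))

# Same transformation as above: drop X, then rename the measurement columns
renamed <- subset(orig, select = -X)
names(renamed)[2:3] <- c("WeightLoss_month1", "SelfEsteem_month1")

# Mean and median of each attribute, original vs. new data frame
comparison <- data.frame(
  stat = c("mean", "median"),
  wl1_original = c(mean(orig$wl1), median(orig$wl1)),
  wl1_new = c(mean(renamed$WeightLoss_month1), median(renamed$WeightLoss_month1)),
  se1_original = c(mean(orig$se1), median(orig$se1)),
  se1_new = c(mean(renamed$SelfEsteem_month1), median(renamed$SelfEsteem_month1))
)
print(comparison)
```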
5. For at least 3 values in a column, rename every occurrence of each value. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 would show as “excellent”.
head(new.df[, 1])
## [1] Control Control Control Control Control Control
## Levels: Control Diet DietEx
# gsub() returns a character vector, so group is no longer a factor afterwards
new.df[, 1] <- gsub('Control', 'Ctrl', new.df[, 1])
head(new.df[, 1])
## [1] "Ctrl" "Ctrl" "Ctrl" "Ctrl" "Ctrl" "Ctrl"
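The step asks for at least three values, but the code above renames only `Control`. Extending it with `gsub()` is risky, because `gsub()` matches substrings: replacing `"Diet"` would also rewrite `"DietEx"`. A sketch that renames each factor level exactly once through a lookup vector is below. The labels `DietOnly` and `DietExercise` are hypothetical and reflect an assumed reading of the abbreviations. Also note that in R 4.0 and later, `read.csv()` defaults to `stringsAsFactors = FALSE`, so `group` would be read as character unless that argument is set to `TRUE`.

```r
# Hypothetical sample of the group column (labels taken from the data set)
grp <- factor(c("Control", "Control", "Diet", "Diet", "DietEx", "DietEx"))

# Pitfall: substring matching also turns "DietEx" into "DEx"
print(gsub("Diet", "D", as.character(grp)))

# Safer: map each existing level to exactly one new label
# (the Diet and DietEx labels are illustrative assumptions)
lookup <- c(Control = "Ctrl", Diet = "DietOnly", DietEx = "DietExercise")
levels(grp) <- unname(lookup[levels(grp)])
print(grp)
```

Because only the levels change, `grp` stays a factor and every occurrence of each value is renamed at once.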
6. Display enough rows to see examples of each of steps 1-5 above.
head(new.df, 15)
##    group WeightLoss_month1 WeightLoss_month2 WeightLoss_month3
## 1   Ctrl                 4                 3                 3
## 2   Ctrl                 4                 4                 3
## 3   Ctrl                 4                 3                 1
## 4   Ctrl                 3                 2                 1
## 5   Ctrl                 5                 3                 2
## 6   Ctrl                 6                 5                 4
## 7   Ctrl                 6                 5                 4
## 8   Ctrl                 5                 4                 1
## 9   Ctrl                 5                 4                 1
## 10  Ctrl                 3                 3                 2
## 11  Ctrl                 4                 2                 2
## 12  Ctrl                 5                 2                 1
## 13  Diet                 6                 3                 2
## 14  Diet                 5                 4                 1
## 15  Diet                 7                 6                 3
##    SelfEsteem_month1 SelfEsteem_month2 SelfEsteem_month3
## 1                 14                13                15
## 2                 13                14                17
## 3                 17                12                16
## 4                 11                11                12
## 5                 16                15                14
## 6                 17                18                18
## 7                 17                16                19
## 8                 13                15                15
## 9                 14                14                15
## 10                14                15                13
## 11                16                16                11
## 12                15                13                16
## 13                12                11                14
## 14                13                14                15
## 15                17                11                18
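The first 15 rows contain only `Ctrl` and `Diet`, so no `DietEx` row appears. One option is to take the first few rows of every group. A sketch with a hypothetical helper `first_rows_per_group()` is below. It runs on four rows re-typed from the output above (rows 1, 2, 13 and 14).

```r
# Hypothetical excerpt: rows 1-2 (Ctrl) and 13-14 (Diet) from the output above
df <- data.frame(
  group = c("Ctrl", "Ctrl", "Diet", "Diet"),
  WeightLoss_month1 = c(4, 4, 6, 5)
)

# Hypothetical helper: first n rows of each group, stacked back together
first_rows_per_group <- function(data, n = 1) {
  pieces <- lapply(split(data, data$group), head, n)
  do.call(rbind, pieces)
}
print(first_rows_per_group(df, n = 1))
```

On the notebook's data, `first_rows_per_group(new.df, n = 3)` would show three rows from each of the three groups.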
7. BONUS: place the original .csv in a GitHub repository and have R read it from the link. This will be a very useful skill as you progress in your data science education and career.

https://raw.githubusercontent.com/ann2014/CUNY/master/WeightLoss.csv