Elina Azrilyan

Week 2 Homework Assignment

Please select one, download it, and perform the following tasks:
  1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.

MyData <- read.csv(file="https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/COUNT/affairs.csv", header=TRUE, sep=",")
summary(MyData)
##        X          naffairs           kids           vryunhap      
##  Min.   :  1   Min.   : 0.000   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:151   1st Qu.: 0.000   1st Qu.:0.0000   1st Qu.:0.00000  
##  Median :301   Median : 0.000   Median :1.0000   Median :0.00000  
##  Mean   :301   Mean   : 1.456   Mean   :0.7155   Mean   :0.02662  
##  3rd Qu.:451   3rd Qu.: 0.000   3rd Qu.:1.0000   3rd Qu.:0.00000  
##  Max.   :601   Max.   :12.000   Max.   :1.0000   Max.   :1.00000  
##      unhap           avgmarr           hapavg           vryhap     
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.000  
##  Mean   :0.1098   Mean   :0.1547   Mean   :0.3228   Mean   :0.386  
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:1.000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.000  
##     antirel            notrel          slghtrel          smerel      
##  Min.   :0.00000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.00000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.07987   Mean   :0.2729   Mean   :0.2146   Mean   :0.3161  
##  3rd Qu.:0.00000   3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:1.0000  
##  Max.   :1.00000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##      vryrel          yrsmarr1          yrsmarr2         yrsmarr3     
##  Min.   :0.0000   Min.   :0.00000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.00000   Median :0.0000   Median :0.0000  
##  Mean   :0.1165   Mean   :0.08652   Mean   :0.1464   Mean   :0.1747  
##  3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.00000   Max.   :1.0000   Max.   :1.0000  
##     yrsmarr4         yrsmarr5         yrsmarr6     
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.1364   Mean   :0.1165   Mean   :0.3394  
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000
x <- c(mean(MyData$naffairs), mean(MyData$kids))
y <- c(median(MyData$naffairs), median(MyData$kids))
x
## [1] 1.4559068 0.7154742
y
## [1] 0 1
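The same means and medians can also be computed for several columns in one call. The following is a minimal sketch using `sapply`; the data frame `toy` is a made-up stand-in for MyData (its values are illustrative, not taken from affairs.csv) so that the example runs without a download.

```r
# Hypothetical stand-in for MyData; values are illustrative only.
toy <- data.frame(naffairs = c(0, 0, 3, 0, 7),
                  kids     = c(0, 1, 1, 0, 1))

# Apply mean() and median() to each selected column.
# sapply() returns a numeric vector named after the columns.
col_means   <- sapply(toy[, c("naffairs", "kids")], mean)
col_medians <- sapply(toy[, c("naffairs", "kids")], median)

col_means
col_medians
```

With the real data, replacing `toy` with `MyData` should give the same numbers as `x` and `y` above.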
  2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.
newaffairs <- data.frame(MyData$naffairs, MyData$kids, MyData$vryunhap, MyData$unhap, MyData$avgmarr, MyData$hapavg, MyData$vryhap)
  3. Create new column names for the new data frame.
colnames(newaffairs) <- c("NumberofAffairs","NumberofKids", "VeryUnhappy", "Unhappy", "Average", "Happy", "VeryHappy")
str(newaffairs)
## 'data.frame':    601 obs. of  7 variables:
##  $ NumberofAffairs: int  0 0 3 0 3 0 0 0 7 0 ...
##  $ NumberofKids   : int  0 0 0 1 1 1 0 0 1 0 ...
##  $ VeryUnhappy    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Unhappy        : int  0 0 0 0 0 0 0 0 1 0 ...
##  $ Average        : int  0 0 0 0 0 0 1 0 0 1 ...
##  $ Happy          : int  1 1 1 1 0 0 0 0 0 0 ...
##  $ VeryHappy      : int  0 0 0 0 1 1 0 1 0 0 ...
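The data frame above keeps all 601 rows and subsets only the columns. A row subset can be taken in the same step with bracket indexing. This is a sketch under assumptions: `toy` is a hypothetical stand-in for MyData, and the filter `kids == 1` is just one possible row condition.

```r
# Hypothetical stand-in for MyData; values are illustrative only.
toy <- data.frame(naffairs = c(0, 0, 3, 0, 7),
                  kids     = c(0, 1, 1, 0, 1),
                  vryhap   = c(1, 0, 0, 1, 0))

# Rows: keep only records where kids == 1.
# Columns: keep two of the three columns, selected by name.
toy_sub <- toy[toy$kids == 1, c("naffairs", "kids")]

# Rename the columns of the subset.
colnames(toy_sub) <- c("NumberofAffairs", "NumberofKids")

str(toy_sub)
```

Selecting columns by name inside the brackets also avoids repeating `MyData$` for every column.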
  4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.
summary(newaffairs)
##  NumberofAffairs   NumberofKids     VeryUnhappy         Unhappy      
##  Min.   : 0.000   Min.   :0.0000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.: 0.000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000  
##  Median : 0.000   Median :1.0000   Median :0.00000   Median :0.0000  
##  Mean   : 1.456   Mean   :0.7155   Mean   :0.02662   Mean   :0.1098  
##  3rd Qu.: 0.000   3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:0.0000  
##  Max.   :12.000   Max.   :1.0000   Max.   :1.00000   Max.   :1.0000  
##     Average           Happy          VeryHappy    
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.000  
##  Median :0.0000   Median :0.0000   Median :0.000  
##  Mean   :0.1547   Mean   :0.3228   Mean   :0.386  
##  3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:1.000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.000
a <- c(mean(newaffairs$NumberofAffairs), mean(newaffairs$NumberofKids))
b <- c(median(newaffairs$NumberofAffairs), median(newaffairs$NumberofKids))
a
## [1] 1.4559068 0.7154742
b
## [1] 0 1
x
## [1] 1.4559068 0.7154742
y
## [1] 0 1
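The printed values match: the means and medians are the same in both data frames, which is expected because newaffairs keeps all 601 rows of the two columns. The comparison can also be made in code rather than by eye. A minimal sketch, with the vectors re-entered from the printed output above so that it runs on its own:

```r
# Values copied from the printed output above.
x <- c(1.4559068, 0.7154742)  # means from MyData
a <- c(1.4559068, 0.7154742)  # means from newaffairs
y <- c(0, 1)                  # medians from MyData
b <- c(0, 1)                  # medians from newaffairs

# all.equal() compares numbers with a small tolerance, which is safer
# than == for floating-point means; isTRUE() turns the result into TRUE/FALSE.
means_match   <- isTRUE(all.equal(x, a))
medians_match <- isTRUE(all.equal(y, b))

means_match
medians_match
```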
  5. For at least 3 values in a column please rename so that every value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 would show as “excellent”.
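# The first assignment converts the column from integer to character;
# the later comparisons still match because R compares the numbers as strings.
# Values other than 0-3 (for example 7 and 12) are left unchanged.
newaffairs$NumberofAffairs[newaffairs$NumberofAffairs==0] <- "Zero"
newaffairs$NumberofAffairs[newaffairs$NumberofAffairs==1] <- "One"
newaffairs$NumberofAffairs[newaffairs$NumberofAffairs==2] <- "Two"
newaffairs$NumberofAffairs[newaffairs$NumberofAffairs==3] <- "Three"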
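An alternative that renames every value in one step is a named lookup vector. This is a sketch, not the method used above: `counts` is a hypothetical sample, and the labels for 7 and 12 are assumptions added for illustration.

```r
# Hypothetical sample of counts; the real column also holds values such as 7 and 12.
counts <- c(0, 0, 3, 1, 7, 12, 2)

# Named lookup vector: names are the original values, elements are the new labels.
labels <- c("0" = "Zero", "1" = "One", "2" = "Two", "3" = "Three",
            "7" = "Seven", "12" = "Twelve")

# Index the lookup by the character form of each count.
# Any value missing from the lookup would come back as NA.
renamed <- unname(labels[as.character(counts)])

renamed
```

Because unmatched values become `NA`, a quick `anyNA(renamed)` check shows whether the lookup covers the whole column.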
  6. Display enough rows to see examples of all of steps 1-5 above.
head(newaffairs, n=25)
##    NumberofAffairs NumberofKids VeryUnhappy Unhappy Average Happy
## 1             Zero            0           0       0       0     1
## 2             Zero            0           0       0       0     1
## 3            Three            0           0       0       0     1
## 4             Zero            1           0       0       0     1
## 5            Three            1           0       0       0     0
## 6             Zero            1           0       0       0     0
## 7             Zero            0           0       0       1     0
## 8             Zero            0           0       0       0     0
## 9                7            1           0       1       0     0
## 10            Zero            0           0       0       1     0
## 11            Zero            1           0       0       0     1
## 12            Zero            1           0       1       0     0
## 13            Zero            0           0       0       0     0
## 14            Zero            1           0       1       0     0
## 15              12            1           0       1       0     0
## 16            Zero            1           0       0       0     1
## 17            Zero            1           0       0       0     1
## 18             One            0           0       0       0     0
## 19             One            1           0       0       0     0
## 20            Zero            0           0       0       0     1
## 21            Zero            0           0       0       0     1
## 22            Zero            1           0       0       0     0
## 23            Zero            1           0       0       1     0
## 24            Zero            0           0       0       0     1
## 25            Zero            0           0       0       0     0
##    VeryHappy
## 1          0
## 2          0
## 3          0
## 4          0
## 5          1
## 6          1
## 7          0
## 8          1
## 9          0
## 10         0
## 11         0
## 12         0
## 13         1
## 14         0
## 15         0
## 16         0
## 17         0
## 18         1
## 19         1
## 20         0
## 21         0
## 22         1
## 23         0
## 24         0
## 25         1
  7. BONUS – place the original .csv in a GitHub file and have R read from the link. This will be a very useful skill as you progress in your data science education and career.
# Read from the raw file URL. The /blob/ page URL returns an HTML page rather
# than the CSV, which produces an "EOF within quoted string" warning.
MyDataBonus <- read.csv(file="https://raw.githubusercontent.com/che10vek/R-HW2/master/affairs.csv", header=TRUE, sep=",")
summary(MyDataBonus)
##        X          naffairs           kids           vryunhap      
##  Min.   :  1   Min.   : 0.000   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:151   1st Qu.: 0.000   1st Qu.:0.0000   1st Qu.:0.00000  
##  Median :301   Median : 0.000   Median :1.0000   Median :0.00000  
##  Mean   :301   Mean   : 1.456   Mean   :0.7155   Mean   :0.02662  
##  3rd Qu.:451   3rd Qu.: 0.000   3rd Qu.:1.0000   3rd Qu.:0.00000  
##  Max.   :601   Max.   :12.000   Max.   :1.0000   Max.   :1.00000  
##      unhap           avgmarr           hapavg           vryhap     
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.000  
##  Mean   :0.1098   Mean   :0.1547   Mean   :0.3228   Mean   :0.386  
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:1.000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.000  
##     antirel            notrel          slghtrel          smerel      
##  Min.   :0.00000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.00000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.07987   Mean   :0.2729   Mean   :0.2146   Mean   :0.3161  
##  3rd Qu.:0.00000   3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:1.0000  
##  Max.   :1.00000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##      vryrel          yrsmarr1          yrsmarr2         yrsmarr3     
##  Min.   :0.0000   Min.   :0.00000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.00000   Median :0.0000   Median :0.0000  
##  Mean   :0.1165   Mean   :0.08652   Mean   :0.1464   Mean   :0.1747  
##  3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.00000   Max.   :1.0000   Max.   :1.0000  
##     yrsmarr4         yrsmarr5         yrsmarr6     
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.1364   Mean   :0.1165   Mean   :0.3394  
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000
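A GitHub page URL of the form `.../blob/...` serves the HTML page for the file, not the CSV itself, so `read.csv` needs the raw file URL instead. The sketch below derives one from the other with plain string substitution. It assumes the usual `raw.githubusercontent.com` URL layout, and the download is left commented out so that the example runs offline.

```r
# Page URL as it appears in the browser address bar.
page_url <- "https://github.com/che10vek/R-HW2/blob/master/affairs.csv"

# Swap the host and drop the "/blob" path segment (assumed raw URL layout).
raw_url <- sub("https://github.com/", "https://raw.githubusercontent.com/",
               page_url, fixed = TRUE)
raw_url <- sub("/blob/", "/", raw_url, fixed = TRUE)

raw_url

# Assumes the repository and file still exist at this location:
# MyDataBonus <- read.csv(raw_url, header = TRUE, sep = ",")
```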