Shana Green

R Bridge

HW 2

Due 07/26/20

# loading the data set
frostedflakes<- read.csv("frostedflakes.csv")

1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.

#calculating summary stats for all columns of data set

summary(frostedflakes)

##        X               Lab            IA400      
##  Min.   :  1.00   Min.   :31.60   Min.   :30.80  
##  1st Qu.: 25.75   1st Qu.:36.00   1st Qu.:36.10  
##  Median : 50.50   Median :37.75   Median :38.50  
##  Mean   : 50.50   Mean   :37.60   Mean   :38.22  
##  3rd Qu.: 75.25   3rd Qu.:39.10   3rd Qu.:40.20  
##  Max.   :100.00   Max.   :43.50   Max.   :45.70

#mean

mean(frostedflakes$X, na.rm = TRUE)

## [1] 50.5

mean(frostedflakes$Lab, na.rm = TRUE)

## [1] 37.596

mean(frostedflakes$IA400, na.rm = TRUE)

## [1] 38.218

#median

median(frostedflakes$X, na.rm = TRUE)

## [1] 50.5

median(frostedflakes$Lab, na.rm = TRUE)

## [1] 37.75

median(frostedflakes$IA400, na.rm = TRUE)

## [1] 38.5
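
The separate mean and median calls above can also be collapsed into one pass over the columns. The following is only a minimal sketch: `demo` is a made-up stand-in for frostedflakes (the values, including the `NA`, are invented for illustration), and `sapply()` is one of several ways to do this.

```r
# Hypothetical stand-in for frostedflakes; values are invented for illustration
demo <- data.frame(
  X     = 1:6,
  Lab   = c(36.3, 33.2, 39.0, 37.3, 41.7, NA),
  IA400 = c(35.1, 35.9, 40.1, 35.5, 37.9, 39.5)
)

# sapply() applies the function to every column; na.rm = TRUE drops missing values
col_means   <- sapply(demo, mean,   na.rm = TRUE)
col_medians <- sapply(demo, median, na.rm = TRUE)

# Combine both statistics into one table, one row per column
stats <- data.frame(mean = col_means, median = col_medians)
print(stats)
```

Keeping the results in a data frame makes it easier to compare them later against the same statistics computed on a subset.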

2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.

#Create a subset containing the first 50 rows and three named columns

sweetfrostedflakes<-frostedflakes[1:50,c("X","Lab","IA400")]
View(sweetfrostedflakes)

3. Create new column names for the new data frame.

colnames(sweetfrostedflakes)<-c("Y","Lab2","JA400")
View(sweetfrostedflakes)
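
Steps 2 and 3 can be checked without the interactive viewer. The sketch below is an illustration under stated assumptions, not the code used for this assignment: `demo` and `demo_subset` are hypothetical names with invented values, and the rename is done by old name instead of by position.

```r
# Hypothetical stand-in for frostedflakes; values are invented for illustration
demo <- data.frame(
  X     = 1:6,
  Lab   = c(36.3, 33.2, 39.0, 37.3, 41.7, 38.4),
  IA400 = c(35.1, 35.9, 40.1, 35.5, 37.9, 39.5)
)

# Keep the first 3 rows and two of the columns using bracket indexing
demo_subset <- demo[1:3, c("X", "Lab")]

# Rename by old name rather than by position, so column order does not matter
names(demo_subset)[names(demo_subset) == "Lab"] <- "Lab2"

# Non-interactive checks that also work outside RStudio (unlike View())
dim(demo_subset)
names(demo_subset)
```

Renaming by name is a little longer than assigning a full vector to `colnames()`, but it cannot silently mislabel columns if their order changes.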

4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.

summary(sweetfrostedflakes)

##        Y              Lab2           JA400      
##  Min.   : 1.00   Min.   :33.00   Min.   :30.80  
##  1st Qu.:13.25   1st Qu.:35.85   1st Qu.:36.10  
##  Median :25.50   Median :37.85   Median :38.20  
##  Mean   :25.50   Mean   :37.72   Mean   :38.13  
##  3rd Qu.:37.75   3rd Qu.:39.77   3rd Qu.:39.60  
##  Max.   :50.00   Max.   :43.50   Max.   :45.70

#mean

mean(sweetfrostedflakes$Y, na.rm = TRUE)

## [1] 25.5

mean(sweetfrostedflakes$Lab2, na.rm = TRUE)

## [1] 37.722

mean(sweetfrostedflakes$JA400, na.rm = TRUE)

## [1] 38.13

#median

median(sweetfrostedflakes$Y, na.rm = TRUE)

## [1] 25.5

median(sweetfrostedflakes$Lab2, na.rm = TRUE)

## [1] 37.85

median(sweetfrostedflakes$JA400, na.rm = TRUE)

## [1] 38.2

**frostedflakes$X vs. sweetfrostedflakes$Y**

Mean: frostedflakes$X (50.50) is almost double sweetfrostedflakes$Y (25.50).

Median: frostedflakes$X (50.50) is almost double sweetfrostedflakes$Y (25.50).

The mean and median are the same for frostedflakes$X (50.50).
The mean and median are the same for sweetfrostedflakes$Y (25.50).

This is expected because X appears to be simply the row index (1 to 100 in the full data, 1 to 50 in the subset), so both statistics fall at the midpoint of the range.

**frostedflakes$Lab vs. sweetfrostedflakes$Lab2**

Mean: sweetfrostedflakes$Lab2 (37.72) is slightly higher than frostedflakes$Lab (37.60), by about 0.13 before rounding (37.722 vs. 37.596).

Median: sweetfrostedflakes$Lab2 (37.85) is slightly higher than frostedflakes$Lab (37.75), by 0.10.

**frostedflakes$IA400 vs. sweetfrostedflakes$JA400**

Mean: frostedflakes$IA400 (38.22) is slightly higher than sweetfrostedflakes$JA400 (38.13), by 0.09.

Median: frostedflakes$IA400 (38.50) is slightly higher than sweetfrostedflakes$JA400 (38.20), by 0.30.
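
The differences quoted in these comparisons can be computed directly instead of by hand, which avoids small discrepancies from subtracting already-rounded values. This is a rough sketch with invented data: `full` plays the role of frostedflakes and `half` the role of the subset, and both names are hypothetical.

```r
# Hypothetical stand-ins: `full` plays the role of frostedflakes,
# `half` the role of the row subset; values are invented
full <- data.frame(Lab = c(36.3, 33.2, 39.0, 37.3, 41.7, 38.4))
half <- full[1:3, , drop = FALSE]

# Subset statistic minus full-data statistic; positive means the subset is higher
mean_diff   <- mean(half$Lab)   - mean(full$Lab)
median_diff <- median(half$Lab) - median(full$Lab)

# Round only at the end, for display
round(c(mean = mean_diff, median = median_diff), 3)
```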

5. For at least 3 values in a column please rename so that every value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 would show as “excellent”.

sweetfrostedflakes$Lab2<-replace(sweetfrostedflakes$Lab2, sweetfrostedflakes$Lab2 > 40, "Corn Flakes")

sweetfrostedflakes

##     Y        Lab2 JA400
## 1   1        36.3  35.1
## 2   2        33.2  35.9
## 3   3          39  40.1
## 4   4        37.3  35.5
## 5   5 Corn Flakes  37.9
## 6   6        38.4  39.5
## 7   7        35.8  38.5
## 8   8          36  37.9
## 9   9        37.9  41.2
## 10 10 Corn Flakes  45.7
## 11 11          40  38.3
## 12 12 Corn Flakes  42.3
## 13 13        36.6  39.0
## 14 14        33.7  30.8
## 15 15 Corn Flakes  37.3
## 16 16        38.7  39.5
## 17 17        36.2  40.3
## 18 18 Corn Flakes  42.0
## 19 19        37.8  36.9
## 20 20 Corn Flakes  41.2
## 21 21        38.9  39.3
## 22 22          36  35.6
## 23 23 Corn Flakes  40.9
## 24 24          40  37.6
## 25 25        35.5  35.5
## 26 26        34.3  35.5
## 27 27          33  32.4
## 28 28        36.9  36.1
## 29 29        36.3  36.1
## 30 30        38.5  39.0
## 31 31        35.1  38.5
## 32 32        38.7  40.0
## 33 33          34  35.4
## 34 34 Corn Flakes  40.9
## 35 35 Corn Flakes  39.4
## 36 36        38.2  38.6
## 37 37        38.3  39.6
## 38 38        37.4  39.2
## 39 39        37.5  36.4
## 40 40        36.5  36.1
## 41 41        34.8  38.1
## 42 42        38.1  39.6
## 43 43 Corn Flakes  40.8
## 44 44        35.4  37.4
## 45 45          35  37.6
## 46 46        37.9  36.0
## 47 47        39.1  37.2
## 48 48        33.3  33.0
## 49 49 Corn Flakes  41.9
## 50 50        34.9  37.9

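I chose Lab2 values over 40 and replaced them with the string "Corn Flakes". Because the column now mixes numbers and text, R stores the whole column as character, which is why values such as 39.0 now print as 39.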
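
If the numeric values are still needed afterwards (for example, to recompute the mean), one possible alternative is to leave the column numeric and put the label in a new column. The sketch below uses invented data; `demo`, `Lab2_label`, and the fallback label "Other" are assumptions made for illustration and are not part of the assignment.

```r
# Hypothetical stand-in for sweetfrostedflakes; values are invented
demo <- data.frame(Y = 1:5, Lab2 = c(36.3, 33.2, 39.0, 41.7, 40.0))

# replace() with a string coerces the whole vector to character
as_text <- replace(demo$Lab2, demo$Lab2 > 40, "Corn Flakes")
class(as_text)

# Alternative: leave Lab2 numeric and store the label in a new column
# ("Other" is an assumed placeholder label for the remaining rows)
demo$Lab2_label <- ifelse(demo$Lab2 > 40, "Corn Flakes", "Other")
class(demo$Lab2)
mean(demo$Lab2)
```

Note that the comparison is strict, so a value of exactly 40 is not relabeled in either version.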

6. Display enough rows to see examples of all of steps 1-5 above.

head(sweetfrostedflakes,10)

##     Y        Lab2 JA400
## 1   1        36.3  35.1
## 2   2        33.2  35.9
## 3   3          39  40.1
## 4   4        37.3  35.5
## 5   5 Corn Flakes  37.9
## 6   6        38.4  39.5
## 7   7        35.8  38.5
## 8   8          36  37.9
## 9   9        37.9  41.2
## 10 10 Corn Flakes  45.7

7. BONUS – place the original .csv in a GitHub file and have R read from the link. This will be a very useful skill as you progress in your data science education and career.

frostedflakes<- read.csv("https://raw.githubusercontent.com/sagreen131/R-Week-2/master/frostedflakes.csv")