By Md Forhad Akbar
July 28, 2019
#Load the air quality data set (CSV) from the Rdatasets repository
theUrl <- "https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/datasets/airquality.csv"
nyairquality <- read.table(file = theUrl, header = TRUE, sep = ",")
head(nyairquality)
## X Ozone Solar.R Wind Temp Month Day
## 1 1 41 190 7.4 67 5 1
## 2 2 36 118 8.0 72 5 2
## 3 3 12 149 12.6 74 5 3
## 4 4 18 313 11.5 62 5 4
## 5 5 NA NA 14.3 56 5 5
## 6 6 28 NA 14.9 66 5 6
#Summary Statistics for the main data table, nyairquality
summary(nyairquality)
## X Ozone Solar.R Wind
## Min. : 1 Min. : 1.00 Min. : 7.0 Min. : 1.700
## 1st Qu.: 39 1st Qu.: 18.00 1st Qu.:115.8 1st Qu.: 7.400
## Median : 77 Median : 31.50 Median :205.0 Median : 9.700
## Mean : 77 Mean : 42.13 Mean :185.9 Mean : 9.958
## 3rd Qu.:115 3rd Qu.: 63.25 3rd Qu.:258.8 3rd Qu.:11.500
## Max. :153 Max. :168.00 Max. :334.0 Max. :20.700
## NA's :37 NA's :7
## Temp Month Day
## Min. :56.00 Min. :5.000 Min. : 1.0
## 1st Qu.:72.00 1st Qu.:6.000 1st Qu.: 8.0
## Median :79.00 Median :7.000 Median :16.0
## Mean :77.88 Mean :6.993 Mean :15.8
## 3rd Qu.:85.00 3rd Qu.:8.000 3rd Qu.:23.0
## Max. :97.00 Max. :9.000 Max. :31.0
##
#Mean and Median of column (attribute) "Wind" (from the main table)
mean(nyairquality$Wind)
## [1] 9.957516
median(nyairquality$Wind)
## [1] 9.7
#Mean and Median of column (attribute) "Temp" (from the main table)
mean(nyairquality$Temp)
## [1] 77.88235
median(nyairquality$Temp)
## [1] 79
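Wind and Temp have no missing values, so `mean()` and `median()` work on them directly. The summary above reports NA's for Ozone (37) and Solar.R (7), and for those columns the same calls would return `NA` unless missing values are dropped. A minimal sketch, assuming R's built-in `datasets::airquality` holds the same observations as the downloaded CSV (it lacks only the `X` index column); `aq` is a name chosen here for illustration:

```r
# Assumption: the built-in data set stands in for the downloaded CSV.
aq <- datasets::airquality

# Without na.rm, any NA in the column makes the result NA.
ozone_mean_raw <- mean(aq$Ozone)

# na.rm = TRUE drops missing values before computing the statistic.
ozone_mean   <- mean(aq$Ozone, na.rm = TRUE)
ozone_median <- median(aq$Ozone, na.rm = TRUE)

print(c(raw = ozone_mean_raw, mean = ozone_mean, median = ozone_median))
```

The `na.rm = TRUE` results should agree with the Ozone mean and median that `summary()` prints, since `summary()` also ignores NA's when computing them.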
#Create a new data frame from the first 20 rows and columns 1, 4 and 5 (X, Wind, Temp)
nyairqualityDframe <- data.frame(nyairquality[1:20, c(1, 4:5)])
nyairqualityDframe
## X Wind Temp
## 1 1 7.4 67
## 2 2 8.0 72
## 3 3 12.6 74
## 4 4 11.5 62
## 5 5 14.3 56
## 6 6 14.9 66
## 7 7 8.6 65
## 8 8 13.8 59
## 9 9 20.1 61
## 10 10 8.6 69
## 11 11 6.9 74
## 12 12 9.7 69
## 13 13 9.2 66
## 14 14 10.9 68
## 15 15 13.2 58
## 16 16 11.5 64
## 17 17 12.0 66
## 18 18 18.4 57
## 19 19 11.5 68
## 20 20 9.7 62
###3. Create new column names for the new data frame.
names(nyairqualityDframe) <- c("Sequence", "DFrame.Wind", "DFrame.Temp")
###4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.
#Summary Statistics for the data frame, nyairqualityDframe
summary(nyairqualityDframe)
## Sequence DFrame.Wind DFrame.Temp
## Min. : 1.00 Min. : 6.90 Min. :56.00
## 1st Qu.: 5.75 1st Qu.: 9.05 1st Qu.:61.75
## Median :10.50 Median :11.50 Median :66.00
## Mean :10.50 Mean :11.64 Mean :65.15
## 3rd Qu.:15.25 3rd Qu.:13.35 3rd Qu.:68.25
## Max. :20.00 Max. :20.10 Max. :74.00
#Mean and Median of column (attribute) "Wind" (from the data frame)
mean(nyairqualityDframe$DFrame.Wind)
## [1] 11.64
median(nyairqualityDframe$DFrame.Wind)
## [1] 11.5
#Mean and Median of column (attribute) "Temp" (from the data frame)
mean(nyairqualityDframe$DFrame.Temp)
## [1] 65.15
median(nyairqualityDframe$DFrame.Temp)
## [1] 66
####Comparison Comment 1: I chose two attributes (Wind and Temp), plus the row index X, and the first 20 observations from the main data table nyairquality to create a subset called nyairqualityDframe.
####Comparison Comment 2: The mean and median of Wind in the subset (DFrame.Wind) are 11.64 and 11.50, respectively. Both are higher than those of the main data table (Wind): 9.96 and 9.70, respectively.
####Comparison Comment 3: The mean and median of Temp in the subset (DFrame.Temp) are 65.15 and 66.00, respectively. Both are lower than those of the main data table (Temp): 77.88 and 79.00, respectively.
####Comparison Comment 4: The subset means and medians are clearly biased because I selected only the first 20 rows with no randomization. Those rows all fall in May (days 1 to 20), the earliest month in the data, so they do not represent the full May to September period.
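To illustrate the point about randomization, the sketch below compares the first 20 rows with 20 rows drawn at random. It is only an illustration, not part of the original assignment: it assumes R's built-in `datasets::airquality` matches the downloaded CSV, and the seed value and the names `aq`, `first20`, `random20`, and `comparison` are arbitrary choices.

```r
# Assumption: the built-in data set stands in for the downloaded CSV.
aq   <- datasets::airquality
cols <- c("Wind", "Temp")

# The non-random subset used above: the first 20 rows.
first20 <- aq[1:20, cols]

# A random subset of the same size; set.seed() makes the draw repeatable.
set.seed(42)
random20 <- aq[sample(nrow(aq), 20), cols]

# Column means of each subset next to the full-table means.
comparison <- rbind(
  full      = colMeans(aq[, cols]),
  first_20  = colMeans(first20),
  random_20 = colMeans(random20)
)
print(round(comparison, 2))
```

A single random draw of 20 rows can still differ from the full-table means by chance, but it is not systematically tied to one month the way the first 20 rows are.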
###5. For at least 3 values in a column, please rename so that every value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 would show as “excellent”.
nyairquality$Month[nyairquality$Month == 5] <- "May"
nyairquality$Month[nyairquality$Month == 6] <- "June"
nyairquality$Month[nyairquality$Month == 9] <- "September"
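The three assignments above rename months 5, 6 and 9; the first one also coerces the Month column to character, so months 7 and 8 remain as the strings "7" and "8". As an alternative sketch (not the approach used in this assignment), every month can be renamed in one step with R's built-in `month.name` constant. It assumes the built-in `datasets::airquality` matches the downloaded CSV and works on a copy named `aq`, a name chosen here for illustration:

```r
# Assumption: the built-in data set stands in for the downloaded CSV.
aq <- datasets::airquality

# month.name is a built-in vector c("January", ..., "December");
# indexing it by the numeric month maps 5 -> "May", 6 -> "June", and so on.
# Storing the result as a factor keeps the months in calendar order.
aq$Month <- factor(month.name[aq$Month], levels = month.name[5:9])

# Count the observations per month to confirm every value was renamed.
month_counts <- table(aq$Month)
print(month_counts)
```

This mapping has to run on the original numeric Month column, so it replaces the three assignments rather than following them.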
###6. Display enough rows to see examples of all of steps 1-5 above.
head(nyairquality)
## X Ozone Solar.R Wind Temp Month Day
## 1 1 41 190 7.4 67 May 1
## 2 2 36 118 8.0 72 May 2
## 3 3 12 149 12.6 74 May 3
## 4 4 18 313 11.5 62 May 4
## 5 5 NA NA 14.3 56 May 5
## 6 6 28 NA 14.9 66 May 6
nyairquality[50:55, 1:7]
## X Ozone Solar.R Wind Temp Month Day
## 50 50 12 120 11.5 73 June 19
## 51 51 13 137 10.3 76 June 20
## 52 52 NA 150 6.3 77 June 21
## 53 53 NA 59 1.7 76 June 22
## 54 54 NA 91 4.6 76 June 23
## 55 55 NA 250 6.3 76 June 24
tail(nyairquality)
## X Ozone Solar.R Wind Temp Month Day
## 148 148 14 20 16.6 63 September 25
## 149 149 30 193 6.9 70 September 26
## 150 150 NA 145 13.2 77 September 27
## 151 151 14 191 14.3 75 September 28
## 152 152 18 131 8.0 76 September 29
## 153 153 20 223 11.5 68 September 30