#1. What is a model? “Scientific modelling is a scientific activity, the aim of which is to make a
#particular part or feature of the world easier to understand, define, quantify,
#visualize, or simulate by referencing it to existing and usually commonly accepted knowledge.”
#2. What are the five groups of tasks of modeling in data mining? (i) exploratory data analysis; (ii) dependency modeling; (iii) clustering; (iv) anomaly detection; and (v) predictive analytics.
#3. Typically, what does a data miner do? Searches for interesting, unexpected, and useful relationships
# in a dataset.
#4. Most data mining techniques can be split into two groups. What are those groups? (i) Searching for relationships among the features (columns) describing the cases in a dataset
# (e.g. whenever a patient shows a certain set of symptoms, described by some feature values,
# a doctor diagnoses a certain disease); or (ii) searching for relationships
# among the observations (rows) of the dataset (e.g. a certain subset of products (rows) shows
# a very similar sales pattern across a set of stores, or a certain transaction (a row) is very
# different from all other transactions).
#5. What is the main goal of exploratory data analysis? To provide useful summaries of a dataset that highlight characteristics of the data that users may find useful.
#6. Most datasets have a dimensionality that makes it very difficult for a standard user to inspect the full data and find interesting properties of these data. TRUE or FALSE? TRUE
#7. What are data summaries?
# An overview of key properties of a dataset.
#8. The summarise() function is a function of which package? dplyr
#9. (a) Run the following code. Study the data. Explain the dataset.
library(DMwR2)
## Warning: package 'DMwR2' was built under R version 4.3.3
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.3.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
data(algae)
algae
## # A tibble: 200 × 18
## season size speed mxPH mnO2 Cl NO3 NH4 oPO4 PO4 Chla a1
## <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 winter small medium 8 9.8 60.8 6.24 578 105 170 50 0
## 2 spring small medium 8.35 8 57.8 1.29 370 429. 559. 1.3 1.4
## 3 autumn small medium 8.1 11.4 40.0 5.33 347. 126. 187. 15.6 3.3
## 4 spring small medium 8.07 4.8 77.4 2.30 98.2 61.2 139. 1.4 3.1
## 5 autumn small medium 8.06 9 55.4 10.4 234. 58.2 97.6 10.5 9.2
## 6 winter small high 8.25 13.1 65.8 9.25 430 18.2 56.7 28.4 15.1
## 7 summer small high 8.15 10.3 73.2 1.54 110 61.2 112. 3.2 2.4
## 8 autumn small high 8.05 10.6 59.1 4.99 206. 44.7 77.4 6.9 18.2
## 9 winter small medium 8.7 3.4 22.0 0.886 103. 36.3 71 5.54 25.4
## 10 winter small high 7.93 9.9 8 1.39 5.8 27.2 46.6 0.8 17
## # ℹ 190 more rows
## # ℹ 6 more variables: a2 <dbl>, a3 <dbl>, a4 <dbl>, a5 <dbl>, a6 <dbl>,
## # a7 <dbl>
data(iris)
iris
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## 11 5.4 3.7 1.5 0.2 setosa
## 12 4.8 3.4 1.6 0.2 setosa
## 13 4.8 3.0 1.4 0.1 setosa
## 14 4.3 3.0 1.1 0.1 setosa
## 15 5.8 4.0 1.2 0.2 setosa
## 16 5.7 4.4 1.5 0.4 setosa
## 17 5.4 3.9 1.3 0.4 setosa
## 18 5.1 3.5 1.4 0.3 setosa
## 19 5.7 3.8 1.7 0.3 setosa
## 20 5.1 3.8 1.5 0.3 setosa
## 21 5.4 3.4 1.7 0.2 setosa
## 22 5.1 3.7 1.5 0.4 setosa
## 23 4.6 3.6 1.0 0.2 setosa
## 24 5.1 3.3 1.7 0.5 setosa
## 25 4.8 3.4 1.9 0.2 setosa
## 26 5.0 3.0 1.6 0.2 setosa
## 27 5.0 3.4 1.6 0.4 setosa
## 28 5.2 3.5 1.5 0.2 setosa
## 29 5.2 3.4 1.4 0.2 setosa
## 30 4.7 3.2 1.6 0.2 setosa
## 31 4.8 3.1 1.6 0.2 setosa
## 32 5.4 3.4 1.5 0.4 setosa
## 33 5.2 4.1 1.5 0.1 setosa
## 34 5.5 4.2 1.4 0.2 setosa
## 35 4.9 3.1 1.5 0.2 setosa
## 36 5.0 3.2 1.2 0.2 setosa
## 37 5.5 3.5 1.3 0.2 setosa
## 38 4.9 3.6 1.4 0.1 setosa
## 39 4.4 3.0 1.3 0.2 setosa
## 40 5.1 3.4 1.5 0.2 setosa
## 41 5.0 3.5 1.3 0.3 setosa
## 42 4.5 2.3 1.3 0.3 setosa
## 43 4.4 3.2 1.3 0.2 setosa
## 44 5.0 3.5 1.6 0.6 setosa
## 45 5.1 3.8 1.9 0.4 setosa
## 46 4.8 3.0 1.4 0.3 setosa
## 47 5.1 3.8 1.6 0.2 setosa
## 48 4.6 3.2 1.4 0.2 setosa
## 49 5.3 3.7 1.5 0.2 setosa
## 50 5.0 3.3 1.4 0.2 setosa
## 51 7.0 3.2 4.7 1.4 versicolor
## 52 6.4 3.2 4.5 1.5 versicolor
## 53 6.9 3.1 4.9 1.5 versicolor
## 54 5.5 2.3 4.0 1.3 versicolor
## 55 6.5 2.8 4.6 1.5 versicolor
## 56 5.7 2.8 4.5 1.3 versicolor
## 57 6.3 3.3 4.7 1.6 versicolor
## 58 4.9 2.4 3.3 1.0 versicolor
## 59 6.6 2.9 4.6 1.3 versicolor
## 60 5.2 2.7 3.9 1.4 versicolor
## 61 5.0 2.0 3.5 1.0 versicolor
## 62 5.9 3.0 4.2 1.5 versicolor
## 63 6.0 2.2 4.0 1.0 versicolor
## 64 6.1 2.9 4.7 1.4 versicolor
## 65 5.6 2.9 3.6 1.3 versicolor
## 66 6.7 3.1 4.4 1.4 versicolor
## 67 5.6 3.0 4.5 1.5 versicolor
## 68 5.8 2.7 4.1 1.0 versicolor
## 69 6.2 2.2 4.5 1.5 versicolor
## 70 5.6 2.5 3.9 1.1 versicolor
## 71 5.9 3.2 4.8 1.8 versicolor
## 72 6.1 2.8 4.0 1.3 versicolor
## 73 6.3 2.5 4.9 1.5 versicolor
## 74 6.1 2.8 4.7 1.2 versicolor
## 75 6.4 2.9 4.3 1.3 versicolor
## 76 6.6 3.0 4.4 1.4 versicolor
## 77 6.8 2.8 4.8 1.4 versicolor
## 78 6.7 3.0 5.0 1.7 versicolor
## 79 6.0 2.9 4.5 1.5 versicolor
## 80 5.7 2.6 3.5 1.0 versicolor
## 81 5.5 2.4 3.8 1.1 versicolor
## 82 5.5 2.4 3.7 1.0 versicolor
## 83 5.8 2.7 3.9 1.2 versicolor
## 84 6.0 2.7 5.1 1.6 versicolor
## 85 5.4 3.0 4.5 1.5 versicolor
## 86 6.0 3.4 4.5 1.6 versicolor
## 87 6.7 3.1 4.7 1.5 versicolor
## 88 6.3 2.3 4.4 1.3 versicolor
## 89 5.6 3.0 4.1 1.3 versicolor
## 90 5.5 2.5 4.0 1.3 versicolor
## 91 5.5 2.6 4.4 1.2 versicolor
## 92 6.1 3.0 4.6 1.4 versicolor
## 93 5.8 2.6 4.0 1.2 versicolor
## 94 5.0 2.3 3.3 1.0 versicolor
## 95 5.6 2.7 4.2 1.3 versicolor
## 96 5.7 3.0 4.2 1.2 versicolor
## 97 5.7 2.9 4.2 1.3 versicolor
## 98 6.2 2.9 4.3 1.3 versicolor
## 99 5.1 2.5 3.0 1.1 versicolor
## 100 5.7 2.8 4.1 1.3 versicolor
## 101 6.3 3.3 6.0 2.5 virginica
## 102 5.8 2.7 5.1 1.9 virginica
## 103 7.1 3.0 5.9 2.1 virginica
## 104 6.3 2.9 5.6 1.8 virginica
## 105 6.5 3.0 5.8 2.2 virginica
## 106 7.6 3.0 6.6 2.1 virginica
## 107 4.9 2.5 4.5 1.7 virginica
## 108 7.3 2.9 6.3 1.8 virginica
## 109 6.7 2.5 5.8 1.8 virginica
## 110 7.2 3.6 6.1 2.5 virginica
## 111 6.5 3.2 5.1 2.0 virginica
## 112 6.4 2.7 5.3 1.9 virginica
## 113 6.8 3.0 5.5 2.1 virginica
## 114 5.7 2.5 5.0 2.0 virginica
## 115 5.8 2.8 5.1 2.4 virginica
## 116 6.4 3.2 5.3 2.3 virginica
## 117 6.5 3.0 5.5 1.8 virginica
## 118 7.7 3.8 6.7 2.2 virginica
## 119 7.7 2.6 6.9 2.3 virginica
## 120 6.0 2.2 5.0 1.5 virginica
## 121 6.9 3.2 5.7 2.3 virginica
## 122 5.6 2.8 4.9 2.0 virginica
## 123 7.7 2.8 6.7 2.0 virginica
## 124 6.3 2.7 4.9 1.8 virginica
## 125 6.7 3.3 5.7 2.1 virginica
## 126 7.2 3.2 6.0 1.8 virginica
## 127 6.2 2.8 4.8 1.8 virginica
## 128 6.1 3.0 4.9 1.8 virginica
## 129 6.4 2.8 5.6 2.1 virginica
## 130 7.2 3.0 5.8 1.6 virginica
## 131 7.4 2.8 6.1 1.9 virginica
## 132 7.9 3.8 6.4 2.0 virginica
## 133 6.4 2.8 5.6 2.2 virginica
## 134 6.3 2.8 5.1 1.5 virginica
## 135 6.1 2.6 5.6 1.4 virginica
## 136 7.7 3.0 6.1 2.3 virginica
## 137 6.3 3.4 5.6 2.4 virginica
## 138 6.4 3.1 5.5 1.8 virginica
## 139 6.0 3.0 4.8 1.8 virginica
## 140 6.9 3.1 5.4 2.1 virginica
## 141 6.7 3.1 5.6 2.4 virginica
## 142 6.9 3.1 5.1 2.3 virginica
## 143 5.8 2.7 5.1 1.9 virginica
## 144 6.8 3.2 5.9 2.3 virginica
## 145 6.7 3.3 5.7 2.5 virginica
## 146 6.7 3.0 5.2 2.3 virginica
## 147 6.3 2.5 5.0 1.9 virginica
## 148 6.5 3.0 5.2 2.0 virginica
## 149 6.2 3.4 5.4 2.3 virginica
## 150 5.9 3.0 5.1 1.8 virginica
summary(algae)
## season size speed mxPH mnO2
## autumn:40 large :45 high :84 Min. :5.600 Min. : 1.500
## spring:53 medium:84 low :33 1st Qu.:7.700 1st Qu.: 7.725
## summer:45 small :71 medium:83 Median :8.060 Median : 9.800
## winter:62 Mean :8.012 Mean : 9.118
## 3rd Qu.:8.400 3rd Qu.:10.800
## Max. :9.700 Max. :13.400
## NA's :1 NA's :2
## Cl NO3 NH4 oPO4
## Min. : 0.222 Min. : 0.050 Min. : 5.00 Min. : 1.00
## 1st Qu.: 10.981 1st Qu.: 1.296 1st Qu.: 38.33 1st Qu.: 15.70
## Median : 32.730 Median : 2.675 Median : 103.17 Median : 40.15
## Mean : 43.636 Mean : 3.282 Mean : 501.30 Mean : 73.59
## 3rd Qu.: 57.824 3rd Qu.: 4.446 3rd Qu.: 226.95 3rd Qu.: 99.33
## Max. :391.500 Max. :45.650 Max. :24064.00 Max. :564.60
## NA's :10 NA's :2 NA's :2 NA's :2
## PO4 Chla a1 a2
## Min. : 1.00 Min. : 0.200 Min. : 0.00 Min. : 0.000
## 1st Qu.: 41.38 1st Qu.: 2.000 1st Qu.: 1.50 1st Qu.: 0.000
## Median :103.29 Median : 5.475 Median : 6.95 Median : 3.000
## Mean :137.88 Mean : 13.971 Mean :16.92 Mean : 7.458
## 3rd Qu.:213.75 3rd Qu.: 18.308 3rd Qu.:24.80 3rd Qu.:11.375
## Max. :771.60 Max. :110.456 Max. :89.80 Max. :72.600
## NA's :2 NA's :12
## a3 a4 a5 a6
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 1.550 Median : 0.000 Median : 1.900 Median : 0.000
## Mean : 4.309 Mean : 1.992 Mean : 5.064 Mean : 5.964
## 3rd Qu.: 4.925 3rd Qu.: 2.400 3rd Qu.: 7.500 3rd Qu.: 6.925
## Max. :42.800 Max. :44.600 Max. :44.400 Max. :77.600
##
## a7
## Min. : 0.000
## 1st Qu.: 0.000
## Median : 1.000
## Mean : 2.495
## 3rd Qu.: 2.400
## Max. :31.600
##
summary(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
## Species
## setosa :50
## versicolor:50
## virginica :50
##
##
##
#(b) What is the algae dataset about? This dataset contains 200 water samples with observations on 11 variables as well as the concentration levels of 7 harmful algae (a1 to a7). Values were measured in several European rivers. The 11 predictor variables include 3 contextual variables (season, size and speed) describing the water sample, plus 8 chemical concentration measurements.
#(c) Explain the characteristics of the iris dataset. This famous (Fisher’s or Anderson’s) iris dataset gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris (150 rows in total). The species are Iris setosa, versicolor, and virginica.
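#(d) Did you discover any correlation between any pair of features in the iris dataset? Yes. Petal length and petal width are very strongly correlated (r ≈ 0.96), and sepal length is strongly correlated with both petal length (r ≈ 0.87) and petal width (r ≈ 0.82). Sepal width is only weakly to moderately, and negatively, correlated with the other features (r between -0.43 and -0.12).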
cor(iris[, 1:4])
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length 1.0000000 -0.1175698 0.8717538 0.8179411
## Sepal.Width -0.1175698 1.0000000 -0.4284401 -0.3661259
## Petal.Length 0.8717538 -0.4284401 1.0000000 0.9628654
## Petal.Width 0.8179411 -0.3661259 0.9628654 1.0000000
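#Rather than scanning the correlation matrix by eye, the strongest pair can be picked out programmatically. A minimal base-R sketch; the names cm, cm_upper and strongest are illustrative and not part of the assignment:

```r
# Correlation matrix of the four numeric iris features
cm <- cor(iris[, 1:4])

# Keep each pair once: blank out the diagonal and the lower triangle
cm_upper <- cm
cm_upper[!upper.tri(cm_upper)] <- NA

# Locate the entry with the largest absolute correlation
idx <- which(abs(cm_upper) == max(abs(cm_upper), na.rm = TRUE), arr.ind = TRUE)
strongest <- c(rownames(cm)[idx[1, "row"]], colnames(cm)[idx[1, "col"]])

print(strongest)
print(cm[idx[1, "row"], idx[1, "col"]])
```

#Using abs() means a strong negative correlation would also be found, which matters for features like Sepal.Width.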
#10. What does the summarise() function do? It is used to apply any function that produces a scalar value to any column of a data frame table.
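#A minimal sketch of summarise() on iris, assuming dplyr is installed; the output names avgPL, sdPL and n are illustrative. Every argument reduces a whole column to one value, so the result has a single row:

```r
library(dplyr)

# Each expression returns one value (a scalar) computed over the whole column
res <- summarise(iris,
                 avgPL = mean(Petal.Length),  # average petal length
                 sdPL  = sd(Petal.Length),    # its standard deviation
                 n     = n())                 # number of rows summarised
print(res)
```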
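#11. We can use the functions summarise_each() and funs() to perform what kind of task? Applying a set of functions to all columns of a data frame table. (Both functions are deprecated in current versions of dplyr; across() inside summarise() is the recommended replacement.)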
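#A sketch of the modern equivalent of summarise_each()/funs(), assuming dplyr 1.0.0 or later. across() applies a named list of functions to a range of columns; by default the results are named "{column}_{function}":

```r
library(dplyr)

# Apply mean and sd to every numeric column (Sepal.Length through Petal.Width)
res <- summarise(iris,
                 across(Sepal.Length:Petal.Width, list(avg = mean, sd = sd)))
print(names(res))   # e.g. Sepal.Length_avg, Sepal.Length_sd, ...
print(res)
```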
#12. What is the task of the group_by() function? In which package is this function included? It can be used to form sub-groups of a dataset using all combinations of the values of one or more nominal variables. It is found in dplyr.
#13. Which functions would you use to study potential differences among the sub-groups? group_by() followed by summarise().
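#A minimal sketch of the group_by() + summarise() pattern on iris, assuming dplyr is installed (avgPL and medPL are illustrative names). Each species gets its own summary row, which makes differences between the sub-groups easy to compare:

```r
library(dplyr)

# Split the rows by Species, then compute one summary row per group
by_species <- iris %>%
  group_by(Species) %>%
  summarise(avgPL = mean(Petal.Length),
            medPL = median(Petal.Length))
print(by_species)
```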
#14. The top algorithm/code chunk on page 90 (Code 4) gives us a way to create a function to obtain the mode of a variable. Go through this algorithm. Now, replace
#“algae$mxPH” with “iris$Sepal.Length” and “algae$season” with “iris$Petal.Length”.
Mode <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]                     # optionally drop missing values
  ux <- unique(x)                                  # distinct values
  return(ux[which.max(tabulate(match(x, ux)))])    # most frequent value (first one on ties)
}
Mode(algae$mxPH, na.rm = TRUE)
## [1] 8
Mode(algae$season)
## [1] winter
## Levels: autumn spring summer winter
Mode(iris$Sepal.Length, na.rm=TRUE)
## [1] 5
Mode(iris$Petal.Length)
## [1] 1.4
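#In iris$Petal.Length the values 1.4 and 1.5 each occur 13 times, so Mode() returns 1.4 only because it appears first (which.max() picks the first maximum). A hypothetical variant, ModeAll(), that reports every tied value:

```r
# Hypothetical helper: return all most-frequent values, not just the first
ModeAll <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  ux <- unique(x)                    # distinct values, in order of appearance
  counts <- tabulate(match(x, ux))   # frequency of each distinct value
  ux[counts == max(counts)]          # every value tied for the highest count
}

ModeAll(iris$Petal.Length)   # 1.4 and 1.5 (13 occurrences each)
ModeAll(iris$Sepal.Length)
```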
#Copy and/or take a screenshot of your results for both and include them in this assignment (I only need the first 20 to 40 rows of each sub-group).
iris$Sepal.Length
## [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
## [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
## [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
## [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
## [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
## [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
## [109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
## [127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
## [145] 6.7 6.7 6.3 6.5 6.2 5.9
iris$Petal.Length
## [1] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 1.5 1.6 1.4 1.1 1.2 1.5 1.3 1.4
## [19] 1.7 1.5 1.7 1.5 1.0 1.7 1.9 1.6 1.6 1.5 1.4 1.6 1.6 1.5 1.5 1.4 1.5 1.2
## [37] 1.3 1.4 1.3 1.5 1.3 1.3 1.3 1.6 1.9 1.4 1.6 1.4 1.5 1.4 4.7 4.5 4.9 4.0
## [55] 4.6 4.5 4.7 3.3 4.6 3.9 3.5 4.2 4.0 4.7 3.6 4.4 4.5 4.1 4.5 3.9 4.8 4.0
## [73] 4.9 4.7 4.3 4.4 4.8 5.0 4.5 3.5 3.8 3.7 3.9 5.1 4.5 4.5 4.7 4.4 4.1 4.0
## [91] 4.4 4.6 4.0 3.3 4.2 4.2 4.2 4.3 3.0 4.1 6.0 5.1 5.9 5.6 5.8 6.6 4.5 6.3
## [109] 5.8 6.1 5.1 5.3 5.5 5.0 5.1 5.3 5.5 6.7 6.9 5.0 5.7 4.9 6.7 4.9 5.7 6.0
## [127] 4.8 4.9 5.6 5.8 6.1 6.4 5.6 5.1 5.6 6.1 5.6 5.5 4.8 5.4 5.6 5.1 5.1 5.9
## [145] 5.7 5.2 5.0 5.2 5.4 5.1
#15. Explain the centralValue() function (package DMwR2). What does it do? It is used to obtain the most adequate statistic of centrality for a given sample of values: it returns the median for numeric variables and the mode for nominal variables.
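#A base-R sketch of the behaviour described above, so it runs without DMwR2. central_value_sketch() is a hypothetical name and this is not the DMwR2 source code:

```r
# Hypothetical re-implementation: median for numeric data, mode otherwise
central_value_sketch <- function(x, ...) {
  if (is.numeric(x)) {
    median(x, ...)                   # robust centrality for numeric variables
  } else {
    x <- as.factor(x)
    levels(x)[which.max(table(x))]   # most frequent level for nominal variables
  }
}

central_value_sketch(iris$Sepal.Length)
central_value_sketch(factor(c("low", "high", "high")))
central_value_sketch(c(1, NA, 3), na.rm = TRUE)
```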
#16 (a) Explain the inter-quartile range (IQR). It is the range of the interval that contains the 50% most central values of a continuous variable, computed as the 3rd quartile minus the 1st quartile (Q3 - Q1).
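#A short check that the IQR is simply Q3 - Q1, using stats::IQR() and quantile() with their default settings on iris$Sepal.Length:

```r
q <- quantile(iris$Sepal.Length, c(0.25, 0.75))   # 1st and 3rd quartiles
iqr_manual  <- unname(q[2] - q[1])                 # Q3 - Q1 by hand
iqr_builtin <- IQR(iris$Sepal.Length)              # same value from stats::IQR
c(manual = iqr_manual, builtin = iqr_builtin)
```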
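#17. Which measure of spread, or variability, is more susceptible to outliers? The range (maximum - minimum), because it depends only on the two most extreme values; the IQR is much more robust.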
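#A small experiment illustrating the answer: appending one artificial, clearly erroneous value (50, an assumption chosen only for illustration) to iris$Sepal.Length changes the range drastically while the IQR barely moves:

```r
x     <- iris$Sepal.Length
x_out <- c(x, 50)   # add a single artificial outlier

# Range width and IQR for a numeric vector
spread <- function(v) c(range = diff(range(v)), IQR = IQR(v))

rbind(original = spread(x), with_outlier = spread(x_out))
```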
#18 (a) Using the iris dataset, obtain the quantiles of the length variables (Sepal.Length and Petal.Length) by Species.
#library(dplyr)
data(iris)
group_by(iris,Species) %>% summarise(qs=quantile(Sepal.Length))
## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
## always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `summarise()` has grouped output by 'Species'. You can override using the
## `.groups` argument.
## # A tibble: 15 × 2
## # Groups: Species [3]
## Species qs
## <fct> <dbl>
## 1 setosa 4.3
## 2 setosa 4.8
## 3 setosa 5
## 4 setosa 5.2
## 5 setosa 5.8
## 6 versicolor 4.9
## 7 versicolor 5.6
## 8 versicolor 5.9
## 9 versicolor 6.3
## 10 versicolor 7
## 11 virginica 4.9
## 12 virginica 6.22
## 13 virginica 6.5
## 14 virginica 6.9
## 15 virginica 7.9
library(dplyr)
data(iris)
group_by(iris,Species) %>% summarise(qs=quantile(Petal.Length))
## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
## always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `summarise()` has grouped output by 'Species'. You can override using the
## `.groups` argument.
## # A tibble: 15 × 2
## # Groups: Species [3]
## Species qs
## <fct> <dbl>
## 1 setosa 1
## 2 setosa 1.4
## 3 setosa 1.5
## 4 setosa 1.58
## 5 setosa 1.9
## 6 versicolor 3
## 7 versicolor 4
## 8 versicolor 4.35
## 9 versicolor 4.6
## 10 versicolor 5.1
## 11 virginica 4.5
## 12 virginica 5.1
## 13 virginica 5.55
## 14 virginica 5.88
## 15 virginica 6.9
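#The deprecation warnings above come from summarise() returning 5 rows per group. A sketch of the replacement the warning suggests, assuming dplyr 1.1.0 or later; probs is an illustrative helper vector. Note that reframe() always returns an ungrouped data frame:

```r
library(dplyr)   # reframe() needs dplyr >= 1.1.0

probs <- c(0, 0.25, 0.5, 0.75, 1)   # the quantiles to report

# One row per (Species, probability) pair; result is ungrouped
qs <- iris %>%
  group_by(Species) %>%
  reframe(prob = probs,
          qs   = quantile(Sepal.Length, probs))
print(qs)
```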
# aggregate() is part of base R's stats package, which R attaches by default; no extra packages are needed.
data("iris")
x <- aggregate(iris$Sepal.Length, list(Species=iris$Species), quantile)
print(x)
## Species x.0% x.25% x.50% x.75% x.100%
## 1 setosa 4.300 4.800 5.000 5.200 5.800
## 2 versicolor 4.900 5.600 5.900 6.300 7.000
## 3 virginica 4.900 6.225 6.500 6.900 7.900
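#20 (a) What are “pipes”? Operators that pass the result of one expression as the first argument of the next function call, so that multiple operations can be chained together in a concise, left-to-right way (e.g. %>% from magrittr, re-exported by dplyr, or R’s native |> available since R 4.1.0).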
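#A sketch comparing a nested call with the same computation written with R’s native pipe (assumes R 4.1.0 or later). Both compute the mean petal length of setosa flowers; the pipe version reads left to right:

```r
# Nested version: read from the inside out
nested <- round(mean(getElement(subset(iris, Species == "setosa"), "Petal.Length")), 2)

# Piped version: each result becomes the first argument of the next call
piped <- iris |>
  subset(Species == "setosa") |>
  getElement("Petal.Length") |>
  mean() |>
  round(2)

c(nested = nested, piped = piped)
```

#With dplyr loaded, %>% could replace |> here without changing the result.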
#Use other resources (Google, …) to help you answer these questions.
#21. In Code 9, the second chunk of code from the top of page 92, interpret
#“Species = iris$Species”, which is in the second argument of the aggregate() function. What does it all mean? It is a named list that defines the grouping: the rows are split by the levels of iris$Species, and the list name Species becomes the name of the grouping column in the result.
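#22 In Code 10, the third chunk of code from the top of p. 92, interpret all three arguments of the aggregate() function. What do they all mean? Sepal.Length ~ Species is a formula: summarize Sepal.Length (left-hand side) separately for each level of Species (right-hand side); data = iris is the dataset in which those variables are looked up; quantile is the function (FUN) applied to each group.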
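#A sketch contrasting the two aggregate() interfaces from questions 21 and 22 (by_list and by_formula are illustrative names). Both give the same quantiles; they differ only in how the columns are named:

```r
# Default interface: values, a named grouping list, and the function
by_list <- aggregate(iris$Sepal.Length, by = list(Species = iris$Species), FUN = quantile)

# Formula interface: response ~ grouping variable, looked up in data
by_formula <- aggregate(Sepal.Length ~ Species, data = iris, FUN = quantile)

names(by_list)      # "Species" "x" (the list name became the column name)
names(by_formula)   # "Species" "Sepal.Length"
by_formula
```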
#23 In some datasets a column (or a feature, or a variable) may contain symbols such as “?” in some of its rows (look at Section 3.3.1.4 on pp. 60 and 61). If we use the class() function on that column, we are sure to get the column labeled as “factor.” However, assume we want this column to be labeled “integer.” Which function can we use to parse a column, or a vector of values, from “factors” to “integers”? # as.integer(as.character(x)). Calling as.integer() directly on a factor returns its internal level codes rather than the original values, and entries such as “?” become NA (with a warning). For character vectors, readr::parse_integer() is an alternative.
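#A small base-R illustration of the factor pitfall, using a hypothetical column f that contains “?” as a missing-value marker:

```r
# Hypothetical column read as a factor because of the "?" entry
f <- factor(c("10", "?", "3", "10"))

as.integer(f)   # internal level codes, NOT the original numbers

# Convert via character first; "?" cannot be parsed and becomes NA
vals <- suppressWarnings(as.integer(as.character(f)))
vals
class(vals)
```

#When reading the file, passing na.strings = "?" to read.csv() avoids the problem at the source, since “?” is then read as NA directly.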
#24 (a) What is the following code used for?
#It uses apply() to count the number of missing values in each row, then sum() to obtain the total number of missing values in the algae dataset.
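#A sketch of the technique described in the answer, on a small hypothetical data frame (df) rather than the assignment’s own code, which is not reproduced here:

```r
# Hypothetical toy data with a few missing values
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6), c = c(7, 8, 9))

na_per_row <- apply(df, 1, function(r) sum(is.na(r)))   # NAs in each row
na_total   <- sum(na_per_row)                           # NAs in the whole table

na_per_row
na_total
sum(is.na(df))   # same total in one step
```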
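#25 (a) What method is used to detect a univariate outlier?
# The boxplot rule.
#(b) What does that method state? An observation is considered an outlier if it lies below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR; such values are drawn as individual points beyond the whiskers of a boxplot.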
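#A minimal sketch of the boxplot rule using quantile(); boxplot_outliers() is a hypothetical helper. Note that boxplot.stats() uses Tukey’s hinges (fivenum()) rather than quantile(), so on other data its limits can differ slightly:

```r
# Flag values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]
boxplot_outliers <- function(x) {
  q   <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}

out <- boxplot_outliers(iris$Sepal.Width)
which(out)                # row numbers of the flagged flowers
iris$Sepal.Width[out]     # their sepal widths
```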
#26 What sort of results does the summary() function yield when applied to a dataset? For each numeric variable, the minimum, 1st quartile, median, mean, 3rd quartile and maximum; for each factor, the frequency of each level; and, when present, the number of missing values (NA's).
#Provides a global summary of the data (from package Hmisc).
#27 (b) Which package contains the function describe()? Hmisc.
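#28 Give a definition of the term parse. To parse is to analyze a string (such as text or source code) to identify its structure. In R, parse() converts character strings into an unevaluated expression object, which can then be evaluated with eval().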
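#A short base-R illustration of parse(): the string is turned into a structured expression first, and only evaluated when eval() is called (e is an illustrative name):

```r
e <- parse(text = "mean(c(1, 2, 3)) + 1")   # string -> expression object

class(e)       # "expression"
e[[1]]         # the parsed call, not yet evaluated
is.call(e[[1]])
eval(e[[1]])   # 3
```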