#1.    What is a model? “Scientific modelling is a scientific activity, the aim of which is to make a 
#particular part or feature of the world easier to understand, define, quantify,
#visualize, or simulate by referencing it to existing and usually commonly accepted knowledge.”

#2.    What are the five groups of tasks of modeling in data mining? (i) exploratory data analysis; (ii) dependency modeling; (iii) clustering; (iv) anomaly detection; and (v) predictive analytics.


#3    Typically, what does a data miner do? Search for interesting, unexpected, and useful relationships
# in a dataset

#4.    Most data mining techniques can be bifurcated into groups. What are those techniques? (i) searching for relationships among the features (columns) describing the cases in a dataset
# (e.g. anytime some patient shows some set of symptoms, described by some feature values,
# the diagnostic of a medical doctor is a certain disease); or (ii) searching for relationships
# among the observations (rows) of the dataset (e.g. a certain subset of products (rows) show
# a very similar sales pattern across a set of stores; or a certain transaction (a row) is very
# different from all other transactions).

#5.    What is a main goal of exploratory data analysis?  To provide useful summaries of a dataset that highlight some characteristics of the data that the users may find useful

#6.    Most datasets have a dimensionality that makes it very difficult for a standard user to inspect the full data and find interesting properties of these data. TRUE or FALSE? TRUE

#7.    What are data summaries?
# An overview of key properties of a dataset.

#8.    The summarise() function is a function of which package? dplyr

#9.    (a) Run the following code (below).  Study the data.  Explain the dataset.

library (DMwR2)

## Warning: package 'DMwR2' was built under R version 4.3.3

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library (dplyr)

## Warning: package 'dplyr' was built under R version 4.3.3

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

data (algae)
algae

## # A tibble: 200 × 18
##    season size  speed   mxPH  mnO2    Cl    NO3   NH4  oPO4   PO4  Chla    a1
##    <fct>  <fct> <fct>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 winter small medium  8      9.8  60.8  6.24  578   105   170   50      0  
##  2 spring small medium  8.35   8    57.8  1.29  370   429.  559.   1.3    1.4
##  3 autumn small medium  8.1   11.4  40.0  5.33  347.  126.  187.  15.6    3.3
##  4 spring small medium  8.07   4.8  77.4  2.30   98.2  61.2 139.   1.4    3.1
##  5 autumn small medium  8.06   9    55.4 10.4   234.   58.2  97.6 10.5    9.2
##  6 winter small high    8.25  13.1  65.8  9.25  430    18.2  56.7 28.4   15.1
##  7 summer small high    8.15  10.3  73.2  1.54  110    61.2 112.   3.2    2.4
##  8 autumn small high    8.05  10.6  59.1  4.99  206.   44.7  77.4  6.9   18.2
##  9 winter small medium  8.7    3.4  22.0  0.886 103.   36.3  71    5.54  25.4
## 10 winter small high    7.93   9.9   8    1.39    5.8  27.2  46.6  0.8   17  
## # ℹ 190 more rows
## # ℹ 6 more variables: a2 <dbl>, a3 <dbl>, a4 <dbl>, a5 <dbl>, a6 <dbl>,
## #   a7 <dbl>

data (iris)
iris

##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1            5.1         3.5          1.4         0.2     setosa
## 2            4.9         3.0          1.4         0.2     setosa
## 3            4.7         3.2          1.3         0.2     setosa
## 4            4.6         3.1          1.5         0.2     setosa
## 5            5.0         3.6          1.4         0.2     setosa
## 6            5.4         3.9          1.7         0.4     setosa
## 7            4.6         3.4          1.4         0.3     setosa
## 8            5.0         3.4          1.5         0.2     setosa
## 9            4.4         2.9          1.4         0.2     setosa
## 10           4.9         3.1          1.5         0.1     setosa
## 11           5.4         3.7          1.5         0.2     setosa
## 12           4.8         3.4          1.6         0.2     setosa
## 13           4.8         3.0          1.4         0.1     setosa
## 14           4.3         3.0          1.1         0.1     setosa
## 15           5.8         4.0          1.2         0.2     setosa
## 16           5.7         4.4          1.5         0.4     setosa
## 17           5.4         3.9          1.3         0.4     setosa
## 18           5.1         3.5          1.4         0.3     setosa
## 19           5.7         3.8          1.7         0.3     setosa
## 20           5.1         3.8          1.5         0.3     setosa
## 21           5.4         3.4          1.7         0.2     setosa
## 22           5.1         3.7          1.5         0.4     setosa
## 23           4.6         3.6          1.0         0.2     setosa
## 24           5.1         3.3          1.7         0.5     setosa
## 25           4.8         3.4          1.9         0.2     setosa
## 26           5.0         3.0          1.6         0.2     setosa
## 27           5.0         3.4          1.6         0.4     setosa
## 28           5.2         3.5          1.5         0.2     setosa
## 29           5.2         3.4          1.4         0.2     setosa
## 30           4.7         3.2          1.6         0.2     setosa
## 31           4.8         3.1          1.6         0.2     setosa
## 32           5.4         3.4          1.5         0.4     setosa
## 33           5.2         4.1          1.5         0.1     setosa
## 34           5.5         4.2          1.4         0.2     setosa
## 35           4.9         3.1          1.5         0.2     setosa
## 36           5.0         3.2          1.2         0.2     setosa
## 37           5.5         3.5          1.3         0.2     setosa
## 38           4.9         3.6          1.4         0.1     setosa
## 39           4.4         3.0          1.3         0.2     setosa
## 40           5.1         3.4          1.5         0.2     setosa
## 41           5.0         3.5          1.3         0.3     setosa
## 42           4.5         2.3          1.3         0.3     setosa
## 43           4.4         3.2          1.3         0.2     setosa
## 44           5.0         3.5          1.6         0.6     setosa
## 45           5.1         3.8          1.9         0.4     setosa
## 46           4.8         3.0          1.4         0.3     setosa
## 47           5.1         3.8          1.6         0.2     setosa
## 48           4.6         3.2          1.4         0.2     setosa
## 49           5.3         3.7          1.5         0.2     setosa
## 50           5.0         3.3          1.4         0.2     setosa
## 51           7.0         3.2          4.7         1.4 versicolor
## 52           6.4         3.2          4.5         1.5 versicolor
## 53           6.9         3.1          4.9         1.5 versicolor
## 54           5.5         2.3          4.0         1.3 versicolor
## 55           6.5         2.8          4.6         1.5 versicolor
## 56           5.7         2.8          4.5         1.3 versicolor
## 57           6.3         3.3          4.7         1.6 versicolor
## 58           4.9         2.4          3.3         1.0 versicolor
## 59           6.6         2.9          4.6         1.3 versicolor
## 60           5.2         2.7          3.9         1.4 versicolor
## 61           5.0         2.0          3.5         1.0 versicolor
## 62           5.9         3.0          4.2         1.5 versicolor
## 63           6.0         2.2          4.0         1.0 versicolor
## 64           6.1         2.9          4.7         1.4 versicolor
## 65           5.6         2.9          3.6         1.3 versicolor
## 66           6.7         3.1          4.4         1.4 versicolor
## 67           5.6         3.0          4.5         1.5 versicolor
## 68           5.8         2.7          4.1         1.0 versicolor
## 69           6.2         2.2          4.5         1.5 versicolor
## 70           5.6         2.5          3.9         1.1 versicolor
## 71           5.9         3.2          4.8         1.8 versicolor
## 72           6.1         2.8          4.0         1.3 versicolor
## 73           6.3         2.5          4.9         1.5 versicolor
## 74           6.1         2.8          4.7         1.2 versicolor
## 75           6.4         2.9          4.3         1.3 versicolor
## 76           6.6         3.0          4.4         1.4 versicolor
## 77           6.8         2.8          4.8         1.4 versicolor
## 78           6.7         3.0          5.0         1.7 versicolor
## 79           6.0         2.9          4.5         1.5 versicolor
## 80           5.7         2.6          3.5         1.0 versicolor
## 81           5.5         2.4          3.8         1.1 versicolor
## 82           5.5         2.4          3.7         1.0 versicolor
## 83           5.8         2.7          3.9         1.2 versicolor
## 84           6.0         2.7          5.1         1.6 versicolor
## 85           5.4         3.0          4.5         1.5 versicolor
## 86           6.0         3.4          4.5         1.6 versicolor
## 87           6.7         3.1          4.7         1.5 versicolor
## 88           6.3         2.3          4.4         1.3 versicolor
## 89           5.6         3.0          4.1         1.3 versicolor
## 90           5.5         2.5          4.0         1.3 versicolor
## 91           5.5         2.6          4.4         1.2 versicolor
## 92           6.1         3.0          4.6         1.4 versicolor
## 93           5.8         2.6          4.0         1.2 versicolor
## 94           5.0         2.3          3.3         1.0 versicolor
## 95           5.6         2.7          4.2         1.3 versicolor
## 96           5.7         3.0          4.2         1.2 versicolor
## 97           5.7         2.9          4.2         1.3 versicolor
## 98           6.2         2.9          4.3         1.3 versicolor
## 99           5.1         2.5          3.0         1.1 versicolor
## 100          5.7         2.8          4.1         1.3 versicolor
## 101          6.3         3.3          6.0         2.5  virginica
## 102          5.8         2.7          5.1         1.9  virginica
## 103          7.1         3.0          5.9         2.1  virginica
## 104          6.3         2.9          5.6         1.8  virginica
## 105          6.5         3.0          5.8         2.2  virginica
## 106          7.6         3.0          6.6         2.1  virginica
## 107          4.9         2.5          4.5         1.7  virginica
## 108          7.3         2.9          6.3         1.8  virginica
## 109          6.7         2.5          5.8         1.8  virginica
## 110          7.2         3.6          6.1         2.5  virginica
## 111          6.5         3.2          5.1         2.0  virginica
## 112          6.4         2.7          5.3         1.9  virginica
## 113          6.8         3.0          5.5         2.1  virginica
## 114          5.7         2.5          5.0         2.0  virginica
## 115          5.8         2.8          5.1         2.4  virginica
## 116          6.4         3.2          5.3         2.3  virginica
## 117          6.5         3.0          5.5         1.8  virginica
## 118          7.7         3.8          6.7         2.2  virginica
## 119          7.7         2.6          6.9         2.3  virginica
## 120          6.0         2.2          5.0         1.5  virginica
## 121          6.9         3.2          5.7         2.3  virginica
## 122          5.6         2.8          4.9         2.0  virginica
## 123          7.7         2.8          6.7         2.0  virginica
## 124          6.3         2.7          4.9         1.8  virginica
## 125          6.7         3.3          5.7         2.1  virginica
## 126          7.2         3.2          6.0         1.8  virginica
## 127          6.2         2.8          4.8         1.8  virginica
## 128          6.1         3.0          4.9         1.8  virginica
## 129          6.4         2.8          5.6         2.1  virginica
## 130          7.2         3.0          5.8         1.6  virginica
## 131          7.4         2.8          6.1         1.9  virginica
## 132          7.9         3.8          6.4         2.0  virginica
## 133          6.4         2.8          5.6         2.2  virginica
## 134          6.3         2.8          5.1         1.5  virginica
## 135          6.1         2.6          5.6         1.4  virginica
## 136          7.7         3.0          6.1         2.3  virginica
## 137          6.3         3.4          5.6         2.4  virginica
## 138          6.4         3.1          5.5         1.8  virginica
## 139          6.0         3.0          4.8         1.8  virginica
## 140          6.9         3.1          5.4         2.1  virginica
## 141          6.7         3.1          5.6         2.4  virginica
## 142          6.9         3.1          5.1         2.3  virginica
## 143          5.8         2.7          5.1         1.9  virginica
## 144          6.8         3.2          5.9         2.3  virginica
## 145          6.7         3.3          5.7         2.5  virginica
## 146          6.7         3.0          5.2         2.3  virginica
## 147          6.3         2.5          5.0         1.9  virginica
## 148          6.5         3.0          5.2         2.0  virginica
## 149          6.2         3.4          5.4         2.3  virginica
## 150          5.9         3.0          5.1         1.8  virginica

summary (algae)

##     season       size       speed         mxPH            mnO2       
##  autumn:40   large :45   high  :84   Min.   :5.600   Min.   : 1.500  
##  spring:53   medium:84   low   :33   1st Qu.:7.700   1st Qu.: 7.725  
##  summer:45   small :71   medium:83   Median :8.060   Median : 9.800  
##  winter:62                           Mean   :8.012   Mean   : 9.118  
##                                      3rd Qu.:8.400   3rd Qu.:10.800  
##                                      Max.   :9.700   Max.   :13.400  
##                                      NA's   :1       NA's   :2       
##        Cl               NO3              NH4                oPO4       
##  Min.   :  0.222   Min.   : 0.050   Min.   :    5.00   Min.   :  1.00  
##  1st Qu.: 10.981   1st Qu.: 1.296   1st Qu.:   38.33   1st Qu.: 15.70  
##  Median : 32.730   Median : 2.675   Median :  103.17   Median : 40.15  
##  Mean   : 43.636   Mean   : 3.282   Mean   :  501.30   Mean   : 73.59  
##  3rd Qu.: 57.824   3rd Qu.: 4.446   3rd Qu.:  226.95   3rd Qu.: 99.33  
##  Max.   :391.500   Max.   :45.650   Max.   :24064.00   Max.   :564.60  
##  NA's   :10        NA's   :2        NA's   :2          NA's   :2       
##       PO4              Chla               a1              a2        
##  Min.   :  1.00   Min.   :  0.200   Min.   : 0.00   Min.   : 0.000  
##  1st Qu.: 41.38   1st Qu.:  2.000   1st Qu.: 1.50   1st Qu.: 0.000  
##  Median :103.29   Median :  5.475   Median : 6.95   Median : 3.000  
##  Mean   :137.88   Mean   : 13.971   Mean   :16.92   Mean   : 7.458  
##  3rd Qu.:213.75   3rd Qu.: 18.308   3rd Qu.:24.80   3rd Qu.:11.375  
##  Max.   :771.60   Max.   :110.456   Max.   :89.80   Max.   :72.600  
##  NA's   :2        NA's   :12                                        
##        a3               a4               a5               a6        
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 0.000  
##  Median : 1.550   Median : 0.000   Median : 1.900   Median : 0.000  
##  Mean   : 4.309   Mean   : 1.992   Mean   : 5.064   Mean   : 5.964  
##  3rd Qu.: 4.925   3rd Qu.: 2.400   3rd Qu.: 7.500   3rd Qu.: 6.925  
##  Max.   :42.800   Max.   :44.600   Max.   :44.400   Max.   :77.600  
##                                                                     
##        a7        
##  Min.   : 0.000  
##  1st Qu.: 0.000  
##  Median : 1.000  
##  Mean   : 2.495  
##  3rd Qu.: 2.400  
##  Max.   :31.600  
##

summary (iris)

##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
##

#(b) What is the algae dataset about? This data set contains 200 observations on 11 variables as well as the concentration levels of 7 harmful algae. Values were measured in several European rivers. The 11 predictor variables include 3 contextual variables (season, size and speed) describing the water sample, plus 8 chemical concentration measurements.

#(c) Explain the characteristics of the iris dataset. This famous (Fisher’s or Anderson’s) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.

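#(d) Did you discover any correlation between any pair of features in the iris dataset? Yes. Sepal length is strongly positively correlated with petal length (r = 0.87) and petal width (r = 0.82), and petal length and petal width are the most strongly correlated pair (r = 0.96). Sepal width is only weakly to moderately negatively correlated with the other three features.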

cor(iris[, 1:4])

##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
## Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
## Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
## Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000

#10 What does the summarise() function do? It applies any function that produces a scalar value to any column of a data frame table.
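
#A minimal sketch of the answer above, using the built-in iris data. The column names avg_petal_length and med_sepal_length are illustrative choices, not taken from the textbook.

```r
library(dplyr)

# summarise() collapses the data frame to a single row, with one column
# for each scalar summary that is requested.
iris_summary <- summarise(iris,
                          avg_petal_length = mean(Petal.Length),
                          med_sepal_length = median(Sepal.Length))
print(iris_summary)
```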

#11 We can use the functions, summarise_each() and funs(), to perform what kind of task? To apply a set of functions to all columns of a data frame table.
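
#Recent dplyr releases mark summarise_each() and funs() as deprecated in favour of across(). The sketch below shows the same kind of task with across(), assuming dplyr 1.0.0 or later; restricting the call to numeric columns is an assumption made here so that mean() and sd() apply cleanly.

```r
library(dplyr)

# Apply two functions (mean and sd) to every numeric column of iris.
# across() names each result "<column>_<function>" by default.
iris_stats <- summarise(iris,
                        across(where(is.numeric), list(mean = mean, sd = sd)))
print(names(iris_stats))
```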

#12 What is the task of the group_by() function? This function is included in which package? It forms sub-groups of a dataset using all combinations of the values of one or more nominal variables. It is found in dplyr.

#13 Which function will you use if you want to study potential #differences among the sub-groups? group_by() followed by summarise()
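
#As an illustration of group_by() followed by summarise(), the sketch below compares the three iris species. The summary columns n and avg_sepal_length are names chosen for this example.

```r
library(dplyr)

# One row per species: the group size and the mean sepal length.
by_species <- iris %>%
  group_by(Species) %>%
  summarise(n = n(),
            avg_sepal_length = mean(Sepal.Length),
            .groups = "drop")
print(by_species)
```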

#14 The top algorithm/code chunk on page 90 (Code 4) gives us a way to create a function to obtain the mode of a variable. Go through this algorithm. Now, replace

Mode <- function(x, na.rm = FALSE) {
  if(na.rm) x <- x[!is.na(x)]
  ux <- unique(x)
  return(ux[which.max(tabulate(match(x, ux)))])
}
Mode(algae$mxPH, na.rm=TRUE)

## [1] 8

Mode(algae$season)

## [1] winter
## Levels: autumn spring summer winter

#“algae$mxPH” with “iris$Sepal.Length” and

#“algae$season” with “iris$Petal.Length”

Mode <- function(x, na.rm = FALSE) {
  if(na.rm) x <- x[!is.na(x)]
  ux <- unique(x)
  return(ux[which.max(tabulate(match(x, ux)))])
}
Mode(iris$Sepal.Length, na.rm=TRUE)

## [1] 5

Mode(iris$Petal.Length)

## [1] 1.4

#Copy and/or take a screenshot of your results for both and include them in this assignment (I only need the first 20 to 40 rows of each sub-group).

iris$Sepal.Length

##   [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
##  [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
##  [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
##  [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
##  [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
##  [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
## [109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
## [127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
## [145] 6.7 6.7 6.3 6.5 6.2 5.9

iris$Petal.Length

##   [1] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 1.5 1.6 1.4 1.1 1.2 1.5 1.3 1.4
##  [19] 1.7 1.5 1.7 1.5 1.0 1.7 1.9 1.6 1.6 1.5 1.4 1.6 1.6 1.5 1.5 1.4 1.5 1.2
##  [37] 1.3 1.4 1.3 1.5 1.3 1.3 1.3 1.6 1.9 1.4 1.6 1.4 1.5 1.4 4.7 4.5 4.9 4.0
##  [55] 4.6 4.5 4.7 3.3 4.6 3.9 3.5 4.2 4.0 4.7 3.6 4.4 4.5 4.1 4.5 3.9 4.8 4.0
##  [73] 4.9 4.7 4.3 4.4 4.8 5.0 4.5 3.5 3.8 3.7 3.9 5.1 4.5 4.5 4.7 4.4 4.1 4.0
##  [91] 4.4 4.6 4.0 3.3 4.2 4.2 4.2 4.3 3.0 4.1 6.0 5.1 5.9 5.6 5.8 6.6 4.5 6.3
## [109] 5.8 6.1 5.1 5.3 5.5 5.0 5.1 5.3 5.5 6.7 6.9 5.0 5.7 4.9 6.7 4.9 5.7 6.0
## [127] 4.8 4.9 5.6 5.8 6.1 6.4 5.6 5.1 5.6 6.1 5.6 5.5 4.8 5.4 5.6 5.1 5.1 5.9
## [145] 5.7 5.2 5.0 5.2 5.4 5.1

#15. Explain the centralValue() function. What does it do? It is used to obtain the most adequate statistic of centrality for a given sample of values. It returns the median in the case of numeric variables and the mode for nominal variables.
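
#The behaviour described above can be sketched in base R. central_value_sketch() is a hypothetical helper written for this illustration; it is not the DMwR2 implementation of centralValue() and may differ from it in details such as tie handling.

```r
# Hypothetical stand-in for the idea behind centralValue():
# median for numeric input, most frequent value (mode) otherwise.
central_value_sketch <- function(x) {
  if (is.numeric(x)) {
    median(x, na.rm = TRUE)
  } else {
    x <- x[!is.na(x)]
    counts <- table(as.character(x))
    names(counts)[which.max(counts)]
  }
}

central_value_sketch(iris$Sepal.Length)                 # numeric: median
central_value_sketch(factor(c("low", "high", "high")))  # nominal: mode
```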

#16 (a) Explain the inter-quartile range (IQR). It is the interval that contains the 50% most central values of a continuous variable; its width is the distance between the 1st and 3rd quartiles.

#(b) Explain the x-quantile. The x-quantile is the value below which there are x% of the observed values.

#(c) What does a large value of the IQR mean? A large value of the IQR means that these central values are spread over a large range.

#(d) What does a small value of the IQR mean? A small value represents a very tightly packed set of values.
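
#A short base R check of these definitions on iris$Sepal.Length: the IQR computed by hand from the 25% and 75% quantiles should match the built-in IQR() function. The variable names are illustrative.

```r
sepal <- iris$Sepal.Length

# 1st and 3rd quartiles (the 25% and 75% quantiles).
q <- quantile(sepal, probs = c(0.25, 0.75))

# IQR by hand versus the built-in function.
iqr_manual  <- unname(q[2] - q[1])
iqr_builtin <- IQR(sepal)
print(c(manual = iqr_manual, builtin = iqr_builtin))
```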

#17 Which measure of spread, or variability, is more susceptible to outliers? The range, because it depends only on the minimum and maximum values.
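
#A small made-up example of this sensitivity: adding one extreme value changes the range drastically, while the IQR barely moves. The data and the helper range_width() are invented for illustration.

```r
values       <- c(10, 11, 12, 13, 14, 15, 16)
with_outlier <- c(values, 100)

# Width of the range: maximum minus minimum.
range_width <- function(x) diff(range(x))

range_width(values)        # 6
range_width(with_outlier)  # 90
IQR(values)                # 3
IQR(with_outlier)          # 3.5
```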

#18 (a) Using the Iris dataset, obtain the quantiles of the variable (or feature), Length, by Species.

#library(dplyr)

data(iris)
group_by(iris,Species) %>% summarise(qs=quantile(Sepal.Length))

## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `summarise()` has grouped output by 'Species'. You can override using the
## `.groups` argument.

## # A tibble: 15 × 2
## # Groups:   Species [3]
##    Species       qs
##    <fct>      <dbl>
##  1 setosa      4.3 
##  2 setosa      4.8 
##  3 setosa      5   
##  4 setosa      5.2 
##  5 setosa      5.8 
##  6 versicolor  4.9 
##  7 versicolor  5.6 
##  8 versicolor  5.9 
##  9 versicolor  6.3 
## 10 versicolor  7   
## 11 virginica   4.9 
## 12 virginica   6.22
## 13 virginica   6.5 
## 14 virginica   6.9 
## 15 virginica   7.9
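
#The deprecation warning above suggests reframe(). A possible rewrite is sketched below, assuming dplyr 1.1.0 or later; the extra prob column and the name sepal_quantiles are additions made here so that each row is labelled with its probability.

```r
library(dplyr)

probs <- c(0, 0.25, 0.5, 0.75, 1)

# reframe() allows several rows per group and returns an ungrouped result.
sepal_quantiles <- iris %>%
  group_by(Species) %>%
  reframe(prob = probs,
          qs = quantile(Sepal.Length, probs = probs))
print(sepal_quantiles)
```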

library(dplyr)
data(iris)
group_by(iris,Species) %>% summarise(qs=quantile(Petal.Length))

## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `summarise()` has grouped output by 'Species'. You can override using the
## `.groups` argument.

## # A tibble: 15 × 2
## # Groups:   Species [3]
##    Species       qs
##    <fct>      <dbl>
##  1 setosa      1   
##  2 setosa      1.4 
##  3 setosa      1.5 
##  4 setosa      1.58
##  5 setosa      1.9 
##  6 versicolor  3   
##  7 versicolor  4   
##  8 versicolor  4.35
##  9 versicolor  4.6 
## 10 versicolor  5.1 
## 11 virginica   4.5 
## 12 virginica   5.1 
## 13 virginica   5.55
## 14 virginica   5.88
## 15 virginica   6.9

#library(base)
#library(dplyr)
library(stats)
library(sp)

## Warning: package 'sp' was built under R version 4.3.3

data("iris")
x  <- aggregate(iris$Sepal.Length, list(Species=iris$Species), quantile)
print(x)

##      Species  x.0% x.25% x.50% x.75% x.100%
## 1     setosa 4.300 4.800 5.000 5.200  5.800
## 2 versicolor 4.900 5.600 5.900 6.300  7.000
## 3  virginica 4.900 6.225 6.500 6.900  7.900

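#20 (a) What are “pipes?” A way to chain multiple operations together in a concise and expressive way.

#(b) What is the “piping syntax?” x %>% f(y), which is equivalent to f(x, y): the result of the left-hand expression is passed as the first argument of the function call on the right. (Base R's pipe(description, open = "", encoding = getOption("encoding")) is unrelated; it opens a connection to a shell command.)

#(c) What is the “pipe operator” (%>%)? %>% is the operator, defined in the magrittr package and re-exported by dplyr, that forwards a value into the next function call.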
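
#To illustrate piping, the sketch below writes the same three-step computation twice, once as nested calls and once as a pipeline. The steps (filter, arrange, head) are chosen only for illustration.

```r
library(dplyr)

# Nested calls: read from the inside out.
nested <- head(arrange(filter(iris, Species == "setosa"),
                       desc(Sepal.Length)), 3)

# Piped calls: read from top to bottom.
piped <- iris %>%
  filter(Species == "setosa") %>%
  arrange(desc(Sepal.Length)) %>%
  head(3)

identical(nested, piped)
```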

#Use other resources (Google, …) to help you answer these questions.

#21. In Code 9, the second chunk of code from the top of page 92, interpret

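#“Species = iris$Species,” which is in the second argument of the aggregate ( ) function. What does it all mean? It supplies iris$Species as the grouping variable, so that the quantiles are computed separately for each species, and it names the resulting grouping column “Species” in the output.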

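#22 In Code 10, the third chunk of code from the top of P.92, interpret all three arguments of the aggregate ( ) function. What do they all mean? Sepal.Length ~ Species is a formula, with the variable to summarise on the left and the grouping variable on the right. data = iris is the dataset for analysis. quantile is the function applied to each group.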
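
#A sketch of the formula interface described above, reconstructed from the three arguments named in the answer rather than copied from the textbook. The name q_by_species is illustrative.

```r
# Formula: summarise Sepal.Length within each level of Species.
q_by_species <- aggregate(Sepal.Length ~ Species, data = iris, FUN = quantile)
print(q_by_species)

# Because quantile() returns five values per group, the Sepal.Length
# column of the result is a matrix with one column per quantile.
q_by_species$Sepal.Length[, "50%"]
```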

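#23 In some datasets a column (or a feature, or a variable) may contain symbols such as “?” in some of its rows (Look at Section 3.3.1.4 on Pp. 60 and 61). If we use the class ( ) function on that column, we are sure to get the column labeled as “factor.” However, assume we want this column to be labeled “integer.” Which function can we use to parse a column, or a vector of values, from “factors” to “integers?” parse_integer() from the readr package, which turns entries it cannot parse, such as “?”, into NA with a warning. In base R, as.integer(as.character(x)) does the same job; calling as.integer() directly on a factor returns the internal level codes rather than the original values.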
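
#The difference between level codes and values can be seen in a small base R sketch. The vector raw is made-up data standing in for a column that contains a “?” entry.

```r
raw <- factor(c("10", "25", "?", "7"))

# Direct conversion returns the internal level codes, not the values.
codes <- as.integer(raw)

# Going through character first recovers the values; "?" becomes NA
# (the coercion warning is suppressed here on purpose).
values <- suppressWarnings(as.integer(as.character(raw)))

print(codes)
print(values)
```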

#24 (a) What is the following code used for?

data (algae, package = "DMwR2")

nasRow <- apply (algae, 1, function(r) sum(is.na(r)))

cat ("The algae dataset contains", sum (nasRow), "NA values.")

#It counts the missing values in each row with apply(), then uses sum() to report the total number of missing values in the algae dataset.

#(b) What results are we looking for? The total number of missing (NA) values in the dataset.
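
#The same counting idea can be tried without installing DMwR2. The sketch below uses the built-in airquality dataset as a stand-in for algae, and also counts missing values per column with colSums(); both choices are assumptions made for this illustration.

```r
# Missing values per row, as in the assignment code.
na_per_row <- apply(airquality, 1, function(r) sum(is.na(r)))

# Missing values per column, and the overall total.
na_per_column <- colSums(is.na(airquality))
total_na      <- sum(is.na(airquality))

cat("The airquality dataset contains", total_na, "NA values.\n")
print(na_per_column)
```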

#25 (a) What method is used to detect a univariate outlier?
# The boxplot rule.

#(b) What does that method state? A value is an outlier if it lies outside the interval [Q1 - 1.5 x IQR, Q3 + 1.5 x IQR], where Q1 and Q3 are the 1st and 3rd quartiles.
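
#A minimal sketch of the boxplot rule in base R. boxplot_outliers() is a hypothetical helper that uses the default quartiles from quantile(); R's own boxplot() computes hinges slightly differently, so results can differ at the margins.

```r
# Return the values that fall outside [Q1 - k * IQR, Q3 + k * IQR].
boxplot_outliers <- function(x, k = 1.5) {
  x <- x[!is.na(x)]
  q <- quantile(x, probs = c(0.25, 0.75))
  spread <- q[2] - q[1]
  x[x < q[1] - k * spread | x > q[2] + k * spread]
}

boxplot_outliers(c(10, 11, 12, 13, 14, 15, 16, 100))  # 100
```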

#26 What sort of results does the summary ( ) function yield when applied to a dataset? A per-column summary: frequency counts for categorical variables, summary statistics (minimum, quartiles, median, mean, maximum) for continuous variables, and the number of missing values (NA's).

#27(a) For what is the function, describe ( ) used?

#It provides a global summary of each variable in a dataset.

#27(b) Which package contains the function, describe ( )? Hmisc

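#28 Give a definition of the term, parse. To parse is to analyse a piece of text and convert it into a structured representation. In R, parse() turns character strings into unevaluated expressions, and functions such as readr's parse_integer() turn character vectors into typed values.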
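
#A small base R illustration of parsing: the string below is turned into an unevaluated expression, which is evaluated only in a separate step. The string itself is an arbitrary example.

```r
# parse() converts text into an expression object without running it.
expr <- parse(text = "1 + 2 * 3")
class(expr)      # "expression"

# Evaluation is a separate step.
result <- eval(expr[[1]])
print(result)    # 7
```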

Assignment 4

Walter James

2025-03-19