Alternative Assessment Question 1

Name : Masyitah Humaira Binti Mohd Hafidz

Matric Number : U2000518

Question a

EDA

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.5     v dplyr   1.0.7
## v tidyr   1.1.4     v stringr 1.4.0
## v readr   2.1.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
summary(rock)
##       area            peri            shape              perm        
##  Min.   : 1016   Min.   : 308.6   Min.   :0.09033   Min.   :   6.30  
##  1st Qu.: 5305   1st Qu.:1414.9   1st Qu.:0.16226   1st Qu.:  76.45  
##  Median : 7487   Median :2536.2   Median :0.19886   Median : 130.50  
##  Mean   : 7188   Mean   :2682.2   Mean   :0.21811   Mean   : 415.45  
##  3rd Qu.: 8870   3rd Qu.:3989.5   3rd Qu.:0.26267   3rd Qu.: 777.50  
##  Max.   :12212   Max.   :4864.2   Max.   :0.46413   Max.   :1300.00
glimpse(rock)
## Rows: 48
## Columns: 4
## $ area  <int> 4990, 7002, 7558, 7352, 7943, 7979, 9333, 8209, 8393, 6425, 9364~
## $ peri  <dbl> 2791.90, 3892.60, 3930.66, 3869.32, 3948.54, 4010.15, 4345.75, 4~
## $ shape <dbl> 0.0903296, 0.1486220, 0.1833120, 0.1170630, 0.1224170, 0.1670450~
## $ perm  <dbl> 6.3, 6.3, 6.3, 6.3, 17.1, 17.1, 17.1, 17.1, 119.0, 119.0, 119.0,~
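min(rock$area)
## [1] 1016
max(rock$area)
## [1] 12212
median(rock$area)
## [1] 7487
mean(rock$area)
## [1] 7187.729
sd(rock$area)
## [1] 2683.849
range(rock$area)
## [1]  1016 12212
quantile(rock$area)
##       0%      25%      50%      75%     100% 
##  1016.00  5305.25  7487.00  8869.50 12212.00
IQR(rock$area)
## [1] 3564.25
var(rock$area)
## [1] 7203045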
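min(rock$peri)
## [1] 308.642
max(rock$peri)
## [1] 4864.22
median(rock$peri)
## [1] 2536.195
mean(rock$peri)
## [1] 2682.212
sd(rock$peri)
## [1] 1431.661
range(rock$peri)
## [1]  308.642 4864.220
quantile(rock$peri)
##       0%      25%      50%      75%     100% 
##  308.642 1414.907 2536.195 3989.523 4864.220
IQR(rock$peri)
## [1] 2574.615
var(rock$peri)
## [1] 2049654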
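min(rock$perm)
## [1] 6.3
max(rock$perm)
## [1] 1300
median(rock$perm)
## [1] 130.5
mean(rock$perm)
## [1] 415.45
sd(rock$perm)
## [1] 437.8182
range(rock$perm)
## [1]    6.3 1300.0
quantile(rock$perm)
##      0%     25%     50%     75%    100% 
##    6.30   76.45  130.50  777.50 1300.00
IQR(rock$perm)
## [1] 701.05
var(rock$perm)
## [1] 191684.8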
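min(rock$shape)
## [1] 0.0903296
max(rock$shape)
## [1] 0.464125
median(rock$shape)
## [1] 0.198862
mean(rock$shape)
## [1] 0.2181104
sd(rock$shape)
## [1] 0.08349645
range(rock$shape)
## [1] 0.0903296 0.4641250
quantile(rock$shape)
##        0%       25%       50%       75%      100% 
## 0.0903296 0.1622618 0.1988620 0.2626700 0.4641250
IQR(rock$shape)
## [1] 0.1004083
var(rock$shape)
## [1] 0.006971657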
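
The per-variable statistics listed above can also be collected in a single table. The following is a minimal sketch, not necessarily how the values above were produced; the helper name `describe` and the object name `stats` are illustrative choices.

```r
# rock ships with base R (package 'datasets'), so no extra packages are needed.

# Hypothetical helper: returns the main descriptive statistics of one vector.
describe <- function(v) {
  c(min    = min(v),
    max    = max(v),
    median = median(v),
    mean   = mean(v),
    sd     = sd(v),
    IQR    = IQR(v),
    var    = var(v))
}

# Apply the helper to every column; the result is a matrix with
# one row per statistic and one column per variable.
stats <- sapply(rock, describe)
print(round(stats, 4))
```

Each column of `stats` corresponds to one variable of `rock`, which makes it easier to compare the four variables side by side.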

Plot

Codebook

library(memisc)
## Loading required package: lattice
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
## 
## Attaching package: 'memisc'
## The following objects are masked from 'package:dplyr':
## 
##     collect, recode, rename, syms
## The following object is masked from 'package:purrr':
## 
##     %@%
## The following object is masked from 'package:tibble':
## 
##     view
## The following object is masked from 'package:ggplot2':
## 
##     syms
## The following objects are masked from 'package:stats':
## 
##     contr.sum, contr.treatment, contrasts
## The following object is masked from 'package:base':
## 
##     as.array
typeof(x)
## [1] "list"
x
## 
## Data set with 48 observations and 4 variables
## 
##    rock.area rock.peri rock.shape rock.perm
##  1      4990   2791.90  0.0903296       6.3
##  2      7002   3892.60  0.1486220       6.3
##  3      7558   3930.66  0.1833120       6.3
##  4      7352   3869.32  0.1170630       6.3
##  5      7943   3948.54  0.1224170      17.1
##  6      7979   4010.15  0.1670450      17.1
##  7      9333   4345.75  0.1896510      17.1
##  8      8209   4344.75  0.1641270      17.1
##  9      8393   3682.04  0.2036540     119.0
## 10      6425   3098.65  0.1623940     119.0
## 11      9364   4480.05  0.1509440     119.0
## 12      8624   3986.24  0.1481410     119.0
## 13     10651   4036.54  0.2285950      82.4
## 14      8868   3518.04  0.2316230      82.4
## 15      9417   3999.37  0.1725670      82.4
## 16      8874   3629.07  0.1534810      82.4
## 17     10962   4608.66  0.2043140      58.6
## 18     10743   4787.62  0.2627270      58.6
## 19     11878   4864.22  0.2000710      58.6
## 20      9867   4479.41  0.1448100      58.6
## 21      7838   3428.74  0.1138520     142.0
## 22     11876   4353.14  0.2910290     142.0
## 23     12212   4697.65  0.2400770     142.0
## 24      8233   3518.44  0.1618650     142.0
## 25      6360   1977.39  0.2808870     740.0
## .. ......... ......... .......... .........
## (25 of 48 observations shown)
summary(x)
##    rock.area       rock.peri        rock.shape        rock.perm      
##  Min.   : 1016   Min.   : 308.6   Min.   :0.09033   Min.   :   6.30  
##  1st Qu.: 5305   1st Qu.:1414.9   1st Qu.:0.16226   1st Qu.:  76.45  
##  Median : 7487   Median :2536.2   Median :0.19886   Median : 130.50  
##  Mean   : 7188   Mean   :2682.2   Mean   :0.21811   Mean   : 415.45  
##  3rd Qu.: 8870   3rd Qu.:3989.5   3rd Qu.:0.26267   3rd Qu.: 777.50  
##  Max.   :12212   Max.   :4864.2   Max.   :0.46413   Max.   :1300.00
sapply(rock, class)
##      area      peri     shape      perm 
## "integer" "numeric" "numeric" "numeric"
sapply(rock, range)
##       area     peri     shape   perm
## [1,]  1016  308.642 0.0903296    6.3
## [2,] 12212 4864.220 0.4641250 1300.0
codebook(x)
## ================================================================================
## 
##    rock.area
## 
## --------------------------------------------------------------------------------
## 
##    Storage mode: integer
##    Measurement: interval
## 
##         Min:  1016.000
##         Max: 12212.000
##        Mean:  7187.729
##    Std.Dev.:  2655.745
## 
## ================================================================================
## 
##    rock.peri
## 
## --------------------------------------------------------------------------------
## 
##    Storage mode: double
##    Measurement: interval
## 
##         Min:  308.642
##         Max: 4864.220
##        Mean: 2682.212
##    Std.Dev.: 1416.670
## 
## ================================================================================
## 
##    rock.shape
## 
## --------------------------------------------------------------------------------
## 
##    Storage mode: double
##    Measurement: interval
## 
##         Min: 0.090
##         Max: 0.464
##        Mean: 0.218
##    Std.Dev.: 0.083
## 
## ================================================================================
## 
##    rock.perm
## 
## --------------------------------------------------------------------------------
## 
##    Storage mode: double
##    Measurement: interval
## 
##         Min:    6.300
##         Max: 1300.000
##        Mean:  415.450
##    Std.Dev.:  433.234
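
The Std.Dev. values in the codebook are slightly smaller than the sd( ) results in the EDA part (for example 2655.745 against 2683.849 for area). The numbers are consistent with the codebook dividing by n instead of n - 1. The sketch below checks this assumption by rescaling the sample standard deviation; the object names are illustrative.

```r
# rock ships with base R (package 'datasets').
n <- nrow(rock)                      # 48 observations

# sd() uses the sample formula (divides by n - 1).
sample_sd <- sapply(rock, sd)

# Rescale to the population formula (divides by n).
population_sd <- sample_sd * sqrt((n - 1) / n)

print(round(sample_sd, 3))
print(round(population_sd, 3))
```

Under this assumption, `population_sd` should agree with the codebook's Std.Dev. column to the printed precision.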

Question b

Original Data frame :

##    Label Age Height Weight Money
## 1      A  13    156     43   124
## 2      B  14    158     65   176
## 3      C  15    160     45   157
## 4      D  16    176     56   197
## 5      E  14    145     55   156
## 6      F  19    187     58   147
## 7      G  18    156     49   198
## 8      H  19    158     50   167
## 9      I  16    162     62   184
## 10     J  17    172     63   159
## 11     K  19    159     59   137
## 12     L  15    165     57   180
## 13     M  20    162     55    99
## 14     N  20    169     69   109
## 15     O  18    159     68   144
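
The code that created the data frame is not shown. A minimal sketch that rebuilds it from the printed values, assuming it is stored under the name `ds` used in the calls that follow:

```r
# Rebuild the data frame from the values printed above.
ds <- data.frame(
  Label  = LETTERS[1:15],
  Age    = c(13, 14, 15, 16, 14, 19, 18, 19, 16, 17, 19, 15, 20, 20, 18),
  Height = c(156, 158, 160, 176, 145, 187, 156, 158, 162, 172, 159, 165, 162, 169, 159),
  Weight = c(43, 65, 45, 56, 55, 58, 49, 50, 62, 63, 59, 57, 55, 69, 68),
  Money  = c(124, 176, 157, 197, 156, 147, 198, 167, 184, 159, 137, 180, 99, 109, 144)
)

print(ds)
```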

i. filter( )

filter(ds, Age >= 18)
##   Label Age Height Weight Money
## 1     F  19    187     58   147
## 2     G  18    156     49   198
## 3     H  19    158     50   167
## 4     K  19    159     59   137
## 5     M  20    162     55    99
## 6     N  20    169     69   109
## 7     O  18    159     68   144

The filter( ) function returns the subset of rows of a data frame that satisfy a given condition. In this case, it keeps only the rows where Age is 18 or older.
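
Conditions can also be combined. The sketch below rebuilds the data frame from the printed values (assuming the name `ds`) and shows two variations that are not part of the original answer: several conditions joined with AND, and two conditions joined with OR. The result names are illustrative.

```r
library(dplyr)

# Data frame rebuilt from the values printed in the report.
ds <- data.frame(
  Label  = LETTERS[1:15],
  Age    = c(13, 14, 15, 16, 14, 19, 18, 19, 16, 17, 19, 15, 20, 20, 18),
  Height = c(156, 158, 160, 176, 145, 187, 156, 158, 162, 172, 159, 165, 162, 169, 159),
  Weight = c(43, 65, 45, 56, 55, 58, 49, 50, 62, 63, 59, 57, 55, 69, 68),
  Money  = c(124, 176, 157, 197, 156, 147, 198, 167, 184, 159, 137, 180, 99, 109, 144)
)

# Conditions separated by commas must all hold (logical AND).
tall_adults <- filter(ds, Age >= 18, Height > 160)

# The | operator keeps rows where at least one condition holds (logical OR).
young_or_high_money <- filter(ds, Age < 14 | Money > 190)

print(tall_adults)
print(young_or_high_money)
```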

ii. arrange( )

arrange(ds, Age, Height)
##    Label Age Height Weight Money
## 1      A  13    156     43   124
## 2      E  14    145     55   156
## 3      B  14    158     65   176
## 4      C  15    160     45   157
## 5      L  15    165     57   180
## 6      I  16    162     62   184
## 7      D  16    176     56   197
## 8      J  17    172     63   159
## 9      G  18    156     49   198
## 10     O  18    159     68   144
## 11     H  19    158     50   167
## 12     K  19    159     59   137
## 13     F  19    187     58   147
## 14     M  20    162     55    99
## 15     N  20    169     69   109

The arrange( ) function sorts the rows of a data frame by the values of the selected features, in ascending order by default. In this case, it sorts the rows by Age and, where ages are equal, by Height.
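
To reverse the order of one sort key, it can be wrapped in desc( ). A minimal sketch, assuming the data frame is named `ds` and rebuilt from the printed values; the result name `oldest_first` is illustrative.

```r
library(dplyr)

# Data frame rebuilt from the values printed in the report.
ds <- data.frame(
  Label  = LETTERS[1:15],
  Age    = c(13, 14, 15, 16, 14, 19, 18, 19, 16, 17, 19, 15, 20, 20, 18),
  Height = c(156, 158, 160, 176, 145, 187, 156, 158, 162, 172, 159, 165, 162, 169, 159),
  Weight = c(43, 65, 45, 56, 55, 58, 49, 50, 62, 63, 59, 57, 55, 69, 68),
  Money  = c(124, 176, 157, 197, 156, 147, 198, 167, 184, 159, 137, 180, 99, 109, 144)
)

# Sort by Age from oldest to youngest; ties are still broken by
# Height in ascending order.
oldest_first <- arrange(ds, desc(Age), Height)

print(head(oldest_first, 5))
```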

iii. mutate( )

mutate(ds, Height_meter = Height/100, Weight_Newton = Weight*10)
##    Label Age Height Weight Money Height_meter Weight_Newton
## 1      A  13    156     43   124         1.56           430
## 2      B  14    158     65   176         1.58           650
## 3      C  15    160     45   157         1.60           450
## 4      D  16    176     56   197         1.76           560
## 5      E  14    145     55   156         1.45           550
## 6      F  19    187     58   147         1.87           580
## 7      G  18    156     49   198         1.56           490
## 8      H  19    158     50   167         1.58           500
## 9      I  16    162     62   184         1.62           620
## 10     J  17    172     63   159         1.72           630
## 11     K  19    159     59   137         1.59           590
## 12     L  15    165     57   180         1.65           570
## 13     M  20    162     55    99         1.62           550
## 14     N  20    169     69   109         1.69           690
## 15     O  18    159     68   144         1.59           680

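The mutate( ) function creates new variables (features) from existing ones and appends them to the data frame. In this case, it creates two new features: height in meters, obtained by dividing Height by 100, and weight in newtons, obtained by multiplying Weight by 10 (which presumably treats Weight as a mass in kilograms and approximates g as 10 m/s²).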
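
Two details are worth checking when writing a mutate( ) call. Column names are case-sensitive, so in a fresh session only `Height` and `Weight` refer to the columns of the data frame. The factor 10 can also be replaced by a named constant. The sketch below assumes the data frame is named `ds`, that Weight is a mass in kilograms, and uses g = 9.81 m/s² as an illustrative value; `g`, `converted` and `lowercase_attempt` are names introduced here.

```r
library(dplyr)

# Data frame rebuilt from the values printed in the report.
ds <- data.frame(
  Label  = LETTERS[1:15],
  Age    = c(13, 14, 15, 16, 14, 19, 18, 19, 16, 17, 19, 15, 20, 20, 18),
  Height = c(156, 158, 160, 176, 145, 187, 156, 158, 162, 172, 159, 165, 162, 169, 159),
  Weight = c(43, 65, 45, 56, 55, 58, 49, 50, 62, 63, 59, 57, 55, 69, 68),
  Money  = c(124, 176, 157, 197, 156, 147, 198, 167, 184, 159, 137, 180, 99, 109, 144)
)

# Assumed gravitational acceleration in m/s^2 (the report uses 10).
g <- 9.81

# New columns are appended; the original columns are left unchanged.
converted <- mutate(ds, Height_meter = Height / 100, Weight_Newton = Weight * g)
print(head(converted, 3))

# Lowercase 'height' is not a column of ds. Unless an object with that name
# exists in the workspace, the call fails; the error is captured here
# instead of stopping the script.
lowercase_attempt <- tryCatch(
  mutate(ds, Height_meter = height / 100),
  error = function(e) e
)
print(inherits(lowercase_attempt, "error"))
```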

iv. select( )

dplyr::select(ds, Age)
##    Age
## 1   13
## 2   14
## 3   15
## 4   16
## 5   14
## 6   19
## 7   18
## 8   19
## 9   16
## 10  17
## 11  19
## 12  15
## 13  20
## 14  20
## 15  18

The select( ) function keeps only the specified features of the data frame. In this case, it returns only the Age feature. The call is written as dplyr::select( ) because the MASS package, loaded earlier, masks select( ).
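
More than one feature can be selected at a time. A minimal sketch of three common forms, assuming the data frame is named `ds` and rebuilt from the printed values; the result names are illustrative.

```r
library(dplyr)

# Data frame rebuilt from the values printed in the report.
ds <- data.frame(
  Label  = LETTERS[1:15],
  Age    = c(13, 14, 15, 16, 14, 19, 18, 19, 16, 17, 19, 15, 20, 20, 18),
  Height = c(156, 158, 160, 176, 145, 187, 156, 158, 162, 172, 159, 165, 162, 169, 159),
  Weight = c(43, 65, 45, 56, 55, 58, 49, 50, 62, 63, 59, 57, 55, 69, 68),
  Money  = c(124, 176, 157, 197, 156, 147, 198, 167, 184, 159, 137, 180, 99, 109, 144)
)

# Keep several named columns.
label_age <- dplyr::select(ds, Label, Age)

# Drop a column with a minus sign.
no_money <- dplyr::select(ds, -Money)

# Keep a contiguous range of columns with the colon operator.
body_and_money <- dplyr::select(ds, Height:Money)

print(names(label_age))
print(names(no_money))
print(names(body_and_money))
```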

v. summarise( )

summarise(ds, mean = mean(Height), sd = sd(Height))
##       mean       sd
## 1 162.9333 9.931671

The summarise( ) function creates a new data frame that contains the results of the specified summary operations, with one column per operation. In this case, it returns a one-row data frame that contains the mean and standard deviation of the Height feature.
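
summarise( ) is often combined with group_by( ) to get one row per group instead of one row overall. The sketch below is an illustration beyond the original answer: it assumes the data frame is named `ds` and groups on a hypothetical indicator `Adult` (Age of 18 or older).

```r
library(dplyr)

# Data frame rebuilt from the values printed in the report.
ds <- data.frame(
  Label  = LETTERS[1:15],
  Age    = c(13, 14, 15, 16, 14, 19, 18, 19, 16, 17, 19, 15, 20, 20, 18),
  Height = c(156, 158, 160, 176, 145, 187, 156, 158, 162, 172, 159, 165, 162, 169, 159),
  Weight = c(43, 65, 45, 56, 55, 58, 49, 50, 62, 63, 59, 57, 55, 69, 68),
  Money  = c(124, 176, 157, 197, 156, 147, 198, 167, 184, 159, 137, 180, 99, 109, 144)
)

# Group rows by whether Age is at least 18, then summarise each group.
by_adult <- ds %>%
  group_by(Adult = Age >= 18) %>%
  summarise(n = n(), mean_height = mean(Height), sd_height = sd(Height))

print(by_adult)
```

The result has two rows, one for each value of `Adult`, so the group means can be compared directly.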

-End-