Exploring and Cleaning…
str(cereal)
## spec_tbl_df [77 x 16] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ name : chr [1:77] "100% Bran" "100% Natural Bran" "All-Bran" "All-Bran with Extra Fiber" ...
## $ mfr : chr [1:77] "N" "Q" "K" "K" ...
## $ type : chr [1:77] "C" "C" "C" "C" ...
## $ calories: num [1:77] 70 120 70 50 110 110 110 130 90 90 ...
## $ protein : num [1:77] 4 3 4 4 2 2 2 3 2 3 ...
## $ fat : num [1:77] 1 5 1 0 2 2 0 2 1 0 ...
## $ sodium : num [1:77] 130 15 260 140 200 180 125 210 200 210 ...
## $ fiber : num [1:77] 10 2 9 14 1 1.5 1 2 4 5 ...
## $ carbo : num [1:77] 5 8 7 8 14 10.5 11 18 15 13 ...
## $ sugars : num [1:77] 6 8 5 0 8 10 14 8 6 5 ...
## $ potass : num [1:77] 280 135 320 330 -1 70 30 100 125 190 ...
## $ vitamins: num [1:77] 25 0 25 25 25 25 25 25 25 25 ...
## $ shelf : num [1:77] 3 3 3 3 3 1 2 3 1 3 ...
## $ weight : num [1:77] 1 1 1 1 1 1 1 1.33 1 1 ...
## $ cups : num [1:77] 0.33 1 0.33 0.5 0.75 0.75 1 0.75 0.67 0.67 ...
## $ rating : num [1:77] 68.4 34 59.4 93.7 34.4 ...
## - attr(*, "spec")=
## .. cols(
## .. name = col_character(),
## .. mfr = col_character(),
## .. type = col_character(),
## .. calories = col_double(),
## .. protein = col_double(),
## .. fat = col_double(),
## .. sodium = col_double(),
## .. fiber = col_double(),
## .. carbo = col_double(),
## .. sugars = col_double(),
## .. potass = col_double(),
## .. vitamins = col_double(),
## .. shelf = col_double(),
## .. weight = col_double(),
## .. cups = col_double(),
## .. rating = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
summary(cereal)
## name mfr type calories
## Length:77 Length:77 Length:77 Min. : 50.0
## Class :character Class :character Class :character 1st Qu.:100.0
## Mode :character Mode :character Mode :character Median :110.0
## Mean :106.9
## 3rd Qu.:110.0
## Max. :160.0
## protein fat sodium fiber
## Min. :1.000 Min. :0.000 Min. : 0.0 Min. : 0.000
## 1st Qu.:2.000 1st Qu.:0.000 1st Qu.:130.0 1st Qu.: 1.000
## Median :3.000 Median :1.000 Median :180.0 Median : 2.000
## Mean :2.545 Mean :1.013 Mean :159.7 Mean : 2.152
## 3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:210.0 3rd Qu.: 3.000
## Max. :6.000 Max. :5.000 Max. :320.0 Max. :14.000
## carbo sugars potass vitamins
## Min. :-1.0 Min. :-1.000 Min. : -1.00 Min. : 0.00
## 1st Qu.:12.0 1st Qu.: 3.000 1st Qu.: 40.00 1st Qu.: 25.00
## Median :14.0 Median : 7.000 Median : 90.00 Median : 25.00
## Mean :14.6 Mean : 6.922 Mean : 96.08 Mean : 28.25
## 3rd Qu.:17.0 3rd Qu.:11.000 3rd Qu.:120.00 3rd Qu.: 25.00
## Max. :23.0 Max. :15.000 Max. :330.00 Max. :100.00
## shelf weight cups rating
## Min. :1.000 Min. :0.50 Min. :0.250 Min. :18.04
## 1st Qu.:1.000 1st Qu.:1.00 1st Qu.:0.670 1st Qu.:33.17
## Median :2.000 Median :1.00 Median :0.750 Median :40.40
## Mean :2.208 Mean :1.03 Mean :0.821 Mean :42.67
## 3rd Qu.:3.000 3rd Qu.:1.00 3rd Qu.:1.000 3rd Qu.:50.83
## Max. :3.000 Max. :1.50 Max. :1.500 Max. :93.70
glimpse(cereal)
## Rows: 77
## Columns: 16
## $ name <chr> "100% Bran", "100% Natural Bran", "All-Bran", "All-Bran with ~
## $ mfr <chr> "N", "Q", "K", "K", "R", "G", "K", "G", "R", "P", "Q", "G", "~
## $ type <chr> "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "~
## $ calories <dbl> 70, 120, 70, 50, 110, 110, 110, 130, 90, 90, 120, 110, 120, 1~
## $ protein <dbl> 4, 3, 4, 4, 2, 2, 2, 3, 2, 3, 1, 6, 1, 3, 1, 2, 2, 1, 1, 3, 3~
## $ fat <dbl> 1, 5, 1, 0, 2, 2, 0, 2, 1, 0, 2, 2, 3, 2, 1, 0, 0, 0, 1, 3, 0~
## $ sodium <dbl> 130, 15, 260, 140, 200, 180, 125, 210, 200, 210, 220, 290, 21~
## $ fiber <dbl> 10.0, 2.0, 9.0, 14.0, 1.0, 1.5, 1.0, 2.0, 4.0, 5.0, 0.0, 2.0,~
## $ carbo <dbl> 5.0, 8.0, 7.0, 8.0, 14.0, 10.5, 11.0, 18.0, 15.0, 13.0, 12.0,~
## $ sugars <dbl> 6, 8, 5, 0, 8, 10, 14, 8, 6, 5, 12, 1, 9, 7, 13, 3, 2, 12, 13~
## $ potass <dbl> 280, 135, 320, 330, -1, 70, 30, 100, 125, 190, 35, 105, 45, 1~
## $ vitamins <dbl> 25, 0, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25~
## $ shelf <dbl> 3, 3, 3, 3, 3, 1, 2, 3, 1, 3, 2, 1, 2, 3, 2, 1, 1, 2, 2, 3, 2~
## $ weight <dbl> 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.33, 1.00, 1.00, 1~
## $ cups <dbl> 0.33, 1.00, 0.33, 0.50, 0.75, 0.75, 1.00, 0.75, 0.67, 0.67, 0~
## $ rating <dbl> 68.40297, 33.98368, 59.42551, 93.70491, 34.38484, 29.50954, 3~
table(cereal$type) # Count the cereals of each type (C = cold, H = hot).
##
## C H
## 74 3
We can see that:
- There are 77 observations of 16 variables.
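- The minimum of carbo, sugars, and potass is -1. A nutrient amount cannot be negative, so these values are most likely placeholders for missing data rather than real measurements.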
- Thirteen of the variables are numeric (dbl); the other three (name, mfr, and type) are character columns.
- There are 74 cold cereals and only 3 hot cereals.
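
One way to act on the negative values noted above is to count them per column and then recode them as NA. The following is only a minimal sketch under stated assumptions: `toy` is a small hypothetical stand-in for `cereal` with made-up rows, and treating -1 as a missing-value code is an assumption drawn from the summary output, not something the data source confirms here.

```r
# Hypothetical stand-in for a few rows of `cereal`; names and values are illustrative only.
toy <- data.frame(
  name   = c("cereal_a", "cereal_b", "cereal_c"),
  carbo  = c(5, -1, 14),
  sugars = c(6, 8, -1),
  potass = c(280, 135, -1),
  stringsAsFactors = FALSE
)

# Count negative entries in every numeric column.
num_cols   <- names(toy)[sapply(toy, is.numeric)]
neg_counts <- sapply(toy[num_cols], function(x) sum(x < 0, na.rm = TRUE))
print(neg_counts)

# Assumption: -1 marks a missing value, so recode it as NA in the flagged columns.
flagged <- names(neg_counts)[neg_counts > 0]
cleaned <- toy
cleaned[flagged] <- lapply(cleaned[flagged], function(x) replace(x, x == -1, NA))

# Confirm how many values are now missing in each column.
print(colSums(is.na(cleaned)))
```

The same steps should apply to `cereal` itself by substituting it for `toy`. Recoding to NA instead of dropping rows keeps all 77 observations available, though later calls such as `mean()` would then need `na.rm = TRUE`.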