This mock project uses a small data set of bakery sales. Each record has the following features:
Date of the sale (Fecha).
Name of the item (Nombre).
Cost of the item (Cuesta).
Quantity of the item sold (Cantidad).
Revenue from the item (Dinero), calculated as the product of quantity and cost.
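The revenue rule can be sketched in base R. This is only a minimal illustration using the first three records of the data set; `df_sample` is a hypothetical name and is not part of the original project.

```r
# First three records of the data set (hypothetical helper frame, base R only)
df_sample <- data.frame(
  Fecha    = c("3/25", "3/25", "3/25"),
  Nombre   = c("Cupcake", "Cookie", "Muffin"),
  Cuesta   = c(2L, 1L, 3L),
  Cantidad = c(30L, 20L, 12L)
)

# Revenue per record is the unit cost times the quantity sold
df_sample$Dinero <- df_sample$Cuesta * df_sample$Cantidad

df_sample$Dinero
# 60 20 36
```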
# Summarize key features of the data:
df
## Fecha Nombre Cuesta Cantidad Dinero
## 1 3/25 Cupcake 2 30 60
## 2 3/25 Cookie 1 20 20
## 3 3/25 Muffin 3 12 36
## 4 3/26 Cupcake 2 40 80
## 5 3/26 Pie 5 15 75
## 6 3/27 Cupcake 2 35 70
## 7 3/27 Cookie 1 25 25
## 8 3/27 Muffin 3 14 42
## 9 3/28 Cupcake 2 32 64
## 10 3/28 Pie 5 16 80
## 11 3/29 Cupcake 2 38 76
## 12 3/29 Cookie 1 22 22
## 13 3/29 Muffin 3 13 39
## 14 3/30 Cupcake 2 36 72
## 15 3/30 Pie 5 17 85
## 16 3/31 Cupcake 2 39 78
## 17 3/31 Cookie 1 24 24
## 18 3/31 Muffin 3 15 45
## 19 3/31 Pie 5 18 90
## 20 3/31 Cupcake 2 41 82
## 21 3/31 Cookie 1 26 26
glimpse(df)
## Rows: 21
## Columns: 5
## $ Fecha <chr> "3/25", "3/25", "3/25", "3/26", "3/26", "3/27", "3/27", "3/27…
## $ Nombre <chr> "Cupcake", "Cookie", "Muffin", "Cupcake", "Pie", "Cupcake", "…
## $ Cuesta <int> 2, 1, 3, 2, 5, 2, 1, 3, 2, 5, 2, 1, 3, 2, 5, 2, 1, 3, 5, 2, 1
## $ Cantidad <int> 30, 20, 12, 40, 15, 35, 25, 14, 32, 16, 38, 22, 13, 36, 17, 3…
## $ Dinero <int> 60, 20, 36, 80, 75, 70, 25, 42, 64, 80, 76, 22, 39, 72, 85, 7…
str(df)
## 'data.frame': 21 obs. of 5 variables:
## $ Fecha : chr "3/25" "3/25" "3/25" "3/26" ...
## $ Nombre : chr "Cupcake" "Cookie" "Muffin" "Cupcake" ...
## $ Cuesta : int 2 1 3 2 5 2 1 3 2 5 ...
## $ Cantidad: int 30 20 12 40 15 35 25 14 32 16 ...
## $ Dinero : int 60 20 36 80 75 70 25 42 64 80 ...
summary(df)
## Fecha Nombre Cuesta Cantidad
## Length:21 Length:21 Min. :1.000 Min. :12.00
## Class :character Class :character 1st Qu.:2.000 1st Qu.:16.00
## Mode :character Mode :character Median :2.000 Median :24.00
## Mean :2.524 Mean :25.14
## 3rd Qu.:3.000 3rd Qu.:35.00
## Max. :5.000 Max. :41.00
## Dinero
## Min. :20.00
## 1st Qu.:36.00
## Median :64.00
## Mean :56.71
## 3rd Qu.:78.00
## Max. :90.00
skim_without_charts(df)
| Name | df |
| Number of rows | 21 |
| Number of columns | 5 |
| _______________________ | |
| Column type frequency: | |
| character | 2 |
| numeric | 3 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Fecha | 0 | 1 | 4 | 4 | 0 | 7 | 0 |
| Nombre | 0 | 1 | 3 | 7 | 0 | 4 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| Cuesta | 0 | 1 | 2.52 | 1.40 | 1 | 2 | 2 | 3 | 5 |
| Cantidad | 0 | 1 | 25.14 | 10.01 | 12 | 16 | 24 | 35 | 41 |
| Dinero | 0 | 1 | 56.71 | 24.43 | 20 | 36 | 64 | 78 | 90 |
We can aggregate the data by item name. That is, we group by Nombre and compute the total revenue (Total_Price) and the total number of items sold (Total_Sold) using the group_by() and summarise() functions.
TP <- df %>%
  group_by(Nombre) %>%
  summarise(
    Total_Price = sum(Cuesta * Cantidad),
    Total_Sold = sum(Cantidad)
  )
TP %>% flextable()
| Nombre | Total_Price | Total_Sold |
|---|---|---|
| Cookie | 117 | 117 |
| Cupcake | 582 | 291 |
| Muffin | 162 | 54 |
| Pie | 330 | 66 |
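As a cross-check, the same grouped totals can be reproduced without dplyr. The sketch below rests on assumptions: it re-enters the 21 records from the printed data frame by hand under the hypothetical name `sales`, and it uses base R's `tapply()` in place of `group_by()` and `summarise()`.

```r
# The 21 records from the printed data frame, re-entered by hand
# (`sales` is a hypothetical name used only for this sketch)
sales <- data.frame(
  Nombre = c("Cupcake", "Cookie", "Muffin", "Cupcake", "Pie", "Cupcake",
             "Cookie", "Muffin", "Cupcake", "Pie", "Cupcake", "Cookie",
             "Muffin", "Cupcake", "Pie", "Cupcake", "Cookie", "Muffin",
             "Pie", "Cupcake", "Cookie"),
  Cuesta = c(2, 1, 3, 2, 5, 2, 1, 3, 2, 5, 2, 1, 3, 2, 5, 2, 1, 3, 5, 2, 1),
  Cantidad = c(30, 20, 12, 40, 15, 35, 25, 14, 32, 16, 38, 22, 13, 36, 17,
               39, 24, 15, 18, 41, 26)
)

# Revenue per record, then sums within each item name
sales$Dinero <- sales$Cuesta * sales$Cantidad
total_price <- tapply(sales$Dinero, sales$Nombre, sum)
total_sold  <- tapply(sales$Cantidad, sales$Nombre, sum)

total_price  # Cookie 117, Cupcake 582, Muffin 162, Pie 330
total_sold   # Cookie 117, Cupcake 291, Muffin 54, Pie 66
```

Both vectors are named by item and sorted alphabetically, so they line up with the rows of the table.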
Cupcakes were the most popular item at the bakery this week: 291 were sold, compared with 54 Muffins, the least popular item. Although Pies and Muffins sold in similar quantities (Cantidad: 66 and 54), Pies brought in roughly twice the revenue ($330 versus $162) because each Pie costs $5, compared with $3 per Muffin.
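The comparison above can be checked numerically. The following is a minimal sketch that assumes the four summary rows are typed in directly from the table; `totals` and the ratio variables are hypothetical names introduced for illustration.

```r
# Summary rows copied from the aggregated table (hypothetical frame `totals`)
totals <- data.frame(
  Nombre      = c("Cookie", "Cupcake", "Muffin", "Pie"),
  Total_Price = c(117, 582, 162, 330),
  Total_Sold  = c(117, 291, 54, 66)
)

# Item with the largest number of units sold
best_seller <- totals$Nombre[which.max(totals$Total_Sold)]
best_seller
# "Cupcake"

# Compare Pies with Muffins on units sold and on revenue
pie    <- totals[totals$Nombre == "Pie", ]
muffin <- totals[totals$Nombre == "Muffin", ]
sold_ratio    <- pie$Total_Sold / muffin$Total_Sold
revenue_ratio <- pie$Total_Price / muffin$Total_Price

round(sold_ratio, 2)     # 1.22
round(revenue_ratio, 2)  # 2.04
```

Pies sold only about 1.2 times as many units as Muffins yet earned about twice the revenue, which reflects the higher unit price.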