In basic package tables are not so visualy contetnual. If we want to use our tables for displaying to broadar public and they would need to be a bit more interesting. Luckily there are many packages in R that are dealing with tabulation, but now we will cover following:

descr package
expss package
tableOne package
formattable package, and

###Lets start with the basic tables

We will use this database that has cathegorical and numerical variables, so we can play with many types of tables.

Tokyo_updated <- readxl::read_xlsx("Tokyo_updated.xlsx")
str(Tokyo_updated)
## tibble [1,884 × 13] (S3: tbl_df/tbl/data.frame)
##  $ Store      : chr [1:1884] "Tokyo" "Tokyo" "Tokyo" "Tokyo" ...
##  $ Brand      : chr [1:1884] "Asics" "Asics" "Asics" "Asics" ...
##  $ Type       : chr [1:1884] "WB1820" "Kayano Single Tab" "WB1820" "WB2585" ...
##  $ Gender     : chr [1:1884] "Female" "Unisex" "Female" "Female" ...
##  $ Size       : chr [1:1884] "42" "42-44" "37" "39" ...
##  $ Color      : chr [1:1884] "Blue" "Blue" "Pink" "Black" ...
##  $ Category   : chr [1:1884] "Pants" "Socks" "Pants" "Pants" ...
##  $ Sales Price: num [1:1884] 89 25 89 99 89 ...
##  $ Date       : POSIXct[1:1884], format: "2015-07-15" "2015-07-15" ...
##  $ Time       : POSIXct[1:1884], format: "1899-12-31 07:32:09" "1899-12-31 07:33:36" ...
##  $ Loyalty    : chr [1:1884] "41842" "46176" "---" "40444" ...
##  $ Month      : num [1:1884] 7 7 7 7 7 7 7 7 7 7 ...
##  $ ...13      : num [1:1884] 29 29 29 29 29 29 29 29 29 29 ...

Now lets tabulate..

table (Tokyo_updated$Brand)
## 
## Adidas  Asics   Nike 
##    213   1058    613

Ako zelimo presjek dvije varijable u tabeli onda nam obje varijable moraju biti kategoricke.

table(Tokyo_updated$Brand, Tokyo_updated$Gender)
##         
##          Female Male Unisex
##   Adidas     87  126      0
##   Asics     570  288    200
##   Nike      253   49    311

U ovoj tabeli nedostaju proporcije, tj. prikazane su samo frekvencije. Da bismo prikazali proporcije koristimo funkciju prop.table.

tabela <- table (Tokyo_updated$Brand, Tokyo_updated$Gender)
prop.table(tabela,1)
##         
##              Female       Male     Unisex
##   Adidas 0.40845070 0.59154930 0.00000000
##   Asics  0.53875236 0.27221172 0.18903592
##   Nike   0.41272431 0.07993475 0.50734095
prop.table(tabela,2)
##         
##             Female      Male    Unisex
##   Adidas 0.0956044 0.2721382 0.0000000
##   Asics  0.6263736 0.6220302 0.3913894
##   Nike   0.2780220 0.1058315 0.6086106

U zavisnosnosti od toga da li zelimo proporcije unutar reda, ili unutar kolone dodajemo 1 ili 2.

Zakljucak Ovaj nacin prikazivanja table koristimo uglavnom prilikom pocetne faza istrazivanja baze podataka. Takodjer, koristi se kao jedan od koraka prilikom chi.squere () testiranja ili kao jedan od koraka za vizualizaciju geom_bar koji koristi frekvencije.

###Sada prelazimo na tableonepaket

#install.packages ("tableone")
#install.packages ("purrr")
#install.packages ("glue")
library(tableone) 

Meni nije moglo instalirati tableone paket a da nisam prvo instalirala paket purrr i glue.

CreateTableOne(data=Tokyo_updated, vars = c("Brand","Sales Price" ), strata = c("Gender"))
##                          Stratified by Gender
##                           Female         Male           Unisex        p     
##   n                          910            463           511               
##   Brand (%)                                                           <0.001
##      Adidas                   87 ( 9.6)     126 (27.2)      0 ( 0.0)        
##      Asics                   570 (62.6)     288 (62.2)    200 (39.1)        
##      Nike                    253 (27.8)      49 (10.6)    311 (60.9)        
##   Sales Price (mean (SD)) 100.31 (44.07) 125.89 (39.76) 21.10 (2.10)  <0.001
##                          Stratified by Gender
##                           test
##   n                           
##   Brand (%)                   
##      Adidas                   
##      Asics                    
##      Nike                     
##   Sales Price (mean (SD))

Ova tabela nam izbacuje mean i sd u odnosu na stratu. Objasnjenje za mean i sd - koristicemo tapply tzv familiju apply funkcija.

tapply(Tokyo_updated$`Sales Price`, Tokyo_updated$Gender, mean)
##    Female      Male    Unisex 
## 100.30777 125.89417  21.10162
tapply(Tokyo_updated$`Sales Price`, Tokyo_updated$Gender, sd)
##    Female      Male    Unisex 
## 44.065790 39.761840  2.096324

###Sada smo dosli do treceg paketa descr Ovaj paket je slican osnovnom. Uz pomoc ovog paketa mozemo da vidimo i proporcije

library(descr)
crosstab(Tokyo_updated$Brand,Tokyo_updated$Category, prop.r=T, plot=T, digits=1)

##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |             Row Percent | 
## |-------------------------|
## 
## =============================================================
##                        Tokyo_updated$Category
## Tokyo_updated$Brand      Bra   Pants     Shoe   Socks   Total
## -------------------------------------------------------------
## Adidas                    0       0      213       0     213 
##                         0.0%    0.0%   100.0%    0.0%   11.3%
## -------------------------------------------------------------
## Asics                     0     683       82     293    1058 
##                         0.0%   64.6%     7.8%   27.7%   56.2%
## -------------------------------------------------------------
## Nike                    218       0       84     311     613 
##                        35.6%    0.0%    13.7%   50.7%   32.5%
## -------------------------------------------------------------
## Total                   218     683      379     604    1884 
## =============================================================
crosstab(Tokyo_updated$Category,Tokyo_updated$Brand, prop.r=T, plot=T, digits=1) #zamjeniti mjesta za druge proporcije

##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |             Row Percent | 
## |-------------------------|
## 
## ==========================================================
##                           Tokyo_updated$Brand
## Tokyo_updated$Category    Adidas    Asics     Nike   Total
## ----------------------------------------------------------
## Bra                           0        0      218     218 
##                             0.0%     0.0%   100.0%   11.6%
## ----------------------------------------------------------
## Pants                         0      683        0     683 
##                             0.0%   100.0%     0.0%   36.3%
## ----------------------------------------------------------
## Shoe                        213       82       84     379 
##                            56.2%    21.6%    22.2%   20.1%
## ----------------------------------------------------------
## Socks                         0      293      311     604 
##                             0.0%    48.5%    51.5%   32.1%
## ----------------------------------------------------------
## Total                       213     1058      613    1884 
## ==========================================================

Argumenti znace slijedece:
> prop.r = T zelimo proporcije
> plot = T zelmo grafikon tih proporcija
> digits = 1 zelimo da svi brojevi u tabli imaju jedan decimalni razmak.

###Cetvrti nacin je uz pomoc expss paketa

library(expss)
names(Tokyo_updated)
##  [1] "Store"       "Brand"       "Type"        "Gender"      "Size"       
##  [6] "Color"       "Category"    "Sales Price" "Date"        "Time"       
## [11] "Loyalty"     "Month"       "...13"

Ovaj paket je koristan i za dodavanja etiketa (labels) varijablama. Ti kodovi su obradjeni posebno.

cro(Tokyo_updated$Category, Tokyo_updated$Brand)
 Tokyo_updated$Brand 
 Adidas   Asics   Nike 
 Tokyo_updated$Category 
   Bra  218
   Pants  683
   Shoe  213 82 84
   Socks  293 311
   #Total cases  213 1058 613

Vidimo da je ova tabela najvise vizuelno prihvatljiva. Probajmo je spasiti na komp.

library(openxlsx)
crotabela <- cro(Tokyo_updated$Category, Tokyo_updated$Brand)
wb = createWorkbook()
sh = addWorksheet(wb, "Tables")

xl_write(crotabela, wb, sh)

saveWorkbook(wb, "report.xlsx", overwrite = TRUE)

Na ovaj nacin spasili smo tabelu u excelu pod nazivom "report" i u sheet-u Tables.

.. jos primjera u expss paketu:

cro_cpct(Tokyo_updated$Brand, list(total(), Tokyo_updated$Type, Tokyo_updated$Gender, Tokyo_updated$Category))
 #Total   Air Zoom   Boston 5   Cummulus 17   Dri-Fit No-Show   Elite Compression   Free 5.0   Gel Quantics   Hera Deux Single Tab   Hyper Elite Crew   Kayano Single Tab   MB1878   Nike Elite Match Fit Mercurial Crew   Nimbus 17   Pro Classic   Pro Fierce   Pro Hero   Pro Indy   Quick Light Cushion   Rocket Boost   Ultra Boost   WB1820   WB2585   Female   Male   Unisex   Bra   Pants   Shoe   Socks 
 Tokyo_updated$Brand 
   Adidas  11.3 100 100 100 9.6 27.2 56.2
   Asics  56.2 100 100 100 100 100 100 100 100 100 62.6 62.2 39.1 100 21.6 48.5
   Nike  32.5 100 100 100 100 100 100 100 100 100 100 27.8 10.6 60.9 100 22.2 51.5
   #Total cases  1884 42 92 45 103 3 42 11 93 77 104 219 128 26 27 112 46 33 96 73 48 415 49 910 463 511 218 683 379 604

Pod list() mozemo da dodajemo koliko zelimo. Bez obzira, po koloni ili po redovima. Npr. ako zelimo gornju tabelu koja je preduga, da “razbijemo” i po redovima onda je kod

cro_cpct(list (total (),Tokyo_updated$Type,total (), Tokyo_updated$Category),  list(total(),Tokyo_updated$Brand, Tokyo_updated$Gender))
 #Total   Adidas   Asics   Nike   Female   Male   Unisex 
 #Total  100 100 100 100 100 100 100
 #Total cases  1 213 1058 613 910 463 511
 Air Zoom  2.2 6.9 1.1 6.9
 Boston 5  4.9 43.2 4.0 12.1
 Cummulus 17  2.4 4.3 0.9 8.0
 Dri-Fit No-Show  5.5 16.8 20.2
 Elite Compression  0.2 0.5 0.6
 Free 5.0  2.2 6.9 2.7 3.7
 Gel Quantics  0.6 1.0 0.3 1.7
 Hera Deux Single Tab  4.9 8.8 10.2
 Hyper Elite Crew  4.1 12.6 15.1
 Kayano Single Tab  5.5 9.8 20.4
 MB1878  11.6 20.7 47.3
 Nike Elite Match Fit Mercurial Crew  6.8 20.9 25.0
 Nimbus 17  1.4 2.5 0.2 5.2
 Pro Classic  1.4 4.4 3.0
 Pro Fierce  5.9 18.3 12.3
 Pro Hero  2.4 7.5 5.1
 Pro Indy  1.8 5.4 3.6
 Quick Light Cushion  5.1 9.1 18.8
 Rocket Boost  3.9 34.3 3.6 8.6
 Ultra Boost  2.5 22.5 2.0 6.5
 WB1820  22.0 39.2 45.6
 WB2585  2.6 4.6 5.4
 #Total cases  1884 213 1058 613 910 463 511
 #Total  100 100 100 100 100 100 100
 #Total cases  1 213 1058 613 910 463 511
 Bra  11.6 35.6 24.0
 Pants  36.3 64.6 51.0 47.3
 Shoe  20.1 100.0 7.8 13.7 14.8 52.7
 Socks  32.1 27.7 50.7 10.2 100.0
 #Total cases  1884 213 1058 613 910 463 511

Funkcija total () se dodaje ako zelimo da se prizu za narednu varijablu totalu u 100%. Ako zelimo da imamo banere tj. da iznad svake grupe stoji naziv varijable, tj. da je Male, Female i Unisex grupisan u Gender onda imamo drugaciji kod:

library(dplyr)
Tokyo_updated %>%
  tab_cells(total(), Type, Category) %>% 
  tab_cols(total (), Brand, Gender) %>% 
  tab_stat_cpct() %>% 
  tab_pivot()
 #Total     Brand     Gender 
   Adidas   Asics   Nike     Female   Male   Unisex 
 #Total 
   100   100 100 100   100 100 100
 #Total cases 
   1   213 1058 613   910 463 511
 Type 
   Air Zoom  2.2   6.9   1.1 6.9
   Boston 5  4.9   43.2   4.0 12.1
   Cummulus 17  2.4   4.3   0.9 8.0
   Dri-Fit No-Show  5.5   16.8   20.2
   Elite Compression  0.2   0.5   0.6
   Free 5.0  2.2   6.9   2.7 3.7
   Gel Quantics  0.6   1.0   0.3 1.7
   Hera Deux Single Tab  4.9   8.8   10.2
   Hyper Elite Crew  4.1   12.6   15.1
   Kayano Single Tab  5.5   9.8   20.4
   MB1878  11.6   20.7   47.3
   Nike Elite Match Fit Mercurial Crew  6.8   20.9   25.0
   Nimbus 17  1.4   2.5   0.2 5.2
   Pro Classic  1.4   4.4   3.0
   Pro Fierce  5.9   18.3   12.3
   Pro Hero  2.4   7.5   5.1
   Pro Indy  1.8   5.4   3.6
   Quick Light Cushion  5.1   9.1   18.8
   Rocket Boost  3.9   34.3   3.6 8.6
   Ultra Boost  2.5   22.5   2.0 6.5
   WB1820  22.0   39.2   45.6
   WB2585  2.6   4.6   5.4
   #Total cases  1884   213 1058 613   910 463 511
 Category 
   Bra  11.6   35.6   24.0
   Pants  36.3   64.6   51.0 47.3
   Shoe  20.1   100.0 7.8 13.7   14.8 52.7
   Socks  32.1   27.7 50.7   10.2 100.0
   #Total cases  1884   213 1058 613   910 463 511

Gornju tabelu smo mogli da prikazemo i u slucajevima a ne proporcijama samo umjesto funkcije tab_stat_cpct () koristili bi tab_stat_cases

Ako zelimo da prikazemo brand pod gender kao nested onda je funkcija, i ovaj put ne u procentima nego u frekvencijama

cro_cases(Tokyo_updated$Category, list(total(), Tokyo_updated$Brand %nest% Tokyo_updated$Gender))
 #Total     Adidas     Asics     Nike 
   Female   Male   Unisex     Female   Male   Unisex     Female   Male   Unisex 
 Tokyo_updated$Category 
   Bra  218       218
   Pants  683     464 219  
   Shoe  379   87 126   13 69   35 49
   Socks  604     93 200   311
   #Total cases  1884   87 126   570 288 200   253 49 311

If we want to show some descriptive statistics for our numeric variabls then the code is>

library(dplyr)

Tokyo_updated %>% tab_cells(`Sales Price`, Month, ...13) %>% tab_cols(total(label = "Total")) %>% tab_stat_fun(w_mean, w_sd, w_n, method = list) %>% tab_pivot()
 Total 
 w_mean   w_sd   w_n 
 Sales Price  85.1 54.4 1884
 Month  7.0 0.0 1884
 …13  30.0 0.8 1884

Ako zelimo da promjenimo ime u kolonoma onda..

Tokyo_updated %>% tab_cells(`Sales Price`, Month, ...13) %>% tab_cols(total(label = "Total"), Gender) %>% tab_stat_fun(Mean = w_mean, Std = w_sd, N = w_n, method = list) %>% tab_pivot()
 Total     Gender 
 Mean     Std     N     Female     Male     Unisex 
       Mean   Std   N     Mean   Std   N     Mean   Std   N 
 Sales Price  85.1   54.4   1884   100.3 44.1 910   125.9 39.8 463   21.1 2.1 511
 Month  7.0   0.0   1884   7.0 0.0 910   7.0 0.0 463   7.0 0.0 511
 …13  30.0   0.8   1884   30.0 0.8 910   30.0 0.8 463   29.9 0.8 511

Ako zelimo da nam Mean, Std i N budu u istoj ravni onda nam je label = "Total| |"

Tokyo_updated %>% tab_cells(`Sales Price`, Month, ...13) %>% tab_cols(total(label = "Total| |"), Gender) %>% tab_stat_fun(Mean = w_mean, Std = w_sd, N = w_n, method = list) %>% tab_pivot()
 Total     Gender 
       Female     Male     Unisex 
 Mean   Std   N     Mean   Std   N     Mean   Std   N     Mean   Std   N 
 Sales Price  85.1 54.4 1884   100.3 44.1 910   125.9 39.8 463   21.1 2.1 511
 Month  7.0 0.0 1884   7.0 0.0 910   7.0 0.0 463   7.0 0.0 511
 …13  30.0 0.8 1884   30.0 0.8 910   30.0 0.8 463   29.9 0.8 511

za izradu tabelu moguce je jos varijante koje mozete vidjeti na: https://cran.r-project.org/web/packages/expss/vignettes/tables-with-labels.html
https://gdemin.github.io/expss/ ili na https://cran.r-project.org/web/packages/expss/expss.pdf

###Sada prelazimo na peti paket formattable.

library(data.table)
library(dplyr)
library(formattable)
library(tidyr)
i <- Tokyo_updated %>% filter(Type== c("WB1820", "Ultra Boost","Nike Elite Match Fit Mercurial Crew", "MB1878", "Kayano Single Tab","Hera Deux Single Tab", "Dri-Fit No-Show", "Boston 5")) %>% select(Gender, Brand, `Sales Price`, Category, Type)
## Warning: There was 1 warning in `filter()`.
## ℹ In argument: `==...`.
## Caused by warning in `Type == c("WB1820", "Ultra Boost", "Nike Elite Match Fit Mercurial Crew",
##     "MB1878", "Kayano Single Tab", "Hera Deux Single Tab", "Dri-Fit No-Show",
##     "Boston 5")`:
## ! longer object length is not a multiple of shorter object length
a <- group_by(i, Type) %>% summarise("AverageSP" = mean(`Sales Price`, na.rm = T), "MedianSP" = median(`Sales Price`, na.rm = T), "StdSP" = sd (`Sales Price`, na.rm = T))

Da vidimo sada kakav je format od a i a kada upotrijebimo formattable

a
## # A tibble: 8 × 4
##   Type                                AverageSP MedianSP StdSP
##   <chr>                                   <dbl>    <dbl> <dbl>
## 1 Boston 5                                123.     125    2.61
## 2 Dri-Fit No-Show                          20       20    0   
## 3 Hera Deux Single Tab                     13.0     13.0  0   
## 4 Kayano Single Tab                        25.0     25.0  0   
## 5 MB1878                                   99       99    0   
## 6 Nike Elite Match Fit Mercurial Crew      21       21    0   
## 7 Ultra Boost                             139      139    0   
## 8 WB1820                                   89       89    0
formattable(a)
Type AverageSP MedianSP StdSP
Boston 5 122.7273 125.00 2.611165
Dri-Fit No-Show 20.0000 20.00 0.000000
Hera Deux Single Tab 12.9900 12.99 0.000000
Kayano Single Tab 24.9900 24.99 0.000000
MB1878 99.0000 99.00 0.000000
Nike Elite Match Fit Mercurial Crew 21.0000 21.00 0.000000
Ultra Boost 139.0000 139.00 0.000000
WB1820 89.0000 89.00 0.000000

Ako zelimo drugacije da centirramo sadrzaj

formattable(a, align = c("l","c", "c","r"))
Type AverageSP MedianSP StdSP
Boston 5 122.7273 125.00 2.611165
Dri-Fit No-Show 20.0000 20.00 0.000000
Hera Deux Single Tab 12.9900 12.99 0.000000
Kayano Single Tab 24.9900 24.99 0.000000
MB1878 99.0000 99.00 0.000000
Nike Elite Match Fit Mercurial Crew 21.0000 21.00 0.000000
Ultra Boost 139.0000 139.00 0.000000
WB1820 89.0000 89.00 0.000000

Vise o mogucnostima format table mozete pogledati na https://cran.r-project.org/web/packages/formattable/formattable.pdf

Ako zelim da mi oboji neke brojeve pod odredjenim uslvom onda

formattable(a, list(AverageSP = formatter("span",style = x ~ ifelse(x > median(x), "color:red", NA))))
Type AverageSP MedianSP StdSP
Boston 5 122.7273 125.00 2.611165
Dri-Fit No-Show 20.0000 20.00 0.000000
Hera Deux Single Tab 12.9900 12.99 0.000000
Kayano Single Tab 24.9900 24.99 0.000000
MB1878 99.0000 99.00 0.000000
Nike Elite Match Fit Mercurial Crew 21.0000 21.00 0.000000
Ultra Boost 139.0000 139.00 0.000000
WB1820 89.0000 89.00 0.000000

#razlicita jacina crvene u yavisnosti od toga koliko se udaljava

formattable(a, list(MedianSP = formatter("span",style = x ~ style(color = rgb(x/max(x), 0, 0)))))
Type AverageSP MedianSP StdSP
Boston 5 122.7273 125.00 2.611165
Dri-Fit No-Show 20.0000 20.00 0.000000
Hera Deux Single Tab 12.9900 12.99 0.000000
Kayano Single Tab 24.9900 24.99 0.000000
MB1878 99.0000 99.00 0.000000
Nike Elite Match Fit Mercurial Crew 21.0000 21.00 0.000000
Ultra Boost 139.0000 139.00 0.000000
WB1820 89.0000 89.00 0.000000

Pozadina da je crvnea

formattable(a, list(MedianSP = formatter("span",style = x ~ style(display = "block",
"border-radius" = "4px",
"padding-right" = "4px",
color = "white","background-color" = rgb(x/max(x), 0, 0)))))
Type AverageSP MedianSP StdSP
Boston 5 122.7273 125.00 2.611165
Dri-Fit No-Show 20.0000 20.00 0.000000
Hera Deux Single Tab 12.9900 12.99 0.000000
Kayano Single Tab 24.9900 24.99 0.000000
MB1878 99.0000 99.00 0.000000
Nike Elite Match Fit Mercurial Crew 21.0000 21.00 0.000000
Ultra Boost 139.0000 139.00 0.000000
WB1820 89.0000 89.00 0.000000

U ovoj tabeli je uslov da boja varira u zavisnosti od svake podijeljeno sa maximum ..takle 125/139 pa 20/139, pa 12.99/139, i tako se dobije boja za red pod rgb , a maksimalno je 139 tj. ta treba da bude najsvjetlija. Uslove smo mogli i drugacije postavitii.

formattable(a, list(MedianSP = formatter("span",style = x ~ style(display = "block")),`MedianSP` = color_bar("red")))
Type AverageSP MedianSP StdSP
Boston 5 122.7273 <span style=“display: block”>125.00</span> 2.611165
Dri-Fit No-Show 20.0000 <span style=“display: block”>20.00</span> 0.000000
Hera Deux Single Tab 12.9900 <span style=“display: block”>12.99</span> 0.000000
Kayano Single Tab 24.9900 <span style=“display: block”>24.99</span> 0.000000
MB1878 99.0000 <span style=“display: block”>99.00</span> 0.000000
Nike Elite Match Fit Mercurial Crew 21.0000 <span style=“display: block”>21.00</span> 0.000000
Ultra Boost 139.0000 <span style=“display: block”>139.00</span> 0.000000
WB1820 89.0000 <span style=“display: block”>89.00</span> 0.000000