These are guidelines and templates on how to prepare publishable tables in R Markdown printed into HTML format for SMI205 students.
Before running this R Markdown document install the following packages:
install.packages("rmarkdown")
install.packages("rmdformats")
install.packages("essurvey")
install.packages("xtable")
install.packages("summarytools")
install.packages("knitr")
install.packages("sjPlot")
The material theme from the rmdformats package is here.
html_document:
  highlight: tango
  code_download: true
  toc: true
  toc_depth: 2
  toc_float:
    collapsed: false
    smooth_scroll: true
More YAML settings are overviewed here: https://rmarkdown.rstudio.com/docs/reference/html_document.html
I am also setting the global chunk options so that, for all r chunks, the code is displayed (echo=TRUE; note that the default in this theme is FALSE), while any additional messages or warnings are not displayed (both set to FALSE).
library(knitr)
library(rmdformats)
## Global options
options(max.print="85")
opts_chunk$set(echo=TRUE,
cache=TRUE,
prompt=FALSE,
tidy=TRUE,
comment=NA,
message=FALSE,
warning=FALSE)
opts_knit$set(width=85)
I set include=FALSE for this r chunk, so it is not displayed at the beginning of your report while the code is still run.
I use ESS wave 2016 (wave 8) for this exercise. As you remember from our previous R labs, this data can be easily downloaded using the essurvey package.
In the settings of this r chunk I specify results = 'hide' so R Markdown does not print all information about loaded data. Try changing this to see what happens.
I subset data and keep the following variables in my new, smaller dataset called round_8_final: idno, cntry, blgetmg, imbgeco, imueclt, imwbcnt, gndr, agea, maritalb, eisced, polintr, hincfel, uemp5yr.
library("essurvey")
set_email("a.piekut@sheffield.ac.uk")
round_8 <- import_rounds(8)
round_8_final = subset(round_8, select = c(idno, cntry, blgetmg, imbgeco, imueclt, imwbcnt, gndr, agea, maritalb, eisced, polintr, hincfel, uemp5yr))
In your reports, please do the same for all data manipulation, such as subsetting or merging datasets: display the r chunks, but hide the results.
summary option
The summary option is well known to you from previous R classes. Let’s produce a summary table of all variables in our smaller dataset and a separate one for the age variable (agea).
idno cntry blgetmg imbgeco
Min. : 1 Length:44387 Min. :1.000 Min. : 0.000
1st Qu.: 1208 Class :character 1st Qu.:2.000 1st Qu.: 3.000
Median : 2589 Mode :character Median :2.000 Median : 5.000
Mean : 31545782 Mean :1.935 Mean : 5.006
3rd Qu.: 11058 3rd Qu.:2.000 3rd Qu.: 7.000
Max. :551603139 Max. :2.000 Max. :10.000
imueclt imwbcnt gndr agea
Min. : 0.000 Min. : 0.000 Min. :1.000 Min. : 15.00
1st Qu.: 4.000 1st Qu.: 3.000 1st Qu.:1.000 1st Qu.: 34.00
Median : 5.000 Median : 5.000 Median :2.000 Median : 49.00
Mean : 5.368 Mean : 4.904 Mean :1.526 Mean : 49.14
3rd Qu.: 7.000 3rd Qu.: 7.000 3rd Qu.:2.000 3rd Qu.: 64.00
Max. :10.000 Max. :10.000 Max. :2.000 Max. :100.00
maritalb eisced polintr hincfel
Min. :1.000 Min. : 1.000 Min. :1.000 Min. :1.000
1st Qu.:1.000 1st Qu.: 2.000 1st Qu.:2.000 1st Qu.:1.000
Median :2.000 Median : 4.000 Median :3.000 Median :2.000
Mean :3.153 Mean : 4.109 Mean :2.587 Mean :1.948
3rd Qu.:6.000 3rd Qu.: 5.000 3rd Qu.:3.000 3rd Qu.:2.000
Max. :6.000 Max. :55.000 Max. :4.000 Max. :4.000
uemp5yr
Min. :1.00
1st Qu.:1.00
Median :2.00
Mean :1.53
3rd Qu.:2.00
Max. :2.00
[ reached getOption("max.print") -- omitted 1 row ]
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
15.00 34.00 49.00 49.14 64.00 100.00 155
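The two outputs above come from calls along the following lines. This is only a minimal sketch: toy is a hypothetical stand-in for round_8_final with made-up values, so that the example runs without downloading the ESS data.

```r
# Hypothetical stand-in for round_8_final (values are made up for illustration)
toy <- data.frame(
  agea = c(34, 52, 68, NA, 20),
  gndr = c(2, 1, 2, 1, 2)
)

# Summary of every variable in the data frame
all_vars <- summary(toy)
print(all_vars)

# Summary of a single variable; missing values are counted under "NA's"
age_only <- summary(toy$agea)
print(age_only)
```

With the real data you would replace toy with round_8_final and keep the variable name agea.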
Both outputs are ‘readable’, but not presented in a publishable format.
In the next sections I give an overview of a few functions that are useful for producing nice tables, which are easily printed in HTML format in R Markdown.
xtable
First, let’s use the xtable function, which exports tables to LaTeX or HTML format.
You need to install this package first on your machine (go back to Technical notes section to see names of packages). xtable overview is here: https://cran.r-project.org/web/packages/xtable/vignettes/xtableGallery.pdf
xtable needs to be supported with print function, which helps in ‘translating’ the output into HTML code associated with xtable object. See more here
I use this function to display the first 10 rows of my data for all variables in the dataset. My r chunk is displayed first, then the result; I made them both visible.
library("xtable")
xtable_10rows <- xtable(round_8_final[1:10, ], digits = 0)
print(xtable_10rows, type = "html")
<!-- html table generated in R 3.6.3 by xtable 1.8-4 package -->
<!-- Fri May 01 15:12:50 2020 -->
<table border=1>
<tr> <th> </th> <th> idno </th> <th> cntry </th> <th> blgetmg </th> <th> imbgeco </th> <th> imueclt </th> <th> imwbcnt </th> <th> gndr </th> <th> agea </th> <th> maritalb </th> <th> eisced </th> <th> polintr </th> <th> hincfel </th> <th> uemp5yr </th> </tr>
<tr> <td align="right"> 1 </td> <td align="right"> 1 </td> <td> AT </td> <td align="right"> 2 </td> <td align="right"> 10 </td> <td align="right"> 10 </td> <td align="right"> 10 </td> <td align="right"> 2 </td> <td align="right"> 34 </td> <td align="right"> 6 </td> <td align="right"> 7 </td> <td align="right"> 1 </td> <td align="right"> 2 </td> <td align="right"> 1 </td> </tr>
<tr> <td align="right"> 2 </td> <td align="right"> 2 </td> <td> AT </td> <td align="right"> 2 </td> <td align="right"> 8 </td> <td align="right"> 8 </td> <td align="right"> 5 </td> <td align="right"> 1 </td> <td align="right"> 52 </td> <td align="right"> 1 </td> <td align="right"> 4 </td> <td align="right"> 1 </td> <td align="right"> 2 </td> <td align="right"> 2 </td> </tr>
<tr> <td align="right"> 3 </td> <td align="right"> 4 </td> <td> AT </td> <td align="right"> 2 </td> <td align="right"> 5 </td> <td align="right"> 5 </td> <td align="right"> 4 </td> <td align="right"> 2 </td> <td align="right"> 68 </td> <td align="right"> 6 </td> <td align="right"> 3 </td> <td align="right"> 3 </td> <td align="right"> 2 </td> <td align="right"> 2 </td> </tr>
<tr> <td align="right"> 4 </td> <td align="right"> 6 </td> <td> AT </td> <td align="right"> 2 </td> <td align="right"> 7 </td> <td align="right"> 7 </td> <td align="right"> 10 </td> <td align="right"> 1 </td> <td align="right"> 54 </td> <td align="right"> 4 </td> <td align="right"> 3 </td> <td align="right"> 2 </td> <td align="right"> 2 </td> <td align="right"> </td> </tr>
<tr> <td align="right"> 5 </td> <td align="right"> 10 </td> <td> AT </td> <td align="right"> 1 </td> <td align="right"> 2 </td> <td align="right"> 5 </td> <td align="right"> 5 </td> <td align="right"> 2 </td> <td align="right"> 20 </td> <td align="right"> 1 </td> <td align="right"> 3 </td> <td align="right"> 3 </td> <td align="right"> 2 </td> <td align="right"> </td> </tr>
<tr> <td align="right"> 6 </td> <td align="right"> 11 </td> <td> AT </td> <td align="right"> 2 </td> <td align="right"> 8 </td> <td align="right"> 6 </td> <td align="right"> 8 </td> <td align="right"> 2 </td> <td align="right"> 65 </td> <td align="right"> 4 </td> <td align="right"> 4 </td> <td align="right"> 2 </td> <td align="right"> 1 </td> <td align="right"> </td> </tr>
<tr> <td align="right"> 7 </td> <td align="right"> 12 </td> <td> AT </td> <td align="right"> 1 </td> <td align="right"> 6 </td> <td align="right"> 5 </td> <td align="right"> 3 </td> <td align="right"> 2 </td> <td align="right"> 52 </td> <td align="right"> 1 </td> <td align="right"> 2 </td> <td align="right"> 3 </td> <td align="right"> 3 </td> <td align="right"> </td> </tr>
<tr> <td align="right"> 8 </td> <td align="right"> 13 </td> <td> AT </td> <td align="right"> 2 </td> <td align="right"> 5 </td> <td align="right"> 4 </td> <td align="right"> 3 </td> <td align="right"> 2 </td> <td align="right"> 44 </td> <td align="right"> 1 </td> <td align="right"> 5 </td> <td align="right"> 3 </td> <td align="right"> 2 </td> <td align="right"> </td> </tr>
<tr> <td align="right"> 9 </td> <td align="right"> 14 </td> <td> AT </td> <td align="right"> 2 </td> <td align="right"> 9 </td> <td align="right"> 10 </td> <td align="right"> 9 </td> <td align="right"> 2 </td> <td align="right"> 22 </td> <td align="right"> 6 </td> <td align="right"> 3 </td> <td align="right"> 4 </td> <td align="right"> 1 </td> <td align="right"> </td> </tr>
<tr> <td align="right"> 10 </td> <td align="right"> 15 </td> <td> AT </td> <td align="right"> 2 </td> <td align="right"> 7 </td> <td align="right"> 9 </td> <td align="right"> 9 </td> <td align="right"> 2 </td> <td align="right"> 41 </td> <td align="right"> 4 </td> <td align="right"> 3 </td> <td align="right"> 2 </td> <td align="right"> 2 </td> <td align="right"> 1 </td> </tr>
</table>
Oops, this does not look good. You have to specify results='asis' in the r chunk. The default is results='markup', which prints the results verbatim as raw HTML code rather than passing them on to be rendered as a table. This is how the table looks when you specify results='asis':
library("xtable")
xtable_10rows <- xtable(round_8_final[1:10, ], digits = 0)
print(xtable_10rows, type = "html")
| | idno | cntry | blgetmg | imbgeco | imueclt | imwbcnt | gndr | agea | maritalb | eisced | polintr | hincfel | uemp5yr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | AT | 2 | 10 | 10 | 10 | 2 | 34 | 6 | 7 | 1 | 2 | 1 |
| 2 | 2 | AT | 2 | 8 | 8 | 5 | 1 | 52 | 1 | 4 | 1 | 2 | 2 |
| 3 | 4 | AT | 2 | 5 | 5 | 4 | 2 | 68 | 6 | 3 | 3 | 2 | 2 |
| 4 | 6 | AT | 2 | 7 | 7 | 10 | 1 | 54 | 4 | 3 | 2 | 2 | |
| 5 | 10 | AT | 1 | 2 | 5 | 5 | 2 | 20 | 1 | 3 | 3 | 2 | |
| 6 | 11 | AT | 2 | 8 | 6 | 8 | 2 | 65 | 4 | 4 | 2 | 1 | |
| 7 | 12 | AT | 1 | 6 | 5 | 3 | 2 | 52 | 1 | 2 | 3 | 3 | |
| 8 | 13 | AT | 2 | 5 | 4 | 3 | 2 | 44 | 1 | 5 | 3 | 2 | |
| 9 | 14 | AT | 2 | 9 | 10 | 9 | 2 | 22 | 6 | 3 | 4 | 1 | |
| 10 | 15 | AT | 2 | 7 | 9 | 9 | 2 | 41 | 4 | 3 | 2 | 2 | 1 |
Let’s now hide the code (option echo=FALSE in the r chunk) and add a table title before the output. The caption option in xtable adds a title, but by default it appears below the table; to move it above the table, add one more option to print: caption.placement="top". Finally, you can use the align option in xtable to indicate whether the text in cells should be centred (“c”), right-aligned (“r”) or left-aligned (“l”). This has to be specified for every column plus one additional column at the beginning for the row names (so number of columns + 1 values).
| | idno | cntry | blgetmg | imbgeco | imueclt | imwbcnt | gndr | agea | maritalb | eisced | polintr | hincfel | uemp5yr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | AT | 2 | 10 | 10 | 10 | 2 | 34 | 6 | 7 | 1 | 2 | 1 |
| 2 | 2 | AT | 2 | 8 | 8 | 5 | 1 | 52 | 1 | 4 | 1 | 2 | 2 |
| 3 | 4 | AT | 2 | 5 | 5 | 4 | 2 | 68 | 6 | 3 | 3 | 2 | 2 |
| 4 | 6 | AT | 2 | 7 | 7 | 10 | 1 | 54 | 4 | 3 | 2 | 2 | |
| 5 | 10 | AT | 1 | 2 | 5 | 5 | 2 | 20 | 1 | 3 | 3 | 2 | |
| 6 | 11 | AT | 2 | 8 | 6 | 8 | 2 | 65 | 4 | 4 | 2 | 1 | |
| 7 | 12 | AT | 1 | 6 | 5 | 3 | 2 | 52 | 1 | 2 | 3 | 3 | |
| 8 | 13 | AT | 2 | 5 | 4 | 3 | 2 | 44 | 1 | 5 | 3 | 2 | |
| 9 | 14 | AT | 2 | 9 | 10 | 9 | 2 | 22 | 6 | 3 | 4 | 1 | |
| 10 | 15 | AT | 2 | 7 | 9 | 9 | 2 | 41 | 4 | 3 | 2 | 2 | 1 |
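The hidden chunk behind a table like the one above might look roughly as follows. This is a sketch under assumptions: toy is a hypothetical stand-in for round_8_final[1:10, ], the caption text is made up, and the chunk is assumed to use results='asis' and echo=FALSE.

```r
library(xtable)

# Hypothetical stand-in for round_8_final[1:10, ] (made-up values)
toy <- data.frame(
  idno  = c(1, 2, 4),
  cntry = c("AT", "AT", "AT"),
  agea  = c(34, 52, 68)
)

# align needs one entry per column plus one for the row-name column
xt <- xtable(toy,
             caption = "Table 1. First rows of the data",
             align = rep("c", ncol(toy) + 1),
             digits = 0)

# caption.placement = "top" moves the caption above the table
print(xt, type = "html", caption.placement = "top")
```

Using rep() for align avoids counting columns by hand; mix "l", "c" and "r" in a character vector if columns need different alignment.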
kable
It turns out that the kable function produces much better looking tables in HTML format (xtable is preferred for PDF/LaTeX formats).
This section of R Markdown book provides more examples on how to use kable: https://bookdown.org/yihui/rmarkdown-cookbook/kable.html
Let’s display the same first 10 rows of our data as previously, but now using the kable function. Adding a table title is more straightforward here.
library("knitr")
kable(round_8_final[1:10, ], caption = "Table 1. Overview of 10 first rows in data")
| idno | cntry | blgetmg | imbgeco | imueclt | imwbcnt | gndr | agea | maritalb | eisced | polintr | hincfel | uemp5yr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | AT | 2 | 10 | 10 | 10 | 2 | 34 | 6 | 7 | 1 | 2 | 1 |
| 2 | AT | 2 | 8 | 8 | 5 | 1 | 52 | 1 | 4 | 1 | 2 | 2 |
| 4 | AT | 2 | 5 | 5 | 4 | 2 | 68 | 6 | 3 | 3 | 2 | 2 |
| 6 | AT | 2 | 7 | 7 | 10 | 1 | 54 | 4 | 3 | 2 | 2 | NA |
| 10 | AT | 1 | 2 | 5 | 5 | 2 | 20 | 1 | 3 | 3 | 2 | NA |
| 11 | AT | 2 | 8 | 6 | 8 | 2 | 65 | 4 | 4 | 2 | 1 | NA |
| 12 | AT | 1 | 6 | 5 | 3 | 2 | 52 | 1 | 2 | 3 | 3 | NA |
| 13 | AT | 2 | 5 | 4 | 3 | 2 | 44 | 1 | 5 | 3 | 2 | NA |
| 14 | AT | 2 | 9 | 10 | 9 | 2 | 22 | 6 | 3 | 4 | 1 | NA |
| 15 | AT | 2 | 7 | 9 | 9 | 2 | 41 | 4 | 3 | 2 | 2 | 1 |
summarytools package
Before diving into any more complex data modelling, you should always explore the frequency distribution of your key variables and compute descriptive statistics in order to better understand what is happening in the data.
summarytools package offers a wide range of functions to do so:
descr
The descr function produces a table with all descriptive statistics for all variables. In most functions from the summarytools package you have to specify style = "rmarkdown" (but see below), so that it produces nice looking tables in this format.
N: 44387
| | agea | blgetmg | eisced | gndr | hincfel | idno |
|---|---|---|---|---|---|---|
| Mean | 49.14 | 1.94 | 4.11 | 1.53 | 1.95 | 31545782.19 |
| Std.Dev | 18.61 | 0.25 | 2.93 | 0.50 | 0.83 | 115541717.82 |
| Min | 15.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Q1 | 34.00 | 2.00 | 2.00 | 1.00 | 1.00 | 1208.00 |
| Median | 49.00 | 2.00 | 4.00 | 2.00 | 2.00 | 2589.00 |
| Q3 | 64.00 | 2.00 | 5.00 | 2.00 | 2.00 | 11058.00 |
| Max | 100.00 | 2.00 | 55.00 | 2.00 | 4.00 | 551603139.00 |
| MAD | 22.24 | 0.00 | 1.48 | 0.00 | 1.48 | 2640.51 |
| IQR | 30.00 | 0.00 | 3.00 | 1.00 | 1.00 | 9849.50 |
| CV | 0.38 | 0.13 | 0.71 | 0.33 | 0.43 | 3.66 |
| Skewness | 0.05 | -3.54 | 10.44 | -0.10 | 0.67 | 4.14 |
| SE.Skewness | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
| Kurtosis | -0.93 | 10.53 | 178.93 | -1.99 | -0.05 | 15.68 |
| N.Valid | 44232.00 | 43946.00 | 44258.00 | 44378.00 | 43863.00 | 44387.00 |
| Pct.Valid | 99.65 | 99.01 | 99.71 | 99.98 | 98.82 | 100.00 |
| | imbgeco | imueclt | imwbcnt | maritalb | polintr | uemp5yr |
|---|---|---|---|---|---|---|
| Mean | 5.01 | 5.37 | 4.90 | 3.15 | 2.59 | 1.53 |
| Std.Dev | 2.52 | 2.64 | 2.40 | 2.26 | 0.92 | 0.50 |
| Min | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 |
| Q1 | 3.00 | 4.00 | 3.00 | 1.00 | 2.00 | 1.00 |
| Median | 5.00 | 5.00 | 5.00 | 2.00 | 3.00 | 2.00 |
| Q3 | 7.00 | 7.00 | 7.00 | 6.00 | 3.00 | 2.00 |
| Max | 10.00 | 10.00 | 10.00 | 6.00 | 4.00 | 2.00 |
| MAD | 2.97 | 2.97 | 2.97 | 1.48 | 1.48 | 0.00 |
| IQR | 4.00 | 3.00 | 4.00 | 5.00 | 1.00 | 1.00 |
| CV | 0.50 | 0.49 | 0.49 | 0.72 | 0.36 | 0.33 |
| Skewness | -0.24 | -0.28 | -0.16 | 0.22 | -0.03 | -0.13 |
| SE.Skewness | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.02 |
| Kurtosis | -0.51 | -0.60 | -0.29 | -1.80 | -0.86 | -1.98 |
| N.Valid | 42825.00 | 42984.00 | 42825.00 | 43509.00 | 44290.00 | 12406.00 |
| Pct.Valid | 96.48 | 96.84 | 96.48 | 98.02 | 99.78 | 27.95 |
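The code for the tables above is hidden in this report; a call of the following shape produces this kind of output. It is a minimal sketch: toy is a hypothetical stand-in for round_8_final with made-up values, and the chunk is assumed to use results='asis'.

```r
library(summarytools)

# Hypothetical stand-in for round_8_final (made-up values)
toy <- data.frame(
  agea    = c(34, 52, 68, 54, 20, 65),
  imueclt = c(10, 8, 5, 7, 5, 6)
)

# All descriptive statistics for every numeric column
d_all <- descr(toy, style = "rmarkdown")
d_all
```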
As we do not want to display so much statistical information, we can restrict the stats option to more basic descriptive statistics: mean, standard deviation, minimum and maximum value. I also swap columns with rows (transpose), hide the headings, and add a title with caption.
| | Mean | Std.Dev | Min | Max |
|---|---|---|---|---|
| agea | 49.14 | 18.61 | 15.00 | 100.00 |
| blgetmg | 1.94 | 0.25 | 1.00 | 2.00 |
| eisced | 4.11 | 2.93 | 1.00 | 55.00 |
| gndr | 1.53 | 0.50 | 1.00 | 2.00 |
| hincfel | 1.95 | 0.83 | 1.00 | 4.00 |
| idno | 31545782.19 | 115541717.82 | 1.00 | 551603139.00 |
| imbgeco | 5.01 | 2.52 | 0.00 | 10.00 |
| imueclt | 5.37 | 2.64 | 0.00 | 10.00 |
| imwbcnt | 4.90 | 2.40 | 0.00 | 10.00 |
| maritalb | 3.15 | 2.26 | 1.00 | 6.00 |
| polintr | 2.59 | 0.92 | 1.00 | 4.00 |
| uemp5yr | 1.53 | 0.50 | 1.00 | 2.00 |
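A shorter table like the one above can be requested roughly as follows. This is again a sketch with a hypothetical toy data frame; the caption is left out because the exact hidden call is not shown in this report.

```r
library(summarytools)

# Hypothetical stand-in for round_8_final (made-up values)
toy <- data.frame(
  agea    = c(34, 52, 68, 54, 20, 65),
  imueclt = c(10, 8, 5, 7, 5, 6)
)

# Keep four statistics, put variables in rows, and drop the heading block
d_short <- descr(toy,
                 stats = c("mean", "sd", "min", "max"),
                 transpose = TRUE,
                 headings = FALSE,
                 style = "rmarkdown")
d_short
```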
freq
If you want to display the frequency distribution of one variable, use the freq function.
Label: Gender
| | Freq | % Valid | % Valid Cum. | % Total | % Total Cum. |
|---|---|---|---|---|---|
| 1 | 21027 | 47.38 | 47.38 | 47.37 | 47.37 |
| 2 | 23351 | 52.62 | 100.00 | 52.61 | 99.98 |
| <NA> | 9 | | | 0.02 | 100.00 |
| Total | 44387 | 100.00 | 100.00 | 100.00 | 100.00 |
Let’s add table title with caption option, and in order to remove the additional information before the table (e.g. round_8_final$gndr), specify option headings = FALSE in freq. I will add it to all remaining examples below.
| | Freq | % Valid | % Valid Cum. | % Total | % Total Cum. |
|---|---|---|---|---|---|
| 1 | 21027 | 47.38 | 47.38 | 47.37 | 47.37 |
| 2 | 23351 | 52.62 | 100.00 | 52.61 | 99.98 |
| <NA> | 9 | | | 0.02 | 100.00 |
| Total | 44387 | 100.00 | 100.00 | 100.00 | 100.00 |
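As the freq chunks are hidden here, this is a minimal sketch of the kind of call involved. The vector gndr below is a made-up stand-in for round_8_final$gndr, and the chunk is assumed to use results='asis'.

```r
library(summarytools)

# Made-up stand-in for round_8_final$gndr, including one missing value
gndr <- c(1, 2, 2, 1, 2, NA)

# headings = FALSE drops the information block printed before the table
f <- freq(gndr, style = "rmarkdown", headings = FALSE)
f
```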
ctable
When you want to display proportions by groups, use the ctable function. You can specify whether percentages are calculated for rows (prop="r") or columns (prop="c").
| gndr (rows) / hincfel (columns) | 1 | 2 | 3 | 4 | <NA> | Total |
|---|---|---|---|---|---|---|
| 1 | 7101 (33.8%) | 9807 (46.6%) | 2906 (13.8%) | 973 (4.6%) | 240 ( 1.1%) | 21027 (100.0%) |
| 2 | 6922 (29.6%) | 10684 (45.8%) | 4039 (17.3%) | 1427 (6.1%) | 279 ( 1.2%) | 23351 (100.0%) |
| <NA> | 3 (33.3%) | 1 (11.1%) | 0 ( 0.0%) | 0 (0.0%) | 5 (55.6%) | 9 (100.0%) |
| Total | 14026 (31.6%) | 20492 (46.2%) | 6945 (15.6%) | 2400 (5.4%) | 524 ( 1.2%) | 44387 (100.0%) |
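A cross-table with row percentages like the one above can be sketched as follows. The data frame toy is hypothetical and its values are made up; with the real data the two arguments would be round_8_final$gndr and round_8_final$hincfel.

```r
library(summarytools)

# Hypothetical stand-in for the two ESS variables (made-up values)
toy <- data.frame(
  gndr    = c(1, 1, 2, 2, 2, 1),
  hincfel = c(1, 2, 2, 3, 1, 1)
)

# First argument defines the rows, second the columns;
# prop = "r" gives row percentages, prop = "c" column percentages
ct <- ctable(toy$gndr, toy$hincfel, prop = "r")
ct
```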
dfSummary
This function produces a summary of a data frame consisting of: variable names, labels if any, factor levels, frequencies and/or numerical summary statistics, and valid/missing observation counts. plain.ascii = FALSE makes the graphs look better and style = 'grid' is recommended.
| No | Variable | Label | Stats / Values | Freqs (% of Valid) | Graph | Missing |
|---|---|---|---|---|---|---|
| 1 | idno [numeric] | | Mean (sd) : 31545782.2 (115541717.8) min < med < max: 1 < 2589 < 551603139 IQR (CV) : 9849.5 (3.7) | 18328 distinct values | | 0 (0%) |
| 2 | cntry [character] | | 1. DE 2. IE 3. IT 4. IL 5. RU 6. CZ 7. LT 8. FR 9. EE 10. AT [ 13 others ] | 2852 ( 6.4%) 2757 ( 6.2%) 2626 ( 5.9%) 2557 ( 5.8%) 2430 ( 5.5%) 2269 ( 5.1%) 2122 ( 4.8%) 2070 ( 4.7%) 2019 ( 4.5%) 2010 ( 4.5%) 20675 (46.6%) | | 0 (0%) |
| 3 | blgetmg [haven_labelled] | Belong to minority ethnic group in country | Min : 1 Mean : 1.9 Max : 2 | 1 : 2843 ( 6.5%) 2 : 41103 (93.5%) | | 441 (0.99%) |
| 4 | imbgeco [haven_labelled] | Immigration bad or good for country’s economy | Mean (sd) : 5 (2.5) min < med < max: 0 < 5 < 10 IQR (CV) : 4 (0.5) | 11 distinct values | | 1562 (3.52%) |
| 5 | imueclt [haven_labelled] | Country’s cultural life undermined or enriched by immigrants | Mean (sd) : 5.4 (2.6) min < med < max: 0 < 5 < 10 IQR (CV) : 3 (0.5) | 11 distinct values | | 1403 (3.16%) |
| 6 | imwbcnt [haven_labelled] | Immigrants make country worse or better place to live | Mean (sd) : 4.9 (2.4) min < med < max: 0 < 5 < 10 IQR (CV) : 4 (0.5) | 11 distinct values | | 1562 (3.52%) |
| 7 | gndr [haven_labelled] | Gender | Min : 1 Mean : 1.5 Max : 2 | 1 : 21027 (47.4%) 2 : 23351 (52.6%) | | 9 (0.02%) |
| 8 | agea [haven_labelled] | Age of respondent, calculated | Mean (sd) : 49.1 (18.6) min < med < max: 15 < 49 < 100 IQR (CV) : 30 (0.4) | 86 distinct values | | 155 (0.35%) |
| 9 | maritalb [haven_labelled] | Legal marital status, post coded | Mean (sd) : 3.2 (2.3) min < med < max: 1 < 2 < 6 IQR (CV) : 5 (0.7) | 1 : 21711 (49.9%) 2 : 443 ( 1.0%) 3 : 648 ( 1.5%) 4 : 3912 ( 9.0%) 5 : 3756 ( 8.6%) 6 : 13039 (30.0%) | | 878 (1.98%) |
| 10 | eisced [haven_labelled] | Highest level of education, ES - ISCED | Mean (sd) : 4.1 (2.9) min < med < max: 1 < 4 < 55 IQR (CV) : 3 (0.7) | 1 : 3861 ( 8.7%) 2 : 7388 (16.7%) 3 : 7153 (16.2%) 4 : 8720 (19.7%) 5 : 6275 (14.2%) 6 : 4760 (10.8%) 7 : 6013 (13.6%) 55 : 88 ( 0.2%) | | 129 (0.29%) |
| 11 | polintr [haven_labelled] | How interested in politics | Mean (sd) : 2.6 (0.9) min < med < max: 1 < 3 < 4 IQR (CV) : 1 (0.4) | 1 : 5415 (12.2%) 2 : 15539 (35.1%) 3 : 15248 (34.4%) 4 : 8088 (18.3%) | | 97 (0.22%) |
| 12 | hincfel [haven_labelled] | Feeling about household’s income nowadays | Mean (sd) : 1.9 (0.8) min < med < max: 1 < 2 < 4 IQR (CV) : 1 (0.4) | 1 : 14026 (32.0%) 2 : 20492 (46.7%) 3 : 6945 (15.8%) 4 : 2400 ( 5.5%) | | 524 (1.18%) |
| 13 | uemp5yr [haven_labelled] | Any period of unemployment and work seeking within last 5 years | Min : 1 Mean : 1.5 Max : 2 | 1 : 5794 (46.7%) 2 : 6612 (53.3%) | | 31981 (72.05%) |
This looks great, but might take too much space.
As this produces a large table in the middle of your report, you might either display it at the end in an Appendix, or constrain the height of the table by embedding dfSummary in the print function.
In the table below I also excluded the column with numbers (varnumbers = FALSE), and constrained the width of each column (col.widths = 25, ...). It has to be specified for each remaining column.
print(dfSummary(round_8_final, plain.ascii = FALSE, style = 'grid', graph.magnif = 0.85, valid.col = FALSE, tmp.img.dir = "/tmp", varnumbers = FALSE, col.widths = c(35, 35, 35, 35, 35, 35), headings = FALSE, caption="Table 1. Summary of data frame"),
max.tbl.height = 400, method = "render")
| Variable | Label | Stats / Values | Freqs (% of Valid) | Graph | Missing |
|---|---|---|---|---|---|
| idno [numeric] | | Mean (sd) : 31545782.2 (115541717.8) min < med < max: 1 < 2589 < 551603139 IQR (CV) : 9849.5 (3.7) | 18328 distinct values | | 0 (0%) |
| cntry [character] | | 1. DE 2. IE 3. IT 4. IL 5. RU 6. CZ 7. LT 8. FR 9. EE 10. AT [ 13 others ] | 2852 ( 6.4%) 2757 ( 6.2%) 2626 ( 5.9%) 2557 ( 5.8%) 2430 ( 5.5%) 2269 ( 5.1%) 2122 ( 4.8%) 2070 ( 4.7%) 2019 ( 4.5%) 2010 ( 4.5%) 20675 (46.6%) | | 0 (0%) |
| blgetmg [haven_labelled] | Belong to minority ethnic group in country | Min : 1 Mean : 1.9 Max : 2 | 1 : 2843 ( 6.5%) 2 : 41103 (93.5%) | | 441 (0.99%) |
| imbgeco [haven_labelled] | Immigration bad or good for country's economy | Mean (sd) : 5 (2.5) min < med < max: 0 < 5 < 10 IQR (CV) : 4 (0.5) | 11 distinct values | | 1562 (3.52%) |
| imueclt [haven_labelled] | Country's cultural life undermined or enriched by immigrants | Mean (sd) : 5.4 (2.6) min < med < max: 0 < 5 < 10 IQR (CV) : 3 (0.5) | 11 distinct values | | 1403 (3.16%) |
| imwbcnt [haven_labelled] | Immigrants make country worse or better place to live | Mean (sd) : 4.9 (2.4) min < med < max: 0 < 5 < 10 IQR (CV) : 4 (0.5) | 11 distinct values | | 1562 (3.52%) |
| gndr [haven_labelled] | Gender | Min : 1 Mean : 1.5 Max : 2 | 1 : 21027 (47.4%) 2 : 23351 (52.6%) | | 9 (0.02%) |
| agea [haven_labelled] | Age of respondent, calculated | Mean (sd) : 49.1 (18.6) min < med < max: 15 < 49 < 100 IQR (CV) : 30 (0.4) | 86 distinct values | | 155 (0.35%) |
| maritalb [haven_labelled] | Legal marital status, post coded | Mean (sd) : 3.2 (2.3) min < med < max: 1 < 2 < 6 IQR (CV) : 5 (0.7) | 1 : 21711 (49.9%) 2 : 443 ( 1.0%) 3 : 648 ( 1.5%) 4 : 3912 ( 9.0%) 5 : 3756 ( 8.6%) 6 : 13039 (30.0%) | | 878 (1.98%) |
| eisced [haven_labelled] | Highest level of education, ES - ISCED | Mean (sd) : 4.1 (2.9) min < med < max: 1 < 4 < 55 IQR (CV) : 3 (0.7) | 1 : 3861 ( 8.7%) 2 : 7388 (16.7%) 3 : 7153 (16.2%) 4 : 8720 (19.7%) 5 : 6275 (14.2%) 6 : 4760 (10.8%) 7 : 6013 (13.6%) 55 : 88 ( 0.2%) | | 129 (0.29%) |
| polintr [haven_labelled] | How interested in politics | Mean (sd) : 2.6 (0.9) min < med < max: 1 < 3 < 4 IQR (CV) : 1 (0.4) | 1 : 5415 (12.2%) 2 : 15539 (35.1%) 3 : 15248 (34.4%) 4 : 8088 (18.3%) | | 97 (0.22%) |
| hincfel [haven_labelled] | Feeling about household's income nowadays | Mean (sd) : 1.9 (0.8) min < med < max: 1 < 2 < 4 IQR (CV) : 1 (0.4) | 1 : 14026 (32.0%) 2 : 20492 (46.7%) 3 : 6945 (15.8%) 4 : 2400 ( 5.5%) | | 524 (1.18%) |
| uemp5yr [haven_labelled] | Any period of unemployment and work seeking within last 5 years | Min : 1 Mean : 1.5 Max : 2 | 1 : 5794 (46.7%) 2 : 6612 (53.3%) | | 31981 (72.05%) |
Generated by summarytools 0.9.6 (R version 3.6.3)
2020-05-01
kable
kable can also be used to display the results of models. You first have to prepare a summary of the model.
model1 <- lm(imueclt ~ agea + gndr, data=round_8_final)
regression1 <- coef(summary(model1))
kable(regression1, caption = "Table 1. Linear regression model predicting attitudes to immigration")
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 5.7416851 | 0.0520508 | 110.309299 | 0.0000000 |
| agea | -0.0099040 | 0.0006862 | -14.432407 | 0.0000000 |
| gndr | 0.0741516 | 0.0254428 | 2.914445 | 0.0035651 |
It does not automatically display the \(R^2\) statistic, so you have to extract it with code:
[1] 0.004975681
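The chunk that extracts this value is hidden; a minimal sketch of how it can be done is below. To keep the example self-contained it uses the built-in mtcars data as a stand-in for round_8_final, so the model and the numbers differ from those in this report.

```r
# Stand-in model on built-in data (not the ESS model from this report)
model <- lm(mpg ~ wt + hp, data = mtcars)

# R-squared is stored in the model summary
r2 <- summary(model)$r.squared
r2

# Number of observations actually used in the model
n_used <- nobs(model)
n_used
```

For the model in this report the equivalent call would be summary(model1)$r.squared.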
Let’s add this information now to the table.
model1 <- lm(imueclt ~ agea + gndr, data=round_8_final)
regression1 <- coef(summary(model1))
kable(regression1, caption = "Table 1. Linear regression model predicting attitudes to immigration")
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 5.7416851 | 0.0520508 | 110.309299 | 0.0000000 |
| agea | -0.0099040 | 0.0006862 | -14.432407 | 0.0000000 |
| gndr | 0.0741516 | 0.0254428 | 2.914445 | 0.0035651 |
| \(R^2\)=0.005; | N=42842. |
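The chunk shown above does not include the step that adds the last row. One possible way to do it, which is my assumption and not necessarily how the table above was built, is to convert the coefficient matrix to text and bind an extra row to it. The sketch uses the built-in mtcars data as a stand-in, and the caption is made up.

```r
library(knitr)

# Stand-in model on built-in data (not the ESS model from this report)
model <- lm(mpg ~ wt + hp, data = mtcars)
coefs <- coef(summary(model))

# format() turns the numeric matrix into text so a note row can be appended
tab <- format(round(coefs, 3), nsmall = 3)

# Extra row holding the model fit information; unused cells stay empty
fit_row <- c(sprintf("R2 = %.3f", summary(model)$r.squared),
             sprintf("N = %d", nobs(model)),
             "", "")
tab <- rbind(tab, fit_row)
rownames(tab)[nrow(tab)] <- ""

kable(tab, caption = "Table 1. Linear regression model (stand-in data)")
```

The trade-off is that all cells become text, so any rounding has to be done before the extra row is added.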
More kable options are discussed here: https://bookdown.org/yihui/rmarkdown-cookbook/kable.html
More advanced options, including how to change the table style, are described here: https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html#overview
tab_model
The regression table prepared with kable does not look bad, but it can look even better with the tab_model function available in the sjPlot library. You can display more than one model in the table, and it automatically adds a few summary statistics on model fit.
library("sjPlot")
model2 <- lm(imueclt ~ agea + gndr, data=round_8_final)
model3 <- lm(imwbcnt ~ agea + gndr, data=round_8_final)
tab_model(model2, model3)
| | Country’s cultural life undermined or enriched by immigrants | | | Immigrants make country worse or better place to live | | |
|---|---|---|---|---|---|---|
| Predictors | Estimates | CI | p | Estimates | CI | p |
| (Intercept) | 5.74 | 5.64 – 5.84 | <0.001 | 5.37 | 5.27 – 5.46 | <0.001 |
| Age of respondent, calculated | -0.01 | -0.01 – -0.01 | <0.001 | -0.01 | -0.01 – -0.01 | <0.001 |
| Gender | 0.07 | 0.02 – 0.12 | 0.004 | 0.01 | -0.03 – 0.06 | 0.567 |
| Observations | 42842 | | | 42679 | | |
| R2 / R2 adjusted | 0.005 / 0.005 | | | 0.006 / 0.006 | | |
This is a very good website which introduces more options, such as changing the default settings, removing/adding or defining own labels: https://cran.r-project.org/web/packages/sjPlot/vignettes/tab_model_estimates.html. You will notice that tab_model works with many types of regressions, including multilevel models.
Let’s add one more variable to the model above: hincfel, which measures relative deprivation as an ordinal variable (4 categories). I first had to tell R to treat it as a factor variable.
round_8_final$hincfel <- factor(round_8_final$hincfel)
model3 <- lm(imwbcnt ~ agea + gndr, data=round_8_final)
model4 <- lm(imwbcnt ~ agea + gndr + hincfel, data=round_8_final)
tab_model(model3, model4)
| | Immigrants make country worse or better place to live | | | Immigrants make country worse or better place to live | | |
|---|---|---|---|---|---|---|
| Predictors | Estimates | CI | p | Estimates | CI | p |
| (Intercept) | 5.37 | 5.27 – 5.46 | <0.001 | 5.86 | 5.76 – 5.95 | <0.001 |
| Age of respondent, calculated | -0.01 | -0.01 – -0.01 | <0.001 | -0.01 | -0.01 – -0.01 | <0.001 |
| Gender | 0.01 | -0.03 – 0.06 | 0.567 | 0.08 | 0.04 – 0.13 | <0.001 |
| hincfel: hincfel2 | | | | -0.75 | -0.80 – -0.69 | <0.001 |
| hincfel: hincfel3 | | | | -1.36 | -1.43 – -1.29 | <0.001 |
| hincfel: hincfel4 | | | | -1.82 | -1.93 – -1.72 | <0.001 |
| Observations | 42679 | | | 42254 | | |
| R2 / R2 adjusted | 0.006 / 0.006 | | | 0.056 / 0.056 | | |
Finally, I add the show.reflvl = TRUE option, which adds rows to the table with reference categories for categorical variables.
tab_model(model3, model4,
pred.labels = c("Intercept", "Age", "Female", "Ref. Living comfortably: Coping",
"Difficult", "Very difficult"),
dv.labels = c("Model 1", "Model 2"),
string.est = "Coefficient",
string.ci = "Conf. Int (95%)",
string.p = "P-Value"
)
| | Model 1 | | | Model 2 | | |
|---|---|---|---|---|---|---|
| Predictors | Coefficient | Conf. Int (95%) | P-Value | Coefficient | Conf. Int (95%) | P-Value |
| Intercept | 5.37 | 5.27 – 5.46 | <0.001 | 5.86 | 5.76 – 5.95 | <0.001 |
| Age | -0.01 | -0.01 – -0.01 | <0.001 | -0.01 | -0.01 – -0.01 | <0.001 |
| Female | 0.01 | -0.03 – 0.06 | 0.567 | 0.08 | 0.04 – 0.13 | <0.001 |
| Ref. Living comfortably: Coping | | | | -0.75 | -0.80 – -0.69 | <0.001 |
| Difficult | | | | -1.36 | -1.43 – -1.29 | <0.001 |
| Very difficult | | | | -1.82 | -1.93 – -1.72 | <0.001 |
| Observations | 42679 | | | 42254 | | |
| R2 / R2 adjusted | 0.006 / 0.006 | | | 0.056 / 0.056 | | |
This is the same model, but with stars indicating the level of significance (p.style = "a"); I also add a title.
tab_model(model3, model4,
pred.labels = c("Intercept", "Age", "Female", "Ref. Living comfortably: </br> Coping",
"Difficult", "Very difficult"),
dv.labels = c("Model 1", "Model 2"),
string.est = "Coefficient",
string.ci = "Conf. Int (95%)",
p.style = "a", title = "Table 1. Linear regression models predicting attitudes to immigration (Immigrants make country worse or better place to live (0-10))"
)
| | Model 1 | | Model 2 | |
|---|---|---|---|---|
| Predictors | Coefficient | Conf. Int (95%) | Coefficient | Conf. Int (95%) |
| Intercept | 5.37 *** | 5.27 – 5.46 | 5.86 *** | 5.76 – 5.95 |
| Age | -0.01 *** | -0.01 – -0.01 | -0.01 *** | -0.01 – -0.01 |
| Female | 0.01 | -0.03 – 0.06 | 0.08 *** | 0.04 – 0.13 |
| Ref. Living comfortably: Coping | | | -0.75 *** | -0.80 – -0.69 |
| Difficult | | | -1.36 *** | -1.43 – -1.29 |
| Very difficult | | | -1.82 *** | -1.93 – -1.72 |
| Observations | 42679 | | 42254 | |
| R2 / R2 adjusted | 0.006 / 0.006 | | 0.056 / 0.056 | |
As you can see in the r chunk, I add the </br> HTML command in the middle of the first subjective income label to split it into two rows. The wrap.labels option might be handy in case of long labels. All tab_model options are listed in the R documentation here: https://www.rdocumentation.org/packages/sjPlot/versions/2.8.3/topics/tab_model
Using the same ESS 2016 data prepare a short report in R Markdown consisting of the following:
polintr)
gndr and prepare a table with row percentages
imueclt - Country’s cultural life undermined or enriched by immigrants
imbgeco - Immigration bad or good for country’s economy
agea, gndr, polintr as independent variables
polintr into a factor variable
tab_model function
When you finish coding the results, go to YAML section:
rmdformats::readthedown (install first rmdformats package)
Finally, publish your report on https://Rpubs.com (create an account) and send me a link to what you have done before the next virtual lab on Friday, 15th May.
Enjoy exploring R Markdown!