Technical notes

  • These are guidelines and templates on how to prepare publishable tables in R Markdown printed into HTML format for SMI205 students.

  • Before running this R Markdown document install the following packages:

  • This HTML page was produced using the default R Markdown theme and ‘tango’ highlight, with a button to download Rmd code. The same document, but in material theme from rmdformats package is here.

html_document:
  highlight: tango
  code_download: true
  toc: true
  toc_depth: 2
  toc_float:
    collapsed: false
    smooth_scroll: true

  • More YAML settings are overviewed here: https://rmarkdown.rstudio.com/docs/reference/html_document.html

  • I’m also setting the global chunk options, so that for all r chunks the code is displayed (echo=TRUE - note that the default in this theme is FALSE), but any additional messages or warnings are not displayed (both set to FALSE).

  • With these global settings I do not have to specify these options separately for all my r chunks, only where I want to change them. You might want to specify include=FALSE for this r chunk, so that it is not displayed at the beginning of your report while the code is still run.
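The setup chunk described above could look roughly like the sketch below. The package list is inferred from the sections of this page rather than copied from the original chunk, so treat it as an assumption.

```r
# Install once per machine (package names inferred from the sections of this page).
# install.packages(c("essurvey", "xtable", "knitr", "summarytools", "sjPlot", "rmdformats"))

library(knitr)

# Global chunk options: display the code, hide messages and warnings.
# In the .Rmd file this chunk could use the header {r setup, include=FALSE}.
opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

# Check what is currently set.
opts_chunk$get(c("echo", "message", "warning"))
```

Any single chunk can still override these defaults in its own header, for example {r, echo=FALSE}.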

Opening and subsetting data

I use ESS wave 2016 (wave 8) for this exercise. As you remember from our previous R labs, this data can be easily downloaded using the essurvey package.

In the settings of this r chunk I specify results = 'hide' so R Markdown does not print all information about loaded data. Try changing this to see what happens.

I subset data and keep the following variables in my new, smaller dataset called round_8_final: idno, cntry, blgetmg, imbgeco, imueclt, imwbcnt, gndr, agea, maritalb, eisced, polintr, hincfel, uemp5yr.

In your reports, please do the same for all data manipulation, such as subsetting or merging datasets - display r chunks, but hide the results.
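A minimal sketch of the subsetting step follows. Downloading with essurvey needs an e-mail address registered with ESS, so those calls are shown only as comments, and a small made-up data frame (round_8, with an extra column named happy) stands in for the real download. Only the variable list comes from the text above.

```r
# Assumed download workflow (not run here; requires a registered ESS e-mail):
# essurvey::set_email("your@email.com")
# round_8 <- essurvey::import_rounds(8)

# Hypothetical stand-in for the downloaded data, so the subsetting step runs.
set.seed(205)
n <- 50
round_8 <- data.frame(
  idno     = seq_len(n),
  cntry    = sample(c("AT", "DE", "IE"), n, replace = TRUE),
  blgetmg  = sample(1:2, n, replace = TRUE),
  imbgeco  = sample(0:10, n, replace = TRUE),
  imueclt  = sample(0:10, n, replace = TRUE),
  imwbcnt  = sample(0:10, n, replace = TRUE),
  gndr     = sample(1:2, n, replace = TRUE),
  agea     = sample(15:90, n, replace = TRUE),
  maritalb = sample(1:6, n, replace = TRUE),
  eisced   = sample(1:7, n, replace = TRUE),
  polintr  = sample(1:4, n, replace = TRUE),
  hincfel  = sample(1:4, n, replace = TRUE),
  uemp5yr  = sample(c(1, 2, NA), n, replace = TRUE),
  happy    = sample(0:10, n, replace = TRUE),  # extra column, dropped below
  stringsAsFactors = FALSE
)

# Keep only the variables listed in the text.
keep <- c("idno", "cntry", "blgetmg", "imbgeco", "imueclt", "imwbcnt", "gndr",
          "agea", "maritalb", "eisced", "polintr", "hincfel", "uemp5yr")
round_8_final <- round_8[, keep]

dim(round_8_final)
```

In the report, the chunk holding this code would carry results = 'hide' in its header.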

Do not just use summary option

The summary option is well known to you from previous R classes. Let’s produce a summary table of all variables in our smaller dataset and a separate one for the variable age.

      idno              cntry              blgetmg         imbgeco      
 Min.   :        1   Length:44387       Min.   :1.000   Min.   : 0.000  
 1st Qu.:     1208   Class :character   1st Qu.:2.000   1st Qu.: 3.000  
 Median :     2589   Mode  :character   Median :2.000   Median : 5.000  
 Mean   : 31545782                      Mean   :1.935   Mean   : 5.006  
 3rd Qu.:    11058                      3rd Qu.:2.000   3rd Qu.: 7.000  
 Max.   :551603139                      Max.   :2.000   Max.   :10.000  
    imueclt          imwbcnt            gndr            agea       
 Min.   : 0.000   Min.   : 0.000   Min.   :1.000   Min.   : 15.00  
 1st Qu.: 4.000   1st Qu.: 3.000   1st Qu.:1.000   1st Qu.: 34.00  
 Median : 5.000   Median : 5.000   Median :2.000   Median : 49.00  
 Mean   : 5.368   Mean   : 4.904   Mean   :1.526   Mean   : 49.14  
 3rd Qu.: 7.000   3rd Qu.: 7.000   3rd Qu.:2.000   3rd Qu.: 64.00  
 Max.   :10.000   Max.   :10.000   Max.   :2.000   Max.   :100.00  
    maritalb         eisced          polintr         hincfel     
 Min.   :1.000   Min.   : 1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:1.000   1st Qu.: 2.000   1st Qu.:2.000   1st Qu.:1.000  
 Median :2.000   Median : 4.000   Median :3.000   Median :2.000  
 Mean   :3.153   Mean   : 4.109   Mean   :2.587   Mean   :1.948  
 3rd Qu.:6.000   3rd Qu.: 5.000   3rd Qu.:3.000   3rd Qu.:2.000  
 Max.   :6.000   Max.   :55.000   Max.   :4.000   Max.   :4.000  
    uemp5yr     
 Min.   :1.00   
 1st Qu.:1.00   
 Median :2.00   
 Mean   :1.53   
 3rd Qu.:2.00   
 Max.   :2.00   
 [ reached getOption("max.print") -- omitted 1 row ]
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  15.00   34.00   49.00   49.14   64.00  100.00     155 


Both outputs are ‘readable’, but not presented in a publishable format.
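For reference, output of this kind comes from two base R calls. The sketch below uses a tiny made-up data frame (toy) in place of round_8_final.

```r
# Hypothetical stand-in for round_8_final.
toy <- data.frame(
  agea = c(34, 52, 68, NA, 20),
  gndr = c(2, 1, 2, 1, 2)
)

# Summary of every variable in the data frame.
summary(toy)

# Summary of a single variable; missing values are counted under "NA's".
age_summary <- summary(toy$agea)
age_summary
```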

In the next sections I give an overview of a few functions that are useful for producing nice tables, which are easily printed in HTML format in R Markdown.

Producing tables with xtable

First, let’s use xtable function which exports tables to LaTeX or HTML format.

You need to install this package first on your machine (go back to Technical notes section to see names of packages). xtable overview is here: https://cran.r-project.org/web/packages/xtable/vignettes/xtableGallery.pdf

xtable needs to be supported with print function, which helps in ‘translating’ the output into HTML code associated with xtable object. See more here

I use this function to display the first 10 rows of my data for all variables in the dataset. My r chunk is displayed first, then the result - I made them both visible.

<!-- html table generated in R 3.6.3 by xtable 1.8-4 package -->
<!-- Fri May 01 15:12:50 2020 -->
<table border=1>
<tr> <th>  </th> <th> idno </th> <th> cntry </th> <th> blgetmg </th> <th> imbgeco </th> <th> imueclt </th> <th> imwbcnt </th> <th> gndr </th> <th> agea </th> <th> maritalb </th> <th> eisced </th> <th> polintr </th> <th> hincfel </th> <th> uemp5yr </th>  </tr>
  <tr> <td align="right"> 1 </td> <td align="right"> 1 </td> <td> AT </td> <td align="right"> 2 </td> <td align="right"> 10 </td> <td align="right"> 10 </td> <td align="right"> 10 </td> <td align="right"> 2 </td> <td align="right"> 34 </td> <td align="right"> 6 </td> <td align="right"> 7 </td> <td align="right"> 1 </td> <td align="right"> 2 </td> <td align="right"> 1 </td> </tr>
  <tr> <td align="right"> 2 </td> <td align="right"> 2 </td> <td> AT </td> <td align="right"> 2 </td> <td align="right"> 8 </td> <td align="right"> 8 </td> <td align="right"> 5 </td> <td align="right"> 1 </td> <td align="right"> 52 </td> <td align="right"> 1 </td> <td align="right"> 4 </td> <td align="right"> 1 </td> <td align="right"> 2 </td> <td align="right"> 2 </td> </tr>
  <tr> <td align="right"> 3 </td> <td align="right"> 4 </td> <td> AT </td> <td align="right"> 2 </td> <td align="right"> 5 </td> <td align="right"> 5 </td> <td align="right"> 4 </td> <td align="right"> 2 </td> <td align="right"> 68 </td> <td align="right"> 6 </td> <td align="right"> 3 </td> <td align="right"> 3 </td> <td align="right"> 2 </td> <td align="right"> 2 </td> </tr>
  <tr> <td align="right"> 4 </td> <td align="right"> 6 </td> <td> AT </td> <td align="right"> 2 </td> <td align="right"> 7 </td> <td align="right"> 7 </td> <td align="right"> 10 </td> <td align="right"> 1 </td> <td align="right"> 54 </td> <td align="right"> 4 </td> <td align="right"> 3 </td> <td align="right"> 2 </td> <td align="right"> 2 </td> <td align="right">  </td> </tr>
  <tr> <td align="right"> 5 </td> <td align="right"> 10 </td> <td> AT </td> <td align="right"> 1 </td> <td align="right"> 2 </td> <td align="right"> 5 </td> <td align="right"> 5 </td> <td align="right"> 2 </td> <td align="right"> 20 </td> <td align="right"> 1 </td> <td align="right"> 3 </td> <td align="right"> 3 </td> <td align="right"> 2 </td> <td align="right">  </td> </tr>
  <tr> <td align="right"> 6 </td> <td align="right"> 11 </td> <td> AT </td> <td align="right"> 2 </td> <td align="right"> 8 </td> <td align="right"> 6 </td> <td align="right"> 8 </td> <td align="right"> 2 </td> <td align="right"> 65 </td> <td align="right"> 4 </td> <td align="right"> 4 </td> <td align="right"> 2 </td> <td align="right"> 1 </td> <td align="right">  </td> </tr>
  <tr> <td align="right"> 7 </td> <td align="right"> 12 </td> <td> AT </td> <td align="right"> 1 </td> <td align="right"> 6 </td> <td align="right"> 5 </td> <td align="right"> 3 </td> <td align="right"> 2 </td> <td align="right"> 52 </td> <td align="right"> 1 </td> <td align="right"> 2 </td> <td align="right"> 3 </td> <td align="right"> 3 </td> <td align="right">  </td> </tr>
  <tr> <td align="right"> 8 </td> <td align="right"> 13 </td> <td> AT </td> <td align="right"> 2 </td> <td align="right"> 5 </td> <td align="right"> 4 </td> <td align="right"> 3 </td> <td align="right"> 2 </td> <td align="right"> 44 </td> <td align="right"> 1 </td> <td align="right"> 5 </td> <td align="right"> 3 </td> <td align="right"> 2 </td> <td align="right">  </td> </tr>
  <tr> <td align="right"> 9 </td> <td align="right"> 14 </td> <td> AT </td> <td align="right"> 2 </td> <td align="right"> 9 </td> <td align="right"> 10 </td> <td align="right"> 9 </td> <td align="right"> 2 </td> <td align="right"> 22 </td> <td align="right"> 6 </td> <td align="right"> 3 </td> <td align="right"> 4 </td> <td align="right"> 1 </td> <td align="right">  </td> </tr>
  <tr> <td align="right"> 10 </td> <td align="right"> 15 </td> <td> AT </td> <td align="right"> 2 </td> <td align="right"> 7 </td> <td align="right"> 9 </td> <td align="right"> 9 </td> <td align="right"> 2 </td> <td align="right"> 41 </td> <td align="right"> 4 </td> <td align="right"> 3 </td> <td align="right"> 2 </td> <td align="right"> 2 </td> <td align="right"> 1 </td> </tr>
   </table>


Oops - this does not look good. You have to specify results='asis' in the r chunk. The default is results='markup', which shows the results as raw HTML code, printed verbatim instead of being rendered as a table. The table looks like this when you specify results='asis':

    idno  cntry  blgetmg  imbgeco  imueclt  imwbcnt  gndr  agea  maritalb  eisced  polintr  hincfel  uemp5yr
1   1     AT     2        10       10       10       2     34    6         7       1        2        1
2   2     AT     2        8        8        5        1     52    1         4       1        2        2
3   4     AT     2        5        5        4        2     68    6         3       3        2        2
4   6     AT     2        7        7        10       1     54    4         3       2        2
5   10    AT     1        2        5        5        2     20    1         3       3        2
6   11    AT     2        8        6        8        2     65    4         4       2        1
7   12    AT     1        6        5        3        2     52    1         2       3        3
8   13    AT     2        5        4        3        2     44    1         5       3        2
9   14    AT     2        9        10       9        2     22    6         3       4        1
10  15    AT     2        7        9        9        2     41    4         3       2        2        1
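The call behind a table like this one might look as follows. This is a sketch on a made-up data frame (toy) with three columns only; capture.output() is not needed in a report and is used here just to keep the HTML as text.

```r
library(xtable)

# Hypothetical stand-in for round_8_final (three columns only).
toy <- data.frame(
  idno  = 1:12,
  cntry = rep("AT", 12),
  agea  = c(34, 52, 68, 54, 20, 65, 52, 44, 22, 41, 37, 59),
  stringsAsFactors = FALSE
)

# Chunk header in the .Rmd file would be: {r, results='asis'}
# print() with type = "html" turns the xtable object into HTML code.
html_lines <- capture.output(
  print(xtable(head(toy, 10)), type = "html")
)
cat(html_lines, sep = "\n")
```

With results='asis' the HTML is passed straight to the page and rendered; without it the same lines appear as code.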

Let’s now hide the code (option echo=FALSE in the r chunk) and add a table title before the output. The caption option in xtable adds a title, but it appears below the table. You can, however, add one more option to print: caption.placement="top". Finally, you can use the align option in xtable to indicate whether the text in cells should be centred (“c”), right-aligned (“r”) or left-aligned (“l”). This has to be specified for every column plus one additional column at the beginning for the row names (so number of columns + 1 values).

Table 1. Overview of 10 first rows in data
    idno  cntry  blgetmg  imbgeco  imueclt  imwbcnt  gndr  agea  maritalb  eisced  polintr  hincfel  uemp5yr
1   1     AT     2        10       10       10       2     34    6         7       1        2        1
2   2     AT     2        8        8        5        1     52    1         4       1        2        2
3   4     AT     2        5        5        4        2     68    6         3       3        2        2
4   6     AT     2        7        7        10       1     54    4         3       2        2
5   10    AT     1        2        5        5        2     20    1         3       3        2
6   11    AT     2        8        6        8        2     65    4         4       2        1
7   12    AT     1        6        5        3        2     52    1         2       3        3
8   13    AT     2        5        4        3        2     44    1         5       3        2
9   14    AT     2        9        10       9        2     22    6         3       4        1
10  15    AT     2        7        9        9        2     41    4         3       2        2        1
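A sketch of the three options just described (caption, align, caption.placement), again on a made-up three-column data frame. Note that align has four entries: one for the row names plus one per column.

```r
library(xtable)

# Hypothetical stand-in for round_8_final.
toy <- data.frame(
  idno  = 1:3,
  cntry = c("AT", "AT", "DE"),
  agea  = c(34, 52, 68),
  stringsAsFactors = FALSE
)

tab <- xtable(
  toy,
  caption = "Table 1. Overview of first rows in data",
  align   = c("l", "c", "c", "r")  # row names + 3 columns = 4 entries
)

# Chunk header would be: {r, echo=FALSE, results='asis'}
html_lines <- capture.output(
  print(tab, type = "html", caption.placement = "top")
)
cat(html_lines, sep = "\n")
```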

Data overview with kable

It turns out that the kable function produces much better looking tables in HTML format (xtable is preferred for PDF/LaTeX formats).

This section of R Markdown book provides more examples on how to use kable: https://bookdown.org/yihui/rmarkdown-cookbook/kable.html

Let’s display the same first 10 rows of our data as before, but now using the kable function. Adding a table title is more straightforward here.

Table 1. Overview of 10 first rows in data
idno cntry blgetmg imbgeco imueclt imwbcnt gndr agea maritalb eisced polintr hincfel uemp5yr
1 AT 2 10 10 10 2 34 6 7 1 2 1
2 AT 2 8 8 5 1 52 1 4 1 2 2
4 AT 2 5 5 4 2 68 6 3 3 2 2
6 AT 2 7 7 10 1 54 4 3 2 2 NA
10 AT 1 2 5 5 2 20 1 3 3 2 NA
11 AT 2 8 6 8 2 65 4 4 2 1 NA
12 AT 1 6 5 3 2 52 1 2 3 3 NA
13 AT 2 5 4 3 2 44 1 5 3 2 NA
14 AT 2 9 10 9 2 22 6 3 4 1 NA
15 AT 2 7 9 9 2 41 4 3 2 2 1
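A kable call of this kind could be sketched as below, on a made-up data frame. Inside an HTML R Markdown document the format is picked automatically; format = "html" is set here only so the sketch behaves the same outside a report.

```r
library(knitr)

# Hypothetical stand-in for round_8_final.
toy <- data.frame(
  idno    = c(1, 2, 4, 6, 10),
  cntry   = rep("AT", 5),
  agea    = c(34, 52, 68, 54, 20),
  uemp5yr = c(1, 2, 2, NA, NA),
  stringsAsFactors = FALSE
)

# caption adds the table title above the table.
tbl <- kable(
  head(toy, 10),
  format  = "html",
  caption = "Table 1. Overview of 10 first rows in data"
)
tbl
```

Unlike xtable, missing values are printed as NA and no results='asis' option is needed in the chunk.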

Summary statistics with summarytools package

Before diving into any more complex data modelling, you should always explore the frequency distribution of your key variables and compute descriptive statistics in order to better understand what is happening in the data.

summarytools package offers a wide range of functions to do so:

Descriptive statistics for all variables in your data - descr

The descr function produces a table with all descriptive statistics for all variables. In most functions from the summarytools package you have to specify style = "rmarkdown" (but see below), so that it produces nice looking tables in this format.

Descriptive Statistics

round_8_final

N: 44387

Table continues below
  agea blgetmg eisced gndr hincfel idno
Mean 49.14 1.94 4.11 1.53 1.95 31545782.19
Std.Dev 18.61 0.25 2.93 0.50 0.83 115541717.82
Min 15.00 1.00 1.00 1.00 1.00 1.00
Q1 34.00 2.00 2.00 1.00 1.00 1208.00
Median 49.00 2.00 4.00 2.00 2.00 2589.00
Q3 64.00 2.00 5.00 2.00 2.00 11058.00
Max 100.00 2.00 55.00 2.00 4.00 551603139.00
MAD 22.24 0.00 1.48 0.00 1.48 2640.51
IQR 30.00 0.00 3.00 1.00 1.00 9849.50
CV 0.38 0.13 0.71 0.33 0.43 3.66
Skewness 0.05 -3.54 10.44 -0.10 0.67 4.14
SE.Skewness 0.01 0.01 0.01 0.01 0.01 0.01
Kurtosis -0.93 10.53 178.93 -1.99 -0.05 15.68
N.Valid 44232.00 43946.00 44258.00 44378.00 43863.00 44387.00
Pct.Valid 99.65 99.01 99.71 99.98 98.82 100.00
  imbgeco imueclt imwbcnt maritalb polintr uemp5yr
Mean 5.01 5.37 4.90 3.15 2.59 1.53
Std.Dev 2.52 2.64 2.40 2.26 0.92 0.50
Min 0.00 0.00 0.00 1.00 1.00 1.00
Q1 3.00 4.00 3.00 1.00 2.00 1.00
Median 5.00 5.00 5.00 2.00 3.00 2.00
Q3 7.00 7.00 7.00 6.00 3.00 2.00
Max 10.00 10.00 10.00 6.00 4.00 2.00
MAD 2.97 2.97 2.97 1.48 1.48 0.00
IQR 4.00 3.00 4.00 5.00 1.00 1.00
CV 0.50 0.49 0.49 0.72 0.36 0.33
Skewness -0.24 -0.28 -0.16 0.22 -0.03 -0.13
SE.Skewness 0.01 0.01 0.01 0.01 0.01 0.02
Kurtosis -0.51 -0.60 -0.29 -1.80 -0.86 -1.98
N.Valid 42825.00 42984.00 42825.00 43509.00 44290.00 12406.00
Pct.Valid 96.48 96.84 96.48 98.02 99.78 27.95

As we do not want to display so much statistical information, we can restrict the stats option to more basic descriptive statistics: mean, standard deviation, minimum and maximum value. I also swap columns with rows (transpose), hide headings and add a title with caption.

Table 1. Descriptive statistics for key variables
Mean Std.Dev Min Max
agea 49.14 18.61 15.00 100.00
blgetmg 1.94 0.25 1.00 2.00
eisced 4.11 2.93 1.00 55.00
gndr 1.53 0.50 1.00 2.00
hincfel 1.95 0.83 1.00 4.00
idno 31545782.19 115541717.82 1.00 551603139.00
imbgeco 5.01 2.52 0.00 10.00
imueclt 5.37 2.64 0.00 10.00
imwbcnt 4.90 2.40 0.00 10.00
maritalb 3.15 2.26 1.00 6.00
polintr 2.59 0.92 1.00 4.00
uemp5yr 1.53 0.50 1.00 2.00
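The two descr calls discussed in this section might look like the sketch below, run on a made-up numeric data frame. The caption option mentioned in the text is left out because its exact form in the original chunk is not visible here.

```r
library(summarytools)

# Hypothetical stand-in for round_8_final (numeric variables only).
set.seed(205)
toy <- data.frame(
  agea    = sample(15:90, 100, replace = TRUE),
  imwbcnt = sample(0:10, 100, replace = TRUE)
)

# All descriptive statistics, one column per variable.
full <- descr(toy, style = "rmarkdown")

# Only four statistics, variables in rows, no heading block.
short <- descr(
  toy,
  stats     = c("mean", "sd", "min", "max"),
  transpose = TRUE,
  headings  = FALSE,
  style     = "rmarkdown"
)
short
```

With style = "rmarkdown" the chunk should use results='asis' so the table is rendered rather than shown as text.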

Frequency distribution - freq

If you want to display the frequency distribution for one variable, use the function freq.

Frequencies

round_8_final$gndr

Label: Gender

       Freq   % Valid  % Valid Cum.  % Total  % Total Cum.
1      21027  47.38    47.38         47.37    47.37
2      23351  52.62    100.00        52.61    99.98
<NA>   9                             0.02     100.00
Total  44387  100.00   100.00        100.00   100.00

Let’s add table title with caption option, and in order to remove the additional information before the table (e.g. round_8_final$gndr), specify option headings = FALSE in freq. I will add it to all remaining examples below.

Table 1. Sample distribution by gender, ESS 2016
       Freq   % Valid  % Valid Cum.  % Total  % Total Cum.
1      21027  47.38    47.38         47.37    47.37
2      23351  52.62    100.00        52.61    99.98
<NA>   9                             0.02     100.00
Total  44387  100.00   100.00        100.00   100.00
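As a sketch, a freq call with the heading block switched off could look like this. The vector gndr is made up, and the caption option is again omitted.

```r
library(summarytools)

# Hypothetical gender variable with one missing value.
gndr <- c(1, 2, 2, 1, 2, NA, 2, 1)

# headings = FALSE drops the block printed above the table
# (variable name, label, type).
f <- freq(gndr, style = "rmarkdown", headings = FALSE)
f
```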

Cross-tabulate two variables - ctable

When you want to display proportions by groups, use the ctable function. You can specify whether percentages are calculated for rows (prop="r") or columns (prop="c").

Table 1. Subjective income by gender in ESS 2016
       hincfel
gndr   1              2              3             4            <NA>        Total
1      7101 (33.8%)   9807 (46.6%)   2906 (13.8%)  973 (4.6%)   240 (1.1%)  21027 (100.0%)
2      6922 (29.6%)   10684 (45.8%)  4039 (17.3%)  1427 (6.1%)  279 (1.2%)  23351 (100.0%)
<NA>   3 (33.3%)      1 (11.1%)      0 (0.0%)      0 (0.0%)     5 (55.6%)   9 (100.0%)
Total  14026 (31.6%)  20492 (46.2%)  6945 (15.6%)  2400 (5.4%)  524 (1.2%)  44387 (100.0%)
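A cross-tabulation with row percentages, as in the table above, can be sketched like this on made-up data. The first argument gives the rows and the second the columns.

```r
library(summarytools)

# Hypothetical stand-ins for gndr and hincfel.
set.seed(205)
toy <- data.frame(
  gndr    = sample(1:2, 100, replace = TRUE),
  hincfel = sample(1:4, 100, replace = TRUE)
)

# prop = "r" gives row percentages; use prop = "c" for column percentages.
ct <- ctable(
  x        = toy$gndr,
  y        = toy$hincfel,
  prop     = "r",
  headings = FALSE,
  style    = "rmarkdown"
)
ct
```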

Data frame summary - dfSummary

This function produces a summary of a data frame consisting of: variable names, labels if any, factor levels, frequencies and/or numerical summary statistics, and valid/missing observation counts. plain.ascii = FALSE makes the graphs look better and style = 'grid' is recommended.

Table 1. Summary of data frame
Columns: No, Variable, Label, Stats / Values, Freqs (% of Valid), Graph (images not reproduced), Missing

1  idno [numeric]
   Stats / Values: Mean (sd): 31545782.2 (115541717.8); min < med < max: 1 < 2589 < 551603139; IQR (CV): 9849.5 (3.7)
   Freqs (% of Valid): 18328 distinct values
   Missing: 0 (0%)

2  cntry [character]
   Stats / Values: 1. DE; 2. IE; 3. IT; 4. IL; 5. RU; 6. CZ; 7. LT; 8. FR; 9. EE; 10. AT; [ 13 others ]
   Freqs (% of Valid): 2852 (6.4%); 2757 (6.2%); 2626 (5.9%); 2557 (5.8%); 2430 (5.5%); 2269 (5.1%); 2122 (4.8%); 2070 (4.7%); 2019 (4.5%); 2010 (4.5%); 20675 (46.6%)
   Missing: 0 (0%)

3  blgetmg [haven_labelled]
   Label: Belong to minority ethnic group in country
   Stats / Values: Min: 1; Mean: 1.9; Max: 2
   Freqs (% of Valid): 1: 2843 (6.5%); 2: 41103 (93.5%)
   Missing: 441 (0.99%)

4  imbgeco [haven_labelled]
   Label: Immigration bad or good for country’s economy
   Stats / Values: Mean (sd): 5 (2.5); min < med < max: 0 < 5 < 10; IQR (CV): 4 (0.5)
   Freqs (% of Valid): 11 distinct values
   Missing: 1562 (3.52%)

5  imueclt [haven_labelled]
   Label: Country’s cultural life undermined or enriched by immigrants
   Stats / Values: Mean (sd): 5.4 (2.6); min < med < max: 0 < 5 < 10; IQR (CV): 3 (0.5)
   Freqs (% of Valid): 11 distinct values
   Missing: 1403 (3.16%)

6  imwbcnt [haven_labelled]
   Label: Immigrants make country worse or better place to live
   Stats / Values: Mean (sd): 4.9 (2.4); min < med < max: 0 < 5 < 10; IQR (CV): 4 (0.5)
   Freqs (% of Valid): 11 distinct values
   Missing: 1562 (3.52%)

7  gndr [haven_labelled]
   Label: Gender
   Stats / Values: Min: 1; Mean: 1.5; Max: 2
   Freqs (% of Valid): 1: 21027 (47.4%); 2: 23351 (52.6%)
   Missing: 9 (0.02%)

8  agea [haven_labelled]
   Label: Age of respondent, calculated
   Stats / Values: Mean (sd): 49.1 (18.6); min < med < max: 15 < 49 < 100; IQR (CV): 30 (0.4)
   Freqs (% of Valid): 86 distinct values
   Missing: 155 (0.35%)

9  maritalb [haven_labelled]
   Label: Legal marital status, post coded
   Stats / Values: Mean (sd): 3.2 (2.3); min < med < max: 1 < 2 < 6; IQR (CV): 5 (0.7)
   Freqs (% of Valid): 1: 21711 (49.9%); 2: 443 (1.0%); 3: 648 (1.5%); 4: 3912 (9.0%); 5: 3756 (8.6%); 6: 13039 (30.0%)
   Missing: 878 (1.98%)

10 eisced [haven_labelled]
   Label: Highest level of education, ES - ISCED
   Stats / Values: Mean (sd): 4.1 (2.9); min < med < max: 1 < 4 < 55; IQR (CV): 3 (0.7)
   Freqs (% of Valid): 1: 3861 (8.7%); 2: 7388 (16.7%); 3: 7153 (16.2%); 4: 8720 (19.7%); 5: 6275 (14.2%); 6: 4760 (10.8%); 7: 6013 (13.6%); 55: 88 (0.2%)
   Missing: 129 (0.29%)

11 polintr [haven_labelled]
   Label: How interested in politics
   Stats / Values: Mean (sd): 2.6 (0.9); min < med < max: 1 < 3 < 4; IQR (CV): 1 (0.4)
   Freqs (% of Valid): 1: 5415 (12.2%); 2: 15539 (35.1%); 3: 15248 (34.4%); 4: 8088 (18.3%)
   Missing: 97 (0.22%)

12 hincfel [haven_labelled]
   Label: Feeling about household’s income nowadays
   Stats / Values: Mean (sd): 1.9 (0.8); min < med < max: 1 < 2 < 4; IQR (CV): 1 (0.4)
   Freqs (% of Valid): 1: 14026 (32.0%); 2: 20492 (46.7%); 3: 6945 (15.8%); 4: 2400 (5.5%)
   Missing: 524 (1.18%)

13 uemp5yr [haven_labelled]
   Label: Any period of unemployment and work seeking within last 5 years
   Stats / Values: Min: 1; Mean: 1.5; Max: 2
   Freqs (% of Valid): 1: 5794 (46.7%); 2: 6612 (53.3%)
   Missing: 31981 (72.05%)

This looks great, but might take too much space.

As this produces a large table in the middle of your report, you might either display it at the end in an Appendix, or constrain the height of the table by embedding dfSummary in the print function.

In the table below I also excluded the column with numbers (varnumbers = FALSE), and constrained the width of each column (col.widths = 25, ...). It has to be specified for each remaining column.

Variable Label Stats / Values Freqs (% of Valid) Graph Missing
idno [numeric] Mean (sd) : 31545782.2 (115541717.8) min < med < max: 1 < 2589 < 551603139 IQR (CV) : 9849.5 (3.7) 18328 distinct values 0 (0%)
cntry [character] 1. DE 2. IE 3. IT 4. IL 5. RU 6. CZ 7. LT 8. FR 9. EE 10. AT [ 13 others ]
2852(6.4%)
2757(6.2%)
2626(5.9%)
2557(5.8%)
2430(5.5%)
2269(5.1%)
2122(4.8%)
2070(4.7%)
2019(4.5%)
2010(4.5%)
20675(46.6%)
0 (0%)
blgetmg [haven_labelled] Belong to minority ethnic group in country Min : 1 Mean : 1.9 Max : 2
1:2843(6.5%)
2:41103(93.5%)
441 (0.99%)
imbgeco [haven_labelled] Immigration bad or good for country's economy Mean (sd) : 5 (2.5) min < med < max: 0 < 5 < 10 IQR (CV) : 4 (0.5) 11 distinct values 1562 (3.52%)
imueclt [haven_labelled] Country's cultural life undermined or enriched by immigrants Mean (sd) : 5.4 (2.6) min < med < max: 0 < 5 < 10 IQR (CV) : 3 (0.5) 11 distinct values 1403 (3.16%)
imwbcnt [haven_labelled] Immigrants make country worse or better place to live Mean (sd) : 4.9 (2.4) min < med < max: 0 < 5 < 10 IQR (CV) : 4 (0.5) 11 distinct values 1562 (3.52%)
gndr [haven_labelled] Gender Min : 1 Mean : 1.5 Max : 2
1:21027(47.4%)
2:23351(52.6%)
9 (0.02%)
agea [haven_labelled] Age of respondent, calculated Mean (sd) : 49.1 (18.6) min < med < max: 15 < 49 < 100 IQR (CV) : 30 (0.4) 86 distinct values 155 (0.35%)
maritalb [haven_labelled] Legal marital status, post coded Mean (sd) : 3.2 (2.3) min < med < max: 1 < 2 < 6 IQR (CV) : 5 (0.7)
1:21711(49.9%)
2:443(1.0%)
3:648(1.5%)
4:3912(9.0%)
5:3756(8.6%)
6:13039(30.0%)
878 (1.98%)
eisced [haven_labelled] Highest level of education, ES - ISCED Mean (sd) : 4.1 (2.9) min < med < max: 1 < 4 < 55 IQR (CV) : 3 (0.7)
1:3861(8.7%)
2:7388(16.7%)
3:7153(16.2%)
4:8720(19.7%)
5:6275(14.2%)
6:4760(10.8%)
7:6013(13.6%)
55:88(0.2%)
129 (0.29%)
polintr [haven_labelled] How interested in politics Mean (sd) : 2.6 (0.9) min < med < max: 1 < 3 < 4 IQR (CV) : 1 (0.4)
1:5415(12.2%)
2:15539(35.1%)
3:15248(34.4%)
4:8088(18.3%)
97 (0.22%)
hincfel [haven_labelled] Feeling about household's income nowadays Mean (sd) : 1.9 (0.8) min < med < max: 1 < 2 < 4 IQR (CV) : 1 (0.4)
1:14026(32.0%)
2:20492(46.7%)
3:6945(15.8%)
4:2400(5.5%)
524 (1.18%)
uemp5yr [haven_labelled] Any period of unemployment and work seeking within last 5 years Min : 1 Mean : 1.5 Max : 2
1:5794(46.7%)
2:6612(53.3%)
31981 (72.05%)

Generated by summarytools 0.9.6 (R version 3.6.3)
2020-05-01
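Embedding dfSummary in print, as described above, might be sketched as follows. The data frame is made up, and the height of 300 pixels is an arbitrary choice, not the value used for the table above. The col.widths option from the text is left out because its full value is not shown.

```r
library(summarytools)

# Hypothetical stand-in for round_8_final.
set.seed(205)
toy <- data.frame(
  cntry = sample(c("AT", "DE", "IE"), 100, replace = TRUE),
  agea  = sample(15:90, 100, replace = TRUE),
  gndr  = sample(1:2, 100, replace = TRUE),
  stringsAsFactors = FALSE
)

# varnumbers = FALSE drops the "No" column.
out <- dfSummary(
  toy,
  plain.ascii = FALSE,
  style       = "grid",
  varnumbers  = FALSE,
  headings    = FALSE
)

# method = "render" produces an HTML table; max.tbl.height (in pixels)
# puts it in a scrollable box instead of letting it fill the page.
print(out, method = "render", max.tbl.height = 300)
```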

Regression results with kable

kable can be used to display the results of models. You first have to prepare a summary of the model.

Table 1. Linear regression model predicting attitudes to immigration
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.7416851 0.0520508 110.309299 0.0000000
agea -0.0099040 0.0006862 -14.432407 0.0000000
gndr 0.0741516 0.0254428 2.914445 0.0035651
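A sketch of how a coefficient table like this can be passed to kable. The data are made up; the choice of imueclt as the outcome is an assumption, based on the estimates matching the imueclt model shown later on this page.

```r
library(knitr)

# Hypothetical stand-in for round_8_final.
set.seed(205)
toy <- data.frame(
  imueclt = sample(0:10, 100, replace = TRUE),
  agea    = sample(15:90, 100, replace = TRUE),
  gndr    = sample(1:2, 100, replace = TRUE)
)

# Fit the model, then take the coefficient matrix from its summary.
m1    <- lm(imueclt ~ agea + gndr, data = toy)
coefs <- summary(m1)$coefficients

kable(
  coefs,
  format  = "html",
  caption = "Table 1. Linear regression model predicting attitudes to immigration"
)
```

kable also takes a digits argument if seven decimal places are more than the table needs.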

It does not automatically display the \(R^2\) statistic, so you have to extract it with code:

[1] 0.004975681

Let’s add this information now to the table.

Table 1. Linear regression model predicting attitudes to immigration
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.7416851 0.0520508 110.309299 0.0000000
agea -0.0099040 0.0006862 -14.432407 0.0000000
gndr 0.0741516 0.0254428 2.914445 0.0035651
\(R^2\)=0.005; N=42842.
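Extracting these two numbers and turning them into a note could be sketched as below, on made-up data. How the note was attached to the table above is not shown on this page, so the sketch only builds the text.

```r
# Hypothetical stand-in for round_8_final.
set.seed(205)
toy <- data.frame(
  imueclt = sample(0:10, 100, replace = TRUE),
  agea    = sample(15:90, 100, replace = TRUE),
  gndr    = sample(1:2, 100, replace = TRUE)
)
m1 <- lm(imueclt ~ agea + gndr, data = toy)

# R-squared is stored in the model summary; nobs() gives the number of
# observations actually used in the fit.
r2    <- summary(m1)$r.squared
n_obs <- nobs(m1)

# Note text to place under the table.
note <- sprintf("R2=%.3f; N=%d.", r2, n_obs)
note
```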

More kable options are discussed here: https://bookdown.org/yihui/rmarkdown-cookbook/kable.html

More advanced options and ways of changing the table style are here: https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html#overview

Regression results with tab_model

The regression table prepared with kable does not look bad, but it can look even better with the tab_model function available in the sjPlot library. You can display more than one model in the table and it automatically adds a few summary statistics on model fit.

                               Country’s cultural life           Immigrants make country
                               undermined or enriched by         worse or better place to
                               immigrants                        live
Predictors                     Estimates  CI             p       Estimates  CI             p
(Intercept)                    5.74       5.64 – 5.84    <0.001  5.37       5.27 – 5.46    <0.001
Age of respondent, calculated  -0.01      -0.01 – -0.01  <0.001  -0.01      -0.01 – -0.01  <0.001
Gender                         0.07       0.02 – 0.12    0.004   0.01       -0.03 – 0.06   0.567
Observations                   42842                             42679
R2 / R2 adjusted               0.005 / 0.005                     0.006 / 0.006
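A two-model tab_model call of this kind might look like the sketch below. The data are made up, so the column headers will show the variable names rather than the ESS labels seen above.

```r
library(sjPlot)

# Hypothetical stand-in for round_8_final.
set.seed(205)
toy <- data.frame(
  imueclt = sample(0:10, 100, replace = TRUE),
  imwbcnt = sample(0:10, 100, replace = TRUE),
  agea    = sample(15:90, 100, replace = TRUE),
  gndr    = sample(1:2, 100, replace = TRUE)
)

m1 <- lm(imueclt ~ agea + gndr, data = toy)
m2 <- lm(imwbcnt ~ agea + gndr, data = toy)

# One column block per model; observations and R2 are added automatically.
tab <- tab_model(m1, m2)

# In an R Markdown chunk, calling tab_model(m1, m2) on its own line
# renders the HTML table in the report.
```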

This is a very good website which introduces more options, such as changing the default settings, removing/adding or defining own labels: https://cran.r-project.org/web/packages/sjPlot/vignettes/tab_model_estimates.html. You will notice that tab_model works with many types of regressions, including multilevel models.

Let’s add one more variable to the model above - hincfel, which measures relative deprivation as an ordinal variable (4 categories). I first had to tell R to treat it as a factor variable.

                               Immigrants make country           Immigrants make country
                               worse or better place to          worse or better place to
                               live                              live
Predictors                     Estimates  CI             p       Estimates  CI             p
(Intercept)                    5.37       5.27 – 5.46    <0.001  5.86       5.76 – 5.95    <0.001
Age of respondent, calculated  -0.01      -0.01 – -0.01  <0.001  -0.01      -0.01 – -0.01  <0.001
Gender                         0.01       -0.03 – 0.06   0.567   0.08       0.04 – 0.13    <0.001
hincfel: hincfel2                                                -0.75      -0.80 – -0.69  <0.001
hincfel: hincfel3                                                -1.36      -1.43 – -1.29  <0.001
hincfel: hincfel4                                                -1.82      -1.93 – -1.72  <0.001
Observations                   42679                             42254
R2 / R2 adjusted               0.006 / 0.006                     0.056 / 0.056
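The factor conversion and the extended model can be sketched as follows, on made-up data in which all four hincfel categories occur.

```r
library(sjPlot)

# Hypothetical stand-in for round_8_final.
set.seed(205)
toy <- data.frame(
  imwbcnt = sample(0:10, 100, replace = TRUE),
  agea    = sample(15:90, 100, replace = TRUE),
  gndr    = sample(1:2, 100, replace = TRUE),
  hincfel = sample(rep(1:4, 25))  # each category appears 25 times
)

# Treat the 4-category variable as a factor, so lm() estimates one
# coefficient per category relative to the first one.
toy$hincfel <- as.factor(toy$hincfel)

m1 <- lm(imwbcnt ~ agea + gndr, data = toy)
m2 <- lm(imwbcnt ~ agea + gndr + hincfel, data = toy)

tab <- tab_model(m1, m2)
names(coef(m2))
```

Without the conversion, hincfel would enter the model as a single numeric slope instead of three category contrasts.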

Finally, I add the show.reflvl = TRUE option, which adds rows to the table with reference categories for categorical variables.

                                 Model 1                                Model 2
Predictors                       Coefficient  Conf. Int (95%)  P-Value  Coefficient  Conf. Int (95%)  P-Value
Intercept                        5.37         5.27 – 5.46      <0.001   5.86         5.76 – 5.95      <0.001
Age                              -0.01        -0.01 – -0.01    <0.001   -0.01        -0.01 – -0.01    <0.001
Female                           0.01         -0.03 – 0.06     0.567    0.08         0.04 – 0.13      <0.001
Ref. Living comfortably: Coping                                         -0.75        -0.80 – -0.69    <0.001
Difficult                                                               -1.36        -1.43 – -1.29    <0.001
Very difficult                                                          -1.82        -1.93 – -1.72    <0.001
Observations                     42679                                  42254
R2 / R2 adjusted                 0.006 / 0.006                          0.056 / 0.056

The same models, but with stars to indicate the level of significance (p.style = "a"); I also add a title.

Table 1. Linear regression models predicting attitudes to immigration (Immigrants make country worse or better place to live (0-10))
                                 Model 1                       Model 2
Predictors                       Coefficient  Conf. Int (95%)  Coefficient  Conf. Int (95%)
Intercept                        5.37 ***     5.27 – 5.46      5.86 ***     5.76 – 5.95
Age                              -0.01 ***    -0.01 – -0.01    -0.01 ***    -0.01 – -0.01
Female                           0.01         -0.03 – 0.06     0.08 ***     0.04 – 0.13
Ref. Living comfortably: Coping                                -0.75 ***    -0.80 – -0.69
Difficult                                                      -1.36 ***    -1.43 – -1.29
Very difficult                                                 -1.82 ***    -1.93 – -1.72
Observations                     42679                         42254
R2 / R2 adjusted                 0.006 / 0.006                 0.056 / 0.056
* p<0.05   ** p<0.01   *** p<0.001

As you can see in the r chunk, I add the </br> HTML tag in the middle of the first subjective income label to split it into two rows. The wrap.labels option might be handy in the case of long labels. All tab_model options are listed in the R documentation here: https://www.rdocumentation.org/packages/sjPlot/versions/2.8.3/topics/tab_model
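The labelling options used in the last tables could be sketched roughly as below. This is a reconstruction on made-up data, not the original chunk: the label texts are copied from the tables above, while the use of a named pred.labels vector is an assumption. The p.style option is left out because its accepted values differ between sjPlot versions.

```r
library(sjPlot)

# Hypothetical stand-in for round_8_final.
set.seed(205)
toy <- data.frame(
  imwbcnt = sample(0:10, 100, replace = TRUE),
  agea    = sample(15:90, 100, replace = TRUE),
  gndr    = sample(1:2, 100, replace = TRUE),
  hincfel = as.factor(sample(rep(1:4, 25)))
)
m1 <- lm(imwbcnt ~ agea + gndr, data = toy)
m2 <- lm(imwbcnt ~ agea + gndr + hincfel, data = toy)

tab <- tab_model(
  m1, m2,
  title       = "Table 1. Linear regression models predicting attitudes to immigration",
  dv.labels   = c("Model 1", "Model 2"),        # column block headers
  string.pred = "Predictors",
  string.est  = "Coefficient",
  string.ci   = "Conf. Int (95%)",
  string.p    = "P-Value",
  pred.labels = c(                              # named: coefficient = label
    "(Intercept)" = "Intercept",
    "agea"        = "Age",
    "gndr"        = "Female",
    "hincfel2"    = "Ref. Living comfortably: </br> Coping",
    "hincfel3"    = "Difficult",
    "hincfel4"    = "Very difficult"
  )
)
```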

Exercise

Using the same ESS 2016 data prepare a short report in R Markdown consisting of the following:

  1. Subset ESS data for one country (you choose which one)
  2. Prepare a summary table displaying the frequency distribution of the variable measuring interest in politics (polintr)
  3. Cross-tabulate this variable by gndr and prepare a table with row percentages
     • visualise this (see examples in my next practical on figures)
  4. For data from this country, run two linear regressions which model the response for two variables measuring attitudes to immigration:
     • imueclt - Country’s cultural life undermined or enriched by immigrants
     • imbgeco - Immigration bad or good for country’s economy
     • add agea, gndr, polintr as independent variables
     • there might be a need to change polintr into a factor variable
  5. Display the results of both models as one table using the tab_model function
     • make the table look transparent and nice
  6. Organise these results in sections (using hashtags) and save as a fresh report (delete my notes)
     • it is up to you which sections or r chunks you display and which you hide

When you finish coding the results, go to YAML section:

  • change the theme in YAML section into rmdformats::readthedown (install first rmdformats package)
  • update authorship and date there
  • experiment with any other settings of the YAML section, e.g. use a different highlight option

Finally, publish your report on https://Rpubs.com (create an account) and send me a link to what you have done before next virtual lab on 15th May, Friday.

Enjoy exploring R Markdown!

