POLONIUS: What do you read, my lord?
HAMLET: Words, words, words.
W.Shakespeare, “Hamlet”

Inaugural address

President Trump’s Inaugural Address, January 2017

President Trump’s Inaugural Address, January 2017

Newly sworn-in presidents usually give a speech referred to as an inaugural address. As with many inaugural customs, this one was started by George Washington in 1789. Every President since Washington has delivered an Inaugural address. With the 2017 inauguration of Donald Trump, the oath has been taken 75 different times by 44 persons.

See https://en.wikipedia.org/wiki/United_States_presidential_inauguration#Inaugural_address

Problem

We want to compare US president’s inaugural address speech in terms of Words (number of words), Seconds (timeliness in seconds) and WPM (words per minute). The contents of speech is out of scope for this paper.

Data

For this purpose we use open source data https://www.statslife.org.uk/politics/3156-slow-talking-the-inaugural. Also we use some additional data particularly corresponding both to the person of the president (Party, Height, Old) and situation in economics (Inflation, Unemployment, GDP): https://en.wikipedia.org/wiki/Heights_of_presidents_and_presidential_candidates_of_the_United_States, http://www.multpl.com/inflation/table, http://www.multpl.com/unemployment/table, http://www.multpl.com/us-gdp-growth-rate/table/by-year.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(FactoMineR)
library(car)
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
library(DT)
library(scatterplot3d)
library(popbio)
speech<-read.table(file="speech.txt")
dim(speech)
## [1] 15 10
#datatable(speech)
speech
##          Year Words Seconds WPM Party Old Height Inflation Unemployment
## Trump    2017  1444     980  88   Rep  70    184      2.07          4.7
## Obama2   2013  2096    1121 112   Dem  52    185      1.59          8.0
## Obama1   2009  2395    1119 128   Dem  48    185      0.03          7.8
## Bush-2   2005  2071    1268  98   Rep  59    182      2.97          5.3
## Bush-1   2001  1592     866 110   Rep  55    182      3.73          4.2
## Clinton2 1997  2155    1315  98   Dem  51    188      3.04          5.3
## Clinton1 1993  1598     840 114   Dem  47    188      3.26          7.3
## Bush     1989  2320    1220 114   Rep  65    188      4.67          5.4
## Reagan2  1985  2561    1296 119   Rep  74    185      3.53          7.5
## Reagan1  1981  2427    1191 122   Rep  70    185     11.83          7.5
## Carter   1977  1229     866  85   Dem  53    177      5.22          7.5
## Nixon2   1973  1803     968 112   Rep  60    182      3.65          4.9
## Nixon1   1969  2128    1035 123   Rep  56    182      4.40          3.4
## Johnson  1965  1507     671 135   Dem  55    192      0.97          4.9
## Kennedy  1961  1366     841  97   Dem  44    183      1.71          6.6
##          GDPgr
## Trump     2.94
## Obama2    4.31
## Obama1    0.11
## Bush-2    6.52
## Bush-1    2.19
## Clinton2  6.05
## Clinton1  5.00
## Bush      6.48
## Reagan2   7.37
## Reagan1   9.69
## Carter   11.88
## Nixon2   11.04
## Nixon1    7.28
## Johnson  10.70
## Kennedy   7.48

Some interesting facts

We use descriptive statistics to produce some interesting facts from our data.

summary(speech[,c(1:10)])
##       Year          Words         Seconds          WPM        Party  
##  Min.   :1961   Min.   :1229   Min.   : 671   Min.   : 85.0   Dem:7  
##  1st Qu.:1975   1st Qu.:1550   1st Qu.: 866   1st Qu.: 98.0   Rep:8  
##  Median :1989   Median :2071   Median :1035   Median :112.0          
##  Mean   :1989   Mean   :1913   Mean   :1040   Mean   :110.3          
##  3rd Qu.:2003   3rd Qu.:2238   3rd Qu.:1206   3rd Qu.:120.5          
##  Max.   :2017   Max.   :2561   Max.   :1315   Max.   :135.0          
##       Old            Height        Inflation       Unemployment 
##  Min.   :44.00   Min.   :177.0   Min.   : 0.030   Min.   :3.40  
##  1st Qu.:51.50   1st Qu.:182.0   1st Qu.: 1.890   1st Qu.:4.90  
##  Median :55.00   Median :185.0   Median : 3.260   Median :5.40  
##  Mean   :57.27   Mean   :184.5   Mean   : 3.511   Mean   :6.02  
##  3rd Qu.:62.50   3rd Qu.:186.5   3rd Qu.: 4.065   3rd Qu.:7.50  
##  Max.   :74.00   Max.   :192.0   Max.   :11.830   Max.   :8.00  
##      GDPgr       
##  Min.   : 0.110  
##  1st Qu.: 4.655  
##  Median : 6.520  
##  Mean   : 6.603  
##  3rd Qu.: 8.585  
##  Max.   :11.880
#Seconds by Party
filter(speech) %>% group_by(Party) %>% summarise(Mean=mean(Seconds), Median=median(Seconds),SD=sd(Seconds), Q10=quantile(Seconds,0.1),Q90=quantile(Seconds,0.9)) 
## # A tibble: 2 <U+00D7> 6
##    Party      Mean Median       SD   Q10    Q90
##   <fctr>     <dbl>  <dbl>    <dbl> <dbl>  <dbl>
## 1    Dem  967.5714    866 222.7643 772.4 1198.6
## 2    Rep 1103.0000   1113 160.3808 937.4 1276.4
#Words by Party
filter(speech) %>% group_by(Party) %>% summarise(Mean=mean(Words), Median=median(Words),SD=sd(Words), Q10=quantile(Words,0.1),Q90=quantile(Words,0.9)) 
## # A tibble: 2 <U+00D7> 6
##    Party     Mean Median       SD    Q10    Q90
##   <fctr>    <dbl>  <dbl>    <dbl>  <dbl>  <dbl>
## 1    Dem 1763.714 1598.0 447.1203 1311.2 2251.0
## 2    Rep 2043.250 2099.5 400.0570 1547.6 2467.2
#WPM by Party
filter(speech) %>% group_by(Party) %>% summarise(Mean=mean(WPM), Median=median(WPM),SD=sd(WPM), Q10=quantile(WPM,0.1),Q90=quantile(WPM,0.9)) 
## # A tibble: 2 <U+00D7> 6
##    Party     Mean Median       SD   Q10   Q90
##   <fctr>    <dbl>  <dbl>    <dbl> <dbl> <dbl>
## 1    Dem 109.8571    112 17.82721  92.2 130.8
## 2    Rep 110.7500    113 12.17433  95.0 122.3
#Height by Party
filter(speech) %>% group_by(Party) %>% summarise(Mean=mean(Height), Median=median(Height),SD=sd(Height), Q10=quantile(Height,0.1),Q90=quantile(Height,0.9)) 
## # A tibble: 2 <U+00D7> 6
##    Party     Mean Median       SD   Q10   Q90
##   <fctr>    <dbl>  <dbl>    <dbl> <dbl> <dbl>
## 1    Dem 185.4286    185 4.720775 180.6 189.6
## 2    Rep 183.7500    183 2.187628 182.0 185.9
#Old by Party
filter(speech) %>% group_by(Party) %>% summarise(Mean=mean(Old), Median=median(Old),SD=sd(Old), Q10=quantile(Old,0.1),Q90=quantile(Old,0.9)) 
## # A tibble: 2 <U+00D7> 6
##    Party   Mean Median       SD   Q10   Q90
##   <fctr>  <dbl>  <dbl>    <dbl> <dbl> <dbl>
## 1    Dem 50.000   51.0 3.829708  45.8  53.8
## 2    Rep 63.625   62.5 7.150175  55.7  71.2
#Inflation by Party
filter(speech) %>% group_by(Party) %>% summarise(Mean=mean(Inflation), Median=median(Inflation),SD=sd(Inflation), Q10=quantile(Inflation,0.1),Q90=quantile(Inflation,0.9)) 
## # A tibble: 2 <U+00D7> 6
##    Party    Mean Median       SD   Q10   Q90
##   <fctr>   <dbl>  <dbl>    <dbl> <dbl> <dbl>
## 1    Dem 2.26000   1.71 1.720388 0.594 4.044
## 2    Rep 4.60625   3.69 3.027955 2.700 6.818
#Unemployment by Party
filter(speech) %>% group_by(Party) %>% summarise(Mean=mean(Unemployment), Median=median(Unemployment),SD=sd(Unemployment), Q10=quantile(Unemployment,0.1),Q90=quantile(Unemployment,0.9)) 
## # A tibble: 2 <U+00D7> 6
##    Party     Mean Median       SD   Q10   Q90
##   <fctr>    <dbl>  <dbl>    <dbl> <dbl> <dbl>
## 1    Dem 6.771429    7.3 1.229789  5.14  7.88
## 2    Rep 5.362500    5.1 1.463789  3.96  7.50
#GDP grow by Party
filter(speech) %>% group_by(Party) %>% summarise(Mean=mean(GDPgr), Median=median(GDPgr),SD=sd(GDPgr), Q10=quantile(GDPgr,0.1),Q90=quantile(GDPgr,0.9)) 
## # A tibble: 2 <U+00D7> 6
##    Party     Mean Median       SD   Q10    Q90
##   <fctr>    <dbl>  <dbl>    <dbl> <dbl>  <dbl>
## 1    Dem 6.504286   6.05 3.992188 2.630 11.172
## 2    Rep 6.688750   6.90 3.002820 2.715 10.095

Exploring data

attach(speech)
colset=c("blue", "red")
boxplot(Words~Party,main="Words by Party", col=colset)

boxplot(Seconds~Party,main="Seconds by Party", col=colset)

boxplot(WPM~Party,main="WPM by Party", col=colset)

boxplot(Old~Party, main="Old by Party", col=colset)

boxplot(Height~Party, main="Height by Party", col=colset)

boxplot(Inflation~Party, main="Inflation % by Party", col=colset)

boxplot(Unemployment~Party, main="Unemployment % by Party", col=colset)

boxplot(GDPgr~Party, main="GDP growth % by Party", col=colset)

Correlations

cor(speech[,c(2:4,6:10)])
##                   Words     Seconds         WPM          Old       Height
## Words         1.0000000  0.84038215  0.49656322  0.410262167  0.249142123
## Seconds       0.8403821  1.00000000 -0.03967570  0.433937090  0.010108136
## WPM           0.4965632 -0.03967570  1.00000000  0.055490010  0.558650865
## Old           0.4102622  0.43393709  0.05549001  1.000000000  0.006371528
## Height        0.2491421  0.01010814  0.55865086  0.006371528  1.000000000
## Inflation     0.2591305  0.24977710  0.01814621  0.482780014 -0.201685405
## Unemployment  0.2078670  0.16964941  0.02363594 -0.115545547 -0.011496590
## GDPgr        -0.2021760 -0.19841851 -0.02838487  0.174926696 -0.120933212
##                Inflation Unemployment       GDPgr
## Words         0.25913046   0.20786703 -0.20217598
## Seconds       0.24977710   0.16964941 -0.19841851
## WPM           0.01814621   0.02363594 -0.02838487
## Old           0.48278001  -0.11554555  0.17492670
## Height       -0.20168541  -0.01149659 -0.12093321
## Inflation     1.00000000   0.09468571  0.43749992
## Unemployment  0.09468571   1.00000000 -0.02909618
## GDPgr         0.43749992  -0.02909618  1.00000000

Student test

t.test(Words~Party)
## 
##  Welch Two Sample t-test
## 
## data:  Words by Party
## t = -1.2685, df = 12.213, p-value = 0.2283
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -758.7637  199.6923
## sample estimates:
## mean in group Dem mean in group Rep 
##          1763.714          2043.250
t.test(Seconds~Party)
## 
##  Welch Two Sample t-test
## 
## data:  Seconds by Party
## t = -1.3341, df = 10.777, p-value = 0.2097
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -359.41795   88.56081
## sample estimates:
## mean in group Dem mean in group Rep 
##          967.5714         1103.0000
t.test(WPM~Party)
## 
##  Welch Two Sample t-test
## 
## data:  WPM by Party
## t = -0.11167, df = 10.41, p-value = 0.9132
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.61329  16.82758
## sample estimates:
## mean in group Dem mean in group Rep 
##          109.8571          110.7500
t.test(Old~Party)
## 
##  Welch Two Sample t-test
## 
## data:  Old by Party
## t = -4.6772, df = 10.967, p-value = 0.00068
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -20.038926  -7.211074
## sample estimates:
## mean in group Dem mean in group Rep 
##            50.000            63.625
t.test(Height~Party)
## 
##  Welch Two Sample t-test
## 
## data:  Height by Party
## t = 0.86315, df = 8.2179, p-value = 0.4125
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.785314  6.142457
## sample estimates:
## mean in group Dem mean in group Rep 
##          185.4286          183.7500
t.test(Inflation~Party)
## 
##  Welch Two Sample t-test
## 
## data:  Inflation by Party
## t = -1.8732, df = 11.32, p-value = 0.08708
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.0936108  0.4011108
## sample estimates:
## mean in group Dem mean in group Rep 
##           2.26000           4.60625
t.test(Unemployment~Party)
## 
##  Welch Two Sample t-test
## 
## data:  Unemployment by Party
## t = 2.0254, df = 12.988, p-value = 0.06388
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.09400938  2.91186652
## sample estimates:
## mean in group Dem mean in group Rep 
##          6.771429          5.362500
t.test(GDPgr~Party)
## 
##  Welch Two Sample t-test
## 
## data:  GDPgr by Party
## t = -0.099982, df = 11.083, p-value = 0.9221
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.241511  3.872583
## sample estimates:
## mean in group Dem mean in group Rep 
##          6.504286          6.688750

Principal component analysis

set.seed(12345)
res<-PCA(speech,quali.sup =5, quanti.sup = 1, ncp = 5)

res$var$coord
##                    Dim.1       Dim.2      Dim.3        Dim.4       Dim.5
## Words         0.94151251 -0.20406885 -0.1300651  0.009929855 -0.16632064
## Seconds       0.81838708  0.08246410 -0.4629521 -0.123140342  0.07717574
## WPM           0.42363123 -0.57016080  0.5646760  0.139983559 -0.34987874
## Old           0.64446651  0.41999452  0.1754459 -0.345258106  0.28448891
## Height        0.26244745 -0.69022891  0.4459318 -0.003708254  0.43679959
## Inflation     0.47561795  0.66945373  0.2796611  0.176354852 -0.18406167
## Unemployment  0.21008583 -0.04389426 -0.3198235  0.885599834  0.18080476
## GDPgr        -0.05477596  0.59373663  0.6171776  0.261483740  0.12357795
res$ind$contrib
##               Dim.1       Dim.2      Dim.3        Dim.4        Dim.5
## Trump     2.3496014  1.39010729  3.5494127 22.295920911 1.924570e+01
## Obama2    0.2021360  4.13297162  6.3604455  7.129746579 2.609333e-01
## Obama1    1.1729184 25.69236792 11.3974220  4.315359858 9.986921e+00
## Bush-2    0.5201044  1.92389068  6.3381488  4.120109603 7.699236e-03
## Bush-1    4.2467539  0.07957458  0.2849776  9.804371212 1.404437e+01
## Clinton2  1.0172532  0.96204691  3.0917220  1.988001246 5.920746e+00
## Clinton1  3.4152103  5.53848152  0.4791077  8.880348667 7.812825e-01
## Bush      8.1794799  0.17896445  0.6726308  2.953573763 3.105363e+00
## Reagan2  19.8439092  0.12864491  0.2434394  0.358717769 5.128672e+00
## Reagan1  24.8629775 12.53969644  7.3531308  9.518984957 1.527206e+00
## Carter   13.1913734 31.09914389  1.2932203 13.266580768 4.405621e-04
## Nixon2    0.5378265  4.32771246  4.5336978  0.819897135 1.480411e+00
## Nixon1    0.1177470  0.12952743  3.0225470 10.452722730 3.281057e+01
## Johnson   4.4803157 11.86624356 50.4365985  0.005399632 5.554535e+00
## Kennedy  15.8623932  0.01062634  0.9434990  4.090265170 1.451563e-01
sort(res$ind$dist)
##   Bush-2   Nixon2   Obama2     Bush Clinton2   Nixon1 Clinton1   Bush-1 
## 1.794310 1.823314 1.917294 2.023706 2.242873 2.252837 2.273451 2.335375 
##  Kennedy    Trump  Reagan2   Obama1  Johnson   Carter  Reagan1 
## 2.632792 2.908462 3.066472 3.392481 3.959544 3.988866 4.130896

Factor analysis

resF=FAMD(speech[,-1],ncp = 3)

resF
## *The results are available in the following objects:
## 
##   name          description                             
## 1 "$eig"        "eigenvalues and inertia"               
## 2 "$var"        "Results for the variables"             
## 3 "$ind"        "results for the individuals"           
## 4 "$quali.var"  "Results for the qualitative variables" 
## 5 "$quanti.var" "Results for the quantitative variables"
3D view

3D view

K means clustering

set.seed(12345)
kmeans(speech[,c(2:4,6:10)],centers = 5,iter.max = 3000,nstart = 30)
## K-means clustering with 5 clusters of sizes 3, 1, 4, 3, 4
## 
## Cluster means:
##      Words   Seconds      WPM      Old   Height Inflation Unemployment
## 1 1346.333  895.6667  90.0000 55.66667 181.3333  3.000000     6.266667
## 2 1803.000  968.0000 112.0000 60.00000 182.0000  3.650000     4.900000
## 3 2425.750 1206.5000 120.7500 64.25000 185.7500  5.015000     7.050000
## 4 1565.667  792.3333 119.6667 52.33333 187.3333  2.653333     5.466667
## 5 2112.500 1184.7500 107.7500 54.50000 184.2500  3.000000     5.500000
##       GDPgr
## 1  7.433333
## 2 11.040000
## 3  5.912500
## 4  5.963333
## 5  6.040000
## 
## Clustering vector:
##    Trump   Obama2   Obama1   Bush-2   Bush-1 Clinton2 Clinton1     Bush 
##        1        5        3        5        4        5        4        3 
##  Reagan2  Reagan1   Carter   Nixon2   Nixon1  Johnson  Kennedy 
##        3        3        1        2        5        4        1 
## 
## Within cluster sum of squares by cluster:
## [1] 35180.18     0.00 47141.69 28102.58 54951.70
##  (between_SS / total_SS =  94.8 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"    
## [5] "tot.withinss" "betweenss"    "size"         "iter"        
## [9] "ifault"

Linear regression model

fit.sp<-lm(Words~Height + Seconds:Party,data = speech)
anova(fit.sp,test="Chsqr")
## Analysis of Variance Table
## 
## Response: Words
##               Df  Sum Sq Mean Sq F value    Pr(>F)    
## Height         1  162103  162103  3.0584 0.1081307    
## Seconds:Party  2 1866416  933208 17.6070 0.0003727 ***
## Residuals     11  583023   53002                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(fit.sp)
## 
## Call:
## lm(formula = Words ~ Height + Seconds:Party, data = speech)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -387.59  -95.24   29.94  107.84  389.08 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -5871.7701  3261.8991  -1.800 0.099298 .  
## Height              32.2908    17.7173   1.823 0.095641 .  
## Seconds:PartyDem     1.7014     0.3530   4.820 0.000536 ***
## Seconds:PartyRep     1.7978     0.3145   5.716 0.000135 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 230.2 on 11 degrees of freedom
## Multiple R-squared:  0.7768, Adjusted R-squared:  0.7159 
## F-statistic: 12.76 on 3 and 11 DF,  p-value: 0.0006658
shapiro.test(fit.sp$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  fit.sp$residuals
## W = 0.99142, p-value = 0.9998
par(mfrow=c(2,2))
plot(fit.sp)

leveragePlots(fit.sp)

par(mfrow=c(1,1))

fit.sp<-lm(Words~Height + Seconds,data = speech)
s3d<-scatterplot3d(Height,  Seconds, Words, color = ifelse(speech$Party=="Dem","blue","red"),pch = 8,main = "Inauguration speech 3D plot",xlab = "Height, cm", ylab = "Seconds", zlab = "Words")
s3d$plane3d(fit.sp,draw_lines = T,draw_polygon = T)

fit.sp_2<-lm(Height~Unemployment+Words+Party+Old,data = speech)
anova(fit.sp_2,test="Chsqr")
## Analysis of Variance Table
## 
## Response: Height
##              Df Sum Sq Mean Sq F value  Pr(>F)  
## Unemployment  1  0.023   0.023  0.0025 0.96106  
## Words         1 11.753  11.753  1.2538 0.28900  
## Party         1 41.374  41.374  4.4139 0.06198 .
## Old           1 30.846  30.846  3.2907 0.09974 .
## Residuals    10 93.737   9.374                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(fit.sp_2)
## 
## Call:
## lm(formula = Height ~ Unemployment + Words + Party + Old, data = speech)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7996 -0.7749 -0.3657  1.0318  5.0607 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  173.757726   7.552651  23.006 5.44e-10 ***
## Unemployment  -1.633386   0.771201  -2.118   0.0602 .  
## Words          0.004376   0.002252   1.943   0.0807 .  
## PartyRep      -9.294306   3.412363  -2.724   0.0214 *  
## Old            0.300276   0.165529   1.814   0.0997 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.062 on 10 degrees of freedom
## Multiple R-squared:  0.4726, Adjusted R-squared:  0.2616 
## F-statistic:  2.24 on 4 and 10 DF,  p-value: 0.1372
shapiro.test(fit.sp_2$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  fit.sp_2$residuals
## W = 0.96312, p-value = 0.7464
par(mfrow=c(2,2))
plot(fit.sp_2)

leveragePlots(fit.sp_2)

par(mfrow=c(1,1))

fit.sp_2<-lm(Height~Unemployment+Words,data = speech)
s3d2<-scatterplot3d(Unemployment, Words, Height,  color = ifelse(speech$Party=="Dem","blue","red"),pch = 8,main = "Inauguration speech 3D plot",xlab = "Unemployment, %", ylab = "Words", zlab = "Height, cm")
s3d2$plane3d(fit.sp_2,draw_lines = T,draw_polygon = T)

Naive Bayes Classifier

library(e1071)
model <- naiveBayes(Party ~ Inflation + Unemployment, data = speech)
pred_party<-predict(model,speech)
table(pred_party,speech$Party)
##           
## pred_party Dem Rep
##        Dem   5   1
##        Rep   2   7
prd<-predict(model, speech[1:15,], type = "raw")
prd<-as.data.frame(prd)
PartyNB<-ifelse(prd$Dem>0.5,"Dem","Rep")
cbind(prd,PartyNB,speech[,c(5,8,9)])
##                   Dem       Rep PartyNB Party Inflation Unemployment
## Trump    0.4096401778 0.5903598     Rep   Rep      2.07          4.7
## Obama2   0.8957235090 0.1042765     Dem   Dem      1.59          8.0
## Obama1   0.8748612001 0.1251388     Dem   Dem      0.03          7.8
## Bush-2   0.4879916857 0.5120083     Rep   Rep      2.97          5.3
## Bush-1   0.1696839427 0.8303161     Rep   Rep      3.73          4.2
## Clinton2 0.4805379399 0.5194621     Rep   Dem      3.04          5.3
## Clinton1 0.7891007461 0.2108993     Dem   Dem      3.26          7.3
## Bush     0.2696398318 0.7303602     Rep   Rep      4.67          5.4
## Reagan2  0.7837005443 0.2162995     Dem   Rep      3.53          7.5
## Reagan1  0.0000146743 0.9999853     Rep   Rep     11.83          7.5
## Carter   0.5092684858 0.4907315     Dem   Dem      5.22          7.5
## Nixon2   0.3146474276 0.6853526     Rep   Rep      3.65          4.9
## Nixon1   0.0463343819 0.9536656     Rep   Rep      4.40          3.4
## Johnson  0.4845045930 0.5154954     Rep   Dem      0.97          4.9
## Kennedy  0.7957566867 0.2042433     Dem   Dem      1.71          6.6
sum(ifelse(PartyNB==speech$Party,1,0))/length(speech$Party)
## [1] 0.8

Support vector machine

plot(Inflation,Unemployment,col=colset,pch=8)

model.svm<-svm(Party~Inflation+Unemployment,data = speech,kernel="linear",type="C-classification",scale=T,probability=T)
pred_svm<-predict(model.svm, speech, probability=T)
cbind(attr(pred_svm,which = "probabilities"),speech[,c(5,8,9)])
##                Rep       Dem Party Inflation Unemployment
## Trump    0.5758109 0.4241891   Rep      2.07          4.7
## Obama2   0.3595821 0.6404179   Dem      1.59          8.0
## Obama1   0.3376480 0.6623520   Dem      0.03          7.8
## Bush-2   0.5619694 0.4380306   Rep      2.97          5.3
## Bush-1   0.6438054 0.3561946   Rep      3.73          4.2
## Clinton2 0.5637247 0.4362753   Dem      3.04          5.3
## Clinton1 0.4456091 0.5543909   Dem      3.26          7.3
## Bush     0.5979102 0.4020898   Rep      4.67          5.4
## Reagan2  0.4401330 0.5598670   Rep      3.53          7.5
## Reagan1  0.6459998 0.3540002   Rep     11.83          7.5
## Carter   0.4833000 0.5167000   Dem      5.22          7.5
## Nixon2   0.6025424 0.3974576   Rep      3.65          4.9
## Nixon1   0.7044984 0.2955016   Rep      4.40          3.4
## Johnson  0.5359028 0.4640972   Dem      0.97          4.9
## Kennedy  0.4493649 0.5506351   Dem      1.71          6.6
table(pred_svm,Party)
##         Party
## pred_svm Dem Rep
##      Dem   5   1
##      Rep   2   7
sum(ifelse(pred_svm==speech$Party,1,0))/length(speech$Party)
## [1] 0.8
plot(Party~Inflation,speech,col=colset)

plot(Party~Unemployment,speech,col=colset)

plot(Party~Old,speech,col=colset)

plot(model.svm,speech,Inflation~Unemployment,col=colset)

scatterplot3d(Unemployment, Inflation, Old,  color = ifelse(Party=="Dem","blue","red"),pch = 8,main = "Parties domain 3D plot",xlab = "Unemployment, %", ylab = "Inflation, %", zlab = "Old",type = "h")

3D view

3D view

Logistic regression

Here we use additional data for elected US presidents (1945-2017) without Words, Seconds and WPM values. We want to estimate probability of the Party as a function of the years Old.

uspres<-read.table(file="uspres.txt")
uspres$Pind<-ifelse(uspres$Party=="Rep",1,0)
uspres
##             Year Party Old Height  Infl Unempl GDPgr Pind
## Trump       2017   Rep  70    184  2.07    4.7  2.94    1
## Obama2      2013   Dem  52    185  1.59    8.0  4.31    0
## Obama1      2009   Dem  48    185  0.03    7.8  0.11    0
## Bush-2      2005   Rep  59    182  2.97    5.3  6.52    1
## Bush-1      2001   Rep  55    182  3.73    4.2  2.19    1
## Clinton2    1997   Dem  51    188  3.04    5.3  6.05    0
## Clinton1    1993   Dem  47    188  3.26    7.3  5.00    0
## Bush        1989   Rep  65    188  4.67    5.4  6.48    1
## Reagan2     1985   Rep  74    185  3.53    7.5  7.37    1
## Reagan1     1981   Rep  70    185 11.83    7.5  9.69    1
## Carter      1977   Dem  53    177  5.22    7.5 11.88    0
## Ford        1974   Rep  61    183  9.39    5.1  8.38    1
## Nixon2      1973   Rep  60    182  3.65    4.9 11.04    1
## Nixon1      1969   Rep  56    182  4.40    3.4  7.28    1
## Johnson     1965   Dem  55    192  0.97    4.9 10.70    0
## Kennedy     1961   Dem  44    183  1.71    6.6  7.48    0
## Eisenhower2 1957   Rep  67    179  2.99    4.2  3.12    1
## Eisenhower1 1953   Rep  63    179  0.38    2.9  1.39    1
## Truman2     1949   Dem  65    175  1.27    4.3 -3.46    0
## Truman1     1945   Dem  61    175  2.30    3.0 76.35    0
fit.lg<-glm(Pind~Old,data = uspres,family = binomial(link = "logit"))
summary(fit.lg)
## 
## Call:
## glm(formula = Pind ~ Old, family = binomial(link = "logit"), 
##     data = uspres)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0746  -0.5972   0.2077   0.6781   1.4569  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -15.2944     6.6288  -2.307   0.0210 *
## Old           0.2665     0.1143   2.332   0.0197 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 27.526  on 19  degrees of freedom
## Residual deviance: 16.695  on 18  degrees of freedom
## AIC: 20.695
## 
## Number of Fisher Scoring iterations: 5
anova(fit.lg)
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: Pind
## 
## Terms added sequentially (first to last)
## 
## 
##      Df Deviance Resid. Df Resid. Dev
## NULL                    19     27.526
## Old   1   10.831        18     16.695
logi.hist.plot(uspres$Old,uspres$Pind,xlabel = "Old",boxp = F,mainlabel = "US president Republican versus Democrat")
points(uspres$Old,fitted(fit.lg),pch=8)
grid(col = "black")

Conclusions

So what’s this all math about? Here are some results in brief (more results with machine learning will be later):

  1. The mean Republican’s presidents speech is more longer in seconds as compared to the Democrat’s $T_{Rep}=1103, $ \(T_{Dem}=967.6\) while the overall mean president inauguration speech duration \(T_{all}=1040\). D.Trump spoke for a long as 980 s that is closer to the Democrats than to the Republicans. So he is a real business man not a politician one so far. Remember 1997 Bill Clinton’s speech who made the record of 1315 s being not a Republican! And what about number of Words?

  2. The mean Republican’s presidents speech is more longer in words as compared to the Democrat’s \(W_{Rep}=2043\), \(W_{Dem}=1764\) while the overall mean president inauguration speech duration \(W_{all}=1912\). D.Trump told the crowd 1444 words that is closer to the Democrats than to the Republicans. We don’t discuss here the contents of his speech. Remember 1985 Ronald Reagan’s speech who made the record of 2561 words being a Republican! And what about Words per minute?

  3. The mean Republican’s presidents speech tempo WPM as compared to the Democrat’s \(WPM_{Rep}=110.75\), \(WPM_{Dem}=109.85\). D.Trump speech tempo is 88 that is one of the lowest. Remember 1965 JFK’s speech who made the record of WPM=135 and time of speech T=671 s being a Democrat! And what about years Old?

  4. The mean Republican’s president as compared to the Democrat’s \(OLD_{Rep}=63.6\), \(OLD_{Dem}=50\). D.Trump is 70 that is one of the oldest. Remember 1985 when Ronald Reagan had been elected being 74 and 1961 when JFK had been elected being 44! And what about Height?

  5. The mean Republican’s president Height compared to the Democrat’s \(H_{Rep}=183.75\), \(H_{Dem}=185.42\). D.Trump is 184 cm that is he stands in the middle. Remember 1965 Lyndon Johnson 192 cm and 1977 Jimmy Carter, both Democrats! And what about Inflation %?

  6. The mean Republican’s president Inflation rate in % at election compared to the Democrat’s \(Infl_{Rep}=4.61\), \(Infl_{Dem}=2.26\). D.Trump has got 2.07% Inflation from Democrat B.Obama. Remember 1981 Ronald Reagan 11.83% and 2009 B.Obama 0.03%! And what about Unemployment %?

  7. The mean Republican’s president Unemployment in % at election compared to the Democrat’s \(Unempl_{Rep}=5.36\), \(Unempl_{Dem}=6.77\). D.Trump has got 4.7% Unemployment from Democrat B.Obama. Remember 2013 B.Obama 8.0% and 1969 R.Nixon 3.4%! And what about GDP growth rate in %?

  8. The mean Republican’s president GDP growth in % at election compared to the Democrat’s \(GDPgr_{Rep}=6.68\), \(GDPgr_{Dem}=6.50\). D.Trump has got 2.94% GDP growth from Democrat B.Obama. Remember 1977 Jimmy Carter 11.88% and 2009 B.Obama 0.11%!