POLONIUS: What do you read, my lord?
HAMLET: Words, words, words.
W.Shakespeare, “Hamlet”
President Trump’s Inaugural Address, January 2017
Newly sworn-in presidents usually give a speech referred to as an inaugural address. As with many inaugural customs, this one was started by George Washington in 1789. Every President since Washington has delivered an Inaugural address. With the 2017 inauguration of Donald Trump, the oath has been taken 75 different times by 44 persons.
See https://en.wikipedia.org/wiki/United_States_presidential_inauguration#Inaugural_address
We want to compare US president’s inaugural address speech in terms of Words (number of words), Seconds (timeliness in seconds) and WPM (words per minute). The contents of speech is out of scope for this paper.
For this purpose we use open source data https://www.statslife.org.uk/politics/3156-slow-talking-the-inaugural. Also we use some additional data particularly corresponding both to the person of the president (Party, Height, Old) and situation in economics (Inflation, Unemployment, GDP): https://en.wikipedia.org/wiki/Heights_of_presidents_and_presidential_candidates_of_the_United_States, http://www.multpl.com/inflation/table, http://www.multpl.com/unemployment/table, http://www.multpl.com/us-gdp-growth-rate/table/by-year.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(FactoMineR)
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
library(DT)
library(scatterplot3d)
library(popbio)
speech<-read.table(file="speech.txt")
dim(speech)
## [1] 15 10
#datatable(speech)
speech
## Year Words Seconds WPM Party Old Height Inflation Unemployment
## Trump 2017 1444 980 88 Rep 70 184 2.07 4.7
## Obama2 2013 2096 1121 112 Dem 52 185 1.59 8.0
## Obama1 2009 2395 1119 128 Dem 48 185 0.03 7.8
## Bush-2 2005 2071 1268 98 Rep 59 182 2.97 5.3
## Bush-1 2001 1592 866 110 Rep 55 182 3.73 4.2
## Clinton2 1997 2155 1315 98 Dem 51 188 3.04 5.3
## Clinton1 1993 1598 840 114 Dem 47 188 3.26 7.3
## Bush 1989 2320 1220 114 Rep 65 188 4.67 5.4
## Reagan2 1985 2561 1296 119 Rep 74 185 3.53 7.5
## Reagan1 1981 2427 1191 122 Rep 70 185 11.83 7.5
## Carter 1977 1229 866 85 Dem 53 177 5.22 7.5
## Nixon2 1973 1803 968 112 Rep 60 182 3.65 4.9
## Nixon1 1969 2128 1035 123 Rep 56 182 4.40 3.4
## Johnson 1965 1507 671 135 Dem 55 192 0.97 4.9
## Kennedy 1961 1366 841 97 Dem 44 183 1.71 6.6
## GDPgr
## Trump 2.94
## Obama2 4.31
## Obama1 0.11
## Bush-2 6.52
## Bush-1 2.19
## Clinton2 6.05
## Clinton1 5.00
## Bush 6.48
## Reagan2 7.37
## Reagan1 9.69
## Carter 11.88
## Nixon2 11.04
## Nixon1 7.28
## Johnson 10.70
## Kennedy 7.48
We use descriptive statistics to produce some interesting facts from our data.
summary(speech[,c(1:10)])
## Year Words Seconds WPM Party
## Min. :1961 Min. :1229 Min. : 671 Min. : 85.0 Dem:7
## 1st Qu.:1975 1st Qu.:1550 1st Qu.: 866 1st Qu.: 98.0 Rep:8
## Median :1989 Median :2071 Median :1035 Median :112.0
## Mean :1989 Mean :1913 Mean :1040 Mean :110.3
## 3rd Qu.:2003 3rd Qu.:2238 3rd Qu.:1206 3rd Qu.:120.5
## Max. :2017 Max. :2561 Max. :1315 Max. :135.0
## Old Height Inflation Unemployment
## Min. :44.00 Min. :177.0 Min. : 0.030 Min. :3.40
## 1st Qu.:51.50 1st Qu.:182.0 1st Qu.: 1.890 1st Qu.:4.90
## Median :55.00 Median :185.0 Median : 3.260 Median :5.40
## Mean :57.27 Mean :184.5 Mean : 3.511 Mean :6.02
## 3rd Qu.:62.50 3rd Qu.:186.5 3rd Qu.: 4.065 3rd Qu.:7.50
## Max. :74.00 Max. :192.0 Max. :11.830 Max. :8.00
## GDPgr
## Min. : 0.110
## 1st Qu.: 4.655
## Median : 6.520
## Mean : 6.603
## 3rd Qu.: 8.585
## Max. :11.880
#Seconds by Party
filter(speech) %>% group_by(Party) %>% summarise(Mean=mean(Seconds), Median=median(Seconds),SD=sd(Seconds), Q10=quantile(Seconds,0.1),Q90=quantile(Seconds,0.9))
## # A tibble: 2 <U+00D7> 6
## Party Mean Median SD Q10 Q90
## <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Dem 967.5714 866 222.7643 772.4 1198.6
## 2 Rep 1103.0000 1113 160.3808 937.4 1276.4
#Words by Party
filter(speech) %>% group_by(Party) %>% summarise(Mean=mean(Words), Median=median(Words),SD=sd(Words), Q10=quantile(Words,0.1),Q90=quantile(Words,0.9))
## # A tibble: 2 <U+00D7> 6
## Party Mean Median SD Q10 Q90
## <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Dem 1763.714 1598.0 447.1203 1311.2 2251.0
## 2 Rep 2043.250 2099.5 400.0570 1547.6 2467.2
#WPM by Party
filter(speech) %>% group_by(Party) %>% summarise(Mean=mean(WPM), Median=median(WPM),SD=sd(WPM), Q10=quantile(WPM,0.1),Q90=quantile(WPM,0.9))
## # A tibble: 2 <U+00D7> 6
## Party Mean Median SD Q10 Q90
## <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Dem 109.8571 112 17.82721 92.2 130.8
## 2 Rep 110.7500 113 12.17433 95.0 122.3
#Height by Party
filter(speech) %>% group_by(Party) %>% summarise(Mean=mean(Height), Median=median(Height),SD=sd(Height), Q10=quantile(Height,0.1),Q90=quantile(Height,0.9))
## # A tibble: 2 <U+00D7> 6
## Party Mean Median SD Q10 Q90
## <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Dem 185.4286 185 4.720775 180.6 189.6
## 2 Rep 183.7500 183 2.187628 182.0 185.9
#Old by Party
filter(speech) %>% group_by(Party) %>% summarise(Mean=mean(Old), Median=median(Old),SD=sd(Old), Q10=quantile(Old,0.1),Q90=quantile(Old,0.9))
## # A tibble: 2 <U+00D7> 6
## Party Mean Median SD Q10 Q90
## <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Dem 50.000 51.0 3.829708 45.8 53.8
## 2 Rep 63.625 62.5 7.150175 55.7 71.2
#Inflation by Party
filter(speech) %>% group_by(Party) %>% summarise(Mean=mean(Inflation), Median=median(Inflation),SD=sd(Inflation), Q10=quantile(Inflation,0.1),Q90=quantile(Inflation,0.9))
## # A tibble: 2 <U+00D7> 6
## Party Mean Median SD Q10 Q90
## <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Dem 2.26000 1.71 1.720388 0.594 4.044
## 2 Rep 4.60625 3.69 3.027955 2.700 6.818
#Unemployment by Party
filter(speech) %>% group_by(Party) %>% summarise(Mean=mean(Unemployment), Median=median(Unemployment),SD=sd(Unemployment), Q10=quantile(Unemployment,0.1),Q90=quantile(Unemployment,0.9))
## # A tibble: 2 <U+00D7> 6
## Party Mean Median SD Q10 Q90
## <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Dem 6.771429 7.3 1.229789 5.14 7.88
## 2 Rep 5.362500 5.1 1.463789 3.96 7.50
#GDP grow by Party
filter(speech) %>% group_by(Party) %>% summarise(Mean=mean(GDPgr), Median=median(GDPgr),SD=sd(GDPgr), Q10=quantile(GDPgr,0.1),Q90=quantile(GDPgr,0.9))
## # A tibble: 2 <U+00D7> 6
## Party Mean Median SD Q10 Q90
## <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Dem 6.504286 6.05 3.992188 2.630 11.172
## 2 Rep 6.688750 6.90 3.002820 2.715 10.095
attach(speech)
colset=c("blue", "red")
boxplot(Words~Party,main="Words by Party", col=colset)
boxplot(Seconds~Party,main="Seconds by Party", col=colset)
boxplot(WPM~Party,main="WPM by Party", col=colset)
boxplot(Old~Party, main="Old by Party", col=colset)
boxplot(Height~Party, main="Height by Party", col=colset)
boxplot(Inflation~Party, main="Inflation % by Party", col=colset)
boxplot(Unemployment~Party, main="Unemployment % by Party", col=colset)
boxplot(GDPgr~Party, main="GDP growth % by Party", col=colset)
cor(speech[,c(2:4,6:10)])
## Words Seconds WPM Old Height
## Words 1.0000000 0.84038215 0.49656322 0.410262167 0.249142123
## Seconds 0.8403821 1.00000000 -0.03967570 0.433937090 0.010108136
## WPM 0.4965632 -0.03967570 1.00000000 0.055490010 0.558650865
## Old 0.4102622 0.43393709 0.05549001 1.000000000 0.006371528
## Height 0.2491421 0.01010814 0.55865086 0.006371528 1.000000000
## Inflation 0.2591305 0.24977710 0.01814621 0.482780014 -0.201685405
## Unemployment 0.2078670 0.16964941 0.02363594 -0.115545547 -0.011496590
## GDPgr -0.2021760 -0.19841851 -0.02838487 0.174926696 -0.120933212
## Inflation Unemployment GDPgr
## Words 0.25913046 0.20786703 -0.20217598
## Seconds 0.24977710 0.16964941 -0.19841851
## WPM 0.01814621 0.02363594 -0.02838487
## Old 0.48278001 -0.11554555 0.17492670
## Height -0.20168541 -0.01149659 -0.12093321
## Inflation 1.00000000 0.09468571 0.43749992
## Unemployment 0.09468571 1.00000000 -0.02909618
## GDPgr 0.43749992 -0.02909618 1.00000000
t.test(Words~Party)
##
## Welch Two Sample t-test
##
## data: Words by Party
## t = -1.2685, df = 12.213, p-value = 0.2283
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -758.7637 199.6923
## sample estimates:
## mean in group Dem mean in group Rep
## 1763.714 2043.250
t.test(Seconds~Party)
##
## Welch Two Sample t-test
##
## data: Seconds by Party
## t = -1.3341, df = 10.777, p-value = 0.2097
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -359.41795 88.56081
## sample estimates:
## mean in group Dem mean in group Rep
## 967.5714 1103.0000
t.test(WPM~Party)
##
## Welch Two Sample t-test
##
## data: WPM by Party
## t = -0.11167, df = 10.41, p-value = 0.9132
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.61329 16.82758
## sample estimates:
## mean in group Dem mean in group Rep
## 109.8571 110.7500
t.test(Old~Party)
##
## Welch Two Sample t-test
##
## data: Old by Party
## t = -4.6772, df = 10.967, p-value = 0.00068
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -20.038926 -7.211074
## sample estimates:
## mean in group Dem mean in group Rep
## 50.000 63.625
t.test(Height~Party)
##
## Welch Two Sample t-test
##
## data: Height by Party
## t = 0.86315, df = 8.2179, p-value = 0.4125
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.785314 6.142457
## sample estimates:
## mean in group Dem mean in group Rep
## 185.4286 183.7500
t.test(Inflation~Party)
##
## Welch Two Sample t-test
##
## data: Inflation by Party
## t = -1.8732, df = 11.32, p-value = 0.08708
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5.0936108 0.4011108
## sample estimates:
## mean in group Dem mean in group Rep
## 2.26000 4.60625
t.test(Unemployment~Party)
##
## Welch Two Sample t-test
##
## data: Unemployment by Party
## t = 2.0254, df = 12.988, p-value = 0.06388
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.09400938 2.91186652
## sample estimates:
## mean in group Dem mean in group Rep
## 6.771429 5.362500
t.test(GDPgr~Party)
##
## Welch Two Sample t-test
##
## data: GDPgr by Party
## t = -0.099982, df = 11.083, p-value = 0.9221
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.241511 3.872583
## sample estimates:
## mean in group Dem mean in group Rep
## 6.504286 6.688750
set.seed(12345)
res<-PCA(speech,quali.sup =5, quanti.sup = 1, ncp = 5)
res$var$coord
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## Words 0.94151251 -0.20406885 -0.1300651 0.009929855 -0.16632064
## Seconds 0.81838708 0.08246410 -0.4629521 -0.123140342 0.07717574
## WPM 0.42363123 -0.57016080 0.5646760 0.139983559 -0.34987874
## Old 0.64446651 0.41999452 0.1754459 -0.345258106 0.28448891
## Height 0.26244745 -0.69022891 0.4459318 -0.003708254 0.43679959
## Inflation 0.47561795 0.66945373 0.2796611 0.176354852 -0.18406167
## Unemployment 0.21008583 -0.04389426 -0.3198235 0.885599834 0.18080476
## GDPgr -0.05477596 0.59373663 0.6171776 0.261483740 0.12357795
res$ind$contrib
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## Trump 2.3496014 1.39010729 3.5494127 22.295920911 1.924570e+01
## Obama2 0.2021360 4.13297162 6.3604455 7.129746579 2.609333e-01
## Obama1 1.1729184 25.69236792 11.3974220 4.315359858 9.986921e+00
## Bush-2 0.5201044 1.92389068 6.3381488 4.120109603 7.699236e-03
## Bush-1 4.2467539 0.07957458 0.2849776 9.804371212 1.404437e+01
## Clinton2 1.0172532 0.96204691 3.0917220 1.988001246 5.920746e+00
## Clinton1 3.4152103 5.53848152 0.4791077 8.880348667 7.812825e-01
## Bush 8.1794799 0.17896445 0.6726308 2.953573763 3.105363e+00
## Reagan2 19.8439092 0.12864491 0.2434394 0.358717769 5.128672e+00
## Reagan1 24.8629775 12.53969644 7.3531308 9.518984957 1.527206e+00
## Carter 13.1913734 31.09914389 1.2932203 13.266580768 4.405621e-04
## Nixon2 0.5378265 4.32771246 4.5336978 0.819897135 1.480411e+00
## Nixon1 0.1177470 0.12952743 3.0225470 10.452722730 3.281057e+01
## Johnson 4.4803157 11.86624356 50.4365985 0.005399632 5.554535e+00
## Kennedy 15.8623932 0.01062634 0.9434990 4.090265170 1.451563e-01
sort(res$ind$dist)
## Bush-2 Nixon2 Obama2 Bush Clinton2 Nixon1 Clinton1 Bush-1
## 1.794310 1.823314 1.917294 2.023706 2.242873 2.252837 2.273451 2.335375
## Kennedy Trump Reagan2 Obama1 Johnson Carter Reagan1
## 2.632792 2.908462 3.066472 3.392481 3.959544 3.988866 4.130896
resF=FAMD(speech[,-1],ncp = 3)
resF
## *The results are available in the following objects:
##
## name description
## 1 "$eig" "eigenvalues and inertia"
## 2 "$var" "Results for the variables"
## 3 "$ind" "results for the individuals"
## 4 "$quali.var" "Results for the qualitative variables"
## 5 "$quanti.var" "Results for the quantitative variables"
3D view
set.seed(12345)
kmeans(speech[,c(2:4,6:10)],centers = 5,iter.max = 3000,nstart = 30)
## K-means clustering with 5 clusters of sizes 3, 1, 4, 3, 4
##
## Cluster means:
## Words Seconds WPM Old Height Inflation Unemployment
## 1 1346.333 895.6667 90.0000 55.66667 181.3333 3.000000 6.266667
## 2 1803.000 968.0000 112.0000 60.00000 182.0000 3.650000 4.900000
## 3 2425.750 1206.5000 120.7500 64.25000 185.7500 5.015000 7.050000
## 4 1565.667 792.3333 119.6667 52.33333 187.3333 2.653333 5.466667
## 5 2112.500 1184.7500 107.7500 54.50000 184.2500 3.000000 5.500000
## GDPgr
## 1 7.433333
## 2 11.040000
## 3 5.912500
## 4 5.963333
## 5 6.040000
##
## Clustering vector:
## Trump Obama2 Obama1 Bush-2 Bush-1 Clinton2 Clinton1 Bush
## 1 5 3 5 4 5 4 3
## Reagan2 Reagan1 Carter Nixon2 Nixon1 Johnson Kennedy
## 3 3 1 2 5 4 1
##
## Within cluster sum of squares by cluster:
## [1] 35180.18 0.00 47141.69 28102.58 54951.70
## (between_SS / total_SS = 94.8 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss"
## [5] "tot.withinss" "betweenss" "size" "iter"
## [9] "ifault"
fit.sp<-lm(Words~Height + Seconds:Party,data = speech)
anova(fit.sp,test="Chsqr")
## Analysis of Variance Table
##
## Response: Words
## Df Sum Sq Mean Sq F value Pr(>F)
## Height 1 162103 162103 3.0584 0.1081307
## Seconds:Party 2 1866416 933208 17.6070 0.0003727 ***
## Residuals 11 583023 53002
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(fit.sp)
##
## Call:
## lm(formula = Words ~ Height + Seconds:Party, data = speech)
##
## Residuals:
## Min 1Q Median 3Q Max
## -387.59 -95.24 29.94 107.84 389.08
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5871.7701 3261.8991 -1.800 0.099298 .
## Height 32.2908 17.7173 1.823 0.095641 .
## Seconds:PartyDem 1.7014 0.3530 4.820 0.000536 ***
## Seconds:PartyRep 1.7978 0.3145 5.716 0.000135 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 230.2 on 11 degrees of freedom
## Multiple R-squared: 0.7768, Adjusted R-squared: 0.7159
## F-statistic: 12.76 on 3 and 11 DF, p-value: 0.0006658
shapiro.test(fit.sp$residuals)
##
## Shapiro-Wilk normality test
##
## data: fit.sp$residuals
## W = 0.99142, p-value = 0.9998
par(mfrow=c(2,2))
plot(fit.sp)
leveragePlots(fit.sp)
par(mfrow=c(1,1))
fit.sp<-lm(Words~Height + Seconds,data = speech)
s3d<-scatterplot3d(Height, Seconds, Words, color = ifelse(speech$Party=="Dem","blue","red"),pch = 8,main = "Inauguration speech 3D plot",xlab = "Height, cm", ylab = "Seconds", zlab = "Words")
s3d$plane3d(fit.sp,draw_lines = T,draw_polygon = T)
fit.sp_2<-lm(Height~Unemployment+Words+Party+Old,data = speech)
anova(fit.sp_2,test="Chsqr")
## Analysis of Variance Table
##
## Response: Height
## Df Sum Sq Mean Sq F value Pr(>F)
## Unemployment 1 0.023 0.023 0.0025 0.96106
## Words 1 11.753 11.753 1.2538 0.28900
## Party 1 41.374 41.374 4.4139 0.06198 .
## Old 1 30.846 30.846 3.2907 0.09974 .
## Residuals 10 93.737 9.374
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(fit.sp_2)
##
## Call:
## lm(formula = Height ~ Unemployment + Words + Party + Old, data = speech)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7996 -0.7749 -0.3657 1.0318 5.0607
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 173.757726 7.552651 23.006 5.44e-10 ***
## Unemployment -1.633386 0.771201 -2.118 0.0602 .
## Words 0.004376 0.002252 1.943 0.0807 .
## PartyRep -9.294306 3.412363 -2.724 0.0214 *
## Old 0.300276 0.165529 1.814 0.0997 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.062 on 10 degrees of freedom
## Multiple R-squared: 0.4726, Adjusted R-squared: 0.2616
## F-statistic: 2.24 on 4 and 10 DF, p-value: 0.1372
shapiro.test(fit.sp_2$residuals)
##
## Shapiro-Wilk normality test
##
## data: fit.sp_2$residuals
## W = 0.96312, p-value = 0.7464
par(mfrow=c(2,2))
plot(fit.sp_2)
leveragePlots(fit.sp_2)
par(mfrow=c(1,1))
fit.sp_2<-lm(Height~Unemployment+Words,data = speech)
s3d2<-scatterplot3d(Unemployment, Words, Height, color = ifelse(speech$Party=="Dem","blue","red"),pch = 8,main = "Inauguration speech 3D plot",xlab = "Unemployment, %", ylab = "Words", zlab = "Height, cm")
s3d2$plane3d(fit.sp_2,draw_lines = T,draw_polygon = T)
library(e1071)
model <- naiveBayes(Party ~ Inflation + Unemployment, data = speech)
pred_party<-predict(model,speech)
table(pred_party,speech$Party)
##
## pred_party Dem Rep
## Dem 5 1
## Rep 2 7
prd<-predict(model, speech[1:15,], type = "raw")
prd<-as.data.frame(prd)
PartyNB<-ifelse(prd$Dem>0.5,"Dem","Rep")
cbind(prd,PartyNB,speech[,c(5,8,9)])
## Dem Rep PartyNB Party Inflation Unemployment
## Trump 0.4096401778 0.5903598 Rep Rep 2.07 4.7
## Obama2 0.8957235090 0.1042765 Dem Dem 1.59 8.0
## Obama1 0.8748612001 0.1251388 Dem Dem 0.03 7.8
## Bush-2 0.4879916857 0.5120083 Rep Rep 2.97 5.3
## Bush-1 0.1696839427 0.8303161 Rep Rep 3.73 4.2
## Clinton2 0.4805379399 0.5194621 Rep Dem 3.04 5.3
## Clinton1 0.7891007461 0.2108993 Dem Dem 3.26 7.3
## Bush 0.2696398318 0.7303602 Rep Rep 4.67 5.4
## Reagan2 0.7837005443 0.2162995 Dem Rep 3.53 7.5
## Reagan1 0.0000146743 0.9999853 Rep Rep 11.83 7.5
## Carter 0.5092684858 0.4907315 Dem Dem 5.22 7.5
## Nixon2 0.3146474276 0.6853526 Rep Rep 3.65 4.9
## Nixon1 0.0463343819 0.9536656 Rep Rep 4.40 3.4
## Johnson 0.4845045930 0.5154954 Rep Dem 0.97 4.9
## Kennedy 0.7957566867 0.2042433 Dem Dem 1.71 6.6
sum(ifelse(PartyNB==speech$Party,1,0))/length(speech$Party)
## [1] 0.8
plot(Inflation,Unemployment,col=colset,pch=8)
model.svm<-svm(Party~Inflation+Unemployment,data = speech,kernel="linear",type="C-classification",scale=T,probability=T)
pred_svm<-predict(model.svm, speech, probability=T)
cbind(attr(pred_svm,which = "probabilities"),speech[,c(5,8,9)])
## Rep Dem Party Inflation Unemployment
## Trump 0.5758109 0.4241891 Rep 2.07 4.7
## Obama2 0.3595821 0.6404179 Dem 1.59 8.0
## Obama1 0.3376480 0.6623520 Dem 0.03 7.8
## Bush-2 0.5619694 0.4380306 Rep 2.97 5.3
## Bush-1 0.6438054 0.3561946 Rep 3.73 4.2
## Clinton2 0.5637247 0.4362753 Dem 3.04 5.3
## Clinton1 0.4456091 0.5543909 Dem 3.26 7.3
## Bush 0.5979102 0.4020898 Rep 4.67 5.4
## Reagan2 0.4401330 0.5598670 Rep 3.53 7.5
## Reagan1 0.6459998 0.3540002 Rep 11.83 7.5
## Carter 0.4833000 0.5167000 Dem 5.22 7.5
## Nixon2 0.6025424 0.3974576 Rep 3.65 4.9
## Nixon1 0.7044984 0.2955016 Rep 4.40 3.4
## Johnson 0.5359028 0.4640972 Dem 0.97 4.9
## Kennedy 0.4493649 0.5506351 Dem 1.71 6.6
table(pred_svm,Party)
## Party
## pred_svm Dem Rep
## Dem 5 1
## Rep 2 7
sum(ifelse(pred_svm==speech$Party,1,0))/length(speech$Party)
## [1] 0.8
plot(Party~Inflation,speech,col=colset)
plot(Party~Unemployment,speech,col=colset)
plot(Party~Old,speech,col=colset)
plot(model.svm,speech,Inflation~Unemployment,col=colset)
scatterplot3d(Unemployment, Inflation, Old, color = ifelse(Party=="Dem","blue","red"),pch = 8,main = "Parties domain 3D plot",xlab = "Unemployment, %", ylab = "Inflation, %", zlab = "Old",type = "h")
3D view
Here we use additional data for elected US presidents (1945-2017) without Words, Seconds and WPM values. We want to estimate probability of the Party as a function of the years Old.
uspres<-read.table(file="uspres.txt")
uspres$Pind<-ifelse(uspres$Party=="Rep",1,0)
uspres
## Year Party Old Height Infl Unempl GDPgr Pind
## Trump 2017 Rep 70 184 2.07 4.7 2.94 1
## Obama2 2013 Dem 52 185 1.59 8.0 4.31 0
## Obama1 2009 Dem 48 185 0.03 7.8 0.11 0
## Bush-2 2005 Rep 59 182 2.97 5.3 6.52 1
## Bush-1 2001 Rep 55 182 3.73 4.2 2.19 1
## Clinton2 1997 Dem 51 188 3.04 5.3 6.05 0
## Clinton1 1993 Dem 47 188 3.26 7.3 5.00 0
## Bush 1989 Rep 65 188 4.67 5.4 6.48 1
## Reagan2 1985 Rep 74 185 3.53 7.5 7.37 1
## Reagan1 1981 Rep 70 185 11.83 7.5 9.69 1
## Carter 1977 Dem 53 177 5.22 7.5 11.88 0
## Ford 1974 Rep 61 183 9.39 5.1 8.38 1
## Nixon2 1973 Rep 60 182 3.65 4.9 11.04 1
## Nixon1 1969 Rep 56 182 4.40 3.4 7.28 1
## Johnson 1965 Dem 55 192 0.97 4.9 10.70 0
## Kennedy 1961 Dem 44 183 1.71 6.6 7.48 0
## Eisenhower2 1957 Rep 67 179 2.99 4.2 3.12 1
## Eisenhower1 1953 Rep 63 179 0.38 2.9 1.39 1
## Truman2 1949 Dem 65 175 1.27 4.3 -3.46 0
## Truman1 1945 Dem 61 175 2.30 3.0 76.35 0
fit.lg<-glm(Pind~Old,data = uspres,family = binomial(link = "logit"))
summary(fit.lg)
##
## Call:
## glm(formula = Pind ~ Old, family = binomial(link = "logit"),
## data = uspres)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0746 -0.5972 0.2077 0.6781 1.4569
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -15.2944 6.6288 -2.307 0.0210 *
## Old 0.2665 0.1143 2.332 0.0197 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 27.526 on 19 degrees of freedom
## Residual deviance: 16.695 on 18 degrees of freedom
## AIC: 20.695
##
## Number of Fisher Scoring iterations: 5
anova(fit.lg)
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: Pind
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev
## NULL 19 27.526
## Old 1 10.831 18 16.695
logi.hist.plot(uspres$Old,uspres$Pind,xlabel = "Old",boxp = F,mainlabel = "US president Republican versus Democrat")
points(uspres$Old,fitted(fit.lg),pch=8)
grid(col = "black")
So what’s this all math about? Here are some results in brief (more results with machine learning will be later):
The mean Republican’s presidents speech is more longer in seconds as compared to the Democrat’s $T_{Rep}=1103, $ \(T_{Dem}=967.6\) while the overall mean president inauguration speech duration \(T_{all}=1040\). D.Trump spoke for a long as 980 s that is closer to the Democrats than to the Republicans. So he is a real business man not a politician one so far. Remember 1997 Bill Clinton’s speech who made the record of 1315 s being not a Republican! And what about number of Words?
The mean Republican’s presidents speech is more longer in words as compared to the Democrat’s \(W_{Rep}=2043\), \(W_{Dem}=1764\) while the overall mean president inauguration speech duration \(W_{all}=1912\). D.Trump told the crowd 1444 words that is closer to the Democrats than to the Republicans. We don’t discuss here the contents of his speech. Remember 1985 Ronald Reagan’s speech who made the record of 2561 words being a Republican! And what about Words per minute?
The mean Republican’s presidents speech tempo WPM as compared to the Democrat’s \(WPM_{Rep}=110.75\), \(WPM_{Dem}=109.85\). D.Trump speech tempo is 88 that is one of the lowest. Remember 1965 JFK’s speech who made the record of WPM=135 and time of speech T=671 s being a Democrat! And what about years Old?
The mean Republican’s president as compared to the Democrat’s \(OLD_{Rep}=63.6\), \(OLD_{Dem}=50\). D.Trump is 70 that is one of the oldest. Remember 1985 when Ronald Reagan had been elected being 74 and 1961 when JFK had been elected being 44! And what about Height?
The mean Republican’s president Height compared to the Democrat’s \(H_{Rep}=183.75\), \(H_{Dem}=185.42\). D.Trump is 184 cm that is he stands in the middle. Remember 1965 Lyndon Johnson 192 cm and 1977 Jimmy Carter, both Democrats! And what about Inflation %?
The mean Republican’s president Inflation rate in % at election compared to the Democrat’s \(Infl_{Rep}=4.61\), \(Infl_{Dem}=2.26\). D.Trump has got 2.07% Inflation from Democrat B.Obama. Remember 1981 Ronald Reagan 11.83% and 2009 B.Obama 0.03%! And what about Unemployment %?
The mean Republican’s president Unemployment in % at election compared to the Democrat’s \(Unempl_{Rep}=5.36\), \(Unempl_{Dem}=6.77\). D.Trump has got 4.7% Unemployment from Democrat B.Obama. Remember 2013 B.Obama 8.0% and 1969 R.Nixon 3.4%! And what about GDP growth rate in %?
The mean Republican’s president GDP growth in % at election compared to the Democrat’s \(GDPgr_{Rep}=6.68\), \(GDPgr_{Dem}=6.50\). D.Trump has got 2.94% GDP growth from Democrat B.Obama. Remember 1977 Jimmy Carter 11.88% and 2009 B.Obama 0.11%!