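# Load the HuffPost Pollster CSV. stringsAsFactors = TRUE keeps text columns as factors, as in the output below
# (since R 4.0.0, read.csv() reads text columns as character by default)
ds1 <- read.csv(url("http://elections.huffingtonpost.com/pollster/2016-general-election-trump-vs-clinton.csv"), stringsAsFactors = TRUE)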
names(ds1)
## [1] "Pollster" "Start.Date"
## [3] "End.Date" "Entry.Date.Time..ET."
## [5] "Number.of.Observations" "Population"
## [7] "Mode" "Trump"
## [9] "Clinton" "Other"
## [11] "Undecided" "Pollster.URL"
## [13] "Source.URL" "Partisan"
## [15] "Affiliation" "Question.Text"
## [17] "Question.Iteration"
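The dots in names such as Entry.Date.Time..ET. come from read.csv(). By default (check.names = TRUE) it passes every header through make.names(), which replaces each character that is not valid in an R name with ".". The sketch below shows this behaviour. The raw headers used here ("Number of Observations", "Entry Date/Time (ET)", "Pollster URL") are assumptions about what the CSV file contains, and the two-column CSV string is a toy example.

```r
# Assumed raw headers (hypothetical; the original CSV header line is not shown here)
raw_headers <- c("Number of Observations", "Entry Date/Time (ET)", "Pollster URL")

# make.names() turns each invalid character (space, "/", "(", ")") into "."
clean_headers <- make.names(raw_headers)
print(clean_headers)

# Toy CSV read from a string: default check.names = TRUE versus check.names = FALSE
csv_text <- "Number of Observations,Pollster URL\n1500,http://example.com"
with_dots <- read.csv(text = csv_text)
as_is <- read.csv(text = csv_text, check.names = FALSE)
print(names(with_dots))
print(names(as_is))
```

Passing check.names = FALSE keeps the original headers. Those headers then have to be quoted with backticks, for example ds1$`Number of Observations`.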
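# Print "column : class" for every column of data frame x
tipos <- function(x){
  for (i in seq_len(ncol(x))) {
    # class of the first value of column i (a factor column reports "factor")
    pal <- class(x[1, i])
    print(paste(names(x)[i], pal, sep = " : "))
  }
}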
tipos(ds1)
## [1] "Pollster : factor"
## [1] "Start.Date : factor"
## [1] "End.Date : factor"
## [1] "Entry.Date.Time..ET. : factor"
## [1] "Number.of.Observations : integer"
## [1] "Population : factor"
## [1] "Mode : factor"
## [1] "Trump : numeric"
## [1] "Clinton : numeric"
## [1] "Other : numeric"
## [1] "Undecided : numeric"
## [1] "Pollster.URL : factor"
## [1] "Source.URL : factor"
## [1] "Partisan : factor"
## [1] "Affiliation : factor"
## [1] "Question.Text : factor"
## [1] "Question.Iteration : integer"
sapply(ds1[1,], class)
## Pollster Start.Date End.Date
## "factor" "factor" "factor"
## Entry.Date.Time..ET. Number.of.Observations Population
## "factor" "integer" "factor"
## Mode Trump Clinton
## "factor" "numeric" "numeric"
## Other Undecided Pollster.URL
## "numeric" "numeric" "factor"
## Source.URL Partisan Affiliation
## "factor" "factor" "factor"
## Question.Text Question.Iteration
## "factor" "integer"
dim(ds1)
## [1] 1310 17
The target group of these polls can be split in two: the pollsters on one side and the respondents on the other. The pollsters (the Pollster variable) are television networks, newspapers, universities and other polling organizations. The respondents (the Population variable) are adults, mostly registered or likely voters in the November 2016 presidential election, in some cases broken down by party.
# Class of each column, then keep only the numeric ones
a <- sapply(ds1[1,], class)
numericas <- a[a == "numeric"]
numericas
## Trump Clinton Other Undecided
## "numeric" "numeric" "numeric" "numeric"
Statistical description of these variables:
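A minimal sketch of such a description follows. It assumes that count, missing values, mean, standard deviation, median, minimum and maximum are the statistics of interest. The helper describe_numeric() is hypothetical. So that the block runs without downloading the file, it uses only the first ten values of each numeric column as printed by str(ds1). Those rows mix full samples with party subgroups, so the resulting figures are illustrative only.

```r
# First ten values of each numeric column, copied from the str(ds1) output
sample_num <- data.frame(
  Trump     = c(42, 13, 78, 39, 40, 8, 85, 36, 43, 8),
  Clinton   = c(41, 77, 10, 32, 43, 84, 5, 35, 48, 88),
  Other     = c(4, 3, 4, 8, 5, 2, 2, 10, 8, 4),
  Undecided = c(3, 3, 4, 4, 7, 5, 4, 11, 1, 0)
)

# Hypothetical helper: one row per column with n, NA count, mean, sd, median, min, max
describe_numeric <- function(df) {
  stats <- sapply(df, function(v) c(
    n      = sum(!is.na(v)),
    n_na   = sum(is.na(v)),
    mean   = mean(v, na.rm = TRUE),
    sd     = sd(v, na.rm = TRUE),
    median = median(v, na.rm = TRUE),
    min    = min(v, na.rm = TRUE),
    max    = max(v, na.rm = TRUE)
  ))
  # sapply() returns statistics in rows; transpose so each variable is a row
  round(t(stats), 2)
}

res <- describe_numeric(sample_num)
print(res)
```

On the full data set the same call would be describe_numeric(ds1[names(numericas)]). The built-in summary(ds1[names(numericas)]) also works and adds quartiles.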
# Keep only the factor (categorical) columns
a <- sapply(ds1[1,], class)
f <- a[a == "factor"]
f
## Pollster Start.Date End.Date
## "factor" "factor" "factor"
## Entry.Date.Time..ET. Population Mode
## "factor" "factor" "factor"
## Pollster.URL Source.URL Partisan
## "factor" "factor" "factor"
## Affiliation Question.Text
## "factor" "factor"
str(ds1)
## 'data.frame': 1310 obs. of 17 variables:
## $ Pollster : Factor w/ 46 levels "ABC/Post","AP-GfK (web)",..: 38 38 38 38 45 45 45 45 45 45 ...
## $ Start.Date : Factor w/ 236 levels "2015-05-19","2015-06-20",..: 236 236 236 236 235 235 235 235 235 235 ...
## $ End.Date : Factor w/ 215 levels "2015-05-26","2015-06-22",..: 215 215 215 215 214 214 214 214 214 214 ...
## $ Entry.Date.Time..ET. : Factor w/ 359 levels "2015-05-28T21:52:59Z",..: 356 356 356 356 351 351 351 351 351 351 ...
## $ Number.of.Observations: int 1500 NA NA NA 1055 415 255 385 1055 415 ...
## $ Population : Factor w/ 9 levels "Adults","Likely Voters",..: 2 3 4 5 6 7 8 9 6 7 ...
## $ Mode : Factor w/ 5 levels "Automated Phone",..: 2 2 2 2 3 3 3 3 3 3 ...
## $ Trump : num 42 13 78 39 40 8 85 36 43 8 ...
## $ Clinton : num 41 77 10 32 43 84 5 35 48 88 ...
## $ Other : num 4 3 4 8 5 2 2 10 8 4 ...
## $ Undecided : num 3 3 4 4 7 5 4 11 1 0 ...
## $ Pollster.URL : Factor w/ 359 levels "http://elections.huffingtonpost.com/pollster/polls/abc-post-22720",..: 311 311 311 311 355 355 355 355 355 355 ...
## $ Source.URL : Factor w/ 331 levels " https://today.yougov.com/news/2016/06/29/yougoveconomist-poll-june-24-27-2016/",..: 223 223 223 223 303 303 303 303 303 303 ...
## $ Partisan : Factor w/ 3 levels "Nonpartisan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Affiliation : Factor w/ 4 levels "Dem","None","Other",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ Question.Text : Factor w/ 61 levels "","And if the election for President was held today and the candidates were Democrat Hillary Clinton,\nRepublican Donald Trump, Li"| __truncated__,..: 25 25 25 25 14 14 14 14 1 1 ...
## $ Question.Iteration : int 1 1 1 1 1 1 1 1 2 2 ...
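The str() output shows that Start.Date and End.Date are stored as factors, even though their values look like ISO dates. The sketch below converts them to Date through as.character(). Calling as.integer() on a factor returns level codes, not dates. The sample values are the first two levels of each column as printed above. They are paired here only for illustration and do not necessarily belong to the same poll.

```r
# First two levels of Start.Date and End.Date from the str() output (pairing is illustrative)
start <- factor(c("2015-05-19", "2015-06-20"))
end   <- factor(c("2015-05-26", "2015-06-22"))

# Pitfall: as.integer() on a factor gives the level codes (1, 2), not dates
codes <- as.integer(start)

# Convert via character, assuming the year-month-day format seen in the data
start_date <- as.Date(as.character(start), format = "%Y-%m-%d")
end_date   <- as.Date(as.character(end),   format = "%Y-%m-%d")

# Fieldwork length in days: subtracting two Dates gives a difftime
field_days <- as.numeric(end_date - start_date, units = "days")
print(field_days)
```

Applied to the full data, this would look like ds1$Start.Date <- as.Date(as.character(ds1$Start.Date), format = "%Y-%m-%d"), and likewise for End.Date.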
# Frequency table: absolute count (frec) and percentage (porcentaje) of each level of x
tabla <- function(x) {
  n <- table(x)
  cbind(frec = n,
        porcentaje = round(prop.table(n) * 100, 2))
}
tabla(ds1$Pollster)
## frec porcentaje
## ABC/Post 14 1.07
## AP-GfK (web) 3 0.23
## ARG 17 1.30
## Bloomberg/Selzer 7 0.53
## CBS 28 2.14
## CBS/Times 26 1.98
## CNBC 1 0.08
## CNN 92 7.02
## Echelon Insights (R) 2 0.15
## Emerson College Polling Society 4 0.31
## FOX 78 5.95
## Fairleigh Dickinson/SSRS 5 0.38
## Franklin Pierce/RKM/Boston Herald 12 0.92
## GQR (D-Democracy Corps) 3 0.23
## GQR (D-Democracy Corps/Women's Voices Women Vote) 1 0.08
## GWU/Battleground 5 0.38
## Gravis Marketing/OANN 10 0.76
## IBD/TIPP 48 3.66
## ICITIZEN 6 0.46
## Ipsos/Reuters 160 12.21
## MSNBC/Telemundo/Marist 8 0.61
## McClatchy/Marist 36 2.75
## McLaughlin (R) 8 0.61
## Monmouth University 21 1.60
## Morning Consult 274 20.92
## NBC/SurveyMonkey 44 3.36
## NBC/WSJ 16 1.22
## Normington, Petts & Associates (D-End Citizens United) 2 0.15
## PPP (D) 68 5.19
## PSRAI 1 0.08
## Penn Schoen Berland 12 0.92
## Pew 4 0.31
## Public Religion Research Institute 1 0.08
## Public Religion Research Institute/The Atlantic 3 0.23
## Quinnipiac 80 6.11
## RABA Research 1 0.08
## Raba Research 2 0.15
## Rasmussen 53 4.05
## Saint Leo University 3 0.23
## Schoen (D) 1 0.08
## Suffolk/USA Today 32 2.44
## SurveyUSA 1 0.08
## UPI/CVOTER 9 0.69
## University of Delaware/PSRAI 4 0.31
## YouGov/Economist 100 7.63
## Zogby (Internet) 4 0.31
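The Pollster table lists "RABA Research" (1 poll) and "Raba Research" (2 polls) as separate levels. They look like one pollster spelled with different capitalization, although that is an assumption worth checking against Pollster.URL. The sketch below merges levels that differ only in case and keeps the first spelling found. It then re-tabulates with the same tabla() function and sorts by frequency. The toy factor and its counts are hypothetical.

```r
# Same frequency-table function as above
tabla <- function(x) {
  n <- table(x)
  cbind(frec = n, porcentaje = round(prop.table(n) * 100, 2))
}

# Toy factor (hypothetical counts) with two spellings taken from the table above
pollster <- factor(c("RABA Research", "Raba Research", "Raba Research", "CNN"))

# Case-insensitive key; every value gets the first spelling seen for its key
key <- tolower(as.character(pollster))
label <- as.character(pollster)[match(key, key)]
pollster_clean <- factor(label)

# Tabulate and sort from most to least frequent
t_clean <- tabla(pollster_clean)
t_clean <- t_clean[order(t_clean[, "frec"], decreasing = TRUE), , drop = FALSE]
print(t_clean)
```

With the real data, the label that survives depends on which spelling appears first in ds1$Pollster.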
tabla(ds1$Mode)
## frec porcentaje
## Automated Phone 14 1.07
## IVR/Online 121 9.24
## Internet 628 47.94
## Live Phone 546 41.68
## Mixed 1 0.08
tabla(ds1$Population)
## frec porcentaje
## Adults 3 0.23
## Likely Voters 178 13.59
## Likely Voters - Democrat 98 7.48
## Likely Voters - Republican 98 7.48
## Likely Voters - independent 99 7.56
## Registered Voters 292 22.29
## Registered Voters - Democrat 181 13.82
## Registered Voters - Republican 181 13.82
## Registered Voters - independent 180 13.74
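The Population levels combine a base population (Adults, Likely Voters, Registered Voters) with an optional party subgroup after " - ". The near-equal subgroup counts (98, 98 and 99 rows next to 178 Likely Voters rows) suggest that many polls report a total plus party subsamples. If so, the rows are not all independent polls. A minimal sketch that splits the label into the two parts, using a few of the levels listed above:

```r
# A few Population levels from the table above
pop <- c("Likely Voters", "Likely Voters - Democrat",
         "Registered Voters - independent", "Adults")

# Base population: everything before " - " (the whole label if there is none)
base <- sub(" - .*$", "", pop)

# Subgroup: the text after " - ", or "All" when there is no party breakdown
subgroup <- ifelse(grepl(" - ", pop, fixed = TRUE), sub("^.* - ", "", pop), "All")

print(table(base, subgroup))
```

On the full data, subset(ds1, !grepl(" - ", Population, fixed = TRUE)) would keep only the full-sample rows. This assumes the " - " separator is used consistently, as it is in the levels shown.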
Most important categorical variables: