2016 General Election (Trump vs Clinton)

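# stringsAsFactors = TRUE keeps text columns as factors on R >= 4.0, matching the output below
ds1 <- read.csv(url("http://elections.huffingtonpost.com/pollster/2016-general-election-trump-vs-clinton.csv"), stringsAsFactors = TRUE)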

Variables:

Names:

names(ds1)

##  [1] "Pollster"               "Start.Date"            
##  [3] "End.Date"               "Entry.Date.Time..ET."  
##  [5] "Number.of.Observations" "Population"            
##  [7] "Mode"                   "Trump"                 
##  [9] "Clinton"                "Other"                 
## [11] "Undecided"              "Pollster.URL"          
## [13] "Source.URL"             "Partisan"              
## [15] "Affiliation"            "Question.Text"         
## [17] "Question.Iteration"

Descriptions:

Pollster : organization that conducted the poll
Start Date : date the poll started
End Date : date the poll ended
Entry Date/Time (ET) : date and time the poll was entered into the system
Number of Observations : number of respondents
Population : type of respondents surveyed (for example, likely voters or registered voters)
Mode : method by which the poll was conducted
Trump : percentage of respondents who would vote for Trump
Clinton : percentage of respondents who would vote for Clinton
Other : percentage of respondents who would prefer a different candidate
Undecided : percentage of respondents who are undecided about their vote
Pollster URL : web page where the poll is located
Source URL : web page from which the poll was obtained
Partisan : partisanship category of the poll
Affiliation : party affiliation category associated with the poll (levels include Dem, None, and Other)
Question Text : text of the question asked
Question Iteration : iteration number of the question

Data types:

First approach:

# Print each column name followed by the class of that column
tipos <- function(x) {
    for (i in seq_along(x)) {
        print(paste(names(x)[i], class(x[[i]]), sep = " : "))
    }
}

tipos(ds1)

## [1] "Pollster : factor"
## [1] "Start.Date : factor"
## [1] "End.Date : factor"
## [1] "Entry.Date.Time..ET. : factor"
## [1] "Number.of.Observations : integer"
## [1] "Population : factor"
## [1] "Mode : factor"
## [1] "Trump : numeric"
## [1] "Clinton : numeric"
## [1] "Other : numeric"
## [1] "Undecided : numeric"
## [1] "Pollster.URL : factor"
## [1] "Source.URL : factor"
## [1] "Partisan : factor"
## [1] "Affiliation : factor"
## [1] "Question.Text : factor"
## [1] "Question.Iteration : integer"

Second approach:

sapply(ds1, class)

##               Pollster             Start.Date               End.Date 
##               "factor"               "factor"               "factor" 
##   Entry.Date.Time..ET. Number.of.Observations             Population 
##               "factor"              "integer"               "factor" 
##                   Mode                  Trump                Clinton 
##               "factor"              "numeric"              "numeric" 
##                  Other              Undecided           Pollster.URL 
##              "numeric"              "numeric"               "factor" 
##             Source.URL               Partisan            Affiliation 
##               "factor"               "factor"               "factor" 
##          Question.Text     Question.Iteration 
##               "factor"              "integer"
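Both approaches report the date columns as factor, so date arithmetic needs an explicit conversion first. The following is a minimal sketch, not the author's code: it uses three values copied from the factor levels listed by str(ds1) instead of the full ds1, and it assumes the trailing "Z" in the timestamp means UTC.

```r
# Sample values copied from the factor levels of Start.Date, End.Date and
# Entry.Date.Time..ET.; illustrative only, not necessarily from the same poll
start_f <- factor("2015-05-19")
end_f   <- factor("2015-05-26")
entry_f <- factor("2015-05-28T21:52:59Z")

# Go through as.character() so the label is parsed, not the integer level code
start_d <- as.Date(as.character(start_f))
end_d   <- as.Date(as.character(end_f))

# Assumption: the trailing "Z" marks UTC
entry_t <- as.POSIXct(as.character(entry_f),
                      format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# Number of days between the two dates
days_between <- as.numeric(end_d - start_d)
days_between
```

On the real data the same conversion would be applied to whole columns, for example as.Date(as.character(ds1$Start.Date)).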

Dimensions:

dim(ds1)

## [1] 1310   17

Rows: 1310

Columns: 17

Target group:

The target group of this dataset can be split in two: the pollsters on one side and the respondents on the other. Television networks, newspapers, and other organizations make up the pollsters. The respondents are citizens who are registered or likely to vote in the November 2016 presidential election.

Numeric variables:

a <- sapply(ds1, class)
numericas <- a[a == "numeric"]
numericas

##     Trump   Clinton     Other Undecided 
## "numeric" "numeric" "numeric" "numeric"
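Filtering on the class name "numeric" leaves out the integer columns (Number.of.Observations and Question.Iteration), which are also numbers. A small sketch of the difference, using a hypothetical three-column data frame (df is not part of the original analysis) that mirrors the column types in ds1:

```r
# Hypothetical stand-in for ds1 with one integer, one double and one factor column
df <- data.frame(
  Number.of.Observations = c(1500L, NA, 1055L),
  Trump = c(42, 13, 40),
  Mode = factor(c("Internet", "Internet", "Live Phone"))
)

# Selecting by class name keeps only double-precision columns
by_class <- names(df)[sapply(df, class) == "numeric"]

# is.numeric() is TRUE for both integer and double columns, FALSE for factors
by_test <- names(df)[sapply(df, is.numeric)]

by_class
by_test
```

Which selection is appropriate depends on whether counts such as the sample size should be summarized together with the candidate percentages.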

Descriptive statistics for these variables:

Trump
- Min: 2
- 1st Qu.: 31.25
- Median: 39
- Mean: 40.721374
- 3rd Qu.: 46
- Max: 93
- Range: 2 to 93
- IQR: 14.75
- Measures of variability:
  - S (standard deviation): 23.6697307
  - V (variance): 560.2561509

Clinton
- Min: 1
- 1st Qu.: 26
- Median: 42
- Mean: 42.7412214
- 3rd Qu.: 50
- Max: 97
- Range: 1 to 97
- IQR: 24
- Measures of variability:
  - S (standard deviation): 25.3943876
  - V (variance): 644.8749229

Other
- Min: 0
- 1st Qu.: 3
- Median: 4
- Mean: 5.9898763
- 3rd Qu.: 8
- Max: 34
- Range: 0 to 34
- IQR: 5
- Measures of variability:
  - S (standard deviation): 5.1503202
  - V (variance): 26.5257983

Undecided
- Min: 0
- 1st Qu.: 5
- Median: 9
- Mean: 9.7931583
- 3rd Qu.: 13
- Max: 36
- Range: 0 to 36
- IQR: 8
- Measures of variability:
  - S (standard deviation): 6.4093521
  - V (variance): 41.0797939
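The statistics above are listed without the code that produced them. One way they could be computed is sketched below; this is an assumption about the method, not the author's code. The helper name describe is hypothetical, missing values are dropped, and quantiles use R's default definition. The input is the first ten Trump values shown in the str(ds1) output, so the results will not match the full-column figures.

```r
# Summary statistics for one numeric vector (hypothetical helper)
describe <- function(x) {
  x <- x[!is.na(x)]  # drop missing values before summarizing
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  c(min = min(x), q1 = q[1], median = q[2], mean = mean(x),
    q3 = q[3], max = max(x),
    iqr = IQR(x),   # interquartile range, equal to q3 - q1
    sd = sd(x),     # sample standard deviation (S)
    var = var(x))   # sample variance (V), equal to sd^2
}

# First ten Trump values as printed by str(ds1)
trump_sample <- c(42, 13, 78, 39, 40, 8, 85, 36, 43, 8)
stats <- describe(trump_sample)
round(stats, 2)
```

Applied to the real data, sapply(ds1[c("Trump", "Clinton", "Other", "Undecided")], describe) would return one column of statistics per variable.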

Categorical variables:

a <- sapply(ds1, class)
f <- a[a == "factor"]
f

##             Pollster           Start.Date             End.Date 
##             "factor"             "factor"             "factor" 
## Entry.Date.Time..ET.           Population                 Mode 
##             "factor"             "factor"             "factor" 
##         Pollster.URL           Source.URL             Partisan 
##             "factor"             "factor"             "factor" 
##          Affiliation        Question.Text 
##             "factor"             "factor"

str(ds1)

## 'data.frame':    1310 obs. of  17 variables:
##  $ Pollster              : Factor w/ 46 levels "ABC/Post","AP-GfK (web)",..: 38 38 38 38 45 45 45 45 45 45 ...
##  $ Start.Date            : Factor w/ 236 levels "2015-05-19","2015-06-20",..: 236 236 236 236 235 235 235 235 235 235 ...
##  $ End.Date              : Factor w/ 215 levels "2015-05-26","2015-06-22",..: 215 215 215 215 214 214 214 214 214 214 ...
##  $ Entry.Date.Time..ET.  : Factor w/ 359 levels "2015-05-28T21:52:59Z",..: 356 356 356 356 351 351 351 351 351 351 ...
##  $ Number.of.Observations: int  1500 NA NA NA 1055 415 255 385 1055 415 ...
##  $ Population            : Factor w/ 9 levels "Adults","Likely Voters",..: 2 3 4 5 6 7 8 9 6 7 ...
##  $ Mode                  : Factor w/ 5 levels "Automated Phone",..: 2 2 2 2 3 3 3 3 3 3 ...
##  $ Trump                 : num  42 13 78 39 40 8 85 36 43 8 ...
##  $ Clinton               : num  41 77 10 32 43 84 5 35 48 88 ...
##  $ Other                 : num  4 3 4 8 5 2 2 10 8 4 ...
##  $ Undecided             : num  3 3 4 4 7 5 4 11 1 0 ...
##  $ Pollster.URL          : Factor w/ 359 levels "http://elections.huffingtonpost.com/pollster/polls/abc-post-22720",..: 311 311 311 311 355 355 355 355 355 355 ...
##  $ Source.URL            : Factor w/ 331 levels " https://today.yougov.com/news/2016/06/29/yougoveconomist-poll-june-24-27-2016/",..: 223 223 223 223 303 303 303 303 303 303 ...
##  $ Partisan              : Factor w/ 3 levels "Nonpartisan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Affiliation           : Factor w/ 4 levels "Dem","None","Other",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Question.Text         : Factor w/ 61 levels "","And if the election for President was held today and the candidates were Democrat Hillary Clinton,\nRepublican Donald Trump, Li"| __truncated__,..: 25 25 25 25 14 14 14 14 1 1 ...
##  $ Question.Iteration    : int  1 1 1 1 1 1 1 1 2 2 ...

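# Frequency table with counts (frec) and percentages (porcentaje, rounded to 2 decimals)
tabla <- function(x) {
    frec <- table(x)
    cbind(frec = frec,
          porcentaje = round(prop.table(frec) * 100, 2))
}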

tabla(ds1$Pollster)

##                                                        frec porcentaje
## ABC/Post                                                 14       1.07
## AP-GfK (web)                                              3       0.23
## ARG                                                      17       1.30
## Bloomberg/Selzer                                          7       0.53
## CBS                                                      28       2.14
## CBS/Times                                                26       1.98
## CNBC                                                      1       0.08
## CNN                                                      92       7.02
## Echelon Insights (R)                                      2       0.15
## Emerson College Polling Society                           4       0.31
## FOX                                                      78       5.95
## Fairleigh Dickinson/SSRS                                  5       0.38
## Franklin Pierce/RKM/Boston Herald                        12       0.92
## GQR (D-Democracy Corps)                                   3       0.23
## GQR (D-Democracy Corps/Women's Voices Women Vote)         1       0.08
## GWU/Battleground                                          5       0.38
## Gravis Marketing/OANN                                    10       0.76
## IBD/TIPP                                                 48       3.66
## ICITIZEN                                                  6       0.46
## Ipsos/Reuters                                           160      12.21
## MSNBC/Telemundo/Marist                                    8       0.61
## McClatchy/Marist                                         36       2.75
## McLaughlin (R)                                            8       0.61
## Monmouth University                                      21       1.60
## Morning Consult                                         274      20.92
## NBC/SurveyMonkey                                         44       3.36
## NBC/WSJ                                                  16       1.22
## Normington, Petts & Associates (D-End Citizens United)    2       0.15
## PPP (D)                                                  68       5.19
## PSRAI                                                     1       0.08
## Penn Schoen Berland                                      12       0.92
## Pew                                                       4       0.31
## Public Religion Research Institute                        1       0.08
## Public Religion Research Institute/The Atlantic           3       0.23
## Quinnipiac                                               80       6.11
## RABA Research                                             1       0.08
## Raba Research                                             2       0.15
## Rasmussen                                                53       4.05
## Saint Leo University                                      3       0.23
## Schoen (D)                                                1       0.08
## Suffolk/USA Today                                        32       2.44
## SurveyUSA                                                 1       0.08
## UPI/CVOTER                                                9       0.69
## University of Delaware/PSRAI                              4       0.31
## YouGov/Economist                                        100       7.63
## Zogby (Internet)                                          4       0.31
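The table lists "RABA Research" and "Raba Research" as separate levels that differ only in capitalization. If both spellings refer to the same pollster, which the data alone does not confirm, the levels could be merged before counting. A minimal sketch on a hypothetical four-row sample, not the real column:

```r
# Hypothetical mini-sample of pollster names with both spellings from the table
pollster <- factor(c("RABA Research", "Raba Research", "Raba Research", "CNN"))

# Rename one spelling to the other; assigning an existing level name merges them.
# Picking "RABA Research" as the canonical form is an arbitrary choice.
merged <- pollster
levels(merged)[levels(merged) == "Raba Research"] <- "RABA Research"

counts <- table(merged)
counts
```

The same assignment on levels(ds1$Pollster) would apply the merge to the full dataset.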

tabla(ds1$Mode)

##                 frec porcentaje
## Automated Phone   14       1.07
## IVR/Online       121       9.24
## Internet         628      47.94
## Live Phone       546      41.68
## Mixed              1       0.08

tabla(ds1$Population)

##                                 frec porcentaje
## Adults                             3       0.23
## Likely Voters                    178      13.59
## Likely Voters - Democrat          98       7.48
## Likely Voters - Republican        98       7.48
## Likely Voters - independent       99       7.56
## Registered Voters                292      22.29
## Registered Voters - Democrat     181      13.82
## Registered Voters - Republican   181      13.82
## Registered Voters - independent  180      13.74

Most important categorical variables:

Pollster: identifies who conducted the poll
Mode: shows the medium through which the poll was conducted
Population: tells us who is answering the polls