## Libraries used
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.0      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(sf)
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
library(tigris)
## To enable caching of data, set `options(tigris_use_cache = TRUE)`
## in your R script or .Rprofile.
library(tidycensus)
library(mapview)
library(viridis)
## Loading required package: viridisLite
library(tidycensus)
library(knitr)
library(leaflet)
library(stringr)
library(ggplot2)
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
options(tigris_use_cache = TRUE)
census_api_key(Sys.getenv("CENSUS_API_KEY"))  # key read from an environment variable rather than hard-coded
## To install your API key for use in future sessions, run this function with `install = TRUE`.

-Presentation of the exercise. The purpose of this logical-mathematical exercise is to analyze population across the United States and to rank the states by that variable. Strictly speaking, the variable used (H011001) is a population count, the total population in occupied housing units, not a density per unit area. The table below lists the variables that relate population to the type of occupied housing: *Note that the variable code in the "name" column is referred to as "population" in what follows.

v10 <- load_variables(2010, "sf1", cache = TRUE)
v10 <- v10 %>% 
       filter(grepl("population", tolower(label), fixed = TRUE))
kable(head(v10))
| name | label | concept |
|:--|:--|:--|
| H011001 | Total population in occupied housing units | TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY TENURE |
| H011002 | Total population in occupied housing units!!Owned with a mortgage or a loan | TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY TENURE |
| H011003 | Total population in occupied housing units!!Owned free and clear | TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY TENURE |
| H011004 | Total population in occupied housing units!!Renter occupied | TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY TENURE |
| H011A001 | Population in occupied housing units | TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY TENURE (WHITE ALONE HOUSEHOLDER) |
| H011A002 | Population in occupied housing units!!Owned with a mortgage or a loan | TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY TENURE (WHITE ALONE HOUSEHOLDER) |

Below is a sample of the population values for the first few states, listed in ascending GEOID order (which here is also alphabetical):

population <- get_decennial(geography = "state", variables = c(population = "H011001"), 
                            shift_geo = TRUE, geometry = TRUE)
## Warning: The `shift_geo` argument is deprecated and will be removed in a future
## release. We recommend using `tigris::shift_geometry()` instead.
## Getting data from the 2010 decennial Census
## Using feature geometry obtained from the albersusa package
## Using Census Summary File 1
## Please note: Alaska and Hawaii are being shifted and are not to scale.
## old-style crs object detected; please recreate object with a recent sf::st_crs()
kable(head(population))
| GEOID | NAME | variable | value | geometry |
|:--|:--|:--|--:|:--|
| 04 | Arizona | population | 6252633 | MULTIPOLYGON (((-1111066 -8… |
| 05 | Arkansas | population | 2836987 | MULTIPOLYGON (((557903.1 -1… |
| 06 | California | population | 36434140 | MULTIPOLYGON (((-1853480 -9… |
| 08 | Colorado | population | 4913318 | MULTIPOLYGON (((-613452.9 -… |
| 09 | Connecticut | population | 3455945 | MULTIPOLYGON (((2226838 519… |
| 11 | District of Columbia | population | 561702 | MULTIPOLYGON (((1960720 -41… |

NOTE: Among the six states in the table above, California has by far the largest population, followed by Arizona, Colorado, Connecticut, Arkansas and the District of Columbia. Each row also carries the state's geometry, which is what allows these values to be mapped. The relationship between the population values can be read from the following plot:

# Data: GEOIDs (x) and populations (y) of the six states in the table
x <- c(4, 5, 6, 8, 9, 11)
y <- c(6252633, 2836987, 36434140, 4913318, 3455945, 561702)

# Line plot of population against GEOID
plot(x, y, type = "l")
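
Because GEOID is an identifier rather than a quantity, a line plot can suggest a trend that is not really there. As a hedged alternative, the sketch below draws the same six values as a sorted bar chart. The names `pop` and `pop_sorted` are illustrative and are not part of the original analysis.

```r
# Six values copied from the table above (2010 Census, variable H011001).
pop <- c(Arizona = 6252633, Arkansas = 2836987, California = 36434140,
         Colorado = 4913318, Connecticut = 3455945,
         `District of Columbia` = 561702)

# Sort from largest to smallest so the ranking is explicit.
pop_sorted <- sort(pop, decreasing = TRUE)
names(pop_sorted)

# Horizontal bars keep the state names readable; rev() puts the largest on top.
barplot(rev(pop_sorted) / 1e6, horiz = TRUE, las = 1,
        xlab = "Population in occupied housing units (millions)")
```

Sorting before plotting makes the ordering described in the note directly visible.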

For additional clarity, the following map of the United States gives a complete picture of the population in each state:

pal <- colorNumeric(palette = "viridis", 
                    domain = population$value)
population %>%
  st_transform(crs = 4326) %>%  # EPSG code; the "+init=epsg:4326" string is deprecated
  leaflet(width = "100%") %>%
  addProviderTiles(provider = "CartoDB.Positron") %>%
  addPolygons(popup = ~ str_extract(NAME, "^([^,]*)"),
              stroke = FALSE,
              smoothFactor = 0,
              fillOpacity = 0.7,
              color = ~ pal(value)) %>%
  addLegend("bottomright", 
            pal = pal, 
            values = ~ value,
            title = "Population",
            #labFormat = labelFormat(prefix = "$"),
            opacity = 1)

ANALYSIS OF THE EXERCISE:

Measures of location and dispersion

-Measures of central tendency

First we look for the central point of the general values shown on the map legend

x <- c(5000000, 10000000, 15000000, 20000000, 25000000, 30000000, 35000000)
mean(x)
median(x)
table(x)
## [1] 2e+07
## [1] 2e+07
## x
##   5e+06   1e+07 1.5e+07   2e+07 2.5e+07   3e+07 3.5e+07 
##       1       1       1       1       1       1       1

Mean:

with(population, mean(value, na.rm = TRUE))
## [1] 5897220

*This is the arithmetic mean, i.e. the expected value of the data

Median:

with(population, median(value, na.rm = TRUE))
## [1] 4213497

*This is the middle value of the ordered data shown on the population map

Mode:

with(population, as.numeric(names(table(value))[table(value)==max(table(value))]))
##  [1]   549914   561702   600412   647535   683879   780130   873521   960566
##  [9]  1009904  1276366  1292816  1317421  1538631  1775176  1803612  2016550
## [17]  2664397  2717733  2774044  2836987  2875333  2948243  3455945  3639334
## [25]  3744432  4213497  4405945  4486210  4663920  4913318  5168530  5536772
## [33]  5635177  5814785  6192633  6252633  6296879  6308747  6585165  7761190
## [41]  8605018  9278237  9434454  9654572 11230238 12276266 12528859 18379601
## [49] 18792424 24564422 36434140

*Every value occurs exactly once, so the code returns all 51 values: the data have no single mode
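
When no value repeats, a frequency table gives every value a count of 1, so all of them tie for the maximum. A minimal sketch of a helper that reports this case explicitly follows. `mode_or_na` is a hypothetical name, not a base R function.

```r
# Hypothetical helper: returns the most frequent value(s),
# or NA when every value occurs only once (no meaningful mode).
mode_or_na <- function(x) {
  x <- x[!is.na(x)]
  counts <- table(x)
  if (length(counts) == 0 || max(counts) == 1) {
    return(NA_real_)
  }
  as.numeric(names(counts)[counts == max(counts)])
}

mode_or_na(c(6252633, 2836987, 36434140))  # all values distinct
mode_or_na(c(2, 3, 3, 5, 5, 7))            # two values tie for most frequent
```

For continuous or near-continuous data such as state populations, the mode is usually more informative after binning the values, for example with a histogram.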

Next we analyze how far the data lie from the arithmetic mean

Standard deviation:

Population:

with(population, sqrt(var(value, na.rm = TRUE)*(length(value)-1)/length(value)))
## [1] 6598280

*The value is large (larger than the mean itself), which indicates wide dispersion of the data around the mean

Sample:

with(population, sqrt(var(value, na.rm = TRUE)))
## [1] 6663936

*Square root of the unbiased estimate of the population variance
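
The two standard deviations differ only in the divisor: the population version divides the sum of squared deviations by n, the sample version by n - 1. The sketch below checks that relationship on the six state values tabulated earlier in this document. `pop`, `sample_sd` and `pop_sd` are illustrative names.

```r
pop <- c(6252633, 2836987, 36434140, 4913318, 3455945, 561702)
n <- length(pop)

# Sample standard deviation: divisor n - 1 (what sd() returns).
sample_sd <- sd(pop)

# Population standard deviation: divisor n.
pop_sd <- sqrt(sum((pop - mean(pop))^2) / n)

# Rescaling the sample value by sqrt((n - 1) / n) gives the population value.
all.equal(pop_sd, sample_sd * sqrt((n - 1) / n))
```

With 51 observations the correction factor is close to 1, which is why the two results reported for the full data set are so similar.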

Mean absolute deviation:

with(population, mean(abs(value-mean(value, na.rm = TRUE)), na.rm = TRUE))
## [1] 4326382

*The mean absolute deviation (MAD) uses every population value, not just the largest and the smallest, and measures the average distance from the mean
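
Note that base R's `mad()` computes the median absolute deviation (scaled by 1.4826 by default), which is a different statistic from the mean absolute deviation used here. A small sketch contrasting the two on a toy vector follows. `mean_abs_dev` is a hypothetical helper and `toy` is made-up data.

```r
# Hypothetical helper: average absolute distance from the mean.
mean_abs_dev <- function(x) {
  x <- x[!is.na(x)]
  mean(abs(x - mean(x)))
}

toy <- c(1, 2, 3, 4, 10)   # made-up values with one large observation

mean_abs_dev(toy)          # mean of the absolute deviations from the mean
mad(toy, constant = 1)     # median of the absolute deviations from the median
```

The median-based version is less affected by the single large value, which is the same reason the median of the state populations sits below their mean.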

Variance:

Population:

with(population, mean((value-mean(value, na.rm = TRUE))**2, na.rm = TRUE))
## [1] 4.35373e+13

*This is the mean squared deviation from the mean; its square root is the population standard deviation (6598280)

Range:

# Range of a numeric vector: largest value minus smallest, ignoring NAs
Range <- function(x) {
    maximum <- max(x, na.rm = TRUE)
    minimum <- min(x, na.rm = TRUE)
    maximum - minimum
}
with(population, Range(value))
## [1] 35884226

*Difference between the largest and the smallest population values

with(population, range(value, na.rm = TRUE))
## [1]   549914 36434140

*These are the two values used in the subtraction, the smallest and the largest

with(population, diff(range(value, na.rm = TRUE)))
## [1] 35884226

*Check: subtracting them gives the same range again

Interquartile range:

with(population, IQR(value, na.rm = TRUE))
## [1] 4790052

*The data are split into quartiles; the interquartile range is the distance between the third and the first quartile (Q3 - Q1)
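
As a sketch of that definition, the interquartile range can be reproduced from `quantile()`. This assumes R's default quantile algorithm (type 7), which is also the default used by `IQR()`, and uses the six state values tabulated earlier rather than the full data set.

```r
pop <- c(6252633, 2836987, 36434140, 4913318, 3455945, 561702)

# First and third quartiles (default type = 7 interpolation).
q <- quantile(pop, probs = c(0.25, 0.75), names = FALSE)

# Their difference matches IQR().
iqr_manual <- q[2] - q[1]
all.equal(iqr_manual, IQR(pop))
```

Unlike the range, this measure ignores the extremes, so a single very large state such as California has little influence on it.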

Normal distribution:

ingresos.medianos <- as.data.frame(rnorm(n = length(population$value), mean = mean(population$value, na.rm = TRUE), sd = sd(population$value, na.rm = TRUE)))
library(ggplot2)
p <- ggplot(ingresos.medianos, aes(x=`rnorm(n = length(population$value), mean = mean(population$value, na.rm = TRUE), sd = sd(population$value, na.rm = TRUE))`)) + geom_density()
p

*This is a probability distribution for the population variable, simulated from a normal distribution with the same mean and standard deviation as the observed data.

p + geom_vline(aes(xintercept=mean(`rnorm(n = length(population$value), mean = mean(population$value, na.rm = TRUE), sd = sd(population$value, na.rm = TRUE))`)), color="blue", linetype="dashed", size=1)

library(ggplot2)
p <- ggplot(population, aes(x=value)) + 
  geom_density()
p

Skewness of the population data

library(e1071)
with(ingresos.medianos, skewness(`rnorm(n = length(population$value), mean = mean(population$value, na.rm = TRUE), sd = sd(population$value, na.rm = TRUE))`, na.rm = TRUE))
## [1] 0.3332498
library(e1071)
with(population, skewness(value, na.rm = TRUE))
## [1] 2.510988

Kurtosis of the population data

library(e1071)
with(ingresos.medianos, kurtosis(`rnorm(n = length(population$value), mean = mean(population$value, na.rm = TRUE), sd = sd(population$value, na.rm = TRUE))`, na.rm = TRUE))
## [1] -0.6490457
library(e1071)
with(population, kurtosis(value, na.rm = TRUE))
## [1] 7.513754
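
For reference, both statistics can be computed from central moments in base R. The sketch assumes the `type = 3` definitions documented for `e1071::skewness()` and `e1071::kurtosis()` (their default), in which the moments are scaled by the sample standard deviation and kurtosis is reported as excess over 3. `skew_b1`, `kurt_b2`, `sym` and `right` are hypothetical names and made-up data.

```r
# Skewness: third central moment divided by the cubed sample sd.
skew_b1 <- function(x) {
  x <- x[!is.na(x)]
  m3 <- mean((x - mean(x))^3)
  m3 / sd(x)^3
}

# Excess kurtosis: fourth central moment over sd^4, minus 3.
kurt_b2 <- function(x) {
  x <- x[!is.na(x)]
  m4 <- mean((x - mean(x))^4)
  m4 / sd(x)^4 - 3
}

sym   <- c(1, 2, 3, 4, 5)     # symmetric around its mean
right <- c(1, 1, 1, 2, 10)    # long right tail

skew_b1(sym)     # zero for symmetric data
skew_b1(right)   # positive for a right tail
kurt_b2(sym)     # negative: flatter than a normal distribution
```

Read this way, the values reported for the state data (skewness 2.51, excess kurtosis 7.51) describe a strongly right-skewed, heavy-tailed distribution driven by a few very populous states, whereas the simulated normal sample stays close to zero on both measures.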