## Libraries used
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.0 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(sf)
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
library(tigris)
## To enable caching of data, set `options(tigris_use_cache = TRUE)`
## in your R script or .Rprofile.
library(tidycensus)
library(mapview)
library(viridis)
## Loading required package: viridisLite
library(knitr)
library(leaflet)
library(stringr)
library(ggplot2)
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
options(tigris_use_cache = TRUE)
# Read the Census API key from the CENSUS_API_KEY environment variable instead of hardcoding it
census_api_key(Sys.getenv("CENSUS_API_KEY"))
## To install your API key for use in future sessions, run this function with `install = TRUE`.
-Presentation of the exercise. The purpose of this logical-mathematical exercise is to analyze how population is distributed across the United States and to rank the states by this variable. Note that the variable used is a population count (total population in occupied housing units), not a density per unit of area. The table below lists Census variables that relate population to the type of occupied housing in the states. *Note: the variable code in the "name" column (H011001) is referred to as "population" in the rest of the analysis.
v10 <- load_variables(2010, "sf1", cache = TRUE)
v10 <- v10 %>%
filter(grepl("population", tolower(label), fixed = TRUE))
kable(head(v10))
| name | label | concept |
|---|---|---|
| H011001 | Total population in occupied housing units | TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY TENURE |
| H011002 | Total population in occupied housing units!!Owned with a mortgage or a loan | TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY TENURE |
| H011003 | Total population in occupied housing units!!Owned free and clear | TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY TENURE |
| H011004 | Total population in occupied housing units!!Renter occupied | TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY TENURE |
| H011A001 | Population in occupied housing units | TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY TENURE (WHITE ALONE HOUSEHOLDER) |
| H011A002 | Population in occupied housing units!!Owned with a mortgage or a loan | TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY TENURE (WHITE ALONE HOUSEHOLDER) |
The following is a sample of the population values for a few states, listed in ascending alphabetical order:
# year = 2010 is stated explicitly so the call does not depend on the package default
population <- get_decennial(geography = "state", variables = c(population = "H011001"),
                            year = 2010, shift_geo = TRUE, geometry = TRUE)
## Warning: The `shift_geo` argument is deprecated and will be removed in a future
## release. We recommend using `tigris::shift_geometry()` instead.
## Getting data from the 2010 decennial Census
## Using feature geometry obtained from the albersusa package
## Using Census Summary File 1
## Please note: Alaska and Hawaii are being shifted and are not to scale.
## old-style crs object detected; please recreate object with a recent sf::st_crs()
kable(head(population))
| GEOID | NAME | variable | value | geometry |
|---|---|---|---|---|
| 04 | Arizona | population | 6252633 | MULTIPOLYGON (((-1111066 -8… |
| 05 | Arkansas | population | 2836987 | MULTIPOLYGON (((557903.1 -1… |
| 06 | California | population | 36434140 | MULTIPOLYGON (((-1853480 -9… |
| 08 | Colorado | population | 4913318 | MULTIPOLYGON (((-613452.9 -… |
| 09 | Connecticut | population | 3455945 | MULTIPOLYGON (((2226838 519… |
| 11 | District of Columbia | population | 561702 | MULTIPOLYGON (((1960720 -41… |
NOTE: In the table above, California has by far the largest population of the six states shown, followed by Arizona, Colorado, Connecticut, Arkansas and the District of Columbia. Each row also carries the state's spatial information (the geometry column), which is taken into account when mapping these results. The relationship between the population values can be seen in the following plot:
# Data: state FIPS codes (x) and their population values (y)
x <- c(04, 05, 06, 08, 09, 11)
y <- c(6252633, 2836987, 36434140, 4913318, 3455945, 561702)
# Line plot of population against FIPS code
plot(x, y, type = "l")
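Because the FIPS codes on the x axis are labels rather than quantities, a bar chart may show the ranking more directly than a line. The following is only a sketch, assuming the six values from the table above are typed in by hand; the names `pop` and `pop_sorted` are illustrative and not part of the original notebook.

```r
# Population values copied from the six-row table above
pop <- c(
  "Arizona" = 6252633, "Arkansas" = 2836987, "California" = 36434140,
  "Colorado" = 4913318, "Connecticut" = 3455945,
  "District of Columbia" = 561702
)

# Sort from largest to smallest so the ranking is visible at a glance
pop_sorted <- sort(pop, decreasing = TRUE)

# One bar per state, heights expressed in millions of people
barplot(pop_sorted / 1e6, las = 2, cex.names = 0.7,
        ylab = "Population (millions)")
```

Sorting before plotting reproduces the order described in the note: California first and the District of Columbia last.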
For greater clarity, the following map of the United States gives a complete picture of population across the states:
# Continuous viridis palette scaled to the population values
pal <- colorNumeric(palette = "viridis",
                    domain = population$value)
population %>%
  # EPSG code 4326 (WGS 84) replaces the deprecated "+init=epsg:4326" string
  st_transform(crs = 4326) %>%
  leaflet(width = "100%") %>%
  addProviderTiles(provider = "CartoDB.Positron") %>%
  addPolygons(popup = ~ str_extract(NAME, "^([^,]*)"),
              stroke = FALSE,
              smoothFactor = 0,
              fillOpacity = 0.7,
              color = ~ pal(value)) %>%
  addLegend("bottomright",
            pal = pal,
            values = ~ value,
            title = "Population",
            opacity = 1)
ANALYSIS OF THE EXERCISE:
Measures of location and dispersion
-Measures of central tendency
Next, we find the central point of the general data produced by the map, here seven evenly spaced values from 5,000,000 to 35,000,000:
x <- c(5000000, 10000000, 15000000, 20000000, 25000000, 30000000, 35000000)
mean(x)
median(x)
table(x)
## [1] 2e+07
## [1] 2e+07
## x
## 5e+06 1e+07 1.5e+07 2e+07 2.5e+07 3e+07 3.5e+07
## 1 1 1 1 1 1 1
Mean:
with(population, mean(value, na.rm = TRUE))
## [1] 5897220
*This is the arithmetic mean of the state populations, i.e. the expected value of the data
Median:
with(population, median(value, na.rm = TRUE))
## [1] 4213497
*This is the middle value of the ordered data shown on the population map
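The mean above (5,897,220) is well above the median (4,213,497), which is what one would expect when a few very large states pull the average upward. A small sketch of the same effect, assuming only the six values from the sample table earlier in the document (the names `pop`, `pop_mean` and `pop_median` are illustrative):

```r
# Six state populations copied from the sample table
pop <- c(6252633, 2836987, 36434140, 4913318, 3455945, 561702)

pop_mean <- mean(pop)      # pulled upward by California
pop_median <- median(pop)  # average of the two middle values

# TRUE when the data are skewed toward large values
pop_mean > pop_median
```

With an even number of observations, `median()` averages the third and fourth sorted values, so it is unaffected by how large the maximum is.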
Mode:
with(population, as.numeric(names(table(value))[table(value)==max(table(value))]))
## [1] 549914 561702 600412 647535 683879 780130 873521 960566
## [9] 1009904 1276366 1292816 1317421 1538631 1775176 1803612 2016550
## [17] 2664397 2717733 2774044 2836987 2875333 2948243 3455945 3639334
## [25] 3744432 4213497 4405945 4486210 4663920 4913318 5168530 5536772
## [33] 5635177 5814785 6192633 6252633 6296879 6308747 6585165 7761190
## [41] 8605018 9278237 9434454 9654572 11230238 12276266 12528859 18379601
## [49] 18792424 24564422 36434140
*Every value occurs exactly once, so the expression returns all 51 values; no value is repeated and the data therefore have no meaningful mode
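When every value is unique, the table-based expression above returns the whole data set. One possible way to make that case explicit is a small helper that returns `NA` when nothing repeats. This is a sketch, not part of the original notebook; `find_mode` is a hypothetical name.

```r
# Hypothetical helper: returns the most frequent value(s), or NA when
# no value occurs more than once (i.e. there is no mode)
find_mode <- function(x) {
  x <- x[!is.na(x)]
  counts <- table(x)
  if (length(counts) == 0 || max(counts) == 1) {
    return(NA_real_)
  }
  as.numeric(names(counts)[counts == max(counts)])
}

find_mode(c(2, 3, 3, 5, 5, 7))                 # 3 and 5 tie for most frequent
find_mode(c(6252633, 2836987, 36434140))       # all unique, so NA
```

Ties are still returned together, which matches the behavior of the original expression whenever a mode does exist.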
Next we analyze how far the data lie from the arithmetic mean
Standard deviation:
Population:
with(population, sqrt(var(value, na.rm = TRUE)*(length(value)-1)/length(value)))
## [1] 6598280
*The figure is large, exceeding the mean itself (5,897,220), which indicates a wide dispersion of the data around the mean
Sample:
with(population, sqrt(var(value, na.rm = TRUE)))
## [1] 6663936
*Square root of the unbiased estimate of the population variance (divisor n - 1)
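The two standard deviations differ only in the divisor: n for the population version and n - 1 for the sample version. The sketch below checks that relationship on the six values from the sample table; the variable names are illustrative.

```r
# Six state populations copied from the sample table
pop <- c(6252633, 2836987, 36434140, 4913318, 3455945, 561702)
n <- length(pop)

# sd() uses the n - 1 divisor (sample standard deviation)
sd_sample <- sd(pop)

# Rescale to the n divisor (population standard deviation)
sd_population <- sd_sample * sqrt((n - 1) / n)

# Same quantity computed directly from its definition
sd_population_direct <- sqrt(mean((pop - mean(pop))^2))

all.equal(sd_population, sd_population_direct)
```

The population value is always the smaller of the two, and the gap shrinks as n grows, which is why the two results above (6,598,280 and 6,663,936) are close for 51 observations.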
Mean absolute deviation:
with(population, mean(abs(value-mean(value, na.rm = TRUE)), na.rm = TRUE))
## [1] 4326382
*The mean absolute deviation (MAD) takes every population value into account, not just the largest and the smallest, and measures the average absolute distance from the mean
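Note that base R's `mad()` computes a different statistic, the scaled median absolute deviation, so the mean absolute deviation has to be written out by hand as above. A short sketch on the six values from the sample table, with illustrative variable names:

```r
# Six state populations copied from the sample table
pop <- c(6252633, 2836987, 36434140, 4913318, 3455945, 561702)

# Mean absolute deviation: average distance from the mean
mean_abs_dev <- mean(abs(pop - mean(pop)))

# stats::mad() is the median absolute deviation (scaled by 1.4826 by default)
median_abs_dev <- mad(pop)

c(mean_abs_dev = mean_abs_dev, median_abs_dev = median_abs_dev)
```

The median-based version is far less sensitive to California's value, so the two numbers can differ substantially on skewed data like these.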
Variance:
Population:
with(population, mean((value-mean(value, na.rm = TRUE))**2, na.rm = TRUE))
## [1] 4.35373e+13
*This is the average squared deviation from the mean, expressed in squared units; its square root is the population standard deviation computed above (6,598,280)
Range:
# Range: difference between the largest and smallest values, ignoring NAs
Range <- function(x) {
  maximum <- max(x, na.rm = TRUE)
  minimum <- min(x, na.rm = TRUE)
  maximum - minimum
}
with(population, Range(value))
## [1] 35884226
*Difference between the largest and the smallest population values
with(population, range(value, na.rm = TRUE))
## [1] 549914 36434140
*These are the two values used in the subtraction: the smallest and the largest
with(population, diff(range(value, na.rm = TRUE)))
## [1] 35884226
*Check: subtracting them gives the range again
Interquartile range:
with(population, IQR(value, na.rm = TRUE))
## [1] 4790052
*The data are split into quartiles; the interquartile range is the difference between the third and the first quartile (Q3 - Q1)
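The interquartile range can also be obtained step by step from the quartiles. The sketch below does this for the six values from the sample table and compares the result with `IQR()`; variable names are illustrative.

```r
# Six state populations copied from the sample table
pop <- c(6252633, 2836987, 36434140, 4913318, 3455945, 561702)

# First and third quartiles (quantile() and IQR() share the same default method)
q <- quantile(pop, probs = c(0.25, 0.75))

# IQR = Q3 - Q1; unname() drops the "75%" label carried over from quantile()
iqr_manual <- unname(q[2] - q[1])

all.equal(iqr_manual, IQR(pop))
```

Because it only depends on the middle half of the data, the IQR is not affected by California's value in the way the range and the standard deviation are.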
Normal distribution:
ingresos.medianos <- as.data.frame(rnorm(n = length(population$value), mean = mean(population$value, na.rm = TRUE), sd = sd(population$value, na.rm = TRUE)))
library(ggplot2)
p <- ggplot(ingresos.medianos, aes(x=`rnorm(n = length(population$value), mean = mean(population$value, na.rm = TRUE), sd = sd(population$value, na.rm = TRUE))`)) + geom_density()
p
*This is the density of values simulated from a normal distribution whose mean and standard deviation match those of the state populations, for comparison with the location parameters (mean, median and mode).
p + geom_vline(aes(xintercept=mean(`rnorm(n = length(population$value), mean = mean(population$value, na.rm = TRUE), sd = sd(population$value, na.rm = TRUE))`)), color="blue", linetype="dashed", size=1)
library(ggplot2)
p <- ggplot(population, aes(x=value)) +
geom_density()
p
Population skewness
library(e1071)
with(ingresos.medianos, skewness(`rnorm(n = length(population$value), mean = mean(population$value, na.rm = TRUE), sd = sd(population$value, na.rm = TRUE))`, na.rm = TRUE))
## [1] 0.3332498
library(e1071)
with(population, skewness(value, na.rm = TRUE))
## [1] 2.510988
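The observed skewness (2.51) is far from that of the simulated normal sample (0.33), pointing to a long right tail of very populous states. For readers without `e1071`, the moment-based definition can be sketched in base R. This is an illustration only: `skew_moment` is a hypothetical name, and to my understanding `e1071::skewness()` applies a slightly different small-sample scaling by default, so results need not match exactly.

```r
# Hypothetical helper: moment-based skewness, m3 / m2^(3/2),
# where m2 and m3 are the second and third central moments
skew_moment <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skew_moment(c(1, 2, 3, 4, 5))    # symmetric data: zero skewness
skew_moment(c(1, 2, 3, 4, 50))   # one large value: positive skewness
```

A positive result means the right tail is longer than the left, which is the pattern the population density plot shows.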
Population kurtosis
library(e1071)
with(ingresos.medianos, kurtosis(`rnorm(n = length(population$value), mean = mean(population$value, na.rm = TRUE), sd = sd(population$value, na.rm = TRUE))`, na.rm = TRUE))
## [1] -0.6490457
library(e1071)
with(population, kurtosis(value, na.rm = TRUE))
## [1] 7.513754
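The values above are excess kurtosis, so a normal distribution sits near 0; the observed 7.51 indicates much heavier tails than the simulated normal sample (-0.65). A base R sketch of the moment-based definition follows. `kurt_excess` is a hypothetical name, and `e1071::kurtosis()` may use a slightly different small-sample scaling by default.

```r
# Hypothetical helper: moment-based excess kurtosis, m4 / m2^2 - 3,
# where m2 and m4 are the second and fourth central moments
kurt_excess <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

kurt_excess(c(-1, 1))                  # two-point data: lightest possible tails
kurt_excess(c(rep(0, 8), -10, 10))     # mostly zeros plus two extremes: heavy tails
```

Negative values indicate lighter tails than a normal distribution and positive values heavier tails, which is how the two results above should be read.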