Central America and the Caribbean
This report organizes and analyzes landslide data for a sample of 15 countries in Central America and North America, broken down by the states and cities of each sampled country. It applies several statistical methods to give both a general and a detailed view of the landslides and how they relate to variables such as distance and date. The aim is to show how useful statistics is for organizing, categorizing, and plotting data accurately and meaningfully.
Dominican Republic
library(readr)
library(knitr)
df <- read.csv("https://raw.githubusercontent.com/lihkir/AnalisisEstadisticoUN/main/Data/catalog.csv")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
colnames(df)[4] <- "Continent"
colnames(df)[5] <- "Country"
colnames(df)[7] <- "State"
colnames(df)[9] <- "City"
colnames(df)[10] <- "Distance"
colnames(df)[2] <- "Date"
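The six positional `colnames(df)[i] <- ...` assignments above can be collapsed into a single indexed assignment. A minimal sketch, assuming a hypothetical 10-column placeholder data frame (`df_demo` is not the catalog; only the positions and the new names are taken from this report):

```r
# Hypothetical stand-in for the catalog: 10 columns named V1..V10
df_demo <- as.data.frame(matrix(NA, nrow = 1, ncol = 10))

# Positions and new names as used in the report
positions <- c(2, 4, 5, 7, 9, 10)
new_names <- c("Date", "Continent", "Country", "State", "City", "Distance")

# One assignment renames all six columns at once
names(df_demo)[positions] <- new_names
print(names(df_demo))
```

Renaming by position silently breaks if the column order of the CSV changes, so checking `names(df)` before and after is a cheap safeguard.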
library(readr)
library(knitr)
df_DO <- subset (df, Country == "Dominican Republic")
knitr::kable(head(df_DO))
df_DO %>%
select(Country, State, City, Distance, Date)
## Country State City Distance
## 15 Dominican Republic Distrito Nacional San Carlos 1.70298
## 52 Dominican Republic San Cristóbal Bajos de Haina 1.72138
## 58 Dominican Republic La Vega Río Verde Abajo 3.72637
## 64 Dominican Republic Santiago Santiago de los Caballeros 1.10868
## 132 Dominican Republic Hato Mayor Sabana de La Mar 0.75284
## 138 Dominican Republic Distrito Nacional La Agustina 5.71058
## 178 Dominican Republic Santiago Pedro García 4.86398
## 211 Dominican Republic Puerto Plata Altamira 0.88500
## 212 Dominican Republic Santiago Tamboril 4.31327
## 750 Dominican Republic Santiago San José de Las Matas 2.72462
## 774 Dominican Republic Distrito Nacional Santo Domingo 0.55721
## 833 Dominican Republic La Vega Constanza 0.52969
## 923 Dominican Republic Puerto Plata Puerto Plata 1.19636
## 1394 Dominican Republic Santo Domingo Santo Domingo Este 3.98059
## 1395 Dominican Republic Puerto Plata Luperón 1.54885
## Date
## 15 7/13/07
## 52 10/29/07
## 58 11/1/07
## 64 12/11/07
## 132 8/17/08
## 138 8/26/08
## 178 2/12/09
## 211 9/20/09
## 212 9/20/09
## 750 6/3/11
## 774 7/6/11
## 833 11/18/11
## 923 12/5/12
## 1394 8/3/14
## 1395 11/7/14
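The sections that follow repeat the same analysis state by state. As a complement, a per-state summary can be computed in one pass. The sketch below hard-codes the 15 `State` and `Distance` values printed above, so it runs without downloading the catalog; the names `do_events` and `by_state` are illustrative, not part of the report.

```r
# State and Distance values copied from the Dominican Republic listing above
do_events <- data.frame(
  State = c("Distrito Nacional", "San Cristóbal", "La Vega", "Santiago",
            "Hato Mayor", "Distrito Nacional", "Santiago", "Puerto Plata",
            "Santiago", "Santiago", "Distrito Nacional", "La Vega",
            "Puerto Plata", "Santo Domingo", "Puerto Plata"),
  Distance = c(1.70298, 1.72138, 3.72637, 1.10868, 0.75284, 5.71058,
               4.86398, 0.88500, 4.31327, 2.72462, 0.55721, 0.52969,
               1.19636, 3.98059, 1.54885)
)

# Split the distances by state, then summarise each group
groups <- split(do_events$Distance, do_events$State)
by_state <- data.frame(
  n    = sapply(groups, length),  # number of recorded events
  mean = sapply(groups, mean),    # mean Distance
  max  = sapply(groups, max)      # largest Distance
)
print(by_state)
```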
Landslides by state
library(ggplot2)
ggplot(data=df_DO, aes(x = "Dominican Republic", y = Distance, fill=State)) +
geom_bar(stat = "identity", width = 1, color = "black") +
coord_polar("y", start = 0)

ggplot(data=df_DO, aes(fill=State, y=Distance, x="Dominican Republic")) +
geom_bar(position="dodge", stat="identity")

Distrito Nacional
Landslides in the cities of Distrito Nacional
library(readr)
library(knitr)
df_DN <- subset (df, State == "Distrito Nacional")
df_DN %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 15 Dominican Republic Distrito Nacional San Carlos 1.70298 7/13/07
## 138 Dominican Republic Distrito Nacional La Agustina 5.71058 8/26/08
## 774 Dominican Republic Distrito Nacional Santo Domingo 0.55721 7/6/11
ggplot(data=df_DN, aes(x=City, y=Distance)) + geom_bar(stat="identity", color="blue", fill="white")

Pie chart
ggplot(df_DN,aes(x="Distrito Nacional",y=Distance, fill=City))+
geom_bar(stat = "identity",
color="white")+
geom_text(aes(label=(Distance*1)),
position=position_stack(vjust=0.5),color="white",size=4)+
coord_polar(theta = "y")+
labs(title="Gráfico de Deslizamiento")

Pareto chart
City with the largest landslide
library(qcc)
## Warning: package 'qcc' was built under R version 4.1.1
## Package 'qcc' version 2.7
## Type 'citation("qcc")' for citing this R package in publications.
Distance <- df_DN$Distance
names(Distance) <- df_DN$City
pareto.chart(Distance,
ylab="Distance",
col = heat.colors(length(Distance)),
cumperc = seq(0, 100, by = 10),
ylab2 = "Porcentaje acumulado",
main = "CIUDADES CON MAYORES DESLIZAMIENTOS"
)

##
## Pareto chart analysis for Distance
## Frequency Cum.Freq. Percentage Cum.Percent.
## La Agustina 5.710580 5.710580 71.644019 71.644019
## San Carlos 1.702980 7.413560 21.365314 93.009333
## Santo Domingo 0.557210 7.970770 6.990667 100.000000
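The columns of the Pareto table can be reproduced by hand: sort the values in decreasing order, then express each value and each running total as a percentage of the grand total. A minimal base-R sketch using the three Distrito Nacional distances listed above (the names `ordered` and `pareto` are illustrative):

```r
# Distances for Distrito Nacional, named by city (values from the report)
distance <- c("San Carlos" = 1.70298,
              "La Agustina" = 5.71058,
              "Santo Domingo" = 0.55721)

# A Pareto table ranks the categories from largest to smallest
ordered <- sort(distance, decreasing = TRUE)

pareto <- data.frame(
  value     = ordered,                                 # "Frequency"
  cum_value = cumsum(ordered),                         # "Cum.Freq."
  pct       = 100 * ordered / sum(ordered),            # "Percentage"
  cum_pct   = 100 * cumsum(ordered) / sum(ordered)     # "Cum.Percent."
)
print(pareto)
```

Because the input here is a measured distance rather than a count, the "Frequency" column of `pareto.chart()` is really the distance itself.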
Stem-and-leaf plot
stem(df_DN$"Distance")
##
## The decimal point is at the |
##
## 0 | 67
## 2 |
## 4 | 7
stem(df_DN$"Distance", scale = 2)
##
## The decimal point is at the |
##
## 0 | 6
## 1 | 7
## 2 |
## 3 |
## 4 |
## 5 | 7
Time series
library(forecast)
## Warning: package 'forecast' was built under R version 4.1.1
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
data_serie<- ts(df_DN$Distance, frequency=12, start=2007)
head(data_serie)
## Jan Feb Mar
## 2007 1.70298 5.71058 0.55721
autoplot(data_serie)+
labs(title = "Serie de Deslizamiento", x="Años", y = "Distancia", colour = "#00a0dc") +theme_bw()
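The `ts()` call above assumes equally spaced monthly observations starting in January 2007, so the three events appear as Jan, Feb and Mar 2007 even though the catalog dates them 7/13/07, 8/26/08 and 7/6/11. One possible alternative, sketched here with base R only, is to parse the `Date` strings and plot each distance against its actual date. The data frame `events` is an illustrative name and the values are copied from the listing above.

```r
# Dates and distances for Distrito Nacional, as printed in the report
events <- data.frame(
  Date     = c("7/13/07", "8/26/08", "7/6/11"),
  Distance = c(1.70298, 5.71058, 0.55721)
)

# Parse month/day/two-digit-year strings into Date objects
events$Date <- as.Date(events$Date, format = "%m/%d/%y")

# Keep the events in chronological order
events <- events[order(events$Date), ]
print(events)

# One vertical line per event, placed at its real date
plot(events$Date, events$Distance, type = "h",
     xlab = "Date", ylab = "Distance",
     main = "Landslide events over time")
```

With irregularly spaced events like these, a regular `ts` object is not a natural fit; a date-indexed data frame keeps the true spacing visible.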

Frequency tables
library(questionr)
## Warning: package 'questionr' was built under R version 4.1.1
table <- questionr::freq(Distance, cum = TRUE, sort = "dec", total = TRUE)
knitr::kable(table)
|        |  n|     %|  val%|  %cum| val%cum|
|:-------|--:|-----:|-----:|-----:|-------:|
|0.55721 |  1|  33.3|  33.3|  33.3|    33.3|
|1.70298 |  1|  33.3|  33.3|  66.7|    66.7|
|5.71058 |  1|  33.3|  33.3| 100.0|   100.0|
|Total   |  3| 100.0| 100.0| 100.0|   100.0|
str(table)
## Classes 'freqtab' and 'data.frame': 4 obs. of 5 variables:
## $ n : num 1 1 1 3
## $ % : num 33.3 33.3 33.3 100
## $ val% : num 33.3 33.3 33.3 100
## $ %cum : num 33.3 66.7 100 100
## $ val%cum: num 33.3 66.7 100 100
x <- row.names(table)
y <- table$n
names <- x[1:(length(x)-1)]
freqs <- y[1:(length(y)-1)]
df <- data.frame(x = names, y = freqs)
knitr::kable(df)
|x       |  y|
|:-------|--:|
|0.55721 |  1|
|1.70298 |  1|
|5.71058 |  1|
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="white", fill="blue") +
xlab("Distancia") +
ylab("Frecuencia")

Grouped frequency table
n_sturges = 1 + log(length(Distance))/log(2)
n_sturgesc = ceiling(n_sturges)
n_sturgesf = floor(n_sturges)
n_clases = 0
if (n_sturgesc%%2 == 0) {
n_clases = n_sturgesf
} else {
n_clases = n_sturgesc
}
R = max(Distance) - min(Distance)
w = ceiling(R/n_clases)
bins <- seq(min(Distance), max(Distance) + w, by = w)
bins
## [1] 0.55721 2.55721 4.55721 6.55721
Edades <- cut(Distance, bins)
Freq_table <- transform(table(Distance), Rel_Freq=prop.table(Freq), Cum_Freq=cumsum(Freq))
knitr::kable(Freq_table)
|Distance | Freq|  Rel_Freq| Cum_Freq|
|:--------|----:|---------:|--------:|
|0.55721  |    1| 0.3333333|        1|
|1.70298  |    1| 0.3333333|        2|
|5.71058  |    1| 0.3333333|        3|
str(Freq_table)
## 'data.frame': 3 obs. of 4 variables:
## $ Distance: Factor w/ 3 levels "0.55721","1.70298",..: 1 2 3
## $ Freq : int 1 1 1
## $ Rel_Freq: num 0.333 0.333 0.333
## $ Cum_Freq: int 1 2 3
df <- data.frame(x = Freq_table$Distance, y = Freq_table$Freq)
knitr::kable(df)
|x       |  y|
|:-------|--:|
|0.55721 |  1|
|1.70298 |  1|
|5.71058 |  1|
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="blue", fill="green") +
xlab("Rango de Distance") +
ylab("Frecuencia")
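In the code above, `Freq_table` is built from `table(Distance)` rather than from the classes returned by `cut()`, so every distinct value gets its own row and the table is not actually grouped. A minimal sketch of a grouped version follows. It assumes plain Sturges' rule rounded up (the odd/even adjustment is left out; for these three values both give 3 classes) and uses `include.lowest = TRUE` so the minimum is not dropped from the first class. The names `classes` and `grouped` are illustrative.

```r
# Distances for Distrito Nacional (values from the report)
distance <- c(1.70298, 5.71058, 0.55721)

# Sturges' rule for the number of classes, rounded up
n_classes <- ceiling(1 + log2(length(distance)))

# Class width: range divided by number of classes, rounded up
width <- ceiling((max(distance) - min(distance)) / n_classes)

# n_classes + 1 break points, starting at the minimum
breaks <- seq(min(distance), by = width, length.out = n_classes + 1)

# Assign each value to a class; include.lowest keeps the minimum
classes <- cut(distance, breaks, include.lowest = TRUE)

# Count per class, then relative and cumulative frequencies
freq <- table(classes)
grouped <- data.frame(
  class    = names(freq),
  freq     = as.vector(freq),
  rel_freq = as.vector(prop.table(freq)),
  cum_freq = cumsum(as.vector(freq))
)
print(grouped)
```

The break points match the `bins` vector printed above (0.55721, 2.55721, 4.55721, 6.55721); only the counting step differs.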

People affected by landslides
summary(df_DN$Distance)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5572 1.1301 1.7030 2.6569 3.7068 5.7106
library(pastecs)
## Warning: package 'pastecs' was built under R version 4.1.1
##
## Attaching package: 'pastecs'
## The following objects are masked from 'package:dplyr':
##
## first, last
stat.desc(df_DN)
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## id Date time Continent Country country_code State
## nbr.val 3.000000e+00 NA NA NA NA NA NA
## nbr.null 0.000000e+00 NA NA NA NA NA NA
## nbr.na 0.000000e+00 NA NA NA NA NA NA
## min 1.240000e+02 NA NA NA NA NA NA
## max 3.736000e+03 NA NA NA NA NA NA
## range 3.612000e+03 NA NA NA NA NA NA
## sum 4.606000e+03 NA NA NA NA NA NA
## median 7.460000e+02 NA NA NA NA NA NA
## mean 1.535333e+03 NA NA NA NA NA NA
## SE.mean 1.114887e+03 NA NA NA NA NA NA
## CI.mean.0.95 4.796973e+03 NA NA NA NA NA NA
## var 3.728921e+06 NA NA NA NA NA NA
## std.dev 1.931042e+03 NA NA NA NA NA NA
## coef.var 1.257734e+00 NA NA NA NA NA NA
## population City Distance location_description latitude
## nbr.val 3.000000e+00 NA 3.000000 NA 3.000000000
## nbr.null 0.000000e+00 NA 0.000000 NA 0.000000000
## nbr.na 0.000000e+00 NA 0.000000 NA 0.000000000
## min 1.045700e+04 NA 0.557210 NA 18.475700000
## max 2.201941e+06 NA 5.710580 NA 18.550000000
## range 2.191484e+06 NA 5.153370 NA 0.074300000
## sum 2.225854e+06 NA 7.970770 NA 55.525700000
## median 1.345600e+04 NA 1.702980 NA 18.500000000
## mean 7.419513e+05 NA 2.656923 NA 18.508566667
## SE.mean 7.299953e+05 NA 1.562243 NA 0.021872078
## CI.mean.0.95 3.140916e+06 NA 6.721790 NA 0.094107954
## var 1.598680e+12 NA 7.321812 NA 0.001435163
## std.dev 1.264389e+06 NA 2.705885 NA 0.037883550
## coef.var 1.704140e+00 NA 1.018428 NA 0.002046812
## longitude geolocation hazard_type landslide_type
## nbr.val 3.000000e+00 NA NA NA
## nbr.null 0.000000e+00 NA NA NA
## nbr.na 0.000000e+00 NA NA NA
## min -6.998330e+01 NA NA NA
## max -6.991400e+01 NA NA NA
## range 6.930000e-02 NA NA NA
## sum -2.098173e+02 NA NA NA
## median -6.992000e+01 NA NA NA
## mean -6.993910e+01 NA NA NA
## SE.mean 2.216777e-02 NA NA NA
## CI.mean.0.95 9.538021e-02 NA NA NA
## var 1.474230e-03 NA NA NA
## std.dev 3.839570e-02 NA NA NA
## coef.var -5.489877e-04 NA NA NA
## landslide_size trigger storm_name injuries fatalities source_name
## nbr.val NA NA NA 0 2.000000 NA
## nbr.null NA NA NA 0 0.000000 NA
## nbr.na NA NA NA 3 1.000000 NA
## min NA NA NA Inf 1.000000 NA
## max NA NA NA -Inf 8.000000 NA
## range NA NA NA -Inf 7.000000 NA
## sum NA NA NA 0 9.000000 NA
## median NA NA NA NA 4.500000 NA
## mean NA NA NA NaN 4.500000 NA
## SE.mean NA NA NA NA 3.500000 NA
## CI.mean.0.95 NA NA NA NaN 44.471717 NA
## var NA NA NA NA 24.500000 NA
## std.dev NA NA NA NA 4.949747 NA
## coef.var NA NA NA NA 1.099944 NA
## source_link
## nbr.val NA
## nbr.null NA
## nbr.na NA
## min NA
## max NA
## range NA
## sum NA
## median NA
## mean NA
## SE.mean NA
## CI.mean.0.95 NA
## var NA
## std.dev NA
## coef.var NA
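Most of the `NA` columns and the `min`/`max` warnings above come from passing the whole data frame, including text columns and the all-missing `injuries` column, to `stat.desc()`. One way to avoid this is to keep only numeric columns that contain at least one observed value before summarizing. The sketch below uses base R only and a small stand-in data frame (`dn`) built from values shown in this report; the filtered result could equally be handed to `stat.desc()`.

```r
# Stand-in for df_DN: one text column, one numeric, one all-missing numeric
dn <- data.frame(
  City     = c("San Carlos", "La Agustina", "Santo Domingo"),
  Distance = c(1.70298, 5.71058, 0.55721),
  injuries = c(NA_real_, NA_real_, NA_real_)
)

# Keep numeric columns that have at least one non-missing value
is_usable <- sapply(dn, function(col) is.numeric(col) && any(!is.na(col)))
dn_num <- dn[, is_usable, drop = FALSE]

# Basic descriptive statistics for each remaining column
stats <- sapply(dn_num, function(col) {
  c(n    = sum(!is.na(col)),
    mean = mean(col, na.rm = TRUE),
    sd   = sd(col, na.rm = TRUE))
})
print(stats)
```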
Box-and-whisker plot
boxplot(Distance, horizontal=TRUE, col='steelblue')

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v tibble 3.1.3 v stringr 1.4.0
## v tidyr 1.1.3 v forcats 0.5.1
## v purrr 0.3.4
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x tidyr::extract() masks pastecs::extract()
## x dplyr::filter() masks stats::filter()
## x pastecs::first() masks dplyr::first()
## x dplyr::lag() masks stats::lag()
## x pastecs::last() masks dplyr::last()
library(hrbrthemes)
## Warning: package 'hrbrthemes' was built under R version 4.1.1
## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
## Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
## if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
library(viridis)
## Warning: package 'viridis' was built under R version 4.1.1
## Loading required package: viridisLite
df <- data.frame(Distance)
df %>% ggplot(aes(x = "", y = Distance)) +
geom_boxplot(color="red", fill="orange", alpha=0.5) +
theme_ipsum() +
theme(legend.position="none", plot.title = element_text(size=11)) +
ggtitle("Deslizamientos") +
coord_flip() +
xlab("") +
ylab("")
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

library(readr)
library(knitr)
df <- read.csv("https://raw.githubusercontent.com/lihkir/AnalisisEstadisticoUN/main/Data/catalog.csv")
library(dplyr)
colnames(df)[4] <- "Continent"
colnames(df)[10] <- "Distance"
colnames(df)[5] <- "Country"
colnames(df)[7] <- "State"
colnames(df)[9] <- "City"
colnames(df)[2] <- "Date"
La Vega
Landslides in the cities of La Vega
library(readr)
library(knitr)
df_LV <- subset (df, State == "La Vega")
df_LV %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 58 Dominican Republic La Vega Río Verde Abajo 3.72637 11/1/07
## 833 Dominican Republic La Vega Constanza 0.52969 11/18/11
head(df_LV)
## id Date time Continent Country country_code State
## 58 343 11/1/07 <NA> Dominican Republic DO La Vega
## 833 4051 11/18/11 <NA> Dominican Republic DO La Vega
## population City Distance location_description latitude
## 58 3613 Río Verde Abajo 3.72637 19.3050
## 833 29481 Constanza 0.52969 18.9045
## longitude geolocation hazard_type landslide_type
## 58 -70.600 (19.305, -70.599999999999994) Landslide Complex
## 833 -70.744 (18.904499999999999, -70.744) Landslide Mudslide
## landslide_size trigger storm_name injuries fatalities
## 58 Large Tropical cyclone Tropical Storm Noel NA 68
## 833 Medium Downpour NA 0
## source_name
## 58 United Nations Development Programme - Relief Web
## 833
## source_link
## 58 http://www.reliefweb.int/rw/fullMaps_Am.nsf/luFullMap/CEB72F0756431A7CC125738D003E2EF4/$File/ifrc_TC_carib071108.pdf?OpenElement
## 833 http://www.dominicantoday.com/dr/local/2011/11/18/41684/Mudslides-halt-traffic-to-Constanza
ggplot(data=df_LV, aes(x=City, y=Distance)) + geom_bar(stat="identity", color="blue", fill="white")

Pie chart
ggplot(df_LV,aes(x="La Vega",y=Distance, fill=City))+
geom_bar(stat = "identity",
color="white")+
geom_text(aes(label=(Distance*1)),
position=position_stack(vjust=0.5),color="white",size=4)+
coord_polar(theta = "y")+
labs(title="Gráfico de Deslizamiento")

Pareto chart
City with the largest landslide
library(qcc)
Distance <- df_LV$Distance
names(Distance) <- df_LV$City
pareto.chart(Distance,
ylab="Distance",
col = heat.colors(length(Distance)),
cumperc = seq(0, 100, by = 10),
ylab2 = "Porcentaje acumulado",
main = "CIUDADES CON MAYORES DESLIZAMIENTOS"
)

##
## Pareto chart analysis for Distance
## Frequency Cum.Freq. Percentage Cum.Percent.
## Río Verde Abajo 3.72637 3.72637 87.55445 87.55445
## Constanza 0.52969 4.25606 12.44555 100.00000
Stem-and-leaf plot
stem(df_LV$"Distance")
##
## The decimal point is at the |
##
## 0 | 5
## 1 |
## 2 |
## 3 | 7
stem(df_LV$"Distance", scale = 2)
##
## The decimal point is at the |
##
## 0 | 5
## 1 |
## 1 |
## 2 |
## 2 |
## 3 |
## 3 | 7
Time series
library(forecast)
data_serie<- ts(df_LV$Distance, frequency=12, start=2007)
head(data_serie)
## Jan Feb
## 2007 3.72637 0.52969
autoplot(data_serie)+
labs(title = "Serie de Deslizamiento", x="Años", y = "Distancia", colour = "#00a0dc") +theme_bw()

Frequency tables
library(questionr)
table <- questionr::freq(Distance, cum = TRUE, sort = "dec", total = TRUE)
knitr::kable(table)
|        |  n|   %| val%| %cum| val%cum|
|:-------|--:|---:|----:|----:|-------:|
|0.52969 |  1|  50|   50|   50|      50|
|3.72637 |  1|  50|   50|  100|     100|
|Total   |  2| 100|  100|  100|     100|
str(table)
## Classes 'freqtab' and 'data.frame': 3 obs. of 5 variables:
## $ n : num 1 1 2
## $ % : num 50 50 100
## $ val% : num 50 50 100
## $ %cum : num 50 100 100
## $ val%cum: num 50 100 100
x <- row.names(table)
y <- table$n
names <- x[1:(length(x)-1)]
freqs <- y[1:(length(y)-1)]
df <- data.frame(x = names, y = freqs)
knitr::kable(df)
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="white", fill="blue") +
xlab("Distancia") +
ylab("Frecuencia")

Grouped frequency table
n_sturges = 1 + log(length(Distance))/log(2)
n_sturgesc = ceiling(n_sturges)
n_sturgesf = floor(n_sturges)
n_clases = 0
if (n_sturgesc%%2 == 0) {
n_clases = n_sturgesf
} else {
n_clases = n_sturgesc
}
R = max(Distance) - min(Distance)
w = ceiling(R/n_clases)
bins <- seq(min(Distance), max(Distance) + w, by = w)
bins
## [1] 0.52969 2.52969 4.52969
Edades <- cut(Distance, bins)
Freq_table <- transform(table(Distance), Rel_Freq=prop.table(Freq), Cum_Freq=cumsum(Freq))
knitr::kable(Freq_table)
|Distance | Freq| Rel_Freq| Cum_Freq|
|:--------|----:|--------:|--------:|
|0.52969  |    1|      0.5|        1|
|3.72637  |    1|      0.5|        2|
str(Freq_table)
## 'data.frame': 2 obs. of 4 variables:
## $ Distance: Factor w/ 2 levels "0.52969","3.72637": 1 2
## $ Freq : int 1 1
## $ Rel_Freq: num 0.5 0.5
## $ Cum_Freq: int 1 2
df <- data.frame(x = Freq_table$Distance, y = Freq_table$Freq)
knitr::kable(df)
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="blue", fill="green") +
xlab("Rango de Distance") +
ylab("Frecuencia")

People affected by landslides
summary(df_LV$Distance)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5297 1.3289 2.1280 2.1280 2.9272 3.7264
library(pastecs)
stat.desc(df_LV)
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## id Date time Continent Country country_code State
## nbr.val 2.000000e+00 NA NA NA NA NA NA
## nbr.null 0.000000e+00 NA NA NA NA NA NA
## nbr.na 0.000000e+00 NA NA NA NA NA NA
## min 3.430000e+02 NA NA NA NA NA NA
## max 4.051000e+03 NA NA NA NA NA NA
## range 3.708000e+03 NA NA NA NA NA NA
## sum 4.394000e+03 NA NA NA NA NA NA
## median 2.197000e+03 NA NA NA NA NA NA
## mean 2.197000e+03 NA NA NA NA NA NA
## SE.mean 1.854000e+03 NA NA NA NA NA NA
## CI.mean.0.95 2.355730e+04 NA NA NA NA NA NA
## var 6.874632e+06 NA NA NA NA NA NA
## std.dev 2.621952e+03 NA NA NA NA NA NA
## coef.var 1.193424e+00 NA NA NA NA NA NA
## population City Distance location_description latitude
## nbr.val 2.000000e+00 NA 2.000000 NA 2.00000000
## nbr.null 0.000000e+00 NA 0.000000 NA 0.00000000
## nbr.na 0.000000e+00 NA 0.000000 NA 0.00000000
## min 3.613000e+03 NA 0.529690 NA 18.90450000
## max 2.948100e+04 NA 3.726370 NA 19.30500000
## range 2.586800e+04 NA 3.196680 NA 0.40050000
## sum 3.309400e+04 NA 4.256060 NA 38.20950000
## median 1.654700e+04 NA 2.128030 NA 19.10475000
## mean 1.654700e+04 NA 2.128030 NA 19.10475000
## SE.mean 1.293400e+04 NA 1.598340 NA 0.20025000
## CI.mean.0.95 1.643421e+05 NA 20.308835 NA 2.54441750
## var 3.345767e+08 NA 5.109382 NA 0.08020013
## std.dev 1.829144e+04 NA 2.260394 NA 0.28319627
## coef.var 1.105423e+00 NA 1.062200 NA 0.01482334
## longitude geolocation hazard_type landslide_type
## nbr.val 2.000000e+00 NA NA NA
## nbr.null 0.000000e+00 NA NA NA
## nbr.na 0.000000e+00 NA NA NA
## min -7.074400e+01 NA NA NA
## max -7.060000e+01 NA NA NA
## range 1.440000e-01 NA NA NA
## sum -1.413440e+02 NA NA NA
## median -7.067200e+01 NA NA NA
## mean -7.067200e+01 NA NA NA
## SE.mean 7.200000e-02 NA NA NA
## CI.mean.0.95 9.148467e-01 NA NA NA
## var 1.036800e-02 NA NA NA
## std.dev 1.018234e-01 NA NA NA
## coef.var -1.440788e-03 NA NA NA
## landslide_size trigger storm_name injuries fatalities source_name
## nbr.val NA NA NA 0 2.000000 NA
## nbr.null NA NA NA 0 1.000000 NA
## nbr.na NA NA NA 2 0.000000 NA
## min NA NA NA Inf 0.000000 NA
## max NA NA NA -Inf 68.000000 NA
## range NA NA NA -Inf 68.000000 NA
## sum NA NA NA 0 68.000000 NA
## median NA NA NA NA 34.000000 NA
## mean NA NA NA NaN 34.000000 NA
## SE.mean NA NA NA NA 34.000000 NA
## CI.mean.0.95 NA NA NA NaN 432.010961 NA
## var NA NA NA NA 2312.000000 NA
## std.dev NA NA NA NA 48.083261 NA
## coef.var NA NA NA NA 1.414214 NA
## source_link
## nbr.val NA
## nbr.null NA
## nbr.na NA
## min NA
## max NA
## range NA
## sum NA
## median NA
## mean NA
## SE.mean NA
## CI.mean.0.95 NA
## var NA
## std.dev NA
## coef.var NA
Box-and-whisker plot
boxplot(Distance, horizontal=TRUE, col='steelblue')

library(tidyverse)
library(hrbrthemes)
library(viridis)
df <- data.frame(Distance)
df %>% ggplot(aes(x = "", y = Distance)) +
geom_boxplot(color="red", fill="orange", alpha=0.5) +
theme_ipsum() +
theme(legend.position="none", plot.title = element_text(size=11)) +
ggtitle("Deslizamientos") +
coord_flip() +
xlab("") +
ylab("")
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

library(readr)
library(knitr)
df <- read.csv("https://raw.githubusercontent.com/lihkir/AnalisisEstadisticoUN/main/Data/catalog.csv")
library(dplyr)
colnames(df)[4] <- "Continent"
colnames(df)[10] <- "Distance"
colnames(df)[5] <- "Country"
colnames(df)[7] <- "State"
colnames(df)[9] <- "City"
colnames(df)[2] <- "Date"
Santiago
Landslides in the cities of Santiago
library(readr)
library(knitr)
df_St <- subset (df, State == "Santiago")
df_St %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 64 Dominican Republic Santiago Santiago de los Caballeros 1.10868 12/11/07
## 178 Dominican Republic Santiago Pedro García 4.86398 2/12/09
## 212 Dominican Republic Santiago Tamboril 4.31327 9/20/09
## 750 Dominican Republic Santiago San José de Las Matas 2.72462 6/3/11
head(df_St)
## id Date time Continent Country country_code State
## 64 388 12/11/07 <NA> Dominican Republic DO Santiago
## 178 984 2/12/09 <NA> Dominican Republic DO Santiago
## 212 1178 9/20/09 <NA> Dominican Republic DO Santiago
## 750 3569 6/3/11 <NA> Dominican Republic DO Santiago
## population City Distance location_description
## 64 1200000 Santiago de los Caballeros 1.10868
## 178 1457 Pedro García 4.86398
## 212 23304 Tamboril 4.31327
## 750 9853 San José de Las Matas 2.72462
## latitude longitude geolocation hazard_type
## 64 19.4550 -70.7070 (19.454999999999998, -70.706999999999994) Landslide
## 178 19.5500 -70.6390 (19.55, -70.638999999999996) Landslide
## 212 19.5167 -70.5866 (19.5167, -70.586600000000004) Landslide
## 750 19.3556 -70.9189 (19.355599999999999, -70.918899999999994) Landslide
## landslide_type landslide_size trigger storm_name injuries
## 64 Landslide Medium Tropical cyclone Tropical Storm Olga NA
## 178 Mudslide Medium Downpour NA
## 212 Landslide Small Downpour NA
## 750 Landslide Medium Downpour NA
## fatalities source_name
## 64 17 news.gossip.info
## 178 0
## 212 NA
## 750 1
## source_link
## 64 http://clutchmagonline.com/newsgossipinfo/caribbean-storm-death-toll-rises/
## 178 http://us.puerto-plata-live.com/puerto-plata/news/year-2009/february-2009.html
## 212 http://www.laht.com/article.asp?CategoryId=14092&ArticleId=327347
## 750 http://english.peopledaily.com.cn/90001/90777/90852/7402423.html
ggplot(data=df_St, aes(x=City, y=Distance)) + geom_bar(stat="identity", color="blue", fill="white")

Pie chart
ggplot(df_St,aes(x="Santiago",y=Distance, fill=City))+
geom_bar(stat = "identity",
color="white")+
geom_text(aes(label=(Distance*1)),
position=position_stack(vjust=0.5),color="white",size=4)+
coord_polar(theta = "y")+
labs(title="Gráfico de Deslizamiento")

Pareto chart
City with the largest landslide
library(qcc)
Distance <- df_St$Distance
names(Distance) <- df_St$City
pareto.chart(Distance,
ylab="Distance",
col = heat.colors(length(Distance)),
cumperc = seq(0, 100, by = 10),
ylab2 = "Porcentaje acumulado",
main = "CIUDADES CON MAYORES DESLIZAMIENTOS"
)

##
## Pareto chart analysis for Distance
## Frequency Cum.Freq. Percentage Cum.Percent.
## Pedro García 4.863980 4.863980 37.384891 37.384891
## Tamboril 4.313270 9.177250 33.152096 70.536987
## San José de Las Matas 2.724620 11.901870 20.941620 91.478608
## Santiago de los Caballeros 1.108680 13.010550 8.521392 100.000000
Stem-and-leaf plot
stem(df_St$"Distance")
##
## The decimal point is at the |
##
## 1 | 1
## 2 | 7
## 3 |
## 4 | 39
stem(df_St$"Distance", scale = 2)
##
## The decimal point is at the |
##
## 1 | 1
## 1 |
## 2 |
## 2 | 7
## 3 |
## 3 |
## 4 | 3
## 4 | 9
Time series
library(forecast)
data_serie<- ts(df_St$Distance, frequency=12, start=2007)
head(data_serie)
## Jan Feb Mar Apr
## 2007 1.10868 4.86398 4.31327 2.72462
autoplot(data_serie)+
labs(title = "Serie de Deslizamiento", x="Años", y = "Distancia", colour = "#00a0dc") +theme_bw()

Frequency tables
library(questionr)
table <- questionr::freq(Distance, cum = TRUE, sort = "dec", total = TRUE)
knitr::kable(table)
|        |  n|   %| val%| %cum| val%cum|
|:-------|--:|---:|----:|----:|-------:|
|1.10868 |  1|  25|   25|   25|      25|
|2.72462 |  1|  25|   25|   50|      50|
|4.31327 |  1|  25|   25|   75|      75|
|4.86398 |  1|  25|   25|  100|     100|
|Total   |  4| 100|  100|  100|     100|
str(table)
## Classes 'freqtab' and 'data.frame': 5 obs. of 5 variables:
## $ n : num 1 1 1 1 4
## $ % : num 25 25 25 25 100
## $ val% : num 25 25 25 25 100
## $ %cum : num 25 50 75 100 100
## $ val%cum: num 25 50 75 100 100
x <- row.names(table)
y <- table$n
names <- x[1:(length(x)-1)]
freqs <- y[1:(length(y)-1)]
df <- data.frame(x = names, y = freqs)
knitr::kable(df)
|x       |  y|
|:-------|--:|
|1.10868 |  1|
|2.72462 |  1|
|4.31327 |  1|
|4.86398 |  1|
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="white", fill="blue") +
xlab("Distancia") +
ylab("Frecuencia")

Grouped frequency table
n_sturges = 1 + log(length(Distance))/log(2)
n_sturgesc = ceiling(n_sturges)
n_sturgesf = floor(n_sturges)
n_clases = 0
if (n_sturgesc%%2 == 0) {
n_clases = n_sturgesf
} else {
n_clases = n_sturgesc
}
R = max(Distance) - min(Distance)
w = ceiling(R/n_clases)
bins <- seq(min(Distance), max(Distance) + w, by = w)
bins
## [1] 1.10868 3.10868 5.10868
Edades <- cut(Distance, bins)
Freq_table <- transform(table(Distance), Rel_Freq=prop.table(Freq), Cum_Freq=cumsum(Freq))
knitr::kable(Freq_table)
|Distance | Freq| Rel_Freq| Cum_Freq|
|:--------|----:|--------:|--------:|
|1.10868  |    1|     0.25|        1|
|2.72462  |    1|     0.25|        2|
|4.31327  |    1|     0.25|        3|
|4.86398  |    1|     0.25|        4|
str(Freq_table)
## 'data.frame': 4 obs. of 4 variables:
## $ Distance: Factor w/ 4 levels "1.10868","2.72462",..: 1 2 3 4
## $ Freq : int 1 1 1 1
## $ Rel_Freq: num 0.25 0.25 0.25 0.25
## $ Cum_Freq: int 1 2 3 4
df <- data.frame(x = Freq_table$Distance, y = Freq_table$Freq)
knitr::kable(df)
|x       |  y|
|:-------|--:|
|1.10868 |  1|
|2.72462 |  1|
|4.31327 |  1|
|4.86398 |  1|
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="blue", fill="green") +
xlab("Rango de Distance") +
ylab("Frecuencia")

People affected by landslides
summary(df_St$Distance)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.109 2.321 3.519 3.253 4.451 4.864
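The quartiles reported by `summary()` come from R's default quantile rule (type 7), which interpolates linearly between neighbouring order statistics. The worked sketch below recomputes them by hand for the four Santiago distances; the helper `type7` is a hypothetical name written for this illustration and is checked against the built-in `quantile()`.

```r
# Santiago distances from the report, sorted ascending
x <- sort(c(1.10868, 4.86398, 4.31327, 2.72462))
n <- length(x)

# Type-7 quantile: position h = (n - 1) * p + 1, then linear interpolation
type7 <- function(p) {
  h  <- (n - 1) * p + 1
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

probs     <- c(0.25, 0.5, 0.75)
q_manual  <- sapply(probs, type7)
q_builtin <- unname(quantile(x, probs))  # default type = 7

print(rbind(manual = q_manual, builtin = q_builtin))
```

For the first quartile, h = 1.75, so the result lies three quarters of the way from 1.10868 to 2.72462, which is the 2.321 shown by `summary()` after rounding.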
library(pastecs)
stat.desc(df_St)
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## id Date time Continent Country country_code State
## nbr.val 4.000000e+00 NA NA NA NA NA NA
## nbr.null 0.000000e+00 NA NA NA NA NA NA
## nbr.na 0.000000e+00 NA NA NA NA NA NA
## min 3.880000e+02 NA NA NA NA NA NA
## max 3.569000e+03 NA NA NA NA NA NA
## range 3.181000e+03 NA NA NA NA NA NA
## sum 6.119000e+03 NA NA NA NA NA NA
## median 1.081000e+03 NA NA NA NA NA NA
## mean 1.529750e+03 NA NA NA NA NA NA
## SE.mean 7.002205e+02 NA NA NA NA NA NA
## CI.mean.0.95 2.228414e+03 NA NA NA NA NA NA
## var 1.961235e+06 NA NA NA NA NA NA
## std.dev 1.400441e+03 NA NA NA NA NA NA
## coef.var 9.154705e-01 NA NA NA NA NA NA
## population City Distance location_description latitude
## nbr.val 4.000000e+00 NA 4.0000000 NA 4.000000000
## nbr.null 0.000000e+00 NA 0.0000000 NA 0.000000000
## nbr.na 0.000000e+00 NA 0.0000000 NA 0.000000000
## min 1.457000e+03 NA 1.1086800 NA 19.355600000
## max 1.200000e+06 NA 4.8639800 NA 19.550000000
## range 1.198543e+06 NA 3.7553000 NA 0.194400000
## sum 1.234614e+06 NA 13.0105500 NA 77.877300000
## median 1.657850e+04 NA 3.5189450 NA 19.485850000
## mean 3.086535e+05 NA 3.2526375 NA 19.469325000
## SE.mean 2.971496e+05 NA 0.8464003 NA 0.042711657
## CI.mean.0.95 9.456625e+05 NA 2.6936236 NA 0.135927554
## var 3.531914e+11 NA 2.8655741 NA 0.007297143
## std.dev 5.942991e+05 NA 1.6928007 NA 0.085423314
## coef.var 1.925457e+00 NA 0.5204394 NA 0.004387585
## longitude geolocation hazard_type landslide_type
## nbr.val 4.000000e+00 NA NA NA
## nbr.null 0.000000e+00 NA NA NA
## nbr.na 0.000000e+00 NA NA NA
## min -7.091890e+01 NA NA NA
## max -7.058660e+01 NA NA NA
## range 3.323000e-01 NA NA NA
## sum -2.828515e+02 NA NA NA
## median -7.067300e+01 NA NA NA
## mean -7.071287e+01 NA NA NA
## SE.mean 7.296329e-02 NA NA NA
## CI.mean.0.95 2.322018e-01 NA NA NA
## var 2.129457e-02 NA NA NA
## std.dev 1.459266e-01 NA NA NA
## coef.var -2.063649e-03 NA NA NA
## landslide_size trigger storm_name injuries fatalities source_name
## nbr.val NA NA NA 0 3.000000 NA
## nbr.null NA NA NA 0 1.000000 NA
## nbr.na NA NA NA 4 1.000000 NA
## min NA NA NA Inf 0.000000 NA
## max NA NA NA -Inf 17.000000 NA
## range NA NA NA -Inf 17.000000 NA
## sum NA NA NA 0 18.000000 NA
## median NA NA NA NA 1.000000 NA
## mean NA NA NA NaN 6.000000 NA
## SE.mean NA NA NA NA 5.507571 NA
## CI.mean.0.95 NA NA NA NaN 23.697163 NA
## var NA NA NA NA 91.000000 NA
## std.dev NA NA NA NA 9.539392 NA
## coef.var NA NA NA NA 1.589899 NA
## source_link
## nbr.val NA
## nbr.null NA
## nbr.na NA
## min NA
## max NA
## range NA
## sum NA
## median NA
## mean NA
## SE.mean NA
## CI.mean.0.95 NA
## var NA
## std.dev NA
## coef.var NA
Box-and-whisker plot
boxplot(Distance, horizontal=TRUE, col='steelblue')

library(tidyverse)
library(hrbrthemes)
library(viridis)
df <- data.frame(Distance)
df %>% ggplot(aes(x = "", y = Distance)) +
geom_boxplot(color="red", fill="orange", alpha=0.5) +
theme_ipsum() +
theme(legend.position="none", plot.title = element_text(size=11)) +
ggtitle("Landslides") +
coord_flip() +
xlab("") +
ylab("")
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

library(readr)
library(knitr)
df <- read.csv("https://raw.githubusercontent.com/lihkir/AnalisisEstadisticoUN/main/Data/catalog.csv")
library(dplyr)
colnames(df)[4] <- "Continent"
colnames(df)[10] <- "Distance"
colnames(df)[5] <- "Country"
colnames(df)[7] <- "State"
colnames(df)[9] <- "City"
colnames(df)[2] <- "Date"
Hato Mayor
Landslides in the cities of Hato Mayor
library(readr)
library(knitr)
df_HM <- subset (df, State == "Hato Mayor")
df_HM %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 132 Dominican Republic Hato Mayor Sabana de La Mar 0.75284 8/17/08
head(df_HM)
## id Date time Continent Country country_code State
## 132 724 8/17/08 <NA> Dominican Republic DO Hato Mayor
## population City Distance location_description latitude
## 132 13977 Sabana de La Mar 0.75284 19.056
## longitude geolocation hazard_type
## 132 -69.3822 (19.056000000000001, -69.382199999999997) Landslide
## landslide_type landslide_size trigger storm_name injuries
## 132 Complex Medium Tropical cyclone Tropical Storm Fay NA
## fatalities source_name
## 132 NA
## source_link
## 132 http://www.dominicantoday.com/dr/economy/2008/8/18/29085/Storms-downpours-block-transit-on-newest-Dominican-highway
ggplot(data=df_HM, aes(x=City, y=Distance)) + geom_bar(stat="identity", color="blue", fill="white")

Pie chart
ggplot(df_HM,aes(x="Hato Mayor",y=Distance, fill=City))+
geom_bar(stat = "identity",
color="white")+
geom_text(aes(label=(Distance*1)),
position=position_stack(vjust=0.5),color="white",size=4)+
coord_polar(theta = "y")+
labs(title="Landslide chart")

Pareto chart
City with the greatest landslide distance
library(qcc)
Distance <- df_HM$Distance
names(Distance) <- df_HM$City
pareto.chart(Distance,
ylab="Distance",
col = heat.colors(length(Distance)),
cumperc = seq(0, 100, by = 10),
ylab2 = "Cumulative percentage",
main = "CITIES WITH THE LARGEST LANDSLIDES"
)

##
## Pareto chart analysis for Distance
## Frequency Cum.Freq. Percentage Cum.Percent.
## Sabana de La Mar 0.75284 0.75284 100.00000 100.00000
Stem-and-leaf plot
stem(df_HM$"Distance")
stem(df_HM$"Distance", scale = 2)
Time series
library(forecast)
data_serie<- ts(df_HM$Distance, frequency=12, start=2007)
head(data_serie)
## Jan
## 2007 0.75284
autoplot(data_serie)+
labs(title = "Landslide series", x = "Year", y = "Distance", colour = "#00a0dc") + theme_bw()
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?

Frequency tables
library(questionr)
table <- questionr::freq(Distance, cum = TRUE, sort = "dec", total = TRUE)
knitr::kable(table)
| | n | % | val% | %cum | val%cum |
|:--|--:|--:|--:|--:|--:|
| 0.75284 | 1 | 100 | 100 | 100 | 100 |
| Total | 1 | 100 | 100 | 100 | 100 |
str(table)
## Classes 'freqtab' and 'data.frame': 2 obs. of 5 variables:
## $ n : num 1 1
## $ % : num 100 100
## $ val% : num 100 100
## $ %cum : num 100 100
## $ val%cum: num 100 100
Grouped frequency table
n_sturges = 1 + log(length(Distance))/log(2)
n_sturgesc = ceiling(n_sturges)
n_sturgesf = floor(n_sturges)
n_clases = 0
if (n_sturgesc%%2 == 0) {
n_clases = n_sturgesf
} else {
n_clases = n_sturgesc
}
R = max(Distance) - min(Distance)
w = ceiling(R/n_clases)
# Rebuild the frequency table from the Hato Mayor distances so that it does not
# reuse the object left over from the previous province
Freq_table <- transform(table(Distance), Rel_Freq=prop.table(Freq), Cum_Freq=cumsum(Freq))
str(Freq_table)
## 'data.frame': 1 obs. of 4 variables:
## $ Distance: Factor w/ 1 level "0.75284": 1
## $ Freq : int 1
## $ Rel_Freq: num 1
## $ Cum_Freq: int 1
df <- data.frame(x = Freq_table$Distance, y = Freq_table$Freq)
knitr::kable(df)
| x | y |
|:--|--:|
| 0.75284 | 1 |
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="blue", fill="green") +
xlab("Distance range") +
ylab("Frequency")

x <- row.names(table)
y <- table$n
names <- x[1:(length(x)-1)]
freqs <- y[1:(length(y)-1)]
df <- data.frame(x = names, y = freqs)
knitr::kable(df)
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="white", fill="blue") +
xlab("Distance") +
ylab("Frequency")

People affected by landslides
summary(df_HM$Distance)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.75284 0.75284 0.75284 0.75284 0.75284 0.75284
library(pastecs)
stat.desc(df_HM)
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## id Date time Continent Country country_code State population City
## nbr.val 1 NA NA NA NA NA NA 1 NA
## nbr.null 0 NA NA NA NA NA NA 0 NA
## nbr.na 0 NA NA NA NA NA NA 0 NA
## min 724 NA NA NA NA NA NA 13977 NA
## max 724 NA NA NA NA NA NA 13977 NA
## range 0 NA NA NA NA NA NA 0 NA
## sum 724 NA NA NA NA NA NA 13977 NA
## median 724 NA NA NA NA NA NA 13977 NA
## mean 724 NA NA NA NA NA NA 13977 NA
## SE.mean NA NA NA NA NA NA NA NA NA
## CI.mean.0.95 NaN NA NA NA NA NA NA NaN NA
## var NA NA NA NA NA NA NA NA NA
## std.dev NA NA NA NA NA NA NA NA NA
## coef.var NA NA NA NA NA NA NA NA NA
## Distance location_description latitude longitude geolocation
## nbr.val 1.00000 NA 1.000 1.0000 NA
## nbr.null 0.00000 NA 0.000 0.0000 NA
## nbr.na 0.00000 NA 0.000 0.0000 NA
## min 0.75284 NA 19.056 -69.3822 NA
## max 0.75284 NA 19.056 -69.3822 NA
## range 0.00000 NA 0.000 0.0000 NA
## sum 0.75284 NA 19.056 -69.3822 NA
## median 0.75284 NA 19.056 -69.3822 NA
## mean 0.75284 NA 19.056 -69.3822 NA
## SE.mean NA NA NA NA NA
## CI.mean.0.95 NaN NA NaN NaN NA
## var NA NA NA NA NA
## std.dev NA NA NA NA NA
## coef.var NA NA NA NA NA
## hazard_type landslide_type landslide_size trigger storm_name
## nbr.val NA NA NA NA NA
## nbr.null NA NA NA NA NA
## nbr.na NA NA NA NA NA
## min NA NA NA NA NA
## max NA NA NA NA NA
## range NA NA NA NA NA
## sum NA NA NA NA NA
## median NA NA NA NA NA
## mean NA NA NA NA NA
## SE.mean NA NA NA NA NA
## CI.mean.0.95 NA NA NA NA NA
## var NA NA NA NA NA
## std.dev NA NA NA NA NA
## coef.var NA NA NA NA NA
## injuries fatalities source_name source_link
## nbr.val 0 0 NA NA
## nbr.null 0 0 NA NA
## nbr.na 1 1 NA NA
## min Inf Inf NA NA
## max -Inf -Inf NA NA
## range -Inf -Inf NA NA
## sum 0 0 NA NA
## median NA NA NA NA
## mean NaN NaN NA NA
## SE.mean NA NA NA NA
## CI.mean.0.95 NaN NaN NA NA
## var NA NA NA NA
## std.dev NA NA NA NA
## coef.var NA NA NA NA
Box-and-whisker plot
boxplot(Distance, horizontal=TRUE, col='steelblue')

library(tidyverse)
library(hrbrthemes)
library(viridis)
df <- data.frame(Distance)
df %>% ggplot(aes(x = "", y = Distance)) +
geom_boxplot(color="red", fill="orange", alpha=0.5) +
theme_ipsum() +
theme(legend.position="none", plot.title = element_text(size=11)) +
ggtitle("Landslides") +
coord_flip() +
xlab("") +
ylab("")
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

library(readr)
library(knitr)
df <- read.csv("https://raw.githubusercontent.com/lihkir/AnalisisEstadisticoUN/main/Data/catalog.csv")
library(dplyr)
colnames(df)[4] <- "Continent"
colnames(df)[5] <- "Country"
colnames(df)[7] <- "State"
colnames(df)[9] <- "City"
colnames(df)[10] <- "Distance"
colnames(df)[2] <- "Date"
Santo Domingo
Landslides in the cities of Santo Domingo
library(readr)
library(knitr)
df_SD <- subset (df, State == "Santo Domingo")
df_SD %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 1394 Dominican Republic Santo Domingo Santo Domingo Este 3.98059 8/3/14
head(df_SD)
## id Date time Continent Country country_code State
## 1394 6706 8/3/14 <NA> Dominican Republic DO Santo Domingo
## population City Distance location_description latitude
## 1394 0 Santo Domingo Este 3.98059 Urban area 18.5225
## longitude geolocation hazard_type
## 1394 -69.8693 (18.522500000000001, -69.869299999999996) Landslide
## landslide_type landslide_size trigger storm_name injuries
## 1394 Landslide Medium Tropical cyclone Bertha 0
## fatalities source_name
## 1394 0 Zona Oriental
## source_link
## 1394 http://www.delazonaoriental.net/2014/08/03/derrumbes-y-deslizamientos-de-tierra-afectan-varias-viviendas-en-la-barquita-tras-paso-tormenta-bertha/
ggplot(data=df_SD, aes(x=City, y=Distance)) + geom_bar(stat="identity", color="blue", fill="white")

Pie chart
ggplot(df_SD,aes(x="Santo Domingo",y=Distance, fill=City))+
geom_bar(stat = "identity",
color="white")+
geom_text(aes(label=(Distance*1)),
position=position_stack(vjust=0.5),color="white",size=4)+
coord_polar(theta = "y")+
labs(title="Landslide chart")

Pareto chart
City with the greatest landslide distance
library(qcc)
Distance <- df_SD$Distance
names(Distance) <- df_SD$City
pareto.chart(Distance,
ylab="Distance",
col = heat.colors(length(Distance)),
cumperc = seq(0, 100, by = 10),
ylab2 = "Cumulative percentage",
main = "CITIES WITH THE LARGEST LANDSLIDES"
)

##
## Pareto chart analysis for Distance
## Frequency Cum.Freq. Percentage Cum.Percent.
## Santo Domingo Este 3.98059 3.98059 100.00000 100.00000
Stem-and-leaf plot
stem(df_SD$"Distance")
stem(df_SD$"Distance", scale = 2)
Time series
library(forecast)
data_serie<- ts(df_SD$Distance, frequency=12, start=2007)
head(data_serie)
## Jan
## 2007 3.98059
autoplot(data_serie)+
labs(title = "Landslide series", x = "Year", y = "Distance", colour = "#00a0dc") + theme_bw()
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?

Frequency tables
library(questionr)
table <- questionr::freq(Distance, cum = TRUE, sort = "dec", total = TRUE)
knitr::kable(table)
| | n | % | val% | %cum | val%cum |
|:--|--:|--:|--:|--:|--:|
| 3.98059 | 1 | 100 | 100 | 100 | 100 |
| Total | 1 | 100 | 100 | 100 | 100 |
str(table)
## Classes 'freqtab' and 'data.frame': 2 obs. of 5 variables:
## $ n : num 1 1
## $ % : num 100 100
## $ val% : num 100 100
## $ %cum : num 100 100
## $ val%cum: num 100 100
x <- row.names(table)
y <- table$n
names <- x[1:(length(x)-1)]
freqs <- y[1:(length(y)-1)]
df <- data.frame(x = names, y = freqs)
knitr::kable(df)
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="white", fill="blue") +
xlab("Distance") +
ylab("Frequency")

Grouped frequency table
n_sturges = 1 + log(length(Distance))/log(2)
n_sturgesc = ceiling(n_sturges)
n_sturgesf = floor(n_sturges)
n_clases = 0
if (n_sturgesc%%2 == 0) {
n_clases = n_sturgesf
} else {
n_clases = n_sturgesc
}
R = max(Distance) - min(Distance)
w = ceiling(R/n_clases)
bins <- seq(min(Distance), max(Distance) + w, by = w)
bins
## [1] 3.98059
Edades <- cut(Distance, bins)
Freq_table <- transform(table(Distance), Rel_Freq=prop.table(Freq), Cum_Freq=cumsum(Freq))
knitr::kable(Freq_table)
str(Freq_table)
## 'data.frame': 1 obs. of 4 variables:
## $ Distance: Factor w/ 1 level "3.98059": 1
## $ Freq : int 1
## $ Rel_Freq: num 1
## $ Cum_Freq: int 1
df <- data.frame(x = Freq_table$Distance, y = Freq_table$Freq)
knitr::kable(df)
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="blue", fill="green") +
xlab("Distance range") +
ylab("Frequency")

People affected by landslides
summary(df_SD$Distance)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.981 3.981 3.981 3.981 3.981 3.981
library(pastecs)
stat.desc(df_SD)
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## id Date time Continent Country country_code State population
## nbr.val 1 NA NA NA NA NA NA 1
## nbr.null 0 NA NA NA NA NA NA 1
## nbr.na 0 NA NA NA NA NA NA 0
## min 6706 NA NA NA NA NA NA 0
## max 6706 NA NA NA NA NA NA 0
## range 0 NA NA NA NA NA NA 0
## sum 6706 NA NA NA NA NA NA 0
## median 6706 NA NA NA NA NA NA 0
## mean 6706 NA NA NA NA NA NA 0
## SE.mean NA NA NA NA NA NA NA NA
## CI.mean.0.95 NaN NA NA NA NA NA NA NaN
## var NA NA NA NA NA NA NA NA
## std.dev NA NA NA NA NA NA NA NA
## coef.var NA NA NA NA NA NA NA NA
## City Distance location_description latitude longitude geolocation
## nbr.val NA 1.00000 NA 1.0000 1.0000 NA
## nbr.null NA 0.00000 NA 0.0000 0.0000 NA
## nbr.na NA 0.00000 NA 0.0000 0.0000 NA
## min NA 3.98059 NA 18.5225 -69.8693 NA
## max NA 3.98059 NA 18.5225 -69.8693 NA
## range NA 0.00000 NA 0.0000 0.0000 NA
## sum NA 3.98059 NA 18.5225 -69.8693 NA
## median NA 3.98059 NA 18.5225 -69.8693 NA
## mean NA 3.98059 NA 18.5225 -69.8693 NA
## SE.mean NA NA NA NA NA NA
## CI.mean.0.95 NA NaN NA NaN NaN NA
## var NA NA NA NA NA NA
## std.dev NA NA NA NA NA NA
## coef.var NA NA NA NA NA NA
## hazard_type landslide_type landslide_size trigger storm_name
## nbr.val NA NA NA NA NA
## nbr.null NA NA NA NA NA
## nbr.na NA NA NA NA NA
## min NA NA NA NA NA
## max NA NA NA NA NA
## range NA NA NA NA NA
## sum NA NA NA NA NA
## median NA NA NA NA NA
## mean NA NA NA NA NA
## SE.mean NA NA NA NA NA
## CI.mean.0.95 NA NA NA NA NA
## var NA NA NA NA NA
## std.dev NA NA NA NA NA
## coef.var NA NA NA NA NA
## injuries fatalities source_name source_link
## nbr.val 1 1 NA NA
## nbr.null 1 1 NA NA
## nbr.na 0 0 NA NA
## min 0 0 NA NA
## max 0 0 NA NA
## range 0 0 NA NA
## sum 0 0 NA NA
## median 0 0 NA NA
## mean 0 0 NA NA
## SE.mean NA NA NA NA
## CI.mean.0.95 NaN NaN NA NA
## var NA NA NA NA
## std.dev NA NA NA NA
## coef.var NA NA NA NA
Box-and-whisker plot
boxplot(Distance, horizontal=TRUE, col='steelblue')

library(tidyverse)
library(hrbrthemes)
library(viridis)
df <- data.frame(Distance)
df %>% ggplot(aes(x = "", y = Distance)) +
geom_boxplot(color="red", fill="orange", alpha=0.5) +
theme_ipsum() +
theme(legend.position="none", plot.title = element_text(size=11)) +
ggtitle("Landslides") +
coord_flip() +
xlab("") +
ylab("")
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

library(readr)
library(knitr)
df <- read.csv("https://raw.githubusercontent.com/lihkir/AnalisisEstadisticoUN/main/Data/catalog.csv")
library(dplyr)
colnames(df)[4] <- "Continent"
colnames(df)[10] <- "Distance"
colnames(df)[5] <- "Country"
colnames(df)[7] <- "State"
colnames(df)[9] <- "City"
colnames(df)[2] <- "Date"
Puerto Plata
Landslides in the cities of Puerto Plata
library(readr)
library(knitr)
df_PP <- subset (df, State == "Puerto Plata")
df_PP %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 211 Dominican Republic Puerto Plata Altamira 0.88500 9/20/09
## 923 Dominican Republic Puerto Plata Puerto Plata 1.19636 12/5/12
## 1395 Dominican Republic Puerto Plata Luperón 1.54885 11/7/14
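Each province section repeats the same subset-and-select step, so it could be wrapped in a small helper. The sketch below is only an illustration: `state_events` and the in-memory `catalog` data frame are hypothetical names, and the rows are copied from the outputs shown in this report rather than read from the CSV file.

```r
# Toy catalog built from rows printed in this report (assumption: not the full CSV)
catalog <- data.frame(
  Country  = "Dominican Republic",
  State    = c("Hato Mayor", "Santo Domingo", "Puerto Plata", "Puerto Plata", "Puerto Plata"),
  City     = c("Sabana de La Mar", "Santo Domingo Este", "Altamira", "Puerto Plata", "Luperon"),
  Distance = c(0.75284, 3.98059, 0.88500, 1.19636, 1.54885),
  Date     = c("8/17/08", "8/3/14", "9/20/09", "12/5/12", "11/7/14")
)

# Hypothetical helper: keep the events of one state and the five columns of interest
state_events <- function(data, state) {
  rows <- data$State == state
  data[rows, c("Country", "State", "City", "Distance", "Date")]
}

df_PP_demo <- state_events(catalog, "Puerto Plata")
print(df_PP_demo)
```

Using a separate name such as `catalog` for the full data set also means that later chunks can reuse `df` for plotting without having to download the CSV again.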
head(df_PP)
## id Date time Continent Country country_code State
## 211 1177 9/20/09 <NA> Dominican Republic DO Puerto Plata
## 923 4655 12/5/12 <NA> Dominican Republic DO Puerto Plata
## 1395 6707 11/7/14 <NA> Dominican Republic DO Puerto Plata
## population City Distance location_description latitude longitude
## 211 4563 Altamira 0.88500 19.6750 -70.8362
## 923 146000 Puerto Plata 1.19636 19.7827 -70.6871
## 1395 4393 Luperón 1.54885 Below road 19.9053 -70.9630
## geolocation hazard_type landslide_type
## 211 (19.675000000000001, -70.836200000000005) Landslide Landslide
## 923 (19.782699999999998, -70.687100000000001) Landslide Landslide
## 1395 (19.9053, -70.962999999999994) Landslide Landslide
## landslide_size trigger storm_name injuries fatalities source_name
## 211 Medium Downpour NA 2
## 923 Medium Rain NA NA
## 1395 Medium Rain 0 0 Hoy
## source_link
## 211 http://www.laht.com/article.asp?CategoryId=14092&ArticleId=327347
## 923 http://www.dominicantoday.com/dr/local/2012/12/5/45992/Crews-clear-Santiago-Puerto-Plata-road-blocked-by-landslides
## 1395 http://hoy.com.do/carretera-luperon-presenta-hundimientos-y-deslizamientos-por-lluvias/
ggplot(data=df_PP, aes(x=City, y=Distance)) + geom_bar(stat="identity", color="blue", fill="white")

Pie chart
ggplot(df_PP,aes(x="Puerto Plata",y=Distance, fill=City))+
geom_bar(stat = "identity",
color="white")+
geom_text(aes(label=(Distance*1)),
position=position_stack(vjust=0.5),color="white",size=6)+
coord_polar(theta = "y")+
labs(title="Landslide chart")

Pareto chart
City with the greatest landslide distance
library(qcc)
Distance <- df_PP$Distance
names(Distance) <- df_PP$City
pareto.chart(Distance,
ylab="Distance",
col = heat.colors(length(Distance)),
cumperc = seq(0, 100, by = 10),
ylab2 = "Cumulative percentage",
main = "CITIES WITH THE LARGEST LANDSLIDES"
)

##
## Pareto chart analysis for Distance
## Frequency Cum.Freq. Percentage Cum.Percent.
## Luperón 1.54885 1.54885 42.66558 42.66558
## Puerto Plata 1.19636 2.74521 32.95567 75.62125
## Altamira 0.88500 3.63021 24.37875 100.00000
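The columns of the Pareto analysis can be reproduced by hand, which makes it clearer what the chart encodes. The following is a minimal base-R sketch using the three Puerto Plata distances printed above; the variable names are illustrative and it does not call `qcc`.

```r
# Distances for Altamira, Puerto Plata and Luperon, as listed in this report
distance <- c(0.88500, 1.19636, 1.54885)

# Pareto ordering: largest contribution first
sorted_distance <- sort(distance, decreasing = TRUE)

pareto <- data.frame(
  frequency   = sorted_distance,
  cum_freq    = cumsum(sorted_distance),
  percentage  = 100 * sorted_distance / sum(sorted_distance),
  cum_percent = cumsum(100 * sorted_distance / sum(sorted_distance))
)
print(pareto)
```

Note that the "frequency" here is the distance value itself, so the percentages describe each city's share of the summed distance rather than a count of events.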
Stem-and-leaf plot
stem(df_PP$"Distance")
##
## The decimal point is 1 digit(s) to the left of the |
##
## 8 | 9
## 10 |
## 12 | 0
## 14 | 5
stem(df_PP$"Distance", scale = 2)
##
## The decimal point is 1 digit(s) to the left of the |
##
## 8 | 9
## 9 |
## 10 |
## 11 |
## 12 | 0
## 13 |
## 14 |
## 15 | 5
Time series
library(forecast)
data_serie<- ts(df_PP$Distance, frequency=12, start=2007)
head(data_serie)
## Jan Feb Mar
## 2007 0.88500 1.19636 1.54885
autoplot(data_serie)+
labs(title = "Landslide series", x = "Year", y = "Distance", colour = "#00a0dc") + theme_bw()
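The `ts()` call above places the three events in January, February and March 2007, whereas the `Date` column records them in 2009, 2012 and 2014. One possible alternative, shown here only as a sketch with illustrative variable names, is to parse the recorded dates and order the events by them.

```r
# Dates and distances for Puerto Plata, copied from the table in this report
events <- data.frame(
  Date     = c("9/20/09", "12/5/12", "11/7/14"),
  Distance = c(0.88500, 1.19636, 1.54885)
)

# The catalog stores dates as month/day/two-digit year
events$Date <- as.Date(events$Date, format = "%m/%d/%y")

# Order chronologically so that a line plot connects events in time order
events <- events[order(events$Date), ]
event_years <- format(events$Date, "%Y")
print(events)
```

A data frame like this can be passed to `ggplot(events, aes(Date, Distance)) + geom_point() + geom_line()` to draw the events at their actual dates.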

Frequency tables
library(questionr)
table <- questionr::freq(Distance, cum = TRUE, sort = "dec", total = TRUE)
knitr::kable(table)
| | n | % | val% | %cum | val%cum |
|:--|--:|--:|--:|--:|--:|
| 0.885 | 1 | 33.3 | 33.3 | 33.3 | 33.3 |
| 1.19636 | 1 | 33.3 | 33.3 | 66.7 | 66.7 |
| 1.54885 | 1 | 33.3 | 33.3 | 100.0 | 100.0 |
| Total | 3 | 100.0 | 100.0 | 100.0 | 100.0 |
str(table)
## Classes 'freqtab' and 'data.frame': 4 obs. of 5 variables:
## $ n : num 1 1 1 3
## $ % : num 33.3 33.3 33.3 100
## $ val% : num 33.3 33.3 33.3 100
## $ %cum : num 33.3 66.7 100 100
## $ val%cum: num 33.3 66.7 100 100
x <- row.names(table)
y <- table$n
names <- x[1:(length(x)-1)]
freqs <- y[1:(length(y)-1)]
df <- data.frame(x = names, y = freqs)
knitr::kable(df)
| x | y |
|:--|--:|
| 0.885 | 1 |
| 1.19636 | 1 |
| 1.54885 | 1 |
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="white", fill="blue") +
xlab("Distance") +
ylab("Frequency")

Grouped frequency table
n_sturges = 1 + log(length(Distance))/log(2)
n_sturgesc = ceiling(n_sturges)
n_sturgesf = floor(n_sturges)
n_clases = 0
if (n_sturgesc%%2 == 0) {
n_clases = n_sturgesf
} else {
n_clases = n_sturgesc
}
R = max(Distance) - min(Distance)
w = ceiling(R/n_clases)
bins <- seq(min(Distance), max(Distance) + w, by = w)
bins
## [1] 0.885 1.885
Edades <- cut(Distance, bins)
Freq_table <- transform(table(Distance), Rel_Freq=prop.table(Freq), Cum_Freq=cumsum(Freq))
knitr::kable(Freq_table)
| Distance | Freq | Rel_Freq | Cum_Freq |
|:--|--:|--:|--:|
| 0.885 | 1 | 0.3333333 | 1 |
| 1.19636 | 1 | 0.3333333 | 2 |
| 1.54885 | 1 | 0.3333333 | 3 |
str(Freq_table)
## 'data.frame': 3 obs. of 4 variables:
## $ Distance: Factor w/ 3 levels "0.885","1.19636",..: 1 2 3
## $ Freq : int 1 1 1
## $ Rel_Freq: num 0.333 0.333 0.333
## $ Cum_Freq: int 1 2 3
df <- data.frame(x = Freq_table$Distance, y = Freq_table$Freq)
knitr::kable(df)
| x | y |
|:--|--:|
| 0.885 | 1 |
| 1.19636 | 1 |
| 1.54885 | 1 |
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="blue", fill="green") +
xlab("Distance range") +
ylab("Frequency")
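In the code above the class intervals are computed with `cut()`, but the frequency table is then built from the raw distances, so every value ends up in its own row. The sketch below shows one way the table could be built from the intervals themselves. It is an assumption-laden illustration: it uses the plain Sturges count `ceiling(1 + log2(n))` and an unrounded class width, which differs from the odd/even adjustment and the `ceiling()` of the width used in this report.

```r
# Puerto Plata distances, as listed in this report
distance <- c(0.88500, 1.19636, 1.54885)

# Sturges' rule for the number of classes
k <- ceiling(1 + log2(length(distance)))

# Equal-width class limits spanning the observed range
w <- diff(range(distance)) / k
breaks <- seq(min(distance), by = w, length.out = k + 1)
breaks[k + 1] <- max(distance)  # guard against floating-point shortfall

# include.lowest = TRUE keeps the minimum inside the first class
classes <- cut(distance, breaks = breaks, include.lowest = TRUE)

# Tabulate the classes, not the raw values
grouped <- as.data.frame(table(classes))
grouped$Rel_Freq <- grouped$Freq / sum(grouped$Freq)
grouped$Cum_Freq <- cumsum(grouped$Freq)
print(grouped)
```

With only three observations each class receives one value, so grouping adds little here; the approach becomes more informative for provinces or countries with many events.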

People affected by landslides
summary(df_PP$Distance)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.885 1.041 1.196 1.210 1.373 1.549
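The quartiles reported by `summary()` come from linear interpolation between order statistics (R's default quantile definition, type 7). As a minimal sketch, the helper `quartile_type7` below is a hypothetical name introduced only to show that calculation on the three Puerto Plata distances.

```r
# Puerto Plata distances, as listed in this report
distance <- c(0.88500, 1.19636, 1.54885)

# Hypothetical helper: type-7 quantile by linear interpolation
quartile_type7 <- function(values, p) {
  x <- sort(values)
  h <- (length(x) - 1) * p + 1   # fractional position in the sorted sample
  lower <- floor(h)
  upper <- ceiling(h)
  x[lower] + (h - lower) * (x[upper] - x[lower])
}

manual <- c(
  quartile_type7(distance, 0.25),
  quartile_type7(distance, 0.50),
  quartile_type7(distance, 0.75)
)

# Same quantities from the built-in function, for comparison
builtin <- unname(quantile(distance, probs = c(0.25, 0.50, 0.75)))
print(rbind(manual, builtin))
```

For three observations the first and third quartiles fall halfway between adjacent values, which is why they do not coincide with any recorded distance.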
library(pastecs)
stat.desc(df_PP)
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## id Date time Continent Country country_code State
## nbr.val 3.000000e+00 NA NA NA NA NA NA
## nbr.null 0.000000e+00 NA NA NA NA NA NA
## nbr.na 0.000000e+00 NA NA NA NA NA NA
## min 1.177000e+03 NA NA NA NA NA NA
## max 6.707000e+03 NA NA NA NA NA NA
## range 5.530000e+03 NA NA NA NA NA NA
## sum 1.253900e+04 NA NA NA NA NA NA
## median 4.655000e+03 NA NA NA NA NA NA
## mean 4.179667e+03 NA NA NA NA NA NA
## SE.mean 1.613968e+03 NA NA NA NA NA NA
## CI.mean.0.95 6.944345e+03 NA NA NA NA NA NA
## var 7.814681e+06 NA NA NA NA NA NA
## std.dev 2.795475e+03 NA NA NA NA NA NA
## coef.var 6.688273e-01 NA NA NA NA NA NA
## population City Distance location_description latitude
## nbr.val 3.000000e+00 NA 3.0000000 NA 3.00000000
## nbr.null 0.000000e+00 NA 0.0000000 NA 0.00000000
## nbr.na 0.000000e+00 NA 0.0000000 NA 0.00000000
## min 4.393000e+03 NA 0.8850000 NA 19.67500000
## max 1.460000e+05 NA 1.5488500 NA 19.90530000
## range 1.416070e+05 NA 0.6638500 NA 0.23030000
## sum 1.549560e+05 NA 3.6302100 NA 59.36300000
## median 4.563000e+03 NA 1.1963600 NA 19.78270000
## mean 5.165200e+04 NA 1.2100700 NA 19.78766667
## SE.mean 4.717403e+04 NA 0.1917596 NA 0.06652825
## CI.mean.0.95 2.029734e+05 NA 0.8250748 NA 0.28624795
## var 6.676166e+09 NA 0.1103152 NA 0.01327802
## std.dev 8.170781e+04 NA 0.3321373 NA 0.11523031
## coef.var 1.581891e+00 NA 0.2744777 NA 0.00582334
## longitude geolocation hazard_type landslide_type
## nbr.val 3.000000e+00 NA NA NA
## nbr.null 0.000000e+00 NA NA NA
## nbr.na 0.000000e+00 NA NA NA
## min -7.096300e+01 NA NA NA
## max -7.068710e+01 NA NA NA
## range 2.759000e-01 NA NA NA
## sum -2.124863e+02 NA NA NA
## median -7.083620e+01 NA NA NA
## mean -7.082877e+01 NA NA NA
## SE.mean 7.973214e-02 NA NA NA
## CI.mean.0.95 3.430597e-01 NA NA NA
## var 1.907164e-02 NA NA NA
## std.dev 1.381001e-01 NA NA NA
## coef.var -1.949774e-03 NA NA NA
## landslide_size trigger storm_name injuries fatalities source_name
## nbr.val NA NA NA 1 2.000000 NA
## nbr.null NA NA NA 1 1.000000 NA
## nbr.na NA NA NA 2 1.000000 NA
## min NA NA NA 0 0.000000 NA
## max NA NA NA 0 2.000000 NA
## range NA NA NA 0 2.000000 NA
## sum NA NA NA 0 2.000000 NA
## median NA NA NA 0 1.000000 NA
## mean NA NA NA 0 1.000000 NA
## SE.mean NA NA NA NA 1.000000 NA
## CI.mean.0.95 NA NA NA NaN 12.706205 NA
## var NA NA NA NA 2.000000 NA
## std.dev NA NA NA NA 1.414214 NA
## coef.var NA NA NA NA 1.414214 NA
## source_link
## nbr.val NA
## nbr.null NA
## nbr.na NA
## min NA
## max NA
## range NA
## sum NA
## median NA
## mean NA
## SE.mean NA
## CI.mean.0.95 NA
## var NA
## std.dev NA
## coef.var NA
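The `SE.mean`, `CI.mean.0.95` and `coef.var` rows can be checked with base R. The sketch below recomputes them for the Puerto Plata distances under the usual definitions (standard error `sd / sqrt(n)`, confidence half-width from Student's t with `n - 1` degrees of freedom, coefficient of variation `sd / mean`); the variable names are illustrative.

```r
# Puerto Plata distances, as listed in this report
distance <- c(0.88500, 1.19636, 1.54885)
n <- length(distance)

se_mean  <- sd(distance) / sqrt(n)            # standard error of the mean
ci_half  <- qt(0.975, df = n - 1) * se_mean   # half-width of the 95% interval
coef_var <- sd(distance) / mean(distance)     # coefficient of variation

print(c(SE.mean = se_mean, CI.mean.0.95 = ci_half, coef.var = coef_var))

# With a single observation there are zero degrees of freedom, so the
# t quantile is undefined and the spread statistics cannot be computed
t_single  <- suppressWarnings(qt(0.975, df = 0))
sd_single <- sd(0.75284)
```

This is also why the provinces with one recorded event show `NA` and `NaN` in these rows, together with the `NaNs produced` warnings.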
Box-and-whisker plot
boxplot(Distance, horizontal=TRUE, col='steelblue')

library(tidyverse)
library(hrbrthemes)
library(viridis)
df <- data.frame(Distance)
df %>% ggplot(aes(x = "", y = Distance)) +
geom_boxplot(color="red", fill="orange", alpha=0.5) +
theme_ipsum() +
theme(legend.position="none", plot.title = element_text(size=11)) +
ggtitle("Landslides") +
coord_flip() +
xlab("") +
ylab("")
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

library(readr)
library(knitr)
df <- read.csv("https://raw.githubusercontent.com/lihkir/AnalisisEstadisticoUN/main/Data/catalog.csv")
library(dplyr)
colnames(df)[4] <- "Continent"
colnames(df)[5] <- "Country"
colnames(df)[7] <- "State"
colnames(df)[9] <- "City"
colnames(df)[10] <- "Distance"
colnames(df)[2] <- "Date"
Cuba
library(readr)
library(knitr)
df_CU <- subset (df, Country == "Cuba")
knitr::kable(head(df_CU))
df_CU %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 483 Cuba Provincia de La Habana Cerro 0.89865 10/18/10
## 515 Cuba Guantanamo Baracoa 10.45795 11/7/10
## 1031 Cuba Artemisa Province Soroa 11.87914 7/9/13
head(df_CU)
## id Date time Continent Country country_code State
## 483 2611 10/18/10 <NA> Cuba CU Provincia de La Habana
## 515 2706 11/7/10 <NA> Cuba CU Guantanamo
## 1031 5067 7/9/13 <NA> Cuba CU Artemisa Province
## population City Distance location_description latitude longitude
## 483 132351 Cerro 0.89865 23.1098 -82.3691
## 515 48362 Baracoa 10.45795 20.2526 -74.4867
## 1031 7205 Soroa 11.87914 22.7943 -83.1322
## geolocation hazard_type landslide_type
## 483 (23.1098, -82.369100000000003) Landslide Complex
## 515 (20.252600000000001, -74.486699999999999) Landslide Landslide
## 1031 (22.7943, -83.132199999999997) Landslide Landslide
## landslide_size trigger storm_name injuries fatalities
## 483 Medium Tropical cyclone Tropical Storm Paula NA 0
## 515 Medium Tropical cyclone Hurricane Tomas NA 0
## 1031 Medium Downpour NA 0
## source_name
## 483
## 515
## 1031 www.havanatimes.org
## source_link
## 483 http://www.reliefweb.int/rw/RWFiles2010.nsf/FilesByRWDocUnidFilename/VDUX-8ADM53-full_report.pdf/$File/full_report.pdf
## 515 http://www.solvision.co.cu/english/index.php?option=com_content&view=article&id=1631:viaduct-la-farola-in-baracoa-traffic-restored&catid=34:portada&Itemid=171
## 1031 http://www.havanatimes.org/?p=96131
Landslides by state
library(ggplot2)
ggplot(data=df_CU, aes(x = "Cuba", y = Distance, fill=State)) +
geom_bar(stat = "identity", width = 1, color = "black") +
coord_polar("y", start = 0)

ggplot(data=df_CU, aes(fill=State, y=Distance, x="Cuba")) +
geom_bar(position="dodge", stat="identity")
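
Because `geom_bar(stat = "identity")` draws one segment per row, it can help to total the distance and count the events per state before plotting. The following is a minimal sketch, assuming only the three Cuban records printed above; the data frame `cuba_rows` and the columns `Events` and `Share_pct` are illustrative names, not part of the original analysis.

```r
# Hypothetical stand-in for df_CU, built from the three rows printed above
cuba_rows <- data.frame(
  State    = c("Provincia de La Habana", "Guantanamo", "Artemisa Province"),
  City     = c("Cerro", "Baracoa", "Soroa"),
  Distance = c(0.89865, 10.45795, 11.87914),
  stringsAsFactors = FALSE
)

# Total distance per state (base R, sorted by state name)
by_state <- aggregate(Distance ~ State, data = cuba_rows, FUN = sum)

# Number of recorded events per state
by_state$Events <- as.vector(table(cuba_rows$State)[by_state$State])

# Share of the summed distance, in percent
by_state$Share_pct <- round(100 * by_state$Distance / sum(by_state$Distance), 1)

by_state
```

The resulting table could be passed to `ggplot()` in place of the raw rows, so that each state contributes exactly one bar or slice.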

Artemisa Province
Landslides in the cities of Artemisa Province
library(readr)
library(knitr)
df_AP <- subset (df, State == "Artemisa Province")
df_AP %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 1031 Cuba Artemisa Province Soroa 11.87914 7/9/13
head(df_AP)
## id Date time Continent Country country_code State
## 1031 5067 7/9/13 <NA> Cuba CU Artemisa Province
## population City Distance location_description latitude longitude
## 1031 7205 Soroa 11.87914 22.7943 -83.1322
## geolocation hazard_type landslide_type landslide_size
## 1031 (22.7943, -83.132199999999997) Landslide Landslide Medium
## trigger storm_name injuries fatalities source_name
## 1031 Downpour NA 0 www.havanatimes.org
## source_link
## 1031 http://www.havanatimes.org/?p=96131
ggplot(data=df_AP, aes(x=City, y=Distance)) + geom_bar(stat="identity", color="blue", fill="white")

Pie chart
ggplot(df_AP,aes(x="Artemisa Province",y=Distance, fill=City))+
geom_bar(stat = "identity",
color="white")+
geom_text(aes(label=(Distance*1)),
position=position_stack(vjust=0.5),color="white",size=6)+
coord_polar(theta = "y")+
labs(title="Gráfico de Deslizamiento")

Pareto chart
City with the largest landslide distance
library(qcc)
Distance <- df_AP$Distance
names(Distance) <- df_AP$City
pareto.chart(Distance,
ylab="Distance",
col = heat.colors(length(Distance)),
cumperc = seq(0, 100, by = 10),
ylab2 = "Porcentaje acumulado",
main = "CIUDADES CON MAYORES DESLIZAMIENTOS"
)

##
## Pareto chart analysis for Distance
## Frequency Cum.Freq. Percentage Cum.Percent.
## Soroa 11.87914 11.87914 100.00000 100.00000
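
With a single city the Pareto table is trivially 100%. To show how its four columns arise, here is a minimal sketch in base R, assuming the three Cuban distances printed earlier in this section. It reproduces the column logic by hand and is not meant to describe the internals of `qcc::pareto.chart`; the names `distance_by_city` and `pareto_tab` are illustrative.

```r
# Distances of the three Cuban records listed earlier, named by city
distance_by_city <- c(Cerro = 0.89865, Baracoa = 10.45795, Soroa = 11.87914)

# Sort from largest to smallest, as a Pareto chart does
sorted <- sort(distance_by_city, decreasing = TRUE)

# Frequency, cumulative frequency, and the two percentage columns
pareto_tab <- data.frame(
  Frequency   = as.vector(sorted),
  Cum_Freq    = cumsum(as.vector(sorted)),
  Percentage  = 100 * as.vector(sorted) / sum(sorted),
  Cum_Percent = 100 * cumsum(as.vector(sorted)) / sum(sorted),
  row.names   = names(sorted)
)

pareto_tab
```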
Stem-and-leaf plot
stem(df_AP$Distance)
stem(df_AP$Distance, scale = 2)
Time series
library(forecast)
data_serie<- ts(df_AP$Distance, frequency=12, start=2007)
head(data_serie)
## Jan
## 2007 11.87914
autoplot(data_serie)+
labs(title = "Serie de Deslizamiento", x="Años", y = "Distancia", colour = "#00a0dc") +theme_bw()
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
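
Note that `ts(..., frequency=12, start=2007)` treats the values as consecutive monthly observations beginning in January 2007, whereas the records carry irregular event dates (the Soroa event, for example, is dated 7/9/13). A minimal sketch of ordering events by their actual dates follows, assuming the `Date` column uses the month/day/two-digit-year layout printed above; the data frame `events` is an illustrative stand-in, not the original object.

```r
# Hypothetical stand-in for the Cuban records printed earlier
events <- data.frame(
  City     = c("Cerro", "Baracoa", "Soroa"),
  Distance = c(0.89865, 10.45795, 11.87914),
  Date     = c("10/18/10", "11/7/10", "7/9/13"),
  stringsAsFactors = FALSE
)

# Parse month/day/two-digit-year strings into Date objects
events$Date <- as.Date(events$Date, format = "%m/%d/%y")

# Order chronologically and count events per calendar year
events <- events[order(events$Date), ]
events_per_year <- table(format(events$Date, "%Y"))

events
events_per_year
```

Plotting `Distance` against the parsed `Date` would then place each event at its real position on the time axis.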

Frequency tables
library(questionr)
table <- questionr::freq(Distance, cum = TRUE, sort = "dec", total = TRUE)
knitr::kable(table)
|          | n | %   | val% | %cum | val%cum |
|:---------|--:|----:|-----:|-----:|--------:|
| 11.87914 | 1 | 100 | 100  | 100  | 100     |
| Total    | 1 | 100 | 100  | 100  | 100     |
str(table)
## Classes 'freqtab' and 'data.frame': 2 obs. of 5 variables:
## $ n : num 1 1
## $ % : num 100 100
## $ val% : num 100 100
## $ %cum : num 100 100
## $ val%cum: num 100 100
x <- row.names(table)
y <- table$n
names <- x[1:(length(x)-1)]
freqs <- y[1:(length(y)-1)]
df <- data.frame(x = names, y = freqs)
knitr::kable(df)
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="white", fill="blue") +
xlab("Distance") +
ylab("Frecuencia")

Grouped frequency table
n_sturges = 1 + log(length(Distance))/log(2)
n_sturgesc = ceiling(n_sturges)
n_sturgesf = floor(n_sturges)
n_clases = 0
if (n_sturgesc%%2 == 0) {
n_clases = n_sturgesf
} else {
n_clases = n_sturgesc
}
R = max(Distance) - min(Distance)
w = ceiling(R/n_clases)
bins <- seq(min(Distance), max(Distance) + w, by = w)
bins
## [1] 11.87914
Distance_classes <- cut(Distance, bins)
Freq_table <- transform(table(Distance), Rel_Freq=prop.table(Freq), Cum_Freq=cumsum(Freq))
knitr::kable(Freq_table)
str(Freq_table)
## 'data.frame': 1 obs. of 4 variables:
## $ Distance: Factor w/ 1 level "11.87914": 1
## $ Freq : int 1
## $ Rel_Freq: num 1
## $ Cum_Freq: int 1
df <- data.frame(x = Freq_table$Distance, y = Freq_table$Freq)
knitr::kable(df)
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="blue", fill="green") +
xlab("Rango de Distance") +
ylab("Frecuencia")
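
With a single observation, Sturges' rule gives one class and the range is 0, so the class width `w` above is 0 and no real grouping takes place. The sketch below shows the grouping on a sample with more than one value, assuming the five Guanacaste distances listed later in this report. It uses `grDevices::nclass.Sturges` and equal-width breaks, which is a simpler variant than the parity-based class choice above; the names `x`, `k`, `breaks`, and `grouped` are illustrative.

```r
# Five Guanacaste distances, as listed later in this report
x <- c(17.65521, 10.21631, 12.33807, 12.21952, 12.18115)

# Number of classes by Sturges' rule: ceiling(log2(n) + 1)
k <- grDevices::nclass.Sturges(x)

# Equal-width class limits spanning the observed range
breaks <- seq(min(x), max(x), length.out = k + 1)

# Assign each distance to a class; include.lowest keeps the minimum
classes <- cut(x, breaks = breaks, include.lowest = TRUE)

# Absolute, relative, and cumulative frequencies per class
freq <- table(classes)
grouped <- data.frame(
  Class    = names(freq),
  Freq     = as.vector(freq),
  Rel_Freq = as.vector(prop.table(freq)),
  Cum_Freq = cumsum(as.vector(freq))
)

grouped
```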

Descriptive statistics of the landslides
summary(df_AP$Distance)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11.88 11.88 11.88 11.88 11.88 11.88
library(pastecs)
stat.desc(df_AP)
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## id Date time Continent Country country_code State population
## nbr.val 1 NA NA NA NA NA NA 1
## nbr.null 0 NA NA NA NA NA NA 0
## nbr.na 0 NA NA NA NA NA NA 0
## min 5067 NA NA NA NA NA NA 7205
## max 5067 NA NA NA NA NA NA 7205
## range 0 NA NA NA NA NA NA 0
## sum 5067 NA NA NA NA NA NA 7205
## median 5067 NA NA NA NA NA NA 7205
## mean 5067 NA NA NA NA NA NA 7205
## SE.mean NA NA NA NA NA NA NA NA
## CI.mean.0.95 NaN NA NA NA NA NA NA NaN
## var NA NA NA NA NA NA NA NA
## std.dev NA NA NA NA NA NA NA NA
## coef.var NA NA NA NA NA NA NA NA
## City Distance location_description latitude longitude geolocation
## nbr.val NA 1.00000 NA 1.0000 1.0000 NA
## nbr.null NA 0.00000 NA 0.0000 0.0000 NA
## nbr.na NA 0.00000 NA 0.0000 0.0000 NA
## min NA 11.87914 NA 22.7943 -83.1322 NA
## max NA 11.87914 NA 22.7943 -83.1322 NA
## range NA 0.00000 NA 0.0000 0.0000 NA
## sum NA 11.87914 NA 22.7943 -83.1322 NA
## median NA 11.87914 NA 22.7943 -83.1322 NA
## mean NA 11.87914 NA 22.7943 -83.1322 NA
## SE.mean NA NA NA NA NA NA
## CI.mean.0.95 NA NaN NA NaN NaN NA
## var NA NA NA NA NA NA
## std.dev NA NA NA NA NA NA
## coef.var NA NA NA NA NA NA
## hazard_type landslide_type landslide_size trigger storm_name
## nbr.val NA NA NA NA NA
## nbr.null NA NA NA NA NA
## nbr.na NA NA NA NA NA
## min NA NA NA NA NA
## max NA NA NA NA NA
## range NA NA NA NA NA
## sum NA NA NA NA NA
## median NA NA NA NA NA
## mean NA NA NA NA NA
## SE.mean NA NA NA NA NA
## CI.mean.0.95 NA NA NA NA NA
## var NA NA NA NA NA
## std.dev NA NA NA NA NA
## coef.var NA NA NA NA NA
## injuries fatalities source_name source_link
## nbr.val 0 1 NA NA
## nbr.null 0 1 NA NA
## nbr.na 1 0 NA NA
## min Inf 0 NA NA
## max -Inf 0 NA NA
## range -Inf 0 NA NA
## sum 0 0 NA NA
## median NA 0 NA NA
## mean NaN 0 NA NA
## SE.mean NA NA NA NA
## CI.mean.0.95 NaN NaN NA NA
## var NA NA NA NA
## std.dev NA NA NA NA
## coef.var NA NA NA NA
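
Most of the `NA` columns and warnings above come from passing text columns and an all-missing column (`injuries`) to `stat.desc`. With only one row, the variance-based statistics are undefined as well. One way to get a cleaner table is to keep only numeric columns that contain data. A minimal base R sketch follows, assuming a few columns of the three Cuban records printed earlier; `cuba_rows` and `desc` are illustrative names.

```r
# Hypothetical stand-in for a few columns of df_CU
cuba_rows <- data.frame(
  City       = c("Cerro", "Baracoa", "Soroa"),
  population = c(132351, 48362, 7205),
  Distance   = c(0.89865, 10.45795, 11.87914),
  injuries   = c(NA_real_, NA_real_, NA_real_),
  fatalities = c(0, 0, 0),
  stringsAsFactors = FALSE
)

# Keep numeric columns that contain at least one non-missing value
is_num   <- vapply(cuba_rows, is.numeric, logical(1))
has_data <- vapply(cuba_rows, function(v) any(!is.na(v)), logical(1))
numeric_cols <- cuba_rows[, is_num & has_data, drop = FALSE]

# Basic descriptive statistics per remaining column
desc <- sapply(numeric_cols, function(v) {
  c(n = sum(!is.na(v)), min = min(v, na.rm = TRUE), max = max(v, na.rm = TRUE),
    mean = mean(v, na.rm = TRUE), sd = sd(v, na.rm = TRUE))
})

round(desc, 3)
```

The same filtered data frame could also be passed to `pastecs::stat.desc` if the fuller set of statistics is wanted.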
Box-and-whisker plot
boxplot(Distance, horizontal=TRUE, col='steelblue')

library(tidyverse)
library(hrbrthemes)
library(viridis)
df <- data.frame(Distance)
df %>% ggplot(aes(x = "", y = Distance)) +
geom_boxplot(color="red", fill="orange", alpha=0.5) +
theme_ipsum() +
theme(legend.position="none", plot.title = element_text(size=11)) +
ggtitle("Deslizamientos") +
coord_flip() +
xlab("") +
ylab("")
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

library(readr)
library(knitr)
df <- read.csv("https://raw.githubusercontent.com/lihkir/AnalisisEstadisticoUN/main/Data/catalog.csv")
library(dplyr)
colnames(df)[4] <- "Continent"
colnames(df)[5] <- "Country"
colnames(df)[7] <- "State"
colnames(df)[9] <- "City"
colnames(df)[10] <- "Distance"
colnames(df)[2] <- "Date"
Provincia de La Habana
Landslides in the cities of Provincia de La Habana
library(readr)
library(knitr)
df_HA <- subset (df, State == "Provincia de La Habana")
df_HA %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 483 Cuba Provincia de La Habana Cerro 0.89865 10/18/10
head(df_HA)
## id Date time Continent Country country_code State
## 483 2611 10/18/10 <NA> Cuba CU Provincia de La Habana
## population City Distance location_description latitude longitude
## 483 132351 Cerro 0.89865 23.1098 -82.3691
## geolocation hazard_type landslide_type landslide_size
## 483 (23.1098, -82.369100000000003) Landslide Complex Medium
## trigger storm_name injuries fatalities source_name
## 483 Tropical cyclone Tropical Storm Paula NA 0
## source_link
## 483 http://www.reliefweb.int/rw/RWFiles2010.nsf/FilesByRWDocUnidFilename/VDUX-8ADM53-full_report.pdf/$File/full_report.pdf
ggplot(data=df_HA, aes(x=City, y=Distance)) + geom_bar(stat="identity", color="blue", fill="white")

Pie chart
ggplot(df_HA,aes(x="Provincia de La Habana",y=Distance, fill=City))+
geom_bar(stat = "identity",
color="white")+
geom_text(aes(label=(Distance*1)),
position=position_stack(vjust=0.5),color="white",size=4)+
coord_polar(theta = "y")+
labs(title="Gráfico de Deslizamiento")

Pareto chart
City with the largest landslide distance
library(qcc)
Distance <- df_HA$Distance
names(Distance) <- df_HA$City
pareto.chart(Distance,
ylab="Distance",
col = heat.colors(length(Distance)),
cumperc = seq(0, 100, by = 10),
ylab2 = "Porcentaje acumulado",
main = "CIUDADES CON MAYORES DESLIZAMIENTOS"
)

##
## Pareto chart analysis for Distance
## Frequency Cum.Freq. Percentage Cum.Percent.
## Cerro 0.89865 0.89865 100.00000 100.00000
Stem-and-leaf plot
stem(df_HA$Distance)
stem(df_HA$Distance, scale = 2)
Time series
library(forecast)
data_serie<- ts(df_HA$Distance, frequency=12, start=2007)
head(data_serie)
## Jan
## 2007 0.89865
autoplot(data_serie)+
labs(title = "Serie de Deslizamiento", x="Años", y = "Distancia", colour = "#00a0dc") +theme_bw()
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?

Frequency tables
library(questionr)
table <- questionr::freq(Distance, cum = TRUE, sort = "dec", total = TRUE)
knitr::kable(table)
|         | n | %   | val% | %cum | val%cum |
|:--------|--:|----:|-----:|-----:|--------:|
| 0.89865 | 1 | 100 | 100  | 100  | 100     |
| Total   | 1 | 100 | 100  | 100  | 100     |
str(table)
## Classes 'freqtab' and 'data.frame': 2 obs. of 5 variables:
## $ n : num 1 1
## $ % : num 100 100
## $ val% : num 100 100
## $ %cum : num 100 100
## $ val%cum: num 100 100
x <- row.names(table)
y <- table$n
names <- x[1:(length(x)-1)]
freqs <- y[1:(length(y)-1)]
df <- data.frame(x = names, y = freqs)
knitr::kable(df)
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="white", fill="blue") +
xlab("Distance") +
ylab("Frecuencia")

Grouped frequency table
n_sturges = 1 + log(length(Distance))/log(2)
n_sturgesc = ceiling(n_sturges)
n_sturgesf = floor(n_sturges)
n_clases = 0
if (n_sturgesc%%2 == 0) {
n_clases = n_sturgesf
} else {
n_clases = n_sturgesc
}
R = max(Distance) - min(Distance)
w = ceiling(R/n_clases)
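Freq_table <- transform(table(Distance), Rel_Freq=prop.table(Freq), Cum_Freq=cumsum(Freq))
knitr::kable(Freq_table)
str(Freq_table)
## 'data.frame': 1 obs. of 4 variables:
## $ Distance: Factor w/ 1 level "0.89865": 1
## $ Freq : int 1
## $ Rel_Freq: num 1
## $ Cum_Freq: int 1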
df <- data.frame(x = Freq_table$Distance, y = Freq_table$Freq)
knitr::kable(df)
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="blue", fill="green") +
xlab("Rango de Distance") +
ylab("Frecuencia")

Descriptive statistics of the landslides
summary(df_HA$Distance)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.8986 0.8986 0.8986 0.8986 0.8986 0.8986
library(pastecs)
stat.desc(df_HA)
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## id Date time Continent Country country_code State population
## nbr.val 1 NA NA NA NA NA NA 1
## nbr.null 0 NA NA NA NA NA NA 0
## nbr.na 0 NA NA NA NA NA NA 0
## min 2611 NA NA NA NA NA NA 132351
## max 2611 NA NA NA NA NA NA 132351
## range 0 NA NA NA NA NA NA 0
## sum 2611 NA NA NA NA NA NA 132351
## median 2611 NA NA NA NA NA NA 132351
## mean 2611 NA NA NA NA NA NA 132351
## SE.mean NA NA NA NA NA NA NA NA
## CI.mean.0.95 NaN NA NA NA NA NA NA NaN
## var NA NA NA NA NA NA NA NA
## std.dev NA NA NA NA NA NA NA NA
## coef.var NA NA NA NA NA NA NA NA
## City Distance location_description latitude longitude geolocation
## nbr.val NA 1.00000 NA 1.0000 1.0000 NA
## nbr.null NA 0.00000 NA 0.0000 0.0000 NA
## nbr.na NA 0.00000 NA 0.0000 0.0000 NA
## min NA 0.89865 NA 23.1098 -82.3691 NA
## max NA 0.89865 NA 23.1098 -82.3691 NA
## range NA 0.00000 NA 0.0000 0.0000 NA
## sum NA 0.89865 NA 23.1098 -82.3691 NA
## median NA 0.89865 NA 23.1098 -82.3691 NA
## mean NA 0.89865 NA 23.1098 -82.3691 NA
## SE.mean NA NA NA NA NA NA
## CI.mean.0.95 NA NaN NA NaN NaN NA
## var NA NA NA NA NA NA
## std.dev NA NA NA NA NA NA
## coef.var NA NA NA NA NA NA
## hazard_type landslide_type landslide_size trigger storm_name
## nbr.val NA NA NA NA NA
## nbr.null NA NA NA NA NA
## nbr.na NA NA NA NA NA
## min NA NA NA NA NA
## max NA NA NA NA NA
## range NA NA NA NA NA
## sum NA NA NA NA NA
## median NA NA NA NA NA
## mean NA NA NA NA NA
## SE.mean NA NA NA NA NA
## CI.mean.0.95 NA NA NA NA NA
## var NA NA NA NA NA
## std.dev NA NA NA NA NA
## coef.var NA NA NA NA NA
## injuries fatalities source_name source_link
## nbr.val 0 1 NA NA
## nbr.null 0 1 NA NA
## nbr.na 1 0 NA NA
## min Inf 0 NA NA
## max -Inf 0 NA NA
## range -Inf 0 NA NA
## sum 0 0 NA NA
## median NA 0 NA NA
## mean NaN 0 NA NA
## SE.mean NA NA NA NA
## CI.mean.0.95 NaN NaN NA NA
## var NA NA NA NA
## std.dev NA NA NA NA
## coef.var NA NA NA NA
Box-and-whisker plot
boxplot(Distance, horizontal=TRUE, col='steelblue')

library(tidyverse)
library(hrbrthemes)
library(viridis)
df <- data.frame(Distance)
df %>% ggplot(aes(x = "", y = Distance)) +
geom_boxplot(color="red", fill="orange", alpha=0.5) +
theme_ipsum() +
theme(legend.position="none", plot.title = element_text(size=11)) +
ggtitle("Deslizamientos") +
coord_flip() +
xlab("") +
ylab("")
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

library(readr)
library(knitr)
df <- read.csv("https://raw.githubusercontent.com/lihkir/AnalisisEstadisticoUN/main/Data/catalog.csv")
library(dplyr)
colnames(df)[4] <- "Continent"
colnames(df)[10] <- "Distance"
colnames(df)[5] <- "Country"
colnames(df)[7] <- "State"
colnames(df)[9] <- "City"
colnames(df)[2] <- "Date"
Guantanamo
Landslides in the cities of Guantanamo
library(readr)
library(knitr)
df_Gu <- subset (df, State == "Guantanamo")
df_Gu %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 515 Cuba Guantanamo Baracoa 10.45795 11/7/10
head(df_Gu)
## id Date time Continent Country country_code State population
## 515 2706 11/7/10 <NA> Cuba CU Guantanamo 48362
## City Distance location_description latitude longitude
## 515 Baracoa 10.45795 20.2526 -74.4867
## geolocation hazard_type landslide_type
## 515 (20.252600000000001, -74.486699999999999) Landslide Landslide
## landslide_size trigger storm_name injuries fatalities
## 515 Medium Tropical cyclone Hurricane Tomas NA 0
## source_name
## 515
## source_link
## 515 http://www.solvision.co.cu/english/index.php?option=com_content&view=article&id=1631:viaduct-la-farola-in-baracoa-traffic-restored&catid=34:portada&Itemid=171
ggplot(data=df_Gu, aes(x=City, y=Distance)) + geom_bar(stat="identity", color="blue", fill="white")

Pie chart
ggplot(df_Gu,aes(x="Guantanamo",y=Distance, fill=City))+
geom_bar(stat = "identity",
color="white")+
geom_text(aes(label=(Distance*1)),
position=position_stack(vjust=0.5),color="white",size=4)+
coord_polar(theta = "y")+
labs(title="Gráfico de Deslizamiento")

Pareto chart
City with the largest landslide distance
library(qcc)
Distance <- df_Gu$Distance
names(Distance) <- df_Gu$City
pareto.chart(Distance,
ylab="Distance",
col = heat.colors(length(Distance)),
cumperc = seq(0, 100, by = 10),
ylab2 = "Porcentaje acumulado",
main = "CIUDADES CON MAYORES DESLIZAMIENTOS"
)

##
## Pareto chart analysis for Distance
## Frequency Cum.Freq. Percentage Cum.Percent.
## Baracoa 10.45795 10.45795 100.00000 100.00000
Stem-and-leaf plot
stem(df_Gu$Distance)
stem(df_Gu$Distance, scale = 2)
Time series
library(forecast)
data_serie<- ts(df_Gu$Distance, frequency=12, start=2007)
head(data_serie)
## Jan
## 2007 10.45795
autoplot(data_serie)+
labs(title = "Serie de Deslizamiento", x="Años", y = "Distancia", colour = "#00a0dc") +theme_bw()
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?

Frequency tables
library(questionr)
table <- questionr::freq(Distance, cum = TRUE, sort = "dec", total = TRUE)
knitr::kable(table)
|          | n | %   | val% | %cum | val%cum |
|:---------|--:|----:|-----:|-----:|--------:|
| 10.45795 | 1 | 100 | 100  | 100  | 100     |
| Total    | 1 | 100 | 100  | 100  | 100     |
str(table)
## Classes 'freqtab' and 'data.frame': 2 obs. of 5 variables:
## $ n : num 1 1
## $ % : num 100 100
## $ val% : num 100 100
## $ %cum : num 100 100
## $ val%cum: num 100 100
x <- row.names(table)
y <- table$n
names <- x[1:(length(x)-1)]
freqs <- y[1:(length(y)-1)]
df <- data.frame(x = names, y = freqs)
knitr::kable(df)
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="white", fill="blue") +
xlab("Distance") +
ylab("Frecuencia")

Grouped frequency table
n_sturges = 1 + log(length(Distance))/log(2)
n_sturgesc = ceiling(n_sturges)
n_sturgesf = floor(n_sturges)
n_clases = 0
if (n_sturgesc%%2 == 0) {
n_clases = n_sturgesf
} else {
n_clases = n_sturgesc
}
R = max(Distance) - min(Distance)
w = ceiling(R/n_clases)
bins <- seq(min(Distance), max(Distance) + w, by = w)
bins
## [1] 10.45795
Distance_classes <- cut(Distance, bins)
Freq_table <- transform(table(Distance), Rel_Freq=prop.table(Freq), Cum_Freq=cumsum(Freq))
knitr::kable(Freq_table)
str(Freq_table)
## 'data.frame': 1 obs. of 4 variables:
## $ Distance: Factor w/ 1 level "10.45795": 1
## $ Freq : int 1
## $ Rel_Freq: num 1
## $ Cum_Freq: int 1
df <- data.frame(x = Freq_table$Distance, y = Freq_table$Freq)
knitr::kable(df)
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="blue", fill="green") +
xlab("Rango de Distance") +
ylab("Frecuencia")

Descriptive statistics of the landslides
summary(df_Gu$Distance)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.46 10.46 10.46 10.46 10.46 10.46
library(pastecs)
stat.desc(df_Gu)
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## id Date time Continent Country country_code State population
## nbr.val 1 NA NA NA NA NA NA 1
## nbr.null 0 NA NA NA NA NA NA 0
## nbr.na 0 NA NA NA NA NA NA 0
## min 2706 NA NA NA NA NA NA 48362
## max 2706 NA NA NA NA NA NA 48362
## range 0 NA NA NA NA NA NA 0
## sum 2706 NA NA NA NA NA NA 48362
## median 2706 NA NA NA NA NA NA 48362
## mean 2706 NA NA NA NA NA NA 48362
## SE.mean NA NA NA NA NA NA NA NA
## CI.mean.0.95 NaN NA NA NA NA NA NA NaN
## var NA NA NA NA NA NA NA NA
## std.dev NA NA NA NA NA NA NA NA
## coef.var NA NA NA NA NA NA NA NA
## City Distance location_description latitude longitude geolocation
## nbr.val NA 1.00000 NA 1.0000 1.0000 NA
## nbr.null NA 0.00000 NA 0.0000 0.0000 NA
## nbr.na NA 0.00000 NA 0.0000 0.0000 NA
## min NA 10.45795 NA 20.2526 -74.4867 NA
## max NA 10.45795 NA 20.2526 -74.4867 NA
## range NA 0.00000 NA 0.0000 0.0000 NA
## sum NA 10.45795 NA 20.2526 -74.4867 NA
## median NA 10.45795 NA 20.2526 -74.4867 NA
## mean NA 10.45795 NA 20.2526 -74.4867 NA
## SE.mean NA NA NA NA NA NA
## CI.mean.0.95 NA NaN NA NaN NaN NA
## var NA NA NA NA NA NA
## std.dev NA NA NA NA NA NA
## coef.var NA NA NA NA NA NA
## hazard_type landslide_type landslide_size trigger storm_name
## nbr.val NA NA NA NA NA
## nbr.null NA NA NA NA NA
## nbr.na NA NA NA NA NA
## min NA NA NA NA NA
## max NA NA NA NA NA
## range NA NA NA NA NA
## sum NA NA NA NA NA
## median NA NA NA NA NA
## mean NA NA NA NA NA
## SE.mean NA NA NA NA NA
## CI.mean.0.95 NA NA NA NA NA
## var NA NA NA NA NA
## std.dev NA NA NA NA NA
## coef.var NA NA NA NA NA
## injuries fatalities source_name source_link
## nbr.val 0 1 NA NA
## nbr.null 0 1 NA NA
## nbr.na 1 0 NA NA
## min Inf 0 NA NA
## max -Inf 0 NA NA
## range -Inf 0 NA NA
## sum 0 0 NA NA
## median NA 0 NA NA
## mean NaN 0 NA NA
## SE.mean NA NA NA NA
## CI.mean.0.95 NaN NaN NA NA
## var NA NA NA NA
## std.dev NA NA NA NA
## coef.var NA NA NA NA
Box-and-whisker plot
boxplot(Distance, horizontal=TRUE, col='steelblue')

library(tidyverse)
library(hrbrthemes)
library(viridis)
df <- data.frame(Distance)
df %>% ggplot(aes(x = "", y = Distance)) +
geom_boxplot(color="red", fill="orange", alpha=0.5) +
theme_ipsum() +
theme(legend.position="none", plot.title = element_text(size=11)) +
ggtitle("Deslizamientos") +
coord_flip() +
xlab("") +
ylab("")
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

library(readr)
library(knitr)
df <- read.csv("https://raw.githubusercontent.com/lihkir/AnalisisEstadisticoUN/main/Data/catalog.csv")
library(dplyr)
colnames(df)[4] <- "Continent"
colnames(df)[10] <- "Distance"
colnames(df)[5] <- "Country"
colnames(df)[7] <- "State"
colnames(df)[9] <- "City"
colnames(df)[2] <- "Date"
Costa Rica
library(readr)
library(knitr)
df_CR <- subset (df, Country == "Costa Rica")
knitr::kable(head(df_CR, n=4))
df_CR %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 38 Costa Rica Heredia Heredia 0.26208 9/9/07
## 44 Costa Rica San José San Ignacio 4.57763 10/9/07
## 45 Costa Rica Alajuela Atenas 3.08459 10/11/07
## 46 Costa Rica San José 9.56251 10/11/07
## 51 Costa Rica Puntarenas Miramar 3.82425 10/24/07
## 102 Costa Rica Guanacaste Bagaces 17.65521 5/29/08
## 147 Costa Rica San José Daniel Flores 1.85787 9/6/08
## 153 Costa Rica San José San Isidro 16.24937 10/12/08
## 154 Costa Rica San José Santiago 12.85801 10/12/08
## 156 Costa Rica Puntarenas Golfito 11.74074 10/15/08
## 157 Costa Rica Puntarenas Miramar 8.92048 10/16/08
## 229 Costa Rica Puntarenas San Vito 18.00524 11/13/09
## 302 Costa Rica Alajuela Desamparados 6.88715 4/14/10
## 311 Costa Rica Heredia Ángeles 19.51432 4/27/10
## 347 Costa Rica Alajuela Desamparados 6.92174 5/22/10
## 395 Costa Rica Alajuela Desamparados 4.24199 7/30/10
## 459 Costa Rica Alajuela San Rafael 1.47396 9/29/10
## 469 Costa Rica San José Salitral 0.25254 10/1/10
## 470 Costa Rica San José Salitral 0.25254 10/1/10
## 480 Costa Rica Heredia Ángeles 14.81614 10/15/10
## 501 Costa Rica San José Escazú 3.67691 11/4/10
## 502 Costa Rica San José San Marcos 0.55804 11/4/10
## 503 Costa Rica Alajuela San Rafael 9.61692 11/4/10
## 504 Costa Rica Guanacaste Tilarán 10.21631 11/4/10
## 505 Costa Rica Cartago Orosí 19.28722 11/4/10
## 506 Costa Rica Puntarenas Golfito 7.87044 11/4/10
## 507 Costa Rica San José Tejar 6.49523 11/4/10
## 508 Costa Rica San José San Isidro 15.64997 11/4/10
## 509 Costa Rica Puntarenas Corredor 4.93053 11/4/10
## 510 Costa Rica Puntarenas Parrita 13.48919 11/4/10
## 511 Costa Rica Puntarenas Ciudad Cortés 20.06633 11/4/10
## 512 Costa Rica San José San Isidro 11.31047 11/4/10
## 513 Costa Rica San José Mercedes 8.21372 11/4/10
## 514 Costa Rica Alajuela Santiago 5.43516 11/5/10
## 529 Costa Rica Heredia Ángeles 19.54581 11/21/10
## 579 Costa Rica Limón Guápiles 17.23264 1/11/11
## 702 Costa Rica Heredia Ángeles 15.05161 5/8/11
## 780 Costa Rica Alajuela Upala 0.70048 7/12/11
## 819 Costa Rica San José San Isidro 21.67452 9/25/11
## 828 Costa Rica Cartago Cot 9.63616 10/31/11
## 884 Costa Rica Heredia Santo Domingo 21.95470 5/13/12
## 888 Costa Rica Guanacaste Tilarán 12.33807 5/31/12
## 889 Costa Rica Limón Siquirres 5.36500 6/14/12
## 913 Costa Rica San José Daniel Flores 4.89954 10/23/12
## 1098 Costa Rica Alajuela Sabanilla 4.87432 8/27/13
## 1156 Costa Rica Alajuela Sabanilla 10.32968 9/16/13
## 1157 Costa Rica Heredia Santo Domingo 9.85736 9/16/13
## 1169 Costa Rica Guanacaste Tilarán 12.21952 10/3/13
## 1173 Costa Rica Guanacaste Tilarán 12.18115 10/8/13
## 1289 Costa Rica Alajuela La Fortuna 9.84213 10/4/14
## 1301 Costa Rica Alajuela 5.57523 9/19/14
## 1308 Costa Rica Alajuela Desamparados 5.95519 11/1/14
## 1342 Costa Rica Alajuela Rio Segundo 11.96524 8/21/14
## 1364 Costa Rica Alajuela Desamparados 5.12667 8/10/14
## 1383 Costa Rica Cartago Cartago 3.07297 9/13/14
## 1384 Costa Rica Heredia Dulce Nombre de Jesus 10.01310 12/13/14
## 1385 Costa Rica San José Dulce Nombre de Jesus 2.92605 11/3/14
## 1386 Costa Rica San José San Isidro 10.73752 9/19/14
## 1404 Costa Rica San José San Isidro 22.32368 1/28/15
## 1406 Costa Rica San José Dulce Nombre de Jesus 8.39161 2/6/15
## 1461 Costa Rica Alajuela La Fortuna 5.96634 6/17/15
## 1475 Costa Rica Alajuela Atenas 6.80061 6/3/15
## 1528 Costa Rica San José Ángeles 9.53611 7/6/15
## 1529 Costa Rica San José Dulce Nombre de Jesus 3.71407 7/6/15
## 1600 Costa Rica San José San Juan 0.72957 10/29/15
## 1642 Costa Rica Alajuela Santo Domingo 3.21979 10/27/15
## 1643 Costa Rica Alajuela Alajuela 3.08916 11/18/15
## 1644 Costa Rica Alajuela Naranjo 2.08469 10/29/15
## 1646 Costa Rica Cartago 5.15142 10/15/15
## 1647 Costa Rica Cartago Cot 9.53493 3/20/15
## 1648 Costa Rica Cartago Cartago 2.94804 3/18/15
## 1649 Costa Rica Puntarenas Buenos Aires 0.35225 11/23/15
## 1650 Costa Rica San José San José 1.16705 9/25/15
## 1651 Costa Rica San José Mercedes 10.01198 11/5/15
## 1652 Costa Rica San José Santiago 8.27042 11/11/15
head(df_CR)
## id Date time Continent Country country_code State population
## 38 249 9/9/07 <NA> Costa Rica CR Heredia 21947
## 44 299 10/9/07 <NA> Costa Rica CR San José 3072
## 45 301 10/11/07 <NA> Costa Rica CR Alajuela 7014
## 46 302 10/11/07 <NA> Costa Rica CR San José 26669
## 51 323 10/24/07 <NA> Costa Rica CR Puntarenas 6540
## 102 556 5/29/08 <NA> Costa Rica CR Guanacaste 4108
## City Distance location_description latitude longitude
## 38 Heredia 0.26208 10.0000 -84.1167
## 44 San Ignacio 4.57763 9.7789 -84.1250
## 45 Atenas 3.08459 9.9869 -84.4070
## 46 9.56251 10.0214 -83.9451
## 51 Miramar 3.82425 Mine construction 10.0715 -84.7575
## 102 Bagaces 17.65521 10.4024 -85.3555
## geolocation hazard_type landslide_type
## 38 (10, -84.116699999999994) Landslide Landslide
## 44 (9.7789000000000001, -84.125) Landslide Complex
## 45 (9.9869000000000003, -84.406999999999996) Landslide Mudslide
## 46 (10.0214, -83.945099999999996) Landslide Landslide
## 51 (10.0715, -84.757499999999993) Landslide Mudslide
## 102 (10.4024, -85.355500000000006) Landslide Landslide
## landslide_size trigger storm_name injuries fatalities
## 38 Medium Rain NA NA
## 44 Medium Rain NA 4
## 45 Large Rain NA 14
## 46 Large Rain NA 10
## 51 Medium Downpour NA NA
## 102 Medium Tropical cyclone Tropical Storm Alma NA NA
## source_name
## 38 ticotimes.net
## 44 ticotimes.net
## 45 Agence France-Presse, afp.google.com
## 46 International Herald
## 51 Reuters - AlertNet.org
## 102
## source_link
## 38 http://www.ticotimes.net/dailyarchive/2007_09/0911072.htm
## 44 http://www.ticotimes.net/dailyarchive/2007_10/1010071.htm
## 45 http://afp.google.com/article/ALeqM5hu6a8oyAM1ycq9nU_6Zyj_l7F0AA
## 46 http://www.iht.com/articles/ap/2007/10/12/america/LA-GEN-Costa-Rica-Mudslide.php
## 51 http://www.reuters.com/article/companyNewsAndPR/idUSN2435152820071025
## 102 http://www.reliefweb.int/rw/RWB.NSF/db900SID/ASAZ-7FHCHL?OpenDocument
Landslides by state
library(ggplot2)
ggplot(data=df_CR, aes(x = "Costa Rica", y = Distance, fill=State)) +
geom_bar(stat = "identity", width = 1, color = "black") +
coord_polar("y", start = 0)

ggplot(data=df_CR, aes(fill=State, y=Distance, x="Costa Rica")) +
geom_bar(position="dodge", stat="identity")

Guanacaste
Landslides in the cities of Guanacaste
library(readr)
library(knitr)
df_GU <- subset (df, State == "Guanacaste")
df_GU %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 102 Costa Rica Guanacaste Bagaces 17.65521 5/29/08
## 504 Costa Rica Guanacaste Tilarán 10.21631 11/4/10
## 888 Costa Rica Guanacaste Tilarán 12.33807 5/31/12
## 1169 Costa Rica Guanacaste Tilarán 12.21952 10/3/13
## 1173 Costa Rica Guanacaste Tilarán 12.18115 10/8/13
head(df_GU, n=4)
## id Date time Continent Country country_code State population
## 102 556 5/29/08 <NA> Costa Rica CR Guanacaste 4108
## 504 2683 11/4/10 <NA> Costa Rica CR Guanacaste 7301
## 888 4375 5/31/12 <NA> Costa Rica CR Guanacaste 7301
## 1169 5571 10/3/13 <NA> Costa Rica CR Guanacaste 7301
## City Distance location_description latitude longitude
## 102 Bagaces 17.65521 10.4024 -85.3555
## 504 Tilarán 10.21631 10.4548 -84.8751
## 888 Tilarán 12.33807 10.5562 -84.8952
## 1169 Tilarán 12.21952 10.5543 -84.8946
## geolocation hazard_type landslide_type
## 102 (10.4024, -85.355500000000006) Landslide Landslide
## 504 (10.454800000000001, -84.875100000000003) Landslide Landslide
## 888 (10.5562, -84.895200000000003) Landslide Landslide
## 1169 (10.5543, -84.894599999999997) Landslide Landslide
## landslide_size trigger storm_name injuries fatalities
## 102 Medium Tropical cyclone Tropical Storm Alma NA NA
## 504 Medium Tropical cyclone Tropical Storm Tomas NA 0
## 888 Large Downpour NA NA
## 1169 Medium Mining digging NA NA
## source_name
## 102
## 504
## 888
## 1169 www.ticotimes.net
## source_link
## 102 http://www.reliefweb.int/rw/RWB.NSF/db900SID/ASAZ-7FHCHL?OpenDocument
## 504 http://fortunatimes.com/2010/11/06/no-passage-to-the-south-and-central-pacific/
## 888 http://thecostaricanews.com/landslides-and-wash-outs-continue-to-cause-problems-in-northern-costa-rica/12129
## 1169 http://www.ticotimes.net/More-news/News-Briefs/TRAVEL-ALERT-UPDATE-Rains-landslides-close-eight-routes-across-Costa-Rica_Friday-October-04-2013
ggplot(data=df_GU, aes(x=City, y=Distance)) + geom_bar(stat="identity", color="blue", fill="white")

Pie chart
ggplot(df_GU, aes(x = "Guanacaste", y = Distance, fill = City)) +
  geom_bar(stat = "identity", color = "white") +
  geom_text(aes(label = Distance),
            position = position_stack(vjust = 0.5), color = "white", size = 4) +
  coord_polar(theta = "y") +
  labs(title = "Landslide chart")

Pareto chart
City with the largest landslide distance
library(qcc)
Distance <- df_GU$Distance
names(Distance) <- df_GU$City
pareto.chart(Distance,
  ylab = "Distance",
  col = heat.colors(length(Distance)),
  cumperc = seq(0, 100, by = 10),
  ylab2 = "Cumulative percentage",
  main = "CITIES WITH THE LARGEST LANDSLIDE DISTANCES"
)

##
## Pareto chart analysis for Distance
## Frequency Cum.Freq. Percentage Cum.Percent.
## Bagaces 17.65521 17.65521 27.32571 27.32571
## Tilarán 12.33807 29.99328 19.09615 46.42185
## Tilarán 12.21952 42.21280 18.91266 65.33451
## Tilarán 12.18115 54.39395 18.85328 84.18779
## Tilarán 10.21631 64.61026 15.81221 100.00000
Stem-and-leaf plot
stem(df_GU$"Distance")
##
## The decimal point is at the |
##
## 10 | 2
## 12 | 223
## 14 |
## 16 | 7
stem(df_GU$"Distance", scale = 2)
##
## The decimal point is at the |
##
## 10 | 2
## 11 |
## 12 | 223
## 13 |
## 14 |
## 15 |
## 16 |
## 17 | 7
Time series
library(forecast)
# Note: ts() assigns the rows to consecutive months starting in January 2007;
# the actual event dates in the Date column are not used.
data_serie <- ts(df_GU$Distance, frequency=12, start=2007)
head(data_serie)
## Jan Feb Mar Apr May
## 2007 17.65521 10.21631 12.33807 12.21952 12.18115
autoplot(data_serie) +
  labs(title = "Landslide series", x = "Year", y = "Distance", colour = "#00a0dc") + theme_bw()
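The `ts()` call above indexes the five rows as consecutive months from January 2007, whereas the Date column records events between 2008 and 2013. The following is a minimal sketch of plotting the distances against the recorded dates instead. It assumes the dates are in month/day/two-digit-year form, as they appear in the Guanacaste listing; the vectors are copied by hand from that listing, and base R graphics are used (rather than ggplot2) only to keep the example free of dependencies.

```r
# Dates and distances copied from the Guanacaste listing above
dates <- as.Date(c("5/29/08", "11/4/10", "5/31/12", "10/3/13", "10/8/13"),
                 format = "%m/%d/%y")
distance <- c(17.65521, 10.21631, 12.33807, 12.21952, 12.18115)

# Keep each distance attached to its own date and sort chronologically
events <- data.frame(Date = dates, Distance = distance)
events <- events[order(events$Date), ]

# Number of recorded landslides per calendar year
per_year <- table(format(events$Date, "%Y"))
print(per_year)

# Irregularly spaced events plotted on a true date axis
plot(events$Date, events$Distance, type = "b",
     xlab = "Date", ylab = "Distance", main = "Guanacaste landslides by date")
```

Plotted this way, the gaps between events remain visible, which the monthly index hides.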

Frequency tables
library(questionr)
table <- questionr::freq(Distance, cum = TRUE, sort = "dec", total = TRUE)
knitr::kable(table)
|         |  n|   %| val%| %cum| val%cum|
|:--------|--:|---:|----:|----:|-------:|
|10.21631 |  1|  20|   20|   20|      20|
|12.18115 |  1|  20|   20|   40|      40|
|12.21952 |  1|  20|   20|   60|      60|
|12.33807 |  1|  20|   20|   80|      80|
|17.65521 |  1|  20|   20|  100|     100|
|Total    |  5| 100|  100|  100|     100|
str(table)
## Classes 'freqtab' and 'data.frame': 6 obs. of 5 variables:
## $ n : num 1 1 1 1 1 5
## $ % : num 20 20 20 20 20 100
## $ val% : num 20 20 20 20 20 100
## $ %cum : num 20 40 60 80 100 100
## $ val%cum: num 20 40 60 80 100 100
x <- row.names(table)
y <- table$n
names <- x[1:(length(x)-1)]
freqs <- y[1:(length(y)-1)]
df <- data.frame(x = names, y = freqs)
knitr::kable(df)
|x        |  y|
|:--------|--:|
|10.21631 |  1|
|12.18115 |  1|
|12.21952 |  1|
|12.33807 |  1|
|17.65521 |  1|
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="white", fill="blue") +
xlab("Distance") +
ylab("Frequency")

Grouped frequency table
n_sturges = 1 + log(length(Distance))/log(2)
n_sturgesc = ceiling(n_sturges)
n_sturgesf = floor(n_sturges)
n_clases = 0
if (n_sturgesc%%2 == 0) {
n_clases = n_sturgesf
} else {
n_clases = n_sturgesc
}
R = max(Distance) - min(Distance)
w = ceiling(R/n_clases)
bins <- seq(min(Distance), max(Distance) + w, by = w)
bins
## [1] 10.21631 13.21631 16.21631 19.21631
# Edades holds the binned distances but is not used below; Freq_table tabulates the raw values.
Edades <- cut(Distance, bins)
Freq_table <- transform(table(Distance), Rel_Freq=prop.table(Freq), Cum_Freq=cumsum(Freq))
knitr::kable(Freq_table)
|Distance | Freq| Rel_Freq| Cum_Freq|
|:--------|----:|--------:|--------:|
|10.21631 |    1|      0.2|        1|
|12.18115 |    1|      0.2|        2|
|12.21952 |    1|      0.2|        3|
|12.33807 |    1|      0.2|        4|
|17.65521 |    1|      0.2|        5|
str(Freq_table)
## 'data.frame': 5 obs. of 4 variables:
## $ Distance: Factor w/ 5 levels "10.21631","12.18115",..: 1 2 3 4 5
## $ Freq : int 1 1 1 1 1
## $ Rel_Freq: num 0.2 0.2 0.2 0.2 0.2
## $ Cum_Freq: int 1 2 3 4 5
df <- data.frame(x = Freq_table$Distance, y = Freq_table$Freq)
knitr::kable(df)
|x        |  y|
|:--------|--:|
|10.21631 |  1|
|12.18115 |  1|
|12.21952 |  1|
|12.33807 |  1|
|17.65521 |  1|
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="blue", fill="green") +
xlab("Distance range") +
ylab("Frequency")
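The class boundaries computed in this section follow Sturges' rule, k = 1 + log2(n), with the class count moved to the neighbouring odd number. The frequency table itself, however, is built from the raw distances, so every row has a count of 1. A possible sketch of tabulating the binned values is given below. The vector is copied from the Guanacaste listing, and `include.lowest = TRUE` is an assumption added here so that the minimum, which coincides with the first break, is not dropped as NA.

```r
# Guanacaste distances copied from the listing above
distance <- c(17.65521, 10.21631, 12.33807, 12.21952, 12.18115)

# Sturges' rule, then the same odd-class choice used in the text
k <- 1 + log2(length(distance))
n_classes <- if (ceiling(k) %% 2 == 0) floor(k) else ceiling(k)

# Class width and break points, as in the text
width <- ceiling((max(distance) - min(distance)) / n_classes)
breaks <- seq(min(distance), max(distance) + width, by = width)

# Assign each distance to a class; include.lowest keeps the minimum value
classes <- cut(distance, breaks, include.lowest = TRUE)

# Grouped absolute, relative and cumulative frequencies
grouped <- as.data.frame(table(classes))
grouped$Rel_Freq <- grouped$Freq / sum(grouped$Freq)
grouped$Cum_Freq <- cumsum(grouped$Freq)
print(grouped)
```

With these five observations, four fall in the first class, none in the second, and one (Bagaces) in the third.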

People affected by landslides
summary(df_GU$Distance)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.22 12.18 12.22 12.92 12.34 17.66
library(pastecs)
stat.desc(df_GU)
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## id Date time Continent Country country_code State
## nbr.val 5.000000e+00 NA NA NA NA NA NA
## nbr.null 0.000000e+00 NA NA NA NA NA NA
## nbr.na 0.000000e+00 NA NA NA NA NA NA
## min 5.560000e+02 NA NA NA NA NA NA
## max 5.591000e+03 NA NA NA NA NA NA
## range 5.035000e+03 NA NA NA NA NA NA
## sum 1.877600e+04 NA NA NA NA NA NA
## median 4.375000e+03 NA NA NA NA NA NA
## mean 3.755200e+03 NA NA NA NA NA NA
## SE.mean 9.601025e+02 NA NA NA NA NA NA
## CI.mean.0.95 2.665672e+03 NA NA NA NA NA NA
## var 4.608984e+06 NA NA NA NA NA NA
## std.dev 2.146854e+03 NA NA NA NA NA NA
## coef.var 5.717018e-01 NA NA NA NA NA NA
## population City Distance location_description latitude
## nbr.val 5.000000e+00 NA 5.0000000 NA 5.000000000
## nbr.null 0.000000e+00 NA 0.0000000 NA 0.000000000
## nbr.na 0.000000e+00 NA 0.0000000 NA 0.000000000
## min 4.108000e+03 NA 10.2163100 NA 10.402400000
## max 7.301000e+03 NA 17.6552100 NA 10.556200000
## range 3.193000e+03 NA 7.4389000 NA 0.153800000
## sum 3.331200e+04 NA 64.6102600 NA 52.522300000
## median 7.301000e+03 NA 12.2195200 NA 10.554300000
## mean 6.662400e+03 NA 12.9220520 NA 10.504460000
## SE.mean 6.386000e+02 NA 1.2471437 NA 0.032060437
## CI.mean.0.95 1.773038e+03 NA 3.4626259 NA 0.089014042
## var 2.039050e+06 NA 7.7768366 NA 0.005139358
## std.dev 1.427953e+03 NA 2.7886980 NA 0.071689316
## coef.var 2.143301e-01 NA 0.2158092 NA 0.006824655
## longitude geolocation hazard_type landslide_type
## nbr.val 5.000000e+00 NA NA NA
## nbr.null 0.000000e+00 NA NA NA
## nbr.na 0.000000e+00 NA NA NA
## min -8.535550e+01 NA NA NA
## max -8.487510e+01 NA NA NA
## range 4.804000e-01 NA NA NA
## sum -4.249159e+02 NA NA NA
## median -8.489520e+01 NA NA NA
## mean -8.498318e+01 NA NA NA
## SE.mean 9.316065e-02 NA NA NA
## CI.mean.0.95 2.586554e-01 NA NA NA
## var 4.339454e-02 NA NA NA
## std.dev 2.083136e-01 NA NA NA
## coef.var -2.451233e-03 NA NA NA
## landslide_size trigger storm_name injuries fatalities source_name
## nbr.val NA NA NA 0 2.000000 NA
## nbr.null NA NA NA 0 1.000000 NA
## nbr.na NA NA NA 5 3.000000 NA
## min NA NA NA Inf 0.000000 NA
## max NA NA NA -Inf 2.000000 NA
## range NA NA NA -Inf 2.000000 NA
## sum NA NA NA 0 2.000000 NA
## median NA NA NA NA 1.000000 NA
## mean NA NA NA NaN 1.000000 NA
## SE.mean NA NA NA NA 1.000000 NA
## CI.mean.0.95 NA NA NA NaN 12.706205 NA
## var NA NA NA NA 2.000000 NA
## std.dev NA NA NA NA 1.414214 NA
## coef.var NA NA NA NA 1.414214 NA
## source_link
## nbr.val NA
## nbr.null NA
## nbr.na NA
## min NA
## max NA
## range NA
## sum NA
## median NA
## mean NA
## SE.mean NA
## CI.mean.0.95 NA
## var NA
## std.dev NA
## coef.var NA
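The warnings and the many all-NA columns above arise because `stat.desc()` is applied to the whole data frame, including character columns and `injuries`, which has no recorded values for Guanacaste. A hedged sketch of restricting the summary to numeric columns that contain data follows. The small data frame `gu` is a hand-built stand-in, not the real `df_GU`: the distances come from the listing, and the fifth `fatalities` value is inferred from the `stat.desc` output (two valid values summing to 2).

```r
# Hand-built stand-in for a few df_GU columns (see assumptions in the lead-in)
gu <- data.frame(
  State      = rep("Guanacaste", 5),
  Distance   = c(17.65521, 10.21631, 12.33807, 12.21952, 12.18115),
  injuries   = rep(NA_real_, 5),       # no injuries recorded for this state
  fatalities = c(NA, 0, NA, NA, 2)     # last value inferred from the summary
)

# Keep numeric columns only, then drop those that are entirely missing
numeric_cols <- gu[sapply(gu, is.numeric)]
has_values <- sapply(numeric_cols, function(x) any(!is.na(x)))

# Count, mean and standard deviation for the remaining columns
summary_tbl <- sapply(numeric_cols[has_values], function(x) {
  c(n = sum(!is.na(x)), mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
})
print(summary_tbl)
```

The same filtered data frame, `numeric_cols[has_values]`, could be passed to `stat.desc()` if the full pastecs output is preferred.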
Box-and-whisker plot
boxplot(Distance, horizontal=TRUE, col='steelblue')
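As a complement to the box plot, the sketch below applies the usual 1.5 × IQR rule that determines which points are drawn beyond the whiskers. It uses `boxplot.stats()` from base R with its default coefficient of 1.5; the distance vector is copied from the Guanacaste listing, and the variable names are illustrative.

```r
# Guanacaste distances copied from the listing above
distance <- c(17.65521, 10.21631, 12.33807, 12.21952, 12.18115)

# Five-number summary and outliers as used by boxplot()
stats <- boxplot.stats(distance)

# Lower and upper hinges (approximately the first and third quartiles)
hinges <- stats$stats[c(2, 4)]

# Fences: values beyond these are flagged as outliers
fences <- c(lower = hinges[1] - 1.5 * diff(hinges),
            upper = hinges[2] + 1.5 * diff(hinges))

outliers <- stats$out
print(fences)
print(outliers)
```

With only five observations and three of them within about 0.16 of each other, both the smallest and the largest distance fall outside the fences.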

library(tidyverse)
library(hrbrthemes)
library(viridis)
df <- data.frame(Distance)
df %>% ggplot(aes(x = "", y = Distance)) +
geom_boxplot(color="red", fill="orange", alpha=0.5) +
theme_ipsum() +
theme(legend.position="none", plot.title = element_text(size=11)) +
ggtitle("Landslides") +
coord_flip() +
xlab("") +
ylab("")
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

library(readr)
library(knitr)
df <- read.csv("https://raw.githubusercontent.com/lihkir/AnalisisEstadisticoUN/main/Data/catalog.csv")
library(dplyr)
colnames(df)[4] <- "Continent"
colnames(df)[10] <- "Distance"
colnames(df)[5] <- "Country"
colnames(df)[7] <- "State"
colnames(df)[9] <- "City"
colnames(df)[2] <- "Date"
Alajuela
Landslides in the cities of Alajuela
library(readr)
library(knitr)
df_AJ <- subset (df, State == "Alajuela")
df_AJ %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 45 Costa Rica Alajuela Atenas 3.08459 10/11/07
## 302 Costa Rica Alajuela Desamparados 6.88715 4/14/10
## 347 Costa Rica Alajuela Desamparados 6.92174 5/22/10
## 395 Costa Rica Alajuela Desamparados 4.24199 7/30/10
## 459 Costa Rica Alajuela San Rafael 1.47396 9/29/10
## 503 Costa Rica Alajuela San Rafael 9.61692 11/4/10
## 514 Costa Rica Alajuela Santiago 5.43516 11/5/10
## 780 Costa Rica Alajuela Upala 0.70048 7/12/11
## 1098 Costa Rica Alajuela Sabanilla 4.87432 8/27/13
## 1156 Costa Rica Alajuela Sabanilla 10.32968 9/16/13
## 1289 Costa Rica Alajuela La Fortuna 9.84213 10/4/14
## 1301 Costa Rica Alajuela 5.57523 9/19/14
## 1308 Costa Rica Alajuela Desamparados 5.95519 11/1/14
## 1342 Costa Rica Alajuela Rio Segundo 11.96524 8/21/14
## 1364 Costa Rica Alajuela Desamparados 5.12667 8/10/14
## 1461 Costa Rica Alajuela La Fortuna 5.96634 6/17/15
## 1475 Costa Rica Alajuela Atenas 6.80061 6/3/15
## 1642 Costa Rica Alajuela Santo Domingo 3.21979 10/27/15
## 1643 Costa Rica Alajuela Alajuela 3.08916 11/18/15
## 1644 Costa Rica Alajuela Naranjo 2.08469 10/29/15
head(df_AJ)
## id Date time Continent Country country_code State
## 45 301 10/11/07 <NA> Costa Rica CR Alajuela
## 302 1749 4/14/10 <NA> Costa Rica CR Alajuela
## 347 1886 5/22/10 18:00:00 <NA> Costa Rica CR Alajuela
## 395 2174 7/30/10 9:30:00 <NA> Costa Rica CR Alajuela
## 459 2516 9/29/10 <NA> Costa Rica CR Alajuela
## 503 2682 11/4/10 <NA> Costa Rica CR Alajuela
## population City Distance location_description latitude longitude
## 45 7014 Atenas 3.08459 9.9869 -84.4070
## 302 14448 Desamparados 6.88715 Above road 9.9323 -84.4453
## 347 14448 Desamparados 6.92174 Above road 9.9290 -84.4428
## 395 14448 Desamparados 4.24199 Above road 9.9271 -84.4568
## 459 3624 San Rafael 1.47396 10.0757 -84.4793
## 503 3624 San Rafael 9.61692 10.0421 -84.5577
## geolocation hazard_type landslide_type
## 45 (9.9869000000000003, -84.406999999999996) Landslide Mudslide
## 302 (9.9322999999999997, -84.445300000000003) Landslide Landslide
## 347 (9.9290000000000003, -84.442800000000005) Landslide Landslide
## 395 (9.9270999999999994, -84.456800000000001) Landslide Landslide
## 459 (10.075699999999999, -84.479299999999995) Landslide Mudslide
## 503 (10.0421, -84.557699999999997) Landslide Landslide
## landslide_size trigger storm_name injuries fatalities
## 45 Large Rain NA 14
## 302 Medium Downpour NA 0
## 347 Medium Downpour 3 0
## 395 Medium Rain NA 0
## 459 Medium Downpour NA 0
## 503 Medium Tropical cyclone Tropical Storm Tomas NA 0
## source_name
## 45 Agence France-Presse, afp.google.com
## 302
## 347 Costa Rica News
## 395 La Fortuna
## 459
## 503
## source_link
## 45 http://afp.google.com/article/ALeqM5hu6a8oyAM1ycq9nU_6Zyj_l7F0AA
## 302 http://www.insidecostarica.com/dailynews/2010/april/16/costarica10041602.htm
## 347 http://thecostaricanews.com/rains-cause-landslides-and-road-accidents-on-caldera/3255
## 395 https://lafortunatimes.wordpress.com/2010/07/30/landslide-caused-closure-of-san-jose-caldera-for-most-of-the-day-friday/
## 459 http://www.ticotimes.net/News/Daily-News/Inter-American-Highway-Reopens-Caldera-Highway-Under-Repair_Monday-October-04-2010
## 503 http://fortunatimes.com/2010/11/06/no-passage-to-the-south-and-central-pacific/
ggplot(data=df_AJ, aes(x=City, y=Distance)) + geom_bar(stat="identity", color="blue", fill="white")

Pie chart
ggplot(df_AJ, aes(x = "Alajuela", y = Distance, fill = City)) +
  geom_bar(stat = "identity", color = "white") +
  geom_text(aes(label = Distance),
            position = position_stack(vjust = 0.5), color = "white", size = 4) +
  coord_polar(theta = "y") +
  labs(title = "Landslide chart")

Pareto chart
City with the largest landslide distance
library(qcc)
Distance <- df_AJ$Distance
names(Distance) <- df_AJ$City
pareto.chart(Distance,
  ylab = "Distance",
  col = heat.colors(length(Distance)),
  cumperc = seq(0, 100, by = 10),
  ylab2 = "Cumulative percentage",
  main = "CITIES WITH THE LARGEST LANDSLIDE DISTANCES"
)

##
## Pareto chart analysis for Distance
## Frequency Cum.Freq. Percentage Cum.Percent.
## Rio Segundo 11.9652400 11.9652400 10.5708367 10.5708367
## Sabanilla 10.3296800 22.2949200 9.1258813 19.6967180
## La Fortuna 9.8421300 32.1370500 8.6951494 28.3918674
## San Rafael 9.6169200 41.7539700 8.4961849 36.8880523
## Desamparados 6.9217400 48.6757100 6.1150953 43.0031476
## Desamparados 6.8871500 55.5628600 6.0845364 49.0876840
## Atenas 6.8006100 62.3634700 6.0080816 55.0957655
## La Fortuna 5.9663400 68.3298100 5.2710356 60.3668011
## Desamparados 5.9551900 74.2850000 5.2611850 65.6279861
## 5.5752300 79.8602300 4.9255047 70.5534908
## Santiago 5.4351600 85.2953900 4.8017582 75.3552490
## Desamparados 5.1266700 90.4220600 4.5292189 79.8844679
## Sabanilla 4.8743200 95.2963800 4.3062772 84.1907451
## Desamparados 4.2419900 99.5383700 3.7476376 87.9383828
## Santo Domingo 3.2197900 102.7581600 2.8445626 90.7829454
## Alajuela 3.0891600 105.8473200 2.7291559 93.5121013
## Atenas 3.0845900 108.9319100 2.7251185 96.2372198
## Naranjo 2.0846900 111.0166000 1.8417447 98.0789646
## San Rafael 1.4739600 112.4905600 1.3021879 99.3811524
## Upala 0.7004800 113.1910400 0.6188476 100.0000000
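In the Pareto output above every landslide is its own bar, so Desamparados appears five times. One alternative, sketched here under the assumption that the number of events per city is the quantity of interest, is to count landslides per city first. The city vector is copied from the Alajuela listing; the label "(unnamed)" for the row with an empty city is introduced here for illustration only.

```r
# City of each Alajuela landslide, in the order of the listing above
city <- c("Atenas", "Desamparados", "Desamparados", "Desamparados",
          "San Rafael", "San Rafael", "Santiago", "Upala",
          "Sabanilla", "Sabanilla", "La Fortuna", "",
          "Desamparados", "Rio Segundo", "Desamparados", "La Fortuna",
          "Atenas", "Santo Domingo", "Alajuela", "Naranjo")

# Give the row without a city name an explicit label (illustrative)
city[city == ""] <- "(unnamed)"

# Events per city, largest first, with cumulative percentage
counts <- sort(table(city), decreasing = TRUE)
cum_pct <- cumsum(counts) / sum(counts) * 100
print(data.frame(n = as.vector(counts), cum_pct = as.vector(cum_pct),
                 row.names = names(counts)))

# Simple bar chart of the ordered counts
barplot(counts, las = 2, ylab = "Number of landslides")
```

A named numeric vector built from these counts could also be passed to `pareto.chart()` in place of the raw distances.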
Stem-and-leaf plot
stem(df_AJ$"Distance")
##
## The decimal point is 1 digit(s) to the right of the |
##
## 0 | 1123334
## 0 | 555666777
## 1 | 0002
stem(df_AJ$"Distance", scale = 2)
##
## The decimal point is at the |
##
## 0 | 75
## 2 | 1112
## 4 | 29146
## 6 | 00899
## 8 | 68
## 10 | 3
## 12 | 0
Time series
library(forecast)
# Note: ts() assigns the rows to consecutive months starting in January 2007;
# the actual event dates in the Date column are not used.
data_serie <- ts(df_AJ$Distance, frequency=12, start=2007)
head(data_serie)
## Jan Feb Mar Apr May Jun
## 2007 3.08459 6.88715 6.92174 4.24199 1.47396 9.61692
autoplot(data_serie) +
  labs(title = "Landslide series", x = "Year", y = "Distance", colour = "#00a0dc") + theme_bw()

Frequency tables
library(questionr)
table <- questionr::freq(Distance, cum = TRUE, sort = "dec", total = TRUE)
knitr::kable(table)
|         |  n|   %| val%| %cum| val%cum|
|:--------|--:|---:|----:|----:|-------:|
|0.70048  |  1|   5|    5|    5|       5|
|1.47396  |  1|   5|    5|   10|      10|
|2.08469  |  1|   5|    5|   15|      15|
|3.08459  |  1|   5|    5|   20|      20|
|3.08916  |  1|   5|    5|   25|      25|
|3.21979  |  1|   5|    5|   30|      30|
|4.24199  |  1|   5|    5|   35|      35|
|4.87432  |  1|   5|    5|   40|      40|
|5.12667  |  1|   5|    5|   45|      45|
|5.43516  |  1|   5|    5|   50|      50|
|5.57523  |  1|   5|    5|   55|      55|
|5.95519  |  1|   5|    5|   60|      60|
|5.96634  |  1|   5|    5|   65|      65|
|6.80061  |  1|   5|    5|   70|      70|
|6.88715  |  1|   5|    5|   75|      75|
|6.92174  |  1|   5|    5|   80|      80|
|9.61692  |  1|   5|    5|   85|      85|
|9.84213  |  1|   5|    5|   90|      90|
|10.32968 |  1|   5|    5|   95|      95|
|11.96524 |  1|   5|    5|  100|     100|
|Total    | 20| 100|  100|  100|     100|
str(table)
## Classes 'freqtab' and 'data.frame': 21 obs. of 5 variables:
## $ n : num 1 1 1 1 1 1 1 1 1 1 ...
## $ % : num 5 5 5 5 5 5 5 5 5 5 ...
## $ val% : num 5 5 5 5 5 5 5 5 5 5 ...
## $ %cum : num 5 10 15 20 25 30 35 40 45 50 ...
## $ val%cum: num 5 10 15 20 25 30 35 40 45 50 ...
x <- row.names(table)
y <- table$n
names <- x[1:(length(x)-1)]
freqs <- y[1:(length(y)-1)]
df <- data.frame(x = names, y = freqs)
knitr::kable(df)
|x        |  y|
|:--------|--:|
|0.70048  |  1|
|1.47396  |  1|
|2.08469  |  1|
|3.08459  |  1|
|3.08916  |  1|
|3.21979  |  1|
|4.24199  |  1|
|4.87432  |  1|
|5.12667  |  1|
|5.43516  |  1|
|5.57523  |  1|
|5.95519  |  1|
|5.96634  |  1|
|6.80061  |  1|
|6.88715  |  1|
|6.92174  |  1|
|9.61692  |  1|
|9.84213  |  1|
|10.32968 |  1|
|11.96524 |  1|
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="white", fill="blue") +
xlab("Distance") +
ylab("Frequency")

Grouped frequency table
n_sturges = 1 + log(length(Distance))/log(2)
n_sturgesc = ceiling(n_sturges)
n_sturgesf = floor(n_sturges)
n_clases = 0
if (n_sturgesc%%2 == 0) {
n_clases = n_sturgesf
} else {
n_clases = n_sturgesc
}
R = max(Distance) - min(Distance)
w = ceiling(R/n_clases)
bins <- seq(min(Distance), max(Distance) + w, by = w)
bins
## [1] 0.70048 3.70048 6.70048 9.70048 12.70048
# Edades holds the binned distances but is not used below; Freq_table tabulates the raw values.
Edades <- cut(Distance, bins)
Freq_table <- transform(table(Distance), Rel_Freq=prop.table(Freq), Cum_Freq=cumsum(Freq))
knitr::kable(Freq_table)
|Distance | Freq| Rel_Freq| Cum_Freq|
|:--------|----:|--------:|--------:|
|0.70048  |    1|     0.05|        1|
|1.47396  |    1|     0.05|        2|
|2.08469  |    1|     0.05|        3|
|3.08459  |    1|     0.05|        4|
|3.08916  |    1|     0.05|        5|
|3.21979  |    1|     0.05|        6|
|4.24199  |    1|     0.05|        7|
|4.87432  |    1|     0.05|        8|
|5.12667  |    1|     0.05|        9|
|5.43516  |    1|     0.05|       10|
|5.57523  |    1|     0.05|       11|
|5.95519  |    1|     0.05|       12|
|5.96634  |    1|     0.05|       13|
|6.80061  |    1|     0.05|       14|
|6.88715  |    1|     0.05|       15|
|6.92174  |    1|     0.05|       16|
|9.61692  |    1|     0.05|       17|
|9.84213  |    1|     0.05|       18|
|10.32968 |    1|     0.05|       19|
|11.96524 |    1|     0.05|       20|
str(Freq_table)
## 'data.frame': 20 obs. of 4 variables:
## $ Distance: Factor w/ 20 levels "0.70048","1.47396",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Freq : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Rel_Freq: num 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 ...
## $ Cum_Freq: int 1 2 3 4 5 6 7 8 9 10 ...
df <- data.frame(x = Freq_table$Distance, y = Freq_table$Freq)
knitr::kable(df)
|x        |  y|
|:--------|--:|
|0.70048  |  1|
|1.47396  |  1|
|2.08469  |  1|
|3.08459  |  1|
|3.08916  |  1|
|3.21979  |  1|
|4.24199  |  1|
|4.87432  |  1|
|5.12667  |  1|
|5.43516  |  1|
|5.57523  |  1|
|5.95519  |  1|
|5.96634  |  1|
|6.80061  |  1|
|6.88715  |  1|
|6.92174  |  1|
|9.61692  |  1|
|9.84213  |  1|
|10.32968 |  1|
|11.96524 |  1|
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="blue", fill="green") +
xlab("Distance range") +
ylab("Frequency")

People affected by landslides
summary(df_AJ$Distance)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.7005 3.1871 5.5052 5.6596 6.8958 11.9652
library(pastecs)
stat.desc(df_AJ)
## id Date time Continent Country country_code State
## nbr.val 2.000000e+01 NA NA NA NA NA NA
## nbr.null 0.000000e+00 NA NA NA NA NA NA
## nbr.na 0.000000e+00 NA NA NA NA NA NA
## min 3.010000e+02 NA NA NA NA NA NA
## max 7.488000e+03 NA NA NA NA NA NA
## range 7.187000e+03 NA NA NA NA NA NA
## sum 9.718800e+04 NA NA NA NA NA NA
## median 5.878000e+03 NA NA NA NA NA NA
## mean 4.859400e+03 NA NA NA NA NA NA
## SE.mean 5.261514e+02 NA NA NA NA NA NA
## CI.mean.0.95 1.101248e+03 NA NA NA NA NA NA
## var 5.536707e+06 NA NA NA NA NA NA
## std.dev 2.353021e+03 NA NA NA NA NA NA
## coef.var 4.842204e-01 NA NA NA NA NA NA
## population City Distance location_description latitude
## nbr.val 2.000000e+01 NA 20.0000000 NA 20.00000000
## nbr.null 0.000000e+00 NA 0.0000000 NA 0.00000000
## nbr.na 0.000000e+00 NA 0.0000000 NA 0.00000000
## min 1.015000e+03 NA 0.7004800 NA 9.91890000
## max 4.749400e+04 NA 11.9652400 NA 10.89160000
## range 4.647900e+04 NA 11.2647600 NA 0.97270000
## sum 1.924900e+05 NA 113.1910400 NA 202.24760000
## median 7.014000e+03 NA 5.5051950 NA 10.04315000
## mean 9.624500e+03 NA 5.6595520 NA 10.11238000
## SE.mean 2.281502e+03 NA 0.6812501 NA 0.05493583
## CI.mean.0.95 4.775238e+03 NA 1.4258729 NA 0.11498201
## var 1.041050e+08 NA 9.2820347 NA 0.06035891
## std.dev 1.020319e+04 NA 3.0466432 NA 0.24568050
## coef.var 1.060126e+00 NA 0.5383188 NA 0.02429502
## longitude geolocation hazard_type landslide_type
## nbr.val 2.000000e+01 NA NA NA
## nbr.null 0.000000e+00 NA NA NA
## nbr.na 0.000000e+00 NA NA NA
## min -8.501410e+01 NA NA NA
## max -8.418070e+01 NA NA NA
## range 8.334000e-01 NA NA NA
## sum -1.688552e+03 NA NA NA
## median -8.444405e+01 NA NA NA
## mean -8.442758e+01 NA NA NA
## SE.mean 4.594981e-02 NA NA NA
## CI.mean.0.95 9.617405e-02 NA NA NA
## var 4.222770e-02 NA NA NA
## std.dev 2.054938e-01 NA NA NA
## coef.var -2.433965e-03 NA NA NA
## landslide_size trigger storm_name injuries fatalities
## nbr.val NA NA NA 11.0000000 18.0000000
## nbr.null NA NA NA 10.0000000 15.0000000
## nbr.na NA NA NA 9.0000000 2.0000000
## min NA NA NA 0.0000000 0.0000000
## max NA NA NA 3.0000000 14.0000000
## range NA NA NA 3.0000000 14.0000000
## sum NA NA NA 3.0000000 16.0000000
## median NA NA NA 0.0000000 0.0000000
## mean NA NA NA 0.2727273 0.8888889
## SE.mean NA NA NA 0.2727273 0.7749716
## CI.mean.0.95 NA NA NA 0.6076742 1.6350471
## var NA NA NA 0.8181818 10.8104575
## std.dev NA NA NA 0.9045340 3.2879260
## coef.var NA NA NA 3.3166248 3.6989168
## source_name source_link
## nbr.val NA NA
## nbr.null NA NA
## nbr.na NA NA
## min NA NA
## max NA NA
## range NA NA
## sum NA NA
## median NA NA
## mean NA NA
## SE.mean NA NA
## CI.mean.0.95 NA NA
## var NA NA
## std.dev NA NA
## coef.var NA NA
Box-and-whisker plot
boxplot(Distance, horizontal=TRUE, col='steelblue')

library(tidyverse)
library(hrbrthemes)
library(viridis)
df <- data.frame(Distance)
df %>% ggplot(aes(x = "", y = Distance)) +
geom_boxplot(color="red", fill="orange", alpha=0.5) +
theme_ipsum() +
theme(legend.position="none", plot.title = element_text(size=11)) +
ggtitle("Landslides") +
coord_flip() +
xlab("") +
ylab("")
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

library(readr)
library(knitr)
df <- read.csv("https://raw.githubusercontent.com/lihkir/AnalisisEstadisticoUN/main/Data/catalog.csv")
library(dplyr)
colnames(df)[4] <- "Continent"
colnames(df)[10] <- "Distance"
colnames(df)[5] <- "Country"
colnames(df)[7] <- "State"
colnames(df)[9] <- "City"
colnames(df)[2] <- "Date"
Cartago Province
Landslides in the cities of Cartago
library(readr)
library(knitr)
df_CA <- subset (df, State == "Cartago")
head(df_CA, n=4)
## id Date time Continent Country country_code State population
## 505 2684 11/4/10 <NA> Costa Rica CR Cartago 4350
## 828 4031 10/31/11 <NA> Costa Rica CR Cartago 6784
## 1383 6695 9/13/14 <NA> Costa Rica CR Cartago 26594
## 1646 7490 10/15/15 <NA> Costa Rica CR Cartago 4060
## City Distance location_description latitude longitude
## 505 Orosí 19.28722 9.6227 -83.8359
## 828 Cot 9.63616 Natural slope 9.9792 -83.8525
## 1383 Cartago 3.07297 Below road 9.8895 -83.9316
## 1646 5.15142 Above road 9.7917 -83.9815
## geolocation hazard_type landslide_type
## 505 (9.6227, -83.835899999999995) Landslide Landslide
## 828 (9.9792000000000005, -83.852500000000006) Landslide Landslide
## 1383 (9.8895, -83.931600000000003) Landslide Landslide
## 1646 (9.7917000000000005, -83.981499999999997) Landslide Landslide
## landslide_size trigger storm_name injuries fatalities
## 505 Medium Tropical cyclone Tropical Storm Tomas NA 0
## 828 Medium Downpour NA 0
## 1383 Small Rain 0 0
## 1646 Medium Downpour 0 0
## source_name
## 505
## 828 Inside Costa Rica
## 1383 Ahora
## 1646 crhoy
## source_link
## 505 http://fortunatimes.com/2010/11/06/no-passage-to-the-south-and-central-pacific/
## 828 http://www.insidecostarica.com/dailynews/2011/october/31/costarica11103102.htm
## 1383 http://www.ahora.cr/nacionales/Derrumbe-pone-riesgo-linea-Cartago_0_1439256064.html
## 1646 http://www.crhoy.com/carril-cerrado-sobre-interamericana-sur-por-deslizamiento/
df_CA %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 505 Costa Rica Cartago Orosí 19.28722 11/4/10
## 828 Costa Rica Cartago Cot 9.63616 10/31/11
## 1383 Costa Rica Cartago Cartago 3.07297 9/13/14
## 1646 Costa Rica Cartago 5.15142 10/15/15
## 1647 Costa Rica Cartago Cot 9.53493 3/20/15
## 1648 Costa Rica Cartago Cartago 2.94804 3/18/15
ggplot(data=df_CA, aes(x=City, y=Distance)) + geom_bar(stat="identity", color="blue", fill="white")

Pie chart
ggplot(df_CA, aes(x = "Cartago", y = Distance, fill = City)) +
  geom_bar(stat = "identity", color = "white") +
  geom_text(aes(label = Distance),
            position = position_stack(vjust = 0.5), color = "white", size = 4) +
  coord_polar(theta = "y") +
  labs(title = "Landslide chart")

Pareto chart
City with the largest landslide distance
library(qcc)
Distance <- df_CA$Distance
names(Distance) <- df_CA$City
pareto.chart(Distance,
  ylab = "Distance",
  col = heat.colors(length(Distance)),
  cumperc = seq(0, 100, by = 10),
  ylab2 = "Cumulative percentage",
  main = "CITIES WITH THE LARGEST LANDSLIDE DISTANCES"
)

##
## Pareto chart analysis for Distance
## Frequency Cum.Freq. Percentage Cum.Percent.
## Orosí 19.287220 19.287220 38.861440 38.861440
## Cot 9.636160 28.923380 19.415709 58.277148
## Cot 9.534930 38.458310 19.211743 77.488891
## 5.151420 43.609730 10.379495 87.868386
## Cartago 3.072970 46.682700 6.191667 94.060052
## Cartago 2.948040 49.630740 5.939948 100.000000
Stem-and-leaf plot
stem(df_CA$"Distance")
##
## The decimal point is 1 digit(s) to the right of the |
##
## 0 | 33
## 0 | 5
## 1 | 00
## 1 | 9
stem(df_CA$"Distance", scale = 2)
##
## The decimal point is at the |
##
## 2 | 91
## 4 | 2
## 6 |
## 8 | 56
## 10 |
## 12 |
## 14 |
## 16 |
## 18 | 3
Time series
library(forecast)
# Note: ts() assigns the rows to consecutive months starting in January 2007;
# the actual event dates in the Date column are not used.
data_serie <- ts(df_CA$Distance, frequency=12, start=2007)
head(data_serie)
## Jan Feb Mar Apr May Jun
## 2007 19.28722 9.63616 3.07297 5.15142 9.53493 2.94804
autoplot(data_serie) +
  labs(title = "Landslide series", x = "Year", y = "Distance", colour = "#00a0dc") + theme_bw()

Frequency tables
library(questionr)
table <- questionr::freq(Distance, cum = TRUE, sort = "dec", total = TRUE)
knitr::kable(table)
|         |  n|     %|  val%|  %cum| val%cum|
|:--------|--:|-----:|-----:|-----:|-------:|
|2.94804  |  1|  16.7|  16.7|  16.7|    16.7|
|3.07297  |  1|  16.7|  16.7|  33.3|    33.3|
|5.15142  |  1|  16.7|  16.7|  50.0|    50.0|
|9.53493  |  1|  16.7|  16.7|  66.7|    66.7|
|9.63616  |  1|  16.7|  16.7|  83.3|    83.3|
|19.28722 |  1|  16.7|  16.7| 100.0|   100.0|
|Total    |  6| 100.0| 100.0| 100.0|   100.0|
str(table)
## Classes 'freqtab' and 'data.frame': 7 obs. of 5 variables:
## $ n : num 1 1 1 1 1 1 6
## $ % : num 16.7 16.7 16.7 16.7 16.7 16.7 100
## $ val% : num 16.7 16.7 16.7 16.7 16.7 16.7 100
## $ %cum : num 16.7 33.3 50 66.7 83.3 100 100
## $ val%cum: num 16.7 33.3 50 66.7 83.3 100 100
x <- row.names(table)
y <- table$n
names <- x[1:(length(x)-1)]
freqs <- y[1:(length(y)-1)]
df <- data.frame(x = names, y = freqs)
knitr::kable(df)
|x        |  y|
|:--------|--:|
|2.94804  |  1|
|3.07297  |  1|
|5.15142  |  1|
|9.53493  |  1|
|9.63616  |  1|
|19.28722 |  1|
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="white", fill="blue") +
xlab("Distance") +
ylab("Frequency")
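Because every distance in the table above is distinct, each value has a frequency of 1 and the bar chart is flat. One possible way to obtain repeated values is to round before tabulating. The sketch below assumes that rounding to the nearest whole unit is acceptable for the analysis; the vector is copied from the Cartago listing and the column names are illustrative.

```r
# Cartago distances copied from the listing above
distance <- c(19.28722, 9.63616, 3.07297, 5.15142, 9.53493, 2.94804)

# Round to the nearest whole unit so that nearby values share a category
rounded <- round(distance)

# Absolute and percentage frequencies of the rounded values
freq <- table(rounded)
freq_table <- data.frame(
  value = as.numeric(names(freq)),
  n     = as.vector(freq),
  pct   = as.vector(prop.table(freq)) * 100
)
print(freq_table)
```

After rounding, the two Cartago events and the two Cot events each collapse into a single category, which makes the repetition in the data visible.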

Grouped frequency table
n_sturges = 1 + log(length(Distance))/log(2)
n_sturgesc = ceiling(n_sturges)
n_sturgesf = floor(n_sturges)
n_clases = 0
if (n_sturgesc%%2 == 0) {
n_clases = n_sturgesf
} else {
n_clases = n_sturgesc
}
R = max(Distance) - min(Distance)
w = ceiling(R/n_clases)
bins <- seq(min(Distance), max(Distance) + w, by = w)
bins
## [1] 2.94804 8.94804 14.94804 20.94804
Edades <- cut(Distance, bins)
Freq_table <- transform(table(Distance), Rel_Freq=prop.table(Freq), Cum_Freq=cumsum(Freq))
knitr::kable(Freq_table)
| Distance | Freq | Rel_Freq | Cum_Freq |
|:---|---:|---:|---:|
| 2.94804 | 1 | 0.1666667 | 1 |
| 3.07297 | 1 | 0.1666667 | 2 |
| 5.15142 | 1 | 0.1666667 | 3 |
| 9.53493 | 1 | 0.1666667 | 4 |
| 9.63616 | 1 | 0.1666667 | 5 |
| 19.28722 | 1 | 0.1666667 | 6 |
str(Freq_table)
## 'data.frame': 6 obs. of 4 variables:
## $ Distance: Factor w/ 6 levels "2.94804","3.07297",..: 1 2 3 4 5 6
## $ Freq : int 1 1 1 1 1 1
## $ Rel_Freq: num 0.167 0.167 0.167 0.167 0.167 ...
## $ Cum_Freq: int 1 2 3 4 5 6
df <- data.frame(x = Freq_table$Distance, y = Freq_table$Freq)
knitr::kable(df)
| x | y |
|:---|---:|
| 2.94804 | 1 |
| 3.07297 | 1 |
| 5.15142 | 1 |
| 9.53493 | 1 |
| 9.63616 | 1 |
| 19.28722 | 1 |
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="blue", fill="green") +
xlab("Distance range") +
ylab("Frequency")
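The class breaks computed above are never applied: the table and bar chart still count each distinct distance once. The following is a minimal sketch of how the breaks could be used to build a grouped table, assuming the six Cartago distances shown in the frequency table and the three classes the rule above yields for n = 6. The names `distance`, `classes` and `grouped` are illustrative and not part of the original report.

```r
# Cartago distances copied from the frequency table above (assumed complete)
distance <- c(19.28722, 9.63616, 3.07297, 5.15142, 9.53493, 2.94804)

# Three classes, the value the class-count rule above gives for n = 6
n_classes <- 3
w <- ceiling((max(distance) - min(distance)) / n_classes)
breaks <- seq(min(distance), max(distance) + w, by = w)

# include.lowest = TRUE keeps the minimum, which sits exactly on the first break
classes <- cut(distance, breaks, include.lowest = TRUE)

# Absolute, relative and cumulative frequency per class
grouped <- transform(as.data.frame(table(Class = classes)),
                     Rel_Freq = Freq / sum(Freq),
                     Cum_Freq = cumsum(Freq))
print(grouped)
```

Without `include.lowest = TRUE`, `cut()` would return `NA` for the smallest distance, because the intervals are open on the left by default.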

Descriptive statistics
summary(df_CA$Distance)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.948 3.593 7.343 8.272 9.611 19.287
library(pastecs)
stat.desc(df_CA)
## id Date time Continent Country country_code State
## nbr.val 6.000000e+00 NA NA NA NA NA NA
## nbr.null 0.000000e+00 NA NA NA NA NA NA
## nbr.na 0.000000e+00 NA NA NA NA NA NA
## min 2.684000e+03 NA NA NA NA NA NA
## max 7.492000e+03 NA NA NA NA NA NA
## range 4.808000e+03 NA NA NA NA NA NA
## sum 3.588300e+04 NA NA NA NA NA NA
## median 7.092500e+03 NA NA NA NA NA NA
## mean 5.980500e+03 NA NA NA NA NA NA
## SE.mean 8.567926e+02 NA NA NA NA NA NA
## CI.mean.0.95 2.202455e+03 NA NA NA NA NA NA
## var 4.404561e+06 NA NA NA NA NA NA
## std.dev 2.098705e+03 NA NA NA NA NA NA
## coef.var 3.509246e-01 NA NA NA NA NA NA
## population City Distance location_description latitude
## nbr.val 6.000000e+00 NA 6.0000000 NA 6.00000000
## nbr.null 0.000000e+00 NA 0.0000000 NA 0.00000000
## nbr.na 0.000000e+00 NA 0.0000000 NA 0.00000000
## min 4.060000e+03 NA 2.9480400 NA 9.62270000
## max 2.659400e+04 NA 19.2872200 NA 9.97920000
## range 2.253400e+04 NA 16.3391800 NA 0.35650000
## sum 7.516600e+04 NA 49.6307400 NA 59.14320000
## median 6.784000e+03 NA 7.3431750 NA 9.88550000
## mean 1.252767e+04 NA 8.2717900 NA 9.85720000
## SE.mean 4.473174e+03 NA 2.5159722 NA 0.05493519
## CI.mean.0.95 1.149866e+04 NA 6.4675124 NA 0.14121539
## var 1.200557e+08 NA 37.9806957 NA 0.01810725
## std.dev 1.095699e+04 NA 6.1628480 NA 0.13456317
## coef.var 8.746236e-01 NA 0.7450441 NA 0.01365126
## longitude geolocation hazard_type landslide_type
## nbr.val 6.000000e+00 NA NA NA
## nbr.null 0.000000e+00 NA NA NA
## nbr.na 0.000000e+00 NA NA NA
## min -8.398150e+01 NA NA NA
## max -8.383590e+01 NA NA NA
## range 1.456000e-01 NA NA NA
## sum -5.033958e+02 NA NA NA
## median -8.389290e+01 NA NA NA
## mean -8.389930e+01 NA NA NA
## SE.mean 2.429580e-02 NA NA NA
## CI.mean.0.95 6.245435e-02 NA NA NA
## var 3.541716e-03 NA NA NA
## std.dev 5.951232e-02 NA NA NA
## coef.var -7.093303e-04 NA NA NA
## landslide_size trigger storm_name injuries fatalities source_name
## nbr.val NA NA NA 4 6 NA
## nbr.null NA NA NA 4 6 NA
## nbr.na NA NA NA 2 0 NA
## min NA NA NA 0 0 NA
## max NA NA NA 0 0 NA
## range NA NA NA 0 0 NA
## sum NA NA NA 0 0 NA
## median NA NA NA 0 0 NA
## mean NA NA NA 0 0 NA
## SE.mean NA NA NA 0 0 NA
## CI.mean.0.95 NA NA NA 0 0 NA
## var NA NA NA 0 0 NA
## std.dev NA NA NA 0 0 NA
## coef.var NA NA NA NaN NaN NA
## source_link
## nbr.val NA
## nbr.null NA
## nbr.na NA
## min NA
## max NA
## range NA
## sum NA
## median NA
## mean NA
## SE.mean NA
## CI.mean.0.95 NA
## var NA
## std.dev NA
## coef.var NA
Box plot
boxplot(Distance, horizontal=TRUE, col='steelblue')

library(tidyverse)
library(hrbrthemes)
library(viridis)
df <- data.frame(Distance)
df %>% ggplot(aes(x = "", y = Distance)) +
geom_boxplot(color="red", fill="orange", alpha=0.5) +
theme_ipsum() +
theme(legend.position="none", plot.title = element_text(size=11)) +
ggtitle("Landslides") +
coord_flip() +
xlab("") +
ylab("")
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

library(readr)
library(knitr)
df <- read.csv("https://raw.githubusercontent.com/lihkir/AnalisisEstadisticoUN/main/Data/catalog.csv")
library(dplyr)
colnames(df)[4] <- "Continent"
colnames(df)[10] <- "Distance"
colnames(df)[5] <- "Country"
colnames(df)[7] <- "State"
colnames(df)[9] <- "City"
colnames(df)[2] <- "Date"
Heredia
Landslides in the cities of Heredia
library(readr)
library(knitr)
df_HE <- subset (df, State == "Heredia")
df_HE %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 38 Costa Rica Heredia Heredia 0.26208 9/9/07
## 311 Costa Rica Heredia Ángeles 19.51432 4/27/10
## 480 Costa Rica Heredia Ángeles 14.81614 10/15/10
## 529 Costa Rica Heredia Ángeles 19.54581 11/21/10
## 702 Costa Rica Heredia Ángeles 15.05161 5/8/11
## 884 Costa Rica Heredia Santo Domingo 21.95470 5/13/12
## 1157 Costa Rica Heredia Santo Domingo 9.85736 9/16/13
## 1384 Costa Rica Heredia Dulce Nombre de Jesus 10.01310 12/13/14
head(df_HE)
## id Date time Continent Country country_code State
## 38 249 9/9/07 <NA> Costa Rica CR Heredia
## 311 1786 4/27/10 Early morning <NA> Costa Rica CR Heredia
## 480 2598 10/15/10 <NA> Costa Rica CR Heredia
## 529 2742 11/21/10 <NA> Costa Rica CR Heredia
## 702 3472 5/8/11 Night <NA> Costa Rica CR Heredia
## 884 4358 5/13/12 <NA> Costa Rica CR Heredia
## population City Distance location_description latitude longitude
## 38 21947 Heredia 0.26208 10.0000 -84.1167
## 311 1355 Ángeles 19.51432 10.1452 -83.9564
## 480 1355 Ángeles 14.81614 10.1067 -83.9753
## 529 1355 Ángeles 19.54581 10.1433 -83.9529
## 702 1355 Ángeles 15.05161 10.1118 -83.9793
## 884 5745 Santo Domingo 21.95470 10.1981 -84.0074
## geolocation hazard_type landslide_type
## 38 (10, -84.116699999999994) Landslide Landslide
## 311 (10.145200000000001, -83.956400000000002) Landslide Landslide
## 480 (10.1067, -83.975300000000004) Landslide Rockfall
## 529 (10.1433, -83.9529) Landslide Landslide
## 702 (10.111800000000001, -83.979299999999995) Landslide Landslide
## 884 (10.1981, -84.007400000000004) Landslide Landslide
## landslide_size trigger storm_name injuries fatalities source_name
## 38 Medium Rain NA NA ticotimes.net
## 311 Medium Downpour NA 0
## 480 Medium Downpour NA 2
## 529 Medium Downpour NA 0
## 702 Medium Rain NA 0
## 884 Medium Downpour NA NA
## source_link
## 38 http://www.ticotimes.net/dailyarchive/2007_09/0911072.htm
## 311 http://en.trend.az/news/incident/1678592.html
## 480 http://www.ticotimes.net/News/Daily-News/Two-People-Die-in-Landslide-on-Limon-Highway_Saturday-October-16-2010
## 529 http://insidecostarica.com/dailynews/2010/november/22/costarica10112204.htm
## 702 http://insidecostarica.com/dailynews/2011/may/10/costarica11051010.htm
## 884 http://www.insidecostarica.com/dailynews/2012/may/17/costarica12051708.htm
ggplot(data=df_HE, aes(x=City, y=Distance)) + geom_bar(stat="identity", color="blue", fill="white")

Pie chart
# The x value is a single dummy category, so all bars stack into one ring
ggplot(df_HE, aes(x = "Heredia", y = Distance, fill = City)) +
geom_bar(stat = "identity",
color = "white") +
geom_text(aes(label = Distance),
position = position_stack(vjust = 0.5), color = "white", size = 4) +
coord_polar(theta = "y") +
labs(title = "Landslide Chart")

Pareto chart
City with the largest landslide
library(qcc)
Distance <- df_HE$Distance
names(Distance) <- df_HE$City
pareto.chart(Distance,
ylab="Distance",
col = heat.colors(length(Distance)),
cumperc = seq(0, 100, by = 10),
ylab2 = "Cumulative percentage",
main = "CITIES WITH THE LARGEST LANDSLIDES"
)

##
## Pareto chart analysis for Distance
## Frequency Cum.Freq. Percentage Cum.Percent.
## Santo Domingo 21.954700 21.954700 19.776315 19.776315
## Ángeles 19.545810 41.500510 17.606440 37.382755
## Ángeles 19.514320 61.014830 17.578074 54.960829
## Ángeles 15.051610 76.066440 13.558162 68.518991
## Ángeles 14.816140 90.882580 13.346056 81.865047
## Dulce Nombre de Jesus 10.013100 100.895680 9.019582 90.884629
## Santo Domingo 9.857360 110.753040 8.879295 99.763924
## Heredia 0.262080 111.015120 0.236076 100.000000
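The columns printed by `pareto.chart` can be reproduced with base R, which makes it clearer what each one contains. This is a minimal sketch, assuming the eight Heredia distances listed earlier; the data frame `pareto` and its column names are illustrative and not taken from qcc.

```r
# Heredia distances and cities copied from the listing above
city <- c("Heredia", "Ángeles", "Ángeles", "Ángeles", "Ángeles",
          "Santo Domingo", "Santo Domingo", "Dulce Nombre de Jesus")
distance <- c(0.26208, 19.51432, 14.81614, 19.54581,
              15.05161, 21.95470, 9.85736, 10.01310)

# Sort events from largest to smallest distance
ord <- order(distance, decreasing = TRUE)
value <- distance[ord]

pareto <- data.frame(
  city        = city[ord],
  value       = value,                             # printed as "Frequency"
  cum_value   = cumsum(value),                     # printed as "Cum.Freq."
  percent     = 100 * value / sum(value),          # printed as "Percentage"
  cum_percent = 100 * cumsum(value) / sum(value)   # printed as "Cum.Percent."
)
print(pareto)
```

Because the ranked quantity is the distance of each event, the column that qcc labels "Frequency" holds distances rather than event counts, and a city with several events appears once per event.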
Stem-and-leaf plot
stem(df_HE$"Distance")
##
## The decimal point is 1 digit(s) to the right of the |
##
## 0 | 0
## 0 |
## 1 | 00
## 1 | 55
## 2 | 002
stem(df_HE$"Distance", scale = 2)
##
## The decimal point is 1 digit(s) to the right of the |
##
## 0 | 0
## 0 |
## 1 | 00
## 1 | 55
## 2 | 002
Time series
library(forecast)
# ts() indexes the values as consecutive months starting in January 2007; the event dates are not used
data_serie<- ts(df_HE$Distance, frequency=12, start=2007)
head(data_serie)
## Jan Feb Mar Apr May Jun
## 2007 0.26208 19.51432 14.81614 19.54581 15.05161 21.95470
autoplot(data_serie)+
labs(title = "Landslide Series", x="Year", y = "Distance", colour = "#00a0dc") +theme_bw()
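The series above labels the eight Heredia events as January to August 2007, although the catalog dates run from 2007 to 2014. A minimal sketch of an alternative that keeps the real dates follows. It assumes the `Date` strings are in month/day/two-digit-year form, as they appear in the listing above, and `events` and `per_year` are illustrative names.

```r
# Dates and distances copied from the Heredia listing above
events <- data.frame(
  Date = c("9/9/07", "4/27/10", "10/15/10", "11/21/10",
           "5/8/11", "5/13/12", "9/16/13", "12/13/14"),
  Distance = c(0.26208, 19.51432, 14.81614, 19.54581,
               15.05161, 21.95470, 9.85736, 10.01310)
)

# Parse month/day/two-digit-year strings into Date objects
events$Date <- as.Date(events$Date, format = "%m/%d/%y")

# Order the events chronologically
events <- events[order(events$Date), ]

# Count the events recorded in each calendar year
per_year <- table(format(events$Date, "%Y"))
print(events)
print(per_year)
```

The resulting data frame can be plotted against its true time axis, for example with `ggplot(events, aes(x = Date, y = Distance))`, instead of forcing irregular events onto a monthly grid.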

Frequency tables
library(questionr)
table <- questionr::freq(Distance, cum = TRUE, sort = "dec", total = TRUE)
knitr::kable(table)
| | n | % | val% | %cum | val%cum |
|:---|---:|---:|---:|---:|---:|
| 0.26208 | 1 | 12.5 | 12.5 | 12.5 | 12.5 |
| 9.85736 | 1 | 12.5 | 12.5 | 25.0 | 25.0 |
| 10.0131 | 1 | 12.5 | 12.5 | 37.5 | 37.5 |
| 14.81614 | 1 | 12.5 | 12.5 | 50.0 | 50.0 |
| 15.05161 | 1 | 12.5 | 12.5 | 62.5 | 62.5 |
| 19.51432 | 1 | 12.5 | 12.5 | 75.0 | 75.0 |
| 19.54581 | 1 | 12.5 | 12.5 | 87.5 | 87.5 |
| 21.9547 | 1 | 12.5 | 12.5 | 100.0 | 100.0 |
| Total | 8 | 100.0 | 100.0 | 100.0 | 100.0 |
str(table)
## Classes 'freqtab' and 'data.frame': 9 obs. of 5 variables:
## $ n : num 1 1 1 1 1 1 1 1 8
## $ % : num 12.5 12.5 12.5 12.5 12.5 12.5 12.5 12.5 100
## $ val% : num 12.5 12.5 12.5 12.5 12.5 12.5 12.5 12.5 100
## $ %cum : num 12.5 25 37.5 50 62.5 75 87.5 100 100
## $ val%cum: num 12.5 25 37.5 50 62.5 75 87.5 100 100
x <- row.names(table)
y <- table$n
names <- x[1:(length(x)-1)]
freqs <- y[1:(length(y)-1)]
df <- data.frame(x = names, y = freqs)
knitr::kable(df)
| x | y |
|:---|---:|
| 0.26208 | 1 |
| 9.85736 | 1 |
| 10.0131 | 1 |
| 14.81614 | 1 |
| 15.05161 | 1 |
| 19.51432 | 1 |
| 19.54581 | 1 |
| 21.9547 | 1 |
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="white", fill="blue") +
xlab("Distance") +
ylab("Frequency")

Grouped frequency table
n_sturges = 1 + log(length(Distance))/log(2)
n_sturgesc = ceiling(n_sturges)
n_sturgesf = floor(n_sturges)
n_clases = 0
if (n_sturgesc%%2 == 0) {
n_clases = n_sturgesf
} else {
n_clases = n_sturgesc
}
R = max(Distance) - min(Distance)
w = ceiling(R/n_clases)
bins <- seq(min(Distance), max(Distance) + w, by = w)
bins
## [1] 0.26208 6.26208 12.26208 18.26208 24.26208
Edades <- cut(Distance, bins)
Freq_table <- transform(table(Distance), Rel_Freq=prop.table(Freq), Cum_Freq=cumsum(Freq))
knitr::kable(Freq_table)
| Distance | Freq | Rel_Freq | Cum_Freq |
|:---|---:|---:|---:|
| 0.26208 | 1 | 0.125 | 1 |
| 9.85736 | 1 | 0.125 | 2 |
| 10.0131 | 1 | 0.125 | 3 |
| 14.81614 | 1 | 0.125 | 4 |
| 15.05161 | 1 | 0.125 | 5 |
| 19.51432 | 1 | 0.125 | 6 |
| 19.54581 | 1 | 0.125 | 7 |
| 21.9547 | 1 | 0.125 | 8 |
str(Freq_table)
## 'data.frame': 8 obs. of 4 variables:
## $ Distance: Factor w/ 8 levels "0.26208","9.85736",..: 1 2 3 4 5 6 7 8
## $ Freq : int 1 1 1 1 1 1 1 1
## $ Rel_Freq: num 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125
## $ Cum_Freq: int 1 2 3 4 5 6 7 8
df <- data.frame(x = Freq_table$Distance, y = Freq_table$Freq)
knitr::kable(df)
| x | y |
|:---|---:|
| 0.26208 | 1 |
| 9.85736 | 1 |
| 10.0131 | 1 |
| 14.81614 | 1 |
| 15.05161 | 1 |
| 19.51432 | 1 |
| 19.54581 | 1 |
| 21.9547 | 1 |
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="blue", fill="green") +
xlab("Distance range") +
ylab("Frequency")

Descriptive statistics
summary(df_HE$Distance)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2621 9.9742 14.9339 13.8769 19.5222 21.9547
library(pastecs)
stat.desc(df_HE)
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## id Date time Continent Country country_code State
## nbr.val 8.000000e+00 NA NA NA NA NA NA
## nbr.null 0.000000e+00 NA NA NA NA NA NA
## nbr.na 0.000000e+00 NA NA NA NA NA NA
## min 2.490000e+02 NA NA NA NA NA NA
## max 6.696000e+03 NA NA NA NA NA NA
## range 6.447000e+03 NA NA NA NA NA NA
## sum 2.744200e+04 NA NA NA NA NA NA
## median 3.107000e+03 NA NA NA NA NA NA
## mean 3.430250e+03 NA NA NA NA NA NA
## SE.mean 7.315967e+02 NA NA NA NA NA NA
## CI.mean.0.95 1.729951e+03 NA NA NA NA NA NA
## var 4.281870e+06 NA NA NA NA NA NA
## std.dev 2.069268e+03 NA NA NA NA NA NA
## coef.var 6.032412e-01 NA NA NA NA NA NA
## population City Distance location_description latitude
## nbr.val 8.000000e+00 NA 8.0000000 NA 8.000000000
## nbr.null 1.000000e+00 NA 0.0000000 NA 0.000000000
## nbr.na 0.000000e+00 NA 0.0000000 NA 0.000000000
## min 0.000000e+00 NA 0.2620800 NA 10.000000000
## max 2.194700e+04 NA 21.9547000 NA 10.205400000
## range 2.194700e+04 NA 21.6926200 NA 0.205400000
## sum 3.885700e+04 NA 111.0151200 NA 81.063300000
## median 1.355000e+03 NA 14.9338750 NA 10.144250000
## mean 4.857125e+03 NA 13.8768900 NA 10.132912500
## SE.mean 2.557523e+03 NA 2.4924134 NA 0.022739522
## CI.mean.0.95 6.047580e+03 NA 5.8936213 NA 0.053770426
## var 5.232738e+07 NA 49.6969984 NA 0.004136687
## std.dev 7.233767e+03 NA 7.0496098 NA 0.064317081
## coef.var 1.489310e+00 NA 0.5080108 NA 0.006347344
## longitude geolocation hazard_type landslide_type
## nbr.val 8.000000e+00 NA NA NA
## nbr.null 0.000000e+00 NA NA NA
## nbr.na 0.000000e+00 NA NA NA
## min -8.414890e+01 NA NA NA
## max -8.390410e+01 NA NA NA
## range 2.448000e-01 NA NA NA
## sum -6.720410e+02 NA NA NA
## median -8.397730e+01 NA NA NA
## mean -8.400512e+01 NA NA NA
## SE.mean 2.987758e-02 NA NA NA
## CI.mean.0.95 7.064924e-02 NA NA NA
## var 7.141356e-03 NA NA NA
## std.dev 8.450655e-02 NA NA NA
## coef.var -1.005969e-03 NA NA NA
## landslide_size trigger storm_name injuries fatalities source_name
## nbr.val NA NA NA 1 6.0000000 NA
## nbr.null NA NA NA 1 5.0000000 NA
## nbr.na NA NA NA 7 2.0000000 NA
## min NA NA NA 0 0.0000000 NA
## max NA NA NA 0 2.0000000 NA
## range NA NA NA 0 2.0000000 NA
## sum NA NA NA 0 2.0000000 NA
## median NA NA NA 0 0.0000000 NA
## mean NA NA NA 0 0.3333333 NA
## SE.mean NA NA NA NA 0.3333333 NA
## CI.mean.0.95 NA NA NA NaN 0.8568606 NA
## var NA NA NA NA 0.6666667 NA
## std.dev NA NA NA NA 0.8164966 NA
## coef.var NA NA NA NA 2.4494897 NA
## source_link
## nbr.val NA
## nbr.null NA
## nbr.na NA
## min NA
## max NA
## range NA
## sum NA
## median NA
## mean NA
## SE.mean NA
## CI.mean.0.95 NA
## var NA
## std.dev NA
## coef.var NA
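The `SE.mean`, `CI.mean.0.95` and `coef.var` rows for `Distance` can be checked by hand. The sketch below assumes that the interval half-width is computed as `qt(0.5 + p/2, n - 1)` times the standard error, which is what the warning message printed above suggests. It uses the eight Heredia distances at the precision shown in the listing, and the variable names are illustrative.

```r
# Heredia distances copied from the listing above
x <- c(0.26208, 19.51432, 14.81614, 19.54581,
       15.05161, 21.95470, 9.85736, 10.01310)
n <- length(x)

# Standard error of the mean
se_mean <- sd(x) / sqrt(n)

# Half-width of the 95% confidence interval for the mean (Student t, n - 1 df)
ci_half <- qt(0.5 + 0.95 / 2, df = n - 1) * se_mean

# Coefficient of variation: standard deviation relative to the mean
coef_var <- sd(x) / mean(x)

print(round(c(mean = mean(x), SE.mean = se_mean,
              CI.mean.0.95 = ci_half, coef.var = coef_var), 4))
```

Under the same assumption, a column with a single valid value, such as `injuries` here, leaves zero degrees of freedom for `qt()`, which would explain the `NaNs produced` warning.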
Box plot
boxplot(Distance, horizontal=TRUE, col='steelblue')

library(tidyverse)
library(hrbrthemes)
library(viridis)
df <- data.frame(Distance)
df %>% ggplot(aes(x = "", y = Distance)) +
geom_boxplot(color="red", fill="orange", alpha=0.5) +
theme_ipsum() +
theme(legend.position="none", plot.title = element_text(size=11)) +
ggtitle("Landslides") +
coord_flip() +
xlab("") +
ylab("")
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

library(readr)
library(knitr)
df <- read.csv("https://raw.githubusercontent.com/lihkir/AnalisisEstadisticoUN/main/Data/catalog.csv")
library(dplyr)
colnames(df)[4] <- "Continent"
colnames(df)[5] <- "Country"
colnames(df)[7] <- "State"
colnames(df)[9] <- "City"
colnames(df)[10] <- "Distance"
colnames(df)[2] <- "Date"
Puntarenas
Landslides in the cities of Puntarenas
library(readr)
library(knitr)
df_PNS <- subset (df, State == "Puntarenas")
df_PNS %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 51 Costa Rica Puntarenas Miramar 3.82425 10/24/07
## 156 Costa Rica Puntarenas Golfito 11.74074 10/15/08
## 157 Costa Rica Puntarenas Miramar 8.92048 10/16/08
## 229 Costa Rica Puntarenas San Vito 18.00524 11/13/09
## 506 Costa Rica Puntarenas Golfito 7.87044 11/4/10
## 509 Costa Rica Puntarenas Corredor 4.93053 11/4/10
## 510 Costa Rica Puntarenas Parrita 13.48919 11/4/10
## 511 Costa Rica Puntarenas Ciudad Cortés 20.06633 11/4/10
## 1649 Costa Rica Puntarenas Buenos Aires 0.35225 11/23/15
head(df_PNS, n=4)
## id Date time Continent Country country_code State population
## 51 323 10/24/07 <NA> Costa Rica CR Puntarenas 6540
## 156 845 10/15/08 <NA> Costa Rica CR Puntarenas 6777
## 157 848 10/16/08 <NA> Costa Rica CR Puntarenas 6540
## 229 1296 11/13/09 <NA> Costa Rica CR Puntarenas 3981
## City Distance location_description latitude longitude
## 51 Miramar 3.82425 Mine construction 10.0715 -84.7575
## 156 Golfito 11.74074 8.6700 -83.0640
## 157 Miramar 8.92048 10.1110 -84.8090
## 229 San Vito 18.00524 8.8021 -83.1335
## geolocation hazard_type landslide_type
## 51 (10.0715, -84.757499999999993) Landslide Mudslide
## 156 (8.67, -83.063999999999993) Landslide Landslide
## 157 (10.111000000000001, -84.808999999999997) Landslide Complex
## 229 (8.8020999999999994, -83.133499999999998) Landslide Landslide
## landslide_size trigger storm_name injuries fatalities
## 51 Medium Downpour NA NA
## 156 Medium Downpour NA NA
## 157 Medium Downpour NA NA
## 229 Medium Earthquake NA 1
## source_name
## 51 Reuters - AlertNet.org
## 156
## 157
## 229
## source_link
## 51 http://www.reuters.com/article/companyNewsAndPR/idUSN2435152820071025
## 156 http://www.ticotimes.net/dailyarchive/2008_10/1016081.htm
## 157 http://insidecostarica.com/dailynews/2008/october/17/nac01.htm
## 229 http://www.ticotimes.net/dailyarchive/2009_11/1116092.cfm
ggplot(data=df_PNS, aes(x=City, y=Distance)) + geom_bar(stat="identity", color="blue", fill="white")

Pie chart
# The x value is a single dummy category, so all bars stack into one ring
ggplot(df_PNS, aes(x = "Puntarenas", y = Distance, fill = City)) +
geom_bar(stat = "identity",
color = "white") +
geom_text(aes(label = Distance),
position = position_stack(vjust = 0.5), color = "white", size = 4) +
coord_polar(theta = "y") +
labs(title = "Landslide Chart")

Pareto chart
City with the largest landslide
library(qcc)
Distance <- df_PNS$Distance
names(Distance) <- df_PNS$City
pareto.chart(Distance,
ylab="Distance",
col = heat.colors(length(Distance)),
cumperc = seq(0, 100, by = 10),
ylab2 = "Cumulative percentage",
main = "CITIES WITH THE LARGEST LANDSLIDES"
)

##
## Pareto chart analysis for Distance
## Frequency Cum.Freq. Percentage Cum.Percent.
## Ciudad Cortés 20.0663300 20.0663300 22.4960244 22.4960244
## San Vito 18.0052400 38.0715700 20.1853711 42.6813955
## Parrita 13.4891900 51.5607600 15.1225036 57.8038990
## Golfito 11.7407400 63.3015000 13.1623457 70.9662447
## Miramar 8.9204800 72.2219800 10.0005998 80.9668445
## Golfito 7.8704400 80.0924200 8.8234176 89.7902622
## Corredor 4.9305300 85.0229500 5.5275341 95.3177962
## Miramar 3.8242500 88.8472000 4.2873022 99.6050985
## Buenos Aires 0.3522500 89.1994500 0.3949015 100.0000000
Stem-and-leaf plot
stem(df_PNS$"Distance")
##
## The decimal point is 1 digit(s) to the right of the |
##
## 0 | 04
## 0 | 589
## 1 | 23
## 1 | 8
## 2 | 0
stem(df_PNS$"Distance", scale = 2)
##
## The decimal point is at the |
##
## 0 | 4
## 2 | 8
## 4 | 9
## 6 | 9
## 8 | 9
## 10 | 7
## 12 | 5
## 14 |
## 16 |
## 18 | 0
## 20 | 1
Time series
library(forecast)
# ts() indexes the values as consecutive months starting in January 2007; the event dates are not used
data_serie<- ts(df_PNS$Distance, frequency=12, start=2007)
head(data_serie)
## Jan Feb Mar Apr May Jun
## 2007 3.82425 11.74074 8.92048 18.00524 7.87044 4.93053
autoplot(data_serie)+
labs(title = "Landslide Series", x="Year", y = "Distance", colour = "#00a0dc") +theme_bw()

Frequency tables
library(questionr)
table <- questionr::freq(Distance, cum = TRUE, sort = "dec", total = TRUE)
knitr::kable(table)
| | n | % | val% | %cum | val%cum |
|:---|---:|---:|---:|---:|---:|
| 0.35225 | 1 | 11.1 | 11.1 | 11.1 | 11.1 |
| 3.82425 | 1 | 11.1 | 11.1 | 22.2 | 22.2 |
| 4.93053 | 1 | 11.1 | 11.1 | 33.3 | 33.3 |
| 7.87044 | 1 | 11.1 | 11.1 | 44.4 | 44.4 |
| 8.92048 | 1 | 11.1 | 11.1 | 55.6 | 55.6 |
| 11.74074 | 1 | 11.1 | 11.1 | 66.7 | 66.7 |
| 13.48919 | 1 | 11.1 | 11.1 | 77.8 | 77.8 |
| 18.00524 | 1 | 11.1 | 11.1 | 88.9 | 88.9 |
| 20.06633 | 1 | 11.1 | 11.1 | 100.0 | 100.0 |
| Total | 9 | 100.0 | 100.0 | 100.0 | 100.0 |
str(table)
## Classes 'freqtab' and 'data.frame': 10 obs. of 5 variables:
## $ n : num 1 1 1 1 1 1 1 1 1 9
## $ % : num 11.1 11.1 11.1 11.1 11.1 11.1 11.1 11.1 11.1 100
## $ val% : num 11.1 11.1 11.1 11.1 11.1 11.1 11.1 11.1 11.1 100
## $ %cum : num 11.1 22.2 33.3 44.4 55.6 66.7 77.8 88.9 100 100
## $ val%cum: num 11.1 22.2 33.3 44.4 55.6 66.7 77.8 88.9 100 100
x <- row.names(table)
y <- table$n
names <- x[1:(length(x)-1)]
freqs <- y[1:(length(y)-1)]
df <- data.frame(x = names, y = freqs)
knitr::kable(df)
| x | y |
|:---|---:|
| 0.35225 | 1 |
| 3.82425 | 1 |
| 4.93053 | 1 |
| 7.87044 | 1 |
| 8.92048 | 1 |
| 11.74074 | 1 |
| 13.48919 | 1 |
| 18.00524 | 1 |
| 20.06633 | 1 |
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="white", fill="blue") +
xlab("Distance") +
ylab("Frequency")

Grouped frequency table
n_sturges = 1 + log(length(Distance))/log(2)
n_sturgesc = ceiling(n_sturges)
n_sturgesf = floor(n_sturges)
n_clases = 0
if (n_sturgesc%%2 == 0) {
n_clases = n_sturgesf
} else {
n_clases = n_sturgesc
}
R = max(Distance) - min(Distance)
w = ceiling(R/n_clases)
bins <- seq(min(Distance), max(Distance) + w, by = w)
bins
## [1] 0.35225 4.35225 8.35225 12.35225 16.35225 20.35225
Edades <- cut(Distance, bins)
Freq_table <- transform(table(Distance), Rel_Freq=prop.table(Freq), Cum_Freq=cumsum(Freq))
knitr::kable(Freq_table)
| Distance | Freq | Rel_Freq | Cum_Freq |
|:---|---:|---:|---:|
| 0.35225 | 1 | 0.1111111 | 1 |
| 3.82425 | 1 | 0.1111111 | 2 |
| 4.93053 | 1 | 0.1111111 | 3 |
| 7.87044 | 1 | 0.1111111 | 4 |
| 8.92048 | 1 | 0.1111111 | 5 |
| 11.74074 | 1 | 0.1111111 | 6 |
| 13.48919 | 1 | 0.1111111 | 7 |
| 18.00524 | 1 | 0.1111111 | 8 |
| 20.06633 | 1 | 0.1111111 | 9 |
str(Freq_table)
## 'data.frame': 9 obs. of 4 variables:
## $ Distance: Factor w/ 9 levels "0.35225","3.82425",..: 1 2 3 4 5 6 7 8 9
## $ Freq : int 1 1 1 1 1 1 1 1 1
## $ Rel_Freq: num 0.111 0.111 0.111 0.111 0.111 ...
## $ Cum_Freq: int 1 2 3 4 5 6 7 8 9
df <- data.frame(x = Freq_table$Distance, y = Freq_table$Freq)
knitr::kable(df)
| x | y |
|:---|---:|
| 0.35225 | 1 |
| 3.82425 | 1 |
| 4.93053 | 1 |
| 7.87044 | 1 |
| 8.92048 | 1 |
| 11.74074 | 1 |
| 13.48919 | 1 |
| 18.00524 | 1 |
| 20.06633 | 1 |
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="blue", fill="green") +
xlab("Distance range") +
ylab("Frequency")
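The class count used for the bins above starts from Sturges' formula, 1 + log2(n), and then picks the odd neighbour of that value. The sketch below wraps this rule in a helper and compares it with `grDevices::nclass.Sturges`, which always rounds up. The helper name `report_classes` is hypothetical and only restates the `if`/`else` logic of this report.

```r
# Class-count rule used in this report: take 1 + log2(n), then use the floor
# when the ceiling is even and the ceiling otherwise
report_classes <- function(n) {
  k <- 1 + log(n) / log(2)
  if (ceiling(k) %% 2 == 0) floor(k) else ceiling(k)
}

# Puntarenas distances copied from the listing above
distance <- c(3.82425, 11.74074, 8.92048, 18.00524, 7.87044,
              4.93053, 13.48919, 20.06633, 0.35225)

k_report  <- report_classes(length(distance))
k_sturges <- grDevices::nclass.Sturges(distance)  # ceiling(log2(n) + 1)

print(c(report = k_report, sturges = k_sturges))
```

For the nine Puntarenas events both rules agree. For a sample of six, such as Cartago, the report's rule gives one class fewer than `nclass.Sturges`.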

Descriptive statistics
summary(df_PNS$Distance)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3523 4.9305 8.9205 9.9110 13.4892 20.0663
library(pastecs)
stat.desc(df_PNS)
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## id Date time Continent Country country_code State
## nbr.val 9.000000e+00 NA NA NA NA NA NA
## nbr.null 0.000000e+00 NA NA NA NA NA NA
## nbr.na 0.000000e+00 NA NA NA NA NA NA
## min 3.230000e+02 NA NA NA NA NA NA
## max 7.493000e+03 NA NA NA NA NA NA
## range 7.170000e+03 NA NA NA NA NA NA
## sum 2.155700e+04 NA NA NA NA NA NA
## median 2.685000e+03 NA NA NA NA NA NA
## mean 2.395222e+03 NA NA NA NA NA NA
## SE.mean 7.132643e+02 NA NA NA NA NA NA
## CI.mean.0.95 1.644790e+03 NA NA NA NA NA NA
## var 4.578713e+06 NA NA NA NA NA NA
## std.dev 2.139793e+03 NA NA NA NA NA NA
## coef.var 8.933588e-01 NA NA NA NA NA NA
## population City Distance location_description latitude
## nbr.val 9.000000e+00 NA 9.0000000 NA 9.00000000
## nbr.null 0.000000e+00 NA 0.0000000 NA 0.00000000
## nbr.na 0.000000e+00 NA 0.0000000 NA 0.00000000
## min 3.734000e+03 NA 0.3522500 NA 8.61170000
## max 1.168000e+04 NA 20.0663300 NA 10.11100000
## range 7.946000e+03 NA 19.7140800 NA 1.49930000
## sum 5.696300e+04 NA 89.1994500 NA 82.72080000
## median 6.540000e+03 NA 8.9204800 NA 8.98960000
## mean 6.329222e+03 NA 9.9110500 NA 9.19120000
## SE.mean 8.172298e+02 NA 2.1831653 NA 0.19984316
## CI.mean.0.95 1.884535e+03 NA 5.0343882 NA 0.46083916
## var 6.010781e+06 NA 42.8958955 NA 0.35943561
## std.dev 2.451689e+03 NA 6.5494958 NA 0.59952949
## coef.var 3.873603e-01 NA 0.6608276 NA 0.06522864
## longitude geolocation hazard_type landslide_type
## nbr.val 9.000000e+00 NA NA NA
## nbr.null 0.000000e+00 NA NA NA
## nbr.na 0.000000e+00 NA NA NA
## min -8.480900e+01 NA NA NA
## max -8.294180e+01 NA NA NA
## range 1.867200e+00 NA NA NA
## sum -7.528426e+02 NA NA NA
## median -8.332680e+01 NA NA NA
## mean -8.364918e+01 NA NA NA
## SE.mean 2.553648e-01 NA NA NA
## CI.mean.0.95 5.888723e-01 NA NA NA
## var 5.869007e-01 NA NA NA
## std.dev 7.660945e-01 NA NA NA
## coef.var -9.158422e-03 NA NA NA
## landslide_size trigger storm_name injuries fatalities source_name
## nbr.val NA NA NA 1 6.0000000 NA
## nbr.null NA NA NA 1 5.0000000 NA
## nbr.na NA NA NA 8 3.0000000 NA
## min NA NA NA 0 0.0000000 NA
## max NA NA NA 0 1.0000000 NA
## range NA NA NA 0 1.0000000 NA
## sum NA NA NA 0 1.0000000 NA
## median NA NA NA 0 0.0000000 NA
## mean NA NA NA 0 0.1666667 NA
## SE.mean NA NA NA NA 0.1666667 NA
## CI.mean.0.95 NA NA NA NaN 0.4284303 NA
## var NA NA NA NA 0.1666667 NA
## std.dev NA NA NA NA 0.4082483 NA
## coef.var NA NA NA NA 2.4494897 NA
## source_link
## nbr.val NA
## nbr.null NA
## nbr.na NA
## min NA
## max NA
## range NA
## sum NA
## median NA
## mean NA
## SE.mean NA
## CI.mean.0.95 NA
## var NA
## std.dev NA
## coef.var NA
Box plot
boxplot(Distance, horizontal=TRUE, col='steelblue')
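The quantities a box plot draws can also be computed directly, which helps when reading the figure. This is a minimal sketch using the nine Puntarenas distances listed earlier and the usual 1.5 × IQR rule for flagging outliers; the names `fences` and `outliers` are illustrative.

```r
# Puntarenas distances copied from the listing above
distance <- c(3.82425, 11.74074, 8.92048, 18.00524, 7.87044,
              4.93053, 13.48919, 20.06633, 0.35225)

# Quartiles and interquartile range (the box and its median line)
q <- quantile(distance, c(0.25, 0.5, 0.75))
iqr <- IQR(distance)

# Points beyond 1.5 * IQR from the box are drawn as individual outliers
fences <- c(lower = q[[1]] - 1.5 * iqr, upper = q[[3]] + 1.5 * iqr)
outliers <- distance[distance < fences[["lower"]] | distance > fences[["upper"]]]

print(q)
print(fences)
print(outliers)

# boxplot.stats() returns the same ingredients that boxplot() uses
stats <- boxplot.stats(distance)
print(stats$stats)
```

For these values no observation falls outside the fences, so the whiskers extend to the minimum and maximum distances.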

library(tidyverse)
library(hrbrthemes)
library(viridis)
df <- data.frame(Distance)
df %>% ggplot(aes(x = "", y = Distance)) +
geom_boxplot(color="red", fill="orange", alpha=0.5) +
theme_ipsum() +
theme(legend.position="none", plot.title = element_text(size=11)) +
ggtitle("Landslides") +
coord_flip() +
xlab("") +
ylab("")
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

library(readr)
library(knitr)
df <- read.csv("https://raw.githubusercontent.com/lihkir/AnalisisEstadisticoUN/main/Data/catalog.csv")
library(dplyr)
colnames(df)[4] <- "Continent"
colnames(df)[10] <- "Distance"
colnames(df)[5] <- "Country"
colnames(df)[7] <- "State"
colnames(df)[9] <- "City"
colnames(df)[2] <- "Date"
Barbados
library(readr)
library(knitr)
df_BA <- subset (df, Country == "Barbados")
knitr::kable(head(df_BA))
df_BA %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 161 Barbados Saint Joseph Bathsheba 2.87363 10/22/08
Landslides by state
library(ggplot2)
ggplot(data=df_BA, aes(x = "Barbados", y = Distance, fill=State)) +
geom_bar(stat = "identity", width = 1, color = "black") +
coord_polar("y", start = 0)

ggplot(data=df_BA, aes(fill=State, y=Distance, x="Barbados")) +
geom_bar(position="dodge", stat="identity")

Saint Joseph
Landslides in the cities of Saint Joseph
library(readr)
library(knitr)
df_SJ <- subset (df, State == "Saint Joseph")
df_SJ %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 161 Barbados Saint Joseph Bathsheba 2.87363 10/22/08
## 1201 Dominica Saint Joseph Saint Joseph 2.38605 1/7/14
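The subset above matches on the state name alone, so it also returns the Dominica event, because both countries have a parish called Saint Joseph. A minimal sketch of a filter restricted to Barbados follows; it uses a small stand-in data frame built from the two rows shown above, and `landslides` and `df_SJ_BB` are illustrative names.

```r
# Stand-in for the catalog, built from the two rows printed above
landslides <- data.frame(
  Country  = c("Barbados", "Dominica"),
  State    = c("Saint Joseph", "Saint Joseph"),
  City     = c("Bathsheba", "Saint Joseph"),
  Distance = c(2.87363, 2.38605),
  Date     = c("10/22/08", "1/7/14")
)

# Match on country and state together so that same-named states do not mix
df_SJ_BB <- subset(landslides, Country == "Barbados" & State == "Saint Joseph")
print(df_SJ_BB)
```

With only one Barbados event, the charts and tables that follow would each describe a single observation, so the two-row results in this section combine data from two different countries.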
head(df_SJ)
## id Date time Continent Country country_code State
## 161 857 10/22/08 <NA> Barbados BB Saint Joseph
## 1201 5754 1/7/14 Morning <NA> Dominica DM Saint Joseph
## population City Distance location_description latitude longitude
## 161 1765 Bathsheba 2.87363 13.229 -59.5400
## 1201 2184 Saint Joseph 2.38605 Above road 15.421 -61.4285
## geolocation hazard_type landslide_type landslide_size
## 161 (13.228999999999999, -59.54) Landslide Mudslide Medium
## 1201 (15.420999999999999, -61.4285) Landslide Landslide Medium
## trigger storm_name injuries fatalities source_name
## 161 Downpour NA NA
## 1201 unknown 0 0 DaVibes The Caribbean News Portal
## source_link
## 161 http://www.nationnews.com/story/326456269849259.php
## 1201 http://dominicavibes.dm/colihaut-men-escape-landslide/
ggplot(data=df_SJ, aes(x=City, y=Distance)) + geom_bar(stat="identity", color="blue", fill="white")

Pie chart
ggplot(df_SJ,aes(x="Saint Joseph",y=Distance, fill=City))+
geom_bar(stat = "identity",
color="white")+
geom_text(aes(label=(Distance*1)),
position=position_stack(vjust=0.5),color="white",size=4)+
coord_polar(theta = "y")+
labs(title="Gráfico de Deslizamiento")

Pareto chart
City with the greatest landslide distance
library(qcc)
Distance <- df_SJ$Distance
names(Distance) <- df_SJ$City
pareto.chart(Distance,
ylab="Distance",
col = heat.colors(length(Distance)),
cumperc = seq(0, 100, by = 10),
ylab2 = "Porcentaje acumulado",
main = "CIUDADES CON MAYORES DESLIZAMIENTOS"
)

##
## Pareto chart analysis for Distance
## Frequency Cum.Freq. Percentage Cum.Percent.
## Bathsheba 2.87363 2.87363 54.63507 54.63507
## Saint Joseph 2.38605 5.25968 45.36493 100.00000
Stem-and-leaf plot
stem(df_SJ$"Distance")
##
## The decimal point is 1 digit(s) to the left of the |
##
## 23 | 9
## 24 |
## 25 |
## 26 |
## 27 |
## 28 | 7
stem(df_SJ$"Distance", scale = 2)
##
## The decimal point is 1 digit(s) to the left of the |
##
## 23 | 9
## 24 |
## 24 |
## 25 |
## 25 |
## 26 |
## 26 |
## 27 |
## 27 |
## 28 |
## 28 | 7
Time series
library(forecast)
data_serie<- ts(df_SJ$Distance, frequency=12, start=2007)
head(data_serie)
## Jan Feb
## 2007 2.87363 2.38605
autoplot(data_serie)+
labs(title = "Serie de Deslizamiento", x="Años", y = "Distancia", colour = "#00a0dc") +theme_bw()
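The call to `ts()` with `start = 2007` and `frequency = 12` labels the two observations as January and February 2007, although the Date column records them on 10/22/08 and 1/7/14. A sketch of plotting against the recorded dates instead, assuming the month/day/two-digit-year format seen in the output holds for the whole column:

```r
# Dates and distances copied from the df_SJ output above.
events <- data.frame(
  Date     = c("10/22/08", "1/7/14"),
  Distance = c(2.87363, 2.38605),
  stringsAsFactors = FALSE
)

# Parse month/day/two-digit-year strings into Date objects.
events$Date <- as.Date(events$Date, format = "%m/%d/%y")
events <- events[order(events$Date), ]

# Base R plot against the real calendar dates.
plot(events$Date, events$Distance, type = "b",
     xlab = "Date", ylab = "Distance")
print(format(events$Date, "%Y-%m-%d"))
```

Because the events are irregularly spaced, a plain date axis is probably a better fit than a monthly `ts` object here.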

Frequency tables
library(questionr)
table <- questionr::freq(Distance, cum = TRUE, sort = "dec", total = TRUE)
knitr::kable(table)
|         | n | % | val% | %cum | val%cum |
|---------|---|---|------|------|---------|
| 2.38605 | 1 | 50 | 50 | 50 | 50 |
| 2.87363 | 1 | 50 | 50 | 100 | 100 |
| Total | 2 | 100 | 100 | 100 | 100 |
str(table)
## Classes 'freqtab' and 'data.frame': 3 obs. of 5 variables:
## $ n : num 1 1 2
## $ % : num 50 50 100
## $ val% : num 50 50 100
## $ %cum : num 50 100 100
## $ val%cum: num 50 100 100
x <- row.names(table)
y <- table$n
names <- x[1:(length(x)-1)]
freqs <- y[1:(length(y)-1)]
df <- data.frame(x = names, y = freqs)
knitr::kable(df)
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="white", fill="blue") +
xlab("Distance") +
ylab("Frequency")

Grouped frequency table
n_sturges = 1 + log(length(Distance))/log(2)
n_sturgesc = ceiling(n_sturges)
n_sturgesf = floor(n_sturges)
n_clases = 0
if (n_sturgesc%%2 == 0) {
n_clases = n_sturgesf
} else {
n_clases = n_sturgesc
}
R = max(Distance) - min(Distance)
w = ceiling(R/n_clases)
bins <- seq(min(Distance), max(Distance) + w, by = w)
bins
## [1] 2.38605 3.38605
Distance_bins <- cut(Distance, bins)  # binned distances (not used by the table below)
Freq_table <- transform(table(Distance), Rel_Freq=prop.table(Freq), Cum_Freq=cumsum(Freq))
knitr::kable(Freq_table)
| Distance | Freq | Rel_Freq | Cum_Freq |
|----------|------|----------|----------|
| 2.38605 | 1 | 0.5 | 1 |
| 2.87363 | 1 | 0.5 | 2 |
str(Freq_table)
## 'data.frame': 2 obs. of 4 variables:
## $ Distance: Factor w/ 2 levels "2.38605","2.87363": 1 2
## $ Freq : int 1 1
## $ Rel_Freq: num 0.5 0.5
## $ Cum_Freq: int 1 2
df <- data.frame(x = Freq_table$Distance, y = Freq_table$Freq)
knitr::kable(df)
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="blue", fill="green") +
xlab("Rango de Distance") +
ylab("Frecuencia")
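The table above counts each distinct distance on its own, so the class limits stored in `bins` are never applied. A sketch of tabulating the binned values instead, assuming the plain Sturges rule (k = 1 + log2(n), rounded up) and closed lower limits so the minimum is not dropped; the helper name `sturges_bins` is hypothetical:

```r
# Hypothetical helper: class limits from Sturges' rule.
sturges_bins <- function(x) {
  k <- ceiling(1 + log2(length(x)))          # number of classes
  seq(min(x), max(x), length.out = k + 1)    # k + 1 equally spaced limits
}

Distance <- c(2.87363, 2.38605)              # values from df_SJ above
bins <- sturges_bins(Distance)

# include.lowest = TRUE keeps the minimum inside the first class.
classes <- cut(Distance, bins, include.lowest = TRUE)

freq <- as.data.frame(table(classes))
freq$Rel_Freq <- freq$Freq / sum(freq$Freq)
freq$Cum_Freq <- cumsum(freq$Freq)
print(freq)
```

The sketch assumes at least two distinct values; with a single observation the limits coincide and `cut()` cannot build the classes.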

Descriptive statistics
Summary of landslide distances
summary(df_SJ$Distance)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.386 2.508 2.630 2.630 2.752 2.874
library(pastecs)
stat.desc(df_SJ)
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## id Date time Continent Country country_code State
## nbr.val 2.000000e+00 NA NA NA NA NA NA
## nbr.null 0.000000e+00 NA NA NA NA NA NA
## nbr.na 0.000000e+00 NA NA NA NA NA NA
## min 8.570000e+02 NA NA NA NA NA NA
## max 5.754000e+03 NA NA NA NA NA NA
## range 4.897000e+03 NA NA NA NA NA NA
## sum 6.611000e+03 NA NA NA NA NA NA
## median 3.305500e+03 NA NA NA NA NA NA
## mean 3.305500e+03 NA NA NA NA NA NA
## SE.mean 2.448500e+03 NA NA NA NA NA NA
## CI.mean.0.95 3.111114e+04 NA NA NA NA NA NA
## var 1.199030e+07 NA NA NA NA NA NA
## std.dev 3.462702e+03 NA NA NA NA NA NA
## coef.var 1.047558e+00 NA NA NA NA NA NA
## population City Distance location_description latitude
## nbr.val 2.000000 NA 2.0000000 NA 2.0000000
## nbr.null 0.000000 NA 0.0000000 NA 0.0000000
## nbr.na 0.000000 NA 0.0000000 NA 0.0000000
## min 1765.000000 NA 2.3860500 NA 13.2290000
## max 2184.000000 NA 2.8736300 NA 15.4210000
## range 419.000000 NA 0.4875800 NA 2.1920000
## sum 3949.000000 NA 5.2596800 NA 28.6500000
## median 1974.500000 NA 2.6298400 NA 14.3250000
## mean 1974.500000 NA 2.6298400 NA 14.3250000
## SE.mean 209.500000 NA 0.2437900 NA 1.0960000
## CI.mean.0.95 2661.949892 NA 3.0976457 NA 13.9260004
## var 87780.500000 NA 0.1188671 NA 2.4024320
## std.dev 296.277741 NA 0.3447711 NA 1.5499781
## coef.var 0.150052 NA 0.1310997 NA 0.1082009
## longitude geolocation hazard_type landslide_type landslide_size
## nbr.val 2.000000 NA NA NA NA
## nbr.null 0.000000 NA NA NA NA
## nbr.na 0.000000 NA NA NA NA
## min -61.428500 NA NA NA NA
## max -59.540000 NA NA NA NA
## range 1.888500 NA NA NA NA
## sum -120.968500 NA NA NA NA
## median -60.484250 NA NA NA NA
## mean -60.484250 NA NA NA NA
## SE.mean 0.944250 NA NA NA NA
## CI.mean.0.95 11.997834 NA NA NA NA
## var 1.783216 NA NA NA NA
## std.dev 1.335371 NA NA NA NA
## coef.var -0.022078 NA NA NA NA
## trigger storm_name injuries fatalities source_name source_link
## nbr.val NA NA 1 1 NA NA
## nbr.null NA NA 1 1 NA NA
## nbr.na NA NA 1 1 NA NA
## min NA NA 0 0 NA NA
## max NA NA 0 0 NA NA
## range NA NA 0 0 NA NA
## sum NA NA 0 0 NA NA
## median NA NA 0 0 NA NA
## mean NA NA 0 0 NA NA
## SE.mean NA NA NA NA NA NA
## CI.mean.0.95 NA NA NaN NaN NA NA
## var NA NA NA NA NA NA
## std.dev NA NA NA NA NA NA
## coef.var NA NA NA NA NA NA
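The SE.mean and CI.mean.0.95 rows for Distance can be reproduced by hand, which also shows where the "NaNs produced" warnings come from. A sketch in base R, assuming CI.mean.0.95 is the half-width of a 95% t interval (this agrees with the values printed above):

```r
Distance <- c(2.87363, 2.38605)          # df_SJ$Distance from the output above
n <- length(Distance)

se <- sd(Distance) / sqrt(n)             # standard error of the mean
ci_half <- qt(0.975, df = n - 1) * se    # half-width of the 95% t interval

print(round(se, 5))
print(round(ci_half, 5))

# With a single valid observation there are zero degrees of freedom,
# so the t quantile is undefined and R warns "NaNs produced".
print(suppressWarnings(qt(0.975, df = 0)))
```

With only two observations the interval is wider than the data range itself, so these figures say little beyond the two raw values.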
Box-and-whisker plot
boxplot(Distance, horizontal=TRUE, col='steelblue')

library(tidyverse)
library(hrbrthemes)
library(viridis)
df <- data.frame(Distance)
df %>% ggplot(aes(x = "", y = Distance)) +
geom_boxplot(color="red", fill="orange", alpha=0.5) +
theme_ipsum() +
theme(legend.position="none", plot.title = element_text(size=11)) +
ggtitle("Deslizamientos") +
coord_flip() +
xlab("") +
ylab("")
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

library(readr)
library(knitr)
df <- read.csv("https://raw.githubusercontent.com/lihkir/AnalisisEstadisticoUN/main/Data/catalog.csv")
library(dplyr)
colnames(df)[4] <- "Continent"
colnames(df)[5] <- "Country"
colnames(df)[7] <- "State"
colnames(df)[9] <- "City"
colnames(df)[10] <- "Distance"
colnames(df)[2] <- "Date"
Belize
library(readr)
library(knitr)
df_BZ <- subset (df, Country == "Belize")
knitr::kable(head(df_BZ, n=4))
df_BZ %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 1593 Belize Cayo Belmopan 9.71758 11/24/15
head(df_BZ)
## id Date time Continent Country country_code State population
## 1593 7437 11/24/15 10:00 <NA> Belize BZ Cayo 13381
## City Distance location_description latitude longitude
## 1593 Belmopan 9.71758 Above road 17.2183 -88.8519
## geolocation hazard_type landslide_type
## 1593 (17.218299999999999, -88.851900000000001) Landslide Rockfall
## landslide_size trigger storm_name injuries fatalities source_name
## 1593 Small Mining digging 0 0 Plus TV
## source_link
## 1593 http://www.plustvbelize.com/landslide-in-arizona-village-blocks-road-for-hours/
Landslides by state
library(ggplot2)
ggplot(data=df_BZ, aes(x = "Belize", y = Distance, fill=State)) +
geom_bar(stat = "identity", width = 1, color = "black") +
coord_polar("y", start = 0)

ggplot(data=df_BZ, aes(fill=State, y=Distance, x="Belize")) +
geom_bar(position="dodge", stat="identity")

Cayo
Landslides in the cities of Cayo
library(readr)
library(knitr)
df_CY <- subset (df, State == "Cayo")
df_CY %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 1593 Belize Cayo Belmopan 9.71758 11/24/15
head(df_CY, n=4)
## id Date time Continent Country country_code State population
## 1593 7437 11/24/15 10:00 <NA> Belize BZ Cayo 13381
## City Distance location_description latitude longitude
## 1593 Belmopan 9.71758 Above road 17.2183 -88.8519
## geolocation hazard_type landslide_type
## 1593 (17.218299999999999, -88.851900000000001) Landslide Rockfall
## landslide_size trigger storm_name injuries fatalities source_name
## 1593 Small Mining digging 0 0 Plus TV
## source_link
## 1593 http://www.plustvbelize.com/landslide-in-arizona-village-blocks-road-for-hours/
ggplot(data=df_CY, aes(x=City, y=Distance)) + geom_bar(stat="identity", color="blue", fill="white")

Pie chart
ggplot(df_CY,aes(x="Cayo",y=Distance, fill=City))+
geom_bar(stat = "identity",
color="white")+
geom_text(aes(label=(Distance*1)),
position=position_stack(vjust=0.5),color="white",size=4)+
coord_polar(theta = "y")+
labs(title="Gráfico de Deslizamiento")

Pareto chart
City with the greatest landslide distance
library(qcc)
Distance <- df_CY$Distance
names(Distance) <- df_CY$City
pareto.chart(Distance,
ylab="Distance",
col = heat.colors(length(Distance)),
cumperc = seq(0, 100, by = 10),
ylab2 = "Porcentaje acumulado",
main = "CIUDADES CON MAYORES DESLIZAMIENTOS"
)

##
## Pareto chart analysis for Distance
## Frequency Cum.Freq. Percentage Cum.Percent.
## Belmopan 9.71758 9.71758 100.00000 100.00000
Stem-and-leaf plot
stem(df_CY$"Distance")
stem(df_CY$"Distance", scale = 2)
Time series
library(forecast)
data_serie<- ts(df_CY$Distance, frequency=12, start=2007)
head(data_serie)
## Jan
## 2007 9.71758
autoplot(data_serie)+
labs(title = "Serie de Deslizamiento", x="Años", y = "Distancia", colour = "#00a0dc") +theme_bw()
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?

Frequency tables
library(questionr)
table <- questionr::freq(Distance, cum = TRUE, sort = "dec", total = TRUE)
knitr::kable(table)
|         | n | % | val% | %cum | val%cum |
|---------|---|---|------|------|---------|
| 9.71758 | 1 | 100 | 100 | 100 | 100 |
| Total | 1 | 100 | 100 | 100 | 100 |
str(table)
## Classes 'freqtab' and 'data.frame': 2 obs. of 5 variables:
## $ n : num 1 1
## $ % : num 100 100
## $ val% : num 100 100
## $ %cum : num 100 100
## $ val%cum: num 100 100
x <- row.names(table)
y <- table$n
names <- x[1:(length(x)-1)]
freqs <- y[1:(length(y)-1)]
df <- data.frame(x = names, y = freqs)
knitr::kable(df)
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="white", fill="blue") +
xlab("Distance") +
ylab("Frequency")

Grouped frequency table
n_sturges = 1 + log(length(Distance))/log(2)
n_sturgesc = ceiling(n_sturges)
n_sturgesf = floor(n_sturges)
n_clases = 0
if (n_sturgesc%%2 == 0) {
n_clases = n_sturgesf
} else {
n_clases = n_sturgesc
}
R = max(Distance) - min(Distance)
w = ceiling(R/n_clases)
bins <- seq(min(Distance), max(Distance) + w, by = w)
bins
## [1] 9.71758
Distance_bins <- cut(Distance, bins)  # binned distances (not used by the table below)
Freq_table <- transform(table(Distance), Rel_Freq=prop.table(Freq), Cum_Freq=cumsum(Freq))
knitr::kable(Freq_table)
str(Freq_table)
## 'data.frame': 1 obs. of 4 variables:
## $ Distance: Factor w/ 1 level "9.71758": 1
## $ Freq : int 1
## $ Rel_Freq: num 1
## $ Cum_Freq: int 1
df <- data.frame(x = Freq_table$Distance, y = Freq_table$Freq)
knitr::kable(df)
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="blue", fill="green") +
xlab("Rango de Distance") +
ylab("Frecuencia")

Summary of landslide distances
summary(df_CY$Distance)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.718 9.718 9.718 9.718 9.718 9.718
library(pastecs)
stat.desc(df_CY)
## Warning in qt((0.5 + p/2), (Nbrval - 1)): NaNs produced
## id Date time Continent Country country_code State population
## nbr.val 1 NA NA NA NA NA NA 1
## nbr.null 0 NA NA NA NA NA NA 0
## nbr.na 0 NA NA NA NA NA NA 0
## min 7437 NA NA NA NA NA NA 13381
## max 7437 NA NA NA NA NA NA 13381
## range 0 NA NA NA NA NA NA 0
## sum 7437 NA NA NA NA NA NA 13381
## median 7437 NA NA NA NA NA NA 13381
## mean 7437 NA NA NA NA NA NA 13381
## SE.mean NA NA NA NA NA NA NA NA
## CI.mean.0.95 NaN NA NA NA NA NA NA NaN
## var NA NA NA NA NA NA NA NA
## std.dev NA NA NA NA NA NA NA NA
## coef.var NA NA NA NA NA NA NA NA
## City Distance location_description latitude longitude geolocation
## nbr.val NA 1.00000 NA 1.0000 1.0000 NA
## nbr.null NA 0.00000 NA 0.0000 0.0000 NA
## nbr.na NA 0.00000 NA 0.0000 0.0000 NA
## min NA 9.71758 NA 17.2183 -88.8519 NA
## max NA 9.71758 NA 17.2183 -88.8519 NA
## range NA 0.00000 NA 0.0000 0.0000 NA
## sum NA 9.71758 NA 17.2183 -88.8519 NA
## median NA 9.71758 NA 17.2183 -88.8519 NA
## mean NA 9.71758 NA 17.2183 -88.8519 NA
## SE.mean NA NA NA NA NA NA
## CI.mean.0.95 NA NaN NA NaN NaN NA
## var NA NA NA NA NA NA
## std.dev NA NA NA NA NA NA
## coef.var NA NA NA NA NA NA
## hazard_type landslide_type landslide_size trigger storm_name
## nbr.val NA NA NA NA NA
## nbr.null NA NA NA NA NA
## nbr.na NA NA NA NA NA
## min NA NA NA NA NA
## max NA NA NA NA NA
## range NA NA NA NA NA
## sum NA NA NA NA NA
## median NA NA NA NA NA
## mean NA NA NA NA NA
## SE.mean NA NA NA NA NA
## CI.mean.0.95 NA NA NA NA NA
## var NA NA NA NA NA
## std.dev NA NA NA NA NA
## coef.var NA NA NA NA NA
## injuries fatalities source_name source_link
## nbr.val 1 1 NA NA
## nbr.null 1 1 NA NA
## nbr.na 0 0 NA NA
## min 0 0 NA NA
## max 0 0 NA NA
## range 0 0 NA NA
## sum 0 0 NA NA
## median 0 0 NA NA
## mean 0 0 NA NA
## SE.mean NA NA NA NA
## CI.mean.0.95 NaN NaN NA NA
## var NA NA NA NA
## std.dev NA NA NA NA
## coef.var NA NA NA NA
Box-and-whisker plot
boxplot(Distance, horizontal=TRUE, col='steelblue')

library(tidyverse)
library(hrbrthemes)
library(viridis)
df <- data.frame(Distance)
df %>% ggplot(aes(x = "", y = Distance)) +
geom_boxplot(color="red", fill="orange", alpha=0.5) +
theme_ipsum() +
theme(legend.position="none", plot.title = element_text(size=11)) +
ggtitle("Deslizamientos") +
coord_flip() +
xlab("") +
ylab("")
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

Guatemala
library(readr)
library(knitr)
df <- read.csv("https://raw.githubusercontent.com/lihkir/AnalisisEstadisticoUN/main/Data/catalog.csv")
library(dplyr)
colnames(df)[2] <- "Date"
colnames(df)[5] <- "Country"
colnames(df)[7] <- "State"
colnames(df)[8] <- "Population"
colnames(df)[9] <- "City"
colnames(df)[10] <- "Distance"
colnames(df)[18] <- "Trigger"
colnames(df)[21] <- "Fatalities"
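Renaming by position breaks silently if the column order of the CSV ever changes. A sketch of renaming by name instead, using only the three original names that are visible in the printed outputs (population, trigger, fatalities); the toy data frame `df_demo` stands in for the catalog:

```r
# Toy stand-in with three of the catalog's original column names.
df_demo <- data.frame(population = 13381, trigger = "Rain", fatalities = 0)

# Map old names to new ones; names absent from the lookup are left alone.
lookup <- c(population = "Population", trigger = "Trigger", fatalities = "Fatalities")
hit <- names(df_demo) %in% names(lookup)
names(df_demo)[hit] <- unname(lookup[names(df_demo)[hit]])

print(names(df_demo))
```

The remaining renames (Date, Country, State, City, Distance) would need the original header names, which are not shown in this report.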
library(readr)
library(knitr)
df_GT <- subset (df, Country == "Guatemala")
knitr::kable(head(df_GT))
Classic bar chart
df_GT %>%
select(Country, Distance, Trigger)
## Country Distance Trigger
## 17 Guatemala 4.74385 Rain
## 27 Guatemala 13.39817 Tropical cyclone
## 28 Guatemala 12.55184 Tropical cyclone
## 41 Guatemala 2.79113 Rain
## 104 Guatemala 3.10150 Tropical cyclone
## 108 Guatemala 3.12614 Rain
## 120 Guatemala 0.80640 Tropical cyclone
## 158 Guatemala 5.31511 Tropical cyclone
## 162 Guatemala 1.58358 Downpour
## 169 Guatemala 23.92309 Rain
## 351 Guatemala 0.77254 Tropical cyclone
## 353 Guatemala 0.18542 Tropical cyclone
## 354 Guatemala 2.02891 Tropical cyclone
## 355 Guatemala 0.44764 Tropical cyclone
## 356 Guatemala 6.13527 Tropical cyclone
## 357 Guatemala 4.07930 Tropical cyclone
## 358 Guatemala 6.00513 Tropical cyclone
## 359 Guatemala 0.99952 Tropical cyclone
## 360 Guatemala 0.50611 Tropical cyclone
## 361 Guatemala 0.89040 Tropical cyclone
## 362 Guatemala 8.93658 Tropical cyclone
## 363 Guatemala 0.17513 Tropical cyclone
## 372 Guatemala 3.85753 Tropical cyclone
## 383 Guatemala 3.85648 Downpour
## 427 Guatemala 2.10418 Downpour
## 428 Guatemala 3.64749 Downpour
## 429 Guatemala 2.81128 Downpour
## 430 Guatemala 6.15103 Downpour
## 431 Guatemala 0.03280 Downpour
## 432 Guatemala 0.00359 Downpour
## 433 Guatemala 2.30104 Downpour
## 437 Guatemala 3.04642 Downpour
## 438 Guatemala 0.92729 Downpour
## 439 Guatemala 21.83272 Downpour
## 440 Guatemala 0.63089 Downpour
## 441 Guatemala 1.36473 Downpour
## 442 Guatemala 0.35171 Downpour
## 818 Guatemala 0.45507 Rain
## 885 Guatemala 7.39906 Downpour
## 1112 Guatemala 0.96647 Earthquake
## 1244 Guatemala 0.91108 Downpour
## 1347 Guatemala 7.03115 Rain
## 1352 Guatemala 5.88787 Rain
## 1353 Guatemala 2.70053 Rain
## 1354 Guatemala 2.59620 Rain
## 1356 Guatemala 22.56101 Downpour
## 1357 Guatemala 4.51954 Downpour
## 1358 Guatemala 3.30989 Downpour
## 1359 Guatemala 5.94535 Downpour
## 1360 Guatemala 3.98185 Rain
## 1361 Guatemala 0.75729 Rain
## 1557 Guatemala 0.94245 Rain
## 1559 Guatemala 3.96161 Tropical cyclone
## 1560 Guatemala 0.82332 Tropical cyclone
## 1561 Guatemala 3.47803 Rain
## 1568 Guatemala 6.19218 Rain
## 1569 Guatemala 5.52205 Rain
## 1570 Guatemala 1.87009 Rain
## 1571 Guatemala 4.20726 Rain
## 1572 Guatemala 3.18658 Rain
## 1573 Guatemala 0.67040 Rain
## 1574 Guatemala 3.80312 Rain
## 1575 Guatemala 1.68290 Rain
## 1576 Guatemala 2.08425 Rain
## 1577 Guatemala 3.25675 Rain
## 1578 Guatemala 3.49341 Rain
## 1579 Guatemala 1.83863 Rain
## 1580 Guatemala 1.57381 Rain
## 1581 Guatemala 1.70147 Rain
## 1582 Guatemala 3.00314 Rain
## 1583 Guatemala 2.27725 Rain
## 1584 Guatemala 2.36376 Rain
## 1585 Guatemala 2.66358 Rain
## 1588 Guatemala 1.45200 Continuous rain
## 1589 Guatemala 5.14479 Rain
## 1590 Guatemala 8.25465 Unknown
## 1591 Guatemala 0.65744 Rain
## 1592 Guatemala 0.75685 Rain
## 1595 Guatemala 1.81216 Rain
library(ggplot2)
library(dplyr)
ggplot(data=df_GT, aes(fill=Trigger, x="Guatemala", y=Distance)) +
geom_bar(position="dodge", stat="identity")

Pie chart
library(ggplot2)
library(dplyr)
data <- data.frame(Trigger =
c("Tropical cyclone",
"Rain",
"Downpour",
"Earthquake",
"Continuous rain",
"Unknown"),
value = c(20, 35, 21, 1, 1, 1))
knitr::kable(data)
| Trigger | value |
|---------|-------|
| Tropical cyclone | 20 |
| Rain | 35 |
| Downpour | 21 |
| Earthquake | 1 |
| Continuous rain | 1 |
| Unknown | 1 |
library(ggplot2)
library(dplyr)
data <- data %>%
arrange(desc(Trigger)) %>%
mutate(prop = value / sum(value) * 100) %>%
mutate(ypos = cumsum(value) - 0.5 * value)
require(scales)
## Loading required package: scales
##
## Attaching package: 'scales'
## The following object is masked from 'package:viridis':
##
## viridis_pal
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
ggplot(data, aes(x="Trigger", y = value, fill=Trigger)) +
geom_bar(stat="identity", width=1, color="white") +
coord_polar("y", start=0) +
theme_void() +
theme(legend.position="none") +
geom_text(aes(y = ypos, label = percent(value / sum(value))), color = "white", size=6) +
scale_fill_brewer(palette="Set1")+
labs(title="Trigger")

ggplot(data, aes(x = "Trigger", y = value, fill=Trigger)) +
geom_bar(stat = "identity", width = 1) +
coord_polar("y", start = 0)
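The trigger counts in `data` are typed in by hand. A sketch of deriving them from a Trigger column directly, shown on a short hypothetical vector rather than the full `df_GT`:

```r
# Hypothetical sample of trigger labels (not the real Guatemala column).
trigger <- c("Rain", "Downpour", "Rain", "Tropical cyclone", "Rain", "Downpour")

# Count each label and store the result as a two-column data frame.
counts <- as.data.frame(table(Trigger = trigger), responseName = "value")
counts$share <- counts$value / sum(counts$value)   # proportion of all events

print(counts)
```

With the real data, `df_GT$Trigger` would replace the sample vector, which avoids transcription errors when the subset changes.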

Pareto chart
df <- data.frame(Trigger =
c("Tropical cyclone",
"Rain",
"Downpour",
"Earthquake",
"Continuous rain",
"Unknown"),
Frequency = c(20, 35, 21, 1, 1, 1))
knitr::kable(df)
| Trigger | Frequency |
|---------|-----------|
| Tropical cyclone | 20 |
| Rain | 35 |
| Downpour | 21 |
| Earthquake | 1 |
| Continuous rain | 1 |
| Unknown | 1 |
library(qcc)
Frequency <- df$Frequency
names(Frequency) <- df$Trigger
pareto.chart(Frequency,
ylab="Frequency",
col = heat.colors(length(Frequency)),
cumperc = seq(0, 100, by = 10),
ylab2 = "Accumulated Percentage",
main = "Events that trigger landslides "
)

##
## Pareto chart analysis for Frequency
## Frequency Cum.Freq. Percentage Cum.Percent.
## Rain 35.000000 35.000000 44.303797 44.303797
## Downpour 21.000000 56.000000 26.582278 70.886076
## Tropical cyclone 20.000000 76.000000 25.316456 96.202532
## Earthquake 1.000000 77.000000 1.265823 97.468354
## Continuous rain 1.000000 78.000000 1.265823 98.734177
## Unknown 1.000000 79.000000 1.265823 100.000000
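The Percentage and Cum.Percent. columns follow directly from the sorted counts. A sketch in base R that recomputes them without qcc, using the same six counts:

```r
Frequency <- c("Tropical cyclone" = 20, Rain = 35, Downpour = 21,
               Earthquake = 1, "Continuous rain" = 1, Unknown = 1)

sorted  <- sort(Frequency, decreasing = TRUE)   # largest category first
pct     <- 100 * sorted / sum(sorted)           # share of each category
cum_pct <- cumsum(pct)                          # running total of the shares

print(round(cbind(Frequency = sorted, Percentage = pct, Cum.Percent = cum_pct), 4))
```

Read this way, Rain, Downpour and Tropical cyclone together account for about 96% of the recorded Guatemala events.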
Haiti
library(readr)
library(knitr)
df <- read.csv("https://raw.githubusercontent.com/lihkir/AnalisisEstadisticoUN/main/Data/catalog.csv")
library(dplyr)
colnames(df)[2] <- "Date"
colnames(df)[5] <- "Country"
colnames(df)[7] <- "State"
colnames(df)[8] <- "Population"
colnames(df)[9] <- "City"
colnames(df)[10] <- "Distance"
colnames(df)[18] <- "Trigger"
colnames(df)[21] <- "Fatalities"
library(readr)
library(knitr)
df_HT <- subset (df, Country == "Haiti")
knitr::kable(df_HT)
|  | id | Date | time | continent_code | Country | country_code | State | Population | City | Distance | location_description | latitude | longitude | geolocation | hazard_type | landslide_type | landslide_size | Trigger | storm_name | injuries | Fatalities | source_name | source_link |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 43 | 297 | 10/8/07 |  | NA | Haiti | HT | Artibonite | 7294 | Gros Morne | 8.70343 |  | 19.6990 | -72.7540 | (19.699000000000002, -72.754000000000005) | Landslide | Landslide | Medium | Downpour |  | NA | NA |  | https://www-secure.ifrc.org/dmis/prepare/view_report.asp?ReportID=3285 |
| 47 | 303 | 10/12/07 |  | NA | Haiti | HT | Ouest | 3951 | Cabaret | 0.51272 |  | 18.7335 | -72.4133 | (18.733499999999999, -72.413300000000007) | Landslide | Complex | Large | Rain |  | NA | 23 | Euronews.net | http://www.euronews.net/index.php?page=info&article=448067&lng=1 |
| 53 | 334 | 10/29/07 |  | NA | Haiti | HT | Ouest | 1234742 | Port-au-Prince | 2.72168 |  | 18.5146 | -72.3361 | (18.514600000000002, -72.336100000000002) | Landslide | Complex | Medium | Tropical cyclone | Tropical Storm Noel | NA | NA | ABC news | http://www.abcnews.go.com/International/wireStory?id=3807131 |
| 94 | 506 | 4/20/08 |  | NA | Haiti | HT | Ouest | 1234742 | Port-au-Prince | 1.80063 |  | 18.5283 | -72.3224 | (18.528300000000002, -72.322400000000002) | Landslide | Mudslide | Medium | Rain |  | NA | 3 |  | http://www.news.com.au/heraldsun/story/0,21985,23596379-5005961,00.html |
| 139 | 747 | 8/26/08 |  | NA | Haiti | HT | Sud-Est | 137966 | Jacmel | 4.41574 |  | 18.2640 | -72.5070 | (18.263999999999999, -72.507000000000005) | Landslide | Landslide | Medium | Tropical cyclone | Hurricane Gustav | NA | 25 |  | http://ap.google.com/article/ALeqM5gVWjsPEiqe1tEu2mhBIRaxxGi8owD92RGO9O1 |
| 140 | 748 | 8/26/08 |  | NA | Haiti | HT | Ouest | 1234742 | Port-au-Prince | 3.50201 |  | 18.5090 | -72.3450 | (18.509, -72.344999999999999) | Landslide | Mudslide | Medium | Tropical cyclone | Hurricane Gustav | NA | 3 |  | http://www.reuters.com/article/worldNews/idUSN2541891320080827?pageNumber=1&virtualBrandChannel=0 |
| 145 | 771 | 9/3/08 |  | NA | Haiti | HT | Artibonite | 84961 | Gonaïves | 4.72379 |  | 19.4300 | -72.6480 | (19.43, -72.647999999999996) | Landslide | Mudslide | Medium | Tropical cyclone | Hurricane Hannah | NA | 26 |  | http://www.miamiherald.com/news/americas/cuba/story/671682.html |
| 208 | 1140 | 9/7/09 | Early morning | NA | Haiti | HT | Artibonite | 66226 | Saint-Marc | 17.29836 |  | 18.9523 | -72.7053 | (18.952300000000001, -72.705299999999994) | Landslide | Mudslide | Medium | Downpour |  | NA | 1 |  | http://www.google.com/hostednews/ap/article/ALeqM5hdjzxxFRHymhlrd1BpUjDSV3HK6AD9AIQ5OO0 |
| 223 | 1266 | 10/20/09 |  | NA | Haiti | HT | Ouest | 442156 | Carrefour | 1.31659 |  | 18.5347 | -72.4097 | (18.534700000000001, -72.409700000000001) | Landslide | Landslide | Small | Downpour |  | NA | 4 |  | http://www.etaiwannews.com/etn/news_content.php?id=1088959&lang=eng_news |
| 264 | 1506 | 2/15/10 | 12:00 | NA | Haiti | HT | Nord | 134815 | Cap-Haïtien | 0.27505 | Urban area | 19.7560 | -72.2060 | (19.756, -72.206000000000003) | Landslide | Mudslide | Medium | Downpour |  | NA | 4 | Times Live | http://www.timeslive.co.za/world/article311411.ece |
| 471 | 2528 | 10/1/10 |  | NA | Haiti | HT | Ouest | 442156 | Carrefour | 12.13199 |  | 18.4468 | -72.4577 | (18.4468, -72.457700000000003) | Landslide | Mudslide | Medium | Downpour |  | NA | 3 |  | http://www.presstv.ir/detail/144854.html |
| 481 | 2604 | 10/17/10 |  | NA | Haiti | HT | Ouest | 134190 | Léogâne | 7.67473 |  | 18.4674 | -72.5738 | (18.467400000000001, -72.573800000000006) | Landslide | Complex | Medium | Downpour |  | NA | 8 |  | http://edition.cnn.com/2010/WORLD/americas/10/19/haiti.flooding/ |
| 482 | 2605 | 10/17/10 |  | NA | Haiti | HT | Ouest | 442156 | Carrefour | 2.63565 |  | 18.5202 | -72.4111 | (18.520199999999999, -72.411100000000005) | Landslide | Mudslide | Medium | Downpour |  | NA | 2 |  | http://www.npr.org/templates/story/story.php?storyId=130649188 |
| 748 | 3563 | 6/2/11 |  | NA | Haiti | HT | Sud-Est | 137966 | Jacmel | 0.19079 |  | 18.2348 | -72.5364 | (18.2348, -72.5364) | Landslide | Landslide | Small | Downpour |  | NA | 0 |  | http://www.haitilibre.com/en/news-3095-haiti-climate-the-situation-by-department.html |
| 749 | 3564 | 6/2/11 |  | NA | Haiti | HT | Centre | 18590 | Hinche | 7.86436 |  | 19.2088 | -71.9747 | (19.2088, -71.974699999999999) | Landslide | Landslide | Medium | Downpour |  | NA | 1 |  | http://www.haitilibre.com/en/news-3095-haiti-climate-the-situation-by-department.html |
| 754 | 3576 | 6/7/11 |  | NA | Haiti | HT | Ouest | 283052 | Pétionville | 0.11071 |  | 18.5135 | -72.2853 | (18.513500000000001, -72.285300000000007) | Landslide | Landslide | Large | Downpour |  | NA | 13 |  | http://www.bbc.co.uk/news/world-latin-america-13689711 |
| 873 | 4289 | 3/30/12 | Late night | NA | Haiti | HT | Ouest | 283052 | Pétionville | 1.33931 |  | 18.5044 | -72.2947 | (18.5044, -72.294700000000006) | Landslide | Landslide | Medium | Downpour |  | NA | 6 |  | http://www.haitilibre.com/en/news-5290-haiti-weather-first-drama-of-the-rain.html |
| 875 | 4312 | 4/8/12 |  | NA | Haiti | HT | Nord | 32645 | Limbé | 0.03471 |  | 19.7041 | -72.4006 | (19.7041, -72.400599999999997) | Landslide | Landslide | Medium | Downpour |  | NA | 2 |  | http://www.usatoday.com/news/world/story/2012-04-10/Haiti-floods/54160810/1 |
| 1401 | 6713 | 11/1/14 |  | NA | Haiti | HT | Nord | 134815 | Okap | 5.23459 | Urban area | 19.7450 | -72.2152 | (19.745000000000001, -72.215199999999996) | Landslide | Landslide | Medium | Downpour |  | 0 | 1 | reliefweb | http://reliefweb.int/report/haiti/undp-government-haiti-provide-immediate-support-flood-affected-victims |
| 1402 | 6722 | 5/27/14 |  | NA | Haiti | HT | Nord | 134815 | Okap | 1.58489 | Unknown | 19.7698 | -72.2085 | (19.7698, -72.208500000000001) | Landslide | Landslide | Small | Continuous rain |  | 1 | 3 | Business Recorder | http://www.brecorder.com/world/north-america/15393-three-children-die-in-haiti-landslide.html |
Stacked bar chart
df_HT %>%
select(Country, City, Distance)
## Country City Distance
## 43 Haiti Gros Morne 8.70343
## 47 Haiti Cabaret 0.51272
## 53 Haiti Port-au-Prince 2.72168
## 94 Haiti Port-au-Prince 1.80063
## 139 Haiti Jacmel 4.41574
## 140 Haiti Port-au-Prince 3.50201
## 145 Haiti Gonaïves 4.72379
## 208 Haiti Saint-Marc 17.29836
## 223 Haiti Carrefour 1.31659
## 264 Haiti Cap-Haïtien 0.27505
## 471 Haiti Carrefour 12.13199
## 481 Haiti Léogâne 7.67473
## 482 Haiti Carrefour 2.63565
## 748 Haiti Jacmel 0.19079
## 749 Haiti Hinche 7.86436
## 754 Haiti Pétionville 0.11071
## 873 Haiti Pétionville 1.33931
## 875 Haiti Limbé 0.03471
## 1401 Haiti Okap 5.23459
## 1402 Haiti Okap 1.58489
library(ggplot2)
library(dplyr)
ggplot(data=df_HT, aes(fill=City, x="Haiti", y=Distance)) +
geom_bar(position="stack", stat="identity")

Grouped chart
df_HT %>%
select(Country, Fatalities)
## Country Fatalities
## 43 Haiti NA
## 47 Haiti 23
## 53 Haiti NA
## 94 Haiti 3
## 139 Haiti 25
## 140 Haiti 3
## 145 Haiti 26
## 208 Haiti 1
## 223 Haiti 4
## 264 Haiti 4
## 471 Haiti 3
## 481 Haiti 8
## 482 Haiti 2
## 748 Haiti 0
## 749 Haiti 1
## 754 Haiti 13
## 873 Haiti 6
## 875 Haiti 2
## 1401 Haiti 1
## 1402 Haiti 3
Fatalities <- c(0, 23, 0, 3, 25, 3, 26, 1, 4, 4, 3, 8, 2, 0, 1, 13, 6, 2, 1, 3)
knitr::kable(head(Fatalities))
n_sturges = 1 + log(length(Fatalities))/log(2)
n_sturgesc = ceiling(n_sturges)
n_sturgesf = floor(n_sturges)
n_clases = 0
if (n_sturgesc%%2 == 0) {
n_clases = n_sturgesf
} else {
n_clases = n_sturgesc
}
R = max(Fatalities) - min(Fatalities)
w = ceiling(R/n_clases)
bins <- seq(min(Fatalities), max(Fatalities) + w, by = w)
bins
## [1] 0 6 12 18 24 30
Fatalities <- cut(Fatalities, bins)
Freq_table <- transform(table(Fatalities), Rel_Freq=prop.table(Freq), Cum_Freq=cumsum(Freq))
knitr::kable(Freq_table)
| Fatalities | Freq | Rel_Freq | Cum_Freq |
|------------|------|----------|----------|
| (0,6] | 12 | 0.7058824 | 12 |
| (6,12] | 1 | 0.0588235 | 13 |
| (12,18] | 1 | 0.0588235 | 14 |
| (18,24] | 1 | 0.0588235 | 15 |
| (24,30] | 2 | 0.1176471 | 17 |
str(Freq_table)
## 'data.frame': 5 obs. of 4 variables:
## $ Fatalities: Factor w/ 5 levels "(0,6]","(6,12]",..: 1 2 3 4 5
## $ Freq : int 12 1 1 1 2
## $ Rel_Freq : num 0.7059 0.0588 0.0588 0.0588 0.1176
## $ Cum_Freq : int 12 13 14 15 17
df <- data.frame(x = Freq_table$Fatalities, y = Freq_table$Freq)
knitr::kable(df)
| x | y |
|---|---|
| (0,6] | 12 |
| (6,12] | 1 |
| (12,18] | 1 |
| (18,24] | 1 |
| (24,30] | 2 |
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_bar(stat="identity", color="blue", fill="green") +
xlab("Fatalities") +
ylab("Frecuencia")
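By default `cut()` builds right-closed intervals, so the three zeros fall outside (0,6] and become NA; that is why the table sums to 17 rather than 20 events. Two of those zeros were NA in the original column and were entered as 0 by hand. A sketch of keeping the zeros with `include.lowest = TRUE`, assuming they should be counted in the first class:

```r
Fatalities <- c(0, 23, 0, 3, 25, 3, 26, 1, 4, 4, 3, 8, 2, 0, 1, 13, 6, 2, 1, 3)
bins <- c(0, 6, 12, 18, 24, 30)          # class limits computed above

dropped <- table(cut(Fatalities, bins))                         # zeros become NA
kept    <- table(cut(Fatalities, bins, include.lowest = TRUE))  # first class is [0,6]

print(sum(dropped))
print(sum(kept))
print(kept)
```

Whether missing fatality counts should be treated as zero is a separate judgment call; leaving them as NA and excluding them from the table would be the more cautious option.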

Dominica
library(readr)
library(knitr)
library(ggplot2)
df <- read.csv("https://raw.githubusercontent.com/lihkir/AnalisisEstadisticoUN/main/Data/catalog.csv")
library(dplyr)
colnames(df)[2] <- "Date"
colnames(df)[5] <- "Country"
colnames(df)[7] <- "State"
colnames(df)[8] <- "Population"
colnames(df)[9] <- "City"
colnames(df)[10] <- "Distance"
library(readr)
library(knitr)
df_DOM <- subset (df, Country == "Dominica")
knitr::kable(head(df_DOM))
df_DOM %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 20 Dominica Saint Paul Pont Cassé 3.39516 8/17/07
## 39 Dominica Saint George Roseau 2.59849 9/9/07
## 267 Dominica Saint Paul Pont Cassé 3.98646 3/11/10
## 297 Dominica Saint Patrick Berekua 2.08997 4/12/10
## 298 Dominica Saint Paul Pont Cassé 3.78784 4/12/10
## 299 Dominica Saint Patrick Berekua 4.08252 4/12/10
## 300 Dominica Saint Patrick Berekua 5.61495 4/12/10
## 301 Dominica Saint Patrick La Plaine 5.11600 4/12/10
## 304 Dominica Saint Paul Pont Cassé 6.45930 4/16/10
## 476 Dominica Saint Andrew Calibishie 2.64873 10/5/10
## 1190 Dominica Saint Paul Pont Cassé 4.20239 12/24/13
## 1193 Dominica Saint John Portsmouth 5.92994 12/24/13
## 1194 Dominica Saint Mark Soufrière 1.80847 12/24/13
## 1201 Dominica Saint Joseph Saint Joseph 2.38605 1/7/14
head(df_DOM)
## id Date time continent_code Country country_code State
## 20 186 8/17/07 <NA> Dominica DM Saint Paul
## 39 250 9/9/07 <NA> Dominica DM Saint George
## 267 1552 3/11/10 <NA> Dominica DM Saint Paul
## 297 1743 4/12/10 <NA> Dominica DM Saint Patrick
## 298 1744 4/12/10 <NA> Dominica DM Saint Paul
## 299 1745 4/12/10 <NA> Dominica DM Saint Patrick
## Population City Distance location_description latitude longitude
## 20 702 Pont Cassé 3.39516 15.3379 -61.3610
## 39 16571 Roseau 2.59849 15.3055 -61.3642
## 267 702 Pont Cassé 3.98646 15.3356 -61.3312
## 297 2608 Berekua 2.08997 15.2454 -61.3017
## 298 702 Pont Cassé 3.78784 15.4004 -61.3440
## 299 2608 Berekua 4.08252 15.2458 -61.2809
## geolocation hazard_type landslide_type
## 20 (15.337899999999999, -61.360999999999997) Landslide Mudslide
## 39 (15.3055, -61.364199999999997) Landslide Landslide
## 267 (15.335599999999999, -61.331200000000003) Landslide Landslide
## 297 (15.2454, -61.301699999999997) Landslide Landslide
## 298 (15.400399999999999, -61.344000000000001) Landslide Landslide
## 299 (15.245799999999999, -61.280900000000003) Landslide Landslide
## landslide_size trigger storm_name injuries fatalities
## 20 Small Tropical cyclone Hurricane Dean NA 2
## 39 Medium Rain Tropical Wave NA NA
## 267 Medium Rain NA 0
## 297 Medium Downpour NA 0
## 298 Medium Downpour NA 0
## 299 Small Downpour NA 0
## source_name
## 20 Tribune India
## 39 RadioJamaica
## 267
## 297
## 298
## 299
## source_link
## 20 http://www.tribuneindia.com/2007/20070817/himachal.htm
## 39 http://www.radiojamaica.com/content/view/1156/88/
## 267 http://stormcarib.com/reports/current/report.php?id=1268397271_8827
## 297 http://www.dominicacentral.com/general/community/heavy-overnight-rains-cause-landslides-across-island.html
## 298 http://www.dominicacentral.com/general/community/heavy-overnight-rains-cause-landslides-across-island.html
## 299 http://www.dominicacentral.com/general/community/heavy-overnight-rains-cause-landslides-across-island.html
Pareto chart
Used to rank the cities by recorded landslide distance.
library(qcc)
Distance <- df_DOM$Distance
names(Distance) <- df_DOM$City
pareto.chart(Distance,
ylab="Distancia",
col = heat.colors(length(Distance)),
cumperc = seq(0, 100, by = 10),
ylab2 = "Porcentaje acumulado",
main = "Ciudades con mayor deslizamientos"
)

##
## Pareto chart analysis for Distance
## Frequency Cum.Freq. Percentage Cum.Percent.
## Pont Cassé 6.459300 6.459300 11.938173 11.938173
## Portsmouth 5.929940 12.389240 10.959802 22.897975
## Berekua 5.614950 18.004190 10.377633 33.275607
## La Plaine 5.116000 23.120190 9.455466 42.731073
## Pont Cassé 4.202390 27.322580 7.766919 50.497992
## Berekua 4.082520 31.405100 7.545373 58.043365
## Pont Cassé 3.986460 35.391560 7.367834 65.411199
## Pont Cassé 3.787840 39.179400 7.000741 72.411940
## Pont Cassé 3.395160 42.574560 6.274984 78.686925
## Calibishie 2.648730 45.223290 4.895422 83.582346
## Roseau 2.598490 47.821780 4.802567 88.384914
## Saint Joseph 2.386050 50.207830 4.409933 92.794846
## Berekua 2.089970 52.297800 3.862713 96.657559
## Soufrière 1.808470 54.106270 3.342441 100.000000
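
Because `pareto.chart()` above receives one distance per event, cities with several events (Pont Cassé, Berekua) show up as repeated bars. As a hedged alternative, the sketch below counts events per city in base R, using the city names printed earlier for Dominica. The names `city`, `counts`, and `pareto` are illustrative.

```r
# City of each Dominica event, in the order printed in the report.
city <- c("Pont Cassé", "Roseau", "Pont Cassé", "Berekua", "Pont Cassé",
          "Berekua", "Berekua", "La Plaine", "Pont Cassé", "Calibishie",
          "Pont Cassé", "Portsmouth", "Soufrière", "Saint Joseph")

# Number of events per city, largest first.
counts <- sort(table(city), decreasing = TRUE)

# Cumulative percentage, as in a Pareto analysis.
cum_pct <- cumsum(as.vector(counts)) / sum(counts) * 100

pareto <- data.frame(
  City = names(counts),
  Events = as.vector(counts),
  Cum_Percent = round(cum_pct, 1)
)
print(pareto)
```

A named numeric vector built from these counts, for example `setNames(pareto$Events, pareto$City)`, could then be passed to `pareto.chart()` in place of `Distance`.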
library(forecast)
data_serie<- ts(df_DOM$Distance, frequency=12, start=2007)
head(data_serie)
## Jan Feb Mar Apr May Jun
## 2007 3.39516 2.59849 3.98646 2.08997 3.78784 4.08252
Time series
Shows landslide distance in Dominica over the years.
autoplot(data_serie)+
labs(title = "Deslizamiento", x="Años", y = "Distancia", colour = "#00a0dc") +theme_grey()
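
Note that `ts(..., frequency = 12, start = 2007)` labels the rows as consecutive months starting in January 2007, whatever the Date column says. The printed head shows Jan to Jun 2007 for events that are actually dated 2007 to 2010. The following sketch shows one way to use the real dates, assuming they are written as month/day/two-digit year. The names `events` and `per_year` are illustrative.

```r
# Dates and distances of the Dominica events, as printed in the report.
events <- data.frame(
  Date = c("8/17/07", "9/9/07", "3/11/10", "4/12/10", "4/12/10", "4/12/10",
           "4/12/10", "4/12/10", "4/16/10", "10/5/10", "12/24/13",
           "12/24/13", "12/24/13", "1/7/14"),
  Distance = c(3.39516, 2.59849, 3.98646, 2.08997, 3.78784, 4.08252,
               5.61495, 5.11600, 6.45930, 2.64873, 4.20239, 5.92994,
               1.80847, 2.38605)
)

# Parse the text dates (month/day/two-digit year) into Date objects.
events$Date <- as.Date(events$Date, format = "%m/%d/%y")
events$Year <- as.integer(format(events$Date, "%Y"))

# Count the events recorded in each calendar year.
per_year <- aggregate(Distance ~ Year, data = events, FUN = length)
names(per_year)[2] <- "Events"
print(per_year)
```

Plotting `Events` against `Year` (or `Distance` against `Date`) would then place each observation at the time it was recorded.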

El Salvador
library(readr)
library(knitr)
df_SV <- subset (df, Country == "El Salvador")
knitr::kable(head(df_SV))
df_SV %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 34 El Salvador Ahuachapán Concepción de Ataco 0.00273 9/5/07
## 105 El Salvador La Libertad Santa Tecla 4.96416 6/2/08
## 224 El Salvador San Vicente San Vicente 7.60946 11/8/09
## 225 El Salvador La Libertad Antiguo Cuscatlán 4.86219 11/8/09
## 226 El Salvador San Vicente San Vicente 5.90726 11/8/09
## 227 El Salvador San Vicente San Vicente 4.03125 11/8/09
## 453 El Salvador Ahuachapán Tacuba 5.29901 9/26/10
## 824 El Salvador San Salvador Apopa 3.01739 10/10/11
## 1294 El Salvador San Miguel Chirilagua 6.94536 10/13/14
## 1366 El Salvador San Miguel San Rafael Oriente 10.06695 5/22/14
## 1367 El Salvador Cabañas San Martín 8.82525 4/21/14
## 1369 El Salvador Sonsonate Nahuizalco 4.23875 10/15/14
## 1370 El Salvador Sonsonate Sonzacate 3.22235 10/15/14
## 1371 El Salvador La Paz San Pedro Masahuat 0.31933 10/15/14
## 1372 El Salvador San Miguel Chirilagua 9.97227 10/15/14
## 1373 El Salvador Santa Ana Coatepeque 8.83210 10/12/14
## 1374 El Salvador La Libertad Santa Tecla 4.60655 10/12/14
## 1375 El Salvador San Salvador Antiguo Cuscatlán 3.25227 10/12/14
## 1594 El Salvador Santa Ana Ciudad Arce 1.15810 7/18/15
## 1596 El Salvador La Libertad Santa Tecla 4.67722 11/3/15
## 1597 El Salvador La Libertad Santa Tecla 9.87553 11/4/15
## 1598 El Salvador Sonsonate Juayúa 0.49346 10/19/15
head(df_SV)
## id Date time continent_code Country country_code State
## 34 230 9/5/07 <NA> El Salvador SV Ahuachapán
## 105 564 6/2/08 <NA> El Salvador SV La Libertad
## 224 1285 11/8/09 <NA> El Salvador SV San Vicente
## 225 1286 11/8/09 <NA> El Salvador SV La Libertad
## 226 1287 11/8/09 <NA> El Salvador SV San Vicente
## 227 1288 11/8/09 <NA> El Salvador SV San Vicente
## Population City Distance location_description latitude
## 34 7797 Concepción de Ataco 0.00273 13.8703
## 105 124694 Santa Tecla 4.96416 13.7205
## 224 41504 San Vicente 7.60946 13.6409
## 225 33767 Antiguo Cuscatlán 4.86219 13.7156
## 226 41504 San Vicente 5.90726 13.6094
## 227 41504 San Vicente 4.03125 13.6466
## longitude geolocation hazard_type
## 34 -89.8486 (13.8703, -89.848600000000005) Landslide
## 105 -89.2687 (13.720499999999999, -89.268699999999995) Landslide
## 224 -88.8699 (13.6409, -88.869900000000001) Landslide
## 225 -89.2521 (13.7156, -89.252099999999999) Landslide
## 226 -88.8488 (13.609400000000001, -88.848799999999997) Landslide
## 227 -88.8347 (13.646599999999999, -88.834699999999998) Landslide
## landslide_type landslide_size trigger storm_name
## 34 Mudslide Medium Tropical cyclone Hurricane Felix
## 105 Landslide Medium Tropical cyclone Tropical Storm Arthur
## 224 Complex Very_large Tropical cyclone Tropical Cyclone Ida
## 225 Mudslide Medium Tropical cyclone Tropical Cyclone Ida
## 226 Rockfall Medium Tropical cyclone Tropical Cyclone Ida
## 227 Mudslide Medium Tropical cyclone Tropical Cyclone Ida
## injuries fatalities source_name
## 34 NA NA Azcentral.com
## 105 NA NA
## 224 NA 23
## 225 NA 4
## 226 NA NA
## 227 NA NA
## source_link
## 34 http://www.azcentral.com/news/articles/1108sr-fhsistercity1109-ON.html
## 105 http://news.xinhuanet.com/english/2008-06/04/content_8310737.htm
## 224 http://www.google.com/hostednews/ap/article/ALeqM5j0XCCb1n12DyhoBoDzGj_hTyEtrAD9BRKPRG0
## 225 http://www.google.com/hostednews/ap/article/ALeqM5j0XCCb1n12DyhoBoDzGj_hTyEtrAD9BRKPRG0
## 226 http://news.bbc.co.uk/2/hi/in_depth/8349333.stm
## 227 http://news.yahoo.com/s/afp/20091109/wl_afp/salvadorweatherstorm_20091109100952
Pareto chart, used to rank the cities by recorded landslide distance.
library(qcc)
Distance <- df_SV$Distance
names(Distance) <- df_SV$City
pareto.chart(Distance,
ylab="Distancia",
col = heat.colors(length(Distance)),
cumperc = seq(0, 100, by = 10),
ylab2 = "Porcentaje acumulado",
main = "Ciudades con mayor deslizamientos"
)

##
## Pareto chart analysis for Distance
## Frequency Cum.Freq. Percentage Cum.Percent.
## San Rafael Oriente 1.006695e+01 1.006695e+01 8.974011e+00 8.974011e+00
## Chirilagua 9.972270e+00 2.003922e+01 8.889610e+00 1.786362e+01
## Santa Tecla 9.875530e+00 2.991475e+01 8.803373e+00 2.666699e+01
## Coatepeque 8.832100e+00 3.874685e+01 7.873225e+00 3.454022e+01
## San Martín 8.825250e+00 4.757210e+01 7.867118e+00 4.240734e+01
## San Vicente 7.609460e+00 5.518156e+01 6.783323e+00 4.919066e+01
## Chirilagua 6.945360e+00 6.212692e+01 6.191323e+00 5.538198e+01
## San Vicente 5.907260e+00 6.803418e+01 5.265926e+00 6.064791e+01
## Tacuba 5.299010e+00 7.333319e+01 4.723712e+00 6.537162e+01
## Santa Tecla 4.964160e+00 7.829735e+01 4.425216e+00 6.979684e+01
## Antiguo Cuscatlán 4.862190e+00 8.315954e+01 4.334316e+00 7.413115e+01
## Santa Tecla 4.677220e+00 8.783676e+01 4.169428e+00 7.830058e+01
## Santa Tecla 4.606550e+00 9.244331e+01 4.106430e+00 8.240701e+01
## Nahuizalco 4.238750e+00 9.668206e+01 3.778561e+00 8.618557e+01
## San Vicente 4.031250e+00 1.007133e+02 3.593589e+00 8.977916e+01
## Antiguo Cuscatlán 3.252270e+00 1.039656e+02 2.899181e+00 9.267834e+01
## Sonzacate 3.222350e+00 1.071879e+02 2.872509e+00 9.555085e+01
## Apopa 3.017390e+00 1.102053e+02 2.689801e+00 9.824065e+01
## Ciudad Arce 1.158100e+00 1.113634e+02 1.032368e+00 9.927302e+01
## Juayúa 4.934600e-01 1.118569e+02 4.398865e-01 9.971291e+01
## San Pedro Masahuat 3.193300e-01 1.121762e+02 2.846613e-01 9.999757e+01
## Concepción de Ataco 2.730000e-03 1.121789e+02 2.433612e-03 1.000000e+02
library(forecast)
data_serie<- ts(df_SV$Distance, frequency=12, start=2007)
head(data_serie)
## Jan Feb Mar Apr May Jun
## 2007 0.00273 4.96416 7.60946 4.86219 5.90726 4.03125
This time series shows landslide distance in El Salvador over the years.
autoplot(data_serie)+
labs(title = "Deslizamiento", x="Años", y = "Distancia", colour = "#752514") +theme_grey()

Honduras
library(readr)
library(knitr)
df_HON <- subset (df, Country == "Honduras")
knitr::kable(head(df_HON))
df_HON %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 159 Honduras Copán Corquín 0.43391 10/19/08
## 160 Honduras Francisco Morazán Tegucigalpa 2.99239 10/20/08
## 376 Honduras Francisco Morazán Tegucigalpa 0.98377 7/12/10
## 381 Honduras Francisco Morazán Tegucigalpa 1.24404 7/18/10
## 406 Honduras Francisco Morazán Tegucigalpa 2.21442 8/7/10
## 435 Honduras Francisco Morazán Santa Lucía 4.75791 8/29/10
## 474 Honduras Comayagua El Rancho 4.53362 10/3/10
## 485 Honduras Colón Cusuna 36.37629 10/25/10
## 820 Honduras Francisco Morazán Tegucigalpa 1.23639 9/26/11
## 1100 Honduras Cortés Los Caminos 3.53737 8/29/13
## 1279 Honduras Choluteca Ciudad Choluteca 3.69596 7/2/14
## 1288 Honduras Yoro Yoro 0.31238 5/20/14
## 1363 Honduras Ocotepeque Sinuapa 2.00805 10/13/14
## 1377 Honduras Cortés Agua Azul Rancho 0.97057 7/31/14
## 1379 Honduras Santa Bárbara Agualote 2.91594 10/14/14
## 1599 Honduras El Paraíso 1.90052 6/9/15
## 1602 Honduras Francisco Morazán El Lolo 1.85897 9/28/15
## 1603 Honduras Francisco Morazán Tegucigalpa 3.25281 9/28/15
## 1604 Honduras Choluteca Duyure 11.67237 6/11/15
## 1605 Honduras Choluteca Corpus 0.36987 12/15/15
## 1610 Honduras Comayagua El Sauce 7.28575 10/16/15
## 1611 Honduras Comayagua La Libertad 17.28613 10/29/15
## 1612 Honduras Comayagua Concepción de Guasistagua 8.52584 10/16/15
## 1613 Honduras Copán Santa Rosa de Copán 0.74414 9/6/15
## 1614 Honduras Copán Santa Rosa de Copán 0.28887 9/6/15
## 1615 Honduras Copán Ojos de Agua 1.39095 11/21/15
## 1616 Honduras La Paz San José 4.69133 9/25/15
## 1617 Honduras Copán Lucerna 5.89721 9/24/15
## 1618 Honduras Ocotepeque La Labor 5.79867 9/25/15
## 1619 Honduras Francisco Morazán Villa Nueva 2.00830 6/13/15
## 1620 Honduras Santa Bárbara Ilama 2.87349 9/28/15
## 1622 Honduras Francisco Morazán El Guapinol 3.54399 10/21/15
## 1623 Honduras Yoro La Sarrosa 6.66574 1/22/15
## 1624 Honduras Francisco Morazán El Tablón 3.12986 6/13/15
## 1638 Honduras Francisco Morazán Tegucigalpa 0.91552 9/28/15
## 1639 Honduras Francisco Morazán Yaguacire 1.30583 9/28/15
## 1640 Honduras Francisco Morazán Río Abajo 3.63962 9/9/15
## 1641 Honduras Francisco Morazán Tegucigalpa 2.91326 9/20/15
head(df_HON)
## id Date time continent_code Country country_code
## 159 854 10/19/08 <NA> Honduras HN
## 160 855 10/20/08 <NA> Honduras HN
## 376 2062 7/12/10 5:30:00 <NA> Honduras HN
## 381 2093 7/18/10 <NA> Honduras HN
## 406 2217 8/7/10 Overnight <NA> Honduras HN
## 435 2358 8/29/10 4:30:00 <NA> Honduras HN
## State Population City Distance location_description
## 159 Copán 4752 Corquín 0.43391
## 160 Francisco Morazán 850848 Tegucigalpa 2.99239
## 376 Francisco Morazán 850848 Tegucigalpa 0.98377
## 381 Francisco Morazán 850848 Tegucigalpa 1.24404
## 406 Francisco Morazán 850848 Tegucigalpa 2.21442
## 435 Francisco Morazán 2288 Santa Lucía 4.75791
## latitude longitude geolocation hazard_type
## 159 14.5637 -88.8693 (14.563700000000001, -88.869299999999996) Landslide
## 160 14.1080 -87.2137 (14.108000000000001, -87.213700000000003) Landslide
## 376 14.0831 -87.1978 (14.0831, -87.197800000000001) Landslide
## 381 14.0814 -87.1953 (14.0814, -87.195300000000003) Landslide
## 406 14.0783 -87.2270 (14.0783, -87.227000000000004) Landslide
## 435 14.1015 -87.1607 (14.1015, -87.160700000000006) Landslide
## landslide_type landslide_size trigger storm_name
## 159 Landslide Large Tropical cyclone Tropical Depression 16
## 160 Mudslide Large Tropical cyclone Tropical Depression 16
## 376 Mudslide Medium Downpour
## 381 Landslide Medium Downpour
## 406 Mudslide Medium Downpour
## 435 Rockfall Medium Downpour
## injuries fatalities source_name
## 159 NA 23
## 160 NA 29
## 376 NA 1
## 381 NA 0
## 406 NA 3
## 435 NA 5
## source_link
## 159 http://www.chron.com/disp/story.mpl/ap/world/6068144.html
## 160 http://in.ibtimes.com/articles/20081021/honduras-landslide-tegucigalpa-victim.htm
## 376 http://mdn.mainichi.jp/mdnnews/news/20100713p2a00m0na013000c.html
## 381 http://www.insidecostarica.com/dailynews/2010/july/19/centralamerica10071903.htm
## 406
## 435
Pareto chart, used to rank the cities by recorded landslide distance.
library(qcc)
Distance <- df_HON$Distance
names(Distance) <- df_HON$City
pareto.chart(Distance,
ylab="Distancia",
col = heat.colors(length(Distance)),
cumperc = seq(0, 100, by = 10),
ylab2 = "Porcentaje acumulado",
main = "Ciudades con mayor deslizamientos"
)

##
## Pareto chart analysis for Distance
## Frequency Cum.Freq. Percentage Cum.Percent.
## Cusuna 36.3762900 36.3762900 21.8907391 21.8907391
## La Libertad 17.2861300 53.6624200 10.4025496 32.2932888
## Duyure 11.6723700 65.3347900 7.0242679 39.3175567
## Concepción de Guasistagua 8.5258400 73.8606300 5.1307305 44.4482872
## El Sauce 7.2857500 81.1463800 4.3844618 48.8327489
## La Sarrosa 6.6657400 87.8121200 4.0113485 52.8440974
## Lucerna 5.8972100 93.7093300 3.5488579 56.3929554
## La Labor 5.7986700 99.5080000 3.4895580 59.8825133
## Santa Lucía 4.7579100 104.2659100 2.8632432 62.7457566
## San José 4.6913300 108.9572400 2.8231763 65.5689329
## El Rancho 4.5336200 113.4908600 2.7282687 68.2972016
## Ciudad Choluteca 3.6959600 117.1868200 2.2241767 70.5213783
## Río Abajo 3.6396200 120.8264400 2.1902721 72.7116504
## El Guapinol 3.5439900 124.3704300 2.1327233 74.8443736
## Los Caminos 3.5373700 127.9078000 2.1287395 76.9731131
## Tegucigalpa 3.2528100 131.1606100 1.9574953 78.9306084
## El Tablón 3.1298600 134.2904700 1.8835057 80.8141140
## Tegucigalpa 2.9923900 137.2828600 1.8007782 82.6148922
## Agualote 2.9159400 140.1988000 1.7547716 84.3696639
## Tegucigalpa 2.9132600 143.1120600 1.7531588 86.1228227
## Ilama 2.8734900 145.9855500 1.7292258 87.8520485
## Tegucigalpa 2.2144200 148.1999700 1.3326068 89.1846553
## Villa Nueva 2.0083000 150.2082700 1.2085667 90.3932220
## Sinuapa 2.0080500 152.2163200 1.2084162 91.6016382
## 1.9005200 154.1168400 1.1437062 92.7453444
## El Lolo 1.8589700 155.9758100 1.1187020 93.8640463
## Ojos de Agua 1.3909500 157.3667600 0.8370541 94.7011005
## Yaguacire 1.3058300 158.6725900 0.7858301 95.4869306
## Tegucigalpa 1.2440400 159.9166300 0.7486458 96.2355763
## Tegucigalpa 1.2363900 161.1530200 0.7440421 96.9796184
## Tegucigalpa 0.9837700 162.1367900 0.5920189 97.5716373
## Agua Azul Rancho 0.9705700 163.1073600 0.5840754 98.1557127
## Tegucigalpa 0.9155200 164.0228800 0.5509470 98.7066598
## Santa Rosa de Copán 0.7441400 164.7670200 0.4478130 99.1544727
## Corquín 0.4339100 165.2009300 0.2611209 99.4155937
## Corpus 0.3698700 165.5708000 0.2225826 99.6381762
## Yoro 0.3123800 165.8831800 0.1879859 99.8261621
## Santa Rosa de Copán 0.2888700 166.1720500 0.1738379 100.0000000
library(forecast)
data_serie<- ts(df_HON$Distance, frequency=12, start=2007)
head(data_serie)
## Jan Feb Mar Apr May Jun
## 2007 0.43391 2.99239 0.98377 1.24404 2.21442 4.75791
This time series shows landslide distance in Honduras over the years.
autoplot(data_serie)+
labs(title = "Deslizamiento", x="Años", y = "Distancia", colour = "#752514") +theme_grey()

Puerto Rico
library(readr)
library(knitr)
df_PR <- subset (df, Country == "Puerto Rico")
knitr::kable(head(df_PR))
df_PR %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 68 Puerto Rico San Juan San Juan 6.91777 12/12/07
## 477 Puerto Rico Orocovis Orocovis 6.85760 10/6/10
## 1396 Puerto Rico Vega Alta Vega Alta 3.49090 5/18/14
## 1397 Puerto Rico Aguada Aguada 1.40257 9/24/14
## 1398 Puerto Rico Ponce Adjuntas 5.78872 8/24/14
## 1399 Puerto Rico Ponce Adjuntas 6.89036 8/24/14
## 1400 Puerto Rico Villalba Villalba 3.65535 11/7/14
head(df_PR)
## id Date time continent_code Country country_code State
## 68 393 12/12/07 <NA> Puerto Rico PR San Juan
## 477 2550 10/6/10 <NA> Puerto Rico PR Orocovis
## 1396 6708 5/18/14 16:30 <NA> Puerto Rico PR Vega Alta
## 1397 6709 9/24/14 <NA> Puerto Rico PR Aguada
## 1398 6710 8/24/14 3:00 <NA> Puerto Rico PR Ponce
## 1399 6711 8/24/14 <NA> Puerto Rico PR Ponce
## Population City Distance location_description latitude longitude
## 68 418140 San Juan 6.91777 18.4320 -66.0510
## 477 944 Orocovis 6.85760 18.1652 -66.3969
## 1396 12036 Vega Alta 3.49090 Mine construction 18.3806 -66.3319
## 1397 4040 Aguada 1.40257 Unknown 18.3711 -67.1782
## 1398 5080 Adjuntas 5.78872 Unknown 18.1283 -66.6810
## 1399 5080 Adjuntas 6.89036 Unknown 18.1254 -66.6700
## geolocation hazard_type landslide_type
## 68 (18.431999999999999, -66.051000000000002) Landslide Landslide
## 477 (18.165199999999999, -66.396900000000002) Landslide Complex
## 1396 (18.380600000000001, -66.331900000000005) Landslide Other
## 1397 (18.371099999999998, -67.178200000000004) Landslide Landslide
## 1398 (18.128299999999999, -66.680999999999997) Landslide Landslide
## 1399 (18.125399999999999, -66.67) Landslide Landslide
## landslide_size trigger storm_name injuries fatalities
## 68 Medium Tropical cyclone Tropical Storm Olga NA NA
## 477 Medium Tropical cyclone Tropical Storm Otto NA 0
## 1396 Small Rain 0 0
## 1397 Medium Downpour 0 0
## 1398 Small Downpour 0 0
## 1399 Medium Downpour 0 0
## source_name
## 68 AP.google.com
## 477
## 1396 Telemundo
## 1397 Telemundo
## 1398 Perla del Sur
## 1399 Perla del Sur
## source_link
## 68 http://ap.google.com/article/ALeqM5gVWjsPEiqe1tEu2mhBIRaxxGi8owD8TFVR600
## 477 http://www.whitehouse.gov/the-press-office/2010/10/26/president-obama-signs-puerto-rico-disaster-declaration
## 1396 http://www.telemundopr.com/telenoticias/puerto-rico/Deslizamiento-deja-a-familias-incomunicadas-en-Vega-Alta-258522361.html
## 1397 http://www.telemundopr.com/telenoticias/puerto-rico/Viviendas-inhabitables-luego-de-deslizamiento-de-tierras-en-Aguada-277123031.html
## 1398 http://www.periodicolaperla.com/index.php?option=com_content&view=article&id=6371:surgen-nuevos-deslizamientos-en-ponce&catid=135:actualidad-del-sur&Itemid=423
## 1399 http://www.periodicolaperla.com/index.php?option=com_content&view=article&id=6371:surgen-nuevos-deslizamientos-en-ponce&catid=135:actualidad-del-sur&Itemid=423
Few records are available for Puerto Rico, so the most informative comparison is the distance recorded for each city.
ggplot(data=df_PR, aes(x=City, y=Distance)) + geom_bar(stat="identity", color="aquamarine2", fill="aquamarine2")

Pie chart
Shows each event's share of the total distance recorded for Puerto Rico, colored by city.
library(ggplot2)
library(dplyr)
library(scales)
# prop: share of the total distance, in percent; ypos: label position at the middle of each slice
df_PR <- df_PR %>%
  arrange(desc(City)) %>%
  mutate(prop = Distance / sum(Distance) * 100) %>%
  mutate(ypos = cumsum(prop) - 0.5 * prop)
ggplot(df_PR, aes(x="Puerto Rico", y = prop, fill=City)) +
  geom_bar(stat="identity", width=1, color="white") +
  coord_polar("y", start=0) +
  theme_void() +
  theme(legend.position="none") +
  # label each slice with its share of the total (prop), not with the raw distance
  geom_text(aes(y = ypos, label = percent(prop / 100, accuracy = 0.1)), color = "white", size=6) +
  scale_fill_brewer(palette="Set2")+
  labs(title="Distancia")

From this chart we can conclude that San Juan has the largest single recorded distance, about 19.8% of the total (6.918 out of 35.003).
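
As a check on the shares shown in the pie chart, the sketch below recomputes them in base R from the seven Puerto Rico rows printed earlier, first per event and then summed per city. It assumes that summing distances per city is meaningful for this comparison. The names `pr` and `by_city` are illustrative.

```r
# City and distance of each Puerto Rico event, as printed in the report.
pr <- data.frame(
  City = c("San Juan", "Orocovis", "Vega Alta", "Aguada",
           "Adjuntas", "Adjuntas", "Villalba"),
  Distance = c(6.91777, 6.85760, 3.49090, 1.40257,
               5.78872, 6.89036, 3.65535)
)

# Share of each event in the total distance, in percent.
pr$share <- pr$Distance / sum(pr$Distance) * 100

# Share per city: add up the events that belong to the same city.
by_city <- aggregate(Distance ~ City, data = pr, FUN = sum)
by_city$share <- by_city$Distance / sum(by_city$Distance) * 100
by_city <- by_city[order(-by_city$share), ]

print(round(pr$share, 1))
print(by_city)
```

Per event, San Juan has the largest share (about 19.8%). Summed per city, the two Adjuntas events together reach about 36.2%, so the ranking depends on which grouping is intended.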
Trinidad and Tobago
library(readr)
library(knitr)
df_TNT <- subset (df, Country == "Trinidad and Tobago")
knitr::kable(head(df_TNT))
| | id | Date | time | continent_code | Country | country_code | State | Population | City | Distance | location_description | latitude | longitude | geolocation | hazard_type | landslide_type | landslide_size | trigger | storm_name | injuries | fatalities | source_name | source_link |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30 | 224 | 9/1/07 | | NA | Trinidad and Tobago | TT | Tobago | 17000 | Scarborough | 9.11607 | | 11.2415 | -60.6742 | (11.2415, -60.674199999999999) | Landslide | Landslide | Medium | Tropical cyclone | Hurricane Felix | NA | NA | Trinadad Express | http://www.trinidadexpress.com/index.pl/article_news?id=161197580 |
| 61 | 357 | 11/17/07 | | NA | Trinidad and Tobago | TT | Eastern Tobago | 0 | Roxborough | 7.33295 | | 11.2965 | -60.6312 | (11.2965, -60.6312) | Landslide | Landslide | Medium | Rain | | NA | NA | Trinadad Express | http://www.trinidadexpress.com/index.pl/article_news?id=161237574 |
| 65 | 390 | 12/11/07 | | NA | Trinidad and Tobago | TT | Sangre Grande | 15968 | Sangre Grande | 29.28864 | | 10.8410 | -61.0550 | (10.840999999999999, -61.055) | Landslide | Landslide | Medium | Tropical cyclone | Tropical Storm Olga | NA | 3 | Trinidad and Tobago’s Newsday | http://www.newsday.co.tt/news/0,69681.html |
| 66 | 391 | 12/11/07 | | NA | Trinidad and Tobago | TT | Eastern Tobago | 0 | Roxborough | 8.62938 | | 11.3000 | -60.6440 | (11.3, -60.643999999999998) | Landslide | Landslide | Medium | Tropical cyclone | Tropical Storm Olga | NA | NA | Trinidad and Tobago’s Newsday | http://www.newsday.co.tt/news/0,69681.html |
| 67 | 392 | 12/11/07 | | NA | Trinidad and Tobago | TT | Eastern Tobago | 0 | Roxborough | 2.66802 | | 11.2670 | -60.5660 | (11.266999999999999, -60.566000000000003) | Landslide | Landslide | Small | Tropical cyclone | Tropical Storm Olga | NA | NA | Trinidad and Tobago’s Newsday | http://www.newsday.co.tt/news/0,69681.html |
| 149 | 780 | 9/7/08 | | NA | Trinidad and Tobago | TT | Diego Martin | 8140 | Petit Valley | 10.61854 | | 10.7603 | -61.4578 | (10.760300000000001, -61.457799999999999) | Landslide | Landslide | Medium | Downpour | | NA | NA | | http://www.newsday.co.tt/news/0,85847.html |
Bar chart
Compares the population of each city in Trinidad and Tobago.
library(ggplot2)
library(dplyr)
ggplot(data=df_TNT, aes(fill=City, x="Trinidad and Tobago", y=Population)) +
geom_bar(position="dodge", stat="identity")

Note that the split visible in the chart corresponds to an unknown value.
Nicaragua
library(readr)
library(knitr)
df_NIC <- subset (df, Country == "Nicaragua")
knitr::kable(head(df_NIC))
| | id | Date | time | continent_code | Country | country_code | State | Population | City | Distance | location_description | latitude | longitude | geolocation | hazard_type | landslide_type | landslide_size | trigger | storm_name | injuries | fatalities | source_name | source_link |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 33 | 229 | 9/4/07 | | NA | Nicaragua | NI | Atlántico Norte | 6315 | Bonanza | 54.90196 | | 13.6670 | -84.2435 | (13.667, -84.243499999999997) | Landslide | Complex | Medium | Tropical cyclone | Hurricane Felix | NA | NA | United Nations Development Programme - Relief Web | http://www.reliefweb.int/ |
| 151 | 826 | 10/3/08 | | NA | Nicaragua | NI | Masaya | 5182 | Tisma | 14.49301 | | 12.1200 | -85.8900 | (12.12, -85.89) | Landslide | Landslide | Medium | Downpour | | NA | 9 | CBC | http://www.cbc.ca/world/story/2008/10/04/nicaragua-flooding.html |
| 420 | 2289 | 8/20/10 | | NA | Nicaragua | NI | Managua | 16469 | El Crucero | 5.84054 | | 12.0420 | -86.2998 | (12.042, -86.299800000000005) | Landslide | Mudslide | Medium | Downpour | | NA | 3 | | |
| 424 | 2330 | 8/25/10 | | NA | Nicaragua | NI | Jinotega | 2367 | San José de Bocay | 1.36745 | | 13.5317 | -85.5325 | (13.531700000000001, -85.532499999999999) | Landslide | Landslide | Medium | Downpour | | NA | NA | | |
| 1261 | 6089 | 6/23/14 | | NA | Nicaragua | NI | Chontales | 5827 | Santo Domingo | 31.14242 | Unknown | 12.3535 | -84.8095 | (12.3535, -84.8095) | Landslide | Landslide | Small | Continuous rain | | 0 | 0 | Wilfried Strauch | |
| 1262 | 6090 | 6/23/14 | | NA | Nicaragua | NI | Chontales | 5827 | Santo Domingo | 31.24511 | Unknown | 12.3521 | -84.8080 | (12.3521, -84.808000000000007) | Landslide | Landslide | Medium | Continuous rain | | 0 | 0 | Wilfried Strauch | |
df_NIC %>%
select(Country, State, City, Distance, Date)
## Country State City Distance Date
## 33 Nicaragua Atlántico Norte Bonanza 54.90196 9/4/07
## 151 Nicaragua Masaya Tisma 14.49301 10/3/08
## 420 Nicaragua Managua El Crucero 5.84054 8/20/10
## 424 Nicaragua Jinotega San José de Bocay 1.36745 8/25/10
## 1261 Nicaragua Chontales Santo Domingo 31.14242 6/23/14
## 1262 Nicaragua Chontales Santo Domingo 31.24511 6/23/14
## 1263 Nicaragua Chontales Santo Domingo 31.37360 6/23/14
## 1264 Nicaragua Chontales Santo Domingo 31.10125 6/23/14
## 1265 Nicaragua Chontales Santo Domingo 30.99704 6/23/14
## 1266 Nicaragua Chontales Santo Domingo 30.77070 6/23/14
## 1267 Nicaragua Chontales Santo Domingo 30.27546 6/23/14
## 1268 Nicaragua Chontales Santo Domingo 29.95253 6/23/14
## 1269 Nicaragua Chontales Santo Domingo 29.92927 6/23/14
## 1270 Nicaragua Chontales Santo Domingo 28.90294 6/23/14
## 1271 Nicaragua Chontales Santo Domingo 32.69694 6/23/14
## 1272 Nicaragua Chontales Santo Domingo 32.96402 6/23/14
## 1273 Nicaragua Chontales Santo Domingo 32.77401 6/23/14
## 1274 Nicaragua Chontales Santo Domingo 29.94574 6/23/14
## 1299 Nicaragua Managua Ciudad Sandino 5.59574 10/16/14
## 1321 Nicaragua Ogun State Bonanza 0.37593 8/28/14
## 1380 Nicaragua Rivas Altagracia 1.97784 11/23/14
## 1381 Nicaragua Rivas Altagracia 5.77119 10/9/14
## 1382 Nicaragua Río San Juan San Carlos 0.67752 10/9/14
## 1626 Nicaragua Jinotega Wiwilí 25.81514 10/8/15
## 1627 Nicaragua Jinotega Jinotega 2.44880 2/19/16
## 1631 Nicaragua Madriz Las Sabanas 7.21108 9/27/15
## 1632 Nicaragua Madriz Las Sabanas 4.86364 9/27/15
## 1633 Nicaragua Managua Terrabona 18.92056 6/12/15
## 1634 Nicaragua Ogun State Bonanza 10.61568 6/10/15
## 1636 Nicaragua Ogun State Siuna 1.68056 7/23/15
## 1637 Nicaragua Masaya San Juan de Oriente 1.56730 5/13/15
head(df_NIC)
## id Date time continent_code Country country_code State
## 33 229 9/4/07 <NA> Nicaragua NI Atlántico Norte
## 151 826 10/3/08 <NA> Nicaragua NI Masaya
## 420 2289 8/20/10 <NA> Nicaragua NI Managua
## 424 2330 8/25/10 <NA> Nicaragua NI Jinotega
## 1261 6089 6/23/14 <NA> Nicaragua NI Chontales
## 1262 6090 6/23/14 <NA> Nicaragua NI Chontales
## Population City Distance location_description latitude
## 33 6315 Bonanza 54.90196 13.6670
## 151 5182 Tisma 14.49301 12.1200
## 420 16469 El Crucero 5.84054 12.0420
## 424 2367 San José de Bocay 1.36745 13.5317
## 1261 5827 Santo Domingo 31.14242 Unknown 12.3535
## 1262 5827 Santo Domingo 31.24511 Unknown 12.3521
## longitude geolocation hazard_type
## 33 -84.2435 (13.667, -84.243499999999997) Landslide
## 151 -85.8900 (12.12, -85.89) Landslide
## 420 -86.2998 (12.042, -86.299800000000005) Landslide
## 424 -85.5325 (13.531700000000001, -85.532499999999999) Landslide
## 1261 -84.8095 (12.3535, -84.8095) Landslide
## 1262 -84.8080 (12.3521, -84.808000000000007) Landslide
## landslide_type landslide_size trigger storm_name injuries
## 33 Complex Medium Tropical cyclone Hurricane Felix NA
## 151 Landslide Medium Downpour NA
## 420 Mudslide Medium Downpour NA
## 424 Landslide Medium Downpour NA
## 1261 Landslide Small Continuous rain 0
## 1262 Landslide Medium Continuous rain 0
## fatalities source_name
## 33 NA United Nations Development Programme - Relief Web
## 151 9 CBC
## 420 3
## 424 NA
## 1261 0 Wilfried Strauch
## 1262 0 Wilfried Strauch
## source_link
## 33 http://www.reliefweb.int/
## 151 http://www.cbc.ca/world/story/2008/10/04/nicaragua-flooding.html
## 420
## 424
## 1261
## 1262
Bar chart
Relates population to each state/province of Nicaragua.
ggplot(data=df_NIC, aes(x=State, y=Population)) + geom_bar(stat="identity", color="aquamarine4", fill="aquamarine4")

Frequency table
Relates each city's population to the number of deaths caused by environmental factors:
| City | Population | Fatalities |
|---|---|---|
| Bonanza | 6315 | 7 |
| Bonanza | 6315 | 38 |
| Tisma | 5182 | 9 |
| El Crucero | 19469 | 3 |
| San José de Bocay | 2367 | 0 |
| Santo Domingo | 5827 | 0 |
| Ciudad Sandino | 70013 | 9 |
| Altagracia | 2771 | 1 |
| San Carlos | 13451 | 0 |
| Wiwilí | 6955 | 0 |
| Jinotega | 51073 | 0 |
| Las Sabanas | 1257 | 0 |
| Terrabona | 1902 | 0 |
| Siuna | 16056 | 0 |
| San Juan de Oriente | 2111 | 0 |
In this case, Bonanza is the city with the most deaths (38) caused by environmental factors.
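
A table of this kind could also be derived from the data instead of being typed by hand. The sketch below is a minimal illustration using only the six Nicaragua rows printed by `head(df_NIC)`, and it assumes that a missing fatality count (NA) can be treated as zero, which may not hold for every record. The names `nic` and `by_city` are illustrative.

```r
# First six Nicaragua rows as printed by head(df_NIC).
nic <- data.frame(
  City = c("Bonanza", "Tisma", "El Crucero", "San José de Bocay",
           "Santo Domingo", "Santo Domingo"),
  Population = c(6315, 5182, 16469, 2367, 5827, 5827),
  fatalities = c(NA, 9, 3, NA, 0, 0)
)

# Assumption: an unreported fatality count is treated as zero.
nic$fatalities[is.na(nic$fatalities)] <- 0

# Total fatalities per city, keeping the population alongside.
by_city <- aggregate(fatalities ~ City + Population, data = nic, FUN = sum)

# Cities with the most fatalities first.
by_city <- by_city[order(-by_city$fatalities), ]
print(by_city)
```

Applied to the full `df_NIC`, the same steps would give one row per city and make the ranking reproducible from the catalog.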
Conclusion
Based on the results and charts presented, RStudio makes it easier to carry out statistical procedures and calculations. Working with its features also taught us basic tools for applying the theory already discussed in class.
Bar charts let us relate both population and distance to the corresponding city. Likewise, pie charts let us relate each city's share to its state/province.
These statistics were also useful for building frequency tables and box-and-whisker plots, which gave us a broader view not only of population density and displacement, but also of sudden deaths, environmental phenomena, and the triggers that follow from them.
What we learned here will be useful whatever the intended use of R, since these are fundamental concepts that open the way to more complex and advanced ones.