Taken from data on fish consumption figures for each province from 2010 to 2019 (https://statistik.kkp.go.id/home.php)
Create a map of fish consumption figures for 2010 and a map of fish consumption figures for year 2019. Give the interpretation. Which province has the lowest fish consumption rate? and the highest fish consumption rate in 2019?
library(rgdal)
## Loading required package: sp
## Please note that rgdal will be retired by the end of 2023,
## plan transition to sf/stars/terra functions using GDAL and PROJ
## at your earliest convenience.
##
## rgdal: version: 1.5-27, (SVN revision 1148)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.2.1, released 2020/12/29
## Path to GDAL shared files: C:/Users/asus/Documents/R/win-library/4.1/rgdal/gdal
## GDAL binary built with GEOS: TRUE
## Loaded PROJ runtime: Rel. 7.2.1, January 1st, 2021, [PJ_VERSION: 721]
## Path to PROJ shared files: C:/Users/asus/Documents/R/win-library/4.1/rgdal/proj
## PROJ CDN enabled: FALSE
## Linking to sp version:1.4-5
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading sp or rgdal.
## Overwritten PROJ_LIB was C:/Users/asus/Documents/R/win-library/4.1/rgdal/proj
library(maptools)
## Checking rgeos availability: TRUE
## Please note that 'maptools' will be retired by the end of 2023,
## plan transition at your earliest convenience;
## some functionality will be moved to 'sp'.
library(ggplot2)
library(plyr)
## Warning: package 'plyr' was built under R version 4.1.2
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggmap)
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
library(mapproj)
## Loading required package: maps
##
## Attaching package: 'maps'
## The following object is masked from 'package:plyr':
##
## ozone
library(ggsn)
## Loading required package: grid
library(ggspatial)
setwd("D:\\A SEMESTER 5\\KOMSTAT\\uts\\mapIndo")
#mapfile
indo = readOGR(dsn = ".", layer = "archive")
## OGR data source with driver: ESRI Shapefile
## Source: "D:\A SEMESTER 5\KOMSTAT\uts\mapIndo", layer: "archive"
## with 34 features
## It has 23 fields
indo.df = fortify(indo)
## Regions defined for each Polygons
str(indo.df)
## 'data.frame': 1713253 obs. of 7 variables:
## $ long : num 96.9 96.9 96.9 96.9 96.9 ...
## $ lat : num 3.6 3.6 3.6 3.6 3.6 ...
## $ order: int 1 2 3 4 5 6 7 8 9 10 ...
## $ hole : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ piece: Factor w/ 1529 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ id : chr "0" "0" "0" "0" ...
## $ group: Factor w/ 7524 levels "0.1","0.2","0.3",..: 1 1 1 1 1 1 1 1 1 1 ...
library(readxl)
df = read_xlsx(path = "aki_20211027_103806.xlsx", sheet = 1)
df$id <- as.character(df$id)
str(df)
## tibble [35 x 12] (S3: tbl_df/tbl/data.frame)
## $ Provinsi: chr [1:35] "NASIONAL" "ACEH" "BALI" "BANTEN" ...
## $ id : chr [1:35] "0" "1" "2" "3" ...
## $ 2010 : num [1:35] 30.5 40.5 23.2 21.5 23.4 ...
## $ 2011 : num [1:35] 32.2 41 24.4 24.9 25.5 ...
## $ 2012 : num [1:35] 33.9 41.7 24.8 28.4 26.8 ...
## $ 2013 : num [1:35] 35.2 43.3 28.7 28.1 30.7 ...
## $ 2014 : num [1:35] 38.1 45.8 31.5 30.6 32.2 ...
## $ 2015 : num [1:35] 41.1 46.9 33 32.5 34.4 ...
## $ 2016 : num [1:35] 43.9 49.8 36.4 34.8 36.1 ...
## $ 2017 : num [1:35] 47.3 53.2 39.4 38.7 40.2 ...
## $ 2018 : num [1:35] 50.7 54.1 41.4 34 54.6 ...
## $ 2019 : num [1:35] 54.5 63.6 41.5 42.9 40.8 ...
plotData = full_join(indo.df, df, by=c("id"))
str(plotData)
## 'data.frame': 1713254 obs. of 18 variables:
## $ long : num 96.9 96.9 96.9 96.9 96.9 ...
## $ lat : num 3.6 3.6 3.6 3.6 3.6 ...
## $ order : int 1 2 3 4 5 6 7 8 9 10 ...
## $ hole : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ piece : Factor w/ 1529 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ id : chr "0" "0" "0" "0" ...
## $ group : Factor w/ 7524 levels "0.1","0.2","0.3",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Provinsi: chr "NASIONAL" "NASIONAL" "NASIONAL" "NASIONAL" ...
## $ 2010 : num 30.5 30.5 30.5 30.5 30.5 ...
## $ 2011 : num 32.2 32.2 32.2 32.2 32.2 ...
## $ 2012 : num 33.9 33.9 33.9 33.9 33.9 ...
## $ 2013 : num 35.2 35.2 35.2 35.2 35.2 ...
## $ 2014 : num 38.1 38.1 38.1 38.1 38.1 ...
## $ 2015 : num 41.1 41.1 41.1 41.1 41.1 ...
## $ 2016 : num 43.9 43.9 43.9 43.9 43.9 ...
## $ 2017 : num 47.3 47.3 47.3 47.3 47.3 ...
## $ 2018 : num 50.7 50.7 50.7 50.7 50.7 ...
## $ 2019 : num 54.5 54.5 54.5 54.5 54.5 54.5 54.5 54.5 54.5 54.5 ...
#plot peta 2010
cnames <- aggregate(cbind(long, lat) ~ Provinsi, data=plotData,
FUN=function(x)mean(range(x)))
ggplot() + geom_polygon(data = plotData, aes(x = long, y = lat, group = group, fill=`2010`)) +
geom_text(data=cnames, aes(long, lat, label=Provinsi), size = 3)+
coord_map() + labs(title= "Fish Consumption Figures in 2010") +
scale_fill_continuous(high = "#FF5733", low = "#FDEDEC", name= "Consumption Figures") +
theme_minimal() + blank() + theme(plot.title = element_text(hjust = 0.5)) +
annotation_north_arrow(location = "tr", which_north = "true",
pad_x = unit(0.25, "in"),
style = north_arrow_fancy_orienteering)
## Warning in f(...): True north is not meaningful without coord_sf()
Angka Konsumsi Ikan terendah adalah provinsi Lampung, Jawa Tengah, dan Nusa Tenggara Barat, dilihat dari warna yang paling muda yaitu berada di sekitar angka konsumsi kurang dari 10.
Angka Konsumsi Ikan tertinggi adalah provinsi Maluku, Kepulauan Riau, dan Kalimantan Tengah, dilihat dari warna yang paling tua yaitu berada di sekitar angka konsumsi 40.
#plot peta 2019
cnames <- aggregate(cbind(long, lat) ~ Provinsi, data=plotData,
FUN=function(x)mean(range(x)))
ggplot() + geom_polygon(data = plotData, aes(x = long, y = lat, group = group, fill = `2019`)) +
geom_text(data=cnames, aes(long, lat, label=Provinsi), size = 3)+
coord_map() + labs(title= "Fish Consumption Figures in 2019") +
scale_fill_continuous(high = "#FF5733", low = "#FDEDEC", name= "Consumption Figures") +
theme_minimal() + blank() + theme(plot.title = element_text(hjust = 0.5)) +
annotation_north_arrow(location = "tr", which_north = "true",
pad_x = unit(0.25, "in"),
style = north_arrow_fancy_orienteering)
## Warning in f(...): True north is not meaningful without coord_sf()
Angka Konsumsi Ikan terendah adalah provinsi Bengkulu dan Lampung, dilihat dari warna yang paling muda yaitu berada di sekitar angka konsumsi kurang dari 40.
Angka Konsumsi Ikan tertinggi adalah provinsi Maluku dan Gorontalo, dilihat dari warna yang paling tua yaitu berada di sekitar angka konsumsi 70.
library(tidyr)
longer_data <- df %>% pivot_longer(`2010`:`2019`, names_to = "Year", values_to = "Fish_Consumption_Figures")
longer_data
## # A tibble: 350 x 4
## Provinsi id Year Fish_Consumption_Figures
## <chr> <chr> <chr> <dbl>
## 1 NASIONAL 0 2010 30.5
## 2 NASIONAL 0 2011 32.2
## 3 NASIONAL 0 2012 33.9
## 4 NASIONAL 0 2013 35.2
## 5 NASIONAL 0 2014 38.1
## 6 NASIONAL 0 2015 41.1
## 7 NASIONAL 0 2016 43.9
## 8 NASIONAL 0 2017 47.3
## 9 NASIONAL 0 2018 50.7
## 10 NASIONAL 0 2019 54.5
## # ... with 340 more rows
library(ggplot2)
d2 = transform(longer_data, Year = as.numeric(Year))
df3 = data.frame(d2)
ggplot(df3, aes(Year, Fish_Consumption_Figures, col = Provinsi)) + geom_line(aes(colour=Provinsi))
Dari plot di atas terlihat bahwa, hampir seluruh provinsi di Indonesia memiliki angka konsumsi yang meningkat setiap tahunnya. Dari Plot di atas terlihat bahwa, Kalimantan Utara pada tahun 2010- 2014 memiliki Angka Konsumsi Ikan sebesar 0 lalu naik secara signifikan pada tahun 2015 pada Angka konsumsi sekitar 40.
Is the average fish consumption rate in 2019 higher than 2010?
Write down the null hypothesis and the alternative hypothesis. What tests are needed to test this hypothesis? Create your own function to test this hypothesis. Use the 5% significance level.
twosam <- function(y1, y2, d0=0, alpha=0.05) {
n1 <- length(y1); n2 <- length(y2)
yb1 <- mean(y1); yb2 <- mean(y2)
stdev1 <- sd(y1); stdev2 <- sd(y2)
z.star = -1*qnorm(alpha/2)
z <- ((yb1-yb2)-d0)/sqrt((stdev1^2/n1)+(stdev2^2/n2))
margin.error = z.star*sqrt((stdev1^2/n1)+(stdev2^2/n2))
pvalue = 2*pnorm(z, lower.tail = F)
lower.CI <- (yb1-yb2)-margin.error
upper.CI <- (yb1-yb2)+margin.error
list(z.stat=z, p.value = pvalue, "margin of error" = margin.error, "95% CI" = c(lower.CI,upper.CI)
,mean.x = yb1,mean.y=yb2)
}
twosam(df$`2019`, df$`2010`, d0=0, alpha=0.05)
## $z.stat
## [1] 8.538376
##
## $p.value
## [1] 1.36121e-17
##
## $`margin of error`
## [1] 5.434507
##
## $`95% CI`
## [1] 18.24035 29.10936
##
## $mean.x
## [1] 54.49829
##
## $mean.y
## [1] 30.82343
library(BSDA)
## Loading required package: lattice
##
## Attaching package: 'BSDA'
## The following object is masked from 'package:datasets':
##
## Orange
z.test(df$`2019`, df$`2010`, alternative = "two.sided", mu=0, sigma.x = sd(df$`2019`), sigma.y = sd(df$`2010`), conf.level = 0.95)
##
## Two-sample z-Test
##
## data: df$`2019` and df$`2010`
## z = 8.5384, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 18.24035 29.10936
## sample estimates:
## mean of x mean of y
## 54.49829 30.82343
So, it can be concluded that the average fish consumption rate in 2019 was lower than in 2010.