In this Learning By Building (LBB) project, I want to show some interesting facts about how well Indonesians do at simple arithmetic. I hope you enjoy it!
The Indonesian Family Life Survey (IFLS) is an ongoing longitudinal survey in Indonesia. The sample is representative of about 83% of the Indonesian population and contains over 30,000 individuals living in 13 of the 27 provinces in the country. The map below identifies the 13 IFLS provinces.
IFLS Data.
The first wave of the IFLS (IFLS1) was conducted in 1993/94 by RAND in collaboration with Lembaga Demografi, University of Indonesia. IFLS2 and IFLS2+ were conducted in 1997 and 1998, respectively, by RAND in collaboration with UCLA and Lembaga Demografi, University of Indonesia. IFLS2+ covered a 25% sub-sample of the IFLS households. IFLS3, which was fielded in 2000 and covered the full sample, was conducted by RAND in collaboration with the Population Research Center, University of Gadjah Mada. The fourth wave (IFLS4), fielded in 2007/2008 and covering the full sample, was conducted by RAND, the Center for Population and Policy Studies (CPPS) of the University of Gadjah Mada, and Survey METRE. The fifth wave (IFLS5) was fielded in 2014-15.
IFLS data has its own challenges. It is split into many separate files, one per questionnaire book and section, so preparing it will be a long journey.
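# Data Input
library(foreign)  # read.dta()
library(dplyr)    # %>%, glimpse(), inner_join(), distinct(), group_by(), summarise()
library(ggplot2)  # ggplot() charts used later
bkar0 <- read.dta("bk_ar0.dta")
bkar1 <- read.dta("bk_ar1.dta")
bksc1 <- read.dta("bk_sc1.dta")
b3bco1 <- read.dta("b3b_co1.dta")
b3bcob <- read.dta("b3b_cob.dta")
Explanation of the data objects: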
- bkar0 = Book K, section ar0
- bkar1 = Book K, section ar1
- bksc1 = Book K, section sc1
- b3bco1 = Book 3B, section co1
- b3bcob = Book 3B, section cob
Each file contains many more variables than we need, so we first clean the data and keep only the relevant columns.
# Cleansing Data for book K
databkar1 <- bkar1[, c("hhid14_9", "pid14", "hhid14", "pidlink", "ar01a","ar02b", "ar07", "ar09", "ar15", "ar15b", "ar15d", "ar16")]
databkar1$ar16 <- as.factor(databkar1$ar16)
glimpse(databkar1)
## Observations: 89,382
## Variables: 12
## $ hhid14_9 <chr> "001060000", "001060000", "001060000", "001060000", "00106...
## $ pid14 <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1, 2, 3, 4, 1, 2, 3, 4,...
## $ hhid14 <chr> "0010600", "0010600", "0010600", "0010600", "0010600", "00...
## $ pidlink <chr> "001060001", "001060002", "001060003", "001060004", "00106...
## $ ar01a <dbl> 1, 0, 0, 3, 0, 0, 5, 5, 5, 5, 5, 2, 5, 5, 5, 3, 0, 3, 3, 3...
## $ ar02b <fct> 1:Head of the household, NA, NA, 3:Children (biological, N...
## $ ar07 <dbl> 1, NA, NA, 3, NA, NA, 3, 3, 3, 1, 1, 3, 1, 3, 3, 1, NA, 1,...
## $ ar09 <dbl> 59, NA, NA, 29, NA, NA, 39, 16, 12, 3, 1, 29, 30, 6, 3, 59...
## $ ar15 <fct> 2:Protestant, NA, NA, 2:Protestant, NA, NA, 2:Protestant, ...
## $ ar15b <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2.0e+06, N...
## $ ar15d <fct> 13:Nias, NA, NA, 13:Nias, NA, NA, 13:Nias, 13:Nias, 13:Nia...
## $ ar16 <fct> 2:Grade school, NA, NA, 2:Grade school, NA, NA, 1:Unschool...
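Many rows contain missing values (NA). I therefore drop every row that has at least one NA and then rename the columns. Note that na.omit() removes a row if any of the selected columns is missing, and this includes the income column ar15b.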
databkar1 <- databkar1 %>% na.omit()
names(databkar1)[names(databkar1) == "ar01a"] <- "Masih_tinggal_di_RT"
names(databkar1)[names(databkar1) == "ar02b"] <- "Hubungan_dengan_kepala_RT"
names(databkar1)[names(databkar1) == "ar07"] <- "Jenis_kelamin"
names(databkar1)[names(databkar1) == "ar09"] <- "Umur"
names(databkar1)[names(databkar1) == "ar15d"] <- "Suku_bangsa"
names(databkar1)[names(databkar1) == "ar16"] <- "Pendidikan_tertinggi"
names(databkar1)[names(databkar1) == "ar15"] <- "Agama"
names(databkar1)[names(databkar1) == "ar15b"] <- "Pendapatan_12_bulan_terakhir"
databkar1$Masih_tinggal_di_RT <- as.factor(databkar1$Masih_tinggal_di_RT)
databkar1$Jenis_kelamin <- as.factor(databkar1$Jenis_kelamin)
databkar1$Umur <- as.factor(databkar1$Umur)
databkar1 <- databkar1[order(databkar1$pidlink, decreasing = FALSE), ]
glimpse(databkar1)
## Observations: 28,900
## Variables: 12
## $ hhid14_9 <chr> "001060004", "001080000", "001080000",...
## $ pid14 <dbl> 2, 1, 3, 1, 4, 5, 6, 8, 9, 1, 3, 1, 6,...
## $ hhid14 <chr> "0010651", "0010800", "0010800", "0010...
## $ pidlink <chr> "001065102", "001080001", "001080003",...
## $ Masih_tinggal_di_RT <fct> 5, 3, 3, 2, 3, 3, 3, 1, 3, 1, 3, 1, 1,...
## $ Hubungan_dengan_kepala_RT <fct> 1:Head of the household, 6:Parents, 8:...
## $ Jenis_kelamin <fct> 1, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 1,...
## $ Umur <fct> 30, 59, 36, 36, 34, 32, 30, 26, 24, 55...
## $ Agama <fct> 2:Protestant, 2:Protestant, 2:Protesta...
## $ Pendapatan_12_bulan_terakhir <dbl> 2000000, 12000000, 2400000, 6000000, 1...
## $ Suku_bangsa <fct> 13:Nias, 13:Nias, 13:Nias, 13:Nias, 13...
## $ Pendidikan_tertinggi <fct> 3:General jr. high, 2:Grade school, 1:...
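The eight rename calls above can also be written as a single lookup step. Below is a minimal base-R sketch that uses a small made-up data frame (`toy` and `new_names` are hypothetical and the values are invented), not the IFLS data. It also shows that `na.omit()` drops a row when *any* column is missing. That is one reason the row count falls from 89,382 to 28,900: in the glimpse() output above, for example, ar15b is NA for most rows.

```r
# Toy stand-in for databkar1; all values are invented for illustration
toy <- data.frame(
  pidlink = c("001", "002", "003"),
  ar07    = c(1, 3, NA),
  ar09    = c(59, 29, 16),
  ar15b   = c(NA, 2000000, 6000000)
)

# na.omit() drops every row that has an NA in any column
toy <- na.omit(toy)

# Named lookup vector: old column name -> new column name
new_names <- c(ar07  = "Jenis_kelamin",
               ar09  = "Umur",
               ar15b = "Pendapatan_12_bulan_terakhir")

# Rename only the columns that appear in the lookup vector
hit <- names(toy) %in% names(new_names)
names(toy)[hit] <- new_names[names(toy)[hit]]

print(toy)
```

Only respondent "002" survives, because the other two rows each have one missing value.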
Note: hhid14_9, pid14, hhid14, and pidlink are household and individual identifiers. They are used later to merge the data sets.
databksc1 <- bksc1[, c("hhid14_9", "hhid14", "sc01_14_14","sc05")]
databksc1 <- databksc1 %>% na.omit()
names(databksc1)[names(databksc1) == "sc01_14_14"] <- "Provinsi"
names(databksc1)[names(databksc1) == "sc05"] <- "Rural_Urban"
databksc1$Provinsi <- as.factor(databksc1$Provinsi)
glimpse(databksc1)
## Observations: 15,921
## Variables: 4
## $ hhid14_9 <chr> "001060000", "001060004", "001080000", "001080003", "00...
## $ hhid14 <chr> "0010600", "0010651", "0010800", "0010851", "0012200", ...
## $ Provinsi <fct> 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,...
## $ Rural_Urban <fct> 2:Rural, 2:Rural, 2:Rural, 2:Rural, 2:Rural, 1:Urban, 2...
# Merging Data for book K
databk <- inner_join(databkar1, databksc1, by = c("hhid14", "hhid14_9"))
# Keep only the first row for each household (hhid14_9)
databk <- distinct(databk, hhid14_9, .keep_all = TRUE)
glimpse(databk)
## Observations: 13,761
## Variables: 14
## $ hhid14_9 <chr> "001060004", "001080000", "001080003",...
## $ pid14 <dbl> 2, 1, 1, 1, 1, 1, 4, 1, 1, 3, 1, 1, 1,...
## $ hhid14 <chr> "0010651", "0010800", "0010851", "0012...
## $ pidlink <chr> "001065102", "001080001", "001080003",...
## $ Masih_tinggal_di_RT <fct> 5, 3, 2, 1, 1, 1, 1, 5, 5, 1, 1, 2, 2,...
## $ Hubungan_dengan_kepala_RT <fct> 1:Head of the household, 6:Parents, 1:...
## $ Jenis_kelamin <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ Umur <fct> 30, 59, 36, 55, 34, 24, 35, 50, 30, 34...
## $ Agama <fct> 2:Protestant, 2:Protestant, 2:Protesta...
## $ Pendapatan_12_bulan_terakhir <dbl> 2000000, 12000000, 6000000, 36000000, ...
## $ Suku_bangsa <fct> 13:Nias, 13:Nias, 13:Nias, 13:Nias, 13...
## $ Pendidikan_tertinggi <fct> 3:General jr. high, 2:Grade school, 2:...
## $ Provinsi <fct> 12, 12, 12, 12, 12, 12, 12, 12, 12, 12...
## $ Rural_Urban <fct> 2:Rural, 2:Rural, 2:Rural, 2:Rural, 2:...
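After the merge, databk has 13,761 rows, down from 28,900 in databkar1 (databksc1 has 15,921). The drop has two sources. First, the inner join keeps only rows whose household appears in both tables. Second, distinct() keeps only the first row for each hhid14_9, that is, one person per household.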
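To see how the join and distinct() interact, here is a minimal base-R sketch on made-up rows. `persons` and `households` are hypothetical toy tables. `merge()` stands in for `inner_join()`, and `!duplicated()` mimics `distinct(..., .keep_all = TRUE)`, which keeps the first row for each key.

```r
# Hypothetical person-level table (plays the role of databkar1)
persons <- data.frame(
  hhid14_9 = c("A", "A", "B", "C"),
  pidlink  = c("A01", "A02", "B01", "C01")
)
# Hypothetical household-level table (plays the role of databksc1); household C is absent
households <- data.frame(
  hhid14_9    = c("A", "B", "D"),
  Rural_Urban = c("Rural", "Urban", "Urban")
)

# Inner join: keep only households present in both tables (C and D are dropped)
joined <- merge(persons, households, by = "hhid14_9")

# Sort, then keep the first row per household, as distinct(..., .keep_all = TRUE) does
joined <- joined[order(joined$pidlink), ]
one_per_hh <- joined[!duplicated(joined$hhid14_9), ]

print(one_per_hh)
```

The join leaves three person rows. The de-duplication step then leaves one person (A01, B01) per household.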
# Cleansing Data for book 3B
datab3bco1 <- b3bco1[, c("hhid14_9", "pid14", "hhid14", "pidlink", "co04a", "co04b", "co04c", "co04d", "co04e")]
datab3bco1 <- datab3bco1 %>% na.omit()
glimpse(datab3bco1)
## Observations: 28,983
## Variables: 9
## $ hhid14_9 <chr> "001060000", "001060004", "001060000", "001060000", "00106...
## $ pid14 <dbl> 1, 1, 7, 8, 2, 1, 8, 1, 2, 1, 6, 1, 1, 2, 3, 3, 7, 2, 12, ...
## $ hhid14 <chr> "0010600", "0010651", "0010600", "0010600", "0010651", "00...
## $ pidlink <chr> "001060001", "001060004", "001060007", "001060008", "00106...
## $ co04a <dbl> 93, 93, 93, 93, 93, 20, 93, 93, 93, 93, 93, 94, 93, 93, 93...
## $ co04b <dbl> 86, 86, 86, 86, 86, 15, 86, 66, 86, 86, 86, 82, 86, 86, 86...
## $ co04c <dbl> 79, 79, 79, 79, 79, 55, 81, 59, 77, 79, 79, 70, 79, 79, 78...
## $ co04d <dbl> 72, 72, 72, 72, 72, 40, 74, 52, 70, 72, 72, 65, 72, 72, 71...
## $ co04e <dbl> 65, 65, 66, 65, 65, 30, 67, 45, 63, 65, 65, 60, 65, 65, 66...
datab3bcob <- b3bcob[, c("hhid14_9", "pid14", "hhid14", "pidlink", "cob01_a", "cob02_a", "cob03_a")]
datab3bcob <- datab3bcob %>% na.omit()
glimpse(datab3bcob)
## Observations: 31,409
## Variables: 7
## $ hhid14_9 <chr> "001060000", "001060000", "001060000", "001060004", "00106...
## $ pid14 <dbl> 1, 7, 8, 1, 2, 8, 1, 2, 1, 2, 6, 1, 2, 3, 1, 1, 2, 3, 4, 7...
## $ hhid14 <chr> "0010600", "0010600", "0010600", "0010651", "0010651", "00...
## $ pidlink <chr> "001060001", "001060007", "001060008", "001060004", "00106...
## $ cob01_a <dbl> 9, 9, 9, 9, 9, 9, 6, 98, 9, 9, 9, 9, 9, 9, 9, 9, 98, 9, 98...
## $ cob02_a <dbl> 10, 9, 10, 10, 10, 10, 6, 98, 11, 2, 10, 10, 11, 11, 13, 7...
## $ cob03_a <dbl> 4, 9, 5, 5, 4, 7, 6, 98, 7, 2, 4, 5, 5, 5, 2, 5, 98, 7, 98...
# Merging Data for book 3B
datab3b <- inner_join(datab3bco1, datab3bcob, by = c("pidlink", "hhid14_9", "pid14", "hhid14"))
datab3b <- distinct(datab3b, pidlink, .keep_all = TRUE)
datab3b <- datab3b[order(datab3b$pidlink, decreasing = FALSE), ]
glimpse(datab3b)
## Observations: 28,971
## Variables: 12
## $ hhid14_9 <chr> "001060000", "001060004", "001060000", "001060000", "00106...
## $ pid14 <dbl> 1, 1, 7, 8, 2, 1, 8, 1, 2, 1, 6, 1, 1, 2, 3, 3, 7, 2, 12, ...
## $ hhid14 <chr> "0010600", "0010651", "0010600", "0010600", "0010651", "00...
## $ pidlink <chr> "001060001", "001060004", "001060007", "001060008", "00106...
## $ co04a <dbl> 93, 93, 93, 93, 93, 20, 93, 93, 93, 93, 93, 94, 93, 93, 93...
## $ co04b <dbl> 86, 86, 86, 86, 86, 15, 86, 66, 86, 86, 86, 82, 86, 86, 86...
## $ co04c <dbl> 79, 79, 79, 79, 79, 55, 81, 59, 77, 79, 79, 70, 79, 79, 78...
## $ co04d <dbl> 72, 72, 72, 72, 72, 40, 74, 52, 70, 72, 72, 65, 72, 72, 71...
## $ co04e <dbl> 65, 65, 66, 65, 65, 30, 67, 45, 63, 65, 65, 60, 65, 65, 66...
## $ cob01_a <dbl> 9, 9, 9, 9, 9, 6, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9...
## $ cob02_a <dbl> 10, 10, 9, 10, 10, 6, 10, 11, 2, 10, 10, 13, 7, 11, 11, 9,...
## $ cob03_a <dbl> 4, 5, 9, 5, 4, 6, 7, 7, 2, 5, 4, 2, 5, 5, 5, 7, 5, 7, 10, ...
After the merge, datab3b has 28,971 rows, compared with 28,983 in datab3bco1 and 31,409 in datab3bcob. Only respondents present in both tables are kept, and distinct() keeps one row per pidlink.
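Before joining, it can help to check whether the join keys are unique in each table. If they are not, an inner join returns one row per matching pair and can multiply rows. Here is a minimal base-R check on two made-up tables (`co` and `cob` are hypothetical stand-ins for datab3bco1 and datab3bcob).

```r
# Hypothetical stand-ins for datab3bco1 and datab3bcob
co  <- data.frame(pidlink = c("P1", "P2", "P3"), co04a = c(93, 93, 20))
cob <- data.frame(pidlink = c("P1", "P1", "P2"), cob01_a = c(9, 9, 6))

# anyDuplicated() returns 0 when every key is unique,
# otherwise the index of the first duplicated key
dup_co  <- anyDuplicated(co$pidlink)
dup_cob <- anyDuplicated(cob$pidlink)

# Which keys occur more than once?
key_counts <- table(cob$pidlink)
print(key_counts[key_counts > 1])

# One row per matching pair, so P1 appears twice and P3 (no match) disappears
joined <- merge(co, cob, by = "pidlink")
print(nrow(joined))
```

The `distinct(..., pidlink, .keep_all = TRUE)` step in the post removes this kind of duplication after the join.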
# Merging Data for Book K and Book 3B
dataifls <- inner_join(databk, datab3b, by = c("pidlink", "hhid14_9", "pid14", "hhid14"))
dataifls <- distinct(dataifls, pidlink, .keep_all = TRUE)
dataifls <- dataifls[order(dataifls$pidlink, decreasing = FALSE), ]
names(dataifls)[names(dataifls) == "co04a"] <- "Pertanyaan1"
names(dataifls)[names(dataifls) == "co04b"] <- "Pertanyaan2"
names(dataifls)[names(dataifls) == "co04c"] <- "Pertanyaan3"
names(dataifls)[names(dataifls) == "co04d"] <- "Pertanyaan4"
names(dataifls)[names(dataifls) == "co04e"] <- "Pertanyaan5"
names(dataifls)[names(dataifls) == "cob01_a"] <- "Pertanyaan6"
names(dataifls)[names(dataifls) == "cob02_a"] <- "Pertanyaan7"
names(dataifls)[names(dataifls) == "cob03_a"] <- "Pertanyaan8"
dataifls$Pertanyaan1 <- as.factor((dataifls$Pertanyaan1))
dataifls$Pertanyaan2 <- as.factor((dataifls$Pertanyaan2))
dataifls$Pertanyaan3 <- as.factor((dataifls$Pertanyaan3))
dataifls$Pertanyaan4 <- as.factor((dataifls$Pertanyaan4))
dataifls$Pertanyaan5 <- as.factor((dataifls$Pertanyaan5))
dataifls$Pertanyaan6 <- as.factor((dataifls$Pertanyaan6))
dataifls$Pertanyaan7 <- as.factor((dataifls$Pertanyaan7))
dataifls$Pertanyaan8 <- as.factor((dataifls$Pertanyaan8))
dataifls$Pertanyaan1 <- factor(ifelse(dataifls$Pertanyaan1==93, "TRUE", "FALSE"), c("TRUE", "FALSE"))
dataifls$Pertanyaan2 <- factor(ifelse(dataifls$Pertanyaan2==86, "TRUE", "FALSE"), c("TRUE", "FALSE"))
dataifls$Pertanyaan3 <- factor(ifelse(dataifls$Pertanyaan3==79, "TRUE", "FALSE"), c("TRUE", "FALSE"))
dataifls$Pertanyaan4 <- factor(ifelse(dataifls$Pertanyaan4==72, "TRUE", "FALSE"), c("TRUE", "FALSE"))
dataifls$Pertanyaan5 <- factor(ifelse(dataifls$Pertanyaan5==65, "TRUE", "FALSE"), c("TRUE", "FALSE"))
dataifls$Pertanyaan6 <- factor(ifelse(dataifls$Pertanyaan6==9, "TRUE", "FALSE"), c("TRUE", "FALSE"))
dataifls$Pertanyaan7 <- factor(ifelse(dataifls$Pertanyaan7==10, "TRUE", "FALSE"), c("TRUE", "FALSE"))
dataifls$Pertanyaan8 <- factor(ifelse(dataifls$Pertanyaan8==4, "TRUE", "FALSE"), c("TRUE", "FALSE"))
dataifls$Jenis_kelamin <- factor(ifelse(dataifls$Jenis_kelamin==1, "Pria", "Wanita"), c("Pria", "Wanita"))
summary(dataifls)
## hhid14_9 pid14 hhid14 pidlink
## Length:9708 Min. : 1.000 Length:9708 Length:9708
## Class :character 1st Qu.: 1.000 Class :character Class :character
## Mode :character Median : 1.000 Mode :character Mode :character
## Mean : 1.741
## 3rd Qu.: 2.000
## Max. :19.000
##
## Masih_tinggal_di_RT Hubungan_dengan_kepala_RT Jenis_kelamin
## 1 :6664 1:Head of the household:7110 Pria :6606
## 2 :2476 2:Husband/wife :1802 Wanita:3102
## 3 : 1 3:Children (biological : 356
## 5 : 558 5:Sons/daughters-in-law: 172
## 11: 9 6:Parents : 75
## 8:Siblings : 38
## (Other) : 155
## Umur Agama Pendapatan_12_bulan_terakhir
## 35 : 352 1:Islam :8688 Min. : 0
## 33 : 341 4:Hinduism : 473 1st Qu.: 6000000
## 31 : 340 2:Protestant: 388 Median : 13000000
## 34 : 340 3:Catholic : 140 Mean : 22242390
## 32 : 330 5:Buddhism : 14 3rd Qu.: 26400000
## 30 : 314 7:Konghucu : 4 Max. :999999995
## (Other):7691 (Other) : 1
## Suku_bangsa Pendidikan_tertinggi Provinsi
## 1:Javanese :4378 2:Grade school :2969 32 :1405
## 2:Sundanese:1223 5:General sr. high (SLA) :1504 33 :1218
## 8:Sasak : 490 3:General jr. high :1457 35 :1060
## 9:Minang : 489 6:Vocational sr. high (SMK):1174 52 : 791
## 3:Balinese : 452 61:University S1 :1060 12 : 768
## 4:Batak : 442 60:Diploma (D1, D2, D3) : 382 31 : 645
## (Other) :2234 (Other) :1162 (Other):3821
## Rural_Urban Pertanyaan1 Pertanyaan2 Pertanyaan3 Pertanyaan4
## 1:Urban:5923 TRUE :9322 TRUE :6120 TRUE :5069 TRUE :4392
## 2:Rural:3785 FALSE: 386 FALSE:3588 FALSE:4639 FALSE:5316
##
##
##
##
##
## Pertanyaan5 Pertanyaan6 Pertanyaan7 Pertanyaan8
## TRUE :3983 TRUE :9120 TRUE :4915 TRUE :3545
## FALSE:5725 FALSE: 588 FALSE:4793 FALSE:6163
##
##
##
##
##
After cleaning and merging, dataifls contains 9,708 respondents.
Notes on variables:
Answer codes for the “Masih_tinggal_di_RT” (still living in the household) variable
1. Yes, still living in this household
2. Panel household member: not in this household at the last survey, but has now returned to it
3. No longer living in this household
5. New household member
11. Household member who returned within the same survey round
Answer codes for the “Provinsi” variable
Questions behind the “Pertanyaan” (question) variables:
Pertanyaan1: 100 - 7, correct answer = 93
Pertanyaan2: 93 - 7, correct answer = 86
Pertanyaan3: 86 - 7, correct answer = 79
Pertanyaan4: 79 - 7, correct answer = 72
Pertanyaan5: 72 - 7, correct answer = 65
Pertanyaan6, Pertanyaan7, Pertanyaan8 (number series): correct answers = 9, 10, 4
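The sixteen recoding lines for the Pertanyaan variables can be driven by a single answer key. This is a minimal base-R sketch built from the correct answers listed above. `answer_key` and the `toy` responses are hypothetical illustrations, not IFLS records.

```r
# Correct answers for each item, taken from the list above
answer_key <- c(Pertanyaan1 = 93, Pertanyaan2 = 86, Pertanyaan3 = 79,
                Pertanyaan4 = 72, Pertanyaan5 = 65,
                Pertanyaan6 = 9, Pertanyaan7 = 10, Pertanyaan8 = 4)

# Made-up raw responses for three respondents
toy <- data.frame(Pertanyaan1 = c(93, 93, 20), Pertanyaan2 = c(86, 66, 15),
                  Pertanyaan3 = c(79, 59, 55), Pertanyaan4 = c(72, 52, 40),
                  Pertanyaan5 = c(65, 45, 30), Pertanyaan6 = c(9, 9, 6),
                  Pertanyaan7 = c(10, 11, 6), Pertanyaan8 = c(4, 7, 6))

# Recode each item to a TRUE/FALSE factor with the same level order as in the post
for (q in names(answer_key)) {
  toy[[q]] <- factor(ifelse(toy[[q]] == answer_key[[q]], "TRUE", "FALSE"),
                     levels = c("TRUE", "FALSE"))
}

print(toy)
```

Keeping the answers in one named vector means a typo in a correct answer only has to be fixed in one place.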
After cleaning the data, we analyze the cognitive ability of Indonesian respondents in two categories: simple arithmetic (Pertanyaan1 to Pertanyaan5) and number series (Pertanyaan6 to Pertanyaan8).
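The scoring used in this post compares each answer with a fixed value (93, 86, 79, ...). As a result, a respondent who slips once but keeps subtracting 7 correctly from the wrong number is marked wrong on every later question. For example, the answers 93, 66, 59, 52, 45 in the glimpse() output of datab3bco1 above score TRUE, FALSE, FALSE, FALSE, FALSE.

A common alternative for serial-7 tasks scores each answer against the previous one. The sketch below is a minimal base-R illustration of that alternative, not the method used in this post. `score_serial7` and `score_fixed` are hypothetical helpers.

```r
# Score serial-7 answers relative to the previous answer (starting from 100).
# answers: numeric vector with one respondent's five responses.
score_serial7 <- function(answers, start = 100, step = 7) {
  previous <- c(start, head(answers, -1))  # the number each answer subtracts from
  correct <- answers == previous - step
  correct[is.na(correct)] <- FALSE         # treat missing answers as not correct
  correct
}

# Fixed-value scoring, as used in this post
score_fixed <- function(answers) {
  answers == c(93, 86, 79, 72, 65)
}

slip <- c(93, 66, 59, 52, 45)  # one slip at the second step
fixed_score    <- score_fixed(slip)
relative_score <- score_serial7(slip)

print(fixed_score)     # TRUE FALSE FALSE FALSE FALSE
print(relative_score)  # TRUE FALSE TRUE TRUE TRUE
```

Under relative scoring this respondent gets four of five correct instead of one, so the two schemes can give quite different pictures of the same data.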
dataifls1 <- dataifls %>%
group_by(Pertanyaan1, Pertanyaan2, Pertanyaan3, Pertanyaan4, Pertanyaan5) %>%
summarise(total = n()) %>%
mutate(percentage = total/9708*100)
print.data.frame(dataifls1)
## Pertanyaan1 Pertanyaan2 Pertanyaan3 Pertanyaan4 Pertanyaan5 total
## 1 TRUE TRUE TRUE TRUE TRUE 3879
## 2 TRUE TRUE TRUE TRUE FALSE 425
## 3 TRUE TRUE TRUE FALSE TRUE 16
## 4 TRUE TRUE TRUE FALSE FALSE 691
## 5 TRUE TRUE FALSE TRUE TRUE 11
## 6 TRUE TRUE FALSE TRUE FALSE 1
## 7 TRUE TRUE FALSE FALSE TRUE 9
## 8 TRUE TRUE FALSE FALSE FALSE 1071
## 9 TRUE FALSE TRUE TRUE TRUE 19
## 10 TRUE FALSE TRUE TRUE FALSE 11
## 11 TRUE FALSE TRUE FALSE TRUE 1
## 12 TRUE FALSE TRUE FALSE FALSE 15
## 13 TRUE FALSE FALSE TRUE TRUE 15
## 14 TRUE FALSE FALSE TRUE FALSE 18
## 15 TRUE FALSE FALSE FALSE TRUE 23
## 16 TRUE FALSE FALSE FALSE FALSE 3117
## 17 FALSE TRUE TRUE TRUE TRUE 7
## 18 FALSE TRUE TRUE TRUE FALSE 1
## 19 FALSE TRUE TRUE FALSE FALSE 3
## 20 FALSE TRUE FALSE FALSE FALSE 6
## 21 FALSE FALSE TRUE FALSE FALSE 1
## 22 FALSE FALSE FALSE TRUE FALSE 5
## 23 FALSE FALSE FALSE FALSE TRUE 3
## 24 FALSE FALSE FALSE FALSE FALSE 360
## percentage
## 1 39.95673671
## 2 4.37783272
## 3 0.16481253
## 4 7.11784096
## 5 0.11330861
## 6 0.01030078
## 7 0.09270705
## 8 11.03213844
## 9 0.19571487
## 10 0.11330861
## 11 0.01030078
## 12 0.15451174
## 13 0.15451174
## 14 0.18541409
## 15 0.23691801
## 16 32.10754017
## 17 0.07210548
## 18 0.01030078
## 19 0.03090235
## 20 0.06180470
## 21 0.01030078
## 22 0.05150391
## 23 0.03090235
## 24 3.70828183
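The percentages above divide by a hard-coded 9708. This minimal base-R sketch counts each TRUE/FALSE pattern and takes the denominator from the data instead. It uses a made-up `toy` data frame with two items for brevity.

```r
# Made-up scored answers for five respondents (two items for brevity)
toy <- data.frame(
  Pertanyaan1 = factor(c("TRUE", "TRUE", "TRUE", "FALSE", "TRUE"), levels = c("TRUE", "FALSE")),
  Pertanyaan2 = factor(c("TRUE", "FALSE", "TRUE", "FALSE", "TRUE"), levels = c("TRUE", "FALSE"))
)

# Count every combination of answers, then keep only combinations that occur
patterns <- as.data.frame(table(toy), responseName = "total")
patterns <- patterns[patterns$total > 0, ]

# Percentage of all respondents, with the denominator taken from the data
patterns$percentage <- patterns$total / nrow(toy) * 100

print(patterns)
```

If the cleaning steps change and the sample size moves, the percentages stay correct without editing a constant.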
Unique facts:
1. Of the 9,708 respondents, only 3,879 (39.96%) answered all five questions correctly. The remaining 5,829 (60.04%) got at least one question wrong.
2. 360 respondents (3.71%) answered all five questions wrongly.
3. Among respondents who got exactly one question wrong, which question was it?
dataifls2 <- dataifls %>%
group_by(Pertanyaan1, Pertanyaan2, Pertanyaan3, Pertanyaan4, Pertanyaan5) %>%
filter((Pertanyaan1 == "TRUE" & Pertanyaan2 == "TRUE" & Pertanyaan3 == "TRUE" & Pertanyaan4 == "TRUE"
& Pertanyaan5 == "FALSE") |(Pertanyaan1 == "TRUE" & Pertanyaan2 == "TRUE"
& Pertanyaan3 == "TRUE" & Pertanyaan4 == "FALSE"
& Pertanyaan5 == "TRUE") | (Pertanyaan1 == "TRUE"
& Pertanyaan2 == "TRUE"
& Pertanyaan3 == "FALSE"
& Pertanyaan4 == "TRUE"
& Pertanyaan5 == "TRUE") |
(Pertanyaan1 == "TRUE" & Pertanyaan2 == "FALSE" & Pertanyaan3 == "TRUE"
& Pertanyaan4 == "TRUE" & Pertanyaan5 == "TRUE") | (Pertanyaan1 == "FALSE"
& Pertanyaan2 == "TRUE"
& Pertanyaan3 == "TRUE"
& Pertanyaan4 == "TRUE"
& Pertanyaan5 == "TRUE")) %>%
summarise(total = n()) %>%
mutate(percentage = total/9708*100)
print.data.frame(dataifls2)
## Pertanyaan1 Pertanyaan2 Pertanyaan3 Pertanyaan4 Pertanyaan5 total percentage
## 1 TRUE TRUE TRUE TRUE FALSE 425 4.37783272
## 2 TRUE TRUE TRUE FALSE TRUE 16 0.16481253
## 3 TRUE TRUE FALSE TRUE TRUE 11 0.11330861
## 4 TRUE FALSE TRUE TRUE TRUE 19 0.19571487
## 5 FALSE TRUE TRUE TRUE TRUE 7 0.07210548
Among respondents who missed exactly one question, Pertanyaan5 (72 - 7) was by far the most common miss: 425 people (4.38% of all respondents). Each of the other questions was the only miss for fewer than 20 people. Maybe they were tired of answering by then :)
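The long filter above lists every "exactly one wrong" pattern by hand. An equivalent approach counts the correct answers per respondent and then checks which item was missed. This is a minimal base-R sketch on made-up data (`toy` is hypothetical).

```r
items <- paste0("Pertanyaan", 1:5)

# Made-up scored answers for four respondents (TRUE = correct)
toy <- data.frame(
  Pertanyaan1 = c(TRUE, TRUE,  TRUE,  TRUE),
  Pertanyaan2 = c(TRUE, TRUE,  FALSE, TRUE),
  Pertanyaan3 = c(TRUE, TRUE,  FALSE, TRUE),
  Pertanyaan4 = c(TRUE, TRUE,  FALSE, TRUE),
  Pertanyaan5 = c(TRUE, FALSE, FALSE, TRUE)
)

# Comparing with "TRUE" works for logical columns and for TRUE/FALSE factors
correct <- as.matrix(toy[items]) == "TRUE"

# Number of correct answers per respondent
n_correct <- rowSums(correct)

# Respondents who missed exactly one item
one_wrong <- which(n_correct == length(items) - 1)

# How often each item was the single one missed
missed_counts <- colSums(!correct[one_wrong, , drop = FALSE])
print(missed_counts)
```

The same code works for the three number-series items by changing `items`.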
dataifls3 <- dataifls %>%
group_by(Pertanyaan6, Pertanyaan7, Pertanyaan8) %>%
summarise(total = n()) %>%
mutate(percentage = total/9708*100)
print.data.frame(dataifls3)
## Pertanyaan6 Pertanyaan7 Pertanyaan8 total percentage
## 1 TRUE TRUE TRUE 2246 23.1355583
## 2 TRUE TRUE FALSE 2620 26.9880511
## 3 TRUE FALSE TRUE 1206 12.4227441
## 4 TRUE FALSE FALSE 3048 31.3967862
## 5 FALSE TRUE TRUE 13 0.1339102
## 6 FALSE TRUE FALSE 36 0.3708282
## 7 FALSE FALSE TRUE 80 0.8240626
## 8 FALSE FALSE FALSE 459 4.7280593
Unique facts:
1. Of the 9,708 respondents, only 2,246 (23.14%) answered all three number-series questions correctly. The remaining 7,462 (76.86%) got at least one wrong.
2. 459 respondents (4.73%) answered all three wrongly.
3. Among respondents who got exactly one number-series question wrong, which question was it?
dataifls4 <- dataifls %>%
group_by(Pertanyaan6, Pertanyaan7, Pertanyaan8) %>%
filter((Pertanyaan6 == "TRUE" & Pertanyaan7 == "TRUE" & Pertanyaan8 == "FALSE") |
(Pertanyaan6 == "TRUE" & Pertanyaan7 == "FALSE" & Pertanyaan8 == "TRUE") |
(Pertanyaan6 == "FALSE" & Pertanyaan7 == "TRUE" & Pertanyaan8 == "TRUE")) %>%
summarise(total = n()) %>%
mutate(percentage = total/9708*100)
print.data.frame(dataifls4)
## Pertanyaan6 Pertanyaan7 Pertanyaan8 total percentage
## 1 TRUE TRUE FALSE 2620 26.9880511
## 2 TRUE FALSE TRUE 1206 12.4227441
## 3 FALSE TRUE TRUE 13 0.1339102
Again, the last question (Pertanyaan8) is the most common single miss: 2,620 people (26.99%). Pertanyaan7 is also high, with 1,206 people (12.42%) missing only that question.
Combining all eight questions, let's see how many respondents answered every question correctly and how many answered every question wrongly.
dataifls5 <- dataifls %>%
group_by(Pertanyaan1, Pertanyaan2, Pertanyaan3, Pertanyaan4, Pertanyaan5, Pertanyaan6, Pertanyaan7, Pertanyaan8) %>%
filter((Pertanyaan1 == "TRUE" & Pertanyaan2 == "TRUE" & Pertanyaan3 == "TRUE" &
Pertanyaan4 == "TRUE" & Pertanyaan5 == "TRUE" & Pertanyaan6 == "TRUE" &
Pertanyaan7 == "TRUE" & Pertanyaan8 == "TRUE") |
(Pertanyaan1 == "FALSE" & Pertanyaan2 == "FALSE" & Pertanyaan3 == "FALSE" &
Pertanyaan4 == "FALSE" & Pertanyaan5 == "FALSE" & Pertanyaan6 == "FALSE" &
Pertanyaan7 == "FALSE" & Pertanyaan8 == "FALSE")) %>%
summarise(total = n()) %>%
mutate(percentage = total/9708*100)
print.data.frame(dataifls5)
## Pertanyaan1 Pertanyaan2 Pertanyaan3 Pertanyaan4 Pertanyaan5 Pertanyaan6
## 1 TRUE TRUE TRUE TRUE TRUE TRUE
## 2 FALSE FALSE FALSE FALSE FALSE FALSE
## Pertanyaan7 Pertanyaan8 total percentage
## 1 TRUE TRUE 1271 13.0922950
## 2 FALSE FALSE 79 0.8137618
Of the 9,708 respondents, only 1,271 (13.09%) answered all eight questions correctly. At the other extreme, 79 people (0.81%) answered all of them wrongly.
So far we have looked at the best and worst groups overall. In this section we break them down by education level and gender.
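The two extremes are the ends of a distribution. This minimal base-R sketch tabulates how many of the eight questions each respondent answered correctly. It uses made-up data (`raw` and `toy` are hypothetical) stored as TRUE/FALSE factors, like the Pertanyaan columns in the post.

```r
items <- paste0("Pertanyaan", 1:8)

# Made-up answers for four respondents (rows) on the eight items (columns)
raw <- matrix(c(
  "TRUE",  "TRUE",  "TRUE",  "TRUE",  "TRUE",  "TRUE",  "TRUE",  "TRUE",
  "TRUE",  "TRUE",  "FALSE", "FALSE", "FALSE", "TRUE",  "FALSE", "FALSE",
  "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE",
  "TRUE",  "TRUE",  "TRUE",  "TRUE",  "TRUE",  "TRUE",  "TRUE",  "FALSE"
), nrow = 4, byrow = TRUE, dimnames = list(NULL, items))

# Store each item as a TRUE/FALSE factor, matching the post's coding
toy <- as.data.frame(lapply(as.data.frame(raw), factor, levels = c("TRUE", "FALSE")))

# Number of correct answers per respondent (0 to 8)
score <- rowSums(as.matrix(toy[items]) == "TRUE")

# Distribution of scores, keeping empty categories so that 0 to 8 all appear
score_dist <- table(factor(score, levels = 0:8))
print(score_dist)
print(round(prop.table(score_dist) * 100, 2))
```

This shows where everyone between "all correct" and "all wrong" falls, not just the two ends.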
dataifls6 <- dataifls %>%
group_by(Pertanyaan1, Pertanyaan2, Pertanyaan3, Pertanyaan4, Pertanyaan5, Pertanyaan6, Pertanyaan7, Pertanyaan8, Pendidikan_tertinggi, Jenis_kelamin) %>%
filter(Pertanyaan1 == "TRUE" & Pertanyaan2 == "TRUE" & Pertanyaan3 == "TRUE" &
Pertanyaan4 == "TRUE" & Pertanyaan5 == "TRUE" & Pertanyaan6 == "TRUE" &
Pertanyaan7 == "TRUE" & Pertanyaan8 == "TRUE") %>%
summarise(total = n()) %>%
mutate(percentage = total/9708*100)
p1 <- ggplot(dataifls6, aes(Pendidikan_tertinggi, total)) +
geom_col(aes(fill = Jenis_kelamin), position = "dodge", width = 0.8) +
# group = Jenis_kelamin lets the labels dodge in step with the bars
geom_text(aes(label = total, group = Jenis_kelamin), position = position_dodge(width = 0.8), vjust = -0.25, angle = 270, size = 3) +
coord_flip()
dataifls7 <- dataifls %>%
group_by(Pertanyaan1, Pertanyaan2, Pertanyaan3, Pertanyaan4, Pertanyaan5, Pertanyaan6, Pertanyaan7, Pertanyaan8, Pendidikan_tertinggi, Jenis_kelamin) %>%
filter(Pertanyaan1 == "FALSE" & Pertanyaan2 == "FALSE" & Pertanyaan3 == "FALSE" &
Pertanyaan4 == "FALSE" & Pertanyaan5 == "FALSE" & Pertanyaan6 == "FALSE" &
Pertanyaan7 == "FALSE" & Pertanyaan8 == "FALSE") %>%
summarise(total = n()) %>%
mutate(percentage = total/9708*100)
p2 <- ggplot(dataifls7, aes(Pendidikan_tertinggi, total)) +
geom_col(aes(fill = Jenis_kelamin), position = "dodge", width = 0.8) +
geom_text(aes(label = total, group = Jenis_kelamin), position = position_dodge(width = 0.8), vjust = -0.25, angle = 270) +
coord_flip()
A = Distribution of respondents who answered all questions correctly, by gender
B = Distribution of respondents who answered all questions wrongly, by gender
Conclusions:
1. In the best-performers chart, University (S1) has the highest number of men and women who answered everything correctly, and the number falls as the education level goes down.
2. In the worst-performers chart, Unschooled and Grade school (SD) have the highest number of men and women who answered everything wrongly, and the number falls as the education level goes up.
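The charts compare raw counts. Those counts also reflect how many respondents are in each education level: for example, 2:Grade school has 2,969 respondents while 60:Diploma has 382. Dividing by group size gives the share of each level that answered everything correctly. This is a minimal base-R sketch on made-up data (`toy` and its `all_correct` column are hypothetical).

```r
# Made-up respondents: education level and whether all eight answers were correct
toy <- data.frame(
  Pendidikan_tertinggi = c(rep("2:Grade school", 4), rep("61:University S1", 2)),
  all_correct          = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

# Group size and number answering everything correctly, per education level
size  <- tapply(toy$all_correct, toy$Pendidikan_tertinggi, length)
count <- tapply(toy$all_correct, toy$Pendidikan_tertinggi, sum)

rates <- data.frame(
  level       = names(size),
  n           = as.vector(size),
  all_correct = as.vector(count),
  share       = as.vector(count / size * 100)
)
print(rates)
```

Both toy groups have the same count (1), but the shares differ (25% vs 50%), which a count-only chart would hide.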
In this section we break the best and worst groups down by education level and rural/urban residence.
dataifls8 <- dataifls %>%
group_by(Pertanyaan1, Pertanyaan2, Pertanyaan3, Pertanyaan4, Pertanyaan5, Pertanyaan6, Pertanyaan7, Pertanyaan8, Pendidikan_tertinggi, Rural_Urban) %>%
filter(Pertanyaan1 == "TRUE" & Pertanyaan2 == "TRUE" & Pertanyaan3 == "TRUE" &
Pertanyaan4 == "TRUE" & Pertanyaan5 == "TRUE" & Pertanyaan6 == "TRUE" &
Pertanyaan7 == "TRUE" & Pertanyaan8 == "TRUE") %>%
summarise(total = n()) %>%
mutate(percentage = total/9708*100)
p3 <- ggplot(dataifls8, aes(Pendidikan_tertinggi, total)) +
geom_col(aes(fill = Rural_Urban), position = "dodge", width = 0.8) +
# group = Rural_Urban lets the labels dodge in step with the bars
geom_text(aes(label = total, group = Rural_Urban), position = position_dodge(width = 0.8), vjust = -0.25, angle = 270, size = 3) +
coord_flip()
dataifls9 <- dataifls %>%
group_by(Pertanyaan1, Pertanyaan2, Pertanyaan3, Pertanyaan4, Pertanyaan5, Pertanyaan6, Pertanyaan7, Pertanyaan8, Pendidikan_tertinggi, Rural_Urban) %>%
filter(Pertanyaan1 == "FALSE" & Pertanyaan2 == "FALSE" & Pertanyaan3 == "FALSE" &
Pertanyaan4 == "FALSE" & Pertanyaan5 == "FALSE" & Pertanyaan6 == "FALSE" &
Pertanyaan7 == "FALSE" & Pertanyaan8 == "FALSE") %>%
summarise(total = n()) %>%
mutate(percentage = total/9708*100)
p4 <- ggplot(dataifls9, aes(Pendidikan_tertinggi, total)) +
geom_col(aes(fill = Rural_Urban), position = "dodge", width = 0.8) +
geom_text(aes(label = total, group = Rural_Urban), position = position_dodge(width = 0.8), vjust = -0.25, angle = 270) +
coord_flip()
C = Distribution of respondents who answered all questions correctly, by rural/urban residence
D = Distribution of respondents who answered all questions wrongly, by rural/urban residence
Conclusions:
1. At every education level, more urban than rural respondents answered all questions correctly, and the number falls as the education level goes down.
2. In the worst-performers chart, rural respondents always outnumber urban respondents among those who answered all questions wrongly.
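Here too the bars show counts, and the sample has more urban (5,923) than rural (3,785) respondents overall. A row-wise proportion table gives the share of each area whose answers were all correct. This is a minimal base-R sketch on made-up data (`toy` is hypothetical).

```r
# Made-up respondents: area and whether all eight answers were correct
toy <- data.frame(
  Rural_Urban = c(rep("1:Urban", 4), rep("2:Rural", 2)),
  all_correct = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
)

# Cross-tabulate, then convert counts to percentages within each row (area)
counts <- table(toy$Rural_Urban, toy$all_correct)
shares <- prop.table(counts, margin = 1) * 100
print(round(shares, 1))
```

With `margin = 1`, each row sums to 100, so urban and rural respondents are compared on the same scale regardless of group size.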
dataifls10 <- dataifls %>%
group_by(Suku_bangsa) %>%
summarise(total = n())
print.data.frame(dataifls10)
## Suku_bangsa total
## 1 1:Javanese 4378
## 2 2:Sundanese 1223
## 3 3:Balinese 452
## 4 4:Batak 442
## 5 5:Bugis 311
## 6 6:Chinese 37
## 7 7:Maduranese 181
## 8 8:Sasak 490
## 9 9:Minang 489
## 10 10:Banjar 346
## 11 11:Bima-Dompu 231
## 12 12:Makassar 139
## 13 13:Nias 36
## 14 14:Palembang 55
## 15 15:Sumbawa 69
## 16 16:Toraja 53
## 17 17:Betawi 331
## 18 18:Dayak 1
## 19 19:Melayu 78
## 20 20:Komering 29
## 21 21:Ambon 3
## 22 22:Manado 3
## 23 23:Aceh 9
## 24 25:Other South Sumatra 240
## 25 26:Banten 34
## 26 27:Cirebon 4
## 27 28:Gorontalo 1
## 28 95:Other 37
## 29 98:Don't know 6
dataifls11 <- dataifls %>%
group_by(Pertanyaan1, Pertanyaan2, Pertanyaan3, Pertanyaan4, Pertanyaan5, Pertanyaan6, Pertanyaan7, Pertanyaan8, Suku_bangsa) %>%
filter(Pertanyaan1 == "TRUE" & Pertanyaan2 == "TRUE" & Pertanyaan3 == "TRUE" &
Pertanyaan4 == "TRUE" & Pertanyaan5 == "TRUE" & Pertanyaan6 == "TRUE" &
Pertanyaan7 == "TRUE" & Pertanyaan8 == "TRUE") %>%
summarise(total_benar = n()) %>%
mutate(percentage = total_benar/9708*100)
p5 <- ggplot(dataifls11, aes(Suku_bangsa, total_benar)) +
geom_col(position = "dodge", width = 0.8, fill = "green") +
geom_text(aes(label=total_benar), position=position_dodge(width=1.0), vjust=-0.25, angle = 270, size = 2) +
coord_flip()
dataifls12 <- dataifls %>%
group_by(Pertanyaan1, Pertanyaan2, Pertanyaan3, Pertanyaan4, Pertanyaan5, Pertanyaan6, Pertanyaan7, Pertanyaan8, Suku_bangsa) %>%
filter(Pertanyaan1 == "FALSE" & Pertanyaan2 == "FALSE" & Pertanyaan3 == "FALSE" &
Pertanyaan4 == "FALSE" & Pertanyaan5 == "FALSE" & Pertanyaan6 == "FALSE" &
Pertanyaan7 == "FALSE" & Pertanyaan8 == "FALSE") %>%
summarise(total_salah = n()) %>%
mutate(percentage = total_salah/9708*100)
p6 <- ggplot(dataifls12, aes(Suku_bangsa, total_salah)) +
geom_col(position = "dodge", width = 0.8, fill = "red") +
geom_text(aes(label=total_salah), position=position_dodge(width=1.0), vjust=-0.25, angle = 270, size = 2) +
coord_flip()
E = Distribution of ethnic groups among respondents who answered all questions correctly
F = Distribution of ethnic groups among respondents who answered all questions wrongly
Conclusions:
1. Among the three ethnic groups with the most respondents answering all questions correctly, the shares are still low: 653 of 4,378 Javanese (14.92%), 153 of 1,223 Sundanese (12.51%), and 70 of 452 Balinese (15.49%).
2. Among those who answered all questions wrongly, the Sasak had 24 of 490 respondents (4.90%) and the Javanese had 24 of 4,378 (0.55%).
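Some ethnic groups are very small in this sample. For example, 18:Dayak and 28:Gorontalo each have one respondent, so their percentages are unstable. This minimal base-R sketch computes the all-correct share per ethnic group and keeps only groups above a minimum size. The toy data and the threshold `min_n` are hypothetical choices.

```r
# Made-up respondents: ethnic group and whether all eight answers were correct
toy <- data.frame(
  Suku_bangsa = c(rep("1:Javanese", 5), rep("3:Balinese", 4), "18:Dayak"),
  all_correct = c(TRUE, FALSE, FALSE, FALSE, FALSE,
                  TRUE, FALSE, FALSE, FALSE,
                  TRUE)
)

min_n <- 3  # hypothetical minimum group size

# Group size and share answering everything correctly, per ethnic group
n     <- tapply(toy$all_correct, toy$Suku_bangsa, length)
share <- tapply(toy$all_correct, toy$Suku_bangsa, mean) * 100

by_group <- data.frame(Suku_bangsa = names(n), n = as.vector(n), share = as.vector(share))
by_group <- by_group[by_group$n >= min_n, ]                      # drop tiny groups
by_group <- by_group[order(by_group$share, decreasing = TRUE), ]  # highest share first
print(by_group)
```

Without the size filter, the single Dayak respondent would top the ranking at 100%.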
Overall, respondents with more education and those living in cities tend to show higher basic numeracy on these questions. However, the share of people who answer every question correctly is low, and this deserves attention.