1 Background

1.1 Explanation

In this Learning By Building (LBB) project, I want to show some interesting facts about Indonesians' cognitive ability in simple arithmetic. I hope you enjoy it!

1.2 IFLS

The Indonesian Family Life Survey (IFLS) is an ongoing longitudinal survey in Indonesia. The sample is representative of about 83% of the Indonesian population and contains over 30,000 individuals living in 13 of the 27 provinces in the country. The map below identifies the 13 IFLS provinces.

IFLS Data.

1.3 History

The first wave of the IFLS (IFLS1) was conducted in 1993/94 by RAND in collaboration with Lembaga Demografi, University of Indonesia. IFLS2 and IFLS2+ were conducted in 1997 and 1998, respectively, by RAND in collaboration with UCLA and Lembaga Demografi, University of Indonesia. IFLS2+ covered a 25% sub-sample of the IFLS households. IFLS3, which was fielded in 2000 and covered the full sample, was conducted by RAND in collaboration with the Population Research Center, University of Gadjah Mada. The fourth wave of the IFLS (IFLS4), fielded in 2007/2008 and covering the full sample, was conducted by RAND, the Center for Population and Policy Studies (CPPS) of the University of Gadjah Mada, and SurveyMETER. The fifth wave of the IFLS (IFLS5) was fielded in 2014-15.

2 Input Data, Cleansing Data, and Merging Data

IFLS data has its own challenges: it is split across many files, one per questionnaire book and subtopic, so preparing it is a long journey.

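# Packages used throughout this analysis
library(foreign)  # read.dta(); assumed to be the source of read.dta
library(dplyr)    # %>%, glimpse(), inner_join(), distinct(), group_by(), summarise()
library(ggplot2)  # ggplot(), geom_col(), geom_text()
library(ggpubr)   # ggarrange()

# Data Input
bkar0 <- read.dta("bk_ar0.dta")
bkar1 <- read.dta("bk_ar1.dta")
bksc1 <- read.dta("bk_sc1.dta")
b3bco1 <- read.dta("b3b_co1.dta")
b3bcob <- read.dta("b3b_cob.dta")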

Explanation of the data frames:
- bkar0 = book K, section ar0
- bkar1 = book K, section ar1
- bksc1 = book K, section sc1
- b3bco1 = book 3B, section co1
- b3bcob = book 3B, section cob

Each file contains many more variables than this analysis needs, so the first cleansing step is to keep only the relevant columns.

2.1 Cleansing Data and Merging Data for book K

# Cleansing Data for book K
databkar1 <- bkar1[, c("hhid14_9", "pid14", "hhid14", "pidlink", "ar01a","ar02b", "ar07", "ar09", "ar15", "ar15b", "ar15d", "ar16")]
databkar1$ar16 <- as.factor(databkar1$ar16)
glimpse(databkar1)

## Observations: 89,382
## Variables: 12
## $ hhid14_9 <chr> "001060000", "001060000", "001060000", "001060000", "00106...
## $ pid14    <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1, 2, 3, 4, 1, 2, 3, 4,...
## $ hhid14   <chr> "0010600", "0010600", "0010600", "0010600", "0010600", "00...
## $ pidlink  <chr> "001060001", "001060002", "001060003", "001060004", "00106...
## $ ar01a    <dbl> 1, 0, 0, 3, 0, 0, 5, 5, 5, 5, 5, 2, 5, 5, 5, 3, 0, 3, 3, 3...
## $ ar02b    <fct> 1:Head of the household, NA, NA, 3:Children (biological, N...
## $ ar07     <dbl> 1, NA, NA, 3, NA, NA, 3, 3, 3, 1, 1, 3, 1, 3, 3, 1, NA, 1,...
## $ ar09     <dbl> 59, NA, NA, 29, NA, NA, 39, 16, 12, 3, 1, 29, 30, 6, 3, 59...
## $ ar15     <fct> 2:Protestant, NA, NA, 2:Protestant, NA, NA, 2:Protestant, ...
## $ ar15b    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2.0e+06, N...
## $ ar15d    <fct> 13:Nias, NA, NA, 13:Nias, NA, NA, 13:Nias, 13:Nias, 13:Nia...
## $ ar16     <fct> 2:Grade school, NA, NA, 2:Grade school, NA, NA, 1:Unschool...

Many rows contain Not Available (NA) values, so I use na.omit() to drop every row that has an NA in any of the selected columns, rename the remaining columns with descriptive Indonesian names, and sort the rows by pidlink.

databkar1 <- databkar1 %>% na.omit()
names(databkar1)[names(databkar1) == "ar01a"] <- "Masih_tinggal_di_RT"
names(databkar1)[names(databkar1) == "ar02b"] <- "Hubungan_dengan_kepala_RT"
names(databkar1)[names(databkar1) == "ar07"] <- "Jenis_kelamin"
names(databkar1)[names(databkar1) == "ar09"] <- "Umur"
names(databkar1)[names(databkar1) == "ar15d"] <- "Suku_bangsa"
names(databkar1)[names(databkar1) == "ar16"] <- "Pendidikan_tertinggi"
names(databkar1)[names(databkar1) == "ar15"] <- "Agama"
names(databkar1)[names(databkar1) == "ar15b"] <- "Pendapatan_12_bulan_terakhir"
databkar1$Masih_tinggal_di_RT <- as.factor(databkar1$Masih_tinggal_di_RT)
databkar1$Jenis_kelamin <- as.factor(databkar1$Jenis_kelamin)
databkar1$Umur <- as.factor(databkar1$Umur)
databkar1 <- databkar1[order(databkar1$pidlink ,decreasing = F),]
glimpse(databkar1)

## Observations: 28,900
## Variables: 12
## $ hhid14_9                     <chr> "001060004", "001080000", "001080000",...
## $ pid14                        <dbl> 2, 1, 3, 1, 4, 5, 6, 8, 9, 1, 3, 1, 6,...
## $ hhid14                       <chr> "0010651", "0010800", "0010800", "0010...
## $ pidlink                      <chr> "001065102", "001080001", "001080003",...
## $ Masih_tinggal_di_RT          <fct> 5, 3, 3, 2, 3, 3, 3, 1, 3, 1, 3, 1, 1,...
## $ Hubungan_dengan_kepala_RT    <fct> 1:Head of the household, 6:Parents, 8:...
## $ Jenis_kelamin                <fct> 1, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 1,...
## $ Umur                         <fct> 30, 59, 36, 36, 34, 32, 30, 26, 24, 55...
## $ Agama                        <fct> 2:Protestant, 2:Protestant, 2:Protesta...
## $ Pendapatan_12_bulan_terakhir <dbl> 2000000, 12000000, 2400000, 6000000, 1...
## $ Suku_bangsa                  <fct> 13:Nias, 13:Nias, 13:Nias, 13:Nias, 13...
## $ Pendidikan_tertinggi         <fct> 3:General jr. high, 2:Grade school, 1:...
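
The drop from 89,382 to 28,900 rows happens because na.omit() removes a row as soon as any one of the selected columns is NA. Before dropping rows, it can help to count the missing values per column to see which variable drives the loss. A minimal sketch on a toy data frame (the values are hypothetical, only the column names follow databkar1):

```r
# Toy stand-in for databkar1; the values are hypothetical
toy <- data.frame(
  pidlink = c("a1", "a2", "a3", "a4"),
  ar09    = c(59, NA, 29, 16),       # age
  ar15b   = c(NA, NA, 2e6, NA),      # income in the last 12 months
  stringsAsFactors = FALSE
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

# Rows that survive na.omit(): only rows with no NA in any column
complete_rows <- nrow(na.omit(toy))

print(na_per_column)
print(complete_rows)
```

In this toy example a single sparse column (ar15b) removes three of the four rows, even though the other columns are almost complete.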

Note on variables : hhid14_9, pid14, hhid14, and pidlink are the household and individual identifier codes. They are used later as keys for merging the data.

databksc1 <- bksc1[, c("hhid14_9", "hhid14", "sc01_14_14","sc05")]
databksc1 <- databksc1 %>% na.omit()
names(databksc1)[names(databksc1) == "sc01_14_14"] <- "Provinsi"
names(databksc1)[names(databksc1) == "sc05"] <- "Rural_Urban"
databksc1$Provinsi <- as.factor(databksc1$Provinsi)
glimpse(databksc1)

## Observations: 15,921
## Variables: 4
## $ hhid14_9    <chr> "001060000", "001060004", "001080000", "001080003", "00...
## $ hhid14      <chr> "0010600", "0010651", "0010800", "0010851", "0012200", ...
## $ Provinsi    <fct> 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,...
## $ Rural_Urban <fct> 2:Rural, 2:Rural, 2:Rural, 2:Rural, 2:Rural, 1:Urban, 2...

# Merging Data for book K
databk <- inner_join(databkar1, databksc1, by = c("hhid14", "hhid14_9"))
databk <- distinct(databk, hhid14_9, .keep_all = T)
glimpse(databk)

## Observations: 13,761
## Variables: 14
## $ hhid14_9                     <chr> "001060004", "001080000", "001080003",...
## $ pid14                        <dbl> 2, 1, 1, 1, 1, 1, 4, 1, 1, 3, 1, 1, 1,...
## $ hhid14                       <chr> "0010651", "0010800", "0010851", "0012...
## $ pidlink                      <chr> "001065102", "001080001", "001080003",...
## $ Masih_tinggal_di_RT          <fct> 5, 3, 2, 1, 1, 1, 1, 5, 5, 1, 1, 2, 2,...
## $ Hubungan_dengan_kepala_RT    <fct> 1:Head of the household, 6:Parents, 1:...
## $ Jenis_kelamin                <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ Umur                         <fct> 30, 59, 36, 55, 34, 24, 35, 50, 30, 34...
## $ Agama                        <fct> 2:Protestant, 2:Protestant, 2:Protesta...
## $ Pendapatan_12_bulan_terakhir <dbl> 2000000, 12000000, 6000000, 36000000, ...
## $ Suku_bangsa                  <fct> 13:Nias, 13:Nias, 13:Nias, 13:Nias, 13...
## $ Pendidikan_tertinggi         <fct> 3:General jr. high, 2:Grade school, 2:...
## $ Provinsi                     <fct> 12, 12, 12, 12, 12, 12, 12, 12, 12, 12...
## $ Rural_Urban                  <fct> 2:Rural, 2:Rural, 2:Rural, 2:Rural, 2:...

Merging into databk leaves 13,761 observations, down from 28,900 in databkar1 and 15,921 in databksc1. Note that distinct() keeps only the first row for each hhid14_9, so databk holds one individual per household.
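
The effect of the two merge steps can be seen on a toy example. This sketch uses base R's merge() and duplicated() in place of inner_join() and distinct() so that it runs without extra packages; the household and person IDs are hypothetical.

```r
# Hypothetical person-level and household-level tables
persons <- data.frame(
  hhid14_9 = c("h1", "h1", "h2", "h3"),
  pidlink  = c("p1", "p2", "p3", "p4"),
  stringsAsFactors = FALSE
)
households <- data.frame(
  hhid14_9    = c("h1", "h2"),
  Rural_Urban = c("2:Rural", "1:Urban"),
  stringsAsFactors = FALSE
)

# Inner join: persons whose household is missing from `households` are dropped
joined <- merge(persons, households, by = "hhid14_9")

# Keep the first row per household, as distinct(hhid14_9, .keep_all = TRUE) does
one_per_household <- joined[!duplicated(joined$hhid14_9), ]

print(joined)
print(one_per_household)
```

The join drops the person in household h3, and the de-duplication then drops the second member of household h1, so four persons become two rows.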

2.2 Cleansing Data and Merging Data for book 3B

# Cleansing Data for book 3B
datab3bco1 <- b3bco1[, c("hhid14_9", "pid14", "hhid14", "pidlink", "co04a", "co04b", "co04c", "co04d", "co04e")]
datab3bco1 <- datab3bco1 %>% na.omit()
glimpse(datab3bco1)

## Observations: 28,983
## Variables: 9
## $ hhid14_9 <chr> "001060000", "001060004", "001060000", "001060000", "00106...
## $ pid14    <dbl> 1, 1, 7, 8, 2, 1, 8, 1, 2, 1, 6, 1, 1, 2, 3, 3, 7, 2, 12, ...
## $ hhid14   <chr> "0010600", "0010651", "0010600", "0010600", "0010651", "00...
## $ pidlink  <chr> "001060001", "001060004", "001060007", "001060008", "00106...
## $ co04a    <dbl> 93, 93, 93, 93, 93, 20, 93, 93, 93, 93, 93, 94, 93, 93, 93...
## $ co04b    <dbl> 86, 86, 86, 86, 86, 15, 86, 66, 86, 86, 86, 82, 86, 86, 86...
## $ co04c    <dbl> 79, 79, 79, 79, 79, 55, 81, 59, 77, 79, 79, 70, 79, 79, 78...
## $ co04d    <dbl> 72, 72, 72, 72, 72, 40, 74, 52, 70, 72, 72, 65, 72, 72, 71...
## $ co04e    <dbl> 65, 65, 66, 65, 65, 30, 67, 45, 63, 65, 65, 60, 65, 65, 66...

datab3bcob <- b3bcob[, c("hhid14_9", "pid14", "hhid14", "pidlink", "cob01_a", "cob02_a", "cob03_a")]
datab3bcob <- datab3bcob %>% na.omit()
glimpse(datab3bcob)

## Observations: 31,409
## Variables: 7
## $ hhid14_9 <chr> "001060000", "001060000", "001060000", "001060004", "00106...
## $ pid14    <dbl> 1, 7, 8, 1, 2, 8, 1, 2, 1, 2, 6, 1, 2, 3, 1, 1, 2, 3, 4, 7...
## $ hhid14   <chr> "0010600", "0010600", "0010600", "0010651", "0010651", "00...
## $ pidlink  <chr> "001060001", "001060007", "001060008", "001060004", "00106...
## $ cob01_a  <dbl> 9, 9, 9, 9, 9, 9, 6, 98, 9, 9, 9, 9, 9, 9, 9, 9, 98, 9, 98...
## $ cob02_a  <dbl> 10, 9, 10, 10, 10, 10, 6, 98, 11, 2, 10, 10, 11, 11, 13, 7...
## $ cob03_a  <dbl> 4, 9, 5, 5, 4, 7, 6, 98, 7, 2, 4, 5, 5, 5, 2, 5, 98, 7, 98...

# Merging Data for book 3B
datab3b <- inner_join(datab3bco1, datab3bcob, by = c("pidlink", "hhid14_9", "pid14", "hhid14"))
datab3b <- distinct(datab3b, pidlink, .keep_all = T)
datab3b <- datab3b[order(datab3b$pidlink, decreasing = F),]
glimpse(datab3b)

## Observations: 28,971
## Variables: 12
## $ hhid14_9 <chr> "001060000", "001060004", "001060000", "001060000", "00106...
## $ pid14    <dbl> 1, 1, 7, 8, 2, 1, 8, 1, 2, 1, 6, 1, 1, 2, 3, 3, 7, 2, 12, ...
## $ hhid14   <chr> "0010600", "0010651", "0010600", "0010600", "0010651", "00...
## $ pidlink  <chr> "001060001", "001060004", "001060007", "001060008", "00106...
## $ co04a    <dbl> 93, 93, 93, 93, 93, 20, 93, 93, 93, 93, 93, 94, 93, 93, 93...
## $ co04b    <dbl> 86, 86, 86, 86, 86, 15, 86, 66, 86, 86, 86, 82, 86, 86, 86...
## $ co04c    <dbl> 79, 79, 79, 79, 79, 55, 81, 59, 77, 79, 79, 70, 79, 79, 78...
## $ co04d    <dbl> 72, 72, 72, 72, 72, 40, 74, 52, 70, 72, 72, 65, 72, 72, 71...
## $ co04e    <dbl> 65, 65, 66, 65, 65, 30, 67, 45, 63, 65, 65, 60, 65, 65, 66...
## $ cob01_a  <dbl> 9, 9, 9, 9, 9, 6, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9...
## $ cob02_a  <dbl> 10, 10, 9, 10, 10, 6, 10, 11, 2, 10, 10, 13, 7, 11, 11, 9,...
## $ cob03_a  <dbl> 4, 5, 9, 5, 4, 6, 7, 7, 2, 5, 4, 2, 5, 5, 5, 7, 5, 7, 10, ...

Merging into datab3b leaves 28,971 observations, down from 28,983 in datab3bco1 and 31,409 in datab3bcob, because the inner join keeps only individuals that appear in both files.

2.3 Merging Data Final for book K and book 3B

#Merging Data for Book K and Book 3B
dataifls <- inner_join(databk, datab3b, by = c("pidlink", "hhid14_9", "pid14", "hhid14"))
dataifls <- distinct(dataifls, pidlink, .keep_all = T)
dataifls <- dataifls[order(dataifls$pidlink ,decreasing = F),]
names(dataifls)[names(dataifls) == "co04a"] <- "Pertanyaan1"
names(dataifls)[names(dataifls) == "co04b"] <- "Pertanyaan2"
names(dataifls)[names(dataifls) == "co04c"] <- "Pertanyaan3"
names(dataifls)[names(dataifls) == "co04d"] <- "Pertanyaan4"
names(dataifls)[names(dataifls) == "co04e"] <- "Pertanyaan5"
names(dataifls)[names(dataifls) == "cob01_a"] <- "Pertanyaan6"
names(dataifls)[names(dataifls) == "cob02_a"] <- "Pertanyaan7"
names(dataifls)[names(dataifls) == "cob03_a"] <- "Pertanyaan8"
dataifls$Pertanyaan1 <- as.factor((dataifls$Pertanyaan1))
dataifls$Pertanyaan2 <- as.factor((dataifls$Pertanyaan2))
dataifls$Pertanyaan3 <- as.factor((dataifls$Pertanyaan3))
dataifls$Pertanyaan4 <- as.factor((dataifls$Pertanyaan4))
dataifls$Pertanyaan5 <- as.factor((dataifls$Pertanyaan5))
dataifls$Pertanyaan6 <- as.factor((dataifls$Pertanyaan6))
dataifls$Pertanyaan7 <- as.factor((dataifls$Pertanyaan7))
dataifls$Pertanyaan8 <- as.factor((dataifls$Pertanyaan8))
dataifls$Pertanyaan1 <- factor(ifelse(dataifls$Pertanyaan1==93, "TRUE", "FALSE"), c("TRUE", "FALSE"))
dataifls$Pertanyaan2 <- factor(ifelse(dataifls$Pertanyaan2==86, "TRUE", "FALSE"), c("TRUE", "FALSE"))
dataifls$Pertanyaan3 <- factor(ifelse(dataifls$Pertanyaan3==79, "TRUE", "FALSE"), c("TRUE", "FALSE"))
dataifls$Pertanyaan4 <- factor(ifelse(dataifls$Pertanyaan4==72, "TRUE", "FALSE"), c("TRUE", "FALSE"))
dataifls$Pertanyaan5 <- factor(ifelse(dataifls$Pertanyaan5==65, "TRUE", "FALSE"), c("TRUE", "FALSE"))
dataifls$Pertanyaan6 <- factor(ifelse(dataifls$Pertanyaan6==9, "TRUE", "FALSE"), c("TRUE", "FALSE"))
dataifls$Pertanyaan7 <- factor(ifelse(dataifls$Pertanyaan7==10, "TRUE", "FALSE"), c("TRUE", "FALSE"))
dataifls$Pertanyaan8 <- factor(ifelse(dataifls$Pertanyaan8==4, "TRUE", "FALSE"), c("TRUE", "FALSE"))
dataifls$Jenis_kelamin <- factor(ifelse(dataifls$Jenis_kelamin==1, "Pria", "Wanita"), c("Pria", "Wanita"))
summary(dataifls)

##    hhid14_9             pid14           hhid14            pidlink         
##  Length:9708        Min.   : 1.000   Length:9708        Length:9708       
##  Class :character   1st Qu.: 1.000   Class :character   Class :character  
##  Mode  :character   Median : 1.000   Mode  :character   Mode  :character  
##                     Mean   : 1.741                                        
##                     3rd Qu.: 2.000                                        
##                     Max.   :19.000                                        
##                                                                           
##  Masih_tinggal_di_RT           Hubungan_dengan_kepala_RT Jenis_kelamin
##  1 :6664             1:Head of the household:7110        Pria  :6606  
##  2 :2476             2:Husband/wife         :1802        Wanita:3102  
##  3 :   1             3:Children (biological : 356                     
##  5 : 558             5:Sons/daughters-in-law: 172                     
##  11:   9             6:Parents              :  75                     
##                      8:Siblings             :  38                     
##                      (Other)                : 155                     
##       Umur               Agama      Pendapatan_12_bulan_terakhir
##  35     : 352   1:Islam     :8688   Min.   :        0           
##  33     : 341   4:Hinduism  : 473   1st Qu.:  6000000           
##  31     : 340   2:Protestant: 388   Median : 13000000           
##  34     : 340   3:Catholic  : 140   Mean   : 22242390           
##  32     : 330   5:Buddhism  :  14   3rd Qu.: 26400000           
##  30     : 314   7:Konghucu  :   4   Max.   :999999995           
##  (Other):7691   (Other)     :   1                               
##       Suku_bangsa                    Pendidikan_tertinggi    Provinsi   
##  1:Javanese :4378   2:Grade school             :2969      32     :1405  
##  2:Sundanese:1223   5:General sr. high (SLA)   :1504      33     :1218  
##  8:Sasak    : 490   3:General jr. high         :1457      35     :1060  
##  9:Minang   : 489   6:Vocational sr. high (SMK):1174      52     : 791  
##  3:Balinese : 452   61:University S1           :1060      12     : 768  
##  4:Batak    : 442   60:Diploma (D1, D2, D3)    : 382      31     : 645  
##  (Other)    :2234   (Other)                    :1162      (Other):3821  
##   Rural_Urban   Pertanyaan1  Pertanyaan2  Pertanyaan3  Pertanyaan4 
##  1:Urban:5923   TRUE :9322   TRUE :6120   TRUE :5069   TRUE :4392  
##  2:Rural:3785   FALSE: 386   FALSE:3588   FALSE:4639   FALSE:5316  
##                                                                    
##                                                                    
##                                                                    
##                                                                    
##                                                                    
##  Pertanyaan5  Pertanyaan6  Pertanyaan7  Pertanyaan8 
##  TRUE :3983   TRUE :9120   TRUE :4915   TRUE :3545  
##  FALSE:5725   FALSE: 588   FALSE:4793   FALSE:6163  
##                                                     
##                                                     
##                                                     
##                                                     
##

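After cleansing and merging the data into dataifls, 9,708 observations remain.
Notes on variables :
Column names are in Indonesian: Hubungan_dengan_kepala_RT = relationship to the household head, Jenis_kelamin = gender (Pria = male, Wanita = female), Umur = age, Agama = religion, Pendapatan_12_bulan_terakhir = income in the last 12 months, Suku_bangsa = ethnicity, Pendidikan_tertinggi = highest education level.
Answer codes for the “Masih_tinggal_di_RT” (still living in the household) variable
1. Yes, still lives in this household
2. Panel household member who was not in this household at the last survey and has now returned to it
3. No longer lives in this household
5. New household member
11. Household member who returned within the same survey round

Answer codes for the “Provinsi” (province) variable: Kode Provinsi (province codes).

Questions behind the “Pertanyaanx” (question x) variables
Pertanyaan1 = 100 - 7, correct answer (TRUE) = 93
Pertanyaan2 = 93 - 7, correct answer (TRUE) = 86
Pertanyaan3 = 86 - 7, correct answer (TRUE) = 79
Pertanyaan4 = 79 - 7, correct answer (TRUE) = 72
Pertanyaan5 = 72 - 7, correct answer (TRUE) = 65
Pertanyaan6, Pertanyaan7, Pertanyaan8, correct answers (TRUE) = 9, 10, 4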
Pertanyaan.
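
The scoring rule for the subtraction questions can be written compactly as a comparison against a fixed answer key. The sketch below is an illustration, not the code used above: the object names answer_key, responses, and scored are assumptions, while the three response rows are taken from the glimpse(datab3bco1) output shown earlier.

```r
# Fixed answer key for the five subtraction questions
answer_key <- c(co04a = 93, co04b = 86, co04c = 79, co04d = 72, co04e = 65)

# Three respondents, copied from the glimpse(datab3bco1) output
responses <- data.frame(
  co04a = c(93, 93, 20),
  co04b = c(86, 66, 15),
  co04c = c(79, 59, 55),
  co04d = c(72, 52, 40),
  co04e = c(65, 45, 30)
)

# Logical matrix: TRUE where the response equals the key for that question
scored <- sapply(names(answer_key), function(q) responses[[q]] == answer_key[[q]])

# Number of correct answers per respondent
n_correct <- rowSums(scored)

print(scored)
print(n_correct)
```

One consequence of a fixed key is worth keeping in mind when reading the results: the second respondent (93, 66, 59, 52, 45) subtracts 7 correctly from their own previous answer after one slip, yet is marked wrong on four of the five questions.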

3 Exploratory Data

After cleansing the data, we analyze Indonesians' cognitive ability in two categories:
simple numeric questions (Pertanyaan1 – Pertanyaan5) and number series questions (Pertanyaan6 – Pertanyaan8)

3.1 Simple Numeric Questions

dataifls1 <- dataifls %>% 
  group_by(Pertanyaan1, Pertanyaan2, Pertanyaan3, Pertanyaan4, Pertanyaan5) %>% 
  summarise(total = n()) %>% 
  mutate(percentage = total/9708*100)
print.data.frame(dataifls1)

##    Pertanyaan1 Pertanyaan2 Pertanyaan3 Pertanyaan4 Pertanyaan5 total
## 1         TRUE        TRUE        TRUE        TRUE        TRUE  3879
## 2         TRUE        TRUE        TRUE        TRUE       FALSE   425
## 3         TRUE        TRUE        TRUE       FALSE        TRUE    16
## 4         TRUE        TRUE        TRUE       FALSE       FALSE   691
## 5         TRUE        TRUE       FALSE        TRUE        TRUE    11
## 6         TRUE        TRUE       FALSE        TRUE       FALSE     1
## 7         TRUE        TRUE       FALSE       FALSE        TRUE     9
## 8         TRUE        TRUE       FALSE       FALSE       FALSE  1071
## 9         TRUE       FALSE        TRUE        TRUE        TRUE    19
## 10        TRUE       FALSE        TRUE        TRUE       FALSE    11
## 11        TRUE       FALSE        TRUE       FALSE        TRUE     1
## 12        TRUE       FALSE        TRUE       FALSE       FALSE    15
## 13        TRUE       FALSE       FALSE        TRUE        TRUE    15
## 14        TRUE       FALSE       FALSE        TRUE       FALSE    18
## 15        TRUE       FALSE       FALSE       FALSE        TRUE    23
## 16        TRUE       FALSE       FALSE       FALSE       FALSE  3117
## 17       FALSE        TRUE        TRUE        TRUE        TRUE     7
## 18       FALSE        TRUE        TRUE        TRUE       FALSE     1
## 19       FALSE        TRUE        TRUE       FALSE       FALSE     3
## 20       FALSE        TRUE       FALSE       FALSE       FALSE     6
## 21       FALSE       FALSE        TRUE       FALSE       FALSE     1
## 22       FALSE       FALSE       FALSE        TRUE       FALSE     5
## 23       FALSE       FALSE       FALSE       FALSE        TRUE     3
## 24       FALSE       FALSE       FALSE       FALSE       FALSE   360
##     percentage
## 1  39.95673671
## 2   4.37783272
## 3   0.16481253
## 4   7.11784096
## 5   0.11330861
## 6   0.01030078
## 7   0.09270705
## 8  11.03213844
## 9   0.19571487
## 10  0.11330861
## 11  0.01030078
## 12  0.15451174
## 13  0.15451174
## 14  0.18541409
## 15  0.23691801
## 16 32.10754017
## 17  0.07210548
## 18  0.01030078
## 19  0.03090235
## 20  0.06180470
## 21  0.01030078
## 22  0.05150391
## 23  0.03090235
## 24  3.70828183

Unique Facts :
1. Of the 9,708 respondents, only 3,879 (39.96%) answered all five questions correctly! The remaining 5,829 (60.04%) answered at least one question wrong.
2. 360 people (3.71%) answered all five questions wrong.
3. Which question do people most often answer wrong?

dataifls2 <- dataifls %>% 
  group_by(Pertanyaan1, Pertanyaan2, Pertanyaan3, Pertanyaan4, Pertanyaan5) %>%
  # keep only respondents with exactly one wrong answer
  filter((Pertanyaan1 == "TRUE" & Pertanyaan2 == "TRUE" & Pertanyaan3 == "TRUE" &
            Pertanyaan4 == "TRUE" & Pertanyaan5 == "FALSE") |
           (Pertanyaan1 == "TRUE" & Pertanyaan2 == "TRUE" & Pertanyaan3 == "TRUE" &
              Pertanyaan4 == "FALSE" & Pertanyaan5 == "TRUE") |
           (Pertanyaan1 == "TRUE" & Pertanyaan2 == "TRUE" & Pertanyaan3 == "FALSE" &
              Pertanyaan4 == "TRUE" & Pertanyaan5 == "TRUE") |
           (Pertanyaan1 == "TRUE" & Pertanyaan2 == "FALSE" & Pertanyaan3 == "TRUE" &
              Pertanyaan4 == "TRUE" & Pertanyaan5 == "TRUE") |
           (Pertanyaan1 == "FALSE" & Pertanyaan2 == "TRUE" & Pertanyaan3 == "TRUE" &
              Pertanyaan4 == "TRUE" & Pertanyaan5 == "TRUE")) %>%
  summarise(total = n()) %>% 
  mutate(percentage = total/9708*100)
print.data.frame(dataifls2)

##   Pertanyaan1 Pertanyaan2 Pertanyaan3 Pertanyaan4 Pertanyaan5 total percentage
## 1        TRUE        TRUE        TRUE        TRUE       FALSE   425 4.37783272
## 2        TRUE        TRUE        TRUE       FALSE        TRUE    16 0.16481253
## 3        TRUE        TRUE       FALSE        TRUE        TRUE    11 0.11330861
## 4        TRUE       FALSE        TRUE        TRUE        TRUE    19 0.19571487
## 5       FALSE        TRUE        TRUE        TRUE        TRUE     7 0.07210548

Among respondents who missed exactly one question, Pertanyaan5 (72 - 7) was missed most often: 425 people (4.38%), whereas each of the other questions was the single mistake for fewer than 20 people. Maybe they were tired by the last question, hope so :)
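
The long filter() condition above selects respondents with exactly one wrong answer. The same selection can be expressed by counting correct answers per row. A minimal sketch on hypothetical toy data (the toy columns hold the same "TRUE"/"FALSE" labels as dataifls; the rows themselves are made up):

```r
# Hypothetical answers of five respondents to the five subtraction questions
toy <- data.frame(
  Pertanyaan1 = c("TRUE", "TRUE",  "TRUE",  "TRUE",  "TRUE"),
  Pertanyaan2 = c("TRUE", "TRUE",  "TRUE",  "FALSE", "FALSE"),
  Pertanyaan3 = c("TRUE", "TRUE",  "TRUE",  "TRUE",  "FALSE"),
  Pertanyaan4 = c("TRUE", "TRUE",  "TRUE",  "TRUE",  "FALSE"),
  Pertanyaan5 = c("TRUE", "FALSE", "FALSE", "TRUE",  "FALSE"),
  stringsAsFactors = FALSE
)

# Number of correct answers per respondent
n_correct <- rowSums(toy == "TRUE")

# Respondents with exactly one wrong answer (four of five correct)
exactly_one_wrong <- toy[n_correct == 4, ]

# For those respondents, how often each question was the one they missed
missed <- colSums(exactly_one_wrong == "FALSE")

print(missed)
```

Counting per row avoids listing every TRUE/FALSE combination by hand, which becomes error-prone as the number of questions grows.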

3.2 Series of Numbers Questions

dataifls3 <- dataifls %>% 
  group_by(Pertanyaan6, Pertanyaan7, Pertanyaan8) %>% 
  summarise(total = n()) %>% 
  mutate(percentage = total/9708*100)
print.data.frame(dataifls3)

##   Pertanyaan6 Pertanyaan7 Pertanyaan8 total percentage
## 1        TRUE        TRUE        TRUE  2246 23.1355583
## 2        TRUE        TRUE       FALSE  2620 26.9880511
## 3        TRUE       FALSE        TRUE  1206 12.4227441
## 4        TRUE       FALSE       FALSE  3048 31.3967862
## 5       FALSE        TRUE        TRUE    13  0.1339102
## 6       FALSE        TRUE       FALSE    36  0.3708282
## 7       FALSE       FALSE        TRUE    80  0.8240626
## 8       FALSE       FALSE       FALSE   459  4.7280593

Unique Facts :
1. Of the 9,708 respondents, only 2,246 (23.14%) answered all three number series questions correctly! The remaining 7,462 (76.86%) answered at least one question wrong.
2. 459 people (4.73%) answered all three questions wrong.
3. Which question do people most often answer wrong?

dataifls4 <- dataifls %>% 
  group_by(Pertanyaan6, Pertanyaan7, Pertanyaan8) %>% 
  filter((Pertanyaan6 == "TRUE" & Pertanyaan7 == "TRUE" & Pertanyaan8 == "FALSE") | 
           (Pertanyaan6 == "TRUE" & Pertanyaan7 == "FALSE" & Pertanyaan8 == "TRUE") |
           (Pertanyaan6 == "FALSE" & Pertanyaan7 == "TRUE" & Pertanyaan8 == "TRUE")) %>% 
  summarise(total = n()) %>% 
  mutate(percentage = total/9708*100)
print.data.frame(dataifls4)

##   Pertanyaan6 Pertanyaan7 Pertanyaan8 total percentage
## 1        TRUE        TRUE       FALSE  2620 26.9880511
## 2        TRUE       FALSE        TRUE  1206 12.4227441
## 3       FALSE        TRUE        TRUE    13  0.1339102

Again the last question (Pertanyaan8) is the one most often missed: 2,620 people (26.99%) answered only that question wrong. Pertanyaan7 is also missed often, with 1,206 people (12.42%) answering only that question wrong.
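
The tables above count respondents whose only mistake was a given question. A complementary view is the overall error rate per question, which can be computed from the FALSE counts in the summary(dataifls) output. A short sketch, with the counts copied from that output (the object names are illustrative):

```r
# Total number of respondents in dataifls
n_total <- 9708

# FALSE counts per question, copied from summary(dataifls)
wrong <- c(
  Pertanyaan1 = 386,  Pertanyaan2 = 3588, Pertanyaan3 = 4639,
  Pertanyaan4 = 5316, Pertanyaan5 = 5725, Pertanyaan6 = 588,
  Pertanyaan7 = 4793, Pertanyaan8 = 6163
)

# Share of respondents who answered each question wrong, in percent
wrong_rate <- round(wrong / n_total * 100, 2)

# Question with the highest error rate
hardest <- names(which.max(wrong_rate))

print(wrong_rate)
print(hardest)
```

Within each block the error count rises from the first question to the last, which is consistent with the single-mistake tables.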

3.3 The Best and The Worst

Combining all eight questions, I want to see how many respondents answered everything correctly and how many answered everything wrong.

dataifls5 <- dataifls %>% 
  group_by(Pertanyaan1, Pertanyaan2, Pertanyaan3, Pertanyaan4, Pertanyaan5, Pertanyaan6, Pertanyaan7, Pertanyaan8) %>% 
  filter((Pertanyaan1 == "TRUE" & Pertanyaan2 == "TRUE" & Pertanyaan3 == "TRUE" & 
            Pertanyaan4 == "TRUE" & Pertanyaan5 == "TRUE" & Pertanyaan6 == "TRUE" & 
            Pertanyaan7 == "TRUE" & Pertanyaan8 == "TRUE") |
           (Pertanyaan1 == "FALSE" & Pertanyaan2 == "FALSE" & Pertanyaan3 == "FALSE" & 
            Pertanyaan4 == "FALSE" & Pertanyaan5 == "FALSE" & Pertanyaan6 == "FALSE" & 
            Pertanyaan7 == "FALSE" & Pertanyaan8 == "FALSE")) %>%
  summarise(total = n()) %>% 
  mutate(percentage = total/9708*100)
print.data.frame(dataifls5)

##   Pertanyaan1 Pertanyaan2 Pertanyaan3 Pertanyaan4 Pertanyaan5 Pertanyaan6
## 1        TRUE        TRUE        TRUE        TRUE        TRUE        TRUE
## 2       FALSE       FALSE       FALSE       FALSE       FALSE       FALSE
##   Pertanyaan7 Pertanyaan8 total percentage
## 1        TRUE        TRUE  1271 13.0922950
## 2       FALSE       FALSE    79  0.8137618

Of the 9,708 respondents, only 1,271 (13.09%) answered all eight questions correctly! On the other hand, 79 people (0.81%) answered all of them wrong.

3.3.1 The Best and The Worst looking from Education and Gender

Previously we looked at the best and the worst overall. In this section we break them down by education and gender.

dataifls6 <- dataifls %>% 
  group_by(Pertanyaan1, Pertanyaan2, Pertanyaan3, Pertanyaan4, Pertanyaan5, Pertanyaan6, Pertanyaan7, Pertanyaan8, Pendidikan_tertinggi, Jenis_kelamin) %>% 
  filter(Pertanyaan1 == "TRUE" & Pertanyaan2 == "TRUE" & Pertanyaan3 == "TRUE" & 
            Pertanyaan4 == "TRUE" & Pertanyaan5 == "TRUE" & Pertanyaan6 == "TRUE" & 
            Pertanyaan7 == "TRUE" & Pertanyaan8 == "TRUE") %>%
  summarise(total = n()) %>% 
  mutate(percentage = total/9708*100)
p1 <- ggplot(dataifls6, aes(Pendidikan_tertinggi, total)) +
  geom_col(aes(fill = Jenis_kelamin), position = "dodge", width = 0.8) +
  geom_text(aes(label=total), position=position_dodge(width=1.0), vjust=-0.25, angle = 270, size = 3) +
  coord_flip()

dataifls7 <- dataifls %>% 
  group_by(Pertanyaan1, Pertanyaan2, Pertanyaan3, Pertanyaan4, Pertanyaan5, Pertanyaan6, Pertanyaan7, Pertanyaan8, Pendidikan_tertinggi, Jenis_kelamin) %>% 
  filter(Pertanyaan1 == "FALSE" & Pertanyaan2 == "FALSE" & Pertanyaan3 == "FALSE" & 
            Pertanyaan4 == "FALSE" & Pertanyaan5 == "FALSE" & Pertanyaan6 == "FALSE" & 
            Pertanyaan7 == "FALSE" & Pertanyaan8 == "FALSE") %>%
  summarise(total = n()) %>% 
  mutate(percentage = total/9708*100)
p2 <- ggplot(dataifls7, aes(Pendidikan_tertinggi, total)) +
  geom_col(aes(fill = Jenis_kelamin), position = "dodge", width = 0.8) +
  geom_text(aes(label=total), position=position_dodge(width=1.0), vjust=-0.25, angle = 270) +
  coord_flip()

combine <- ggarrange(p1, p2,
                     labels = c("A", "B"),
                     ncol = 1, nrow = 2)
combine

A = Distribution by education and gender of respondents who answered all questions correctly
B = Distribution by education and gender of respondents who answered all questions wrong

Conclusions :
1. In the best chart, University (S1) has the highest number of men and women who answered everything correctly, and the number falls as the education level goes down.
2. In the worst chart, Unschooled and Grade School (SD) have the highest numbers of men and women who answered everything wrong, and the number falls as the education level goes up.
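
The charts compare raw counts, which also depend on how many respondents each education level contains. A share within each level is one way to make the levels comparable. The following is a minimal sketch on hypothetical data; the column all_correct and the toy rows are assumptions introduced for illustration.

```r
# Hypothetical respondents: education level and whether all answers were correct
toy <- data.frame(
  Pendidikan_tertinggi = c("2:Grade school", "2:Grade school", "2:Grade school",
                           "61:University S1", "61:University S1"),
  all_correct = c(FALSE, FALSE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Percentage of respondents within each education level who got everything right
share_correct <- tapply(toy$all_correct, toy$Pendidikan_tertinggi, mean) * 100

print(round(share_correct, 2))
```

With dataifls, the same idea would divide each group's all-correct count by that group's own size instead of by 9708.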

3.3.2 The Best and The Worst looking from Education and Rural/Urban

In this section we break the best and the worst down by education and rural/urban residence.

dataifls8 <- dataifls %>% 
  group_by(Pertanyaan1, Pertanyaan2, Pertanyaan3, Pertanyaan4, Pertanyaan5, Pertanyaan6, Pertanyaan7, Pertanyaan8, Pendidikan_tertinggi, Rural_Urban) %>% 
  filter(Pertanyaan1 == "TRUE" & Pertanyaan2 == "TRUE" & Pertanyaan3 == "TRUE" & 
            Pertanyaan4 == "TRUE" & Pertanyaan5 == "TRUE" & Pertanyaan6 == "TRUE" & 
            Pertanyaan7 == "TRUE" & Pertanyaan8 == "TRUE") %>%
  summarise(total = n()) %>% 
  mutate(percentage = total/9708*100)
p3 <- ggplot(dataifls8, aes(Pendidikan_tertinggi, total)) +
  geom_col(aes(fill = Rural_Urban), position = "dodge", width = 0.8) +
  geom_text(aes(label=total), position=position_dodge(width=1.0), vjust=-0.25, angle = 270, size = 3) +
  coord_flip()

dataifls9 <- dataifls %>% 
  group_by(Pertanyaan1, Pertanyaan2, Pertanyaan3, Pertanyaan4, Pertanyaan5, Pertanyaan6, Pertanyaan7, Pertanyaan8, Pendidikan_tertinggi, Rural_Urban) %>% 
  filter(Pertanyaan1 == "FALSE" & Pertanyaan2 == "FALSE" & Pertanyaan3 == "FALSE" & 
            Pertanyaan4 == "FALSE" & Pertanyaan5 == "FALSE" & Pertanyaan6 == "FALSE" & 
            Pertanyaan7 == "FALSE" & Pertanyaan8 == "FALSE") %>%
  summarise(total = n()) %>% 
  mutate(percentage = total/9708*100)
p4 <- ggplot(dataifls9, aes(Pendidikan_tertinggi, total)) +
  geom_col(aes(fill = Rural_Urban), position = "dodge", width = 0.8) +
  geom_text(aes(label=total), position=position_dodge(width=1.0), vjust=-0.25, angle = 270) +
  coord_flip()

combine1 <- ggarrange(p3, p4,
                     labels = c("C", "D"),
                     ncol = 1, nrow = 2)
combine1

C = Distribution by education and Rural_Urban of respondents who answered all questions correctly
D = Distribution by education and Rural_Urban of respondents who answered all questions wrong

Conclusions :
1. At every education level, more urban than rural respondents answered all questions correctly, and the number falls as the education level goes down.
2. In the worst chart, more rural than urban respondents answered all questions wrong at every education level.

3.3.3 The Best and The Worst looking from Ethnic

dataifls10 <- dataifls %>% 
  group_by(Suku_bangsa) %>% 
  summarise(total = n())
print.data.frame(dataifls10)

##               Suku_bangsa total
## 1              1:Javanese  4378
## 2             2:Sundanese  1223
## 3              3:Balinese   452
## 4                 4:Batak   442
## 5                 5:Bugis   311
## 6               6:Chinese    37
## 7            7:Maduranese   181
## 8                 8:Sasak   490
## 9                9:Minang   489
## 10              10:Banjar   346
## 11          11:Bima-Dompu   231
## 12            12:Makassar   139
## 13                13:Nias    36
## 14           14:Palembang    55
## 15             15:Sumbawa    69
## 16              16:Toraja    53
## 17              17:Betawi   331
## 18               18:Dayak     1
## 19              19:Melayu    78
## 20            20:Komering    29
## 21               21:Ambon     3
## 22              22:Manado     3
## 23                23:Aceh     9
## 24 25:Other South Sumatra   240
## 25              26:Banten    34
## 26             27:Cirebon     4
## 27           28:Gorontalo     1
## 28               95:Other    37
## 29          98:Don't know     6

dataifls11 <- dataifls %>% 
  group_by(Pertanyaan1, Pertanyaan2, Pertanyaan3, Pertanyaan4, Pertanyaan5, Pertanyaan6, Pertanyaan7, Pertanyaan8, Suku_bangsa) %>% 
  filter(Pertanyaan1 == "TRUE" & Pertanyaan2 == "TRUE" & Pertanyaan3 == "TRUE" & 
            Pertanyaan4 == "TRUE" & Pertanyaan5 == "TRUE" & Pertanyaan6 == "TRUE" & 
            Pertanyaan7 == "TRUE" & Pertanyaan8 == "TRUE") %>%
  summarise(total_benar = n()) %>% 
  mutate(percentage = total_benar/9708*100)

p5 <- ggplot(dataifls11, aes(Suku_bangsa, total_benar)) +
  geom_col(position = "dodge", width = 0.8, fill = "green") +
  geom_text(aes(label=total_benar), position=position_dodge(width=1.0), vjust=-0.25, angle = 270, size = 2) +
  coord_flip()

dataifls12 <- dataifls %>% 
  group_by(Pertanyaan1, Pertanyaan2, Pertanyaan3, Pertanyaan4, Pertanyaan5, Pertanyaan6, Pertanyaan7, Pertanyaan8, Suku_bangsa) %>% 
  filter(Pertanyaan1 == "FALSE" & Pertanyaan2 == "FALSE" & Pertanyaan3 == "FALSE" & 
            Pertanyaan4 == "FALSE" & Pertanyaan5 == "FALSE" & Pertanyaan6 == "FALSE" & 
            Pertanyaan7 == "FALSE" & Pertanyaan8 == "FALSE") %>%
  summarise(total_salah = n()) %>% 
  mutate(percentage = total_salah/9708*100)

p6 <- ggplot(dataifls12, aes(Suku_bangsa, total_salah)) +
  geom_col(position = "dodge", width = 0.8, fill = "red") +
  geom_text(aes(label=total_salah), position=position_dodge(width=1.0), vjust=-0.25, angle = 270, size = 2) +
  coord_flip()

combine2 <- ggarrange(p5, p6,
                     labels = c("E", "F"),
                     ncol = 1, nrow = 2)
combine2

E = Distribution by ethnic group of respondents who answered all questions correctly
F = Distribution by ethnic group of respondents who answered all questions wrong

Conclusions :
1. Among the three ethnic groups with the most respondents who answered all questions correctly, the Javanese have only 653 out of 4,378 people (14.92%), the Sundanese only 153 out of 1,223 (12.51%), and the Balinese only 70 out of 452 (15.49%).
2. Among those who answered all questions wrong, the Sasak have 24 out of 490 people (4.90%) and the Javanese have 24 out of 4,378 people (0.55%).
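
The percentage column in dataifls11 and dataifls12 divides by the whole sample (9708), while the conclusions above use the size of each ethnic group as the denominator. The within-group rates can be reproduced from the counts already reported. A short sketch (the data frame ethnic and the column rate_benar are illustrative names; the counts come from the text and from dataifls10):

```r
# Counts reported above: group size and number who answered everything correctly
ethnic <- data.frame(
  Suku_bangsa = c("1:Javanese", "2:Sundanese", "3:Balinese"),
  total       = c(4378, 1223, 452),
  total_benar = c(653, 153, 70),
  stringsAsFactors = FALSE
)

# Share of each group that answered all eight questions correctly, in percent
ethnic$rate_benar <- round(ethnic$total_benar / ethnic$total * 100, 2)

print(ethnic)
```

Using the group size as the denominator shows that the Balinese have a slightly higher all-correct rate than the Javanese, even though the Javanese count is far larger.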

4 Conclusions

Respondents with more education and those living in urban areas more often answered all of the basic numeracy questions correctly. However, the share of people who answered every question correctly is low (13.09%), and this needs attention.

How Good Indonesia’s Cognitive Ability?

Aji Putera Tanumihardja

1/27/2020