df <- read.csv("E:/Binus University/Semester 2/Data Mining and Visualization/bike_buyers.csv")
dim(df)
## [1] 1000 13
EXPLANATION The bike buyers data set consists of 1000 rows and 13 columns.
str(df)
## 'data.frame': 1000 obs. of 13 variables:
## $ ï..ID : int 12496 24107 14177 24381 25597 13507 27974 19364 22155 19280 ...
## $ Marital.Status : chr "Married" "Married" "Married" "Single" ...
## $ Gender : chr "Female" "Male" "Male" "" ...
## $ Income : int 40000 30000 80000 70000 30000 10000 160000 40000 20000 NA ...
## $ Children : int 1 3 5 0 0 2 2 1 2 2 ...
## $ Education : chr "Bachelors" "Partial College" "Partial College" "Bachelors" ...
## $ Occupation : chr "Skilled Manual" "Clerical" "Professional" "Professional" ...
## $ Home.Owner : chr "Yes" "Yes" "No" "Yes" ...
## $ Cars : int 0 1 2 1 0 0 4 0 2 1 ...
## $ Commute.Distance: chr "0-1 Miles" "0-1 Miles" "2-5 Miles" "5-10 Miles" ...
## $ Region : chr "Europe" "Europe" "Europe" "Pacific" ...
## $ Age : int 42 43 60 41 36 50 33 43 58 NA ...
## $ Purchased.Bike : chr "No" "No" "No" "Yes" ...
EXPLANATION ID : bike buyer ID, integer type
Marital.Status : marital status of the bike buyer, character type
Gender : gender of the bike buyer, character type
Income : income of the bike buyer, integer type
Children : number of children the bike buyer has, integer type
Education : educational background of the bike buyer, character type
Occupation : occupation of the bike buyer, character type
Home.Owner : whether the bike buyer owns a home, character type
Cars : number of cars the bike buyer owns, integer type
Commute.Distance : distance between the bike buyer's home and workplace, character type
Region : region where the bike buyer lives, character type
Age : age of the bike buyer, integer type
Purchased.Bike : whether the bike buyer actually purchased a bike, character type
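The `ï..ID` column name and the empty string seen in Gender both stem from how the file was imported. The following is a minimal sketch of import options that would likely avoid them, assuming the CSV begins with a UTF-8 byte-order mark (the usual cause of an `ï..` prefix). The toy file and the name `demo` are hypothetical and stand in for the real data.

```r
# Write a tiny CSV that starts with a UTF-8 byte-order mark (BOM),
# mimicking the assumed cause of the "ï..ID" column name.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("ID,Gender,Income\n12496,Female,40000\n24381,,70000\n19280,Male,\n"), con)
close(con)

# fileEncoding = "UTF-8-BOM" drops the BOM on read; na.strings = c("", "NA")
# turns blank character fields into NA instead of empty strings.
demo <- read.csv(tmp, fileEncoding = "UTF-8-BOM", na.strings = c("", "NA"))
names(demo)
colSums(is.na(demo))
```

With blanks stored as NA, functions such as `is.na()` and `table(useNA = "ifany")` treat character and numeric columns consistently.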
BasicSummary <- function(df, dgts = 3){
## #
## ################################################################
## #
## # Create a basic summary of variables in the data frame df,
## # a data frame with one row for each column of df giving the
## # variable name, type, number of unique levels, the most
## # frequent level, its frequency and corresponding fraction of
## # records, the number of missing values and its corresponding
## # fraction of records
## #
## ################################################################
## #
m <- ncol(df)
varNames <- colnames(df)
varType <- vector("character",m)
topLevel <- vector("character",m)
topCount <- vector("numeric",m)
missCount <- vector("numeric",m)
levels <- vector("numeric", m)
for (i in 1:m){
x <- df[,i]
varType[i] <- class(x)
xtab <- table(x, useNA = "ifany")
levels[i] <- length(xtab)
nums <- as.numeric(xtab)
maxnum <- max(nums)
topCount[i] <- maxnum
maxIndex <- which.max(nums)
lvls <- names(xtab)
topLevel[i] <- lvls[maxIndex]
missIndex <- which((is.na(x)) | (x == "") | (x == " "))
missCount[i] <- length(missIndex)
}
n <- nrow(df)
topFrac <- round(topCount/n, digits = dgts)
missFrac <- round(missCount/n, digits = dgts)
## #
summaryFrame <- data.frame(variable = varNames, type = varType,
levels = levels, topLevel = topLevel,
topCount = topCount, topFrac = topFrac,
missFreq = missCount, missFrac = missFrac)
return(summaryFrame)
}
BasicSummary(df)
## variable type levels topLevel topCount topFrac missFreq
## 1 ï..ID integer 1000 11000 1 0.001 0
## 2 Marital.Status character 3 Married 535 0.535 7
## 3 Gender character 3 Male 500 0.500 11
## 4 Income integer 17 60000 165 0.165 6
## 5 Children integer 7 0 274 0.274 8
## 6 Education character 5 Bachelors 306 0.306 0
## 7 Occupation character 5 Professional 276 0.276 0
## 8 Home.Owner character 3 Yes 682 0.682 4
## 9 Cars integer 6 2 342 0.342 9
## 10 Commute.Distance character 5 0-1 Miles 366 0.366 0
## 11 Region character 3 North America 508 0.508 0
## 12 Age integer 54 40 40 0.040 8
## 13 Purchased.Bike character 2 No 519 0.519 0
## missFrac
## 1 0.000
## 2 0.007
## 3 0.011
## 4 0.006
## 5 0.008
## 6 0.000
## 7 0.000
## 8 0.004
## 9 0.009
## 10 0.000
## 11 0.000
## 12 0.008
## 13 0.000
EXPLANATION ID : has 1000 unique values, so every buyer ID is distinct. Because each value occurs exactly once, the reported top level (11000, frequency 1, fraction 0.001) is simply the first value in sorted order. Integer type, no missing values.
Marital.Status : has 3 unique values (Married, Single, and a blank entry). "Married" is the most frequent, occurring 535 times (fraction 0.535). Character type, with 7 missing values (fraction 0.007).
Gender : has 3 unique values (Male, Female, and a blank entry). "Male" is the most frequent, occurring 500 times (fraction 0.500). Character type, with 11 missing values (fraction 0.011).
Income : has 17 unique values (16 distinct incomes plus NA). 60000 is the most frequent, occurring 165 times (fraction 0.165). Integer type, with 6 missing values (fraction 0.006).
Children : has 7 unique values (0 to 5 plus NA). 0 is the most frequent, occurring 274 times (fraction 0.274). Integer type, with 8 missing values (fraction 0.008).
Education : has 5 unique values. "Bachelors" is the most frequent, occurring 306 times (fraction 0.306). Character type, no missing values.
Occupation : has 5 unique values. "Professional" is the most frequent, occurring 276 times (fraction 0.276). Character type, no missing values.
Home.Owner : has 3 unique values (Yes, No, and a blank entry). "Yes" is the most frequent, occurring 682 times (fraction 0.682). Character type, with 4 missing values (fraction 0.004).
Cars : has 6 unique values (0 to 4 plus NA). 2 is the most frequent, occurring 342 times (fraction 0.342). Integer type, with 9 missing values (fraction 0.009).
Commute.Distance : has 5 unique values. "0-1 Miles" is the most frequent, occurring 366 times (fraction 0.366). Character type, no missing values.
Region : has 3 unique values. "North America" is the most frequent, occurring 508 times (fraction 0.508). Character type, no missing values.
Age : has 54 unique values (53 distinct ages plus NA). 40 is the most frequent, occurring 40 times (fraction 0.040). Integer type, with 8 missing values (fraction 0.008).
Purchased.Bike : has 2 unique values, Yes and No. "No" is the most frequent, occurring 519 times (fraction 0.519). Character type, no missing values.
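BasicSummary counts NA and blank strings both as a level and as missing values, which is why its `levels` column can be one higher than the number of genuine categories. A small sketch of the same missing-value rule applied column by column is shown here; the helper name `count_missing` and the toy data frame are assumptions made for illustration.

```r
# A value is treated as missing if it is NA, "" or " ",
# the same rule BasicSummary applies inside its loop.
count_missing <- function(x) {
  sum(is.na(x) | x == "" | x == " ")
}

toy <- data.frame(
  Gender = c("Female", "Male", "", "Male"),
  Income = c(40000, NA, 80000, 70000),
  stringsAsFactors = FALSE
)

# Missing values per column
sapply(toy, count_missing)

# The blank string and the NA each appear as an extra level in the table
length(table(toy$Gender, useNA = "ifany"))
length(table(toy$Income, useNA = "ifany"))
```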
# Compute the mean of each column
sapply(df[, c(1,4,5,9,12)], mean, na.rm=TRUE)
## ï..ID Income Children Cars Age
## 19965.992000 56267.605634 1.910282 1.455096 44.181452
# Compute quartiles
sapply(df[, c(1,4,5,9,12)], quantile, na.rm=TRUE)
## ï..ID Income Children Cars Age
## 0% 11000.00 10000 0 0 25
## 25% 15290.75 30000 0 1 35
## 50% 19744.00 60000 2 1 43
## 75% 24470.75 70000 3 2 52
## 100% 29447.00 170000 5 4 89
EXPLANATION:
1. At least 25% of bike buyers have no children (Q1 of Children = 0), and at least 25% own at most one car (Q1 of Cars = 1).
2. 75% of incomes are at or below 70000 (Q3 of Income), well under 100000.
3. Quartile 1: Income = 30000, Children = 0, Cars = 1, Age = 35.
4. Quartile 2 (median): Income = 60000, Children = 2, Cars = 1, Age = 43.
5. Quartile 3: Income = 70000, Children = 3, Cars = 2, Age = 52.
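The quartile reading for Children can be checked without the original file. The sketch below rebuilds the non-missing Children values from the frequencies that `describe(df)` reports for this column (274, 169, 209, 133, 126, 81); the vector name `children` is hypothetical.

```r
# Non-missing Children values rebuilt from the reported frequencies
children <- rep(0:5, times = c(274, 169, 209, 133, 126, 81))

# Quartiles of the rebuilt vector (default quantile type)
quantile(children)

# Share of buyers with no children; Q1 = 0 only guarantees this is at least 25%
mean(children == 0)
```

A quartile of 0 says that at least a quarter of the values are 0; the exact share comes from the proportion, not from the quartile itself.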
library(Hmisc)
## Warning: package 'Hmisc' was built under R version 4.1.3
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.1.3
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
describe(df)
## df
##
## 13 Variables 1000 Observations
## --------------------------------------------------------------------------------
## ï..ID
## n missing distinct Info Mean Gmd .05 .10
## 1000 0 1000 1 19966 6176 11781 12627
## .25 .50 .75 .90 .95
## 15291 19744 24471 27544 28413
##
## lowest : 11000 11047 11061 11090 11116, highest: 29337 29355 29380 29424 29447
## --------------------------------------------------------------------------------
## Marital.Status
## n missing distinct
## 993 7 2
##
## Value Married Single
## Frequency 535 458
## Proportion 0.539 0.461
## --------------------------------------------------------------------------------
## Gender
## n missing distinct
## 989 11 2
##
## Value Female Male
## Frequency 489 500
## Proportion 0.494 0.506
## --------------------------------------------------------------------------------
## Income
## n missing distinct Info Mean Gmd .05 .10
## 994 6 16 0.986 56268 34273 10000 20000
## .25 .50 .75 .90 .95
## 30000 60000 70000 100000 120000
##
## lowest : 10000 20000 30000 40000 50000, highest: 120000 130000 150000 160000 170000
##
## Value 10000 20000 30000 40000 50000 60000 70000 80000 90000
## Frequency 73 74 134 153 40 165 123 90 38
## Proportion 0.073 0.074 0.135 0.154 0.040 0.166 0.124 0.091 0.038
##
## Value 100000 110000 120000 130000 150000 160000 170000
## Frequency 29 16 17 32 4 3 3
## Proportion 0.029 0.016 0.017 0.032 0.004 0.003 0.003
## --------------------------------------------------------------------------------
## Children
## n missing distinct Info Mean Gmd
## 992 8 6 0.96 1.91 1.827
##
## lowest : 0 1 2 3 4, highest: 1 2 3 4 5
##
## Value 0 1 2 3 4 5
## Frequency 274 169 209 133 126 81
## Proportion 0.276 0.170 0.211 0.134 0.127 0.082
## --------------------------------------------------------------------------------
## Education
## n missing distinct
## 1000 0 5
##
## lowest : Bachelors Graduate Degree High School Partial College Partial High School
## highest: Bachelors Graduate Degree High School Partial College Partial High School
##
## Value Bachelors Graduate Degree High School
## Frequency 306 174 179
## Proportion 0.306 0.174 0.179
##
## Value Partial College Partial High School
## Frequency 265 76
## Proportion 0.265 0.076
## --------------------------------------------------------------------------------
## Occupation
## n missing distinct
## 1000 0 5
##
## lowest : Clerical Management Manual Professional Skilled Manual
## highest: Clerical Management Manual Professional Skilled Manual
##
## Value Clerical Management Manual Professional
## Frequency 177 173 119 276
## Proportion 0.177 0.173 0.119 0.276
##
## Value Skilled Manual
## Frequency 255
## Proportion 0.255
## --------------------------------------------------------------------------------
## Home.Owner
## n missing distinct
## 996 4 2
##
## Value No Yes
## Frequency 314 682
## Proportion 0.315 0.685
## --------------------------------------------------------------------------------
## Cars
## n missing distinct Info Mean Gmd
## 991 9 5 0.925 1.455 1.226
##
## lowest : 0 1 2 3 4, highest: 0 1 2 3 4
##
## Value 0 1 2 3 4
## Frequency 238 267 342 85 59
## Proportion 0.240 0.269 0.345 0.086 0.060
## --------------------------------------------------------------------------------
## Commute.Distance
## n missing distinct
## 1000 0 5
##
## lowest : 0-1 Miles 1-2 Miles 10+ Miles 2-5 Miles 5-10 Miles
## highest: 0-1 Miles 1-2 Miles 10+ Miles 2-5 Miles 5-10 Miles
##
## Value 0-1 Miles 1-2 Miles 10+ Miles 2-5 Miles 5-10 Miles
## Frequency 366 169 111 162 192
## Proportion 0.366 0.169 0.111 0.162 0.192
## --------------------------------------------------------------------------------
## Region
## n missing distinct
## 1000 0 3
##
## Value Europe North America Pacific
## Frequency 300 508 192
## Proportion 0.300 0.508 0.192
## --------------------------------------------------------------------------------
## Age
## n missing distinct Info Mean Gmd .05 .10
## 992 8 53 0.999 44.18 12.85 28.00 30.00
## .25 .50 .75 .90 .95
## 35.00 43.00 52.00 60.90 65.45
##
## lowest : 25 26 27 28 29, highest: 73 74 78 80 89
## --------------------------------------------------------------------------------
## Purchased.Bike
## n missing distinct
## 1000 0 2
##
## Value No Yes
## Frequency 519 481
## Proportion 0.519 0.481
## --------------------------------------------------------------------------------
EXPLANATION: 13 variables, 1000 observations.
ID : mean 19966, 0 missing values, lowest: 11000, highest: 29447
Marital.Status : 7 missing values, 2 distinct values (Married and Single). Frequency: Married 535, Single 458. Proportion (frequency / n): Married 0.539, Single 0.461
Gender : 11 missing values, 2 distinct values (Female and Male). Frequency: Female 489, Male 500. Proportion: Female 0.494, Male 0.506
Income : 6 missing values, lowest: 10000, highest: 170000, mean: 56268
Children : 8 missing values, mean 1.91, lowest: 0, highest: 5
Education : 5 distinct values. describe() sorts character values alphabetically, so "lowest" (Bachelors) and "highest" (Partial High School) reflect alphabetical order, not education level
Occupation : 0 missing values, 5 distinct values. "lowest" (Clerical) and "highest" (Skilled Manual) are again alphabetical
Home.Owner : 4 missing values, 2 distinct values. Frequency: No 314, Yes 682. Proportion: No 0.315, Yes 0.685
Region : 3 distinct values. Frequency: Europe 300, North America 508, Pacific 192. Proportion: Europe 0.300, North America 0.508, Pacific 0.192
Cars : mean 1.455, lowest: 0, highest: 4, 9 missing values, 5 distinct values
Commute.Distance : 5 distinct values, listed alphabetically from "0-1 Miles" to "5-10 Miles"; the most frequent is "0-1 Miles" (366)
Age : mean 44.18, 8 missing values, lowest: 25, highest: 89
Purchased.Bike : Frequency: No 519, Yes 481. Proportion: No 0.519, Yes 0.481
An outlier is an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data.
library(car)
## Warning: package 'car' was built under R version 4.1.3
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.1.3
qqPlot(df$Income)
## [1] 13 44
EXPLANATION The Q-Q plot of df$Income shows that the data are not normally distributed.
library(car)
qqPlot(df$Children)
## [1] 3 13
EXPLANATION The Q-Q plot of df$Children shows that the data are not normally distributed.
library(car)
qqPlot(df$Age)
## [1] 376 402
EXPLANATION The Q-Q plot of df$Age shows that the data are close to normally distributed.
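A Q-Q plot is a visual check, so a numeric complement can help when the picture is ambiguous. The sketch below is not part of the original analysis: it uses simulated data rather than the bike buyers file, and the names `skewed`, `bell` and `qq_cor` are hypothetical. It compares the correlation of the Q-Q points (closer to 1 means closer to a straight line) and runs a Shapiro-Wilk test from base R.

```r
set.seed(1)
skewed <- rexp(500)   # right-skewed sample
bell   <- rnorm(500)  # sample drawn from a normal distribution

# Correlation between theoretical and sample quantiles of the Q-Q plot;
# plot.it = FALSE returns the coordinates without drawing anything.
qq_cor <- function(x) {
  q <- qqnorm(x, plot.it = FALSE)
  cor(q$x, q$y)
}
r_skewed <- qq_cor(skewed)
r_bell   <- qq_cor(bell)
c(skewed = r_skewed, bell = r_bell)

# Shapiro-Wilk test: a small p-value is evidence against normality
shapiro.test(skewed)$p.value
shapiro.test(bell)$p.value
```

For discrete variables with few distinct values, such as Children, any normality test will reject easily, so the plot and the test are best read together.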
out <- boxplot.stats(df$Income)$out
boxplot(df$Income,
ylab = "",
main = "Income"
)
mtext(paste("Outliers: ", paste(out, collapse = ", ")))
EXPLANATION The boxplot above shows
that the Income variable contains several outliers.
out <- boxplot.stats(df$Children)$out
boxplot(df$Children,
ylab = "",
main = "Children"
)
mtext(paste("Outliers: ", paste(out, collapse = ", ")))
EXPLANATION The boxplot above shows
that the Children variable has no outliers.
out <- boxplot.stats(df$Cars)$out
boxplot(df$Cars,
ylab = "",
main = "Cars"
)
mtext(paste("Outliers: ", paste(out, collapse = ", ")))
EXPLANATION The boxplot above shows
that the Cars variable has outliers.
out <- boxplot.stats(df$Age)$out
boxplot(df$Age,
ylab = "",
main = "Age"
)
mtext(paste("Outliers: ", paste(out, collapse = ", ")))
EXPLANATION The boxplot above shows that the Age variable has outliers.
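The values printed by `boxplot.stats(...)$out` are the points lying more than 1.5 times the hinge spread (the default `coef`) beyond the lower or upper hinge. A small sketch on a toy vector, which is not taken from the data set, shows the rule and reproduces it by hand; the names `s`, `hinges` and `fences` are hypothetical.

```r
x <- c(10, 20, 30, 40, 50, 60, 70, 200)

s <- boxplot.stats(x)
s$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out    # points beyond the whiskers

# The same fences computed by hand from Tukey's hinges
hinges <- fivenum(x)[c(2, 4)]
fences <- hinges + c(-1.5, 1.5) * diff(hinges)
x[x < fences[1] | x > fences[2]]
```

Here the hinges are 25 and 65, so the fences are -35 and 125 and only the value 200 is reported.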
ThreeSigma <- function(x, t = 3){
mu <- mean(x, na.rm = TRUE)
sig <- sd(x, na.rm = TRUE)
if (sig == 0){
message("All non-missing x-values are identical")
}
up <- mu + t * sig
down <- mu - t * sig
out <- list(up = up, down = down)
return(out)
}
Hampel <- function(x, t = 3){
mu <- median(x, na.rm = TRUE)
sig <- mad(x, na.rm = TRUE)
if (sig == 0){
message("Hampel identifier implosion: MAD scale estimate is zero")
}
up <- mu + t * sig
down <- mu - t * sig
out <- list(up = up, down = down)
return(out)
}
BoxplotRule<- function(x, t = 1.5){
xL <- quantile(x, na.rm = TRUE, probs = 0.25, names = FALSE)
xU <- quantile(x, na.rm = TRUE, probs = 0.75, names = FALSE)
Q <- xU - xL
if (Q == 0){
message("Boxplot rule implosion: interquartile distance is zero")
}
up <- xU + t * Q
# The lower fence is measured from the first quartile (xL), not the third
down <- xL - t * Q
out <- list(up = up, down = down)
return(out)
}
ExtractDetails <- function(x, down, up){
outClass <- rep("N", length(x))
indexLo <- which(x < down)
indexHi <- which(x > up)
outClass[indexLo] <- "L"
outClass[indexHi] <- "U"
index <- union(indexLo, indexHi)
values <- x[index]
outClass <- outClass[index]
nOut <- length(index)
maxNom <- max(x[which(x <= up)])
minNom <- min(x[which(x >= down)])
outList <- list(nOut = nOut, lowLim = down,
upLim = up, minNom = minNom,
maxNom = maxNom, index = index,
values = values,
outClass = outClass)
return(outList)
}
FindOutliers <- function(x, t3 = 3, tH = 3, tb = 1.5){
threeLims <- ThreeSigma(x, t = t3)
HampLims <- Hampel(x, t = tH)
boxLims <- BoxplotRule(x, t = tb)
n <- length(x)
nMiss <- length(which(is.na(x)))
threeList <- ExtractDetails(x, threeLims$down, threeLims$up)
HampList <- ExtractDetails(x, HampLims$down, HampLims$up)
boxList <- ExtractDetails(x, boxLims$down, boxLims$up)
sumFrame <- data.frame(method = "ThreeSigma", n = n,
nMiss = nMiss, nOut = threeList$nOut,
lowLim = threeList$lowLim,
upLim = threeList$upLim,
minNom = threeList$minNom,
maxNom = threeList$maxNom)
upFrame <- data.frame(method = "Hampel", n = n,
nMiss = nMiss, nOut = HampList$nOut,
lowLim = HampList$lowLim,
upLim = HampList$upLim,
minNom = HampList$minNom,
maxNom = HampList$maxNom)
sumFrame <- rbind.data.frame(sumFrame, upFrame)
upFrame <- data.frame(method = "BoxplotRule", n = n,
nMiss = nMiss, nOut = boxList$nOut,
lowLim = boxList$lowLim,
upLim = boxList$upLim,
minNom = boxList$minNom,
maxNom = boxList$maxNom)
sumFrame <- rbind.data.frame(sumFrame, upFrame)
threeFrame <- data.frame(index = threeList$index,
values = threeList$values,
type = threeList$outClass)
HampFrame <- data.frame(index = HampList$index,
values = HampList$values,
type = HampList$outClass)
boxFrame <- data.frame(index = boxList$index,
values = boxList$values,
type = boxList$outClass)
outList <- list(summary = sumFrame, threeSigma = threeFrame,
Hampel = HampFrame, boxplotRule = boxFrame)
return(outList)
}
fullSummary <- FindOutliers(df$Income)
fullSummary$summary
## method n nMiss nOut lowLim upLim minNom maxNom
## 1 ThreeSigma 1000 6 10 -36935.85 149471.1 10000 130000
## 2 Hampel 1000 6 10 -28956.00 148956.0 10000 130000
## 3 BoxplotRule 1000 6 10 10000.00 130000.0 10000 130000
EXPLANATION All three outlier detection methods give the same result, 10 outliers. These are the incomes above 130000 (150000, 160000 and 170000), so df$Income can be taken to contain 10 outliers.
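The three rules do not always agree as they do for Income. The following compact sketch applies the same three limits (mean +/- 3 sd, median +/- 3 MAD, and quartiles +/- 1.5 IQR) to a toy vector that is not taken from the data set; the helper `flagged` and the other names are hypothetical.

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 50)

# Three-sigma rule: mean +/- 3 standard deviations
three <- mean(x) + c(-3, 3) * sd(x)

# Hampel identifier: median +/- 3 MAD (mad() includes the 1.4826 scale factor)
hampel <- median(x) + c(-3, 3) * mad(x)

# Boxplot rule: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR
q <- quantile(x, c(0.25, 0.75), names = FALSE)
box <- c(q[1] - 1.5 * diff(q), q[2] + 1.5 * diff(q))

# Values falling outside a pair of limits
flagged <- function(lims) x[x < lims[1] | x > lims[2]]
list(threeSigma = flagged(three),
     Hampel     = flagged(hampel),
     boxplot    = flagged(box))
```

In this example the value 50 inflates the mean and standard deviation enough that the three-sigma rule does not flag it, while the median-based and quartile-based rules do. This is one reason to compare several rules rather than rely on a single count.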
fullSummary <- FindOutliers(df$Children)
fullSummary$summary
## method n nMiss nOut lowLim upLim minNom maxNom
## 1 ThreeSigma 1000 8 0 -2.970448 6.791013 0 5
## 2 Hampel 1000 8 0 -2.447800 6.447800 0 5
## 3 BoxplotRule 1000 8 0 -1.500000 7.500000 0 5
EXPLANATION All three outlier detection methods give the same result, 0 outliers, so df$Children has no outliers.
fullSummary <- FindOutliers(df$Cars)
fullSummary$summary
## method n nMiss nOut lowLim upLim minNom maxNom
## 1 ThreeSigma 1000 9 0 -1.91017 4.820362 0 4
## 2 Hampel 1000 9 0 -3.44780 5.447800 0 4
## 3 BoxplotRule 1000 9 297 0.50000 3.500000 1 3
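EXPLANATION The ThreeSigma and Hampel methods both detect 0 outliers, whereas BoxplotRule reports 297. The BoxplotRule lower limit printed above (0.5) equals Q3 - 1.5 * IQR = 2 - 1.5; the conventional lower fence is Q1 - 1.5 * IQR = 1 - 1.5 = -0.5. The count of 297 therefore includes the 238 records with 0 cars. Under the conventional fences only the 59 records with 4 cars lie above the upper limit of 3.5, which is what the boxplot of Cars shows. Because two of the three methods detect 0 outliers, df$Cars is treated here as having no outliers.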
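The BoxplotRule count for Cars can be checked from the frequency table that `describe(df)` prints for this column (238, 267, 342, 85, 59 for 0 to 4 cars). The sketch below rebuilds the non-missing values and compares two lower limits: the conventional Q1-based fence and a Q3-based limit that reproduces the printed 0.5. Variable names are hypothetical.

```r
# Non-missing Cars values rebuilt from the describe(df) frequencies
cars <- rep(0:4, times = c(238, 267, 342, 85, 59))

q   <- quantile(cars, c(0.25, 0.75), names = FALSE)  # Q1 = 1, Q3 = 2
iqr <- diff(q)

upper    <- q[2] + 1.5 * iqr  # 3.5
lower_q1 <- q[1] - 1.5 * iqr  # -0.5, conventional lower fence
lower_q3 <- q[2] - 1.5 * iqr  # 0.5, matches the printed lowLim

sum(cars < lower_q1 | cars > upper)  # 59: only the records with 4 cars
sum(cars < lower_q3 | cars > upper)  # 297: also counts the 238 records with 0 cars
```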
fullSummary <- FindOutliers(df$Age)
fullSummary$summary
## method n nMiss nOut lowLim upLim minNom maxNom
## 1 ThreeSigma 1000 8 2 10.09543 78.26747 25 78
## 2 Hampel 1000 8 2 7.41760 78.58240 25 78
## 3 BoxplotRule 1000 8 25 26.50000 77.50000 27 74
EXPLANATION The ThreeSigma and Hampel methods both detect 2 outliers, whereas BoxplotRule reports 25. The BoxplotRule lower limit printed above (26.5) equals Q3 - 1.5 * IQR = 52 - 25.5; the conventional lower fence is Q1 - 1.5 * IQR = 35 - 25.5 = 9.5, so the count of 25 includes buyers aged 25 and 26 as low outliers. Because two of the three methods detect 2 outliers, df$Age is taken to have 2 outliers.
count <- table(df$Cars, df$Purchased.Bike)
count
##
## No Yes
## 0 91 147
## 1 115 152
## 2 218 124
## 3 52 33
## 4 38 21
The fewer cars a buyer owns, the higher the share who purchased a bike: 61.8% with 0 cars and 56.9% with 1 car, versus roughly 36% to 39% with 2 or more cars.
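Raw counts are hard to compare because the groups differ in size, so row proportions make the pattern easier to read. The sketch below re-enters the counts printed above as a matrix (so it runs without the original file) and uses `prop.table()` with `margin = 1` to get the share of "Yes" within each number of cars.

```r
# Counts copied from the Cars x Purchased.Bike table above
count <- matrix(c(91, 115, 218, 52, 38,    # No
                  147, 152, 124, 33, 21),  # Yes
                ncol = 2,
                dimnames = list(Cars = as.character(0:4),
                                Purchased.Bike = c("No", "Yes")))

# Row proportions: each row sums to 1
round(prop.table(count, margin = 1), 3)
```

On the real data the same result would come from `prop.table(table(df$Cars, df$Purchased.Bike), margin = 1)`.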
count <- table(df$Income, df$Purchased.Bike)
count
##
## No Yes
## 10000 45 28
## 20000 43 31
## 30000 81 53
## 40000 64 89
## 50000 20 20
## 60000 84 81
## 70000 58 65
## 80000 56 34
## 90000 14 24
## 100000 18 11
## 110000 8 8
## 120000 8 9
## 130000 17 15
## 150000 1 3
## 160000 0 3
## 170000 2 1
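The share who purchased a bike tends to be higher at higher incomes, but the pattern is not monotonic: it is about 38% to 42% for incomes of 10000 to 30000, 58.2% at 40000, and 37.8% at 80000. The bands above 130000 contain only 3 or 4 records each, too few to read reliably.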
count <- table(df$Children, df$Purchased.Bike)
count
##
## No Yes
## 0 135 139
## 1 72 97
## 2 112 97
## 3 61 72
## 4 72 54
## 5 63 18
Buyers with fewer children tend to purchase more often: the share is between 46% and 57% for 0 to 3 children, 42.9% for 4 children, and only 22.2% for 5 children.
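Whether a pattern like this could be due to chance can be checked with a chi-square test of independence. This test is not part of the original analysis; the sketch below is one possible follow-up, using the counts printed above re-entered as a matrix.

```r
# Counts copied from the Children x Purchased.Bike table above
count <- matrix(c(135, 72, 112, 61, 72, 63,   # No
                  139, 97, 97, 72, 54, 18),   # Yes
                ncol = 2,
                dimnames = list(Children = as.character(0:5),
                                Purchased.Bike = c("No", "Yes")))

# Share of purchasers within each number of children
round(prop.table(count, margin = 1)[, "Yes"], 3)

# Chi-square test of independence between Children and Purchased.Bike
test <- chisq.test(count)
test$statistic
test$parameter  # degrees of freedom
test$p.value
```

A small p-value indicates that purchase rates differ across the Children groups, but it does not say which groups differ or in which direction; the row proportions carry that information.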
# Create the layout
nf <- layout(matrix(c(1,1,2,3), nrow=2, byrow=TRUE))
# Fill with plots
mosaicplot(Income ~ Purchased.Bike, data = df, main = "", las = 1, shade = TRUE)
# Scatterplot
plot(df$Cars, df$Income)
#Boxplot
boxplot(Children ~ Purchased.Bike, data = df, xlab = "Purchased.Bike", ylab = "Children")
matrix(c(1,1,2,3), nrow=2, byrow=TRUE) creates a matrix with 2 rows and 2 columns.
Both cells of the first row are assigned to the first chart, so it spans the full width;
the second row holds chart 2 on the left and chart 3 on the right.
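The effect of `byrow` on the panel arrangement is easiest to see by printing the layout matrix itself. A short sketch with hypothetical names `by_row` and `by_col`:

```r
# Filled row by row: chart 1 occupies the whole top row
by_row <- matrix(c(1, 1, 2, 3), nrow = 2, byrow = TRUE)
by_row

# Filled column by column (the default): chart 1 occupies the whole left column
by_col <- matrix(c(1, 1, 2, 3), nrow = 2)
by_col
```

Passing `by_row` to `layout()` gives one wide panel on top and two panels below, which is the arrangement used for the three charts here.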
Mosaic plots describe the relationship between two categorical variables. Essentially, these plots are graphical representations of contingency tables that tell us how many times the values of two categorical variables occur together in a dataset.
EXPLANATION:
1. Higher-income groups tend to have a higher share of bike purchases, although the trend is not monotonic.
2. The fewer cars a buyer owns, the higher the share who purchased a bike.
3. The fewer children a buyer has, the higher the share who purchased a bike.