Previously on STAT 412:
Data cleaning is crucial since it helps make sure that the information we’re working with is accurate and reliable. When we clean data, we fix mistakes and make sure everything is consistent and complete. This means we can trust the data more when we’re making decisions or analyzing it. It also saves time because we don’t have to deal with confusing or incorrect information.
What are the steps for cleaning?
REMINDER: The main rule in data cleaning is to find the problems before you start fixing them. This step is called inspection: you need to understand what is wrong with the data before you change anything.
To tidy the data, we prefer to use the package “stringr”. The “stringr” package is used to work with text data in the R programming language. It helps to do different things with text, like cutting it into pieces, joining it together, searching for specific parts, and changing how it looks. This package is very handy for tasks like cleaning up messy text or analyzing text-based data.
str_to_upper(): converts to upper case.
str_to_lower(): converts to lower case.
str_to_title(): converts to title case, where only the first letter of each word is capitalized.
str_to_sentence(): converts to sentence case, where only the first letter of the sentence is capitalized.
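The calls behind the output that follows are not shown. A minimal sketch that would print the same lines, assuming the input is the sentence on the first output line (the variable name `lyric` is hypothetical):

```r
library(stringr)

# Assumed input: the sentence printed on the first output line
lyric <- "Shine bright like a diamond"

lyric
str_to_upper(lyric)     # every letter in upper case
str_to_lower(lyric)     # every letter in lower case
str_to_title(lyric)     # first letter of each word capitalized
str_to_sentence(lyric)  # only the first letter of the sentence capitalized
```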
## [1] "Shine bright like a diamond"
## [1] "SHINE BRIGHT LIKE A DIAMOND"
## [1] "shine bright like a diamond"
## [1] "Shine Bright Like A Diamond"
## [1] "Shine bright like a diamond"
str_c(): combines multiple character vectors into a single character vector.
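As a sketch of how a result like the one below could be built (the individual words and the variable names are assumptions), `str_c()` can join separate strings with `sep`, or collapse one vector with `collapse`:

```r
library(stringr)

# Join separate strings, placing a space between them
joined <- str_c("shine", "bright", "like", "a", "diamond", sep = " ")

# Collapse the elements of one character vector into a single string
words <- c("shine", "bright", "like", "a", "diamond")
collapsed <- str_c(words, collapse = " ")

joined
collapsed
```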
## [1] "shine bright like a diamond"
str_detect(): returns a logical vector with TRUE for each element of string that matches pattern and FALSE otherwise.
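A possible reconstruction of the calls behind the output below. The pattern `"e"` reproduces the first logical vector; the pattern for the second one is not shown in the notes, so `"[eo]"` is only a guess that happens to give the same result:

```r
library(stringr)

x <- c("hello", "darkness", "my", "old", "friend")

# TRUE where the element contains the letter "e"
has_e <- str_detect(x, "e")

# Assumed pattern: TRUE where the element contains "e" or "o"
has_e_or_o <- str_detect(x, "[eo]")

x
has_e
has_e_or_o
```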
## [1] "hello" "darkness" "my" "old" "friend"
## [1] TRUE TRUE FALSE FALSE TRUE
## [1] TRUE TRUE FALSE TRUE TRUE
str_count(): counts the number of times pattern is found within each element of string.
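A short sketch of counting on the same vector. The character counts are obtained here with `str_length()`, and `"e"` is the pattern that matches the second row of counts below; the pattern behind the last row of counts cannot be recovered from the output, so it is not reproduced:

```r
library(stringr)

x <- c("hello", "darkness", "my", "old", "friend")

# Number of characters in each element
n_chars <- str_length(x)

# Number of times the letter "e" occurs in each element
n_e <- str_count(x, "e")

n_chars
n_e
```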
## [1] "hello" "darkness" "my" "old" "friend"
## [1] 5 8 2 3 6
## [1] 1 1 0 0 1
## [1] 1 2 1 0 1
str_dup(): duplicates the characters within a string.
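The two outputs below are consistent with the following sketch: a single number repeats every element the same number of times, while a vector of counts is matched element by element (variable names are hypothetical):

```r
library(stringr)

x <- c("nothing", "else", "matters")

# Repeat every element twice
twice <- str_dup(x, 2)

# Repeat the first element once, the second twice, the third three times
stepped <- str_dup(x, 1:3)

twice
stepped
```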
## [1] "nothing" "else" "matters"
## [1] "nothingnothing" "elseelse" "mattersmatters"
## [1] "nothing" "elseelse" "mattersmattersmatters"
str_sub(): extracts or replaces the substring between a start and an end position in each string.
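The string used in the notes is not shown, so the input below is purely an assumption chosen to end in " alive". The sketch shows extraction by positive and negative positions, and replacement through the assignment form:

```r
library(stringr)

# Hypothetical input string (13 characters)
x <- "staying alive"

# Characters 8 to 13, counted from the left
from_left <- str_sub(x, 8, 13)

# The last six characters, counted from the right
from_right <- str_sub(x, -6)

# The assignment form replaces a substring in place
y <- x
str_sub(y, 1, 7) <- "STAYING"

from_left
from_right
y
```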
## [1] " alive"
str_subset(): returns all elements of string where there’s at least one match to pattern.
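A minimal sketch for the output below. The pattern is an assumption; any pattern that occurs only in "good", such as `"o"`, gives the same result:

```r
library(stringr)

x <- c("I", "am", "feeling", "good")

# Keep only the elements that contain the letter "o" (assumed pattern)
kept <- str_subset(x, "o")

x
kept
```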
## [1] "I" "am" "feeling" "good"
## [1] "good"
str_locate(): returns the start and end position of the first match.
str_locate_all(): returns the start and end position of each match.
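The positions printed below are what a search for the letter "a" returns on this vector, so the calls were probably close to this sketch (variable names are hypothetical):

```r
library(stringr)

x <- c("Dream", "of", "Californication")

# One row per element: position of the first "a", or NA when there is none
first_a <- str_locate(x, "a")

# One matrix per element: positions of every "a"
all_a <- str_locate_all(x, "a")

first_a
all_a
```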
## [1] "Dream" "of" "Californication"
## start end
## [1,] 4 4
## [2,] NA NA
## [3,] 2 2
## [[1]]
## start end
## [1,] 4 4
##
## [[2]]
## start end
##
## [[3]]
## start end
## [1,] 2 2
## [2,] 11 11
str_replace(): replaces the first match.
str_replace_all(): replaces all matches.
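The difference between the two functions shows up on "moon" in the output below. A sketch that reproduces it, replacing "o" with "O":

```r
library(stringr)

x <- c("fly", "me", "to", "the", "moon")

# Only the first "o" in each element is replaced
first_only <- str_replace(x, "o", "O")

# Every "o" in each element is replaced
every_match <- str_replace_all(x, "o", "O")

first_only
every_match
```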
## [1] "fly" "me" "to" "the" "moon"
## [1] "fly" "me" "tO" "the" "mOon"
## [1] "fly" "me" "tO" "the" "mOOn"
str_remove(): removes the first match.
str_remove_all(): remove all matches.
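The output below is consistent with removing the letter "r", as in this sketch:

```r
library(stringr)

x <- c("it", "is", "like", "you", "are", "my", "mirror")

# Drop only the first "r" in each element
first_removed <- str_remove(x, "r")

# Drop every "r" in each element
all_removed <- str_remove_all(x, "r")

first_removed
all_removed
```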
## [1] "it" "is" "like" "you" "are" "my" "mirror"
## [1] "it" "is" "like" "you" "ae" "my" "miror"
## [1] "it" "is" "like" "you" "ae" "my" "mio"
str_trim(): removes whitespace from start and end of string.
str_squish(): removes whitespace at the start and end, and replaces all internal whitespace with a single space.
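The exact spacing of the original string is not visible in the printed output, so the input below is an assumption with extra spaces at both ends and between some words:

```r
library(stringr)

# Hypothetical input with leading, trailing and repeated internal spaces
x <- "   I am   an English man in   New York  "

# Only the leading and trailing whitespace is removed
trimmed <- str_trim(x)

# Leading and trailing whitespace is removed and internal runs shrink to one space
squished <- str_squish(x)

trimmed
squished
```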
## [1] " I am an English man in New York"
## [1] "I am an English man in New York"
## [1] "I am an English man in New York"
str_pad(): pads a string to a fixed width.
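A small sketch of padding on both sides. The target width used in the notes is not shown, so the width here (four characters more than the string) is an assumption:

```r
library(stringr)

x <- "I am an English man in New York"

# Pad with spaces on both sides until the string is 4 characters longer
padded <- str_pad(x, width = str_length(x) + 4, side = "both")

x
padded
```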
## [1] "I am an English man in New York"
## [1] " I am an English man in New York "
str_wrap(): wraps words into a paragraph, minimizing the “raggedness” of the lines (i.e. the variation in line length) using the Knuth-Plass algorithm.
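A sketch of wrapping the lyric shown below. The `width` value is an assumption, and the exact line breaks can differ between stringr versions, so no specific layout is claimed here:

```r
library(stringr)

x <- "It's a new dawn It's a new day It's a new life For me And I'm feeling good"

# Insert line breaks so that lines stay close to the target width (assumed value)
wrapped <- str_wrap(x, width = 16)

# cat() prints the embedded line breaks as separate lines
cat(wrapped, sep = "\n")
```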
## [1] "It's a new dawn It's a new day It's a new life For me And I'm feeling good"
## It's a new dawn
## It's a new day
## It's a new life
## For me And I'm
## feeling good
House Prices in the City of Windsor, Canada
Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.
Format: A data frame containing 546 observations on 12 variables.
Use the read.csv() function, which is designed for reading CSV files, to import the data into R.
| V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Details: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987. | |||||||||||
| A data frame containing 546 observations on 12 variables. | |||||||||||
| QQ.sale.price.of.House | LOTSIZE | BEdrooms | bathROOMS | stories\ | driveway_1:yes2:no | ReCreatioNaL | fuLLBase | gas heat | AIRCOND | garage?? | PReferR |
| 42000 | 5850 | 3 | 1 | 2 | yes | no | yes | no | no | 1 | no |
| 38500 | 4000 | two | 1 | 1 | 1 | no | n o | no | no | 0 | no |
| 49500 | 3060 | 3 | 1 | 1 | yes | no | n o | no | no | 0 | no |
| 60500 | 6650 | 3 | 1 | 2 | yes | yes | n o | no | no | 0 | no |
| 61000 | 6360 | two | 1 | 1 | yes | no | n o | no | no | 0 | no |
| 66000 | 4160 | 3 | 1 | 1 | yyy | yes | yes | no | yes | 0 | no |
| 66000 | 3880 | 3 | 2 | 2 | y | no | yes | no | no | 2 | no |
| 69000 | 4160 | 3 | 1 | 3 | yes | no | n o | no | no | 0 | no |
| 83800 | 4800 | 3 | 1 | 1 | yes | yes | yes | no | no | 0 | no |
| 88500$ | 5500 | 3 | 2 | 4 | yes | yes | n o | no | yes | 1 | no |
| 90000 | 7200 | 3 | 2 | 1 | yes | no | yes | no | yes | 3 | no |
| 30500 | 3000 | two | 1 | 1 | not | no | n o | no | no | 0 | no |
| 27000 | 1700 | 3 | 1 | 2 | yes | no | n o | no | no | 0 | no |
| 36000 | 2880 | 3 | 1 | 1 | 2 | no | n o | no | no | 0 | no |
| 37000 | 3600 | two | 1 | 1 | yes | no | n o | no | no | 0 | no |
| 37900 | 3185 | two | 1 | 1 | yes | no | n o | no | yes | 0 | no |
| 40500 | 3300 | 3 | 1 | 2 | not | no | n o | no | no | 1 | no |
| 40750 | 5200 | 4 | 1 | 3 | y | no | n o | no | no | 0 | no |
| 45000 | 3450 | one | 1 | 1 | yes | no | n o | no | no | 0 | no |
| 45000 | 3986 | two | 2 | 1 | not | yes | yes | no | no | 1 | no |
| 48500 | 4785 | 3 | 1 | 2 | yes | yes | yes | no | yes | 1 | no |
| 65900 | 4510 | 4 | 2 | 2 | 1 | no | yes | no | no | 0 | no |
| 37900 | 4000 | 3 | 1 | 2 | yes | no | n o | no | yes | 0 | no |
| 38000 | 3934 | two | 1 | 1 | yes | no | n o | no | no | 0 | no |
| 42000 | 4960 | two | 1 | 1 | yes | no | n o | no | no | 0 | no |
| 42300 | 3000 | two | 1 | 2 | yes | no | n o | no | no | 0 | no |
| 43500 | 3800 | two | 1 | 1 | 1 | no | n o | no | no | 0 | no |
| 44000 | 4960 | two | 1 | 1 | yes | no | yes | no | yes | 0 | no |
| 44500 | 3000 | 3 | 1 | 1 | 2 | no | n o | no | yes | 0 | no |
| 44900 | 4500 | 3 | 1 | 2 | yes | no | n o | no | yes | 0 | no |
| 45000 | 3500 | two | 1 | 1 | not | no | yes | no | no | 0 | no |
| 48000 | 3500 | 4 | 1 | 2 | yes | no | n o | no | yes | 2 | no |
| 49000 | 4000 | two | 1 | 1 | yes | no | n o | no | no | 0 | no |
| 51500 | 4500 | two | 1 | 1 | yes | no | n o | no | no | 0 | no |
| 61000 | 6360 | two | 1 | 2 | yes | no | n o | no | no | 0 | no |
| 61000 | 4500 | two | 1 | 1 | yes | no | n o | no | yes | 2 | no |
| -61700 | 4032 | two | 1 | 1 | yyy | no | yes | no | no | 0 | no |
| 67000 | 5170 | 3 | 1 | 4 | yes | no | n o | no | yes | 0 | no |
| 82000 | 5400 | 4 | 2 | 2 | yes | no | n o | no | yes | 2 | no |
| 54500 | 3150 | two | 2 | 1 | not | no | yes | no | no | 0 | no |
| 66500$ | 3745 | 3 | 1 | 2 | yes | no | yes | no | no | 0 | no |
| 70000 | 4520 | 3 | 1 | 2 | 1 | no | yes | no | yes | 0 | no |
| 82000 | 4640 | 4 | 1 | 2 | yes | no | n o | no | no | 1 | no |
| 92000 | 8580 | 5 | 3 | 2 | yes | no | n o | no | no | 2 | no |
| 38000 | 2000 | two | 1 | 2 | yes | no | n o | no | no | 0 | no |
| 44000 | 2160 | 3 | 1 | 2 | not | no | yes | no | no | 0 | no |
Examine the data set seen above and specify the problems.
The first problem arises in the first three rows, which hold descriptive text and the header rather than observations. Therefore, we skip the description rows when reading the data into R.
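The file name and the exact call are not given in the notes. The sketch below builds a small temporary file with the same kind of layout (two description lines, then the header and the data) and reads it with the `skip` argument of `read.csv()`. The number of lines to skip depends on the actual file, and the names `tmp` and `price_demo` are hypothetical:

```r
# Toy file that mimics the layout: description lines first, then header and data
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Details: Sales prices of houses sold in the city of Windsor",
  "A data frame containing 546 observations on 12 variables.",
  "price,LOTSIZE,BEdrooms",
  "42000,5850,3",
  "38500,4000,two"
), tmp)

# skip = 2 drops the description lines; the next line is read as the header
price_demo <- read.csv(tmp, skip = 2, header = TRUE)

price_demo
```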
| QQ.sale.price.of.House | LOTSIZE | BEdrooms | bathROOMS | stories.. | driveway_1.yes2.no | ReCreatioNaL | fuLLBase | gas..........heat | AIRCOND | garage.. | PReferR |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 42000 | 5850 | 3 | 1 | 2 | yes | no | yes | no | no | 1 | no |
| 38500 | 4000 | two | 1 | 1 | 1 | no | n o | no | no | 0 | no |
| 49500 | 3060 | 3 | 1 | 1 | yes | no | n o | no | no | 0 | no |
| 60500 | 6650 | 3 | 1 | 2 | yes | yes | n o | no | no | 0 | no |
| 61000 | 6360 | two | 1 | 1 | yes | no | n o | no | no | 0 | no |
| 66000 | 4160 | 3 | 1 | 1 | yyy | yes | yes | no | yes | 0 | no |
| 66000 | 3880 | 3 | 2 | 2 | y | no | yes | no | no | 2 | no |
| 69000 | 4160 | 3 | 1 | 3 | yes | no | n o | no | no | 0 | no |
| 83800 | 4800 | 3 | 1 | 1 | yes | yes | yes | no | no | 0 | no |
| 88500$ | 5500 | 3 | 2 | 4 | yes | yes | n o | no | yes | 1 | no |
It is good to check whether there is a missing value in our data set.
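A common way to count missing values is to sum the result of `is.na()`. The toy data frame below is an assumption standing in for the real data; note that mis-coded entries such as "n o" are not `NA`, so they are not counted by this check:

```r
# Toy data frame standing in for the house price data
demo <- data.frame(
  lotsize  = c(5850L, 4000L, 3060L),
  fullbase = c("yes", "n o", "n o")
)

# Total number of NA cells in the whole data frame
n_missing <- sum(is.na(demo))

# Number of NA cells per column
missing_by_column <- colSums(is.na(demo))

n_missing
missing_by_column
```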
## [1] 0
Fortunately, the data set contains no missing values.
There are issues with the column names. Let’s proceed with fixing them.
## [1] "QQ.sale.price.of.House" "LOTSIZE" "BEdrooms"
## [4] "bathROOMS" "stories.." "driveway_1.yes2.no"
## [7] "ReCreatioNaL" "fuLLBase" "gas..........heat"
## [10] "AIRCOND" "garage.." "PReferR"
One common and initial solution is to convert the column names to lowercase.
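A sketch of this step on a few of the original names. With the real data frame the same idea would be applied to `names(price)`; the vector `nm` is only an illustration:

```r
library(stringr)

# A few of the original column names
nm <- c("QQ.sale.price.of.House", "LOTSIZE", "BEdrooms",
        "bathROOMS", "ReCreatioNaL", "PReferR")

# Convert every name to lower case
nm_lower <- str_to_lower(nm)

nm_lower
```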
## [1] "qq.sale.price.of.house" "lotsize" "bedrooms"
## [4] "bathrooms" "stories.." "driveway_1.yes2.no"
## [7] "recreational" "fullbase" "gas..........heat"
## [10] "aircond" "garage.." "preferr"
Seems better…
Then, delete the unnecessary dots in the column names.
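One way to drop the dots is `str_remove_all()` with a literal dot. In a regular expression a bare `.` matches any character, so the sketch wraps it in `fixed()`; the vector `nm` is again only an illustration:

```r
library(stringr)

# Lower-case names that still contain dots
nm <- c("qq.sale.price.of.house", "stories..", "driveway_1.yes2.no",
        "gas..........heat", "garage..")

# fixed(".") matches a literal dot rather than "any character"
nm_clean <- str_remove_all(nm, fixed("."))

nm_clean
```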
## [1] "qqsalepriceofhouse" "lotsize" "bedrooms"
## [4] "bathrooms" "stories" "driveway_1yes2no"
## [7] "recreational" "fullbase" "gasheat"
## [10] "aircond" "garage" "preferr"
For the last variable, we can use the str_sub() function to obtain “prefer” instead of “preferr”. Additionally, the first and the sixth variable names are still problematic. Let’s fix them.
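A sketch of these renaming steps on a shortened name vector. How the notes renamed the first and sixth variables is not shown, so direct assignment and `str_sub()` are used here as assumptions:

```r
library(stringr)

# Shortened illustration of the cleaned-up names
nm <- c("qqsalepriceofhouse", "lotsize", "driveway_1yes2no", "preferr")

# Keep the first six characters: "preferr" becomes "prefer"
nm[4] <- str_sub(nm[4], 1, 6)

# Keep the first eight characters: "driveway_1yes2no" becomes "driveway"
nm[3] <- str_sub(nm[3], 1, 8)

# Replace the first name outright
nm[1] <- "houseprice"

nm
```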
## [1] "houseprice" "lotsize" "bedrooms" "bathrooms" "stories"
## [6] "driveway" "recreational" "fullbase" "gasheat" "aircond"
## [11] "garage" "prefer"
| houseprice | lotsize | bedrooms | bathrooms | stories | driveway | recreational | fullbase | gasheat | aircond | garage | prefer |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 42000 | 5850 | 3 | 1 | 2 | yes | no | yes | no | no | 1 | no |
| 38500 | 4000 | two | 1 | 1 | 1 | no | n o | no | no | 0 | no |
| 49500 | 3060 | 3 | 1 | 1 | yes | no | n o | no | no | 0 | no |
| 60500 | 6650 | 3 | 1 | 2 | yes | yes | n o | no | no | 0 | no |
| 61000 | 6360 | two | 1 | 1 | yes | no | n o | no | no | 0 | no |
| 66000 | 4160 | 3 | 1 | 1 | yyy | yes | yes | no | yes | 0 | no |
We have resolved the issues with the variable names. What about the observations?
Examine the class of each variable (houseprice, bedrooms, etc.).
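The output that follows has the shape produced by `summary()` and `str()`. A sketch on a toy data frame (an assumption standing in for `price`) shows how text entries such as "two" or "88500$" force a whole column to be read as character:

```r
# Toy data frame with the same kind of problems as the real data
demo <- data.frame(
  houseprice = c("42000", "88500$"),
  lotsize    = c(5850L, 5500L),
  bedrooms   = c("3", "two")
)

summary(demo)  # character columns report only length, class and mode
str(demo)      # compact view of each column's type and first values

# Class of every column in one named vector
col_classes <- sapply(demo, class)
col_classes
```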
## houseprice lotsize bedrooms bathrooms
## Length:546 Min. : 1650 Length:546 Min. :1.000
## Class :character 1st Qu.: 3600 Class :character 1st Qu.:1.000
## Mode :character Median : 4600 Mode :character Median :1.000
## Mean : 5150 Mean :1.286
## 3rd Qu.: 6360 3rd Qu.:2.000
## Max. :16200 Max. :4.000
## stories driveway recreational fullbase
## Min. :1.000 Length:546 Length:546 Length:546
## 1st Qu.:1.000 Class :character Class :character Class :character
## Median :2.000 Mode :character Mode :character Mode :character
## Mean :1.808
## 3rd Qu.:2.000
## Max. :4.000
## gasheat aircond garage prefer
## Length:546 Length:546 Min. :0.0000 Length:546
## Class :character Class :character 1st Qu.:0.0000 Class :character
## Mode :character Mode :character Median :0.0000 Mode :character
## Mean :0.6923
## 3rd Qu.:1.0000
## Max. :3.0000
## 'data.frame': 546 obs. of 12 variables:
## $ houseprice : chr "42000" "38500" "49500" "60500" ...
## $ lotsize : int 5850 4000 3060 6650 6360 4160 3880 4160 4800 5500 ...
## $ bedrooms : chr "3" "two" "3" "3" ...
## $ bathrooms : int 1 1 1 1 1 1 2 1 1 2 ...
## $ stories : int 2 1 1 2 1 1 2 3 1 4 ...
## $ driveway : chr "yes" "1" "yes" "yes" ...
## $ recreational: chr "no" "no" "no" "yes" ...
## $ fullbase : chr "yes" "n o" "n o" "n o" ...
## $ gasheat : chr "no" "no" "no" "no" ...
## $ aircond : chr "no" "no" "no" "no" ...
## $ garage : int 1 0 0 0 0 0 2 0 0 1 ...
## $ prefer : chr "no" "no" "no" "no" ...
Can the price be negative? Also, why is it defined as a character?
##
## -103000 -61700 100000 100500 101000 102000 103000 103500 104900 105000
## 1 1 1 1 2 1 1 1 1 4
## 106000 106500 107000 107500 108000 110000 112000 112500 113000 113750
## 3 1 1 1 2 2 1 1 1 1
## 114000 114900 115442 116000 117000 118500 120000 120900 122000 122500
## 1 1 1 1 1 1 5 1 1 1
## 123500 124000 125000 126500 127000 128000 130000 132000 133000 138300
## 2 1 1 1 1 1 2 2 1 1
## 140000 141000 145000 155000 163000 174500 175000 190000 25000 25245
## 2 1 2 1 1 1 2 1 3 1
## 26000 26500 27000 28000 30000 30500 31900 32000 32500 33000
## 1 1 2 1 3 1 1 1 3 1
## 33500 34000 34400 35000 35000$ 35500 36000 37000 37200 37900
## 1 3 1 5 1 2 3 3 1 2
## 38000 38500 39000 40000 40500 40750 41000 42000 42300 42500
## 7 1 2 2 3 1 4 8 1 1
## 42900 43000 43500 44000 44100 44500 44500$ 44555 44700 44900
## 1 7 1 4 1 2 1 1 1 1
## 45000 46000 46200 46500 47000 47000$ 47500 47600 47900 48000
## 9 4 1 2 7 1 2 1 1 8
## 48500 48900 49000 49500 49900 50000 50500 51000 51000$ 51500
## 3 1 6 3 1 17 1 3 1 2
## 51900 52000 52500 52900 53000 53500 53900 54000 54500 54800
## 1 9 4 2 5 1 3 6 1 1
## 55000 55500 56000 57000 57250 57500 58000 58500 58550 58900
## 7 2 7 5 2 3 5 3 1 1
## 59000 59500 59900 59900$ 60000 60500 61000 61100 61500 62000
## 2 3 1 1 17 2 6 1 2 5
## 62500 62600 62900 63000 63500 63900 64000 64500 64900 65000
## 1 1 3 2 1 3 5 4 2 7
## 65500 65900 66000 66500$ 67000 67900 68000 68100 68500 69000
## 2 1 5 1 6 1 3 1 2 4
## 69500 69900 70000 70100 70500 70800 71000 71500 71900 72000
## 1 2 12 1 1 1 2 1 1 4
## 72500 73000 73500 74500 74700 74900 75000 75000$ 75500 76000
## 1 4 2 3 1 1 8 1 1 1
## 76900 77000 77500 78000 78500 78900 79000 79500 80000 80750
## 1 1 1 4 2 1 3 2 9 1
## 82000 82500 82900 83000 83800 83900 84000 84900 85000 86000
## 6 1 1 3 1 2 2 1 8 3
## 86900 87000 87250 87500 88000 88500 88500$ 89000 89500 89900
## 2 3 1 1 2 2 1 2 1 1
## 90000 91500 91700 92000 92500 93000 94000 94500 94700 95000
## 5 1 1 2 2 3 1 2 1 6
## 95500 96000 96500 97000 98000 98500 99000
## 1 1 1 2 1 1 2
price[price$houseprice == -61700, ]$houseprice = 61700
price[price$houseprice == -103000, ]$houseprice = 103000
gsub(): replaces every match of a pattern in a character vector with a given string. Here it can be used to remove the trailing “$” from prices such as “88500$”.
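A sketch of cleaning the price column with `gsub()` on a toy vector. Treating the minus sign as a typing error follows the corrections above; using `abs()` for it is an assumption of this sketch. `fixed = TRUE` is needed because `$` otherwise means "end of string" in a regular expression:

```r
# Toy vector with the two kinds of problems seen in houseprice
houseprice <- c("42000", "88500$", "-61700")

# Remove the literal "$" character
no_dollar <- gsub("$", "", houseprice, fixed = TRUE)

# Convert to integer and drop the sign (assumes negative prices are typos)
clean_price <- abs(as.integer(no_dollar))

no_dollar
clean_price
```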
Why is “bedrooms” defined as a character variable although it represents numerical observations?
##
## 3 4 5 6 one two
## 301 95 10 2 2 136
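The words "one" and "two" are what keep this column from being numeric. The recoding step is not shown in the notes; one possible sketch uses `str_replace_all()` with a named vector of replacements on a toy vector:

```r
library(stringr)

# Toy vector mixing digits and number words
bedrooms <- c("3", "two", "4", "one")

# Each name is a pattern and each value is its replacement
bedrooms_digits <- str_replace_all(bedrooms, c("one" = "1", "two" = "2"))

# Now the conversion to integer succeeds without producing NA
bedrooms_int <- as.integer(bedrooms_digits)

bedrooms_digits
bedrooms_int
```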
What about driveway? Does it include 10 levels?
##
## 1 11 2 22 222 no not y yes yyy
## 13 1 12 1 1 50 13 2 449 4
price[price$driveway == 11, ]$driveway = 1
price[price$driveway == 22, ]$driveway = 2
price[price$driveway == 222, ]$driveway = 2
price[price$driveway == "no", ]$driveway = 2
price[price$driveway == "not", ]$driveway = 2
price[price$driveway == "yes", ]$driveway = 1
price[price$driveway == "yyy", ]$driveway = 1
price[price$driveway == "y", ]$driveway = 1
Let’s have a look at the variable fullbase. One of its levels, “no”, has a whitespace problem.
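Because the stray space sits inside the value ("n o"), `str_trim()` alone does not help. The fix used in the notes is not shown; a sketch on a toy vector that removes every whitespace character:

```r
library(stringr)

# Toy vector with the same problem as fullbase
fullbase <- c("yes", "n o", "n o", "yes")

# str_trim() only touches the ends, so "n o" is unchanged
trimmed_only <- str_trim(fullbase)

# Removing all whitespace turns "n o" into "no"
fullbase_fixed <- str_remove_all(fullbase, "\\s")

table(trimmed_only)
table(fullbase_fixed)
```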
##
## n o yes
## 355 191
##
## no yes
## 355 191
We have fixed certain errors, but it is still good to check the class of each variable.
## 'data.frame': 546 obs. of 12 variables:
## $ houseprice : chr "42000" "38500" "49500" "60500" ...
## $ lotsize : int 5850 4000 3060 6650 6360 4160 3880 4160 4800 5500 ...
## $ bedrooms : chr "3" "2" "3" "3" ...
## $ bathrooms : int 1 1 1 1 1 1 2 1 1 2 ...
## $ stories : int 2 1 1 2 1 1 2 3 1 4 ...
## $ driveway : chr "1" "1" "1" "1" ...
## $ recreational: chr "no" "no" "no" "yes" ...
## $ fullbase : chr "yes" "no" "no" "no" ...
## $ gasheat : chr "no" "no" "no" "no" ...
## $ aircond : chr "no" "no" "no" "no" ...
## $ garage : int 1 0 0 0 0 0 2 0 0 1 ...
## $ prefer : chr "no" "no" "no" "no" ...
price$houseprice = as.integer(price$houseprice)
price$bedrooms = as.integer(price$bedrooms)
price$driveway = as.factor(price$driveway)
price$recreational = as.factor(price$recreational)
price$fullbase = as.factor(price$fullbase)
price$gasheat = as.factor(price$gasheat)
price$aircond = as.factor(price$aircond)
price$prefer = as.factor(price$prefer)
It seems there are still some problems with the observations of air conditioning. Now, we have to fix them.
## 'data.frame': 546 obs. of 12 variables:
## $ houseprice : int 42000 38500 49500 60500 61000 66000 66000 69000 83800 88500 ...
## $ lotsize : int 5850 4000 3060 6650 6360 4160 3880 4160 4800 5500 ...
## $ bedrooms : int 3 2 3 3 2 3 3 3 3 3 ...
## $ bathrooms : int 1 1 1 1 1 1 2 1 1 2 ...
## $ stories : int 2 1 1 2 1 1 2 3 1 4 ...
## $ driveway : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
## $ recreational: Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 2 2 ...
## $ fullbase : Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 2 1 2 1 ...
## $ gasheat : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ aircond : Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 1 1 1 2 ...
## $ garage : int 1 0 0 0 0 0 2 0 0 1 ...
## $ prefer : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
OK, the data seems reasonable! Now, we can move on to the analysis steps.
| houseprice | lotsize | bedrooms | bathrooms | stories | driveway | recreational | fullbase | gasheat | aircond | garage | prefer |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 42000 | 5850 | 3 | 1 | 2 | 1 | no | yes | no | no | 1 | no |
| 38500 | 4000 | 2 | 1 | 1 | 1 | no | no | no | no | 0 | no |
| 49500 | 3060 | 3 | 1 | 1 | 1 | no | no | no | no | 0 | no |
| 60500 | 6650 | 3 | 1 | 2 | 1 | yes | no | no | no | 0 | no |
| 61000 | 6360 | 2 | 1 | 1 | 1 | no | no | no | no | 0 | no |
| 66000 | 4160 | 3 | 1 | 1 | 1 | yes | yes | no | yes | 0 | no |
| 66000 | 3880 | 3 | 2 | 2 | 1 | no | yes | no | no | 2 | no |
| 69000 | 4160 | 3 | 1 | 3 | 1 | no | no | no | no | 0 | no |
| 83800 | 4800 | 3 | 1 | 1 | 1 | yes | yes | no | no | 0 | no |
| 88500 | 5500 | 3 | 2 | 4 | 1 | yes | no | no | yes | 1 | no |
| 90000 | 7200 | 3 | 2 | 1 | 1 | no | yes | no | yes | 3 | no |
| 30500 | 3000 | 2 | 1 | 1 | 2 | no | no | no | no | 0 | no |
| 27000 | 1700 | 3 | 1 | 2 | 1 | no | no | no | no | 0 | no |
| 36000 | 2880 | 3 | 1 | 1 | 2 | no | no | no | no | 0 | no |
| 37000 | 3600 | 2 | 1 | 1 | 1 | no | no | no | no | 0 | no |
| 37900 | 3185 | 2 | 1 | 1 | 1 | no | no | no | yes | 0 | no |
| 40500 | 3300 | 3 | 1 | 2 | 2 | no | no | no | no | 1 | no |
| 40750 | 5200 | 4 | 1 | 3 | 1 | no | no | no | no | 0 | no |
| 45000 | 3450 | 1 | 1 | 1 | 1 | no | no | no | no | 0 | no |
| 45000 | 3986 | 2 | 2 | 1 | 2 | yes | yes | no | no | 1 | no |
| 48500 | 4785 | 3 | 1 | 2 | 1 | yes | yes | no | yes | 1 | no |
| 65900 | 4510 | 4 | 2 | 2 | 1 | no | yes | no | no | 0 | no |
| 37900 | 4000 | 3 | 1 | 2 | 1 | no | no | no | yes | 0 | no |
| 38000 | 3934 | 2 | 1 | 1 | 1 | no | no | no | no | 0 | no |
| 42000 | 4960 | 2 | 1 | 1 | 1 | no | no | no | no | 0 | no |
| 42300 | 3000 | 2 | 1 | 2 | 1 | no | no | no | no | 0 | no |
| 43500 | 3800 | 2 | 1 | 1 | 1 | no | no | no | no | 0 | no |
| 44000 | 4960 | 2 | 1 | 1 | 1 | no | yes | no | yes | 0 | no |
| 44500 | 3000 | 3 | 1 | 1 | 2 | no | no | no | yes | 0 | no |
| 44900 | 4500 | 3 | 1 | 2 | 1 | no | no | no | yes | 0 | no |
| 45000 | 3500 | 2 | 1 | 1 | 2 | no | yes | no | no | 0 | no |
| 48000 | 3500 | 4 | 1 | 2 | 1 | no | no | no | yes | 2 | no |
| 49000 | 4000 | 2 | 1 | 1 | 1 | no | no | no | no | 0 | no |
| 51500 | 4500 | 2 | 1 | 1 | 1 | no | no | no | no | 0 | no |
| 61000 | 6360 | 2 | 1 | 2 | 1 | no | no | no | no | 0 | no |
| 61000 | 4500 | 2 | 1 | 1 | 1 | no | no | no | yes | 2 | no |
| 61700 | 4032 | 2 | 1 | 1 | 1 | no | yes | no | no | 0 | no |
| 67000 | 5170 | 3 | 1 | 4 | 1 | no | no | no | yes | 0 | no |
| 82000 | 5400 | 4 | 2 | 2 | 1 | no | no | no | yes | 2 | no |
| 54500 | 3150 | 2 | 2 | 1 | 2 | no | yes | no | no | 0 | no |
| 66500 | 3745 | 3 | 1 | 2 | 1 | no | yes | no | no | 0 | no |
| 70000 | 4520 | 3 | 1 | 2 | 1 | no | yes | no | yes | 0 | no |
| 82000 | 4640 | 4 | 1 | 2 | 1 | no | no | no | no | 1 | no |
| 92000 | 8580 | 5 | 3 | 2 | 1 | no | no | no | no | 2 | no |
| 38000 | 2000 | 2 | 1 | 2 | 1 | no | no | no | no | 0 | no |
| 44000 | 2160 | 3 | 1 | 2 | 2 | no | yes | no | no | 0 | no |
| 41000 | 3040 | 2 | 1 | 1 | 2 | no | no | no | no | 0 | no |
| 43000 | 3090 | 3 | 1 | 2 | 2 | no | no | no | no | 0 | no |
| 48000 | 4960 | 4 | 1 | 3 | 2 | no | no | no | no | 0 | no |
| 54800 | 3350 | 3 | 1 | 2 | 1 | no | no | no | no | 0 | no |