Previously on STAT 412:

  • Descriptive Statistics
  • Visualization Techniques
    • Lattice-plots
    • GG-plots
  • Data Manipulation via “dplyr”

Introduction to data cleaning

Data cleaning matters because an analysis is only as reliable as the data behind it. Cleaning means fixing mistakes and making the data consistent and complete, so that we can trust it when analyzing it or making decisions. It also saves time later, because we no longer have to work around confusing or incorrect values.

What are the steps for cleaning?

  • Dealing with duplicates
  • Standardizing formats
  • Correcting errors such as typos
  • Handling missing data

REMINDER: The main thing in data cleaning is to find the problems before you start fixing them. This step is called inspection: you need to understand what is wrong with the data before you change anything.
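
As an illustration of inspection, the sketch below runs a few base R checks on a small made-up data frame. The object toy and its columns are hypothetical and are not part of the course data. The checks only look at the data and change nothing.

```r
# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  id   = c(1, 2, 2, 3),
  city = c("Ankara", "ankara ", "ankara ", NA),
  stringsAsFactors = FALSE
)

str(toy)                                       # class of each column
n_missing <- colSums(is.na(toy))               # missing values per column
n_dup     <- sum(duplicated(toy))              # rows that repeat an earlier row
city_tab  <- table(toy$city, useNA = "ifany")  # reveals inconsistent spellings

n_missing
n_dup
city_tab
```

Here the table shows that "Ankara" and "ankara " are counted separately, which is the kind of problem the string functions below are meant to fix.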

To tidy the data, we prefer to use the package “stringr”. The “stringr” package is used to work with text data in the R programming language. It helps to do different things with text, like cutting it into pieces, joining it together, searching for specific parts, and changing how it looks. This package is very handy for tasks like cleaning up messy text or analyzing text-based data.

library(stringr)

str_to_upper(): converts to upper case.

str_to_lower(): converts to lower case.

str_to_title(): converts to title case, where only the first letter of each word is capitalized.

str_to_sentence(): converts to sentence case, where only the first letter of the sentence is capitalized.

diamond = c("Shine bright like a diamond")
diamond
## [1] "Shine bright like a diamond"
str_to_upper(diamond)
## [1] "SHINE BRIGHT LIKE A DIAMOND"
str_to_lower(diamond)
## [1] "shine bright like a diamond"
str_to_title(diamond)
## [1] "Shine Bright Like A Diamond"
str_to_sentence("shine bright like a diamond")
## [1] "Shine bright like a diamond"

str_c(): combines multiple character vectors, element by element, into a single character vector.

str_c("shine","bright","like","a","diamond",sep=" ")
## [1] "shine bright like a diamond"
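
A short sketch of the difference between the sep and collapse arguments of str_c() may help here. The vectors first and second are made up for the example.

```r
library(stringr)

first  <- c("fly", "to")
second <- c("me", "the moon")

# sep joins the vectors element by element: the result still has two elements
pairs <- str_c(first, second, sep = " ")
pairs
## [1] "fly me"      "to the moon"

# collapse joins the elements of one vector into a single string
line <- str_c(pairs, collapse = " ")
line
## [1] "fly me to the moon"
```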

str_detect(): returns a logical vector with TRUE for each element of string that matches pattern and FALSE otherwise.

darkness=c("hello","darkness","my","old","friend")
darkness
## [1] "hello"    "darkness" "my"       "old"      "friend"
str_detect(darkness, "e")
## [1]  TRUE  TRUE FALSE FALSE  TRUE
str_detect(darkness, "[aeiou]")
## [1]  TRUE  TRUE FALSE  TRUE  TRUE
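
Because str_detect() returns a logical vector, it can be used to flag suspicious entries during inspection. The sketch below assumes a made-up vector raw of values that should contain digits only, and lists the entries that do not.

```r
library(stringr)

# Hypothetical values that are supposed to be whole numbers
raw <- c("42000", "88500$", "-61700", "two")

# TRUE only when the whole entry consists of digits
is_clean <- str_detect(raw, "^[0-9]+$")
is_clean
## [1]  TRUE FALSE FALSE FALSE

# Entries that need attention
raw[!is_clean]
## [1] "88500$" "-61700" "two"
```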

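str_count(): counts the number of times pattern is found within each element of string. Without a pattern it counts the characters in each element, and a vector of patterns is matched against the strings element by element.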

darkness=c("hello","darkness","my","old","friend")
darkness
## [1] "hello"    "darkness" "my"       "old"      "friend"
str_count(darkness)
## [1] 5 8 2 3 6
str_count(darkness, c("e"))
## [1] 1 1 0 0 1
str_count(darkness, c("e","s","m","z","d"))
## [1] 1 2 1 0 1

str_dup(): repeats each string the given number of times.

matter = c("nothing", "else", "matters")
matter
## [1] "nothing" "else"    "matters"
str_dup(matter, 2)
## [1] "nothingnothing" "elseelse"       "mattersmatters"
str_dup(matter, 1:3)
## [1] "nothing"               "elseelse"              "mattersmattersmatters"

str_sub(): extracts or replaces the substring between the start and end positions of each string.

alive=c("staying alive")
str_sub(alive,start=8,end = 13) 
## [1] " alive"
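
Two further uses of str_sub() are sketched below: negative positions, which count from the end of the string, and the assignment form, which replaces the selected part.

```r
library(stringr)

alive <- "staying alive"

# Negative positions count backwards from the last character
last5 <- str_sub(alive, -5, -1)
last5
## [1] "alive"

# The assignment form replaces the selected characters in place
str_sub(alive, 1, 1) <- "S"
alive
## [1] "Staying alive"
```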

str_subset(): returns all elements of string where there’s at least one match to pattern.

feeling_2=c("I","am","feeling","good")
feeling_2
## [1] "I"       "am"      "feeling" "good"
str_subset(feeling_2,"o")
## [1] "good"

str_locate(): returns the start and end position of the first match.

str_locate_all(): returns the start and end position of each match.

dream = c("Dream", "of", "Californication")
dream
## [1] "Dream"           "of"              "Californication"
str_locate(dream, "a")
##      start end
## [1,]     4   4
## [2,]    NA  NA
## [3,]     2   2
str_locate_all(dream, "a")
## [[1]]
##      start end
## [1,]     4   4
## 
## [[2]]
##      start end
## 
## [[3]]
##      start end
## [1,]     2   2
## [2,]    11  11

str_replace(): replaces the first match.

str_replace_all(): replaces all matches.

moon=c("fly","me","to","the","moon")
moon
## [1] "fly"  "me"   "to"   "the"  "moon"
str_replace(moon, "o", "O") 
## [1] "fly"  "me"   "tO"   "the"  "mOon"
str_replace_all(moon, "o", "O") 
## [1] "fly"  "me"   "tO"   "the"  "mOOn"

str_remove(): removes the first match.

str_remove_all(): removes all matches.

mirror=c("it","is","like","you","are","my","mirror")
mirror
## [1] "it"     "is"     "like"   "you"    "are"    "my"     "mirror"
str_remove(mirror,"r")
## [1] "it"    "is"    "like"  "you"   "ae"    "my"    "miror"
str_remove_all(mirror,"r")
## [1] "it"   "is"   "like" "you"  "ae"   "my"   "mio"

str_trim(): removes whitespace from start and end of string.

str_squish(): removes whitespace at the start and end, and replaces all internal whitespace with a single space.

man=c("    I am an English      man in New York") 
man
## [1] "    I am an English      man in New York"
str_trim(man,side="left")    #side = c("both", "left", "right")
## [1] "I am an English      man in New York"
str_squish(man)
## [1] "I am an English man in New York"
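
Neither function removes a single space inside a word. The sketch below uses a made-up vector answers to show this, and removes every whitespace character with str_remove_all() instead.

```r
library(stringr)

# Hypothetical yes/no answers with stray spaces
answers <- c("  no", "yes  ", " n o ")

trimmed  <- str_trim(answers)    # only the ends are cleaned
squished <- str_squish(answers)  # internal runs shrink to one space
trimmed
## [1] "no"  "yes" "n o"
squished
## [1] "no"  "yes" "n o"

# "\\s" matches any whitespace character, wherever it occurs
no_space <- str_remove_all(answers, "\\s")
no_space
## [1] "no"  "yes" "no"
```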

str_pad(): pads a string to a fixed width.

man=c("I am an English man in New York") # 31 characters long; padding to width 35 adds 4 spaces
man
## [1] "I am an English man in New York"
str_pad(man,width=35,side="both")
## [1] "  I am an English man in New York  "
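
The padding character does not have to be a space. A common use, sketched here with a made-up vector ids, is to left-pad identifiers with zeros so that they all have the same width.

```r
library(stringr)

# Hypothetical identifiers of unequal length
ids <- c("7", "42", "123")

# pad sets the single character used for padding
padded <- str_pad(ids, width = 3, side = "left", pad = "0")
padded
## [1] "007" "042" "123"
```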

str_wrap(): wraps words into paragraphs, minimizing the “raggedness” of the lines (i.e. the variation in line length) using the Knuth-Plass algorithm.

feeling=c("It's a new dawn It's a new day It's a new life For me And I'm feeling good")
feeling
## [1] "It's a new dawn It's a new day It's a new life For me And I'm feeling good"
cat(str_wrap(feeling,width = 18))
## It's a new dawn
## It's a new day
## It's a new life
## For me And I'm
## feeling good

Working with the messy data

House Prices in the City of Windsor, Canada

Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Format: A data frame containing 546 observations on 12 variables.

  • price: Sale price of a house.
  • lotsize: Lot size of a property in square feet.
  • bedrooms: Number of bedrooms.
  • bathrooms: Number of full bathrooms.
  • stories: Number of stories excluding basement.
  • driveway: Factor. Does the house have a driveway?
  • recreation: Factor. Does the house have a recreational room?
  • fullbase: Factor. Does the house have a full finished basement?
  • gasheat: Factor. Does the house use gas for hot water heating?
  • aircond: Factor. Is there central air conditioning?
  • garage: Number of garage places.
  • prefer: Factor. Is the house located in the preferred neighborhood of the city?

Use the read.csv command to import the data into R. read.csv is designed for CSV files and expects commas by default, but the fields in this file are separated by semicolons, so we set sep=";".

price=read.csv("HousePrices.csv",header=FALSE,sep=";")
head(price,50)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
Details: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.
A data frame containing 546 observations on 12 variables.
QQ.sale.price.of.House LOTSIZE BEdrooms bathROOMS stories\ driveway_1:yes2:no ReCreatioNaL fuLLBase gas heat AIRCOND garage?? PReferR
42000 5850 3 1 2 yes no yes no no 1 no
38500 4000 two 1 1 1 no n o no no 0 no
49500 3060 3 1 1 yes no n o no no 0 no
60500 6650 3 1 2 yes yes n o no no 0 no
61000 6360 two 1 1 yes no n o no no 0 no
66000 4160 3 1 1 yyy yes yes no yes 0 no
66000 3880 3 2 2 y no yes no no 2 no
69000 4160 3 1 3 yes no n o no no 0 no
83800 4800 3 1 1 yes yes yes no no 0 no
88500$ 5500 3 2 4 yes yes n o no yes 1 no
90000 7200 3 2 1 yes no yes no yes 3 no
30500 3000 two 1 1 not no n o no no 0 no
27000 1700 3 1 2 yes no n o no no 0 no
36000 2880 3 1 1 2 no n o no no 0 no
37000 3600 two 1 1 yes no n o no no 0 no
37900 3185 two 1 1 yes no n o no yes 0 no
40500 3300 3 1 2 not no n o no no 1 no
40750 5200 4 1 3 y no n o no no 0 no
45000 3450 one 1 1 yes no n o no no 0 no
45000 3986 two 2 1 not yes yes no no 1 no
48500 4785 3 1 2 yes yes yes no yes 1 no
65900 4510 4 2 2 1 no yes no no 0 no
37900 4000 3 1 2 yes no n o no yes 0 no
38000 3934 two 1 1 yes no n o no no 0 no
42000 4960 two 1 1 yes no n o no no 0 no
42300 3000 two 1 2 yes no n o no no 0 no
43500 3800 two 1 1 1 no n o no no 0 no
44000 4960 two 1 1 yes no yes no yes 0 no
44500 3000 3 1 1 2 no n o no yes 0 no
44900 4500 3 1 2 yes no n o no yes 0 no
45000 3500 two 1 1 not no yes no no 0 no
48000 3500 4 1 2 yes no n o no yes 2 no
49000 4000 two 1 1 yes no n o no no 0 no
51500 4500 two 1 1 yes no n o no no 0 no
61000 6360 two 1 2 yes no n o no no 0 no
61000 4500 two 1 1 yes no n o no yes 2 no
-61700 4032 two 1 1 yyy no yes no no 0 no
67000 5170 3 1 4 yes no n o no yes 0 no
82000 5400 4 2 2 yes no n o no yes 2 no
54500 3150 two 2 1 not no yes no no 0 no
66500$ 3745 3 1 2 yes no yes no no 0 no
70000 4520 3 1 2 1 no yes no yes 0 no
82000 4640 4 1 2 yes no n o no no 1 no
92000 8580 5 3 2 yes no n o no no 2 no
38000 2000 two 1 2 yes no n o no no 0 no
44000 2160 3 1 2 not no yes no no 0 no

Examine the data set seen above and specify the problems.

The first problem is at the top of the file: descriptive text comes before the actual header row. We therefore skip those leading lines when reading the data into R, so that the row of variable names becomes the header.

price=read.csv("HousePrices.csv",sep=";",skip=3)
head(price,10)
QQ.sale.price.of.House LOTSIZE BEdrooms bathROOMS stories.. driveway_1.yes2.no ReCreatioNaL fuLLBase gas..........heat AIRCOND garage.. PReferR
42000 5850 3 1 2 yes no yes no no 1 no
38500 4000 two 1 1 1 no n o no no 0 no
49500 3060 3 1 1 yes no n o no no 0 no
60500 6650 3 1 2 yes yes n o no no 0 no
61000 6360 two 1 1 yes no n o no no 0 no
66000 4160 3 1 1 yyy yes yes no yes 0 no
66000 3880 3 2 2 y no yes no no 2 no
69000 4160 3 1 3 yes no n o no no 0 no
83800 4800 3 1 1 yes yes yes no no 0 no
88500$ 5500 3 2 4 yes yes n o no yes 1 no
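
The dots in the new column names come from read.csv itself: with its default check.names = TRUE, characters that are not allowed in R names, such as spaces or question marks, are replaced by dots. The sketch below reproduces this on a small temporary file whose contents are invented for the example.

```r
# Write a hypothetical semicolon-separated file with two description lines
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Details: a made-up description line",
  "A second description line",
  "price;lot size;garage??",
  "42000;5850;1",
  "38500;4000;0"
), tmp)

# Skip the two description lines; the third line becomes the header
toy <- read.csv(tmp, sep = ";", skip = 2)

# Invalid characters in the header were replaced by dots
colnames(toy)
## [1] "price"    "lot.size" "garage.."

unlink(tmp)  # remove the temporary file
```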

It is good practice to check whether there are missing values in the data set.

sum(is.na(price))
## [1] 0

No value is coded as NA. This does not mean the data are clean, since is.na() does not flag miscoded entries such as “two” or “n o”.
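
A count of zero from is.na() can be misleading when missing values were typed in as text. The sketch below uses a made-up vector answers in which the empty string and "n/a" are assumed to mean missing; which codes count as missing is a decision that depends on the data set.

```r
# Hypothetical answers; "" and "n/a" are assumed to stand for missing values
answers <- c("no", "", "n/a", NA, "yes")

n_na_before <- sum(is.na(answers))
n_na_before
## [1] 1

# Recode the assumed missing-value codes to a real NA
answers[answers %in% c("", "n/a")] <- NA

n_na_after <- sum(is.na(answers))
n_na_after
## [1] 3
```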

There are issues with the column names. Let’s proceed with fixing them.

colnames(price)
##  [1] "QQ.sale.price.of.House" "LOTSIZE"                "BEdrooms"              
##  [4] "bathROOMS"              "stories.."              "driveway_1.yes2.no"    
##  [7] "ReCreatioNaL"           "fuLLBase"               "gas..........heat"     
## [10] "AIRCOND"                "garage.."               "PReferR"

A common first step is to convert the column names to lowercase.

colnames(price)=str_to_lower(colnames(price))
colnames(price)
##  [1] "qq.sale.price.of.house" "lotsize"                "bedrooms"              
##  [4] "bathrooms"              "stories.."              "driveway_1.yes2.no"    
##  [7] "recreational"           "fullbase"               "gas..........heat"     
## [10] "aircond"                "garage.."               "preferr"

Seems better…

new_col=colnames(price)

Then, delete the unnecessary dots in the column names.

new_col2=str_remove_all(new_col,"\\.")
new_col2
##  [1] "qqsalepriceofhouse" "lotsize"            "bedrooms"          
##  [4] "bathrooms"          "stories"            "driveway_1yes2no"  
##  [7] "recreational"       "fullbase"           "gasheat"           
## [10] "aircond"            "garage"             "preferr"
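
The double backslash in "\\." matters because the pattern is a regular expression, in which an unescaped dot matches any character. The sketch below compares the unescaped pattern, the escaped pattern, and fixed(), which asks stringr to treat the pattern as plain text.

```r
library(stringr)

name <- "gas..........heat"

# An unescaped dot matches every character, so nothing is left
all_gone <- str_remove_all(name, ".")
all_gone
## [1] ""

# Escaping the dot matches literal dots only
escaped <- str_remove_all(name, "\\.")
escaped
## [1] "gasheat"

# fixed() gives the same result without regular expression syntax
literal <- str_remove_all(name, fixed("."))
literal
## [1] "gasheat"
```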

For the last variable, we can use the str_sub function to obtain “prefer” instead of “preferr”. The first and sixth variable names are also problematic, so we rename them directly.

new_col2[c(12)]=str_sub(new_col2[12],1,6)
new_col2[c(1,6)]=c("houseprice","driveway")
new_col2
##  [1] "houseprice"   "lotsize"      "bedrooms"     "bathrooms"    "stories"     
##  [6] "driveway"     "recreational" "fullbase"     "gasheat"      "aircond"     
## [11] "garage"       "prefer"
colnames(price)=new_col2
head(price)
houseprice lotsize bedrooms bathrooms stories driveway recreational fullbase gasheat aircond garage prefer
42000 5850 3 1 2 yes no yes no no 1 no
38500 4000 two 1 1 1 no n o no no 0 no
49500 3060 3 1 1 yes no n o no no 0 no
60500 6650 3 1 2 yes yes n o no no 0 no
61000 6360 two 1 1 yes no n o no no 0 no
66000 4160 3 1 1 yyy yes yes no yes 0 no

We have resolved the issues with the variable names. What about the observations?

Examine the class of variables (houseprice, bedrooms etc.)

summary(price)
##   houseprice           lotsize        bedrooms           bathrooms    
##  Length:546         Min.   : 1650   Length:546         Min.   :1.000  
##  Class :character   1st Qu.: 3600   Class :character   1st Qu.:1.000  
##  Mode  :character   Median : 4600   Mode  :character   Median :1.000  
##                     Mean   : 5150                      Mean   :1.286  
##                     3rd Qu.: 6360                      3rd Qu.:2.000  
##                     Max.   :16200                      Max.   :4.000  
##     stories        driveway         recreational         fullbase        
##  Min.   :1.000   Length:546         Length:546         Length:546        
##  1st Qu.:1.000   Class :character   Class :character   Class :character  
##  Median :2.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :1.808                                                           
##  3rd Qu.:2.000                                                           
##  Max.   :4.000                                                           
##    gasheat            aircond              garage          prefer         
##  Length:546         Length:546         Min.   :0.0000   Length:546        
##  Class :character   Class :character   1st Qu.:0.0000   Class :character  
##  Mode  :character   Mode  :character   Median :0.0000   Mode  :character  
##                                        Mean   :0.6923                     
##                                        3rd Qu.:1.0000                     
##                                        Max.   :3.0000
str(price)
## 'data.frame':    546 obs. of  12 variables:
##  $ houseprice  : chr  "42000" "38500" "49500" "60500" ...
##  $ lotsize     : int  5850 4000 3060 6650 6360 4160 3880 4160 4800 5500 ...
##  $ bedrooms    : chr  "3" "two" "3" "3" ...
##  $ bathrooms   : int  1 1 1 1 1 1 2 1 1 2 ...
##  $ stories     : int  2 1 1 2 1 1 2 3 1 4 ...
##  $ driveway    : chr  "yes" "1" "yes" "yes" ...
##  $ recreational: chr  "no" "no" "no" "yes" ...
##  $ fullbase    : chr  "yes" "n o" "n o" "n o" ...
##  $ gasheat     : chr  "no" "no" "no" "no" ...
##  $ aircond     : chr  "no" "no" "no" "no" ...
##  $ garage      : int  1 0 0 0 0 0 2 0 0 1 ...
##  $ prefer      : chr  "no" "no" "no" "no" ...

Can a price be negative? Also, why is houseprice stored as a character?

table(price$houseprice)
## 
## -103000  -61700  100000  100500  101000  102000  103000  103500  104900  105000 
##       1       1       1       1       2       1       1       1       1       4 
##  106000  106500  107000  107500  108000  110000  112000  112500  113000  113750 
##       3       1       1       1       2       2       1       1       1       1 
##  114000  114900  115442  116000  117000  118500  120000  120900  122000  122500 
##       1       1       1       1       1       1       5       1       1       1 
##  123500  124000  125000  126500  127000  128000  130000  132000  133000  138300 
##       2       1       1       1       1       1       2       2       1       1 
##  140000  141000  145000  155000  163000  174500  175000  190000   25000   25245 
##       2       1       2       1       1       1       2       1       3       1 
##   26000   26500   27000   28000   30000   30500   31900   32000   32500   33000 
##       1       1       2       1       3       1       1       1       3       1 
##   33500   34000   34400   35000  35000$   35500   36000   37000   37200   37900 
##       1       3       1       5       1       2       3       3       1       2 
##   38000   38500   39000   40000   40500   40750   41000   42000   42300   42500 
##       7       1       2       2       3       1       4       8       1       1 
##   42900   43000   43500   44000   44100   44500  44500$   44555   44700   44900 
##       1       7       1       4       1       2       1       1       1       1 
##   45000   46000   46200   46500   47000  47000$   47500   47600   47900   48000 
##       9       4       1       2       7       1       2       1       1       8 
##   48500   48900   49000   49500   49900   50000   50500   51000  51000$   51500 
##       3       1       6       3       1      17       1       3       1       2 
##   51900   52000   52500   52900   53000   53500   53900   54000   54500   54800 
##       1       9       4       2       5       1       3       6       1       1 
##   55000   55500   56000   57000   57250   57500   58000   58500   58550   58900 
##       7       2       7       5       2       3       5       3       1       1 
##   59000   59500   59900  59900$   60000   60500   61000   61100   61500   62000 
##       2       3       1       1      17       2       6       1       2       5 
##   62500   62600   62900   63000   63500   63900   64000   64500   64900   65000 
##       1       1       3       2       1       3       5       4       2       7 
##   65500   65900   66000  66500$   67000   67900   68000   68100   68500   69000 
##       2       1       5       1       6       1       3       1       2       4 
##   69500   69900   70000   70100   70500   70800   71000   71500   71900   72000 
##       1       2      12       1       1       1       2       1       1       4 
##   72500   73000   73500   74500   74700   74900   75000  75000$   75500   76000 
##       1       4       2       3       1       1       8       1       1       1 
##   76900   77000   77500   78000   78500   78900   79000   79500   80000   80750 
##       1       1       1       4       2       1       3       2       9       1 
##   82000   82500   82900   83000   83800   83900   84000   84900   85000   86000 
##       6       1       1       3       1       2       2       1       8       3 
##   86900   87000   87250   87500   88000   88500  88500$   89000   89500   89900 
##       2       3       1       1       2       2       1       2       1       1 
##   90000   91500   91700   92000   92500   93000   94000   94500   94700   95000 
##       5       1       1       2       2       3       1       2       1       6 
##   95500   96000   96500   97000   98000   98500   99000 
##       1       1       1       2       1       1       2
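# The minus signs are treated as entry errors; the column is character, so compare with strings
price$houseprice[price$houseprice == "-61700"] = "61700"
price$houseprice[price$houseprice == "-103000"] = "103000"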

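gsub(): replaces every match of a pattern in a character vector with a replacement string. Here it removes the dollar signs; the dollar sign must be escaped because it is a special character in regular expressions.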

price$houseprice = gsub("\\$", "", price$houseprice)
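
When there are many such entries, fixing them one value at a time does not scale. As a sketch of a more general approach, the code below removes every character that is not a digit and then converts to integer. It assumes, as above, that a minus sign in a price is an entry error; raw_price is a made-up vector, not the course data.

```r
library(stringr)

# Hypothetical raw prices with a currency symbol and a stray minus sign
raw_price <- c("42000", "88500$", "-61700")

# Keep digits only; this also drops the minus sign (assumed to be an error)
digits_only <- str_remove_all(raw_price, "[^0-9]")

clean_price <- as.integer(digits_only)
clean_price
## [1] 42000 88500 61700
```

This shortcut is only appropriate when negative values are known to be impossible; otherwise the sign should be inspected before it is dropped.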

Why is “bedrooms” stored as a character although it represents numerical observations?

table(price$bedrooms)
## 
##   3   4   5   6 one two 
## 301  95  10   2   2 136
price[price$bedrooms=="one",]$bedrooms=1
price[price$bedrooms=="two",]$bedrooms=2
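
The same recoding can be written with a lookup vector, which is easier to extend if more number words turn up. This is a sketch on a made-up vector bedrooms; the name lookup is hypothetical.

```r
# Hypothetical bedroom counts, some written as words
bedrooms <- c("3", "two", "one", "4")

# Names are the words to replace, values are their numeric spellings
lookup <- c(one = "1", two = "2")

# Replace only the entries that appear in the lookup
is_word <- bedrooms %in% names(lookup)
bedrooms[is_word] <- unname(lookup[bedrooms[is_word]])

bedrooms_int <- as.integer(bedrooms)
bedrooms_int
## [1] 3 2 1 4
```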

What about driveway? Should it really have 10 levels?

table(price$driveway) #Remember 1:yes 2:no
## 
##   1  11   2  22 222  no not   y yes yyy 
##  13   1  12   1   1  50  13   2 449   4
price[price$driveway ==11,]$driveway=1
price[price$driveway ==22,]$driveway=2
price[price$driveway ==222,]$driveway=2
price[price$driveway =="no",]$driveway=2
price[price$driveway =="not",]$driveway=2
price[price$driveway =="yes",]$driveway=1
price[price$driveway =="yyy",]$driveway=1
price[price$driveway =="y",]$driveway=1
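
The eight assignments above can be condensed by listing all spellings of each answer once and recoding with %in%. The sketch below works on a made-up vector holding the ten spellings seen in the table; yes_codes and no_codes are hypothetical names.

```r
# The ten spellings observed in the table above
driveway <- c("yes", "1", "yyy", "y", "11", "no", "not", "2", "22", "222")

yes_codes <- c("yes", "y", "yyy", "1", "11")   # all mean 1 (yes)
no_codes  <- c("no", "not", "2", "22", "222")  # all mean 2 (no)

# Anything in neither list becomes NA, so unexpected spellings stay visible
recoded <- ifelse(driveway %in% yes_codes, "1",
                  ifelse(driveway %in% no_codes, "2", NA))
table(recoded, useNA = "ifany")
```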

Let’s have a look at the variable fullbase. One of its levels, “no”, has a white-space problem: it is recorded as “n o”.

table(price$fullbase)
## 
## n o yes 
## 355 191
price$fullbase=gsub(" ", "", price$fullbase)
table(price$fullbase)
## 
##  no yes 
## 355 191

We have fixed several errors, but we should still check the class of each variable.

str(price)
## 'data.frame':    546 obs. of  12 variables:
##  $ houseprice  : chr  "42000" "38500" "49500" "60500" ...
##  $ lotsize     : int  5850 4000 3060 6650 6360 4160 3880 4160 4800 5500 ...
##  $ bedrooms    : chr  "3" "2" "3" "3" ...
##  $ bathrooms   : int  1 1 1 1 1 1 2 1 1 2 ...
##  $ stories     : int  2 1 1 2 1 1 2 3 1 4 ...
##  $ driveway    : chr  "1" "1" "1" "1" ...
##  $ recreational: chr  "no" "no" "no" "yes" ...
##  $ fullbase    : chr  "yes" "no" "no" "no" ...
##  $ gasheat     : chr  "no" "no" "no" "no" ...
##  $ aircond     : chr  "no" "no" "no" "no" ...
##  $ garage      : int  1 0 0 0 0 0 2 0 0 1 ...
##  $ prefer      : chr  "no" "no" "no" "no" ...
price$houseprice=as.integer(price$houseprice)
price$bedrooms=as.integer(price$bedrooms)
price$driveway=as.factor(price$driveway)
price$recreational=as.factor(price$recreational)
price$fullbase=as.factor(price$fullbase)
price$gasheat=as.factor(price$gasheat)
price$aircond=as.factor(price$aircond)
price$prefer=as.factor(price$prefer)
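
A caution about as.integer(): an entry that cannot be read as a number is turned into NA, with only a warning. A quick check after converting, sketched here on a made-up vector raw, shows whether any entry was lost this way.

```r
# Hypothetical column that still contains uncleaned entries
raw <- c("42000", "88500$", "two")

# Entries that are not valid numbers become NA (the warning is silenced here)
converted <- suppressWarnings(as.integer(raw))
converted
## [1] 42000    NA    NA

# Entries that were present before but are NA after the conversion
failed <- raw[is.na(converted) & !is.na(raw)]
failed
## [1] "88500$" "two"
```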

There still seem to be some problems with the values of aircond. Converting them to lowercase, which merges values that differ only in letter case, and rebuilding the factor fixes this.

price$aircond=str_to_lower(price$aircond)
price$aircond=as.factor(price$aircond)
str(price)
## 'data.frame':    546 obs. of  12 variables:
##  $ houseprice  : int  42000 38500 49500 60500 61000 66000 66000 69000 83800 88500 ...
##  $ lotsize     : int  5850 4000 3060 6650 6360 4160 3880 4160 4800 5500 ...
##  $ bedrooms    : int  3 2 3 3 2 3 3 3 3 3 ...
##  $ bathrooms   : int  1 1 1 1 1 1 2 1 1 2 ...
##  $ stories     : int  2 1 1 2 1 1 2 3 1 4 ...
##  $ driveway    : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ recreational: Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 2 2 ...
##  $ fullbase    : Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 2 1 2 1 ...
##  $ gasheat     : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ aircond     : Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 1 1 1 2 ...
##  $ garage      : int  1 0 0 0 0 0 2 0 0 1 ...
##  $ prefer      : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
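
To see why lowercasing matters for a factor, consider this sketch on a made-up vector: values that differ only in letter case become separate levels until they are standardized.

```r
library(stringr)

# Hypothetical answers with inconsistent letter case
aircond <- factor(c("no", "NO", "yes", "Yes", "no"))
n_before <- nlevels(aircond)
n_before
## [1] 4

# Lowercase the labels, then rebuild the factor
fixed_aircond <- factor(str_to_lower(as.character(aircond)))
levels(fixed_aircond)
## [1] "no"  "yes"
```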

OK, the data seems reasonable! Now, we can move on to the analysis steps.

head(price,50)
houseprice lotsize bedrooms bathrooms stories driveway recreational fullbase gasheat aircond garage prefer
42000 5850 3 1 2 1 no yes no no 1 no
38500 4000 2 1 1 1 no no no no 0 no
49500 3060 3 1 1 1 no no no no 0 no
60500 6650 3 1 2 1 yes no no no 0 no
61000 6360 2 1 1 1 no no no no 0 no
66000 4160 3 1 1 1 yes yes no yes 0 no
66000 3880 3 2 2 1 no yes no no 2 no
69000 4160 3 1 3 1 no no no no 0 no
83800 4800 3 1 1 1 yes yes no no 0 no
88500 5500 3 2 4 1 yes no no yes 1 no
90000 7200 3 2 1 1 no yes no yes 3 no
30500 3000 2 1 1 2 no no no no 0 no
27000 1700 3 1 2 1 no no no no 0 no
36000 2880 3 1 1 2 no no no no 0 no
37000 3600 2 1 1 1 no no no no 0 no
37900 3185 2 1 1 1 no no no yes 0 no
40500 3300 3 1 2 2 no no no no 1 no
40750 5200 4 1 3 1 no no no no 0 no
45000 3450 1 1 1 1 no no no no 0 no
45000 3986 2 2 1 2 yes yes no no 1 no
48500 4785 3 1 2 1 yes yes no yes 1 no
65900 4510 4 2 2 1 no yes no no 0 no
37900 4000 3 1 2 1 no no no yes 0 no
38000 3934 2 1 1 1 no no no no 0 no
42000 4960 2 1 1 1 no no no no 0 no
42300 3000 2 1 2 1 no no no no 0 no
43500 3800 2 1 1 1 no no no no 0 no
44000 4960 2 1 1 1 no yes no yes 0 no
44500 3000 3 1 1 2 no no no yes 0 no
44900 4500 3 1 2 1 no no no yes 0 no
45000 3500 2 1 1 2 no yes no no 0 no
48000 3500 4 1 2 1 no no no yes 2 no
49000 4000 2 1 1 1 no no no no 0 no
51500 4500 2 1 1 1 no no no no 0 no
61000 6360 2 1 2 1 no no no no 0 no
61000 4500 2 1 1 1 no no no yes 2 no
61700 4032 2 1 1 1 no yes no no 0 no
67000 5170 3 1 4 1 no no no yes 0 no
82000 5400 4 2 2 1 no no no yes 2 no
54500 3150 2 2 1 2 no yes no no 0 no
66500 3745 3 1 2 1 no yes no no 0 no
70000 4520 3 1 2 1 no yes no yes 0 no
82000 4640 4 1 2 1 no no no no 1 no
92000 8580 5 3 2 1 no no no no 2 no
38000 2000 2 1 2 1 no no no no 0 no
44000 2160 3 1 2 2 no yes no no 0 no
41000 3040 2 1 1 2 no no no no 0 no
43000 3090 3 1 2 2 no no no no 0 no
48000 4960 4 1 3 2 no no no no 0 no
54800 3350 3 1 2 1 no no no no 0 no