About This Demo

This file demonstrates a typical process of using R package “cleandata” to prepare data for machine learning.

R Package “cleandata”

A collection of functions that work with data frame to inspect, impute, and encode data. The functions for imputation and encoding can produce log files to help you keep track of data manipulation process.

Available on CRAN: https://cran.r-project.org/package=cleandata

Source Code on GitHub: https://github.com/sherrisherry/cleandata

Issues and Suggestions: https://github.com/sherrisherry/cleandata/issues

Dataset Used in This Demo

  • Name: Ames Housing dataset
  • Source: Kaggle contest House Prices

With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa for predicting housing prices, this dataset is a typical example of what a business analyst encounters everyday.


Inspection

According to the description of this dataset, the “NA”s in some columns aren’t missing value. To prevent R from comfusing them with true missing values, read in the data files without converting any value to the NA in R.

The train set should have only one more column SalePrice than the test set.

# import 'cleandata' package.
library('cleandata')

# read in the training and test datasets without converting 'NA's to missing values.
train <- read.csv('data/train.csv', na.strings = "", strip.white = TRUE)
test <- read.csv('data/test.csv', na.strings = "", strip.white = TRUE)

# summarize the training set and test set
cat(paste('train: ', nrow(train), 'obs. ', ncol(train), 'cols\ncolumn names:\n', toString(colnames(train)), 
          '\n\ntest: ', nrow(test), 'obs. ', ncol(test), 'cols\ncolumn names:\n', toString(colnames(test)), '\n'))
## train:  1460 obs.  81 cols
## column names:
##  Id, MSSubClass, MSZoning, LotFrontage, LotArea, Street, Alley, LotShape, LandContour, Utilities, LotConfig, LandSlope, Neighborhood, Condition1, Condition2, BldgType, HouseStyle, OverallQual, OverallCond, YearBuilt, YearRemodAdd, RoofStyle, RoofMatl, Exterior1st, Exterior2nd, MasVnrType, MasVnrArea, ExterQual, ExterCond, Foundation, BsmtQual, BsmtCond, BsmtExposure, BsmtFinType1, BsmtFinSF1, BsmtFinType2, BsmtFinSF2, BsmtUnfSF, TotalBsmtSF, Heating, HeatingQC, CentralAir, Electrical, X1stFlrSF, X2ndFlrSF, LowQualFinSF, GrLivArea, BsmtFullBath, BsmtHalfBath, FullBath, HalfBath, BedroomAbvGr, KitchenAbvGr, KitchenQual, TotRmsAbvGrd, Functional, Fireplaces, FireplaceQu, GarageType, GarageYrBlt, GarageFinish, GarageCars, GarageArea, GarageQual, GarageCond, PavedDrive, WoodDeckSF, OpenPorchSF, EnclosedPorch, X3SsnPorch, ScreenPorch, PoolArea, PoolQC, Fence, MiscFeature, MiscVal, MoSold, YrSold, SaleType, SaleCondition, SalePrice 
## 
## test:  1459 obs.  80 cols
## column names:
##  Id, MSSubClass, MSZoning, LotFrontage, LotArea, Street, Alley, LotShape, LandContour, Utilities, LotConfig, LandSlope, Neighborhood, Condition1, Condition2, BldgType, HouseStyle, OverallQual, OverallCond, YearBuilt, YearRemodAdd, RoofStyle, RoofMatl, Exterior1st, Exterior2nd, MasVnrType, MasVnrArea, ExterQual, ExterCond, Foundation, BsmtQual, BsmtCond, BsmtExposure, BsmtFinType1, BsmtFinSF1, BsmtFinType2, BsmtFinSF2, BsmtUnfSF, TotalBsmtSF, Heating, HeatingQC, CentralAir, Electrical, X1stFlrSF, X2ndFlrSF, LowQualFinSF, GrLivArea, BsmtFullBath, BsmtHalfBath, FullBath, HalfBath, BedroomAbvGr, KitchenAbvGr, KitchenQual, TotRmsAbvGrd, Functional, Fireplaces, FireplaceQu, GarageType, GarageYrBlt, GarageFinish, GarageCars, GarageArea, GarageQual, GarageCond, PavedDrive, WoodDeckSF, OpenPorchSF, EnclosedPorch, X3SsnPorch, ScreenPorch, PoolArea, PoolQC, Fence, MiscFeature, MiscVal, MoSold, YrSold, SaleType, SaleCondition

To ensure consistency in the following imputation and encoding process across the train set and the test set, I appended the test set to the train set. The SalePrice values of the rows of the test set was set to NA to distinguish them from the rows of the train set. The resulting data frame was called df.

# filling the target columns of the test set with NA then combining test and training sets
test$SalePrice <- NA
df <- rbind(train, test)
rm(train, test)

Function inspect_na

inspect_na() counts the number of NAs in each column and sort them in descending order. In the following operation, inspect_na() returned the top 5 columns with missing values. If you want to see the number of missing values in every column, leave parameter top as default. As supposed, only SalePrice contained missing values, which equaled to the number of rows in the test set.

inspect_na(df, top = 5)
##   SalePrice          Id  MSSubClass    MSZoning LotFrontage 
##        1459           0           0           0           0

The NAs in the columns listed in NAisNoA were what was refered to as ‘none’-but-not-‘NA’ values. In these columns, NA had only one possible value - “not applicable”. I replaced these NAs with NoA to prevent imputing them later.

# in the 'NAisNoA' columns, NA means this attribute doesn't apply to them, not missing.
NAisNoA <- c('Alley', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 
             'FireplaceQu', 'GarageType', 'GarageFinish', 'GarageQual', 'GarageCond', 
             'PoolQC', 'Fence', 'MiscFeature')
for(i in NAisNoA){levels(df[, i])[levels(df[, i]) == "NA"] <- "NoA"}

At this stage, I reconstructed the data frame df to inspect the true missing values.

We can see that only LotFrontage had about 20% missing values. The other columns had few to no missing value.

# write the dataset into a csv file then read this file back to df to reconstruct df
write.csv(df, file = 'data/data.csv', row.names = FALSE)
df <- read.csv('data/data.csv', na.strings = "NA", strip.white = TRUE)

# see which predictors have most NAs
inspect_na(df[, -ncol(df)], top = 25)
##  LotFrontage  GarageYrBlt   MasVnrType   MasVnrArea     MSZoning 
##          486          159           24           23            4 
##    Utilities BsmtFullBath BsmtHalfBath   Functional  Exterior1st 
##            2            2            2            2            1 
##  Exterior2nd   BsmtFinSF1   BsmtFinSF2    BsmtUnfSF  TotalBsmtSF 
##            1            1            1            1            1 
##   Electrical  KitchenQual   GarageCars   GarageArea     SaleType 
##            1            1            1            1            1 
##           Id   MSSubClass      LotArea       Street        Alley 
##            0            0            0            0            0

Function inspect_map

inspect_map() classifies the columns of a data frame. Before I further explain this function, I’d like to introduce ‘scheme’. In package “cleandata”, a scheme refers to a set of all the possible values of an enumerator. The factor objects in R are enumerators.

Function inspect_map returns a list of factor_cols (list), factor_levels (list), num_cols (vector), char_cols (vector), ordered_cols (vector), and other_cols (vector).

In the following codes, I specified that 2 factorial columns share the same scheme if their levels had more than 2 same values by setting the common parameter to 2. By default, the common parameter is 0, which means every level of 2 factorial columns should be the same for them to share the same scheme.

# create a map for imputation and encoding
data_map <- inspect_map(df[, -ncol(df)], common = 2)
## Id integer factors: 0 nums: 1 chars: 0 ordered: 0 others: 0 
## MSSubClass integer factors: 0 nums: 2 chars: 0 ordered: 0 others: 0 
## MSZoning factor factors: 1 nums: 2 chars: 0 ordered: 0 others: 0 
## LotFrontage integer factors: 1 nums: 3 chars: 0 ordered: 0 others: 0 
## LotArea integer factors: 1 nums: 4 chars: 0 ordered: 0 others: 0 
## Street factor factors: 2 nums: 4 chars: 0 ordered: 0 others: 0 
## Alley factor factors: 3 nums: 4 chars: 0 ordered: 0 others: 0 
## LotShape factor factors: 4 nums: 4 chars: 0 ordered: 0 others: 0 
## LandContour factor factors: 5 nums: 4 chars: 0 ordered: 0 others: 0 
## Utilities factor factors: 6 nums: 4 chars: 0 ordered: 0 others: 0 
## LotConfig factor factors: 7 nums: 4 chars: 0 ordered: 0 others: 0 
## LandSlope factor factors: 8 nums: 4 chars: 0 ordered: 0 others: 0 
## Neighborhood factor factors: 9 nums: 4 chars: 0 ordered: 0 others: 0 
## Condition1 factor factors: 10 nums: 4 chars: 0 ordered: 0 others: 0 
## Condition2 factor factors: 11 nums: 4 chars: 0 ordered: 0 others: 0 
## BldgType factor factors: 12 nums: 4 chars: 0 ordered: 0 others: 0 
## HouseStyle factor factors: 13 nums: 4 chars: 0 ordered: 0 others: 0 
## OverallQual integer factors: 13 nums: 5 chars: 0 ordered: 0 others: 0 
## OverallCond integer factors: 13 nums: 6 chars: 0 ordered: 0 others: 0 
## YearBuilt integer factors: 13 nums: 7 chars: 0 ordered: 0 others: 0 
## YearRemodAdd integer factors: 13 nums: 8 chars: 0 ordered: 0 others: 0 
## RoofStyle factor factors: 14 nums: 8 chars: 0 ordered: 0 others: 0 
## RoofMatl factor factors: 15 nums: 8 chars: 0 ordered: 0 others: 0 
## Exterior1st factor factors: 16 nums: 8 chars: 0 ordered: 0 others: 0 
## Exterior2nd factor factors: 17 nums: 8 chars: 0 ordered: 0 others: 0 
## MasVnrType factor factors: 18 nums: 8 chars: 0 ordered: 0 others: 0 
## MasVnrArea integer factors: 18 nums: 9 chars: 0 ordered: 0 others: 0 
## ExterQual factor factors: 19 nums: 9 chars: 0 ordered: 0 others: 0 
## ExterCond factor factors: 20 nums: 9 chars: 0 ordered: 0 others: 0 
## Foundation factor factors: 21 nums: 9 chars: 0 ordered: 0 others: 0 
## BsmtQual factor factors: 22 nums: 9 chars: 0 ordered: 0 others: 0 
## BsmtCond factor factors: 23 nums: 9 chars: 0 ordered: 0 others: 0 
## BsmtExposure factor factors: 24 nums: 9 chars: 0 ordered: 0 others: 0 
## BsmtFinType1 factor factors: 25 nums: 9 chars: 0 ordered: 0 others: 0 
## BsmtFinSF1 integer factors: 25 nums: 10 chars: 0 ordered: 0 others: 0 
## BsmtFinType2 factor factors: 26 nums: 10 chars: 0 ordered: 0 others: 0 
## BsmtFinSF2 integer factors: 26 nums: 11 chars: 0 ordered: 0 others: 0 
## BsmtUnfSF integer factors: 26 nums: 12 chars: 0 ordered: 0 others: 0 
## TotalBsmtSF integer factors: 26 nums: 13 chars: 0 ordered: 0 others: 0 
## Heating factor factors: 27 nums: 13 chars: 0 ordered: 0 others: 0 
## HeatingQC factor factors: 28 nums: 13 chars: 0 ordered: 0 others: 0 
## CentralAir factor factors: 29 nums: 13 chars: 0 ordered: 0 others: 0 
## Electrical factor factors: 30 nums: 13 chars: 0 ordered: 0 others: 0 
## X1stFlrSF integer factors: 30 nums: 14 chars: 0 ordered: 0 others: 0 
## X2ndFlrSF integer factors: 30 nums: 15 chars: 0 ordered: 0 others: 0 
## LowQualFinSF integer factors: 30 nums: 16 chars: 0 ordered: 0 others: 0 
## GrLivArea integer factors: 30 nums: 17 chars: 0 ordered: 0 others: 0 
## BsmtFullBath integer factors: 30 nums: 18 chars: 0 ordered: 0 others: 0 
## BsmtHalfBath integer factors: 30 nums: 19 chars: 0 ordered: 0 others: 0 
## FullBath integer factors: 30 nums: 20 chars: 0 ordered: 0 others: 0 
## HalfBath integer factors: 30 nums: 21 chars: 0 ordered: 0 others: 0 
## BedroomAbvGr integer factors: 30 nums: 22 chars: 0 ordered: 0 others: 0 
## KitchenAbvGr integer factors: 30 nums: 23 chars: 0 ordered: 0 others: 0 
## KitchenQual factor factors: 31 nums: 23 chars: 0 ordered: 0 others: 0 
## TotRmsAbvGrd integer factors: 31 nums: 24 chars: 0 ordered: 0 others: 0 
## Functional factor factors: 32 nums: 24 chars: 0 ordered: 0 others: 0 
## Fireplaces integer factors: 32 nums: 25 chars: 0 ordered: 0 others: 0 
## FireplaceQu factor factors: 33 nums: 25 chars: 0 ordered: 0 others: 0 
## GarageType factor factors: 34 nums: 25 chars: 0 ordered: 0 others: 0 
## GarageYrBlt integer factors: 34 nums: 26 chars: 0 ordered: 0 others: 0 
## GarageFinish factor factors: 35 nums: 26 chars: 0 ordered: 0 others: 0 
## GarageCars integer factors: 35 nums: 27 chars: 0 ordered: 0 others: 0 
## GarageArea integer factors: 35 nums: 28 chars: 0 ordered: 0 others: 0 
## GarageQual factor factors: 36 nums: 28 chars: 0 ordered: 0 others: 0 
## GarageCond factor factors: 37 nums: 28 chars: 0 ordered: 0 others: 0 
## PavedDrive factor factors: 38 nums: 28 chars: 0 ordered: 0 others: 0 
## WoodDeckSF integer factors: 38 nums: 29 chars: 0 ordered: 0 others: 0 
## OpenPorchSF integer factors: 38 nums: 30 chars: 0 ordered: 0 others: 0 
## EnclosedPorch integer factors: 38 nums: 31 chars: 0 ordered: 0 others: 0 
## X3SsnPorch integer factors: 38 nums: 32 chars: 0 ordered: 0 others: 0 
## ScreenPorch integer factors: 38 nums: 33 chars: 0 ordered: 0 others: 0 
## PoolArea integer factors: 38 nums: 34 chars: 0 ordered: 0 others: 0 
## PoolQC factor factors: 39 nums: 34 chars: 0 ordered: 0 others: 0 
## Fence factor factors: 40 nums: 34 chars: 0 ordered: 0 others: 0 
## MiscFeature factor factors: 41 nums: 34 chars: 0 ordered: 0 others: 0 
## MiscVal integer factors: 41 nums: 35 chars: 0 ordered: 0 others: 0 
## MoSold integer factors: 41 nums: 36 chars: 0 ordered: 0 others: 0 
## YrSold integer factors: 41 nums: 37 chars: 0 ordered: 0 others: 0 
## SaleType factor factors: 42 nums: 37 chars: 0 ordered: 0 others: 0 
## SaleCondition factor factors: 43 nums: 37 chars: 0 ordered: 0 others: 0
summary(data_map)
##               Length Class  Mode     
## factor_cols   31     -none- list     
## factor_levels 31     -none- list     
## num_cols      37     -none- character
## char_cols      0     -none- NULL     
## ordered_cols   0     -none- NULL     
## other_cols     0     -none- NULL

This dataset only had factorial and numeric columns. I unpacked data_map before heading to imputation and encoding.

factor_cols <- data_map$factor_cols
factor_levels <- data_map$factor_levels
num_cols <- data_map$num_cols
rm(data_map)

Imputation and Encoding

The functions for imputation and encoding keep track of your process by producing log files. This feature is by default disabled. To enable log files, I passed a list of arguments to the log parameter in every imputation or encoding function.

# create a list of arguments for producing a log file
log_arg <- list(file = 'log.txt', append = TRUE, split = FALSE)

log_arg stroed the arguments I passed to parameter log. The log file was named “log.txt”, new information was appended to the file, and the contents to the log file weren’t printed to the standard output.

Imputation Functions

To prevent leakage, I instructed the imputation functions to use only rows of the train set to calculate the imputation values by passing an index to parameter idx.

Function impute_mode, impute_median, impute_mean

impute_mode() works with both numerical, string, and factorial columns. It impute NAs by the modes of their corresponding columns.

impute_median() and impute_mean() only work with numerical columns. They impute NAs by medians and means respectively.

# impute NAs in factorial columns by the mode of corresponding columns
lst <- unlist(factor_cols)
df <- impute_mode(df, cols = lst, idx = !is.na(df$SalePrice), log = log_arg)

# impute NAs in numerical columns by the median of corresponding columns
lst <- num_cols
df <- impute_median(df, cols = lst, idx = !is.na(df$SalePrice), log = log_arg)

# check the result
inspect_na(df[, -ncol(df)], top = 5)
##          Id  MSSubClass    MSZoning LotFrontage     LotArea 
##           0           0           0           0           0

Encoding Functions

Every encoding function prints summary of the columns before and after encoding by default. The output of every funcion is by default factorial. If you want numerical output, set parameter out.int to TRUE after making sure no missing value in the input.

In this demo, I kept the encoded columns factorial because I intended to save the dataset into a csv file, which doesn’t distinguish between factorial and numerical columns.

Encoding Ordinal Columns

In business datasets, we can often find ratings, which are ordinal and use similar schemes. Based on my experience, if many columns share the same scheme, they are likely to be ratings.

summary(factor_cols)
##               Length Class  Mode     
## MSZoning       1     -none- character
## Street         1     -none- character
## Alley          1     -none- character
## LotShape       1     -none- character
## LandContour    1     -none- character
## Utilities      1     -none- character
## LotConfig      1     -none- character
## LandSlope      1     -none- character
## Neighborhood   1     -none- character
## Condition1     2     -none- character
## BldgType       1     -none- character
## HouseStyle     1     -none- character
## RoofStyle      1     -none- character
## RoofMatl       1     -none- character
## Exterior1st    2     -none- character
## MasVnrType     1     -none- character
## ExterQual     10     -none- character
## Foundation     1     -none- character
## BsmtExposure   1     -none- character
## BsmtFinType1   2     -none- character
## Heating        1     -none- character
## CentralAir     1     -none- character
## Electrical     1     -none- character
## Functional     1     -none- character
## GarageType     1     -none- character
## GarageFinish   1     -none- character
## PavedDrive     1     -none- character
## Fence          1     -none- character
## MiscFeature    1     -none- character
## SaleType       1     -none- character
## SaleCondition  1     -none- character

In our dataset ExterQual and other 9 columns share the same scheme. After I checked their scheme and the description file, I was sure that they were ordinal.

factor_levels$ExterQual
## [1] "Ex"  "Fa"  "Gd"  "NoA" "Po"  "TA"

“Po”: poor, “Fa”: fair, “TA”: typical/average, “Gd”: good, “Ex”: excellent

Function encode_ordinal

encode_ordinal() encodes ordinal data into sequential integers by a given order. The argument passed to none is always encoded to 0. The 1st member of the vector passed to order is encoded to 1.

# encoding ordinal columns
i <- 'ExterQual'; lst <- c('Po', 'Fa', 'TA', 'Gd', 'Ex')
df[, factor_cols[[i]]] <- encode_ordinal(df[, factor_cols[[i]]], order = lst, none = 'NoA', log = log_arg)
##  ExterQual ExterCond BsmtQual   BsmtCond   HeatingQC KitchenQual
##  Ex: 107   Ex:  12   Ex : 258   Fa : 104   Ex:1493   Ex: 205    
##  Fa:  35   Fa:  67   Fa :  88   Gd : 122   Fa:  92   Fa:  70    
##  Gd: 979   Gd: 299   Gd :1209   NoA:  82   Gd: 474   Gd:1151    
##  TA:1798   Po:   3   NoA:  81   Po :   5   Po:   3   TA:1493    
##            TA:2538   TA :1283   TA :2606   TA: 857              
##                                                                 
##  FireplaceQu GarageQual GarageCond PoolQC    
##  Ex :  43    Ex :   3   Ex :   3   Ex :   4  
##  Fa :  74    Fa : 124   Fa :  74   Fa :   2  
##  Gd : 744    Gd :  24   Gd :  15   Gd :   4  
##  NoA:1420    NoA: 159   NoA: 159   NoA:2909  
##  Po :  46    Po :   5   Po :  14             
##  TA : 592    TA :2604   TA :2654             
## coded 10 cols 5 levels 
##  ExterQual ExterCond BsmtQual BsmtCond HeatingQC KitchenQual FireplaceQu
##  5: 107    5:  12    5: 258   2: 104   5:1493    5: 205      5:  43     
##  2:  35    2:  67    2:  88   4: 122   2:  92    2:  70      2:  74     
##  4: 979    4: 299    4:1209   0:  82   4: 474    4:1151      4: 744     
##  3:1798    1:   3    0:  81   1:   5   1:   3    3:1493      0:1420     
##            3:2538    3:1283   3:2606   3: 857                1:  46     
##                                                              3: 592     
##  GarageQual GarageCond PoolQC  
##  5:   3     5:   3     5:   4  
##  2: 124     2:  74     2:   2  
##  4:  24     4:  15     4:   4  
##  0: 159     0: 159     0:2909  
##  1:   5     1:  14             
##  3:2604     3:2654
# removing encoded columns from the map
factor_levels[[i]] <- NULL
factor_cols[[i]] <- NULL

The Utilities column was binary according the dataset.

factor_levels$Utilities
## [1] "AllPub" "NoSeWa"
levels(df$Utilities)
## [1] "AllPub" "NoSeWa"

However, the description file indicates that it has 4 possible values: ‘ELO’, ‘NoSeWa’, ‘NoSewr’, ‘AllPub’. Therefore, I encoded it as having 4 levels.

# in dataset only "AllPub" "NoSeWa", with 2 NAs
i <- 'Utilities'; lst <- c('ELO', 'NoSeWa', 'NoSewr', 'AllPub')
df[, factor_cols[[i]]] <- encode_ordinal(df[, factor_cols[[i]], drop=FALSE], order = lst, log = log_arg)
##   Utilities   
##  AllPub:2918  
##  NoSeWa:   1  
## coded 1 cols 4 levels 
##  Utilities
##  4:2918   
##  2:   1
factor_levels[[i]]<-NULL
factor_cols[[i]]<-NULL

Encoding Binary Columns

# find all the 2-level columns
lst <- lapply(factor_levels, length)
lst <- as.data.frame(lst)
colnames(lst[, lst == 2])
## [1] "Street"     "CentralAir"

Function encode_binary

encode_binary() encodes binary data into 0 and 1, regardless of order.

# encode all the 2-level columns
i <- c(factor_cols$Street, factor_cols$CentralAir)
df[, i] <- encode_binary(df[, i, drop=FALSE], log = log_arg)
##   Street     CentralAir
##  Grvl:  12   N: 196    
##  Pave:2907   Y:2723    
## coded 1 cols 
## coded 1 cols 
##  Street   CentralAir
##  0:  12   0: 196    
##  1:2907   1:2723

The Log File

Let’s check the log file at the end.

cat(paste(readLines('log.txt'), collapse = '\n'))
## Columns Imputed by Mode:
##  MSZoning, Street, Alley, LotShape, LandContour, Utilities, LotConfig, LandSlope, Neighborhood, Condition1, Condition2, BldgType, HouseStyle, RoofStyle, RoofMatl, Exterior1st, Exterior2nd, MasVnrType, ExterQual, ExterCond, BsmtQual, BsmtCond, HeatingQC, KitchenQual, FireplaceQu, GarageQual, GarageCond, PoolQC, Foundation, BsmtExposure, BsmtFinType1, BsmtFinType2, Heating, CentralAir, Electrical, Functional, GarageType, GarageFinish, PavedDrive, Fence, MiscFeature, SaleType, SaleCondition
## 
## Columns Imputed by Median:
##  Id, MSSubClass, LotFrontage, LotArea, OverallQual, OverallCond, YearBuilt, YearRemodAdd, MasVnrArea, BsmtFinSF1, BsmtFinSF2, BsmtUnfSF, TotalBsmtSF, X1stFlrSF, X2ndFlrSF, LowQualFinSF, GrLivArea, BsmtFullBath, BsmtHalfBath, FullBath, HalfBath, BedroomAbvGr, KitchenAbvGr, TotRmsAbvGrd, Fireplaces, GarageYrBlt, GarageCars, GarageArea, WoodDeckSF, OpenPorchSF, EnclosedPorch, X3SsnPorch, ScreenPorch, PoolArea, MiscVal, MoSold, YrSold
## 
## Columns:
##  ExterQual, ExterCond, BsmtQual, BsmtCond, HeatingQC, KitchenQual, FireplaceQu, GarageQual, GarageCond, PoolQC
## Scheme:
## NoA  Po  Fa  TA  Gd  Ex 
##   0   1   2   3   4   5 
## 
## Columns:
##  Utilities
## Scheme:
##           ELO NoSeWa NoSewr AllPub 
##      0      1      2      3      4 
## 
## Columns:
##  Street
## Scheme:
## Grvl Pave 
##    0    1 
## 
## Columns:
##  CentralAir
## Scheme:
## N Y 
## 0 1

=== end ===