Data set 1: HousePrices.csv

The data set includes prices and characteristics of n=128 houses. The following plots show how price relates to square footage and to the number of bedrooms.

library(lattice)
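# stringsAsFactors=TRUE reproduces the factor summaries of Brick and Neighborhood shown below (it was the read.csv default before R 4.0.0)
hop<-read.csv("c:/DataMining/Data/HousePrices.csv",stringsAsFactors=TRUE)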
xyplot(Price~SqFt,hop,col="black")

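# Bedrooms runs from 2 to 5 (see the summary below); padded limits keep the 5-bedroom houses off the panel border
xyplot(Price~Bedrooms,hop,col="black", scales=list(x=list(at=2:5, limits=c(1.5,5.5))))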

summary(hop)
##      HomeID           Price             SqFt         Bedrooms    
##  Min.   :  1.00   Min.   : 69100   Min.   :1450   Min.   :2.000  
##  1st Qu.: 32.75   1st Qu.:111325   1st Qu.:1880   1st Qu.:3.000  
##  Median : 64.50   Median :125950   Median :2000   Median :3.000  
##  Mean   : 64.50   Mean   :130427   Mean   :2001   Mean   :3.023  
##  3rd Qu.: 96.25   3rd Qu.:148250   3rd Qu.:2140   3rd Qu.:3.000  
##  Max.   :128.00   Max.   :211200   Max.   :2590   Max.   :5.000  
##    Bathrooms         Offers      Brick    Neighborhood
##  Min.   :2.000   Min.   :1.000   No :86   East :45    
##  1st Qu.:2.000   1st Qu.:2.000   Yes:42   North:44    
##  Median :2.000   Median :3.000            West :39    
##  Mean   :2.445   Mean   :2.578                        
##  3rd Qu.:3.000   3rd Qu.:3.000                        
##  Max.   :4.000   Max.   :6.000
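The plots show the direction of each relationship; a correlation matrix puts a number on its strength. This is a minimal sketch, assuming the `hop` data frame read above and that the listed columns are numeric (the summary shows no missing values). The helper name `cor_matrix()` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: Pearson correlation matrix for a chosen set of
# numeric columns, rounded to two decimals for readability.
cor_matrix <- function(df, cols) {
  round(cor(df[cols]), 2)
}

# Usage with the house data read above (assumes hop is loaded):
# cor_matrix(hop, c("Price", "SqFt", "Bedrooms"))
```

Pearson's r measures only linear association, and Bedrooms takes just a handful of integer values, so the scatter plot remains the more informative view of that variable.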

Data set 2: DirectMarketing.csv

The data set comes from a direct marketer who sells products only via direct mail. The following plots show the relationship between salary and the amount spent on products, and compare married and single customers by their past purchase volume (History).

# "dm" avoids reusing the name of base R's dim() function
dm<-read.csv("c:/DataMining/Data/DirectMarketing.csv")
smoothScatter(dm$Salary,dm$AmountSpent,xlab="Salary",ylab="AmountSpent")

hist.mr.tbl=table(His=dm$History,Mr=dm$Married)
hist.mr.tbl
##         Mr
## His      Married Single
##   High       205     50
##   Low         52    178
##   Medium     112    100
barchart(hist.mr.tbl,horizontal=FALSE,groups=FALSE,xlab="History",col="black")
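Because the three History groups differ in size, raw counts make the married/single comparison hard to read. Converting each row to proportions with base R's `prop.table()` puts the groups on a common scale. This is a minimal sketch, assuming a two-way table like `hist.mr.tbl`; the helper name `row_shares()` is hypothetical, and the counts are copied from the table printed above so the example runs on its own.

```r
# Hypothetical helper: within-row proportions of a two-way table,
# e.g. the share of married vs. single customers at each History level.
row_shares <- function(tbl, digits = 3) {
  round(prop.table(tbl, margin = 1), digits)
}

# Rebuild the counts printed above (matrix() fills column by column)
counts <- as.table(matrix(c(205, 52, 112, 50, 178, 100), nrow = 3,
                          dimnames = list(His = c("High", "Low", "Medium"),
                                          Mr = c("Married", "Single"))))
row_shares(counts)
```

On these counts, about 80% of customers with a high purchase history are married, compared with about 23% of those with a low history.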

Data set 3: GenderDiscrimination.csv

The data set includes the gender, experience, and salary of n=208 individuals. The following plots show the relationship between experience and salary, and then compare salary by gender.

ged<-read.csv("c:/DataMining/Data/GenderDiscrimination.csv")
# Salary (the response) on the y axis and Experience on the x axis, as in the other plots
xyplot(Salary~Experience,ged,col="black")
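A least-squares line summarizes the experience/salary relationship in one number: the estimated change in salary per additional unit of experience. This is a sketch using base R's `lm()`, assuming `ged` has numeric `Salary` and `Experience` columns; the helper name `salary_trend()` is hypothetical.

```r
# Hypothetical helper: intercept and slope of a simple linear
# regression of Salary on Experience.
salary_trend <- function(df) {
  fit <- lm(Salary ~ Experience, data = df)
  coef(fit)  # named vector: "(Intercept)", "Experience"
}

# Usage with the data read above (assumes ged is loaded):
# salary_trend(ged)
```

The slope describes association only; it is not evidence about the cause of salary differences.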

boxplot(Salary~Gender,data=ged,ylab="Salary",xlab="Gender")
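The boxes are drawn from statistics computed by base R's `boxplot.stats()`, so the same numbers can be printed for each gender to compare them exactly. This is a sketch, assuming `ged` has `Salary` and `Gender` columns; the helper name `box_stats_by_group()` is hypothetical.

```r
# Hypothetical helper: the five statistics that boxplot() draws, one column per group.
box_stats_by_group <- function(values, groups) {
  # boxplot.stats()$stats: lower whisker end, lower hinge, median,
  # upper hinge, upper whisker end (whiskers reach at most 1.5 * IQR)
  s <- sapply(split(values, groups), function(v) boxplot.stats(v)$stats)
  rownames(s) <- c("lower.whisker", "lower.hinge", "median",
                   "upper.hinge", "upper.whisker")
  s
}

# Usage with the data read above (assumes ged is loaded):
# box_stats_by_group(ged$Salary, ged$Gender)
```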

Data set 4: LoanData.csv

The data set lists the outcomes of n=5611 loans. The following code shows the frequency of each credit grade and a summary of the data set.

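# stringsAsFactors=TRUE reproduces the factor summaries of Status and Credit.Grade shown below (the read.csv default before R 4.0.0)
lod<-read.csv("c:/DataMining/Data/LoanData.csv",stringsAsFactors=TRUE)
# table() counts the loans in each credit grade; barchart() draws one bar per grade
barchart(table(lod$Credit.Grade),ylab="Credit Grade",col="black")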
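The bar chart shows the counts graphically; a small table of counts and shares, sorted from the most to the least common grade, lets the same frequencies be read off exactly. This is a sketch that works on a factor or character vector; the helper name `level_frequencies()` is hypothetical.

```r
# Hypothetical helper: count and share of each level, most common first.
level_frequencies <- function(x) {
  counts <- sort(table(x), decreasing = TRUE)
  data.frame(level = names(counts),
             count = as.vector(counts),
             share = as.vector(counts) / sum(counts),
             stringsAsFactors = FALSE)
}

# Usage with the loan data read above (assumes lod is loaded):
# level_frequencies(lod$Credit.Grade)
```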

summary(lod)
##      Status      Credit.Grade      Amount           Age        
##  Current:5186   HR     :1217   Min.   : 1000   Min.   : 0.000  
##  Default:  75   E      :1129   1st Qu.: 2025   1st Qu.: 2.000  
##  Late   : 350   D      : 927   Median : 3001   Median : 4.000  
##                 C      : 843   Mean   : 4817   Mean   : 4.504  
##                 B      : 553   3rd Qu.: 6000   3rd Qu.: 7.000  
##                 AA     : 451   Max.   :25000   Max.   :14.000  
##                 (Other): 491                                   
##  Borrower.Rate    Debt.To.Income.Ratio
##  Min.   :0.0000   Min.   :    0.00    
##  1st Qu.:0.1425   1st Qu.:    0.09    
##  Median :0.1950   Median :    0.16    
##  Mean   :0.1937   Mean   :   45.38    
##  3rd Qu.:0.2500   3rd Qu.:    0.25    
##  Max.   :0.4975   Max.   :51280.07    
## 
sd(lod$Amount)
## [1] 4436.923
sd(lod$Borrower.Rate)
## [1] 0.06875547
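The summary shows that Debt.To.Income.Ratio is extremely skewed: its mean (45.38) lies far above its third quartile (0.25), and its maximum is 51280.07. For a column like this, the mean and standard deviation mostly reflect a few extreme values, while the median, the interquartile range, and the median absolute deviation are much less sensitive to them. This is a sketch using base R's `IQR()` and `mad()`; the helper name `spread_summary()` is hypothetical.

```r
# Hypothetical helper: classical and robust location/spread measures side by side.
spread_summary <- function(x) {
  c(mean = mean(x),
    median = median(x),
    sd = sd(x),
    IQR = IQR(x),   # third quartile minus first quartile
    mad = mad(x))   # median absolute deviation, scaled by 1.4826 by default
}

# Usage with the loan data read above (assumes lod is loaded):
# spread_summary(lod$Debt.To.Income.Ratio)
```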

Data set 5: FinancialIndicators.csv

The data set lists indicators of the financial health of n=7112 companies listed on various stock exchanges. The following plots show that stock price has no strong relationship with net income, revenue growth over the last year, or trailing net income.

fii<-read.csv("c:/DataMining/Data/FinancialIndicators.csv")
xyplot(Stock.Price~Net.Income,fii,col="black")

xyplot(Stock.Price~Growth.in.Revenue..last.year,fii,col="black")

xyplot(Stock.Price~Trailing.Net.Income,fii,col="black")
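To back the visual impression with numbers, rank (Spearman) correlations can be computed between Stock.Price and each indicator; ranks are less affected by extreme values than the raw numbers. This is a sketch, assuming the columns are numeric and may contain missing values, which `use = "complete.obs"` drops for each pair; the helper name `rank_cor_with()` is hypothetical.

```r
# Hypothetical helper: Spearman rank correlation between one column and
# each of several others, using only rows where both values are present.
rank_cor_with <- function(df, target, others) {
  sapply(others, function(v) {
    cor(df[[target]], df[[v]], method = "spearman", use = "complete.obs")
  })
}

# Usage with the financial data read above (assumes fii is loaded):
# rank_cor_with(fii, "Stock.Price",
#               c("Net.Income", "Growth.in.Revenue..last.year", "Trailing.Net.Income"))
```

Values near zero would be consistent with the weak relationships visible in the plots.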