Exercise 2 (REDO)

Housing Prices

Run Housing Prices Data

hp<-read.csv("https://www.biz.uiowa.edu/faculty/jledolter/datamining/HousePrices.csv")

attach(hp)
library(nutshell)

## Loading required package: nutshell.bbdb

## Loading required package: nutshell.audioscrobbler

library(lattice)
library(latticeExtra)

## Loading required package: RColorBrewer

offers.ho=table(hp$Offers)
offers.ho

## 
##  1  2  3  4  5  6 
## 23 36 46 19  3  1

barchart(offers.ho,horizontal=FALSE,ylab="Houses",xlab="Number of Offers",col="purple")

This bar chart shows how many houses received each number of offers. The most common count was three offers (46 houses), and only one house received six. Most houses received between one and three offers, with a thin tail toward five and six, so the distribution is not symmetric and does not follow a normal distribution.
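The shape of the distribution can also be checked numerically. The sketch below rebuilds the counts from the printed table (so it runs without the CSV) and computes the shares, the mode, the mean, and the median. The variable names are illustrative and not part of the original analysis.

```r
# Counts copied from the offers.ho table printed above
offers <- c("1" = 23, "2" = 36, "3" = 46, "4" = 19, "5" = 3, "6" = 1)

# Share of houses at each offer count
offer.share <- round(offers / sum(offers), 3)

# Most common number of offers
offer.mode <- names(offers)[which.max(offers)]

# Expand the counts back into one value per house to get mean and median
offer.values <- rep(as.numeric(names(offers)), times = offers)
offer.mean <- mean(offer.values)
offer.median <- median(offer.values)

# Share of houses that received three or fewer offers
share.le3 <- sum(offers[c("1", "2", "3")]) / sum(offers)

print(offer.share)
```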

Price<-hp$Price

densityplot(~Price,groups=hp$Neighborhood,data=hp,plot.points=FALSE)

This graph shows the density of house prices grouped by neighborhood. Because the plot has no legend, it is difficult to discern which group is which. However, based on previous analysis, the green group is the West neighborhood because it had the highest overall price. Judging by where the groups overlap, the typical house price seems to be around $125,000.
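One way to make the groups identifiable is lattice's `auto.key` argument, which draws a legend mapping line colours to group levels. The sketch below uses a small made-up data frame with the same column names as `hp` (the prices are hypothetical, chosen only so the example runs on its own); with the real data, the same `auto.key` argument could be added to the call above.

```r
library(lattice)

# Hypothetical toy data: column names match hp, the values are made up
toy <- data.frame(
  Price = c(100, 110, 120, 115, 125, 135, 150, 160, 170) * 1000,
  Neighborhood = rep(c("East", "North", "West"), each = 3)
)

# auto.key asks lattice to draw a legend for the grouping variable
p <- densityplot(~Price, groups = Neighborhood, data = toy,
                 plot.points = FALSE, auto.key = list(columns = 3))
print(p)

# Mean price per neighborhood, to confirm which group sits highest
nbhd.mean <- tapply(toy$Price, toy$Neighborhood, mean)
print(nbhd.mean)
```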

Direct Marketing Data

Run Direct Marketing Data

# Download the CSV only if a local copy does not already exist
if(!file.exists("DirectMarketing.csv")){
  download.file("https://www.biz.uiowa.edu/faculty/jledolter/datamining/DirectMarketing.csv", "DirectMarketing.csv",method="curl")
}
# Read the local copy instead of fetching the URL a second time
DirectMk<-read.csv("DirectMarketing.csv")
attach(DirectMk)

smoothScatter(DirectMk$Salary,DirectMk$AmountSpent,ylab="Amount Spent", xlab="Salary")

This is a smoothed scatter plot of amount spent against salary. It suggests a strong, positive linear relationship between a customer’s salary and how much they spend. There are some outliers: customers who have high salaries but spend little money.
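The visual impression of a linear relationship can be backed up with a correlation coefficient and a simple regression. The sketch below uses a hypothetical toy data frame with the same column names as `DirectMk`; the values are made up, so the numbers it prints say nothing about the real data.

```r
# Hypothetical toy data standing in for DirectMk; the values are made up
toy <- data.frame(
  Salary = c(20000, 35000, 50000, 65000, 80000, 95000),
  AmountSpent = c(400, 700, 900, 1400, 1500, 2100)
)

# Pearson correlation: values near 1 indicate a strong positive linear relationship
r <- cor(toy$Salary, toy$AmountSpent)

# Simple linear regression of amount spent on salary
fit <- lm(AmountSpent ~ Salary, data = toy)
slope <- coef(fit)[["Salary"]]

print(r)
print(slope)
```

With the real data, replacing `toy` with `DirectMk` would give the corresponding figures.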

HM.tbl=table(HM=DirectMk$OwnHome)
HM.tbl

## HM
##  Own Rent 
##  516  484

barchart(HM.tbl)

This bar graph plots the number of people who own versus rent their homes. The split is nearly even (516 own, 484 rent). The counts alone reveal little; whether this variable is useful depends on how spending differs between the two groups.
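Whether home ownership is informative can be checked by comparing spending across the two groups rather than by counting group sizes. A minimal sketch, using a hypothetical toy data frame with the same column names as `DirectMk` (the values are made up):

```r
# Hypothetical toy data standing in for DirectMk; the values are made up
toy <- data.frame(
  OwnHome = c("Own", "Own", "Own", "Rent", "Rent", "Rent"),
  AmountSpent = c(1500, 1200, 1800, 700, 900, 800)
)

# Mean amount spent within each ownership group
spend.by.home <- tapply(toy$AmountSpent, toy$OwnHome, mean)
print(spend.by.home)
```

A large gap between the group means would suggest the variable is worth keeping, even when the groups are the same size.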

Gender Discrimination

Run Gender Discrimination Data

# Download the CSV only if a local copy does not already exist
if(!file.exists("GenderDiscrimination.csv")){
  download.file("https://www.biz.uiowa.edu/faculty/jledolter/datamining/GenderDiscrimination.csv", "GenderDiscrimination.csv",method="curl")
}
# Read the local copy instead of fetching the URL a second time
GenDsc<-read.csv("GenderDiscrimination.csv")
attach(GenDsc)

smoothScatter(GenDsc$Experience,GenDsc$Salary,ylab="Salary",xlab="Experience")

This is a smoothed scatter plot of salary against experience. There seems to be a slight positive linear relationship between the two variables. However, a large number of points have many years of experience but low salaries, so experience alone would not be enough to predict salaries.

xyplot(GenDsc$Experience~GenDsc$Salary|GenDsc$Gender,data=GenDsc,ylab="Experience",xlab="Salary",layout=c(1,2),col="black")

This is an xy plot that splits the male and female data points to see whether the relationship in the previous graph differs between the genders. Note that the axes are swapped relative to the previous graph: experience is on the vertical axis and salary on the horizontal. Overall, there seems to be a slight positive linear relationship between the two variables in both panels. However, there are a few points in the male panel that have high experience and high salaries. More analysis would be needed to see if gender is a factor in salary level.
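One possible form of that further analysis is a regression of salary on experience with gender as an additional term. This is only a sketch of the idea, not part of the original exercise; it runs on a hypothetical toy data frame with the same column names as `GenDsc`, and the values are made up.

```r
# Hypothetical toy data standing in for GenDsc; the values are made up
toy <- data.frame(
  Gender = rep(c("Female", "Male"), each = 4),
  Experience = c(2, 5, 8, 12, 3, 6, 9, 13),
  Salary = c(60, 70, 80, 95, 70, 85, 95, 115) * 1000
)

# Salary modelled on experience plus a gender indicator.
# The GenderMale coefficient estimates the salary gap at equal experience.
fit <- lm(Salary ~ Experience + Gender, data = toy)
print(coef(fit))
```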

Loan Data

Run Loan Data

# Download the CSV only if a local copy does not already exist
if(!file.exists("LoanData.csv")){
  download.file("https://www.biz.uiowa.edu/faculty/jledolter/datamining/LoanData.csv", "LoanData.csv",method="curl")
}
# Read the local copy instead of fetching the URL a second time
LoanData<-read.csv("LoanData.csv")
attach(LoanData)

boxplot(LoanData$Borrower.Rate~LoanData$Age,data=LoanData,ylab="Borrower Rate",xlab="Age")

These box plots show the median, the quartiles, and the range of the borrower rate for each loan age, with outliers drawn as separate points. Based on these, few loans get to age 13 and the loan age with the greatest variability is age 10. Further investigation is needed to see why that is. Other than that, most loan ages have similar borrower rates.
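To see exactly which numbers a box plot draws, `boxplot.stats()` returns them directly. The sketch below uses a short vector of hypothetical borrower rates (made up for illustration) and shows that the centre line is the median rather than the mean, and that a point far beyond the upper quartile is reported as an outlier instead of stretching the whisker.

```r
# Hypothetical borrower rates for one loan age; the values are made up
rates <- c(0.06, 0.08, 0.10, 0.12, 0.14, 0.16, 0.18, 0.20, 0.60)

bs <- boxplot.stats(rates)

# stats: lower whisker, lower hinge, median, upper hinge, upper whisker
print(bs$stats)

# out: points beyond 1.5 times the box height from the hinges
print(bs$out)

# The mean is pulled up by the outlier; the median is not
rate.mean <- mean(rates)
rate.median <- median(rates)
```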

cred.gr=table(LoanData$Credit.Grade)
cred.gr

## 
##    A   AA    B    C    D    E   HR   NC 
##  424  451  553  843  927 1129 1217   67

barchart(cred.gr,horizontal=FALSE,xlab="Credit Grade",ylab="Frequency",col="Black")

This bar graph plots the frequency of credit grades. Broadly, the better the credit grade, the fewer people have it (AA is the best rating and HR is the worst; the bars are in alphabetical order, so A appears before AA). The NC column covers those with no credit score. This pattern makes sense because it is difficult to have a perfect credit rating.
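Because `table()` sorts the grades alphabetically, the bars do not run from best to worst. A sketch of one way to fix this is to set the factor levels explicitly before tabulating. The grade order below (AA first, NC last) is an assumption about the grading scheme, and the counts are rebuilt from the printed table so the example runs without the CSV.

```r
library(lattice)

# Counts copied from the cred.gr table printed above
cred.counts <- c(A = 424, AA = 451, B = 553, C = 843, D = 927,
                 E = 1129, HR = 1217, NC = 67)

# Assumed quality order, best to worst, with "no credit" last
grade.order <- c("AA", "A", "B", "C", "D", "E", "HR", "NC")

# Rebuild one grade per loan, then tabulate with explicit factor levels
grades <- rep(names(cred.counts), times = cred.counts)
cred.ordered <- table(factor(grades, levels = grade.order))

p <- barchart(cred.ordered, horizontal = FALSE,
              xlab = "Credit Grade", ylab = "Frequency", col = "black")
print(p)
```

With the real data, `factor(LoanData$Credit.Grade, levels = grade.order)` would take the place of the rebuilt `grades` vector.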

Financial Indicators

Run Financial Indicators Data

# Download the CSV only if a local copy does not already exist
if(!file.exists("FinancialIndicators.csv")){
  download.file("https://www.biz.uiowa.edu/faculty/jledolter/datamining/FinancialIndicators.csv", "FinancialIndicators.csv",method="curl")
}
# Read the local copy instead of fetching the URL a second time
FinInd<-read.csv("FinancialIndicators.csv")
attach(FinInd)

Cntry.name=table(FinInd$Country)
Cntry.name

## 
## Foreign      US 
##    1783    5329

barchart(Cntry.name,horizontal=FALSE,xlab="Country Name",ylab="Frequency",col="Black")

It was very difficult to find charts that would work with this data set because of the large number of variables and the variability within them. This graph compares the number of foreign stocks with the number of US stocks: US stocks outnumber foreign ones roughly three to one (5329 versus 1783). Because this is United States data, this is expected.
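The same comparison can be expressed as proportions. The sketch below rebuilds the counts from the printed table and converts them to shares with `prop.table()`; the variable names are illustrative.

```r
# Counts copied from the Cntry.name table printed above
cntry.counts <- c(Foreign = 1783, US = 5329)

# Share of all stocks in each group
cntry.share <- prop.table(cntry.counts)
print(round(cntry.share, 3))

# Ratio of US stocks to foreign stocks
us.to.foreign <- cntry.counts[["US"]] / cntry.counts[["Foreign"]]
print(us.to.foreign)
```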

fact=factor(FinInd$Country)
rest=FinInd$Growth.in.Revenue..last.year
t3=tapply(rest,fact,mean,na.rm=TRUE)
t3

##    Foreign         US 
## 0.13100953 0.09507975

This table shows the average growth in revenue last year for each country group in the data. On average, foreign stocks grew at a higher rate than those in the US last year (about 0.131 versus 0.095). This may be important information for investors who want to predict which stocks may yield the most returns.
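The `na.rm=TRUE` argument in the `tapply()` call matters because a single missing value would otherwise turn a group's mean into `NA`. A minimal sketch on hypothetical toy values (made up for illustration) shows the difference:

```r
# Hypothetical toy values; one US observation is missing
growth <- c(0.10, NA, 0.30, 0.20)
country <- factor(c("US", "US", "Foreign", "Foreign"))

# Without na.rm, the group containing the NA has an NA mean
mean.raw <- tapply(growth, country, mean)

# With na.rm = TRUE, missing values are dropped before averaging
mean.clean <- tapply(growth, country, mean, na.rm = TRUE)

print(mean.raw)
print(mean.clean)
```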