Making decision on location for buying home is one of the most important decision for lot of people in their lifetime. Lot of effort and time is spent to do research on various locations on different location attributes. This project aims at building a reusable framework that can be used for home buying decision making process. Following are the key objectives of this project- 1. Build a reusable framework to do comparative data analysis among various locations around a selected location 2. Build a scoring model to factors various city livability parameters and come up with livability score for locations 3. Build decision metrics considering price and livability score to come up with comparative decision index for cities
$1. For the purpose of the project Cleveland was selected as a location around which best other cities to be compared. Below site was used to collect data on top cities around Cleveland by doing web scrapping. %a. http://www.bestplaces.net/people/city/ohio/strongsville $2. Web scrapping was done to extract city livability parameters. Following parameters were used %a. City population growth rate, population density, diversity, young population %, mass commute provision, avg commute time, crime rate, investment in education. $3. Build overall scoring model with the above mention factors and come up with a combined scoring model. $4. Source home price information by city $a. Zillow data will be used here $5. Analyze home price by city and combined score in point 4 $6. Analyze the top factors that drives home prices of a city
suppressWarnings(library(dplyr))##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
suppressWarnings(library(stringr))
suppressWarnings(library(XML))
suppressWarnings(library(RCurl))## Loading required package: bitops
suppressWarnings(library(ggplot2))
suppressWarnings(library(reshape2))
suppressWarnings(library(varhandle))
suppressWarnings(library(tidyr))##
## Attaching package: 'tidyr'
## The following object is masked from 'package:RCurl':
##
## complete
# Following url was used to identify the top cities near cleveland
best_place <- 'http://www.areavibes.com/cleveland-oh/best-places-to-live/'
# Extract list of cities and score
webpage<-getURL(best_place)
webpage <- readLines(tc <- textConnection(webpage)); close(tc)
pagetree <- htmlTreeParse(webpage, error=function(...){}, useInternalNodes = TRUE)
text1<-unlist(xpathSApply(pagetree,"/html/body/div/div/div/a",xmlValue))
best_cities<-data.frame()
name<-c(unlist(gsub(",","",(str_sub(str_extract(text1,"[[:alnum:] ]{1,},"),start=3)))))
score<-c(unlist(str_sub(str_extract(text1,"[[:alnum:] ]{1,},"),start=1 , end=2)))
best_cities<-data.frame(name=c(name),score=c(score))
best_cities<-best_cities[1:10,]
best_cities$name<-tolower(best_cities$name)
best_cities$name<-gsub(" ","%20",best_cities$name)
# name of best cities and related bestplaces.net score
best_cities## name score
## 1 cleveland 72
## 2 solon 83
## 3 orange 83
## 4 rocky%20river 82
## 5 bay%20village 82
## 6 brecksville 82
## 7 mayfield 82
## 8 berea 81
## 9 olmsted%20falls 81
## 10 strongsville 81
df_population<-data.frame(PEOPLE=c(1,2,3,4,5,6,7,8))
for (i in 1:10)
{
# Extract information from below website
webpage_popu<-getURL(gsub(" ","",paste('http://www.bestplaces.net/people/city/ohio/',best_cities$name[i])))
webpage_det <- readLines(tc <- textConnection(webpage_popu)); close(tc)
pagetree_popu <- htmlTreeParse(webpage_det, error=function(...){}, useInternalNodes = TRUE)
city_popu<-unlist(xpathSApply(pagetree_popu,"//td",xmlValue))
# Creating data frame on city population
df_population1<-matrix(city_popu[5:28],ncol=8, nrow=3)
df_population1<-as.data.frame(df_population1)
df_population1<-as.data.frame(t(df_population1))
df_population1<-df_population1[,-3]
df_population1<-unfactor(df_population1)
colnames(df_population1)<-df_population1[1,]
df_population<-cbind(df_population,df_population1)
}
df_population<-unfactor(df_population)
df_population<-df_population[,-c(1,4,6,8,10,12,14,16,18,20,22)]
df_population_format<-df_population[c(2,3),]
df_population_format<-t(df_population_format)
df_population_format<-data.frame(df_population_format)
colnames(df_population_format)<-c("current_population","population_2010")
df_population_format<-df_population_format[-1,]
df_population_format$current_population<-as.numeric(gsub(",","",(df_population_format$current_population)))
df_population_format$population_2010<-as.numeric(gsub(",","",(df_population_format$population_2010)))
df_population_format$city<-rownames(df_population_format)
# Calculate % chnage in population
df_population_format$Popu_change<-as.numeric(format(round(((df_population_format[,1]-df_population_format[,2])*100/df_population_format[,2]),2),nsmall = 2))
# Plot data in a chart--> 1
ggplot(data=df_population_format, aes(x=city, y=Popu_change,fill=city)) +geom_bar(colour="black", stat="identity")+ ggtitle("City population %change 2000-2010")+ geom_text(aes(label = Popu_change)) + theme(axis.text.x = element_text(angle = 90, hjust = 1))## Warning: Stacking not well defined when ymin != 0
# Rank cities by change in population
df_population_format<-df_population_format %>% mutate(cityrank_population_change = rank(Popu_change, ties.method ="first"))
df_rank_consolidate<-df_population_format[,4:5]
df_rank_consolidate<-cbind(best_cities$name,df_rank_consolidate)
colnames(df_rank_consolidate)<-c("name","popu change","cityrank_population_change")df_population_density<-df_population[8,]
df_population_density<-t(df_population_density)
df_population_density<-as.data.frame(df_population_density)
df_population_density$city<-rownames(df_population_density)
colnames(df_population_density)<-c("density", "city")
df_population_density<-df_population_density[-1,]
df_population_density$density<-as.numeric(gsub(",","",df_population_density$density))
# plot city density data chart-->2
ggplot(data=df_population_density, aes(x=city, y=density,fill=city)) +geom_bar(colour="black", stat="identity")+ ggtitle("City population by density")+ geom_text(aes(label = density)) + theme(axis.text.x = element_text(angle = 90, hjust = 1))# rank city with population density
df_population_density<-df_population_density %>% mutate(cityrank_density = rank(-density, ties.method ="first"))
# consolidate ranking with previous city parameters
df_rank_consolidate<-cbind(df_rank_consolidate,df_population_density[,3])df_diversity_combine<-data.frame(popultion=c(),city=c(),USA=c(),city_name=c())
for (i in 1:10)
{
# Extract data from webpage
webpage_div<-getURL(gsub(" ","",paste('http://www.bestplaces.net/people/city/ohio/',best_cities$name[i])))
webpage_div <- readLines(tc <- textConnection(webpage_div)); close(tc)
pagetree_div <- htmlTreeParse(webpage_div, error=function(...){}, useInternalNodes = TRUE)
city_div<-unlist(xpathSApply(pagetree_div,"//td",xmlValue))
# creating table on population
df_diversity<-matrix(city_div[5:109],ncol=35, nrow=3)
df_diversity<-as.data.frame(df_diversity)
df_diversity_h<-df_diversity[1,]
df_diversity<-df_diversity[-c(1,3),]
df_diversity_combine<-rbind(df_diversity_combine,df_diversity)
}
# creating data frame for race diversity
df_diversity_race<-df_diversity_combine[,30:35]
df_diversity_race<-cbind(df_diversity_combine[,1],df_diversity_race)
df_diversity_race<-unfactor(df_diversity_race)
colnames(df_diversity_race)<-c("city","white","black","asian","native_american","hawaiian_pacific_Islander","other")
df_diversity_race$white<-gsub("%","",df_diversity_race$white)
df_diversity_race$black <-gsub("%","",df_diversity_race$black)
df_diversity_race$asian<-gsub("%","",df_diversity_race$asian)
df_diversity_race$native_american<-gsub("%","",df_diversity_race$native_american)
df_diversity_race$hawaiian_pacific_Islander<-gsub("%","",df_diversity_race$hawaiian_pacific_Islander)
df_diversity_race$other<-gsub("%","",df_diversity_race$other)
df_diversity_format<-df_diversity_race %>% gather(race,percentage,white:other)
df_diversity_format$percentage<-as.numeric(df_diversity_format$percentage)
# Plot city diversity data chart -->3
ggplot(data=df_diversity_format, aes(x=city, y=percentage, fill=race)) +
geom_bar(stat="identity") +ggtitle("City diversity by race") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + coord_flip()# Rank the city with diversity by using % of data for white population as reference
df_diversity_rank<-df_diversity_format[df_diversity_format$race=="white",]
df_diversity_rank<-df_diversity_rank %>% mutate(cityrank_divercity = rank(-percentage, ties.method ="first"))
# consolidate ranking with previous city parameters
df_rank_consolidate<-cbind(df_rank_consolidate,df_diversity_rank[,4])df_diversity_age<-df_diversity_combine[,15:28]
df_diversity_age<-cbind(df_diversity_combine[,1],df_diversity_age)
colnames(df_diversity_age)<-c("city","0_4","5_9","10_14","15_17","18_20","21_24","25_34","35_44","45_54","55_59","60_64","65_74","75_84","85_over")
df_diversity_age$`0_4`<-gsub("%","",df_diversity_age$`0_4`)
df_diversity_age$`5_9`<-gsub("%","",df_diversity_age$`5_9`)
df_diversity_age$`10_14`<-gsub("%","",df_diversity_age$`10_14`)
df_diversity_age$`15_17`<-gsub("%","",df_diversity_age$`15_17`)
df_diversity_age$`18_20`<-gsub("%","",df_diversity_age$`18_20`)
df_diversity_age$`21_24`<-gsub("%","",df_diversity_age$`21_24`)
df_diversity_age$`25_34`<-gsub("%","",df_diversity_age$`25_34`)
df_diversity_age$`35_44`<-gsub("%","",df_diversity_age$`35_44`)
df_diversity_age$`45_54`<-gsub("%","",df_diversity_age$`45_54`)
df_diversity_age$`55_59`<-gsub("%","",df_diversity_age$`55_59`)
df_diversity_age$`60_64`<-gsub("%","",df_diversity_age$`60_64`)
df_diversity_age$`65_74`<-gsub("%","",df_diversity_age$`65_74`)
df_diversity_age$`75_84`<-gsub("%","",df_diversity_age$`75_84`)
df_diversity_age$`85_over`<-gsub("%","",df_diversity_age$`85_over`)
# Plot city data by age group chart -->4
df_diversity_age_format<-df_diversity_age %>% gather(age_grp,percentage, `0_4`:`85_over`)
df_diversity_age_format$percentage<-as.numeric(df_diversity_age_format$percentage)
ggplot(data=df_diversity_age_format, aes(x=city, y=percentage, fill=age_grp)) +
geom_bar(stat="identity") +ggtitle("City population by age") + theme(axis.text.x = element_text(angle = 90, hjust = 1))+ coord_flip()# Ranking the city on youth population %
df_diversity_rank_age<-df_diversity_age_format[df_diversity_age_format$age_grp=="25_34",]
df_diversity_rank_age<-df_diversity_rank_age %>% mutate(cityrank_youthfulness = rank(percentage, ties.method ="first"))
# Consolidate ranking with previous city parameters
df_rank_consolidate<-cbind(df_rank_consolidate,df_diversity_rank_age[,4])df_mass_com<-data.frame(TRANSPORTATION=c(),Commute_Time=c())
for (i in 1:10)
{
# Extract data from the webpage
webpage_trans<-getURL(gsub(" ","",paste('http://www.bestplaces.net/transportation/city/ohio/',best_cities$name[i])))
webpage_trans <- readLines(tc <- textConnection(webpage_trans)); close(tc)
pagetree_trans <- htmlTreeParse(webpage_trans, error=function(...){}, useInternalNodes = TRUE)
city_trans<-unlist(xpathSApply(pagetree_trans,"//td",xmlValue))
df_trans<-matrix(city_trans[5:10],ncol=2, nrow=3)
df_trans<-as.data.frame(df_trans)
colnames(df_trans)<-c("TRANSPORTATION","Commute_Time")
df_trans<-df_trans[-c(1,3),]
df_mass_com<-rbind(df_mass_com,df_trans)
}
df_mass_com$TRANSPORTATION<-unfactor(df_mass_com$TRANSPORTATION)
df_mass_com$Commute_Time<-unfactor(df_mass_com$Commute_Time)
# plot commute time for different cities (same bar diagram as below) chart-->5
ggplot(data=df_mass_com, aes(x=TRANSPORTATION, y=Commute_Time,fill=TRANSPORTATION)) +geom_bar(colour="black", stat="identity")+ ggtitle("City Avg. Commute Time")+ geom_text(aes(label = Commute_Time)) + theme(axis.text.x = element_text(angle = 90, hjust = 1))# Rank cities by avg commute time
df_mass_com<-df_mass_com %>% mutate(cityrank_commute = rank(-Commute_Time, ties.method ="first"))
# Consolidate ranking with previous city parameters
df_rank_consolidate<-cbind(df_rank_consolidate,df_mass_com[,3])df_trans_combine<-data.frame(TRANSPORTATION=c(),Commute_Time=c(),COMMUTE_MODE=c(),Auto_alone=c(),Carpool=c(),Mass_Transit=c(),Bicycle=c(),Walk=c(),Work_at_Home=c())
for (i in 1:10)
{
# Extract data fro the webpage
webpage_trans_mass<-getURL(gsub(" ","",paste('http://www.bestplaces.net/transportation/city/ohio/',best_cities$name[i])))
webpage_trans_mass <- readLines(tc <- textConnection(webpage_trans_mass)); close(tc)
pagetree_trans_mass <- htmlTreeParse(webpage_trans_mass, error=function(...){}, useInternalNodes = TRUE)
city_trans_mass<-unlist(xpathSApply(pagetree_trans_mass,"//td",xmlValue))
df_trans_mass<-matrix(city_trans_mass[5:31],ncol=9, nrow=3)
df_trans_mass<-as.data.frame(df_trans_mass)
colnames(df_trans_mass)<-c("TRANSPORTATION","Commute_Time","COMMUTE_MODE","Auto_alone", "Carpool","Mass_Transit","Bicycle","Walk","Work_at_Home")
df_trans_mass<-df_trans_mass[-c(1,3),]
df_trans_combine<-rbind(df_trans_combine,df_trans_mass)
}
df_trans_combine<-df_trans_combine[,-3]
df_trans_combine<-unfactor(df_trans_combine)
df_trans_combine$Auto_alone<-gsub("%","",df_trans_combine$Auto_alone)
df_trans_combine$Carpool<-gsub("%","",df_trans_combine$Carpool)
df_trans_combine$Mass_Transit<-gsub("%","",df_trans_combine$Mass_Transit)
df_trans_combine$Bicycle<-gsub("%","",df_trans_combine$Bicycle)
df_trans_combine$Walk<-gsub("%","",df_trans_combine$Walk)
df_trans_combine$Work_at_Home<-gsub("%","",df_trans_combine$Work_at_Home)
df_trans_combine$Mass_Transit<-as.numeric(df_trans_combine$Mass_Transit)
# plot city mass transport data points chart -->6
ggplot(data=df_trans_combine, aes(x=TRANSPORTATION, y=Mass_Transit,fill=TRANSPORTATION)) +geom_bar(colour="black", stat="identity")+ ggtitle("City Avg. mass transportaion %")+ geom_text(aes(label = Mass_Transit)) + theme(axis.text.x = element_text(angle = 90, hjust = 1))# Rank the city based on mass transit
df_trans_combine<-df_trans_combine %>% mutate(cityrank_mass = rank(Mass_Transit, ties.method ="first"))
# consolidate ranking with previous city parameters
df_rank_consolidate<-cbind(df_rank_consolidate,df_trans_combine[,9])df_crime_combine<-data.frame(city=c(),violent_crime=c(),property_crime=c())
for (i in 1:10)
{
# Extracting data from the webpage
webpage_crime<-getURL(gsub(" ","",paste('http://www.bestplaces.net/crime/city/ohio/',best_cities$name[i])))
webpage_crime <- readLines(tc <- textConnection(webpage_crime)); close(tc)
pagetree_crime <- htmlTreeParse(webpage_crime, error=function(...){}, useInternalNodes = TRUE)
city_crime<-unlist(xpathSApply(pagetree_crime,"//td",xmlValue))
df_crime<-matrix(city_crime[5:13],ncol=3, nrow=3)
df_crime<-as.data.frame(df_crime)
df_crime<-df_crime[-c(1,3),]
colnames(df_crime)<-c("city","violent_crime","property_crime")
df_trans_mass<-df_trans_mass[-c(1,3),]
df_crime_combine<-rbind(df_crime_combine,df_crime)
}
df_crime_combine<-unfactor(df_crime_combine)
df_crime_format<-df_crime_combine %>% gather(crime_type,occurance,violent_crime:property_crime)
df_crime_format$occurance<-as.numeric(df_crime_format$occurance)
# Plot city crime rate data chart --> 7
ggplot(data=df_crime_format, aes(x=city, y=occurance, fill=crime_type)) +
geom_bar(stat="identity") +ggtitle("City crime by types") + theme(axis.text.x = element_text(angle = 90, hjust = 1))# Rank the data by crime rate
df_crime_city_rank<-df_crime_format[df_crime_format$crime_type=="violent_crime",]
df_crime_city_rank<-df_crime_city_rank%>% mutate(cityrank_crime = rank(-occurance, ties.method ="first"))
# Consolidate ranking with previous city parameters
df_rank_consolidate<-cbind(df_rank_consolidate,df_crime_city_rank[,4])df_edu_combine<-data.frame()
for (i in 1:10)
{
# Extract data from the webpage
webpage_edu<-getURL(gsub(" ","",paste('http://www.bestplaces.net/education/city/ohio/',best_cities$name[i])))
webpage_edu <- readLines(tc <- textConnection(webpage_edu)); close(tc)
pagetree_edu <- htmlTreeParse(webpage_edu, error=function(...){}, useInternalNodes = TRUE)
city_edu<-unlist(xpathSApply(pagetree_edu,"//td",xmlValue))
df_edu<-matrix(city_edu[5:52],ncol=16, nrow=3)
df_edu<-as.data.frame(df_edu)
colnames(df_edu)<-unfactor(df_edu[1,])
df_edu<-df_edu[-c(1,3),]
df_edu_combine<-rbind(df_edu_combine,df_edu)
}
df_edu_stu<-df_edu_combine[,c(1,2,5)]
colnames(df_edu_stu)<-c("city","expenses","stu_teacher_ratio")
df_edu_stu$stu_teacher_ratio<-as.numeric(unfactor(df_edu_stu$stu_teacher_ratio))
# plot students per teacher ratio chart -->8
ggplot(data=df_edu_stu, aes(x=city, y=stu_teacher_ratio,fill=city)) +
geom_bar(stat="identity") +ggtitle("City student to teacher ratio") +geom_text(aes(label = stu_teacher_ratio)) + theme(axis.text.x = element_text(angle = 90, hjust = 1))# Rank the city on this parameter
df_edu_stu<-df_edu_stu %>% mutate(cityrank_edu = rank(-stu_teacher_ratio, ties.method ="f"))
# Consolidate ranking with previous city parameters
df_rank_consolidate<-cbind(df_rank_consolidate,df_edu_stu[,4])Overall City score= sum(total rank score of following parameters(population change,population density,race diversity,% of youth population,avg commute time,mass transit %,crime rate, investment in education))*1.25
city score has been multiplied by 1.25 to make it adjusted in scale of 100
df_rank_consolidate<-df_rank_consolidate[,-2]
colnames(df_rank_consolidate)<-c("City","Population_change","Population_density","Race_diversity","Young_Population","Mass_Commute","Avg_Commute","Crime_rate","Student_teach_ratio")
df_rank_consolidate$Overall_score<-(df_rank_consolidate[,2]+df_rank_consolidate[,3]+df_rank_consolidate[,4]+df_rank_consolidate[,5]+df_rank_consolidate[,6]+df_rank_consolidate[,7]+df_rank_consolidate[,8]+df_rank_consolidate[,9])*1.25
df_rank_consolidate_format<-df_rank_consolidate %>% gather(Parameter,score, Population_change:Student_teach_ratio)
# Plot overall city score chart -->9
ggplot(data=df_rank_consolidate_format, aes(x=City, y=score,fill=Parameter)) +
geom_bar(stat="identity") +ggtitle(" Overall City Score on Liveablity Index") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) +coord_flip() # load data from zillow
city_homeprice<-read.csv("C:/Users/Arindam/Documents/Data Science/Cuny/Data 607/Assignments/Project/City_MedianValuePerSqft_AllHomes.csv")
# Filter data by state OH
city_homeprice_OH<-city_homeprice[city_homeprice$State=="OH",]
best_cities$name<-gsub("%20"," ",best_cities$name)
# Get home price date for the 10 respective cities
city_id<-city_homeprice_OH$RegionID[tolower(city_homeprice_OH$RegionName)%in% best_cities$name]
home_price<-city_homeprice_OH[city_homeprice_OH$RegionID %in% city_id,]
home_price<-home_price[,c(2,245)]
# Analysis of score and home price rate
df_rank_consolidate$City<-gsub(", Ohio","",df_rank_consolidate$City)
df_rank_consolidate$City<-gsub("%20"," ",df_rank_consolidate$City)
df_rank_consolidate$price_per_SF<-home_price$X2016.02[df_rank_consolidate$City %in% tolower(home_price$RegionName)]city home price per sq ft/overall city score & city home price per sq ft/bestplaces.net score
# Comparing data against the Liveability score from bestplaces.net
df_rank_consolidate$City<-gsub(", Ohio","",df_rank_consolidate$City)
df_rank_consolidate$Best_net_score<-best_cities$score[tolower(df_rank_consolidate$City) %in% best_cities$name]
df_rank_consolidate$Price_by_Best_net_score<-as.numeric((df_rank_consolidate$price_per_SF)/as.numeric(unfactor(df_rank_consolidate$Best_net_score)))
# comparison of home price with respect to Bestplaces.net score chart -->10
ggplot(data=df_rank_consolidate, aes(x=City, y=Price_by_Best_net_score,fill=City)) +
geom_bar(stat="identity")+ggtitle("City per SQ FT Home Price by Bestplaces.net score") + theme(axis.text.x = element_text(angle = 90, hjust = 1))df_rank_consolidate$Price_by_score_index<-(df_rank_consolidate$price_per_SF/df_rank_consolidate$Overall_score)
df_rank_consolidate$Price_by_score_index<-format(round(df_rank_consolidate$Price_by_score_index,2),nsmall=2)
#comparison of home price with respect to Overall city score chart -->11
ggplot(data=df_rank_consolidate, aes(x=City, y=Price_by_score_index,fill=City)) +
geom_bar(stat="identity") +ggtitle("City per SQ FT Home Price by overall score") + theme(axis.text.x = element_text(angle = 90, hjust = 1))df_rank_consolidate$Price_by_Best_net_score<-format(round(df_rank_consolidate$Price_by_Best_net_score,2),nsmall=2)
df_rank_consolidate_com<-df_rank_consolidate[,c(1,13,14)]
df_rank_consolidate_com<-df_rank_consolidate_com %>% gather(score_type,score,Price_by_score_index:Price_by_Best_net_score)
df_rank_consolidate_com$score<-as.numeric(df_rank_consolidate_com$score)
# comparative chart to compare price per City score vs bestplaces.net score chart -->12
ggplot(data=df_rank_consolidate_com, aes(x=City, y=score,fill=score_type)) + geom_bar(stat="identity",position=position_dodge())+ ggtitle("Comparison of City per SQ FT Home Price by derived score index vs Bestplaces.net score") + theme(axis.text.x = element_text(angle = 90, hjust = 1))df_rank_consolidate$price_popu_change<-as.numeric(format(round((df_rank_consolidate$price_per_SF/df_rank_consolidate$Population_change),2),nsmall=2))
df_rank_consolidate$price_popu_den<-as.numeric(format(round((df_rank_consolidate$price_per_SF/df_rank_consolidate$Population_density),2),nsmall=2))
df_rank_consolidate$price_race_div<-as.numeric(format(round((df_rank_consolidate$price_per_SF/df_rank_consolidate$Race_diversity),2),nsmall=2))
df_rank_consolidate$price_youth_per<-as.numeric(format(round((df_rank_consolidate$price_per_SF/df_rank_consolidate$Young_Population),2),nsmall=2))
df_rank_consolidate$price_youth_per<-as.numeric(format(round((df_rank_consolidate$price_per_SF/df_rank_consolidate$Young_),2),nsmall=2))
df_rank_consolidate$price_mass_commute<-as.numeric(format(round((df_rank_consolidate$price_per_SF/df_rank_consolidate$Mass_Commute),2),nsmall=2))
df_rank_consolidate$price_avg_commute<-as.numeric(format(round((df_rank_consolidate$price_per_SF/df_rank_consolidate$Avg_Commute),2),nsmall=2))
df_rank_consolidate$price_crimerate_per<-as.numeric(format(round((df_rank_consolidate$price_per_SF/df_rank_consolidate$Crime_rate),2),nsmall=2))
df_rank_consolidate$price_stu_ratio<-as.numeric(format(round((df_rank_consolidate$price_per_SF/df_rank_consolidate$Student_teach_ratio),2),nsmall=2))
# create data frame
city_price_paramter<-df_rank_consolidate[,c(1,15,16,17,18)]
df_city_price_com<-city_price_paramter %>% gather(paramter,Price_per_sqt_per_score,price_popu_change:price_youth_per)
city_price_paramter_add<-df_rank_consolidate[,c(1,19,20,21,22)]
df_city_price_com_add<-city_price_paramter_add %>% gather(paramter,Price_per_sqt_per_score,price_mass_commute:price_stu_ratio)
# Plot city home price per sq ft by first four parameters data in chart -->13
ggplot(df_city_price_com, aes(x=City,y=Price_per_sqt_per_score)) +
geom_line(aes(group = paramter, colour = paramter)) + theme(axis.text.x = element_text(angle = 90, hjust = 1))# Plot city home price per sq ft by rest four parameters in chart -->14
ggplot(df_city_price_com_add, aes(x=City,y=Price_per_sqt_per_score,fill=paramter)) +
geom_bar(stat="identity",position=position_dodge()) + theme(axis.text.x = element_text(angle = 90, hjust = 1)) (1)Evaluation using overall score-
chart 9 , has the overall sore per city broken down by individual parameter score. Looking at the chart it appears Cleveland has the best overall score but has low score on crime rate. Storytelling has good overall score but has less score on educational investment and mass commute. This overall score will help to understand the livability factors of a city.
(2)Evaluation using city home price by overall score and bestcity.net index
chart 10,11- These two charts helps to understand how much is the median city home price per sq ft with respect to livability score. Lower the number on this better off home buyers to make good investment on homes.These two charts will help to understand if the price is high or low with respect to city livability conditions. Chart 12 has comparative bars of the price per score metrics from two different scores, one from bestplaces.net score and one derived in this study. That comparison will indicate how these two metrics are different. From this chart it appears that derive scores and the given scores are direction ally the same.
(3)Evaluation of city home price by each livability index
Chart 13,14- provides analysis of price data based on individual liability parameter. This chart will also help to understand the key factors that drive the home price.If the value is low that means that factor is not a dominant factor is deriving the price. If that factor is high that means it has major influence in determining the home price in that city.
** Following website has been used extensively to extract city data for this study http://www.bestplaces.net/
** Following website has been used to get average median home price per sq ft by cities www.zillow.com/research/data