HOUSE SALES IN KING COUNTY, WASHINGTON
This dataset contains sale prices for houses sold in King County, Washington (the county that includes Seattle) between May 2014 and May 2015. The columns include zipcode, longitude, latitude, square footage, number of bedrooms and bathrooms, and others. I will investigate which features influence the average sale price of a house. The columns I will use include:
1. price
2. condition: how good the overall condition is on a scale of 1-5
3. grade: overall grade based on the King County grading system
4. view: whether the house has been viewed
5. sqft_living: square footage of the house
6. zipcode
7. waterfront (where 0 = no waterfront view and 1 = waterfront view)
source: https://www.kaggle.com/harlfoxem/housesalesprediction
DESCRIPTIVE STATISTICS
Data type
Head
Tail
Row and column count
Column Names
## id int64
## date object
## price float64
## bedrooms int64
## bathrooms float64
## sqft_living int64
## sqft_lot int64
## floors float64
## waterfront int64
## view int64
## condition int64
## grade int64
## sqft_above int64
## sqft_basement int64
## yr_built int64
## yr_renovated int64
## zipcode int64
## lat float64
## long float64
## sqft_living15 int64
## sqft_lot15 int64
## dtype: object
## id date price ... long sqft_living15 sqft_lot15
## 0 7129300520 20141013T000000 221900.0 ... -122.257 1340 5650
## 1 6414100192 20141209T000000 538000.0 ... -122.319 1690 7639
## 2 5631500400 20150225T000000 180000.0 ... -122.233 2720 8062
## 3 2487200875 20141209T000000 604000.0 ... -122.393 1360 5000
## 4 1954400510 20150218T000000 510000.0 ... -122.045 1800 7503
##
## [5 rows x 21 columns]
## id date ... sqft_living15 sqft_lot15
## 21608 263000018 20140521T000000 ... 1530 1509
## 21609 6600060120 20150223T000000 ... 1830 7200
## 21610 1523300141 20140623T000000 ... 1020 2007
## 21611 291310100 20150116T000000 ... 1410 1287
## 21612 1523300157 20141015T000000 ... 1020 1357
##
## [5 rows x 21 columns]
## (21613, 21)
## Index(['id', 'date', 'price', 'bedrooms', 'bathrooms', 'sqft_living',
## 'sqft_lot', 'floors', 'waterfront', 'view', 'condition', 'grade',
## 'sqft_above', 'sqft_basement', 'yr_built', 'yr_renovated', 'zipcode',
## 'lat', 'long', 'sqft_living15', 'sqft_lot15'],
## dtype='object')
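The five outputs above can be produced with standard pandas calls. The following is a minimal sketch, not necessarily the exact code used for this report. It runs on a three-row stand-in built from the first rows printed above; with the real data the frame would come from something like pd.read_csv("kc_house_data.csv"), where the file name is an assumption.

```python
import pandas as pd

# Toy stand-in: three columns and three rows copied from the head() output above.
# For the full dataset, replace this with e.g. pd.read_csv("kc_house_data.csv")
# (the file name is an assumption).
df = pd.DataFrame({
    "id": [7129300520, 6414100192, 5631500400],
    "date": ["20141013T000000", "20141209T000000", "20150225T000000"],
    "price": [221900.0, 538000.0, 180000.0],
})

print(df.dtypes)    # data type of each column
print(df.head())    # first rows (up to five)
print(df.tail())    # last rows (up to five)
print(df.shape)     # (row count, column count)
print(df.columns)   # column names
```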
CLEANING UP DATA
Are there any NAs in each column?
## id 0
## date 0
## price 0
## bedrooms 0
## bathrooms 0
## sqft_living 0
## sqft_lot 0
## floors 0
## waterfront 0
## view 0
## condition 0
## grade 0
## sqft_above 0
## sqft_basement 0
## yr_built 0
## yr_renovated 0
## zipcode 0
## lat 0
## long 0
## sqft_living15 0
## sqft_lot15 0
## dtype: int64
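A count of missing values per column like the one above is typically obtained with isna().sum(). This is a minimal sketch on a toy frame (values copied from the head() output), not necessarily the code used here.

```python
import pandas as pd

# Toy stand-in for the full dataset; values copied from the head() output.
df = pd.DataFrame({
    "id": [7129300520, 6414100192, 5631500400],
    "price": [221900.0, 538000.0, 180000.0],
})

# isna() marks each missing cell as True; sum() counts the True values per column.
na_counts = df.isna().sum()
print(na_counts)

# A single yes/no answer for the whole frame.
print(df.isna().values.any())
```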
Convert price from float to integer for formatting purposes, then print the data types again to confirm that price is now an integer column.
## id int64
## date object
## price int32
## bedrooms int64
## bathrooms float64
## sqft_living int64
## sqft_lot int64
## floors float64
## waterfront int64
## view int64
## condition int64
## grade int64
## sqft_above int64
## sqft_basement int64
## yr_built int64
## yr_renovated int64
## zipcode int64
## lat float64
## long float64
## sqft_living15 int64
## sqft_lot15 int64
## dtype: object
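The conversion can be sketched with astype. The int32 in the output above suggests the platform's default integer width was used (an assumption on my part); asking for "int64" explicitly, as below, gives the same result on every platform and matches the other integer columns.

```python
import pandas as pd

# Toy stand-in; prices copied from the head() output.
df = pd.DataFrame({"price": [221900.0, 538000.0, 180000.0]})

# Cast float prices to integers. An explicit width avoids platform-dependent
# defaults (plain astype(int) can give int32 on some setups).
df["price"] = df["price"].astype("int64")

print(df.dtypes)  # confirm that price is now an integer column
```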
PLOTS
1. Create a bar chart showing the ten zip codes with the highest average selling price
Conclusion: the most expensive zip code in King County was 98039 (with an average price over $2 million) followed by 98004 and 98040.
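A ranking like this one can be computed by grouping on zipcode, averaging price, and keeping the ten largest means. The sketch below is one way to do it; the function names are my own, and the prices in the toy frame are invented for illustration (only the three zip codes come from the conclusion above).

```python
import pandas as pd


def top_zipcodes_by_mean_price(df, n=10):
    """Return the n zip codes with the highest mean price, highest first."""
    return df.groupby("zipcode")["price"].mean().nlargest(n)


def plot_top_zipcodes(top):
    """Draw a bar chart of the ranking (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # Zip codes are labels, not quantities, so plot them as strings.
    ax.bar(top.index.astype(str), top.values)
    ax.set_xlabel("zipcode")
    ax.set_ylabel("average price ($)")
    ax.set_title("Zip codes with the highest average selling price")
    return ax


# Invented prices, for illustration only.
toy = pd.DataFrame({
    "zipcode": [98039, 98039, 98004, 98040],
    "price": [2_000_000, 2_400_000, 1_300_000, 1_200_000],
})
top = top_zipcodes_by_mean_price(toy)
print(top)
```

Calling plot_top_zipcodes(top) followed by matplotlib's plt.show() displays the chart.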
Conclusion: houses with a higher grade sold for more, especially those that had been viewed 3 or 4 times. Overall, all the variables studied (zipcode, sqft_living, grade, condition, waterfront, and view) appeared to influence the selling price of houses in King County. This type of analysis would be helpful for homeowners who want to put a value on their house. Further research could also include the number of bedrooms, bathrooms, and floors, and whether the house has been renovated.
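One way to check how grade and view relate to price together is a table of mean price for each grade and view combination. This is a sketch under my own assumptions, not the plot code behind the conclusion above, and the numbers in the toy frame are invented.

```python
import pandas as pd

# Invented values, for illustration only.
toy = pd.DataFrame({
    "grade": [7, 7, 8, 8],
    "view": [0, 3, 0, 3],
    "price": [300_000, 450_000, 500_000, 800_000],
})

# Rows are grades, columns are view values, cells are mean prices.
mean_price = toy.pivot_table(values="price", index="grade",
                             columns="view", aggfunc="mean")
print(mean_price)
```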