---
title: "Math 2404 Assignment 3: Story telling with Open Data"
author: "Srihari Nair 3961467"
output: 
  flexdashboard::flex_dashboard:
    source_code: embed
    storyboard: true
---

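```{r setup, include=FALSE}
# Packages used by the dashboard. tidyverse attaches dplyr, ggplot2 and readr;
# ggmap and dygraphs were loaded originally but never used, so they are dropped.
library(flexdashboard)
library(tidyverse)
library(scales)    # comma() axis labels
library(corrplot)  # correlation plot
```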

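```{r}
# Load the Sydney house sales data and parse the sale date.
Sydneydata <- read.csv("SydneyHousePrices.csv")
Sydneydata$Date <- as.Date(Sydneydata$Date)

# Keep only the numeric columns for the correlation matrix.
Sydneydata.onlyNumeric <- Sydneydata[, vapply(Sydneydata, is.numeric, logical(1)), drop = FALSE]

# Mean sale price per calendar month, from 1 January 2000 onwards.
AverageSaleprice <- Sydneydata %>%
  filter(Date >= as.Date("2000-01-01")) %>%
  group_by(month = lubridate::floor_date(Date, "month")) %>%
  summarise(AvgSale = mean(sellPrice, na.rm = TRUE))

# Sale price per suburb. Note: AvgSuburbPrice holds the MEDIAN price,
# not the mean; the name is kept so the plots below still work.
SuburbPrices <- Sydneydata %>%
  group_by(suburb) %>%
  summarise(AvgSuburbPrice = round(median(sellPrice, na.rm = TRUE), 0))

# 20 most and 20 least expensive suburbs (ties dropped so each table
# has exactly 20 rows).
TopsuburbData <- SuburbPrices %>%
  slice_max(AvgSuburbPrice, n = 20, with_ties = FALSE)
BottomsuburbData <- SuburbPrices %>%
  slice_min(AvgSuburbPrice, n = 20, with_ties = FALSE)
```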

Introduction
=====================================

---

Real estate prices across Australia have been trending upward, but Sydney in particular has experienced a property boom in recent years. Through this analysis I aim to better understand the trend in Sydney house prices.
The Price Trend page makes the price rise clearly evident, with a large increase over a short period, and the monthly average sale price shows a strong positive, roughly linear relationship with time. Very expensive homes also sell less often and stand out as obvious outliers in the sales data.
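One way to put a number on the "strong positive linear relationship" described above is to fit a straight line to the monthly averages. The sketch below is not part of the dashboard: it uses a small made-up data frame (`toy`) in place of `Sydneydata`, base R's `aggregate()` in place of the dplyr pipeline, and `lm()` for the fit, so the slope and R-squared it produces say nothing about the real data.

```r
# Toy sales data, made up for illustration only. The dashboard's real input
# is Sydneydata with its Date and sellPrice columns.
toy <- data.frame(
  Date = as.Date(c("2000-01-05", "2000-01-20", "2000-02-03",
                   "2000-02-25", "2000-03-10", "2000-03-30")),
  sellPrice = c(400000, 420000, 430000, 450000, 460000, 480000)
)

# Floor each sale date to the first day of its month
# (base R equivalent of lubridate::floor_date(Date, "month")).
toy$month <- as.Date(format(toy$Date, "%Y-%m-01"))

# Mean sale price per month, analogous to AverageSaleprice in the dashboard.
monthly <- aggregate(sellPrice ~ month, data = toy, FUN = mean)
names(monthly)[2] <- "AvgSale"

# Fit AvgSale as a linear function of time (days since 1970-01-01).
fit <- lm(AvgSale ~ as.numeric(month), data = monthly)
slope_per_day <- coef(fit)[[2]]       # price change per day
r_squared     <- summary(fit)$r.squared  # share of variance explained
```

On the real `AverageSaleprice` table the same `lm()` call would apply directly; an R-squared near 1 would support the linear reading, while a visible spike or curvature in the residuals would suggest the trend is not purely linear.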

In the Suburb Prices page, I have grouped the sales data by suburb to analyse house prices per suburb. The most expensive and cheapest suburbs, ranked by median sale price, are highlighted. A correlation matrix based on the numeric variables of the real estate data is also shown.
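The suburb ranking and correlation steps described above can be sketched in base R as follows. This is an illustrative version, not the dashboard's code: the data is a made-up `toy` data frame with hypothetical suburbs "A" to "D", the names `most_expensive` and `cheapest` are illustrative, and the ranking uses the median sale price to match the dashboard's `median()` call.

```r
# Toy sales data, made up for illustration only.
toy <- data.frame(
  suburb    = c("A", "A", "B", "B", "C", "C", "D"),
  sellPrice = c(900000, 1100000, 500000, 700000, 2000000, 2400000, 300000),
  bed       = c(3, 4, 2, 3, 4, 5, 1)
)

# Median sale price per suburb.
by_suburb <- aggregate(sellPrice ~ suburb, data = toy, FUN = median)
names(by_suburb)[2] <- "MedianPrice"

# Keep the n most and n least expensive suburbs.
n <- 2
most_expensive <- head(by_suburb[order(-by_suburb$MedianPrice), ], n)
cheapest       <- head(by_suburb[order(by_suburb$MedianPrice), ], n)

# Correlation matrix of the numeric columns, ignoring rows with missing values.
numeric_cols <- toy[, vapply(toy, is.numeric, logical(1)), drop = FALSE]
cor_matrix   <- cor(numeric_cols, use = "complete.obs")
```

Using the median rather than the mean keeps a handful of very expensive sales from dominating a suburb's figure, which matters given the outliers noted on the Price Trend page.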

The underlying dataset for this dashboard contains around 200,000 rows of Sydney real estate sales data from 2000 to 2019.
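A quick check of the row count and date coverage quoted above could look like the following. `describe_coverage()` is a hypothetical helper written for illustration; it assumes the `Date` column has already been converted with `as.Date()`, as the dashboard's setup code does, and the three dates below are made up.

```r
# Hypothetical helper (not part of the dashboard): summarise the size and
# date span of a sales table whose Date column is of class Date.
describe_coverage <- function(df) {
  dates <- df$Date[!is.na(df$Date)]
  list(
    rows  = nrow(df),                              # total rows, including any NA dates
    first = min(dates),                            # earliest sale date
    last  = max(dates),                            # latest sale date
    years = length(unique(format(dates, "%Y")))    # distinct calendar years present
  )
}

# Toy example with made-up dates; in the dashboard this would be
# describe_coverage(Sydneydata).
toy <- data.frame(Date = as.Date(c("2000-03-01", "2010-07-15", "2019-12-31")))
coverage <- describe_coverage(toy)
```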



Data Sources:

* www.kaggle.com. (n.d.). Sydney House Prices. [online] Available at: https://www.kaggle.com/datasets/mihirhalai/sydney-house-prices.

* ggplot2.tidyverse.org. (n.d.). Points — geom_point. [online] Available at: https://ggplot2.tidyverse.org/reference/geom_point.html.

* pkgs.rstudio.com. (n.d.). Using flexdashboard. [online] Available at: https://pkgs.rstudio.com/flexdashboard/articles/using.html#storyboards [Accessed 8 Dec. 2022]


Price Trend
===================================== 


### Real Estate Trend Over Time


```{r}
# Monthly average sale price over time, with the date on the x-axis.
ggplot(AverageSaleprice, aes(x = month, y = AvgSale)) +
  geom_point(colour = "red") +
  labs(x = "Date", y = "Average Sale Price",
       title = "Average Sale Price Trend since 2000") +
  scale_y_continuous(labels = comma)
```


Suburb Prices
===================================== 

Column {.tabset}
-------------------------------------

### Most Expensive Suburbs

```{r}
# 20 most expensive suburbs by median sale price, as horizontal bars.
ggplot(TopsuburbData, aes(x = reorder(suburb, AvgSuburbPrice),
                          y = AvgSuburbPrice)) +
  geom_col(colour = "white", fill = "skyblue") +
  labs(x = "Suburb", y = "Median Sale Price",
       title = "Top 20 Most Expensive Suburbs") +
  coord_flip() +
  theme_bw() +
  scale_y_continuous(labels = comma)
```

### Cheapest Suburbs

```{r}
# 20 cheapest suburbs by median sale price, as horizontal bars.
ggplot(BottomsuburbData, aes(x = reorder(suburb, AvgSuburbPrice),
                             y = AvgSuburbPrice)) +
  geom_col(colour = "white", fill = "red") +
  labs(x = "Suburb", y = "Median Sale Price",
       title = "20 Cheapest Suburbs") +
  coord_flip() +
  theme_bw() +
  scale_y_continuous(labels = comma)
```

Column
-------------------------------------

### Correlation Graph

```{r}
# Pairwise correlations between the numeric columns, using only rows
# with no missing values.
corrplot(cor(Sydneydata.onlyNumeric, use = "complete.obs"))
```