Data Preparation

# load data
library(tidyverse)
library(openintro)
housing <- read.csv("https://raw.githubusercontent.com/Patel-Krutika/Data_606/main/housing.csv")

Research question

Is there a relationship between median income and median house value?

Cases

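Each case represents a housing district in California, as recorded in the 1990 U.S. Census. The income and house value variables are medians across the households in each district, not values for individual houses.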

Data collection

This data was collected as a part of the 1990 U.S. Census by the United States Census Bureau.

Type of study

This is an observational study.

Data Source

The data was collected by the United States Census Bureau. It was first used in Pace, R. Kelley, and Ronald Barry. “Sparse spatial autoregressions.” Statistics & Probability Letters 33.3 (1997): 291-297. The dataset is available on Kaggle: https://www.kaggle.com/camnugent/california-housing-prices

Dependent Variable

The dependent variable is median house value, and it is numerical.

Independent Variable

The independent variable is median income, and it is numerical.

Relevant summary statistics

Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc.). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

summary(housing$median_income)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4999  2.5634  3.5348  3.8707  4.7432 15.0001
hist(housing$median_income, main = "Median Income", xlab = "Median Income", ylab = "")

summary(housing$median_house_value)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   14999  119600  179700  206856  264725  500001
hist(housing$median_house_value, main = "Median House Value", xlab = "Median House Value", ylab = "")

ggplot(housing, aes(x = median_income, y = median_house_value)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
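
The scatter plot shows the relationship visually; one way to put a number on it is a correlation coefficient and a simple linear regression. The following is a minimal sketch, not part of the original analysis: the helper name `describe_relationship` is hypothetical, and the `toy` data frame holds made-up values used only to show how the function is called.

```r
# Hypothetical helper (not part of the original analysis): summarises the
# linear association between two numeric columns of a data frame.
describe_relationship <- function(df, x, y) {
  # Keep only rows where both variables are present
  complete <- df[complete.cases(df[, c(x, y)]), ]
  # Fit a simple linear regression of y on x
  fit <- lm(reformulate(x, response = y), data = complete)
  list(
    n = nrow(complete),
    correlation = cor(complete[[x]], complete[[y]]),
    slope = unname(coef(fit)[2]),
    r_squared = summary(fit)$r.squared
  )
}

# Toy data with made-up values, used only to illustrate the call pattern
toy <- data.frame(
  median_income = c(1.5, 2.5, 3.5, 4.5, 5.5),
  median_house_value = c(90000, 140000, 180000, 260000, 310000)
)
toy_result <- describe_relationship(toy, "median_income", "median_house_value")
print(toy_result)
```

With the data loaded above, the same call would be `describe_relationship(housing, "median_income", "median_house_value")`. A linear fit is only a first approximation here, since `geom_smooth()` chose a non-linear smoother for this plot.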