# load data
library(tidyverse)
library(openintro)
# read.csv() already returns a data frame, so no data.frame() wrapper is needed
housing <- read.csv("https://raw.githubusercontent.com/Patel-Krutika/Data_606/main/housing.csv")
Is there a relationship between median income and median house value?
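Each case represents a housing district in California, as recorded in the 1990 U.S. Census. The income and house value figures are medians over the households in that district, not values for individual houses.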
The data was collected as part of the 1990 U.S. Census by the United States Census Bureau.
This is an observational study.
The data was collected by the United States Census Bureau and was first used in Pace, R. Kelley, and Ronald Barry. “Sparse spatial autoregressions.” Statistics & Probability Letters 33.3 (1997): 291-297. The dataset is available on Kaggle: https://www.kaggle.com/camnugent/california-housing-prices
The dependent (response) variable is median house value, which is numerical.
The independent (explanatory) variable is median income, which is numerical.
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc.). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
summary(housing$median_income)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  0.4999  2.5634  3.5348  3.8707  4.7432 15.0001
hist(housing$median_income, main = "Median Income", xlab = "Median Income", ylab = "")
summary(housing$median_house_value)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   14999  119600  179700  206856  264725  500001
hist(housing$median_house_value, main = "Median House Value", xlab = "Median House Value", ylab = "")
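The `summary()` calls above report the centre and quartiles of each variable but not the standard deviation or IQR. A minimal sketch of one way to collect those side by side is given here. The helper `spread_summary()` and the `toy_stats` data frame are hypothetical names made up for illustration and are not part of the original analysis. With the real data one would pass `housing` in place of the toy data.

```r
# Hypothetical helper (not from the original report): centre and spread
# for several numeric columns, returned as one matrix with one column
# per variable.
spread_summary <- function(df, cols) {
  sapply(df[cols], function(v) {
    c(mean   = mean(v, na.rm = TRUE),
      sd     = sd(v, na.rm = TRUE),
      median = median(v, na.rm = TRUE),
      iqr    = IQR(v, na.rm = TRUE))
  })
}

# Toy data, made up for illustration only (NOT values from the census data).
toy_stats <- data.frame(
  median_income      = c(2, 3, 4, 5, 6),
  median_house_value = c(100000, 150000, 200000, 250000, 300000)
)

stats <- spread_summary(toy_stats, c("median_income", "median_house_value"))
print(stats)
```

For the actual report, the call would presumably be `spread_summary(housing, c("median_income", "median_house_value"))`.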
# Refer to columns by bare name inside aes(); using housing$... there triggers ggplot2 warnings
ggplot(housing, aes(x = median_income, y = median_house_value)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
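The scatter plot gives a visual impression of the relationship. One possible way to put a number on it is a Pearson correlation together with a simple linear regression, sketched below. This is an illustration under assumptions, not the method used in the original report. The helper `describe_relationship()` and the `toy_rel` data frame are hypothetical, and a straight-line fit may not be appropriate if the real data are curved or capped at the upper end.

```r
# Hypothetical helper (not part of the original analysis): summarise the
# linear association between two numeric columns of a data frame.
describe_relationship <- function(df, x, y) {
  # Build the formula y ~ x from the column names and fit a linear model.
  fit <- lm(reformulate(x, response = y), data = df)
  list(
    correlation = cor(df[[x]], df[[y]], use = "complete.obs"),
    slope       = unname(coef(fit)[x]),
    r_squared   = summary(fit)$r.squared
  )
}

# Toy data, made up for illustration only (NOT values from the census data).
toy_rel <- data.frame(
  median_income      = c(1.5, 2.5, 3.5, 4.5, 6.0),
  median_house_value = c(90000, 140000, 180000, 250000, 330000)
)

result <- describe_relationship(toy_rel, "median_income", "median_house_value")
print(result)
```

With the real data the call would presumably be `describe_relationship(housing, "median_income", "median_house_value")`, which matches the axes used in the scatter plot.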