Data Preparation

# load data
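
A minimal sketch of the loading step, assuming the data arrive as CSV files. The helper name `load_csv`, the column names, and the placeholder rows are assumptions made for illustration and are not taken from the actual sources; in practice the argument would be a local path or URL to a CSV from one of the sources listed under Data Source.

```r
# Hypothetical loader: `source` may be a local path or a URL to a CSV file.
load_csv <- function(source) {
  read.csv(source, stringsAsFactors = FALSE)
}

# Placeholder rows (made-up values, NOT real data) written to a temporary
# file so that the sketch runs without network access.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "metro_area,holc_grade,total_pop",
  "Metro A,A,1000",
  "Metro A,D,2500",
  "Metro B,B,1800"
), tmp)

redlining <- load_csv(tmp)

# Inspect column names and types before any analysis.
str(redlining)
```

Checking `str()` right after loading helps confirm that grades were read as text and counts as numbers before anything is summarized.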

Research question

You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.

My research question is: How does the legacy of redlining influence current demographic and socioeconomic conditions, particularly with regard to housing, wealth inequality, and access to resources?

Cases

What are the cases, and how many are there?

The cases in this study are metropolitan areas within the United States; each metropolitan area is one case. The exact number of cases will depend on which metropolitan areas are included in the analysis.
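
Once the data are loaded, the number of cases could be obtained by counting distinct metropolitan areas. The sketch below assumes a column named `metro_area` with one or more rows per area; both the column name and the placeholder rows are hypothetical.

```r
# Placeholder rows (made-up values, NOT real data): several rows may share
# one metropolitan area, for example one row per HOLC grade.
cases <- data.frame(
  metro_area = c("Metro A", "Metro A", "Metro B", "Metro C"),
  holc_grade = c("A", "D", "B", "C"),
  stringsAsFactors = FALSE
)

# Each distinct metropolitan area is one case.
n_cases <- length(unique(cases$metro_area))
n_cases  # 3
```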

Data collection

Describe the method of data collection.

The data for this study will be collected from publicly available sources: the Mapping Inequality project, which provides historical data on redlining practices in the United States; the Federal Reserve Economic Data (FRED) website, which offers economic and demographic data; and, where useful, FiveThirtyEight’s repository on redlining.

Type of study

What type of study is this (observational/experiment)?

This study is observational: it analyzes existing data on redlining practices and their long-term effects on demographic and socioeconomic conditions, and no treatment is assigned.

Data Source

If you collected the data, state self-collected. If not, provide a citation/link.

The data sources include:

FiveThirtyEight’s redlining data repository (built on HOLC maps from the Mapping Inequality project): https://github.com/fivethirtyeight/data/tree/master/redlining

FiveThirtyEight’s redlining interactive feature: https://projects.fivethirtyeight.com/redlining/

Federal Reserve Economic Data (FRED), series MSPUS (Median Sales Price of Houses Sold for the United States): https://fred.stlouisfed.org/series/MSPUS

Dependent Variable

What is the response variable? Is it quantitative or qualitative?

The main response variable is quantitative: the rate of return, or appreciation, on real estate in each metropolitan area. The broader outcome is multifaceted, covering demographic and socioeconomic conditions such as housing prices, wealth distribution, and access to resources; these secondary measures can be quantitative or qualitative, depending on the specific measure used in the analysis.
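
As a sketch of how the appreciation rate might be computed from a starting and an ending price, the two hypothetical helpers below give the total change over the period and the compound annual rate. The function names and the example prices are illustrative assumptions, not values from the data.

```r
# Total appreciation over the whole period: (end - start) / start.
total_appreciation <- function(price_start, price_end) {
  (price_end - price_start) / price_start
}

# Compound annual rate: the constant yearly growth rate that turns
# price_start into price_end after `years` years.
annual_appreciation <- function(price_start, price_end, years) {
  (price_end / price_start)^(1 / years) - 1
}

# Made-up example prices, two years apart.
total_appreciation(200000, 242000)      # about 0.21
annual_appreciation(200000, 242000, 2)  # about 0.10
```

The annualized form makes metropolitan areas comparable even when their price series cover different numbers of years.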

Independent Variable(s)

The independent variables include the historical redlining grades assigned by the Home Owners' Loan Corporation (HOLC) between 1935 and 1940, as well as contemporary demographic and economic indicators.
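
HOLC grades run from A to D, so one reasonable encoding in R is an ordered factor. The sketch below assumes the grades arrive as single letters in a character vector; the vector shown holds placeholder values only.

```r
# Placeholder grades (made-up, NOT real data).
grades <- c("C", "A", "D", "B", "D")

# Ordered factor: A < B < C < D, following the HOLC grading scale.
holc <- factor(grades, levels = c("A", "B", "C", "D"), ordered = TRUE)

# Ordered factors support comparisons, e.g. areas graded better than C.
holc < "C"

# Counts per grade, reported in grade order rather than alphabetical accident.
table(holc)

# Integer codes (A = 1, ..., D = 4) if a numeric score is needed.
as.integer(holc)
```

Treating the grade as ordered keeps plots and tables in A-to-D order and allows grade comparisons, while still being usable as a grouping variable.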

Relevant summary statistics

Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g., scatter plots, boxplots, etc.). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
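
A possible starting point for this step, using base R only. The data frame below is simulated with `runif()` purely so the chunk runs; it contains no real observations and says nothing about actual results. The column names `holc_grade` and `appreciation` are assumptions.

```r
# Simulated placeholder data (NOT real observations): two areas per grade,
# with appreciation rates drawn uniformly between 0 and 0.10.
set.seed(1)
homes <- data.frame(
  holc_grade = factor(rep(c("A", "B", "C", "D"), each = 2),
                      levels = c("A", "B", "C", "D"), ordered = TRUE),
  appreciation = runif(8, min = 0, max = 0.10)
)

# Five-number summary and mean of the response variable.
summary(homes$appreciation)

# Frequency of each grade (the categorical explanatory variable).
table(homes$holc_grade)

# Mean appreciation within each grade.
grade_means <- aggregate(appreciation ~ holc_grade, data = homes, FUN = mean)
grade_means

# Side-by-side boxplots of appreciation by grade.
boxplot(appreciation ~ holc_grade, data = homes,
        xlab = "HOLC grade", ylab = "Appreciation rate",
        main = "Appreciation by HOLC grade (simulated data)")
```

Boxplots suit this comparison because the explanatory variable is categorical and the response is quantitative; a scatter plot would be the analogous choice for two quantitative variables.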