This data set is on rent cost of PBS building inventory over time. The variables used here are lease_annual_rent_amount (which is the anual rent cost) ,lease_ansi_rentable_sqft (the square footage of rentable space ),lease_usable_sqft (the usable space in square feet ),lse_surface_parking_spaces (the available surface parking space),lse_structured_parking_spaces (the avilable structured parking space ), and lessor_stat (the date when the leas began). The reason for choosing this data is simply to see how rent has changed over time. the data set is from General Services Administration.
the research question is to see how some variables impact rent cost.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.1 ✔ stringr 1.5.2
✔ ggplot2 4.0.0 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggfortify)
lease <-read_csv("data_gov_lse_rexus.csv")
Rows: 7390 Columns: 19
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (13): Lease Number, Current Expiration Date, Lease Initial Effective Dat...
dbl (5): Lease ANSI Rentable Sqft, Lease Usable Sqft, Lse Structured Parkin...
lgl (1): Lessor County
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Editing Column Names
names(lease) <-gsub(" ","_",names(lease)) # to substitute spaces with _names(lease) <-tolower(names(lease)) # to substitute capitalized letters in column names to lowercase
Cleaning
leasec <- lease |>filter(lessor_state %in%c("MD","DC","VA")) # filtering for the states needed for the project
leasecm <- leasec |>mutate(rent_per_sqft = lease_annual_rent_amount / lease_ansi_rentable_sqft) #adding a column that shows rent per square feet by deviding lease_annual_rent_amount by lease_ansi_rentable_sqft
Intercept: Predicted lease_annual_rent_amount when all predictors are 0 is -7.909e+04. This is just a mathematical calculation for the intercept, not realistic.
Coefficients (slope):
lease_ansi_rentable_sqft: Posative (3.811e+01), increased rent for larger rentable sqft. lse_surface_parking_spaces: Negative (-1.180e+03), lower rent for more surface parking spaces. lse_structured_parking_spaces: Positive (2.020e+03), increased rent for more structured parking spaces.
P-values: for all predictors where significant (p<0.05). lease_ansi_rentable_sqft P-value < 2e-16 lse_surface_parking_spaces P-value = 0.00278
lse_structured_parking_spaces P-value = 0.01416
models (F-Statistic) p-value: < 2.2e-16 meaning the predictors are associated with lease_annual_rent_amount
Adjusted R²: about 0.9024. This means about 90% of lease annual rent amount variance is explained by this model.
Call:
lm(formula = lease_annual_rent_amount ~ lease_ansi_rentable_sqft +
lse_surface_parking_spaces + lse_structured_parking_spaces,
data = leasecm)
Residuals:
Min 1Q Median 3Q Max
-16111394 -161858 21883 198141 6740366
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.909e+04 5.920e+04 -1.336 0.18206
lease_ansi_rentable_sqft 3.811e+01 5.292e-01 72.021 < 2e-16 ***
lse_surface_parking_spaces -1.180e+03 3.929e+02 -3.003 0.00278 **
lse_structured_parking_spaces 2.020e+03 8.212e+02 2.460 0.01416 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1290000 on 649 degrees of freedom
Multiple R-squared: 0.9028, Adjusted R-squared: 0.9024
F-statistic: 2010 on 3 and 649 DF, p-value: < 2.2e-16
autoplot(fit1, 1:4, nrow=2, ncol=2)
Warning: `fortify(<lm>)` was deprecated in ggplot2 3.6.0.
ℹ Please use `broom::augment(<lm>)` instead.
ℹ The deprecated feature was likely used in the ggfortify package.
Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
ℹ The deprecated feature was likely used in the ggfortify package.
Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
ℹ The deprecated feature was likely used in the ggfortify package.
Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.
leasecm_vl <- leasecm |>separate(lease_initial_effective_date, sep="-", into =c("day", "month", "year")) #learned it from https://forum.posit.co/t/converting-dates-year-month-day-to-3-separate-columns-year-month-day/8585/4
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
leasecm_vlf
# A tibble: 72 × 3
# Groups: year [27]
year lessor_state avg_anual_rent
<chr> <chr> <dbl>
1 1998 DC 3135017.
2 1999 DC 2508945.
3 2000 DC 7379352
4 2002 MD 18113.
5 2002 VA 818195.
6 2003 MD 1301567.
7 2003 VA 16137173.
8 2004 DC 1896940.
9 2004 MD 251985.
10 2005 DC 1734729.
# ℹ 62 more rows
scatter1 <-ggplot(data = leasecm_vlf, aes(x = year,y = avg_anual_rent,color = lessor_state)) +geom_point(aes(shape = lessor_state), alpha =0.8) +scale_color_manual(values =c("#eb6b34","purple","cyan4")) +geom_vline(xintercept ="2020", color ="black", linetype ="dotdash") +geom_text(x ="2020", y =5533333, label ="Start of lockdown for covid-19 in the U.S", size =1.7) +labs(title ="mean anual rent over time for PBS building inventory in the DMV",x ="lease initial effective date",y =" mean lease annual rent amount",color ="lessor state",caption="Source: General Services Administration") +theme_minimal()+theme(axis.text.x =element_text(angle =45)) scatter1
the visualization shows the mean lease anual rent amount over time. there are 3 colors and shapes to show the 3 states selected circle orange for DC, triangle purple for MD, and square cyan for VA. there is an anotation at 2020 to indicate when lockdown began for covid 19.
Grouped by state and sumarised for mean rent and mean rentable sqft
# A tibble: 3 × 3
lessor_state avg_anual_rent avg_rentable_sqft
<chr> <dbl> <dbl>
1 DC 3241097. 77626.
2 MD 1629105. 49769.
3 VA 1716782. 50900.
ggplot(leasecm_vs, aes(x = lessor_state, y =avg_anual_rent, fill = avg_rentable_sqft)) +geom_col(width =0.7) +labs( title ="Anual Rent for PBS building inventory by State and sqft DMV",x="lessor state",y="mean lease annual rent amount",caption="Source: General Services Administration",fill =" mean lease ansi rentable sqft") +scale_fill_gradient(low ="lightblue", high ="green", name ="mean lease ansi rentable sqft") +theme_minimal()
This visualisation shows mean lease anual ret amount for DC,MD, and VA tho color shifts from light blue(lower mean rentable square feet) to green (higher rentable square feet). This visualization shows that DC has both the highest mean anual rent cost and highest mean rentable square feet.
## Visualisations The First visualization shows the mean lease anual rent amount over timme for DC,MD, and VAwith an anotation at 2020 to indicate when lockdown began for covid The secong visualisation shows mean lease anual ret amount for DC,MD, and VA. It shows that DC has both the highest mean anual rent cost and highest mean rentable square feet.
## Cleaning
First changed the names of columns to lowercase and replaced spaces with _. Then filterd for DC,MD, and VA. Next selected variables that could be used and lastly adding a column that shows rent per square feet by deviding lease_annual_rent_amount by lease_ansi_rentable_sqft.
Future
It would be intresting to find a way to convert adresses to lat and long to make an interactive map
Source: General Services Administration : https://catalog.data.gov/dataset?publisher=General%20Services%20Administration https://www.smccd.edu/housing/other_resources.php https://forum.posit.co/t/converting-dates-year-month-day-to-3-separate-columns-year-month-day/8585/4