I have decided to work on the NYC Property Valuation and Assessment Data from the NYC Open Data portal. This dataset, maintained by the New York City Department of Finance, provides a comprehensive view of property valuations across the city’s five boroughs. It is publicly available in CSV format and contains detailed information on property market values, assessed values, tax exemptions, and property types. This data will help me understand how property assessments vary across neighborhoods and how tax policies impact different areas.
I am particularly interested in exploring how property valuations differ across boroughs and how they have changed over time. By analyzing key variables like Market Value, Assessed Value, Property Type, and Tax Exemptions, I hope to identify patterns and inequalities in property assessments. Although I am still learning more about different analytical methods, I plan to start with summary statistics and visualizations to compare borough-level differences. As I gain more knowledge, I aim to incorporate more advanced techniques to uncover deeper insights into the city’s real estate landscape and its connection to public policies.
data()
data("mtcars")
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.3.3
mtcars_long <- pivot_longer(mtcars, cols = -c(mpg, cyl), names_to = "variable", values_to = "value")
head(mtcars_long)
## # A tibble: 6 × 4
## mpg cyl variable value
## <dbl> <dbl> <chr> <dbl>
## 1 21 6 disp 160
## 2 21 6 hp 110
## 3 21 6 drat 3.9
## 4 21 6 wt 2.62
## 5 21 6 qsec 16.5
## 6 21 6 vs 0
mtcars_wide <- pivot_wider(mtcars_long, names_from = variable, values_from = value)
## Warning: Values from `value` are not uniquely identified; output will contain list-cols.
## • Use `values_fn = list` to suppress this warning.
## • Use `values_fn = {summary_fun}` to summarise duplicates.
## • Use the following dplyr code to identify duplicates.
## {data} |>
## dplyr::summarise(n = dplyr::n(), .by = c(mpg, cyl, variable)) |>
## dplyr::filter(n > 1L)
head(mtcars_wide)
## # A tibble: 6 × 11
## mpg cyl disp hp drat wt qsec vs am gear carb
## <dbl> <dbl> <list> <list> <list> <list> <lis> <lis> <lis> <lis> <lis>
## 1 21 6 <dbl [2]> <dbl [2]> <dbl [2]> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 2 22.8 4 <dbl [2]> <dbl [2]> <dbl [2]> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 3 21.4 6 <dbl [1]> <dbl [1]> <dbl [1]> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 4 18.7 8 <dbl [1]> <dbl [1]> <dbl [1]> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 5 18.1 6 <dbl [1]> <dbl [1]> <dbl [1]> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 6 14.3 8 <dbl [1]> <dbl [1]> <dbl [1]> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
To demonstrate the pivot_longer() and pivot_wider() functions from the tidyr package (part of the tidyverse), I used the built-in mtcars dataset in R. This dataset contains information about different car models with various features like miles per gallon (mpg), horsepower (hp), and weight (wt). The dataset starts in wide format, where each feature is a separate column. Using pivot_longer(), I reshaped it into long format by keeping mpg and cyl as they are and stacking the remaining nine columns (disp, hp, drat, wt, qsec, vs, am, gear, and carb) into two columns: variable, which holds the measurement name, and value, which holds the corresponding number. Long format is easier to work with for certain types of analysis, such as grouping or plotting by measurement type.
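As a small illustration of why long format helps with grouping, the sketch below computes the mean of every stacked measurement for each cylinder count in a single call. It rebuilds mtcars_long with the same pivot_longer() call used above; the name means_by_cyl and the choice of base R's aggregate() are my own for this example, not part of the original analysis.

```r
library(tidyr)

# Same reshape as above: keep mpg and cyl, stack the other nine columns.
mtcars_long <- pivot_longer(mtcars, cols = -c(mpg, cyl),
                            names_to = "variable", values_to = "value")

# One mean per (measurement, cylinder count) pair.
# In wide format this would need a separate calculation per column.
means_by_cyl <- aggregate(value ~ variable + cyl, data = mtcars_long, FUN = mean)
head(means_by_cyl)
```

Because the measurement name is now data rather than a column name, it can be used directly as a grouping variable here, or as a facet in a plot.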
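After reshaping the data into long format, I applied pivot_wider() to try to convert it back to wide format. This function spreads the variable column into separate columns again, one per measurement type, and fills each with the corresponding values. The result, however, is not the original dataset. pivot_longer() returns a tibble, which drops the car names that mtcars stores as row names, and mpg and cyl alone do not uniquely identify a car (for example, Mazda RX4 and Mazda RX4 Wag both have mpg 21 and cyl 6). pivot_wider() therefore warned that the values are not uniquely identified and returned list-columns, with some cells holding more than one value. A clean round trip needs a column that uniquely identifies each row. Even so, reshaping helps in understanding and analyzing datasets, especially when working with summary statistics or visualizations, and these functions are useful whenever the structure of a dataset needs to be adjusted for a different type of analysis or presentation.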
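One way to make the round trip work is to turn the row names into a regular column before reshaping, so that every row has a unique key. The following is a minimal sketch under that assumption; the names cars, cars_long, cars_wide, and the column name model are chosen for this example only, and it relies on the tibble package, which is installed alongside tidyr.

```r
library(tidyr)
library(tibble)

# Keep the car names as a real column so each row is uniquely identified.
cars <- rownames_to_column(mtcars, var = "model")

# Stack everything except the identifier, mpg, and cyl.
cars_long <- pivot_longer(cars, cols = -c(model, mpg, cyl),
                          names_to = "variable", values_to = "value")

# Spread back to wide; model + mpg + cyl now identify each row uniquely,
# so no duplicates warning and no list-columns are expected.
cars_wide <- pivot_wider(cars_long, names_from = variable, values_from = value)

# Compare the round-tripped data with the starting data frame.
all.equal(as.data.frame(cars_wide), cars)
```

The only change from the earlier code is the identifier column; the pivot_longer() and pivot_wider() calls are otherwise the same.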