Part I

I have decided to work on the NYC Property Valuation and Assessment Data from the NYC Open Data portal. This dataset, maintained by the New York City Department of Finance, provides a comprehensive view of property valuations across the city’s five boroughs. It is publicly available in CSV format and contains detailed information on property market values, assessed values, tax exemptions, and property types. This data will help me understand how property assessments vary across neighborhoods and how tax policies impact different areas.

I am particularly interested in exploring how property valuations differ across boroughs and how they have changed over time. By analyzing key variables like Market Value, Assessed Value, Property Type, and Tax Exemptions, I hope to identify patterns and inequalities in property assessments. Although I am still learning more about different analytical methods, I plan to start with summary statistics and visualizations to compare borough-level differences. As I gain more knowledge, I aim to incorporate more advanced techniques to uncover deeper insights into the city’s real estate landscape and its connection to public policies.
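As a first step toward the borough-level comparison described above, a grouped summary could look roughly like the sketch below. The data frame `valuations`, its column names (`borough`, `market_value`, `assessed_value`), and all of its values are made-up placeholders for illustration; the real file from NYC Open Data will have its own column names and would be read in with something like `read.csv()`.

```r
library(dplyr)

# Toy stand-in for the real dataset: column names and values are
# hypothetical and only illustrate the shape of the analysis.
valuations <- data.frame(
  borough        = c("Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Queens"),
  market_value   = c(1200000, 800000, 600000, 900000, 500000),
  assessed_value = c(540000, 360000, 36000, 54000, 30000)
)

# One row per borough: property count, median market value, and the
# average ratio of assessed value to market value.
borough_summary <- valuations |>
  group_by(borough) |>
  summarise(
    n_properties          = n(),
    median_market_value   = median(market_value),
    mean_assessment_ratio = mean(assessed_value / market_value)
  )

print(borough_summary)
```

The ratio of assessed value to market value is one simple way to compare assessments across boroughs on a common scale; other measures may turn out to be more appropriate once the actual columns are known.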

Part II

Loading a built-in dataset

Available datasets

data()

Load the dataset

data("mtcars")

Display the dataset

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Load the tidyr package

library(tidyr)
## Warning: package 'tidyr' was built under R version 4.3.3

Pivot the dataset to long format

mtcars_long <- pivot_longer(mtcars, cols = -c(mpg, cyl), names_to = "variable", values_to = "value")

Display the long format dataset

head(mtcars_long)
## # A tibble: 6 × 4
##     mpg   cyl variable  value
##   <dbl> <dbl> <chr>     <dbl>
## 1    21     6 disp     160   
## 2    21     6 hp       110   
## 3    21     6 drat       3.9 
## 4    21     6 wt         2.62
## 5    21     6 qsec      16.5 
## 6    21     6 vs         0

Pivot the dataset to wide format

mtcars_wide <- pivot_wider(mtcars_long, names_from = variable, values_from = value)
## Warning: Values from `value` are not uniquely identified; output will contain list-cols.
## • Use `values_fn = list` to suppress this warning.
## • Use `values_fn = {summary_fun}` to summarise duplicates.
## • Use the following dplyr code to identify duplicates.
##   {data} |>
##   dplyr::summarise(n = dplyr::n(), .by = c(mpg, cyl, variable)) |>
##   dplyr::filter(n > 1L)

Display the wide format dataset

head(mtcars_wide)
## # A tibble: 6 × 11
##     mpg   cyl disp      hp        drat      wt     qsec  vs    am    gear  carb 
##   <dbl> <dbl> <list>    <list>    <list>    <list> <lis> <lis> <lis> <lis> <lis>
## 1  21       6 <dbl [2]> <dbl [2]> <dbl [2]> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 2  22.8     4 <dbl [2]> <dbl [2]> <dbl [2]> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 3  21.4     6 <dbl [1]> <dbl [1]> <dbl [1]> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 4  18.7     8 <dbl [1]> <dbl [1]> <dbl [1]> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 5  18.1     6 <dbl [1]> <dbl [1]> <dbl [1]> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 6  14.3     8 <dbl [1]> <dbl [1]> <dbl [1]> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
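The list-columns in the output above suggest that some combinations of mpg, cyl, and variable occur more than once in the long data. A quick way to check, following the suggestion printed in the warning itself, is sketched below; the object name `duplicates` is my own choice, and the `.by` argument assumes dplyr 1.1.0 or later.

```r
library(tidyr)
library(dplyr)

data("mtcars")

# Same long-format reshape as above
mtcars_long <- pivot_longer(mtcars, cols = -c(mpg, cyl),
                            names_to = "variable", values_to = "value")

# Count how often each (mpg, cyl, variable) combination appears and
# keep only the combinations that appear more than once.
duplicates <- mtcars_long |>
  summarise(n = n(), .by = c(mpg, cyl, variable)) |>
  filter(n > 1L)

head(duplicates)
```

Any rows returned here correspond to cars that share the same mpg and cyl values, which is why pivot_wider() cannot place a single value in each cell.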

To demonstrate the pivot_longer() and pivot_wider() functions from the tidyr package (part of the tidyverse), I used the built-in mtcars dataset in R. This dataset contains information about different car models, with features such as miles per gallon (mpg), horsepower (hp), and weight (wt). It is originally in wide format, where each feature is a separate column. Using pivot_longer(), I reshaped it into long format by keeping mpg and cyl as identifying columns and stacking the remaining nine columns (disp, hp, drat, wt, qsec, vs, am, gear, and carb) into two columns: one for the measurement name (variable) and one for the corresponding value (value). Because mtcars stores the car names as row names rather than as a column, the names are dropped in this step. The long format is easier to work with for certain types of analysis, such as grouping or plotting by variable.
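As a small illustration of why the long format helps with grouping, the sketch below computes the mean of every measurement in a single grouped summary. It repeats the reshape so that it runs on its own; the names `variable_means` and `mean_value` are my own choices, not part of the original analysis.

```r
library(tidyr)
library(dplyr)

data("mtcars")

# Long format: one row per car and measurement
mtcars_long <- pivot_longer(mtcars, cols = -c(mpg, cyl),
                            names_to = "variable", values_to = "value")

# One grouped summary covers all nine measurements at once
variable_means <- mtcars_long |>
  group_by(variable) |>
  summarise(mean_value = mean(value))

print(variable_means)
```

In wide format the same result would need a separate call per column or a column-wise helper, whereas in long format the measurement name is just another grouping variable.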

After reshaping the data into long format, I applied pivot_wider() to convert it back to wide format. This function spreads the variable column into separate columns again, one per unique measurement name, filled with the corresponding values. In this case the result does not match the original dataset: several cars share the same mpg and cyl values (for example, Mazda RX4 and Mazda RX4 Wag), so those two columns do not uniquely identify a row. pivot_wider() therefore issues a warning and returns list-columns that hold more than one value per cell. Recovering the original layout requires a column that uniquely identifies each car, such as the model name. Reshaping data in this way is useful whenever the structure of a dataset needs to be adjusted for a different type of analysis, summary, or presentation.
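The warning and list-columns in the earlier output indicate that mpg and cyl alone do not identify each car. One possible way to get a clean round trip, offered here as a sketch and not as the approach used above, is to turn the row names into a regular column before pivoting. The names `mtcars_id`, `mtcars_long_id`, `mtcars_wide_id`, and `model` are my own, and rownames_to_column() comes from the tibble package, which is assumed to be installed.

```r
library(tidyr)
library(tibble)

data("mtcars")

# Move the car names from row names into a column so that every row
# carries a unique key through the reshape.
mtcars_id <- rownames_to_column(mtcars, var = "model")

# Pivot to long format, keeping model, mpg, and cyl as identifiers
mtcars_long_id <- pivot_longer(
  mtcars_id,
  cols      = -c(model, mpg, cyl),
  names_to  = "variable",
  values_to = "value"
)

# Pivot back to wide format; each (model, variable) pair is now unique,
# so each cell receives exactly one value.
mtcars_wide_id <- pivot_wider(
  mtcars_long_id,
  names_from  = variable,
  values_from = value
)

dim(mtcars_wide_id)
head(mtcars_wide_id)
```

With the model column in place, the wide result should have one row per car and ordinary numeric columns, so it can be compared directly against the original mtcars values.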