Solution

I load the dataset and do some initial exploration.

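library(dplyr)    # %>%, group_by(), summarise(), mutate()
library(ggplot2)  # plots in sections 4, 7 and 8
library(moments)  # skewness(), kurtosis() (assumed source of these functions)
dati <- read.csv('realestate_texas.csv')
attach(dati)
head(dati)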

##       city year month sales volume median_price listings months_inventory
## 1 Beaumont 2010     1    83 14.162       163800     1533              9.5
## 2 Beaumont 2010     2   108 17.690       138200     1586             10.0
## 3 Beaumont 2010     3   182 28.701       122400     1689             10.6
## 4 Beaumont 2010     4   200 26.819       123200     1708             10.6
## 5 Beaumont 2010     5   202 28.833       123100     1771             10.9
## 6 Beaumont 2010     6   189 27.219       122800     1803             11.1

sapply(dati, class)

##             city             year            month            sales 
##      "character"        "integer"        "integer"        "integer" 
##           volume     median_price         listings months_inventory 
##        "numeric"        "numeric"        "integer"        "numeric"

1. Variable analysis

str(dati)

## 'data.frame':    240 obs. of  8 variables:
##  $ city            : chr  "Beaumont" "Beaumont" "Beaumont" "Beaumont" ...
##  $ year            : int  2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
##  $ month           : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ sales           : int  83 108 182 200 202 189 164 174 124 150 ...
##  $ volume          : num  14.2 17.7 28.7 26.8 28.8 ...
##  $ median_price    : num  163800 138200 122400 123200 123100 ...
##  $ listings        : int  1533 1586 1689 1708 1771 1803 1857 1830 1829 1779 ...
##  $ months_inventory: num  9.5 10 10.6 10.6 10.9 11.1 11.7 11.6 11.7 11.5 ...

There are two time variables, month and year, which will be useful to combine. There is a city variable that refers to only four cities. All the other variables are numeric and carry information useful for an analysis of real estate sales.

2. Measures of location, variability and shape

I compute the various indices for the variables and define a function that prints the results:

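indici <- function(var){
  # Print location, variability and shape indices for a numeric vector
  cat("Indexes for the variable '", as.name(ensym(var)), "'\n")
  print(base::summary(var))  # base:: because summary() is redefined in section 7
  cat("Standard deviation:", sd(var), "\n")
  cat("Skewness:", skewness(var), "\n")
  cat("Kurtosis:", kurtosis(var)-3, "\n")  # excess kurtosis (0 for a normal distribution)
  cat("CV:", sd(var)/abs(mean(var)), "\n")  # coefficient of variation
}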
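
To make explicit what the skewness, kurtosis and CV lines measure, here is a minimal base-R sketch of the same indices. It assumes the moment-based definitions with divisor n, which is my understanding of what the moments package computes; the helper name `shape_indices` is hypothetical and not part of the analysis.

```r
# Moment-based shape indices in base R (central moments with divisor n).
# `shape_indices` is a hypothetical helper, not used elsewhere in this report.
shape_indices <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  m2 <- mean(d^2)   # second central moment
  m3 <- mean(d^3)   # third central moment
  m4 <- mean(d^4)   # fourth central moment
  c(skewness = m3 / m2^1.5,
    excess_kurtosis = m4 / m2^2 - 3,
    cv = sd(x) / abs(mean(x)))
}

sym   <- shape_indices(c(1, 2, 3, 4, 5))    # symmetric sample: skewness is 0
right <- shape_indices(c(1, 1, 1, 2, 10))   # long right tail: skewness is positive
```

A positive skewness means a longer right tail, and a negative excess kurtosis means tails lighter than those of a normal distribution.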

2.1 City

freq_table <- function(dati, var){

  ft <- dati %>%
    group_by(!!ensym(var)) %>%
    summarise(count_class = n(), .groups = "drop") %>%
    ungroup() %>%
    mutate(count_tot = sum(count_class),
           freq_rel = count_class / count_tot)
  
  return(ft) 
}

freq_table(dati,city)

## # A tibble: 4 × 4
##   city                  count_class count_tot freq_rel
##   <chr>                       <int>     <int>    <dbl>
## 1 Beaumont                       60       240     0.25
## 2 Bryan-College Station          60       240     0.25
## 3 Tyler                          60       240     0.25
## 4 Wichita Falls                  60       240     0.25

As expected, since the data cover the same time span for every city (12 months * 5 years = 60 records), each city has the same number of records.

2.2 Sales

indici(sales)

## Indexes for the variable ' sales '
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    79.0   127.0   175.5   192.3   247.0   423.0 
## Standard deviation: 79.65111 
## Skewness: 0.718104 
## Kurtosis: -0.3131764 
## CV: 0.4142203

2.3 Volume

indici(volume)

## Indexes for the variable ' volume '
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   8.166  17.660  27.062  31.005  40.893  83.547 
## Standard deviation: 16.65145 
## Skewness: 0.884742 
## Kurtosis: 0.176987 
## CV: 0.5370536

2.4 Median Price

indici(median_price)

## Indexes for the variable ' median_price '
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   73800  117300  134500  132665  150050  180000 
## Standard deviation: 22662.15 
## Skewness: -0.3645529 
## Kurtosis: -0.6229618 
## CV: 0.1708218

2.5 Listings

indici(listings)

## Indexes for the variable ' listings '
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     743    1026    1618    1738    2056    3296 
## Standard deviation: 752.7078 
## Skewness: 0.6494982 
## Kurtosis: -0.79179 
## CV: 0.4330833

2.6 Months Inventory

indici(months_inventory)

## Indexes for the variable ' months_inventory '
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.400   7.800   8.950   9.193  10.950  14.900 
## Standard deviation: 2.303669 
## Skewness: 0.04097527 
## Kurtosis: -0.1744475 
## CV: 0.2506031

2.7 Remarks

It makes sense to create a column that combines month and year for each record:

dati <- dati %>%
  mutate(year_month_date = as.Date(paste(year, month, '01',  sep = "-")))

head(dati)

##       city year month sales volume median_price listings months_inventory
## 1 Beaumont 2010     1    83 14.162       163800     1533              9.5
## 2 Beaumont 2010     2   108 17.690       138200     1586             10.0
## 3 Beaumont 2010     3   182 28.701       122400     1689             10.6
## 4 Beaumont 2010     4   200 26.819       123200     1708             10.6
## 5 Beaumont 2010     5   202 28.833       123100     1771             10.9
## 6 Beaumont 2010     6   189 27.219       122800     1803             11.1
##   year_month_date
## 1      2010-01-01
## 2      2010-02-01
## 3      2010-03-01
## 4      2010-04-01
## 5      2010-05-01
## 6      2010-06-01
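
As a small illustration of how this date column behaves, the sketch below builds first-of-month dates from integer year and month values (made-up inputs, not the dataset) and shows two things that are useful later: a compact year-month label and an explicit Date comparison.

```r
# Build a first-of-month Date from integer year and month values.
year  <- c(2010, 2010, 2012)
month <- c(1, 2, 12)
d <- as.Date(paste(year, month, "01", sep = "-"))

iso_dates <- format(d)                      # ISO dates, months zero-padded
ym_labels <- format(d, "%Y-%m")             # compact label for axes or tables
is_dec_2012 <- d == as.Date("2012-12-01")   # explicit Date-to-Date comparison
```

Comparing against `as.Date("2012-12-01")` rather than a bare string keeps the intent clear, even though R also coerces the string when comparing it with a Date.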

3. Identifying the variables with the greatest variability and skewness

The variable with the greatest variability is the one with the highest coefficient of variation (CV); from the results above, this is volume. Volume is also the variable with the strongest asymmetry, since it has the largest skewness of all (specifically, positive skewness).
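
The comparison can be checked mechanically. The sketch below copies the CV and skewness values printed in section 2 into two named vectors and picks the maximum of each; taking the absolute value of the skewness is my own choice, so that a strongly negative value would also count.

```r
# Values copied from the indici() output in section 2.
cv <- c(sales = 0.4142203, volume = 0.5370536, median_price = 0.1708218,
        listings = 0.4330833, months_inventory = 0.2506031)
skew <- c(sales = 0.718104, volume = 0.884742, median_price = -0.3645529,
          listings = 0.6494982, months_inventory = 0.04097527)

most_variable <- names(which.max(cv))          # highest CV
most_skewed   <- names(which.max(abs(skew)))   # largest skewness in absolute value
cv_ranking    <- sort(cv, decreasing = TRUE)   # full ranking, for reference
```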

4. Binning a quantitative variable into classes

The task asks to pick a quantitative variable and split it into classes; I choose the variable sales.

dati$sales_class <- cut(sales, breaks = seq(min(sales) - (min(sales) %% 50), max(sales) + 50, by = 50))
freq_table(dati,sales_class)

## # A tibble: 8 × 4
##   sales_class count_class count_tot freq_rel
##   <fct>             <int>     <int>    <dbl>
## 1 (50,100]             21       240   0.0875
## 2 (100,150]            72       240   0.3   
## 3 (150,200]            56       240   0.233 
## 4 (200,250]            32       240   0.133 
## 5 (250,300]            34       240   0.142 
## 6 (300,350]            13       240   0.0542
## 7 (350,400]             9       240   0.0375
## 8 (400,450]             3       240   0.0125
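
To show how the break points come about, this sketch reproduces them from the minimum and maximum of sales reported in section 2.2 (79 and 423). The four test values are made up to show the right-closed intervals.

```r
# Reproduce the break points of sales_class from the reported min and max.
sales_min <- 79
sales_max <- 423
width <- 50

lower  <- sales_min - (sales_min %% width)            # 79 - 29 = 50
breaks <- seq(lower, sales_max + width, by = width)   # 50, 100, ..., 450

# cut() builds right-closed intervals (a, b] by default.
classes <- cut(c(79, 100, 101, 423), breaks = breaks)
```

One caveat: if the minimum were an exact multiple of 50 it would coincide with the lowest break and `cut()` would return NA for it. Passing `include.lowest = TRUE` would guard against that case.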

I compute the normalized Gini coefficient for the new variable:

gini.index <- function(var){
  n <- length(var)
  
  ni = table(var) 
  fi = ni/n 
  fi2 = fi^2 
  J = length(table(var)) 
  
  gini = 1 - sum(fi2)
  gini.normalizzato = gini/((J-1)/J)
  return(gini.normalizzato)
}

cat("Gini index for sales_class:", gini.index(dati$sales_class), "\n")

## Gini index for sales_class: 0.9206349
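
As a cross-check, the same value can be recomputed directly from the class counts in the frequency table above. The uniform case is included only to show that the normalization maps maximum heterogeneity to 1.

```r
# Normalized Gini heterogeneity index from the class counts of sales_class.
counts <- c(21, 72, 56, 32, 34, 13, 9, 3)
f <- counts / sum(counts)           # relative frequencies
J <- length(counts)                 # number of classes
gini <- 1 - sum(f^2)
gini_norm <- gini / ((J - 1) / J)   # rescaled so that the maximum is 1

# Perfectly uniform classes reach the maximum.
uniform <- rep(1 / J, J)
gini_uniform <- (1 - sum(uniform^2)) / ((J - 1) / J)
```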

This indicates high heterogeneity across the classes. A quick look at the distribution:

ggplot(dati, aes(x = sales_class)) +
  geom_bar(fill = 'lightblue', color = "black") +
  theme_minimal() +
  labs(x = "Sales count classes",
       y = "Count",
       title = "Distribution of the number of monthly sales")

5. Computing probabilities

I create a function that computes the probability of drawing a record with given characteristics:

probabilita <- function(col, val){
  mean(col == val)
}

cat("Probability that a row refers to the city of Beaumont: ", probabilita(city, "Beaumont")*100, "%")

## Probability that a row refers to the city of Beaumont:  25 %

cat("Probability that a row refers to the month of July: ", probabilita(month, 7)*100, "%")

## Probability that a row refers to the month of July:  8.333333 %

cat("Probability that a row refers to December 2012: ", probabilita(dati$year_month_date, as.Date("2012-12-01"))*100, "%" )

## Probability that a row refers to December 2012:  1.666667 %
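
These three values are just relative frequencies in a balanced panel. The sketch below rebuilds a synthetic grid with the same layout as the dataset (4 cities, 5 years, 12 months, no measurements) and recovers them as 60/240, 20/240 and 4/240.

```r
# Synthetic balanced panel with the same layout as the dataset (no measurements).
grid <- expand.grid(
  city  = c("Beaumont", "Bryan-College Station", "Tyler", "Wichita Falls"),
  year  = 2010:2014,
  month = 1:12,
  stringsAsFactors = FALSE
)

p_city  <- mean(grid$city == "Beaumont")                # 60 / 240
p_month <- mean(grid$month == 7)                        # 20 / 240
p_both  <- mean(grid$year == 2012 & grid$month == 12)   # 4 / 240
```

Taking the mean of a logical vector counts the TRUE values and divides by the number of rows, which is why `probabilita()` needs only one line.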

6. Creating new variables

We are asked to create two new columns representing the mean price and the effectiveness of the sale listings. Starting with the first, it is given by:

mean sale price = (sales volume) / (number of sales)

dati <- dati %>%
  mutate(mean_price = volume/sales*1000000)

The effectiveness of the listings could instead be computed as:

listing effectiveness = (number of sales) / (number of listings)

dati <- dati %>%
  mutate(sales_eff = sales/listings)

head(dati)

##       city year month sales volume median_price listings months_inventory
## 1 Beaumont 2010     1    83 14.162       163800     1533              9.5
## 2 Beaumont 2010     2   108 17.690       138200     1586             10.0
## 3 Beaumont 2010     3   182 28.701       122400     1689             10.6
## 4 Beaumont 2010     4   200 26.819       123200     1708             10.6
## 5 Beaumont 2010     5   202 28.833       123100     1771             10.9
## 6 Beaumont 2010     6   189 27.219       122800     1803             11.1
##   year_month_date sales_class mean_price  sales_eff
## 1      2010-01-01    (50,100]   170626.5 0.05414220
## 2      2010-02-01   (100,150]   163796.3 0.06809584
## 3      2010-03-01   (150,200]   157697.8 0.10775607
## 4      2010-04-01   (150,200]   134095.0 0.11709602
## 5      2010-05-01   (200,250]   142737.6 0.11405985
## 6      2010-06-01   (150,200]   144015.9 0.10482529
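
As a worked check of both formulas, the sketch below recomputes the two new columns for the first row of the dataset (Beaumont, January 2010). It assumes, as the axis label in section 8.2 states, that volume is expressed in millions of dollars, hence the factor 1e6.

```r
# First row of the dataset (Beaumont, January 2010), values from head(dati).
sales    <- 83
volume   <- 14.162   # assumed to be in millions of dollars
listings <- 1533

mean_price <- volume / sales * 1e6   # dollars per sale
sales_eff  <- sales / listings       # sales per active listing
```

Rounded, these give about 170626.5 dollars and 0.0541, matching the first row of the output above.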

7. Conditional analysis

I compute conditional statistics by city, year and month.

I define a function that, for a given variable, returns its summary statistics by city, year and month:

# NOTE: this definition masks base::summary() for the rest of the session
summary <- function(dati, var) {
  # Mean and standard deviation of `var` by city, by year and by month
  
  summary_city <- dati %>%
    group_by(city) %>%
    summarise(mean = mean({{var}}),
              sd = sd({{var}}))
  
  summary_year <- dati %>%
    group_by(year) %>%
    summarise(mean = mean({{var}}),
              sd = sd({{var}}))
  
  summary_month <- dati %>%
    group_by(month) %>%
    summarise(mean = mean({{var}}),
              sd = sd({{var}}))
  
  return(list(
    city  = summary_city,
    year  = summary_year,
    month = summary_month
  ))
}
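
The three grouped blocks follow the same pattern, so they could also be written once and applied to each grouping column. Below is a minimal base-R sketch of that idea. The helper name `group_stats` and the toy data are hypothetical, and the different name avoids clashing with base `summary()`.

```r
# Hypothetical helper: mean and sd of column `var` within each level of `group`.
group_stats <- function(data, var, group) {
  data.frame(
    group = sort(unique(data[[group]])),
    mean  = as.vector(tapply(data[[var]], data[[group]], mean)),
    sd    = as.vector(tapply(data[[var]], data[[group]], sd))
  )
}

# Toy data, for illustration only.
toy <- data.frame(city  = c("A", "A", "B", "B"),
                  year  = c(2010, 2011, 2010, 2011),
                  sales = c(10, 20, 30, 50))

# One summary table per grouping column, returned as a named list.
by_dim <- lapply(c(city = "city", year = "year"),
                 function(g) group_stats(toy, "sales", g))
```

On the real data the grouping vector would be `c(city = "city", year = "year", month = "month")`.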

I also define a function that plots the results as boxplots:

plot_boxplots <- function(dati, var) {
  
  var_name <- deparse(substitute(var))  # column name, used as the y-axis label
  
  plot_cities <- ggplot(dati, aes(x = city, y = {{ var }})) +
    geom_boxplot(fill = "skyblue") +
    labs(x = "City", y = var_name) +
    theme_minimal()
  
  plot_years <- ggplot(dati, aes(x = factor(year), y = {{ var }})) +
    geom_boxplot(fill = "skyblue") +
    labs(x = "Year", y = var_name) +
    theme_minimal()
  
  plot_months <- ggplot(dati, aes(x = factor(month), y = {{ var }})) +
    geom_boxplot(fill = "skyblue") +
    labs(x = "Month", y = var_name) +
    theme_minimal()
  
  return(list(
    plot_city = plot_cities,
    plot_year = plot_years,
    plot_month = plot_months
  ))
}

7.1 Sales

sales_summary <- summary(dati,sales)
sales_summary

## $city
## # A tibble: 4 × 3
##   city                   mean    sd
##   <chr>                 <dbl> <dbl>
## 1 Beaumont               177.  41.5
## 2 Bryan-College Station  206.  85.0
## 3 Tyler                  270.  62.0
## 4 Wichita Falls          116.  22.2
## 
## $year
## # A tibble: 5 × 3
##    year  mean    sd
##   <int> <dbl> <dbl>
## 1  2010  169.  60.5
## 2  2011  164.  63.9
## 3  2012  186.  70.9
## 4  2013  212.  84.0
## 5  2014  231.  95.5
## 
## $month
## # A tibble: 12 × 3
##    month  mean    sd
##    <int> <dbl> <dbl>
##  1     1  127.  43.4
##  2     2  141.  51.1
##  3     3  189.  59.2
##  4     4  212.  65.4
##  5     5  239.  83.1
##  6     6  244.  95.0
##  7     7  236.  96.3
##  8     8  231.  79.2
##  9     9  182.  72.5
## 10    10  180.  75.0
## 11    11  157.  55.5
## 12    12  169.  60.7

sales_plot <- plot_boxplots(dati,sales)
sales_plot

[Figures: boxplots of sales by city ($plot_city), by year ($plot_year) and by month ($plot_month)]

7.2 Volume

volume_summary <- summary(dati,volume)
volume_summary

## $city
## # A tibble: 4 × 3
##   city                   mean    sd
##   <chr>                 <dbl> <dbl>
## 1 Beaumont               26.1  6.97
## 2 Bryan-College Station  38.2 17.2 
## 3 Tyler                  45.8 13.1 
## 4 Wichita Falls          13.9  3.24
## 
## $year
## # A tibble: 5 × 3
##    year  mean    sd
##   <int> <dbl> <dbl>
## 1  2010  25.7  10.8
## 2  2011  25.2  12.2
## 3  2012  29.3  14.5
## 4  2013  35.2  17.9
## 5  2014  39.8  21.2
## 
## $month
## # A tibble: 12 × 3
##    month  mean    sd
##    <int> <dbl> <dbl>
##  1     1  19.0  8.37
##  2     2  21.7 10.1 
##  3     3  29.4 12.0 
##  4     4  33.3 14.5 
##  5     5  39.7 19.0 
##  6     6  41.3 21.1 
##  7     7  39.1 21.4 
##  8     8  38.0 18.0 
##  9     9  29.6 15.2 
## 10    10  29.1 15.1 
## 11    11  24.8 11.2 
## 12    12  27.1 12.6

volume_plot <- plot_boxplots(dati,volume)
volume_plot

[Figures: boxplots of volume by city ($plot_city), by year ($plot_year) and by month ($plot_month)]

7.3 Median Price

median_summary <- summary(dati,median_price)
median_summary

## $city
## # A tibble: 4 × 3
##   city                     mean     sd
##   <chr>                   <dbl>  <dbl>
## 1 Beaumont              129988. 10105.
## 2 Bryan-College Station 157488.  8852.
## 3 Tyler                 141442.  9337.
## 4 Wichita Falls         101743. 11320.
## 
## $year
## # A tibble: 5 × 3
##    year    mean     sd
##   <int>   <dbl>  <dbl>
## 1  2010 130192. 21822.
## 2  2011 127854. 21318.
## 3  2012 130077. 21432.
## 4  2013 135723. 21708.
## 5  2014 139481. 25625.
## 
## $month
## # A tibble: 12 × 3
##    month   mean     sd
##    <int>  <dbl>  <dbl>
##  1     1 124250 25151.
##  2     2 130075 22823.
##  3     3 127415 23442.
##  4     4 131490 21458.
##  5     5 134485 18796.
##  6     6 137620 19231.
##  7     7 134750 21945.
##  8     8 136675 22488.
##  9     9 134040 24344.
## 10    10 133480 26358.
## 11    11 134305 24691.
## 12    12 133400 22810.

median_plot <- plot_boxplots(dati,median_price)
median_plot

[Figures: boxplots of median_price by city ($plot_city), by year ($plot_year) and by month ($plot_month)]

7.4 Listings

listings_summary <- summary(dati,listings)
listings_summary

## $city
## # A tibble: 4 × 3
##   city                   mean    sd
##   <chr>                 <dbl> <dbl>
## 1 Beaumont              1679.  91.1
## 2 Bryan-College Station 1458. 253. 
## 3 Tyler                 2905. 227. 
## 4 Wichita Falls          910.  73.8
## 
## $year
## # A tibble: 5 × 3
##    year  mean    sd
##   <int> <dbl> <dbl>
## 1  2010 1826   785.
## 2  2011 1850.  780.
## 3  2012 1777.  738.
## 4  2013 1678.  744.
## 5  2014 1560.  707.
## 
## $month
## # A tibble: 12 × 3
##    month  mean    sd
##    <int> <dbl> <dbl>
##  1     1 1647.  705.
##  2     2 1692.  711.
##  3     3 1757.  727.
##  4     4 1826.  770.
##  5     5 1824.  790.
##  6     6 1833.  812.
##  7     7 1821.  827.
##  8     8 1786.  816.
##  9     9 1749.  803.
## 10    10 1710.  779.
## 11    11 1653.  741.
## 12    12 1558.  693.

listings_plot <- plot_boxplots(dati,listings)
listings_plot

[Figures: boxplots of listings by city ($plot_city), by year ($plot_year) and by month ($plot_month)]

7.5 Months Inventory

months_inventory_summary <- summary(dati,months_inventory)
months_inventory_summary

## $city
## # A tibble: 4 × 3
##   city                   mean    sd
##   <chr>                 <dbl> <dbl>
## 1 Beaumont               9.97 1.65 
## 2 Bryan-College Station  7.66 2.25 
## 3 Tyler                 11.3  1.89 
## 4 Wichita Falls          7.82 0.781
## 
## $year
## # A tibble: 5 × 3
##    year  mean    sd
##   <int> <dbl> <dbl>
## 1  2010  9.97  2.08
## 2  2011 10.9   2.07
## 3  2012  9.88  1.61
## 4  2013  8.15  1.69
## 5  2014  7.06  1.75
## 
## $month
## # A tibble: 12 × 3
##    month  mean    sd
##    <int> <dbl> <dbl>
##  1     1  8.84  1.97
##  2     2  9.06  1.98
##  3     3  9.40  2.06
##  4     4  9.72  2.24
##  5     5  9.68  2.38
##  6     6  9.70  2.41
##  7     7  9.62  2.50
##  8     8  9.39  2.45
##  9     9  9.18  2.52
## 10    10  8.94  2.44
## 11    11  8.66  2.37
## 12    12  8.12  2.27

months_inventory_plot <- plot_boxplots(dati,months_inventory)
months_inventory_plot

[Figures: boxplots of months_inventory by city ($plot_city), by year ($plot_year) and by month ($plot_month)]

7.6 Mean Price

mean_price_summary <- summary(dati,mean_price)
mean_price_summary

## $city
## # A tibble: 4 × 3
##   city                     mean     sd
##   <chr>                   <dbl>  <dbl>
## 1 Beaumont              146640. 11232.
## 2 Bryan-College Station 183534. 15149.
## 3 Tyler                 167677. 12351.
## 4 Wichita Falls         119430. 11398.
## 
## $year
## # A tibble: 5 × 3
##    year    mean     sd
##   <int>   <dbl>  <dbl>
## 1  2010 150189. 23280.
## 2  2011 148251. 24938.
## 3  2012 150899. 26438.
## 4  2013 158705. 26524.
## 5  2014 163559. 31741.
## 
## $month
## # A tibble: 12 × 3
##    month    mean     sd
##    <int>   <dbl>  <dbl>
##  1     1 145640. 29819.
##  2     2 148840. 25120.
##  3     3 151137. 23238.
##  4     4 151461. 26174.
##  5     5 158235. 25787.
##  6     6 161546. 23470.
##  7     7 156881. 27220.
##  8     8 156456. 28253.
##  9     9 156522. 29669.
## 10    10 155897. 32527.
## 11    11 154233. 29685.
## 12    12 154996. 27009.

mean_price_plot <- plot_boxplots(dati,mean_price)
mean_price_plot

[Figures: boxplots of mean_price by city ($plot_city), by year ($plot_year) and by month ($plot_month)]

7.7 Sales Eff.

sales_eff_summary <- summary(dati,sales_eff)
sales_eff_summary

## $city
## # A tibble: 4 × 3
##   city                    mean     sd
##   <chr>                  <dbl>  <dbl>
## 1 Beaumont              0.106  0.0267
## 2 Bryan-College Station 0.147  0.0729
## 3 Tyler                 0.0935 0.0235
## 4 Wichita Falls         0.128  0.0247
## 
## $year
## # A tibble: 5 × 3
##    year   mean     sd
##   <int>  <dbl>  <dbl>
## 1  2010 0.0997 0.0337
## 2  2011 0.0927 0.0232
## 3  2012 0.110  0.0281
## 4  2013 0.135  0.0448
## 5  2014 0.157  0.0618
## 
## $month
## # A tibble: 12 × 3
##    month   mean     sd
##    <int>  <dbl>  <dbl>
##  1     1 0.0831 0.0230
##  2     2 0.0878 0.0219
##  3     3 0.116  0.0346
##  4     4 0.125  0.0380
##  5     5 0.141  0.0503
##  6     6 0.142  0.0576
##  7     7 0.143  0.0740
##  8     8 0.142  0.0526
##  9     9 0.112  0.0348
## 10    10 0.112  0.0360
## 11    11 0.102  0.0293
## 12    12 0.117  0.0379

sales_eff_plot <- plot_boxplots(dati,sales_eff)
sales_eff_plot

[Figures: boxplots of sales_eff by city ($plot_city), by year ($plot_year) and by month ($plot_month)]

8. Creating visualizations with ggplot2

I create plots for the variables.

8.1 Sales

agg_sales_1 <- dati %>%
  group_by(city, month) %>%
  summarise(total_sales = sum(sales), .groups = "drop") %>%
  mutate(month = factor(month, levels = 1:12))

p1 <- ggplot(agg_sales_1, aes(x = month, y = total_sales, fill = city)) +
  geom_col(color = "black") +
  theme_minimal() +
  labs(x = "Month", y = "Number of sales",
       title = "Number of sales by month and city")

p2 <- ggplot(agg_sales_1, aes(x = month, y = total_sales, fill = city)) +
  geom_col(color = "black", position = "fill") +
  theme_minimal() +
  labs(title = "Share of sales by month and city", x = "Month", y = "Share of sales")

p1

p2

Sales rise from May to August, especially in Tyler and Bryan-College Station. It remains to be checked whether this is due to a larger number of listings in those months or to a higher sales_eff.
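
A rough first answer is already available from the monthly means printed in section 7. The sketch below copies those rounded values and compares the peak-to-trough ratio of each series. Because the inputs are rounded means pooled over cities and years, this is only an indication, not a formal decomposition.

```r
# Monthly means copied (rounded) from the conditional summaries in section 7.
sales_m    <- c(127, 141, 189, 212, 239, 244, 236, 231, 182, 180, 157, 169)
listings_m <- c(1647, 1692, 1757, 1826, 1824, 1833, 1821, 1786, 1749, 1710, 1653, 1558)
eff_m      <- c(0.0831, 0.0878, 0.116, 0.125, 0.141, 0.142,
                0.143, 0.142, 0.112, 0.112, 0.102, 0.117)

# Ratio between the best and the worst month of a series.
peak_to_trough <- function(x) max(x) / min(x)

ratios <- c(sales     = peak_to_trough(sales_m),
            listings  = peak_to_trough(listings_m),
            sales_eff = peak_to_trough(eff_m))
```

Listings vary by less than 20% across the year while sales_eff varies by roughly 70%, which suggests the seasonal swing in sales comes mostly from sales_eff rather than from the number of listings.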

agg_sales_2 <- dati %>%
  group_by(year, month) %>%
  summarise(total_sales = sum(sales), .groups = "drop") %>%
  mutate(month = factor(month, levels = 1:12))

ggplot(agg_sales_2, aes(x = month, y = total_sales, group = year, color = factor(year))) +
  geom_line() +
  geom_point() +
  theme_minimal() +
  labs(x = "Month",
       y = "Number of sales",
       color = "Year",
       title = "Monthly sales trend by year")

Here home sales are aggregated across cities for each year and month. As already noted, sales increase in the middle months of the year, and this increase is more pronounced in the last two years (2013 and 2014).

agg_sales_3 <- dati %>%
  group_by(year, city) %>%
  summarise(total_sales = sum(sales), .groups = "drop")

ggplot(agg_sales_3, aes(x = year, y = total_sales, color = city, group = city)) +
  geom_line(linewidth = 1) +  # linewidth replaces size for lines since ggplot2 3.4.0
  geom_point(size = 2) +
  theme_minimal() +
  labs(x = "Year",
       y = "Sales",
       color = "City",
       title = "Total annual sales by city")

This last plot shows a general increase in home sales over the years (in Bryan-College Station sales rose by more than 50% in 5 years), with the sole exception of Wichita Falls, where sales stayed roughly constant.
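
A percentage change like the one quoted above can be computed from a table shaped like agg_sales_3 (columns city, year, total_sales). The sketch below shows one way to do it in base R. The helper name `pct_change` is hypothetical and the totals are made up for illustration, not taken from the dataset.

```r
# Hypothetical helper: percentage change between the first and last year, per city.
pct_change <- function(data) {
  first <- data[data$year == min(data$year), ]
  last  <- data[data$year == max(data$year), ]
  m <- merge(first, last, by = "city", suffixes = c("_first", "_last"))
  data.frame(city = m$city,
             pct  = 100 * (m$total_sales_last / m$total_sales_first - 1))
}

# Made-up totals, for illustration only.
toy <- data.frame(city        = rep(c("A", "B"), each = 2),
                  year        = rep(c(2010, 2014), 2),
                  total_sales = c(1000, 1600, 500, 500))

res <- pct_change(toy)   # city A grows by 60%, city B stays flat
```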

8.2 Volume

agg_volume <- dati %>%
  group_by(year, city) %>%
  summarise(total_volume = sum(volume), .groups = "drop")

ggplot(agg_volume, aes(x = year, y = total_volume, color = city, group = city)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  theme_minimal() +
  labs(
    x = "Year",
    y = "Total sales volume (millions of $)",
    color = "City",
    title = "Sales volume trend by year and city"
  )

Annual sales volume is clearly growing over the years for every city except Wichita Falls.

8.3 Median Price

ggplot(dati, aes(x = city, y = median_price)) +
  geom_boxplot(fill = "lightblue") +
  labs(title = "Median purchase price over the 5 years", x = "City", y = "Median price") +
  theme_minimal()

Bryan-College Station has the highest median purchase price of all, while Wichita Falls has the lowest.

agg_median_price_1 <- dati %>%
  group_by(year, city) %>%
  summarise(mean_median_price = mean(median_price), .groups = "drop")

ggplot(agg_median_price_1, aes(x = year, y = mean_median_price, color = city, group = city)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  theme_minimal() +
  labs(x = "Year",
       y = "Median home purchase price (in $)",
       color = "City",
       title = "Median price averaged by year and city")

The median price shows a rising trend over the years for Bryan-College Station and Tyler, while for Beaumont and Wichita Falls there is no evident increase.

8.4 Listings

agg_listings_1 <- dati %>%
  group_by(city, year) %>%
  summarise(total_listings = sum(listings), .groups = "drop")

ggplot(agg_listings_1, aes(x = year, y = total_listings, fill = city)) +
  geom_col(color = "black") +
  theme_minimal() +
  labs(x = "Year", y = "Number of listings (sum of monthly counts)",
       fill = "City",
       title = "Listings by year and city")

The number of active listings (at month end) is decreasing; this could be due to sales_eff increasing over the years.

agg_listings_2 <- dati %>%
  group_by(city, year) %>%
  summarise(mean_listings = mean(listings),
            sales_eff_mean = mean(sales_eff),
            .groups = "drop")

# Scale factor that maps sales_eff onto the range of the listings axis
coeff <- max(agg_listings_2$mean_listings) / max(agg_listings_2$sales_eff_mean)

ggplot(agg_listings_2, aes(x = year)) +
  geom_line(aes(y = mean_listings, color = "Listings"), linewidth = 1) +
  geom_line(aes(y = sales_eff_mean * coeff, color = "Sales Efficiency"), 
            linewidth = 1) +
  geom_point(aes(y = mean_listings, color = "Listings"), size = 2) +
  geom_point(aes(y = sales_eff_mean * coeff, color = "Sales Efficiency"), size = 2) +
  scale_y_continuous(
    name = "Mean listings",
    sec.axis = sec_axis(~ . / coeff, name = "Sales Efficiency (annual mean)")
  ) +
  scale_color_manual(values = c("Listings" = "blue", "Sales Efficiency" = "red")) +
  facet_wrap(~ city) +
  theme_minimal() +
  labs(title = "Listings and Sales Efficiency trend by city",
       x = "Year",
       color = "")

In the cities where sales_eff increases, the number of active listings clearly decreases. In the only city where sales_eff stays constant over the 5 years (Wichita Falls), the number of active listings at month end also stays constant.
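
The secondary axis in this plot relies on a simple round trip: the efficiency series is multiplied by coeff before drawing, and the right-hand axis divides by coeff again when printing its labels. The sketch below illustrates this with made-up numbers.

```r
# Two series on very different scales (made-up numbers).
listings <- c(1500, 1400, 1300)
eff      <- c(0.10, 0.13, 0.16)

coeff      <- max(listings) / max(eff)   # aligns the two maxima
eff_scaled <- eff * coeff                # values actually drawn on the primary axis
eff_back   <- eff_scaled / coeff         # values shown by sec_axis(~ . / coeff)
```

Since the scaling only aligns the two maxima, the points where the lines cross carry no meaning; only the direction of each trend should be read from the plot.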

8.5 Months Inventory

agg_mon_inv_1 <- dati %>%
  group_by(city, year) %>%
  summarise(mean_mon_inv = mean(months_inventory),
            .groups = "drop")

ggplot(agg_mon_inv_1, aes(x = year, y = mean_mon_inv, color = city, group = city)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  theme_minimal() +
  labs(x = "Year",
       y = "Mean months inventory per year",
       color = "City",
       title = "Months inventory by year and city")

Over the years the capacity to sell properties has improved, with the exception, once again, of Wichita Falls.

8.6 Mean Price

agg_mean_price_1 <- dati %>%
  group_by(city, year) %>%
  summarise(mean_mean_price = mean(mean_price),
            .groups = "drop")

ggplot(agg_mean_price_1, aes(x = year, y = mean_mean_price, color = city, group = city)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  theme_minimal() +
  labs(x = "Year",
       y = "Mean price per year",
       color = "City",
       title = "Mean property purchase price by year and city")

8.7 Sales_eff

agg_sales_eff_1 <- dati %>%
  group_by(city, year) %>%
  summarise(mean_sales_eff = mean(sales_eff),
            .groups = "drop")

ggplot(agg_sales_eff_1, aes(x = year, y = mean_sales_eff, color = city, group = city)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  theme_minimal() +
  labs(x = "Year",
       y = "Sales Efficiency",
       color = "City",
       title = "Annual mean Sales Efficiency by city")

Bryan-College Station stands out above all for its sales efficiency; Tyler and Beaumont show a more modest upward trend, while Wichita Falls stays constant.

9. Conclusions

The following observations can be made:

There is an overall increase in property purchases over the years, and this trend is driven by Tyler and Bryan-College Station. The increase does not appear to be due to a larger number of listings in circulation, but to more effective listings (Bryan-College Station stands out in particular).
For these same two cities, both the median price and the mean price grow steadily over the years.
The mean purchase price for Beaumont and Wichita Falls appears to remain constant.
In particular, Wichita Falls shows stagnation across all variables over the five-year period.
Finally, property sales are clearly correlated with seasonality.

Project: Exploratory Analysis of the Texas Real Estate Market

2025-09-06