Suggested citation:
Mendez C. (2020). Bivariate distribution dynamics analysis in R. R Studio/RPubs. Available at https://rpubs.com/quarcs-lab/tutorial-bivariate-distribution-dynamics
This work is licensed under the Creative Commons Attribution-Non Commercial-Share Alike 4.0 International License.

Acknowledgment:
Material adapted from multiple sources; in particular, the dataset is from Magrini (2007).
Replication files
The tutorial is self-contained. No additional file is needed.
If you are a member of the QuaRCS lab, you can run this tutorial in RStudio Cloud.
Tutorial objectives
- Study the dynamics of univariate densities
- Compute the bandwidth of a density
- Study mobility plots
- Study bivariate densities
- Study density-based clustering methods
- Study conditional bivariate densities
Import data
We will use two hypothetical cross-sectional series.
- The first series (x) was produced by drawing a random sample of 1000 observations from a univariate normal distribution.
- The second series (y) was produced by merging and sorting two random samples of 500 observations each.
The means and standard deviations of these two series match, respectively, those of the logarithm of per capita Gross Value Added observed for the Italian provinces in 1996 and in 2002. For this reason, we assume that the analysis covers a 6-year period, from 1996 to 2002.
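A minimal sketch of how series like these could be simulated in R. All generating parameters are assumptions: the log-scale means and standard deviations are read off the log_x and log_y rows of the descriptive statistics, and the two component means for y are illustrative guesses chosen to give a two-humped shape. The text does not say whether x was also sorted, so the sketch leaves it in draw order.

```r
# Hypothetical re-creation of the two series; all parameters are assumptions.
set.seed(1)   # arbitrary seed, for reproducibility only
n <- 1000

# x: 1000 draws from a univariate normal distribution (on the log scale)
log_x <- rnorm(n, mean = 9.57, sd = 0.28)

# y: two samples of 500 draws each, merged and sorted; the component
# means are illustrative guesses that produce a two-humped distribution
log_y <- sort(c(rnorm(n / 2, mean = 9.45, sd = 0.12),
                rnorm(n / 2, mean = 9.90, sd = 0.12)))

# Levels, comparable to the x and y columns of the summary table
x <- exp(log_x)
y <- exp(log_y)

round(c(mean_log_x = mean(log_x), sd_log_x = sd(log_x),
        mean_log_y = mean(log_y), sd_log_y = sd(log_y)), 2)
```

Because y is sorted, any pairing of x and y by position matters for the bivariate analysis; with real data, the pairing comes from the province identifiers instead.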
Descriptive statistics
| Type | Variable | Missing | Complete rate | Mean | SD | p0 (min) | p25 | p50 (median) | p75 | p100 (max) | Histogram |
|---|---|---|---|---|---|---|---|---|---|---|---|
| numeric | x | 0 | 1 | 14883.41 | 3746.62 | 3344.50 | 12362.51 | 14974.61 | 17321.7 | 26506.5 | ▁▃▇▃▁ |
| numeric | y | 0 | 1 | 16441.00 | 4011.14 | 7400.07 | 12992.51 | 16244.71 | 20021.6 | 25394.5 | ▂▇▅▇▂ |
| numeric | log_x | 0 | 1 | 9.57 | 0.28 | 8.12 | 9.42 | 9.61 | 9.8 | 10.2 | ▁▁▂▇▃ |
| numeric | log_y | 0 | 1 | 9.68 | 0.25 | 8.91 | 9.47 | 9.70 | 9.9 | 10.1 | ▁▃▇▇▇ |
| numeric | rel_x | 0 | 1 | 1.00 | 0.25 | 0.22 | 0.83 | 1.01 | 1.2 | 1.8 | ▁▃▇▃▁ |
| numeric | rel_y | 0 | 1 | 1.00 | 0.24 | 0.45 | 0.79 | 0.99 | 1.2 | 1.5 | ▂▇▅▇▂ |
| numeric | rel_log_x | 0 | 1 | 1.00 | 0.03 | 0.85 | 0.98 | 1.00 | 1.0 | 1.1 | ▁▁▂▇▃ |
| numeric | rel_log_y | 0 | 1 | 0.99 | 0.03 | 0.84 | 0.97 | 0.99 | 1.0 | 1.1 | ▁▁▂▇▃ |
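The relative variables appear to be each value divided by its cross-sectional mean: rel_x and rel_y both have mean 1.00, and min(rel_x) = 0.22 ≈ 3344.50 / 14883.41. The sketch below builds these columns on simulated stand-in data (hypothetical parameters) and summarises them in base R. The published summary looks like skimr::skim() output, which you could use instead if skimr is installed. The rel_log_* columns are left out because their definition is not stated. With this column order, rel_x and rel_y sit in columns 5 and 6, which is consistent with the pdfCluster(x = dat[, 5:6]) call later in the tutorial.

```r
# Simulated stand-in data (hypothetical parameters), in levels
set.seed(1)
x <- exp(rnorm(1000, mean = 9.57, sd = 0.28))
y <- exp(sort(c(rnorm(500, mean = 9.45, sd = 0.12),
                rnorm(500, mean = 9.90, sd = 0.12))))

dat <- data.frame(x = x, y = y)
dat$log_x <- log(dat$x)
dat$log_y <- log(dat$y)
# Relative variables: each value divided by the cross-sectional mean (assumed definition)
dat$rel_x <- dat$x / mean(dat$x)
dat$rel_y <- dat$y / mean(dat$y)

# Base-R stand-in for the summary table: mean, sd and quartiles of each column
stats_table <- t(sapply(dat, function(v) {
  c(mean = mean(v), sd = sd(v), quantile(v, probs = c(0, 0.25, 0.5, 0.75, 1)))
}))
round(stats_table, 2)
```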
Univariate dynamics
Select bandwidths
Select the bandwidth with the function dpik (a direct plug-in selector) from the KernSmooth package:
[1] 0.065
[1] 0.039
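The two printed values are presumably the dpik() bandwidths of the initial and final relative series; the output does not label them. The sketch below computes dpik() bandwidths on simulated stand-in data (hypothetical parameters, so the numbers will not reproduce 0.065 and 0.039) and, for comparison, Silverman's rule of thumb bw.nrd0(), which is the default bandwidth of stats::density() and of ggplot2's stat_density()/geom_density(). That default is what a ggplot-based density plot uses unless a bandwidth is set explicitly.

```r
library(KernSmooth)   # recommended package, installed with most R distributions

# Simulated stand-in data (hypothetical parameters), as relative values
set.seed(1)
x <- exp(rnorm(1000, mean = 9.57, sd = 0.28))
y <- exp(sort(c(rnorm(500, mean = 9.45, sd = 0.12),
                rnorm(500, mean = 9.90, sd = 0.12))))
rel_x <- x / mean(x)
rel_y <- y / mean(y)

# Direct plug-in bandwidths (normal kernel by default)
h_x <- dpik(rel_x)
h_y <- dpik(rel_y)

# Silverman's rule of thumb: default bandwidth of stats::density() and ggplot2's stat_density()
h_x_nrd0 <- bw.nrd0(rel_x)
h_y_nrd0 <- bw.nrd0(rel_y)

round(c(dpik_x = h_x, dpik_y = h_y, nrd0_x = h_x_nrd0, nrd0_y = h_y_nrd0), 3)
```

The rule of thumb assumes a roughly normal shape, so for a two-humped series like y it tends to oversmooth relative to a plug-in choice.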
Plot each density


Plot both densities
Method 1
Keep the original bandwidths selected with the KernSmooth package
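```r
library(dplyr)    # for the %>% pipe
library(ggplot2)

# `densities` holds the KernSmooth estimates of the initial and final relative
# series: columns domain_initial, density_initial, domain_final, density_final
densities_plot <- densities %>%
  ggplot() +
  theme_minimal() +
  geom_line(aes(domain_initial, density_initial)) +
  geom_line(aes(domain_final, density_final), linetype = "dashed") +
  labs(subtitle = "",
       x = "Relative Variable",
       y = "Density") +
  # annotate() draws each label once; geom_label() with constant x, y and
  # label would draw one overlapping copy per row of `densities`
  annotate("label", label = "Year 2002", x = 1.525, y = 0.9,
           label.size = 0.35, color = "black") +
  annotate("label", label = "Year 1996", x = 1.75, y = 0.2,
           label.size = 0.35, color = "black")

densities_plot
```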

Note that you have to adjust the label positions manually.
- Interactive plotly version
Manual labels are not yet implemented in the ggplotly function.
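The Method 1 code expects a data frame called densities with the columns domain_initial, density_initial, domain_final and density_final, built in a step not shown here. One way to build it, sketched on simulated stand-in data (hypothetical parameters), is to evaluate KernSmooth::bkde() on its default 401-point grid with a dpik() bandwidth for each series:

```r
library(KernSmooth)

# Simulated stand-in data (hypothetical parameters), as relative values
set.seed(1)
x <- exp(rnorm(1000, mean = 9.57, sd = 0.28))
y <- exp(sort(c(rnorm(500, mean = 9.45, sd = 0.12),
                rnorm(500, mean = 9.90, sd = 0.12))))
rel_x <- x / mean(x)
rel_y <- y / mean(y)

# Binned kernel density estimates on an equally spaced grid (401 points by
# default), each with its own direct plug-in bandwidth
dens_initial <- bkde(rel_x, bandwidth = dpik(rel_x))
dens_final   <- bkde(rel_y, bandwidth = dpik(rel_y))

# One row per grid point; column names match those used in the ggplot code
densities <- data.frame(
  domain_initial  = dens_initial$x,
  density_initial = dens_initial$y,
  domain_final    = dens_final$x,
  density_final   = dens_final$y
)
head(densities)
```

Both estimates share the default grid size, so they fit side by side in one data frame even though their grids cover different ranges.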
Method 2
Use the default bandwidth of ggplot

Using plotly
Bivariate density
Mobility scatterplot
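In a mobility scatterplot each unit's initial relative value is on the horizontal axis and its final relative value on the vertical axis. Points above the 45-degree line improved their position relative to the cross-sectional mean; points on or below it did not. A base-R sketch on simulated stand-in data (hypothetical parameters; the text does not say how x and y are paired, so here they are paired in draw order):

```r
# Simulated stand-in data (hypothetical parameters), as relative values
set.seed(1)
x <- exp(rnorm(1000, mean = 9.57, sd = 0.28))
y <- exp(sort(c(rnorm(500, mean = 9.45, sd = 0.12),
                rnorm(500, mean = 9.90, sd = 0.12))))
rel_x <- x / mean(x)
rel_y <- y / mean(y)

# Above the 45-degree line: relative position improved; on or below: it did not
above <- rel_y > rel_x
mobility <- c(improved = sum(above), not_improved = sum(!above))
mobility

# Mobility scatterplot with the 45-degree reference line
plot(rel_x, rel_y, pch = 16, cex = 0.5,
     xlab = "Relative value, initial period",
     ylab = "Relative value, final period")
abline(a = 0, b = 1, lty = "dashed")
```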

Fit a non-linear function

Note that the nonlinear fit crosses the 45-degree line twice from above.
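The crossings can be located numerically: fit a smooth curve of the final on the initial values and find where the gap between the curve and the 45-degree line changes sign. A crossing from above (gap goes from positive to negative as the initial value rises) marks a locally attracting relative value, so two such crossings suggest two convergence clubs. The sketch uses loess as a stand-in, because the fitting method behind the plot is not shown; the helper crossings_45() and the toy data are hypothetical.

```r
# Locate where a loess fit of y on x crosses the 45-degree line (y = x).
# Assumption: loess stands in for the tutorial's non-linear fit.
crossings_45 <- function(x, y, span = 0.3, n_grid = 500) {
  fit  <- loess(y ~ x, span = span)
  grid <- seq(min(x), max(x), length.out = n_grid)
  gap  <- predict(fit, newdata = data.frame(x = grid)) - grid  # fit minus 45-degree line
  i <- which(diff(sign(gap)) != 0)                             # sign changes between grid points
  data.frame(
    at   = grid[i] - gap[i] * (grid[i + 1] - grid[i]) / (gap[i + 1] - gap[i]),  # linear interpolation
    from = ifelse(gap[i] > 0, "above", "below")
  )
}

# Toy data (hypothetical): y - x = 0.2 * sin(2 * pi * x) plus small noise,
# which crosses the 45-degree line near 0.5 (from above), 1 (from below), 1.5 (from above)
set.seed(42)
x <- seq(0.25, 1.75, length.out = 300)
y <- x + 0.2 * sin(2 * pi * x) + rnorm(300, sd = 0.02)
cr <- crossings_45(x, y)
cr
```

On the tutorial's data you would call crossings_45(dat$rel_x, dat$rel_y); the span controls how many wiggles the fit can follow, so it is worth checking that the crossings are stable across a few values.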
Using geom_pointdensity

Using the KernSmooth package
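KernSmooth's bivariate estimator is presumably bkde2D(), which bins the data on a grid and takes one bandwidth per axis. A sketch on simulated stand-in data (hypothetical parameters); reusing the univariate dpik() bandwidths for each axis is an assumption, not necessarily the choice made in the tutorial.

```r
library(KernSmooth)

# Simulated stand-in data (hypothetical parameters), as relative values
set.seed(1)
x <- exp(rnorm(1000, mean = 9.57, sd = 0.28))
y <- exp(sort(c(rnorm(500, mean = 9.45, sd = 0.12),
                rnorm(500, mean = 9.90, sd = 0.12))))
rel_x <- x / mean(x)
rel_y <- y / mean(y)

# One bandwidth per axis: here simply the univariate plug-in choices (an assumption)
h <- c(dpik(rel_x), dpik(rel_y))

# Binned bivariate kernel density estimate on a 101 x 101 grid
est <- bkde2D(cbind(rel_x, rel_y), bandwidth = h, gridsize = c(101L, 101L))

# est$x1 and est$x2 are the grid coordinates and est$fhat[i, j] is the
# estimated density at (x1[i], x2[j]); contour(est$x1, est$x2, est$fhat) draws it
dim(est$fhat)
```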

Interactive

Interactive version
Using the Bivariate package


Using ggplot (stat_density_2d())


Density-based clusters
An S4 object of class "pdfCluster"
Call: pdfCluster(x = dat[, 5:6])
Initial groupings:
label 1 2 NA
count 296 297 407
Final groupings:
label 1 2
count 504 496
Groups tree (here 'h' denotes 'height'):
--[dendrogram w/ 1 branches and 2 members at h = 1]
`--[dendrogram w/ 2 branches and 2 members at h = 0.593]
|--leaf "1 "
`--leaf "2 " (h= 0.152 )
Core clusters

Full clustering

Cluster tree

Mode function

Conditional density analysis
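In the distribution dynamics approach the object of interest is the conditional density of the final value given the initial value, f(y | x), often called the stochastic kernel. A basic kernel ratio estimator is f(y | x) = sum_i K_hx(x - x_i) K_hy(y - y_i) / sum_i K_hx(x - x_i). The sketch below implements that formula with Gaussian kernels and rule-of-thumb bandwidths on simulated stand-in data; it is not hdrcde's implementation, and the helper cond_density() is hypothetical.

```r
# Kernel ratio estimate of f(y | x) at one conditioning value x_eval
cond_density <- function(x_eval, y_grid, x, y, hx, hy) {
  w <- dnorm((x_eval - x) / hx)   # Gaussian weight of each unit given x_eval
  w <- w / sum(w)                 # weights sum to one
  # Weighted mixture of Gaussian kernels centred on the final values
  sapply(y_grid, function(v) sum(w * dnorm(v, mean = y, sd = hy)))
}

# Simulated stand-in data (hypothetical parameters), as relative values
set.seed(1)
x <- exp(rnorm(1000, mean = 9.57, sd = 0.28))
y <- exp(sort(c(rnorm(500, mean = 9.45, sd = 0.12),
                rnorm(500, mean = 9.90, sd = 0.12))))
rel_x <- x / mean(x)
rel_y <- y / mean(y)

hx <- bw.nrd0(rel_x)   # assumed bandwidths (rule of thumb)
hy <- bw.nrd0(rel_y)
x_grid <- seq(quantile(rel_x, 0.05), quantile(rel_x, 0.95), length.out = 60)
y_grid <- seq(min(rel_y) - 0.3, max(rel_y) + 0.3, length.out = 200)

# Column j holds the estimated density of rel_y given rel_x = x_grid[j]
cde_matrix <- sapply(x_grid, cond_density, y_grid = y_grid,
                     x = rel_x, y = rel_y, hx = hx, hy = hy)
dim(cde_matrix)
```

Because the weights are normalised, every column integrates to one over y, which is the defining property of a conditional density.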
Using the hdrcde package
Increase the number of intervals to 60.


High density regions
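A highest density region (HDR) is the smallest set of values that contains a given probability, namely {y : f(y) >= c} for the threshold c that gives the required coverage. For a multimodal density it can therefore be a union of disjoint intervals. The sketch below computes an HDR from a density evaluated on an equally spaced grid (for example, one slice of a conditional density); hdrcde does this for you, and the helper hdr_grid() is hypothetical.

```r
# HDR of a density evaluated on an equally spaced grid
hdr_grid <- function(grid, dens, coverage = 0.95) {
  cell <- dens * (grid[2] - grid[1])              # probability mass per grid cell
  ord <- order(dens, decreasing = TRUE)           # highest-density cells first
  k <- which(cumsum(cell[ord]) >= coverage)[1]    # number of cells needed
  cutoff <- dens[ord][k]                          # density threshold c
  runs <- rle(dens >= cutoff)                     # contiguous runs inside / outside
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  data.frame(lower  = grid[starts[runs$values]],
             upper  = grid[ends[runs$values]],
             cutoff = cutoff)
}

# Check on a standard normal: the 95% HDR should be close to [-1.96, 1.96]
g <- seq(-5, 5, by = 0.001)
hdr_normal <- hdr_grid(g, dnorm(g))

# Two well-separated modes: the 95% HDR splits into two intervals
f_bimodal <- 0.5 * dnorm(g, -2, 0.5) + 0.5 * dnorm(g, 2, 0.5)
hdr_bimodal <- hdr_grid(g, f_bimodal)

hdr_normal
hdr_bimodal
```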


Using the np package
Compute an adaptive bandwidth based on cross-validation
Conditional density data (1000 observations, 2 variable(s))
(1 dependent variable(s), and 1 explanatory variable(s))
Bandwidth Selection Method: Maximum Likelihood Cross-Validation
Formula: dat$rel_y ~ dat$rel_x
Bandwidth Type: Adaptive Nearest Neighbour
Objective Function Value: 5502 (achieved on multistart 1)
Exp. Var. Name: dat$rel_x Bandwidth: 2
Dep. Var. Name: dat$rel_y Bandwidth: 2
Continuous Kernel Type: Second-Order Gaussian
No. Continuous Explanatory Vars.: 1
No. Continuous Dependent Vars.: 1
Estimation Time: 68 seconds
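Maximum likelihood cross-validation picks the bandwidth h that maximises the leave-one-out log-likelihood, sum_i log f_(-i)(x_i), where f_(-i) is the density estimated without observation i. np applies this criterion to a conditional density with adaptive nearest-neighbour bandwidths, which is more involved. The sketch below shows only the simplified univariate, fixed-bandwidth version with a Gaussian kernel; loo_loglik() is a hypothetical helper and the standard normal draws are toy data.

```r
# Leave-one-out log-likelihood of a Gaussian kernel density estimate
loo_loglik <- function(h, x) {
  n <- length(x)
  k <- dnorm(outer(x, x, "-") / h) / h   # n x n matrix of kernel values
  diag(k) <- 0                           # leave observation i out of its own estimate
  sum(log(rowSums(k) / (n - 1)))
}

# Toy data (hypothetical)
set.seed(1)
z <- rnorm(500)

# Maximise the criterion over a bracketing interval of bandwidths
opt <- optimize(loo_loglik, interval = c(0.05, 2), x = z, maximum = TRUE)
h_mlcv <- opt$maximum
round(c(h_mlcv = h_mlcv, h_nrd0 = bw.nrd0(z)), 3)
```

Leaving each point out is what prevents the degenerate solution h -> 0, where every observation would otherwise receive an arbitrarily high density from its own kernel.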
Compute the conditional density object
Conditional Density Data: 1000 training points, in 2 variable(s)
(1 dependent variable(s), and 1 explanatory variable(s))
dat$rel_y
Dep. Var. Bandwidth(s): 2
dat$rel_x
Exp. Var. Bandwidth(s): 2
Bandwidth Type: Adaptive Nearest Neighbour
Log Likelihood: 5948
Continuous Kernel Type: Second-Order Gaussian
No. Continuous Explanatory Vars.: 1
No. Continuous Dependent Vars.: 1

References
Magrini, S. (2007). Analysing convergence through the distribution dynamics approach: Why and how? University Ca' Foscari of Venice, Dept. of Economics Research Paper Series No. 13.
Mendez, C. (2020). Classical sigma and beta convergence analysis in R: Using the REAT 2.1 Package. R Studio/RPubs. Available at https://rpubs.com/quarcs-lab/classical-convergence-reat21
Mendez, C. (2020). Univariate distribution dynamics in R: Using the ggridges package. R Studio/RPubs. Available at https://rpubs.com/quarcs-lab/univariate-distribution-dynamics
Mendez, C. (2020). Regional efficiency convergence and efficiency clusters. Asia-Pacific Journal of Regional Science, 1-21.
Mendez, C. (2019). Lack of Global Convergence and the Formation of Multiple Welfare Clubs across Countries: An Unsupervised Machine Learning Approach. Economies, 7(3), 74.
Mendez, C. (2019). Overall efficiency, pure technical efficiency, and scale efficiency across provinces in Indonesia 1990 and 2010. R Studio/RPubs. Available at https://rpubs.com/quarcs-lab/efficiency-clusters-indonesia-1990-2010
Mendez-Guerra, C. (2018). On the distribution dynamics of human development: Evidence from the metropolitan regions of Bolivia. Economics Bulletin, 38(4), 2467-2475.
END
