## [1] "Random Forest"
Partial Dependence Plots (PDPs) are a visualization tool used in machine learning to understand the relationship between a specific feature (variable) and the predicted outcome of a model. A PDP shows the marginal effect of one or two input features on the model's predictions: how the predicted outcome changes as the chosen feature varies, with the predictions averaged over the observed values of the other features (ref).
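A minimal sketch of how a one-feature partial dependence curve can be computed by hand, written in Python with only the standard library. The `partial_dependence` helper, the toy model `toy_predict`, and the column names `x1` and `x2` are hypothetical and are not taken from the analysis in this report.

```python
from statistics import mean

def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite the chosen feature in every row; other features stay as observed.
        modified = [{**row, feature: value} for row in rows]
        curve.append(mean(predict(r) for r in modified))
    return curve

# Hypothetical toy model and data, for illustration only.
def toy_predict(row):
    return 2.0 * row["x1"] + row["x2"]

rows = [{"x1": 1.0, "x2": 0.0}, {"x1": 2.0, "x2": 4.0}]
pd_curve = partial_dependence(toy_predict, rows, "x1", [0.0, 1.0, 2.0])
```

Plotting `pd_curve` against the grid values gives the PDP for `x1`. Any fitted model with a row-wise prediction function could take the place of `toy_predict`.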
Step 1. The independent variable, or set of independent variables, for which the scenario is to be created is selected.
Step 2. For each variable, the parametric distribution that best fits the sample data of the independent variables (predictors) is identified using the Chi-squared goodness-of-fit test, with the method of moments used for parameter estimation.
## [1] "Best distribution: normal"
## [1] "Best p-value: 1"
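The fitting in Step 2 can be sketched as follows, in Python with only the standard library. This is an illustration under assumptions, not the code that produced the output above: the function names, the five equal-probability bins, and the simulated sample are all hypothetical, and only the normal candidate is scored.

```python
import random
from statistics import NormalDist, fmean, pvariance

def fit_normal_mom(sample):
    """Method of moments for a normal: match the sample mean and variance."""
    return fmean(sample), pvariance(sample) ** 0.5

def fit_gamma_mom(sample):
    """Method of moments for a gamma: mean = k*theta, variance = k*theta**2."""
    m, v = fmean(sample), pvariance(sample)
    return m * m / v, v / m  # (shape k, scale theta)

def chi2_stat_normal(sample, mu, sigma, bins=5):
    """Pearson chi-squared statistic using equal-probability bins."""
    dist = NormalDist(mu, sigma)
    edges = [dist.inv_cdf(i / bins) for i in range(1, bins)]
    observed = [0] * bins
    for x in sample:
        # The bin index is the number of interior edges lying below x.
        observed[sum(x > e for e in edges)] += 1
    expected = len(sample) / bins
    return sum((o - expected) ** 2 / expected for o in observed)

random.seed(0)
sample = [random.gauss(7.8, 0.2) for _ in range(200)]  # simulated stand-in data
mu, sigma = fit_normal_mom(sample)
k, theta = fit_gamma_mom(sample)
stat = chi2_stat_normal(sample, mu, sigma)
```

Turning `stat` into a p-value requires the chi-squared distribution with `bins - 1 - 2` degrees of freedom (two parameters were estimated), which the standard library does not provide. A statistics package would be needed for that step, and for scoring the gamma candidate in the same way.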
Step 4. After the best distribution(s) of the predictor(s) is identified, random sampling is implemented for each state to obtain the base case values (BV).
## [1] 7.685040 7.752925 8.120584 7.814723 7.826803 8.152719 7.894961 7.540231
## [9] 7.659067 7.708637 8.051810 7.874182 7.882600 7.822980 7.685993 8.167486
## [17] 7.902552 7.396044 7.944377 7.703062
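Drawing the base case values can be sketched in a few lines of Python. The parameters `mu` and `sigma` below are hypothetical placeholders for the fitted values, and the draw count of 20 simply matches the number of values printed above.

```python
import random

rng = random.Random(42)   # fixed seed so the draw is reproducible
mu, sigma = 7.8, 0.2      # hypothetical fitted normal parameters
n_draws = 20              # matches the number of values printed above

# One base case value (BV) per draw from the fitted distribution.
base_values = [rng.gauss(mu, sigma) for _ in range(n_draws)]
```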
Step 5. According to the hypothesized scenario, the mean of the historical parametric distribution of the variable of interest is perturbed.
Normal Distribution:
For a normal distribution, the mean (μ) is the central location, and the standard deviation (σ) measures the spread of the distribution. A shift of +1 standard deviation moves the mean from μ to μ + σ while leaving σ unchanged, so the sampled values are centered one standard deviation above the historical mean.
Gamma Distribution:
The shape parameter (k) and the scale parameter (θ) characterize the gamma distribution. The mean is equal to kθ, and the standard deviation is equal to √(kθ²) = θ√k.
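The mean shift for both distributions can be sketched as below. For the gamma case the text does not say which quantity is held fixed when the mean moves; this sketch assumes the variance is kept constant and re-solves for k and θ, which is one possible choice among several. The function names are hypothetical.

```python
def shift_normal(mu, sigma, n_sd=1.0):
    """Move the mean by n_sd standard deviations; the spread is unchanged."""
    return mu + n_sd * sigma, sigma

def shift_gamma(k, theta, n_sd=1.0):
    """Move the gamma mean by n_sd standard deviations.

    Assumption: the variance is held fixed, so k and theta are both re-solved.
    """
    mean = k * theta
    var = k * theta ** 2
    new_mean = mean + n_sd * var ** 0.5
    # Solve mean = k*theta and variance = k*theta**2 for the new mean.
    return new_mean ** 2 / var, var / new_mean

new_mu, new_sigma = shift_normal(7.8, 0.2)   # mean 7.8 -> 8.0
new_k, new_theta = shift_gamma(4.0, 2.0)     # mean 8 -> 12, variance stays 16
```

A negative `n_sd` large enough to push the gamma mean to zero or below has no valid solution, so a guard would be needed in that case.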
Step 6. Using random sampling, new values are obtained from the new distribution with the shifted mean, which corresponds to the hypothesized scenario.
The original values of the variable are then replaced by the new values corresponding to the scenario, while all other variables are kept the same as in the original data.
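A sketch of the substitution in Step 6, assuming a normal scenario distribution. The `apply_scenario` helper and the column names `state`, `x`, and `z` are hypothetical.

```python
import random

def apply_scenario(rows, feature, mu, sigma, seed=0):
    """Return a copy of rows with `feature` redrawn from Normal(mu, sigma)."""
    rng = random.Random(seed)
    # Only the chosen feature is replaced; every other column is copied as-is.
    return [{**row, feature: rng.gauss(mu, sigma)} for row in rows]

# Hypothetical data set with one row per state.
original = [
    {"state": "A", "x": 7.7, "z": 1.0},
    {"state": "B", "x": 7.9, "z": 2.0},
]
scenario = apply_scenario(original, "x", mu=8.0, sigma=0.2)
```

The original rows are left untouched, so the two data sets can be passed to the model side by side.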
Step 7. Using the selected statistical learning model, the percentage of the population reporting poor mental health is predicted for the new data set.
Step 8. Identify whether any significant nation-level and/or state-level increase or decrease in the response (compared to the original response variable) is observed.
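Steps 7 and 8 can be sketched as a comparison of predictions on the original and scenario data. The `compare_predictions` helper, the toy model, and the data are hypothetical. The nation-level change is taken here as the unweighted mean of the state-level changes, which is an assumption, and no formal significance test is included.

```python
from statistics import fmean

def compare_predictions(predict, original, scenario, key="state"):
    """State-level and nation-level change in the predicted response."""
    state_change = {
        o[key]: predict(s) - predict(o) for o, s in zip(original, scenario)
    }
    # Assumption: every state carries equal weight in the national figure.
    nation_change = fmean(state_change.values())
    return state_change, nation_change

# Hypothetical stand-in for the fitted model.
def toy_predict(row):
    return 10.0 - row["x"]

original = [{"state": "A", "x": 7.5}, {"state": "B", "x": 8.0}]
scenario = [{"state": "A", "x": 8.0}, {"state": "B", "x": 8.5}]
state_change, nation_change = compare_predictions(toy_predict, original, scenario)
```

A population-weighted average, or a paired test on the state-level differences, may be more appropriate depending on how significance is defined for the study.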