Araneta et al.

library(readxl)
library(sjPlot)

## Install package "strengejacke" from GitHub (`devtools::install_github("strengejacke/strengejacke")`) to load all sj-packages at once!

library(ggplot2)

dataver2 <- read_excel("C:/Users/Dell/Documents/analysis tasks/araneta/dataver2.xlsx")

Audit <- as.factor(dataver2$`Audit Finding Type`)
prevalence <- as.factor(dataver2$`Prevalence - Freq. of Audit Finding`)

# 1
lm1 <- lm(dataver2$`FHI - Debt-to-Equity Ratio` ~ Audit + prevalence, data = dataver2)
summary(lm1)

## 
## Call:
## lm(formula = dataver2$`FHI - Debt-to-Equity Ratio` ~ Audit + 
##     prevalence, data = dataver2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.22071 -0.18543 -0.08197  0.04399  2.96116 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.515003   0.149691   3.440 0.000719 ***
## Audit2       -0.041712   0.163431  -0.255 0.798833    
## Audit3       -0.121894   0.162109  -0.752 0.453061    
## Audit4        0.024251   0.136351   0.178 0.859031    
## Audit5       -0.138682   0.172038  -0.806 0.421225    
## Audit6       -0.022530   0.161809  -0.139 0.889415    
## prevalence2  -0.148374   0.089163  -1.664 0.097809 .  
## prevalence3  -0.131097   0.122243  -1.072 0.284941    
## prevalence4  -0.008254   0.167827  -0.049 0.960828    
## prevalence5  -0.145229   0.173302  -0.838 0.403118    
## prevalence6  -0.212771   0.170175  -1.250 0.212785    
## prevalence7  -0.131136   0.197128  -0.665 0.506739    
## prevalence8   0.018956   0.309304   0.061 0.951199    
## prevalence9  -0.348642   0.233087  -1.496 0.136439    
## prevalence10  0.723992   0.309304   2.341 0.020324 *  
## prevalence11 -0.285911   0.309304  -0.924 0.356511    
## prevalence12 -0.142066   0.278125  -0.511 0.610108    
## prevalence13 -0.070348   0.363731  -0.193 0.846855    
## prevalence15 -0.155884   0.492131  -0.317 0.751792    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4688 on 183 degrees of freedom
## Multiple R-squared:  0.08937,    Adjusted R-squared:  -0.0002041 
## F-statistic: 0.9977 on 18 and 183 DF,  p-value: 0.4644

Analysis of Debt-to-Equity Ratio as dependent variable, with Audit + prevalence as independent variables

The intercept is highly statistically significant (p-value < 0.001, indicated by ***). This means that the estimated Debt-to-Equity Ratio when Audit and prevalence are both at their reference levels (0.515) is significantly different from zero.

All Audit coefficients (Audit2 to Audit6) have very high p-values (e.g., Audit2 p-value = 0.798833, Audit6 p-value = 0.889415). None of the Audit levels are statistically significant predictors of Debt-to-Equity Ratio compared to their reference level. Their estimated effects are small and not distinguishable from zero.

Most prevalence coefficients are not statistically significant. Prevalence2: Estimate = -0.148374, Pr(>|t|) = 0.097809. This coefficient is borderline: it is significant at the 0.10 level but not at the conventional 0.05 level. It suggests a potential negative association, with Debt-to-Equity Ratio about 0.148 units lower in prevalence2 than in the reference prevalence category.

Prevalence10: Estimate = 0.723992, Pr(>|t|) = 0.020324. This coefficient is statistically significant at the 0.05 level (indicated by *). Being in the prevalence10 category is associated with an increase of approximately 0.724 units in Debt-to-Equity Ratio compared to the reference prevalence category, holding other variables constant.

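All other prevalence levels (prevalence3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15) are not statistically significant at conventional levels. The overall F-test (F = 0.9977 on 18 and 183 DF, p-value = 0.4644) is also not significant, so the model as a whole does not explain Debt-to-Equity Ratio better than an intercept-only model.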
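Every coefficient above is a contrast against a reference level, so it helps to see how R picks that level. The following is a minimal sketch on made-up data (the data frame `toy` and its columns `grp` and `y` are hypothetical, not the study data). It uses a single factor, where the intercept is exactly the reference-group mean; with two factors, as in the models here, the intercept is the fitted value when both factors are at their reference levels.

```r
# Toy data (hypothetical, not the study data): one 3-level factor, one outcome.
set.seed(1)
toy <- data.frame(
  grp = factor(rep(c("1", "2", "3"), each = 10)),
  y   = rnorm(30, mean = rep(c(0.5, 0.4, 1.2), each = 10))
)

# The first factor level is the reference level under R's default coding.
levels(toy$grp)[1]

fit <- lm(y ~ grp, data = toy)

# With one factor, the intercept is the mean of the reference group and each
# other coefficient is that level's mean minus the reference-group mean.
grp_means <- tapply(toy$y, toy$grp, mean)
coef(fit)
grp_means

# Choosing another baseline re-expresses the same fit against that level.
toy$grp_ref3 <- relevel(toy$grp, ref = "3")
fit_ref3 <- lm(y ~ grp_ref3, data = toy)
coef(fit_ref3)
```

Changing the reference level changes which contrasts are tested, but not the fitted values or the overall F-test.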

# 2 
lm2 <- lm(dataver2$`FHI - Quick Ratio` ~ Audit + prevalence, data = dataver2)
summary(lm2)

## 
## Call:
## lm(formula = dataver2$`FHI - Quick Ratio` ~ Audit + prevalence, 
##     data = dataver2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.4792 -1.0440 -0.4197  0.0946 22.8309 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   1.66727    1.19357   1.397  0.16414   
## Audit2       -0.37287    1.30314  -0.286  0.77510   
## Audit3        0.43022    1.29259   0.333  0.73964   
## Audit4        0.27199    1.08721   0.250  0.80273   
## Audit5        0.56209    1.37177   0.410  0.68247   
## Audit6        0.38867    1.29020   0.301  0.76357   
## prevalence2   0.26021    0.71095   0.366  0.71479   
## prevalence3   0.74075    0.97472   0.760  0.44826   
## prevalence4  -0.68143    1.33818  -0.509  0.61121   
## prevalence5  -0.36645    1.38184  -0.265  0.79116   
## prevalence6  -0.43937    1.35691  -0.324  0.74646   
## prevalence7  -0.50283    1.57182  -0.320  0.74941   
## prevalence8  -0.66439    2.46627  -0.269  0.78793   
## prevalence9   0.15880    1.85854   0.085  0.93200   
## prevalence10  7.00349    2.46627   2.840  0.00503 **
## prevalence11  0.11795    2.46627   0.048  0.96191   
## prevalence12 -0.07104    2.21766  -0.032  0.97448   
## prevalence13 -1.00400    2.90025  -0.346  0.72961   
## prevalence15 -0.47768    3.92406  -0.122  0.90325   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.738 on 183 degrees of freedom
## Multiple R-squared:  0.06891,    Adjusted R-squared:  -0.02268 
## F-statistic: 0.7524 on 18 and 183 DF,  p-value: 0.753

Analysis of Quick Ratio as dependent variable, with Audit + prevalence as independent variables

All Audit coefficients (Audit2 to Audit6) have very high p-values (e.g., Audit2 p-value = 0.77510, Audit6 p-value = 0.76357). This indicates that none of the Audit levels, compared to their reference level (the first level, which the dummy coding omits from the output), are statistically significant predictors of Quick Ratio in this model. Their estimated effects are not distinguishable from zero.

Most prevalence coefficients also have high p-values, suggesting they are not statistically significant. However, prevalence10 stands out: Estimate = 7.00349, Std. Error = 2.46627, t value = 2.840, Pr(>|t|) = 0.00503. This coefficient is statistically significant at the 0.01 level. Holding other variables constant, being in the prevalence10 category is associated with an increase of approximately 7.003 units in Quick Ratio compared to the reference prevalence category. This is the most notable finding in this output.

Significant Predictor: The only statistically significant predictor in this output is prevalence10, which shows a positive association with FHI - Quick Ratio. The Audit coefficients and all other prevalence coefficients are not statistically significant. The overall F-test (F = 0.7524 on 18 and 183 DF, p-value = 0.753) is not significant either, so this single coefficient should be read with caution.
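The t-tests in the coefficient table each compare one level with the reference level. Whether a factor matters as a whole is a separate question, which can be checked by comparing nested models with an F-test. A minimal sketch on simulated data follows (the data frame `toy` and its columns are hypothetical and carry no real effect by construction):

```r
# Simulated data (hypothetical): two factors and an unrelated outcome.
set.seed(2)
n <- 90
toy <- data.frame(
  audit      = factor(sample(1:3, n, replace = TRUE)),
  prevalence = factor(sample(1:4, n, replace = TRUE)),
  y          = rnorm(n)
)

# Full model with both factors, and a reduced model without audit.
full    <- lm(y ~ audit + prevalence, data = toy)
reduced <- lm(y ~ prevalence, data = toy)

# F-test for all audit dummy coefficients being zero at once.
cmp <- anova(reduced, full)
cmp

# p-value of the joint test for audit.
p_audit <- cmp[["Pr(>F)"]][2]
p_audit
```

The same comparison with the prevalence factor dropped instead would give a joint test for prevalence.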

# 3
lm3 <- lm(dataver2$`FHI - Current Ratio` ~ Audit + prevalence, data = dataver2)
summary(lm3)

## 
## Call:
## lm(formula = dataver2$`FHI - Current Ratio` ~ Audit + prevalence, 
##     data = dataver2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7229 -0.8009 -0.2690  0.5650  3.3526 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.65179    0.35103   4.706 4.98e-06 ***
## Audit2        0.27654    0.38325   0.722   0.4715    
## Audit3        0.42769    0.38015   1.125   0.2620    
## Audit4        0.19697    0.31975   0.616   0.5386    
## Audit5        0.30117    0.40343   0.747   0.4563    
## Audit6        0.42957    0.37945   1.132   0.2591    
## prevalence2   0.33092    0.20909   1.583   0.1152    
## prevalence3   0.30144    0.28666   1.052   0.2944    
## prevalence4  -0.10945    0.39356  -0.278   0.7812    
## prevalence5   0.41688    0.40640   1.026   0.3063    
## prevalence6  -0.05857    0.39907  -0.147   0.8835    
## prevalence7  -0.09407    0.46227  -0.203   0.8390    
## prevalence8   0.63456    0.72533   0.875   0.3828    
## prevalence9   1.13141    0.54660   2.070   0.0399 *  
## prevalence10  1.08230    0.72533   1.492   0.1374    
## prevalence11  0.56696    0.72533   0.782   0.4354    
## prevalence12  0.92501    0.65221   1.418   0.1578    
## prevalence13 -0.55823    0.85296  -0.654   0.5136    
## prevalence15  0.74252    1.15406   0.643   0.5208    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.099 on 183 degrees of freedom
## Multiple R-squared:  0.07602,    Adjusted R-squared:  -0.01486 
## F-statistic: 0.8365 on 18 and 183 DF,  p-value: 0.6556

Analysis of FHI - Current Ratio as dependent variable, with Audit + prevalence as independent variables

The intercept is highly statistically significant (p < 0.001). This means that the expected Current Ratio when Audit and prevalence are both at their reference levels (1.652) is significantly different from zero.

All Audit coefficients (Audit2 to Audit6) continue to show high p-values (e.g., Audit2 p-value = 0.4715, Audit6 p-value = 0.2591). None of the Audit levels are statistically significant predictors of Current Ratio compared to their reference level in this model. Their estimated effects are small and not distinguishable from zero.

Most prevalence coefficients are not statistically significant. However, prevalence9: Estimate = 1.13141, Std. Error = 0.54660, t value = 2.070, Pr(>|t|) = 0.0399. This coefficient is statistically significant at the 0.05 level (indicated by *). Being in the prevalence9 category is associated with an increase of approximately 1.131 units in Current Ratio compared to the reference prevalence category, holding other variables constant. This is a new significant predictor not observed in the previous models.

Prevalence10 (Estimate = 1.08230, Pr(>|t|) = 0.1374) is not statistically significant in this model, unlike the previous two analyses, where it was significant. This is a key change. All other prevalence levels are not statistically significant at conventional levels.

The overall F-test p-value of 0.6556 is very high (much greater than 0.05). This means that the overall regression model is not statistically significant. We cannot conclude that the independent variables, as a group, have a significant effect on Current Ratio.
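The overall p-value comes from the F-statistic line of the output. As a check, it can be recomputed from the reported values (F = 0.8365 on 18 and 183 DF) with the upper tail of the F distribution. Because the F-statistic is printed rounded, the result should agree with the reported 0.6556 only up to rounding.

```r
# Values copied from the summary(lm3) output above.
f_stat <- 0.8365
df_num <- 18    # number of dummy coefficients (5 Audit + 13 prevalence)
df_den <- 183   # residual degrees of freedom

# Upper-tail probability of the F distribution = overall model p-value.
p_overall <- pf(f_stat, df_num, df_den, lower.tail = FALSE)
round(p_overall, 4)

# The sample size implied by the degrees of freedom: residual DF plus
# the 19 estimated coefficients (intercept included).
n_obs <- df_den + df_num + 1
n_obs
```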

# 4
lm4 <- lm(dataver2$`FHI (GF) - Expenditure-to-Revenue Ratio` ~ Audit + prevalence, data = dataver2)
summary(lm4)

## 
## Call:
## lm(formula = dataver2$`FHI (GF) - Expenditure-to-Revenue Ratio` ~ 
##     Audit + prevalence, data = dataver2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2104 -0.1873 -0.0789  0.0358  4.3525 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.085364   0.239775   4.527 1.08e-05 ***
## Audit2       -0.067240   0.261785  -0.257    0.798    
## Audit3       -0.005137   0.259667  -0.020    0.984    
## Audit4       -0.024773   0.218409  -0.113    0.910    
## Audit5       -0.208584   0.275572  -0.757    0.450    
## Audit6       -0.049003   0.259186  -0.189    0.850    
## prevalence2  -0.046126   0.142822  -0.323    0.747    
## prevalence3   0.052942   0.195810   0.270    0.787    
## prevalence4  -0.064441   0.268826  -0.240    0.811    
## prevalence5  -0.214132   0.277596  -0.771    0.441    
## prevalence6  -0.107465   0.272588  -0.394    0.694    
## prevalence7   0.219793   0.315761   0.696    0.487    
## prevalence8  -0.222857   0.495446  -0.450    0.653    
## prevalence9  -0.143976   0.373361  -0.386    0.700    
## prevalence10 -0.197205   0.495446  -0.398    0.691    
## prevalence11 -0.113256   0.495446  -0.229    0.819    
## prevalence12 -0.221976   0.445503  -0.498    0.619    
## prevalence13 -0.174069   0.582626  -0.299    0.765    
## prevalence15 -0.275885   0.788299  -0.350    0.727    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7509 on 183 degrees of freedom
## Multiple R-squared:  0.02119,    Adjusted R-squared:  -0.07508 
## F-statistic: 0.2201 on 18 and 183 DF,  p-value: 0.9997

Analysis of FHI (GF) - Expenditure-to-Revenue Ratio as dependent variable, with Audit + prevalence as independent variables

The intercept is highly statistically significant (p < 0.001). This means that the estimated Expenditure-to-Revenue Ratio when Audit and prevalence are both at their reference levels (1.085) is significantly different from zero.

All Audit coefficients (Audit2 to Audit6) have very high p-values (e.g., Audit2 p-value = 0.798, Audit6 p-value = 0.850). Consistent with previous models, none of the Audit levels are statistically significant predictors of Expenditure-to-Revenue Ratio compared to their reference level in this model. Their estimated effects are small and not distinguishable from zero.

All prevalence coefficients in this model have very high p-values (e.g., prevalence2 p-value = 0.747, prevalence15 p-value = 0.727). Crucially, none of the prevalence categories are statistically significant predictors of Expenditure-to-Revenue Ratio in this model. This is a notable change from previous models where prevalence10 and prevalence9 showed significance.

The overall F-test p-value of 0.9997 is extremely high (very close to 1). This means that the overall regression model is not statistically significant. We cannot conclude that the independent variables, as a group, have a significant effect on Expenditure-to-Revenue Ratio.
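The negative adjusted R-squared in this output points the same way as the F-test. A short sketch of the standard adjustment, using the values reported above and the sample size implied by the degrees of freedom (n = 202, an inference from 183 residual DF plus 19 coefficients, not a number stated in the report):

```r
# Values copied from the summary(lm4) output above.
r2 <- 0.02119   # multiple R-squared
k  <- 18        # number of predictors (dummy coefficients)
n  <- 202       # assumed sample size: 183 residual DF + 19 coefficients

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
round(adj_r2, 5)
```

A negative value means the 18 dummy variables explain less variation than would be expected from that many unrelated predictors.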

# 5 
lm5 <- lm(dataver2$`FHI (GF) - Revenue Collection Efficiency` ~ Audit + prevalence, data = dataver2)
summary(lm5)

## 
## Call:
## lm(formula = dataver2$`FHI (GF) - Revenue Collection Efficiency` ~ 
##     Audit + prevalence, data = dataver2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.9724 -0.3240 -0.1128  0.0144  8.5639 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   1.21987    0.40641   3.002  0.00306 **
## Audit2        0.13012    0.44371   0.293  0.76967   
## Audit3       -0.21070    0.44012  -0.479  0.63270   
## Audit4       -0.17930    0.37019  -0.484  0.62873   
## Audit5        0.18355    0.46708   0.393  0.69480   
## Audit6        0.09738    0.43931   0.222  0.82481   
## prevalence2  -0.21054    0.24208  -0.870  0.38558   
## prevalence3   0.33479    0.33189   1.009  0.31443   
## prevalence4  -0.11310    0.45564  -0.248  0.80424   
## prevalence5  -0.17022    0.47051  -0.362  0.71793   
## prevalence6  -0.13226    0.46202  -0.286  0.77499   
## prevalence7   0.75283    0.53520   1.407  0.16123   
## prevalence8  -0.18004    0.83975  -0.214  0.83047   
## prevalence9  -0.16808    0.63282  -0.266  0.79084   
## prevalence10 -0.23812    0.83975  -0.284  0.77707   
## prevalence11 -0.25247    0.83975  -0.301  0.76403   
## prevalence12 -0.22029    0.75510  -0.292  0.77082   
## prevalence13 -0.20941    0.98752  -0.212  0.83230   
## prevalence15 -0.16678    1.33612  -0.125  0.90080   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.273 on 183 degrees of freedom
## Multiple R-squared:  0.04851,    Adjusted R-squared:  -0.04508 
## F-statistic: 0.5183 on 18 and 183 DF,  p-value: 0.9472

Analysis of FHI (GF) - Revenue Collection Efficiency as dependent variable, with Audit + prevalence as independent variables

The intercept is statistically significant at the 0.01 level (indicated by **). This means that the estimated Revenue Collection Efficiency when Audit and prevalence are both at their reference levels (1.220) is significantly different from zero.

All Audit coefficients (Audit2 to Audit6) have very high p-values (e.g., Audit2 p-value = 0.76967, Audit6 p-value = 0.82481).

Consistent with all previous models, none of the Audit levels are statistically significant predictors of Revenue Collection Efficiency compared to their reference level in this model. Their estimated effects are small and not distinguishable from zero.

No prevalence coefficient in this model has a p-value below 0.10 (they range from 0.16123 for prevalence7 to 0.90080 for prevalence15).

Similar to the previous model, none of the prevalence categories are statistically significant predictors of Revenue Collection Efficiency in this model. This further emphasizes that the effects observed in earlier models (like prevalence10 and prevalence9) were not consistently present across all model specifications.

The overall F-test p-value of 0.9472 is extremely high (much greater than 0.05). This means that the overall regression model is not statistically significant. We cannot conclude that the independent variables, as a group, have a significant effect on Revenue Collection Efficiency.
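Each model tests 18 coefficients, so an occasional p-value below 0.05 is expected even when no effect exists. One way to account for this, shown here as a sketch and not as part of the original analysis, is to adjust the p-values within a model. The example applies the Holm method to the 18 coefficient p-values copied from the Debt-to-Equity Ratio model (# 1).

```r
# Coefficient p-values copied from the summary(lm1) output (intercept excluded).
p_model1 <- c(
  Audit2 = 0.798833, Audit3 = 0.453061, Audit4 = 0.859031,
  Audit5 = 0.421225, Audit6 = 0.889415,
  prevalence2 = 0.097809, prevalence3 = 0.284941, prevalence4 = 0.960828,
  prevalence5 = 0.403118, prevalence6 = 0.212785, prevalence7 = 0.506739,
  prevalence8 = 0.951199, prevalence9 = 0.136439, prevalence10 = 0.020324,
  prevalence11 = 0.356511, prevalence12 = 0.610108, prevalence13 = 0.846855,
  prevalence15 = 0.751792
)

# Holm adjustment controls the family-wise error rate across the 18 tests.
p_holm <- p.adjust(p_model1, method = "holm")

# Smallest adjusted p-values first.
head(sort(p_holm), 3)
```

After adjustment, the smallest p-value (prevalence10) is multiplied by 18 and is no longer below 0.05, which fits the observation that such effects do not recur across models.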

# 6
lm6 <- lm(dataver2$`FHI (GF) - Expenditure Budget Utilization Ratio` ~ Audit + prevalence, data = dataver2)
summary(lm6)

## 
## Call:
## lm(formula = dataver2$`FHI (GF) - Expenditure Budget Utilization Ratio` ~ 
##     Audit + prevalence, data = dataver2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.24394 -0.06715  0.03128  0.07905  0.17294 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.727255   0.032712  22.232   <2e-16 ***
## Audit2        0.038566   0.035714   1.080   0.2816    
## Audit3        0.045304   0.035425   1.279   0.2026    
## Audit4        0.037749   0.029797   1.267   0.2068    
## Audit5        0.035981   0.037595   0.957   0.3398    
## Audit6        0.037709   0.035360   1.066   0.2876    
## prevalence2  -0.004095   0.019485  -0.210   0.8338    
## prevalence3  -0.005355   0.026714  -0.200   0.8413    
## prevalence4  -0.004722   0.036675  -0.129   0.8977    
## prevalence5  -0.016424   0.037872  -0.434   0.6650    
## prevalence6   0.013821   0.037188   0.372   0.7106    
## prevalence7   0.091847   0.043078   2.132   0.0343 *  
## prevalence8   0.094083   0.067592   1.392   0.1656    
## prevalence9   0.068215   0.050936   1.339   0.1822    
## prevalence10  0.009918   0.067592   0.147   0.8835    
## prevalence11  0.084532   0.067592   1.251   0.2127    
## prevalence12  0.097130   0.060778   1.598   0.1117    
## prevalence13  0.049003   0.079486   0.617   0.5383    
## prevalence15  0.035330   0.107545   0.329   0.7429    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1024 on 183 degrees of freedom
## Multiple R-squared:  0.05786,    Adjusted R-squared:  -0.03481 
## F-statistic: 0.6244 on 18 and 183 DF,  p-value: 0.8777

Analysis of FHI (GF) - Expenditure Budget Utilization Ratio as dependent variable, with Audit + prevalence as independent variables

The intercept is highly statistically significant (p < 0.001). This means that the estimated Expenditure Budget Utilization Ratio when Audit and prevalence are both at their reference levels (0.727) is significantly different from zero. The very high t-value (22.232) reflects how precisely this baseline is estimated.

All Audit coefficients (Audit2 to Audit6) continue to show high p-values (e.g., Audit2 p-value = 0.2816, Audit6 p-value = 0.2876). Consistent with all previous models, none of the Audit levels are statistically significant predictors of Expenditure Budget Utilization Ratio compared to their reference level in this model. Their estimated effects are small and not distinguishable from zero.

Most prevalence coefficients have high p-values. Prevalence7: Estimate = 0.091847, Std. Error = 0.043078, t value = 2.132, Pr(>|t|) = 0.0343. This coefficient is statistically significant at the 0.05 level (indicated by *). Being in the prevalence7 category is associated with an increase of approximately 0.092 units in Expenditure Budget Utilization Ratio compared to the reference prevalence category, holding other variables constant. This is a new significant predictor that has not appeared in previous models.

All other prevalence levels are not statistically significant at conventional levels. Notably, prevalence9 and prevalence10, which showed significance in some previous models, are not significant here.

The p-value of 0.8777 is very high (much greater than 0.05). This means that the overall regression model is NOT statistically significant. We cannot conclude that any of the independent variables, as a group, have a significant effect on Expenditure Budget Utilization Ratio.

# 7 
lm7 <- lm(dataver2$`FHI (SEF) - Expenditure-to-Revenue Ratio` ~ Audit + prevalence, data = dataver2)
summary(lm7)

## 
## Call:
## lm(formula = dataver2$`FHI (SEF) - Expenditure-to-Revenue Ratio` ~ 
##     Audit + prevalence, data = dataver2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.597  -9.749  -4.107  -0.281 199.643 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)    16.928     10.556   1.604    0.111
## Audit2         -6.413     11.525  -0.556    0.579
## Audit3         -4.279     11.432  -0.374    0.709
## Audit4         -4.769      9.616  -0.496    0.621
## Audit5        -13.777     12.132  -1.136    0.258
## Audit6         -6.034     11.411  -0.529    0.598
## prevalence2    -5.265      6.288  -0.837    0.404
## prevalence3   -10.196      8.621  -1.183    0.238
## prevalence4   -12.264     11.835  -1.036    0.301
## prevalence5     3.170     12.222   0.259    0.796
## prevalence6   -13.282     12.001  -1.107    0.270
## prevalence7   -15.417     13.902  -1.109    0.269
## prevalence8   -16.280     21.813  -0.746    0.456
## prevalence9   -15.381     16.438  -0.936    0.351
## prevalence10  -12.786     21.813  -0.586    0.558
## prevalence11  -16.072     21.813  -0.737    0.462
## prevalence12  -15.907     19.614  -0.811    0.418
## prevalence13  -16.348     25.651  -0.637    0.525
## prevalence15  -16.057     34.706  -0.463    0.644
## 
## Residual standard error: 33.06 on 183 degrees of freedom
## Multiple R-squared:  0.03152,    Adjusted R-squared:  -0.06374 
## F-statistic: 0.3309 on 18 and 183 DF,  p-value: 0.9957

Analysis of FHI (SEF) - Expenditure-to-Revenue Ratio as dependent variable, with Audit + prevalence as independent variables

The intercept is not statistically significant (p-value = 0.111). This means that the estimated Expenditure-to-Revenue Ratio is not significantly different from zero when Audit and prevalence are both at their reference levels. The low t-value (1.604) is consistent with this.

All Audit coefficients (Audit2 to Audit6) continue to show high p-values (e.g., Audit2 p-value = 0.579, Audit6 p-value = 0.598). Consistent with all previous models, none of the Audit levels are statistically significant predictors of Expenditure-to-Revenue Ratio compared to their reference level in this model. Their standard errors are large relative to the estimates, so the effects are not distinguishable from zero.

All prevalence coefficients have high p-values (the smallest is 0.238, for prevalence3). None of the prevalence levels is statistically significant at conventional levels.

The p-value of 0.9957 is very high (much greater than 0.05). This means that the overall regression model is NOT statistically significant. We cannot conclude that any of the independent variables, as a group, have a significant effect on Expenditure-to-Revenue Ratio.

# 8 
lm8 <- lm(dataver2$`FHI (SEF) - Revenue Collection Efficiency` ~ Audit + prevalence, data = dataver2)
summary(lm8)

## 
## Call:
## lm(formula = dataver2$`FHI (SEF) - Revenue Collection Efficiency` ~ 
##     Audit + prevalence, data = dataver2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.28828 -0.31797 -0.00363  0.33895  1.88212 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.10651    0.19701   5.617 7.14e-08 ***
## Audit2       -0.01171    0.21509  -0.054   0.9566    
## Audit3       -0.10776    0.21335  -0.505   0.6141    
## Audit4       -0.15989    0.17945  -0.891   0.3741    
## Audit5        0.01582    0.22642   0.070   0.9444    
## Audit6       -0.08668    0.21295  -0.407   0.6845    
## prevalence2   0.16594    0.11735   1.414   0.1590    
## prevalence3  -0.03110    0.16088  -0.193   0.8469    
## prevalence4   0.44617    0.22087   2.020   0.0448 *  
## prevalence5   0.16491    0.22808   0.723   0.4706    
## prevalence6   0.09073    0.22397   0.405   0.6859    
## prevalence7  -0.21703    0.25944  -0.837   0.4039    
## prevalence8  -0.04011    0.40707  -0.099   0.9216    
## prevalence9   0.27332    0.30676   0.891   0.3741    
## prevalence10 -0.71853    0.40707  -1.765   0.0792 .  
## prevalence11  0.26913    0.40707   0.661   0.5094    
## prevalence12 -0.09693    0.36604  -0.265   0.7915    
## prevalence13 -0.77005    0.47870  -1.609   0.1094    
## prevalence15 -0.10917    0.64769  -0.169   0.8663    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.617 on 183 degrees of freedom
## Multiple R-squared:  0.09048,    Adjusted R-squared:  0.001022 
## F-statistic: 1.011 on 18 and 183 DF,  p-value: 0.449

Analysis of FHI (SEF) - Revenue Collection Efficiency as dependent variable, with Audit + prevalence as independent variables

The intercept is highly statistically significant (p < 0.001; t = 5.617). This means that the estimated Revenue Collection Efficiency when Audit and prevalence are both at their reference levels (1.107) is significantly different from zero.

All Audit coefficients (Audit2 to Audit6) continue to show high p-values (e.g., Audit2 p-value = 0.9566 , Audit6 p-value = 0.6845). Consistent with all previous models, none of the Audit levels are statistically significant predictors of Revenue Collection Efficiency compared to their reference level in this model. Their estimated effects are small and not distinguishable from zero.

Most prevalence coefficients have high p-values. However, Prevalence4: Estimate = 0.44617, Std. Error = 0.22087, t value = 2.020, Pr(>|t|) = 0.0448. The coefficient Prevalence4 is statistically significant at the 0.05 level (indicated by *). Being in the prevalence4 category is associated with an increase of approximately 0.446 units in Revenue Collection Efficiency compared to the reference prevalence category, holding other variables constant. This is a new significant predictor that has not appeared in previous models. Prevalence10 (Estimate = -0.71853, Pr(>|t|) = 0.0792) is borderline, significant only at the 0.10 level. All other prevalence levels are not statistically significant at conventional levels.

The overall F-test p-value of 0.449 is well above 0.05. This means that the overall regression model is not statistically significant. We cannot conclude that the independent variables, as a group, have a significant effect on Revenue Collection Efficiency.

# 9 
lm9 <- lm(dataver2$`FHI (SEF) - Expenditure Budget Utilization Ratio` ~ Audit + prevalence, data = dataver2)
summary(lm9)

## 
## Call:
## lm(formula = dataver2$`FHI (SEF) - Expenditure Budget Utilization Ratio` ~ 
##     Audit + prevalence, data = dataver2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.33992 -0.09649  0.12374  0.23285  0.42900 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.487147   0.115289   4.225 3.76e-05 ***
## Audit2        0.151981   0.125872   1.207    0.229    
## Audit3        0.073421   0.124853   0.588    0.557    
## Audit4        0.027572   0.105016   0.263    0.793    
## Audit5        0.150912   0.132501   1.139    0.256    
## Audit6        0.111943   0.124622   0.898    0.370    
## prevalence2   0.085848   0.068672   1.250    0.213    
## prevalence3  -0.005429   0.094150  -0.058    0.954    
## prevalence4   0.210368   0.129257   1.628    0.105    
## prevalence5   0.117022   0.133474   0.877    0.382    
## prevalence6   0.200887   0.131066   1.533    0.127    
## prevalence7   0.031092   0.151825   0.205    0.838    
## prevalence8   0.075014   0.238221   0.315    0.753    
## prevalence9   0.209902   0.179520   1.169    0.244    
## prevalence10  0.104733   0.238221   0.440    0.661    
## prevalence11  0.390795   0.238221   1.640    0.103    
## prevalence12  0.251231   0.214207   1.173    0.242    
## prevalence13  0.219986   0.280139   0.785    0.433    
## prevalence15 -0.405932   0.379031  -1.071    0.286    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3611 on 183 degrees of freedom
## Multiple R-squared:  0.05846,    Adjusted R-squared:  -0.03415 
## F-statistic: 0.6312 on 18 and 183 DF,  p-value: 0.872

Analysis of FHI (SEF) - Expenditure Budget Utilization Ratio as dependent variable, with Audit + prevalence as independent variables

The intercept is highly statistically significant (p < 0.001; t = 4.225). When Audit and prevalence are both at their reference levels, the expected Expenditure Budget Utilization Ratio is 0.487147, which is significantly different from zero.

None of the Audit coefficients (Audit2 to Audit6) is statistically significant. The smallest p-values belong to Audit2 (p-value = 0.229) and Audit5 (p-value = 0.256), which are still far from conventional significance levels. This suggests that no audit category differs significantly from the reference category in its Expenditure Budget Utilization Ratio, after accounting for prevalence.

Similar to Audit, none of the prevalence coefficients is statistically significant; the smallest p-values are 0.103 (prevalence11) and 0.105 (prevalence4). This implies that no prevalence category differs significantly from the reference category in its Expenditure Budget Utilization Ratio in this model.

The overall F-test p-value of 0.872 is very high (much greater than 0.05). This means that the overall regression model is not statistically significant. We cannot conclude that the independent variables, as a group, have a significant effect on Expenditure Budget Utilization Ratio.
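The nine models share the same right-hand side and differ only in the outcome, so they could also be fitted in a loop. The following is a sketch on simulated data, not the code used for the results above: the data frame `toy`, the helper `fit_one`, and the choice to keep Audit and prevalence as columns of the data frame are all assumptions made for illustration.

```r
# Simulated stand-in for the study data (hypothetical values).
set.seed(3)
n <- 80
toy <- data.frame(
  `FHI - Quick Ratio`   = rexp(n),
  `FHI - Current Ratio` = rexp(n),
  Audit       = factor(sample(1:3, n, replace = TRUE)),
  prevalence  = factor(sample(1:3, n, replace = TRUE)),
  check.names = FALSE   # keep column names with spaces and hyphens
)
outcomes <- c("FHI - Quick Ratio", "FHI - Current Ratio")

# Build the formula from the column name; as.name() handles names
# that are not syntactically valid R identifiers.
fit_one <- function(outcome, data) {
  f <- reformulate(c("Audit", "prevalence"), response = as.name(outcome))
  lm(f, data = data)
}

# One fitted model per outcome, named by outcome.
models <- lapply(setNames(outcomes, outcomes), fit_one, data = toy)

# Overall F-test p-value for each model.
overall_p <- vapply(models, function(m) {
  fs <- summary(m)$fstatistic
  unname(pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE))
}, numeric(1))
overall_p
```

Collecting the overall p-values in one vector makes it easier to compare the models side by side.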

cor.test(dataver2$`Prevalence - Monetary Value`, dataver2$`FHI - Debt-to-Equity Ratio`, method = "pearson")

## 
##  Pearson's product-moment correlation
## 
## data:  dataver2$`Prevalence - Monetary Value` and dataver2$`FHI - Debt-to-Equity Ratio`
## t = -1.1511, df = 200, p-value = 0.2511
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.21675095  0.05756825
## sample estimates:
##        cor 
## -0.0811275

ggplot(dataver2, aes(x = `Prevalence - Monetary Value`, y = `FHI - Debt-to-Equity Ratio`)) +
  geom_point(color = "blue", size = 3) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(title = "Scatter Plot with Regression Line",
       x = "Monetary value",
       y = "Financial Health - Debt to Equity Ratio") +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

Analysis of the correlation between Monetary Value and Debt-to-Equity Ratio

Since the p-value (0.2511) is much greater than common significance levels (e.g., 0.05), we fail to reject the null hypothesis. This means there is no statistically significant linear relationship between ‘Prevalence - Monetary Value’ and ‘FHI - Debt-to-Equity Ratio’ at the 0.05 level. In other words, we cannot conclude from this sample that the correlation in the population differs from zero.

The sample correlation coefficient of -0.0811275 indicates a very weak negative linear relationship: as ‘Monetary Value’ increases, ‘Financial Health - Debt to Equity Ratio’ tends to decrease slightly, but the relationship is not statistically significant, and the 95 percent confidence interval (-0.217 to 0.058) includes zero.

The scatter plot visually confirms the weak negative trend. The red regression line slopes slightly downwards, but the points are widely scattered around the line, indicating a poor fit and a weak relationship.

In summary, both the numerical correlation analysis and the scatter plot suggest that there is no statistically significant linear correlation between ‘Prevalence - Monetary Value’ and ‘Financial Health - Debt-to-Equity Ratio’. The observed negative correlation is very weak and consistent with random chance.
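The t-statistic and p-value in the Pearson output follow directly from the correlation coefficient and the degrees of freedom. As a check, they can be recomputed from the reported values; small differences in the last digit are possible because the inputs are printed rounded.

```r
# Values copied from the cor.test() output for Debt-to-Equity Ratio.
r  <- -0.0811275
df <- 200            # n - 2

# Test statistic for H0: true correlation equals 0.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution.
p_val <- 2 * pt(-abs(t_stat), df)

round(c(t = t_stat, p = p_val), 4)
```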

cor.test(dataver2$`Prevalence - Monetary Value`, dataver2$`FHI - Quick Ratio`, method = "pearson")

## 
##  Pearson's product-moment correlation
## 
## data:  dataver2$`Prevalence - Monetary Value` and dataver2$`FHI - Quick Ratio`
## t = 0.52973, df = 200, p-value = 0.5969
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1011421  0.1745804
## sample estimates:
##        cor 
## 0.03743153

ggplot(dataver2, aes(x = `Prevalence - Monetary Value`, y = `FHI - Quick Ratio`)) +
  geom_point(color = "blue", size = 3) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(title = "Scatter Plot with Regression Line",
       x = "Monetary value",
       y = "Financial Health - Quick Ratio") +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

## Analysis on the Correlation between Monetary value and Quick Ratio

Since the p-value (0.5969) is much greater than common significance levels (e.g., 0.05), we fail to reject the null hypothesis. This means there is no statistically significant linear relationship between ‘Prevalence - Monetary Value’ and ‘FHI - Quick Ratio’ at the 0.05 level. We cannot conclude that a true correlation exists in the population based on this sample.

The correlation coefficient of 0.03743153 indicates an extremely slight positive trend. As ‘Monetary Value’ increases, ‘Financial Health - Quick Ratio’ tends to slightly increase, but this relationship is negligible and not statistically significant.

The scatter plot visually confirms the extremely weak positive trend. The red regression line slopes very slightly upwards, almost horizontally. The points are widely scattered around the line, particularly with some outliers at very high ‘Financial Health - Quick Ratio’ values for low ‘Monetary Value’, indicating a very poor fit and essentially no clear linear relationship.

In summary, both the numerical correlation analysis and the scatter plot clearly indicate that there is no statistically significant linear correlation between ‘Prevalence - Monetary Value’ and ‘Financial Health - Quick Ratio’. The observed positive correlation is extremely weak and not statistically meaningful.

cor.test(dataver2$`Prevalence - Monetary Value`, dataver2$`FHI - Current Ratio`, method = "pearson")

## 
##  Pearson's product-moment correlation
## 
## data:  dataver2$`Prevalence - Monetary Value` and dataver2$`FHI - Current Ratio`
## t = 3.4767, df = 200, p-value = 0.0006227
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1041096 0.3647592
## sample estimates:
##       cor 
## 0.2387295

ggplot(dataver2, aes(x = `Prevalence - Monetary Value`, y = `FHI - Current Ratio`)) +
  geom_point(color = "blue", size = 3) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(title = "Scatter Plot with Regression Line",
       x = "Monetary value",
       y = "Financial Health - Current Ratio") +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

## Analysis on the Correlation between Monetary value and Current Ratio

Since the p-value (0.0006227) is much smaller than common significance levels (e.g., 0.05 or even 0.01), we reject the null hypothesis. This means there is a statistically significant positive linear relationship between ‘Prevalence - Monetary Value’ and ‘FHI - Current Ratio’. The correlation coefficient of 0.2387295 suggests a weak to moderate positive relationship. As ‘Monetary Value’ increases, ‘Financial Health - Current Ratio’ tends to increase.

The scatter plot visually supports the positive linear relationship. The red regression line slopes upwards, indicating that as ‘Monetary Value’ increases, ‘Financial Health - Current Ratio’ generally increases. While there is still some scatter, the points generally follow the upward trend, especially as ‘Monetary Value’ increases. There are also some outliers, particularly at lower ‘Monetary Value’ with higher ‘Current Ratio’ values.

In summary, both the numerical correlation analysis and the scatter plot indicate a statistically significant, weak to moderate positive linear correlation between ‘Prevalence - Monetary Value’ and ‘Financial Health - Current Ratio’.
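To put the size of this correlation in context, the 95% confidence interval can be rebuilt from the coefficient with the Fisher z-transformation, and r squared gives the share of variance the linear trend accounts for. This is a minimal sketch using only the printed values; `n = 202` is inferred from df = 200, and the variable names are illustrative.

```r
# Values copied from the cor.test() output for Current Ratio
r <- 0.2387295
n <- 202                       # inferred from df = n - 2 = 200

# Fisher z-transformation: atanh(r) is approximately normal
# with standard error 1 / sqrt(n - 3)
z  <- atanh(r)
se <- 1 / sqrt(n - 3)

# Build the interval on the z scale, then transform back to the r scale
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

# Proportion of variance in one variable linearly associated with the other
r_squared <- r^2

round(ci, 4)
round(r_squared, 3)
```

The interval should agree with the one printed above (0.1041 to 0.3648). With r squared near 0.057, the relationship is statistically significant but the linear trend accounts for only about 6% of the variation in the Current Ratio.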

cor.test(dataver2$`Prevalence - Monetary Value`, dataver2$`FHI (GF) - Expenditure-to-Revenue Ratio`, method = "pearson")

## 
##  Pearson's product-moment correlation
## 
## data:  dataver2$`Prevalence - Monetary Value` and dataver2$`FHI (GF) - Expenditure-to-Revenue Ratio`
## t = -0.54539, df = 200, p-value = 0.5861
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1756524  0.1000473
## sample estimates:
##        cor 
## -0.0385359

ggplot(dataver2, aes(x = `Prevalence - Monetary Value`, y = `FHI (GF) - Expenditure-to-Revenue Ratio`)) +
  geom_point(color = "blue", size = 3) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(title = "Scatter Plot with Regression Line",
       x = "Monetary value",
       y = "Financial Health - Expenditure-to-Revenue Ratio") +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

## Analysis on the Correlation between Monetary value and Expenditure-to-Revenue Ratio (GF)

Since the p-value (0.5861) is considerably larger than common significance levels (e.g., 0.05 or 0.10), we fail to reject the null hypothesis. This means there is no statistically significant linear relationship between ‘Prevalence - Monetary Value’ and ‘FHI (GF) - Expenditure-to-Revenue Ratio’. We cannot conclude that a true correlation exists in the population based on this sample.

The sample correlation coefficient of -0.0385359 indicates an extremely weak negative linear trend. As ‘Monetary Value’ increases, ‘Financial Health - Expenditure-to-Revenue Ratio’ tends to slightly decrease, but this relationship is negligible.

The scatter plot visually confirms the extremely weak and practically non-existent linear trend. The red regression line is almost perfectly horizontal, showing minimal slope.

The data points are widely dispersed, and there’s a noticeable cluster of points at lower ‘Monetary Value’ with varying ‘Expenditure-to-Revenue Ratio’ values, including some potential outliers at higher ratios. The overall scatter suggests no clear linear pattern.

In summary, both the numerical correlation analysis and the scatter plot consistently indicate that there is no statistically significant linear correlation between ‘Prevalence - Monetary Value’ and ‘FHI (GF) - Expenditure-to-Revenue Ratio’. The observed correlation is extremely weak, and its 95% confidence interval (-0.176 to 0.100) includes zero, so it is consistent with sampling variability rather than a true underlying relationship.

cor.test(dataver2$`Prevalence - Monetary Value`, dataver2$`FHI (GF) - Revenue Collection Efficiency`, method = "pearson")

## 
##  Pearson's product-moment correlation
## 
## data:  dataver2$`Prevalence - Monetary Value` and dataver2$`FHI (GF) - Revenue Collection Efficiency`
## t = -0.37898, df = 200, p-value = 0.7051
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1642317  0.1116759
## sample estimates:
##         cor 
## -0.02678804

ggplot(dataver2, aes(x = `Prevalence - Monetary Value`, y = `FHI (GF) - Revenue Collection Efficiency`)) +
  geom_point(color = "blue", size = 3) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(title = "Scatter Plot with Regression Line",
       x = "Monetary value",
       y = "Financial Health - Revenue Collection Efficiency") +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

## Analysis on the Correlation between Monetary value and Revenue Collection Efficiency (GF)

Since the p-value (0.7051) is much larger than common significance levels (e.g., 0.05 or 0.10), we fail to reject the null hypothesis. This means there is no statistically significant linear relationship between ‘Prevalence - Monetary Value’ and ‘FHI (GF) - Revenue Collection Efficiency’. We cannot conclude that a true correlation exists in the population based on this sample.

The sample correlation coefficient of -0.02678804 indicates an extremely weak, negligible negative linear trend. As ‘Monetary Value’ increases, ‘Financial Health - Revenue Collection Efficiency’ shows an almost imperceptible tendency to decrease.

The scatter plot visually confirms the extremely weak and practically non-existent linear trend. The red regression line is nearly flat, indicating almost no slope.

The data points are highly concentrated at lower ‘Monetary Value’, with a few scattered points across the range, including one prominent outlier at very high ‘Financial Health - Revenue Collection Efficiency’. The overall distribution of points shows no clear linear pattern.

In summary, both the numerical correlation analysis and the scatter plot consistently indicate that there is no statistically significant linear correlation between ‘Prevalence - Monetary Value’ and ‘FHI (GF) - Revenue Collection Efficiency’. The observed correlation is extremely weak, and its 95% confidence interval (-0.164 to 0.112) includes zero, so it is consistent with sampling variability rather than a true underlying relationship.

cor.test(dataver2$`Prevalence - Monetary Value`, dataver2$`FHI (GF) - Expenditure Budget Utilization Ratio`, method = "pearson")

## 
##  Pearson's product-moment correlation
## 
## data:  dataver2$`Prevalence - Monetary Value` and dataver2$`FHI (GF) - Expenditure Budget Utilization Ratio`
## t = 0.052362, df = 200, p-value = 0.9583
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1344172  0.1416811
## sample estimates:
##         cor 
## 0.003702501

ggplot(dataver2, aes(x = `Prevalence - Monetary Value`, y = `FHI (GF) - Expenditure Budget Utilization Ratio`)) +
  geom_point(color = "blue", size = 3) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(title = "Scatter Plot with Regression Line",
       x = "Monetary value",
       y = "Financial Health - Expenditure Budget Utilization Ratio") +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

## Analysis on the Correlation between Monetary value and Expenditure Budget Utilization Ratio (GF)

Since the p-value (0.9583) is much greater than typical significance levels (e.g., 0.05 or 0.01), we fail to reject the null hypothesis. This means there is no statistically significant evidence to suggest a linear correlation between “Prevalence - Monetary Value” and “FHI (GF) - Expenditure Budget Utilization Ratio”.

The sample estimate for the correlation (cor) is 0.003702501. This value is very close to zero, indicating an extremely weak linear relationship between the two variables.

The scatter plot visually confirms the results of the Pearson correlation. The red regression line is almost perfectly horizontal. A horizontal regression line indicates that as the “Monetary value” changes, the predicted “Expenditure Budget Utilization Ratio” does not significantly change. This is consistent with a correlation coefficient close to zero.

Both the statistical output of Pearson’s product-moment correlation and the visual representation from the scatter plot strongly indicate that there is no statistically significant linear relationship between “Prevalence - Monetary Value” and “Financial Health - Expenditure Budget Utilization Ratio.” The correlation coefficient is extremely close to zero, the p-value is very high, and the regression line on the scatter plot is essentially flat.
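The link between a near-zero correlation and a flat regression line is exact for a simple linear fit: the least-squares slope equals r multiplied by sd(y) / sd(x), so the slope always has the same sign as r and shrinks toward zero with it. The sketch below illustrates this on synthetic data, not the study data; the variables `x` and `y` are made up for the demonstration.

```r
set.seed(1)                    # synthetic data, NOT the values in dataver2
x <- rexp(202, rate = 1)       # skewed predictor, loosely like a monetary amount
y <- rnorm(202, mean = 1)      # outcome generated independently of x

r     <- cor(x, y)                       # sample Pearson correlation
slope <- unname(coef(lm(y ~ x))[2])      # slope of the fitted regression line

# The slope is the correlation rescaled by the two standard deviations
all.equal(slope, r * sd(y) / sd(x))
```

Because the two quantities are tied together in this way, the line drawn by `geom_smooth(method = "lm")` carries the same directional information as the `cor` estimate, just expressed in the units of the plotted variables.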

cor.test(dataver2$`Prevalence - Monetary Value`, dataver2$`FHI (SEF) - Expenditure-to-Revenue Ratio`, method = "pearson")

## 
##  Pearson's product-moment correlation
## 
## data:  dataver2$`Prevalence - Monetary Value` and dataver2$`FHI (SEF) - Expenditure-to-Revenue Ratio`
## t = -0.3387, df = 200, p-value = 0.7352
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1614599  0.1144868
## sample estimates:
##         cor 
## -0.02394261

# Scatter plot of the same SEF column that was tested with cor.test() above
ggplot(dataver2, aes(x = `Prevalence - Monetary Value`, y = `FHI (SEF) - Expenditure-to-Revenue Ratio`)) +
  geom_point(color = "blue", size = 3) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(title = "Scatter Plot with Regression Line",
       x = "Monetary value",
       y = "Financial Health - Expenditure-to-Revenue Ratio (SEF)") +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

## Analysis on the Correlation between Monetary value and Expenditure-to-Revenue Ratio (SEF)

Since the p-value (0.7352) is much greater than typical significance levels (e.g., 0.05 or 0.01), we fail to reject the null hypothesis. This means there is no statistically significant evidence to suggest a linear correlation between “Prevalence - Monetary Value” and “FHI (SEF) - Expenditure-to-Revenue Ratio”.

The sample estimate for the correlation (cor) is −0.02394261. This value is very close to zero, indicating an extremely weak linear relationship between the two variables. The negative sign suggests a very slight, almost negligible, inverse relationship (as one variable increases, the other slightly decreases), but its magnitude is too small to be meaningful.

The scatter plot visually confirms the results of the Pearson correlation. The red regression line is almost perfectly horizontal and appears to have a very slight downward slope. A nearly horizontal regression line indicates that as the “Monetary value” changes, the predicted “Expenditure-to-Revenue Ratio” does not significantly change. This is consistent with a correlation coefficient very close to zero.

The data points are widely scattered, particularly at lower “Monetary value” ranges, and do not show a clear upward or downward trend. There are a few outliers with very high “Expenditure-to-Revenue Ratio” at low “Monetary value,” but even these do not establish a clear linear pattern across the entire dataset.

Both the statistical output of Pearson’s product-moment correlation and the visual representation from the scatter plot strongly indicate that there is no statistically significant linear relationship between “Prevalence - Monetary Value” and “Financial Health - Expenditure-to-Revenue Ratio.” The correlation coefficient is extremely close to zero, the p-value is very high, and the regression line on the scatter plot is essentially flat, suggesting no predictive power from “Monetary value” on the “Expenditure-to-Revenue Ratio.”

cor.test(dataver2$`Prevalence - Monetary Value`, dataver2$`FHI (SEF) - Revenue Collection Efficiency`, method = "pearson")

## 
##  Pearson's product-moment correlation
## 
## data:  dataver2$`Prevalence - Monetary Value` and dataver2$`FHI (SEF) - Revenue Collection Efficiency`
## t = 0.27158, df = 200, p-value = 0.7862
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1191665  0.1568357
## sample estimates:
##        cor 
## 0.01920037

# Scatter plot of the same SEF column that was tested with cor.test() above
ggplot(dataver2, aes(x = `Prevalence - Monetary Value`, y = `FHI (SEF) - Revenue Collection Efficiency`)) +
  geom_point(color = "blue", size = 3) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(title = "Scatter Plot with Regression Line",
       x = "Monetary value",
       y = "Financial Health - Revenue Collection Efficiency (SEF)") +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

## Analysis on the Correlation between Monetary value and Revenue Collection Efficiency (SEF)

Since the p-value (0.7862) is much greater than typical significance levels (e.g., 0.05 or 0.01), we fail to reject the null hypothesis. This means there is no statistically significant evidence to suggest a linear correlation between “Prevalence - Monetary Value” and “FHI (SEF) - Revenue Collection Efficiency”.

The sample estimate for the correlation (cor) is 0.01920037. This value is very close to zero, indicating an extremely weak positive linear relationship. The positive sign suggests that as “Monetary Value” increases, “Revenue Collection Efficiency” slightly increases, but the magnitude is negligible.

The scatter plot visually confirms the results of the Pearson correlation. The red regression line is almost perfectly horizontal, and it appears to have a very slight upward slope, consistent with the very small positive correlation coefficient. A nearly horizontal regression line indicates that changes in “Monetary value” do not significantly predict changes in “Revenue Collection Efficiency.”

Both the statistical output of Pearson’s product-moment correlation and the visual representation from the scatter plot strongly indicate that there is no statistically significant linear relationship between “Prevalence - Monetary Value” and “Financial Health - Revenue Collection Efficiency.” The correlation coefficient is negligible, the p-value is very high, and the regression line on the scatter plot is essentially flat, providing no evidence of a linear association between these two variables.

cor.test(dataver2$`Prevalence - Monetary Value`, dataver2$`FHI (SEF) - Expenditure Budget Utilization Ratio`, method = "pearson")

## 
##  Pearson's product-moment correlation
## 
## data:  dataver2$`Prevalence - Monetary Value` and dataver2$`FHI (SEF) - Expenditure Budget Utilization Ratio`
## t = 1.1576, df = 200, p-value = 0.2484
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.05711396  0.21718528
## sample estimates:
##        cor 
## 0.08158027

# Scatter plot of the same SEF column that was tested with cor.test() above
ggplot(dataver2, aes(x = `Prevalence - Monetary Value`, y = `FHI (SEF) - Expenditure Budget Utilization Ratio`)) +
  geom_point(color = "blue", size = 3) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(title = "Scatter Plot with Regression Line",
       x = "Monetary value",
       y = "Financial Health - Expenditure Budget Utilization Ratio (SEF)") +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

## Analysis on the Correlation between Monetary value and Expenditure Budget Utilization Ratio (SEF)

Since the p-value (0.2484) is greater than common significance levels (e.g., 0.05), we fail to reject the null hypothesis. This means there is no statistically significant evidence to conclude a linear correlation between “Prevalence - Monetary Value” and “FHI (SEF) - Expenditure Budget Utilization Ratio” at the 0.05 significance level.

The sample estimate for the correlation (cor) is 0.08158027. This value is close to zero, suggesting a very weak positive linear relationship.

The red regression line is nearly horizontal, indicating that changes in “Monetary value” have very little effect on “Expenditure Budget Utilization Ratio.” While there might be a minuscule upward slope (consistent with the small positive ‘cor’ value), it’s visually insignificant. The data points are heavily clustered at the lower end of the “Monetary value” scale, with a broad spread of “Expenditure Budget Utilization Ratio” values. There’s no clear linear trend or pattern discernible across the entire dataset.

Based on both the statistical analysis and the scatter plot, there is no statistically significant linear relationship between “Prevalence - Monetary Value” and “Financial Health - Expenditure Budget Utilization Ratio.” The correlation coefficient is very small, the p-value is high, and the visual representation shows a scattered distribution of points with a nearly flat regression line, all of which indicate a lack of a meaningful linear association between these two variables.
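Since the same test is repeated for each financial health indicator, the results can also be gathered into one table by looping over column names. The following is a minimal sketch on a synthetic stand-in for `dataver2`; the data frame `demo`, its columns (`monetary_value`, `ratio_a`, `ratio_b`), and the vector `outcomes` are hypothetical names used only for illustration.

```r
set.seed(42)                   # synthetic stand-in for dataver2; values are made up
demo <- data.frame(
  monetary_value = rexp(202),
  ratio_a        = rnorm(202),
  ratio_b        = rnorm(202)
)

outcomes <- c("ratio_a", "ratio_b")    # hypothetical outcome column names

# Run one Pearson test per outcome and keep the key numbers from each
results <- do.call(rbind, lapply(outcomes, function(y_col) {
  ct <- cor.test(demo$monetary_value, demo[[y_col]], method = "pearson")
  data.frame(
    outcome  = y_col,
    r        = unname(ct$estimate),
    ci_lower = ct$conf.int[1],
    ci_upper = ct$conf.int[2],
    t        = unname(ct$statistic),
    df       = unname(ct$parameter),
    p_value  = ct$p.value
  )
}))

results
```

With the real data, `outcomes` would hold the backticked `FHI ...` column names as plain strings. Driving both the test and the plot from the same string (for example `aes(y = .data[[y_col]])` in ggplot2) keeps each figure tied to the column that was actually tested.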