Historically, agriculture has played an important role in the Australian economy and society (Australian Government Productivity Commission, 2005). With the development in social environment, innovation and intelligent planning in the agricultural industry are vital to ensure that food supply demands are met across the country and exported overseas. Various researches indicate that some of the key factors shaping the industry could be summarized in consumer demands, government policies, technological advances and innovation and emerging environment concerns.
In assessment task 2, the initial hypothesis of our study was to provide insights into what are the most influential geographical factors and how does yield react to changes of these factors by using linear regression. The team attempted to apply different types of approaches to find out the best fitting model to predict crop yield based on multiple environmental variables. The final model revealed that the mean temperature between September and February in the next year had the most influence on the wine grapes yield and other variables such as latitude also have some degree of influence on yield. However, at that time the complexities of the topics were unknown, and the team only had the comparatively simple tool of linear regression to build up the model and most of features are manually selected. Even though some of our members used Boruta package to choose variables, the types of models waived resulted in a lower train set / test set RMSE ratio but higher test set RSME number. In addition, it was hard to evaluate whether variables selected were or were not related.
In order to expand the exploration in agriculture and enhance the insights concluded in assessment 2, two question were raised up:
1. Why the other variables should be removed?
2. How do the different variables interact?
For the first question, the current study attempts to test variables derived from our previous study by employing Structural Equation Modelling (SEM) method. Literally, the SEM is a statistical modelling technique which is widely used to analyses the structural relationship between measured variables and latent construct. The relationship between the theoretical constructs (i.e. latent factors) are represented by regression or path coefficients between the factors (J.J. Hox, 1998). In another word, SEM model enables us to test these structures, observing how well they reflect the observational data, and therefore how well our assumptions may fit to the actual interactions taking place (Joshua McCarthy, 2020).
For the second question, this project will apply Exploratory Factor Analysis (EFA) method to find out whether these variables have similar patterns of responses. The basic assumption of factor analysis is that for a collection of observed variables there are a set of underlying variables called factors that can explain the interrelationships among measured variables.
In order to expand the exploration in agriculture and enhance the insights concluded in assessment 2, two question were raised up:
The data is reusing the datasets collected in assignment 2 which contains:
However, regarding to the requirements for this project, the selected data sources were pre-processed by using R. The climate values were narrowed down to only one variable which aims to reduce the weights of climate variables’ impact on the new model. Grape varieties data is initially in forms of ratio within SA2 region. Currently, they were re-calculated by incorporating with land use data which results in cultivated area of various grapes across states. Further, the scale of varieties was also narrowed down to top five wine grapes cultivated in Australia.
After removing the outliers, the number of records in this dataset is 220 and the number of initial observed variables in the model is 12. According to Kline (2011), a typical sample size in studies where SEM is used is about 200 cases with an ideal sample size-to-parameter ratio of 20:1. From J.C.F de Winter, et al. (2009), they pointed that EFA is generally regarded as a technique for large sample sizes (N), with N = 50 as reasonable absolute minimum. Thus, the sample size of the dataset is adequate for both EFA and SEM.
Yi Fan et, al. (2016) noted that model specification defines the hypothesized relationship among the variables in an SEM based on one’s knowledge. Model identification is to check if the model is over-identified, just-identified or under-identified. Model coefficients can be only estimated in the just-identified or over-identified mode. In another word, SEM requires us to develop a literature and theory supported theoretical model which indicates the relationship between the latent variables and observed indicators. In final, the observed variables were selected by supports from multiple researches and life experiences.
Intuitively, it would seem likely that climatic changes, especially the temperature are important to grapevines and grape production, which is concluded from our previous model. It also proves that fruitfulness would be improved by high temperatures during planting season (i.e. summer). Further, the region with a higher mean temperature across the whole year is more likely to get solar radiation than the region with a lower value. Likewise, it could be less possible to get a higher level of mean rainfall. EDA identified that there is a decreasing trend between crop yield and rainfall. But lack of rainfall can be a severe influence on grape productivity in the absence of good quality irrigation water (Ted Goldammer, 2018). Soil conditions should be taken into consideration as well. A. Lynn Cochran reported that grapes could tolerate a fairly wide range of soil chemistry and conditions while some soil types are more ideal for grape-growing in terms of pH, organic content, water capacity and soil fertility.
Another relevant factor which may affect the yield is land use. Obviously, vineyards with a larger growing area for viticulture could get a higher level of grape yield every year. The last potential factor is the variety of grape since due to differing viticultural practices and diverse composition of different varietals, the amount of grape harvested in hectolitre per hectare vary greatly.
The Directed Acyclic Graph (DAG) below shows a conceptual model of the correlations between all observed variables and latent variables. SEM model is constructed from the DAG.
Figure 1. Conceptual Directed Acyclic Graph
Table 1. Latent Variables and Identifiers
| S. No. | Latent Variables | Identifiers | |
|---|---|---|---|
| 1 | mean_temp | l_mn_t_ | lu_mean_temp_annual |
| 2 | mean_rainfall | l_mn_r_ | lu_mean_rain_annual |
| 3 | soil_cond | l_w_ l_s_ l_p_ l_c_ |
lu_water_capacity lu_sand lu_ph lu_carbon |
| 4 | mean_solar | l_mn_s_ | lu_mean_solar_annual |
| 5 | variety_area | ch_ sh_ cl_ tm_ c_ |
chardonnay_area shiraz_area colombard_area tempranillo_area cabernet_sauvignon_area |
| 6 | land_use | g_ | grape_area_landuse |
| 7 | crop_yield | cr_ | yield |
Multiple SEMs were created and analysed while unfortunately, each iteration of the model resulted in a slight improvement. The final SEM model analysed below is the best one that represents the potential relationship among these observed variables.
This step is for SEM model estimation. The graph below shows the distribution of the data which indicates that there is variation in distributions. Originally, SEM relied on all multivariate distribution being normal, while the estimator Maximum Likelihood Robust (MLR) is a recommended starting point if data quality is questionable (Joshua McCarthy, 2020).
Figure 2. Distribution of Data
Note: R codes refer to
get distribution of data section in Appendix.
Then, the table below shows the fit indices referring to the report from Thakkar (2020). The SEM model is still not well fitted. All of six parameters do not meet its own preferred value significantly, even though the Incremental fit (IFI) and Comparative fit (CFI), with numbers of 0.863 and 0.862 respectively, are almost close to 0.9. The result is surprisingly similar to our previous study in assessment task 2, which tells us that apart from temperature, a better fitting model is less likely generated based on other observed variables selected.
Table 2. Fit Indices
| Index | Result Value | Preferred Value | Conclusion |
|---|---|---|---|
| Chi-square | 0.000 | > 0.05 | Not fit |
| Root mean square error of approximation (RMSEA) | 0.186 | < 0.05 | Not fit |
| Goodness of fit (GFI) | 0.785 | > 0.9 | Not fit |
| Normal fit (NFI) | 0.848 | > 0.9 | Not fit |
| Incremental fit (IFI) | 0.863 | > 0.9 | Not fit |
| Comparative fit (CFI) | 0.862 | > 0.9 | Not fit |
Figure 3. SEM Paths
Note: R codes refer to
SEM model section in Appendix.
Here, EFA model is used to reduce the data to a smaller set of summary variables and to explore the correlations among these variables. The graph below shows three factors could be used to describe nearly the whole datasets. In detail, temperature, rainfall, solar radiation, soil conditions (including water capacity, sand and carbon) could be summarised into one factor. The previous three are relatively correlated with a number over 0.7.
Figure 4. EFA Model with Three Factors
Note: R codes refer to
EFA model section in Appendix.
From the SEM model analysis, it was found that the final model created was not able to manifest that various environmental variables could make great impacts on the wine grapes yield. It also proves that why in the previous study the team would choose a few temperature variables to train the model and made the prediction rather than incorporated with multiple variables such as soil conditions and grape varieties. Some of the reason could be as following:
Thus, the results demonstrate that various factors might have influence on crop yield, even though they are inconsistent with our initial expectations and different from the conclusions from the reports we researched.
Speaking of the EFA model, surprisingly, it proves the assumption that there is a significant relationship between temperature, solar radiation and rain. In addition to this, carbon and water capacity from soil conditions are highly related to those climatic values which conforms to our common knowledge.
Through this assignment, I was introduced to Structural Equation Modelling and Exploratory Factor Analysis model based on the topic about wine grapes in Australia. However, the deeper I drove into these models, the more I realised that the topic of agriculture is tremendous but extremely complicated. Thus, not many significant results were gained from analysing the data. In conclusion, the whole agricultural market is affected by several factors which is much broader than the observed variables we have found in this case and the complexity of establishing relationships is far harder than we thought. In addition, my skills in statistical modelling and relevant knowledge are still inadequate, while the good point is that difficulties and frustrations bring motivations to encourage me work harder.
[1] Australian Government Productivity Commission (2005). Trends in Australian Agriculture.
[2] Yi Fan et al. (2016). Applications of structural equation modeling (SEM) in ecological studies: an updated review. Ecological Processes.
[3] J.C.F. de Winter et al. (2009). Exploratory Factor Analysis With Small Sample Sizes.
[4] Ted Goldammer. (2018). Grape Grower’s Handbook. Apex Publishers.
[5] A.Lynn Cochran. The Best Soil Types to Grow Grapes. SFGATE.
[6] Joshua McCarthy (2020). Understanding Implied Structure in Past COVID-19 Research using SEM and CFA.
[7] Dr. Jitesh J. Thakkar (2020). Structural Equation Modelling.
library(here)
## here() starts at /Users/kevin/Downloads/test/STDS_Data_Geeks_Agriculture
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(readr)
library(lavaan)
## This is lavaan 0.6-7
## lavaan is BETA software! Please report any bugs.
library(semPlot)
## Registered S3 methods overwritten by 'huge':
## method from
## plot.sim BDgraph
## print.sim BDgraph
library(semTools)
##
## ###############################################################################
## This is semTools 0.5-3
## All users of R (or SEM) are invited to submit functions or ideas for functions.
## ###############################################################################
##
## Attaching package: 'semTools'
## The following object is masked from 'package:readr':
##
## clipboard
library(ggplot2)
library(cowplot)
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
## The following object is masked from 'package:semTools':
##
## skew
## The following object is masked from 'package:lavaan':
##
## cor2cov
yield_by_region <- read_csv(here::here("project/src/output", "yield_by_region.csv")
, col_types = cols(`cabernet_sauvignon _ratio` = col_double())
)
## Warning: 164 parsing failures.
## row col expected actual file
## 3 cabernet_sauvignon _ratio a double FALSE '/Users/kevin/Downloads/test/STDS_Data_Geeks_Agriculture/project/src/output/yield_by_region.csv'
## 3 NA 178 columns 179 columns '/Users/kevin/Downloads/test/STDS_Data_Geeks_Agriculture/project/src/output/yield_by_region.csv'
## 10 cabernet_sauvignon _ratio a double FALSE '/Users/kevin/Downloads/test/STDS_Data_Geeks_Agriculture/project/src/output/yield_by_region.csv'
## 10 NA 178 columns 179 columns '/Users/kevin/Downloads/test/STDS_Data_Geeks_Agriculture/project/src/output/yield_by_region.csv'
## 11 cabernet_sauvignon _ratio a double FALSE '/Users/kevin/Downloads/test/STDS_Data_Geeks_Agriculture/project/src/output/yield_by_region.csv'
## ... ......................... ........... ........... ................................................................................................
## See problems(...) for more details.
yield_by_region$`cabernet_sauvignon _ratio` <- as.numeric(yield_by_region$`cabernet_sauvignon _ratio`)
colnames(yield_by_region)[colnames(yield_by_region) == 'cabernet_sauvignon _ratio'] <- 'cabernet_sauvignon_ratio'
yield_by_region[is.na(yield_by_region)] <- 0
# remove outlier
yield_by_region <- yield_by_region[-222,]
yield_by_region <- yield_by_region[-180,]
model_data <- yield_by_region %>%
dplyr::mutate(chardonnay_area = round(chardonnay_ratio * grape_area_landuse / 1000, 3)) %>%
dplyr::mutate(shiraz_area = round(shiraz_ratio * grape_area_landuse / 1000, 3)) %>%
dplyr::mutate(colombard_area = round(colombard_ratio * grape_area_landuse / 1000, 3)) %>%
dplyr::mutate(tempranillo_area = round(tempranillo_ratio * grape_area_landuse / 1000, 3)) %>%
dplyr::mutate(cabernet_sauvignon_area = round(cabernet_sauvignon_ratio * grape_area_landuse / 1000, 3)) %>%
dplyr::mutate(lat = -lat) %>%
dplyr::mutate(grape_area_landuse = round(grape_area_landuse / 1000, 3)) %>%
dplyr::select(lon, lat, state, yield,
lu_mean_temp_annual,
lu_mean_rain_annual,
lu_mean_solar_annual,
grape_area_landuse,
lu_bulk_density, lu_carbon, lu_clay, lu_nitrogen, lu_ph, lu_phosphorus, lu_sand, lu_silt, lu_water_capacity,
chardonnay_area, shiraz_area, colombard_area, tempranillo_area, cabernet_sauvignon_area,
)
cvd_vars <- model_data %>%
dplyr::select(-lon, -lat, -state, -lu_bulk_density, -lu_clay, -lu_ph, -lu_phosphorus, -lu_silt, -lu_nitrogen)
dist_plots <- lapply(names(cvd_vars), function(var_x){
p <-
ggplot(cvd_vars) +
aes_string(var_x)
if(is.numeric(cvd_vars[[var_x]])) {
p <- p + geom_density()
} else {
p <- p + geom_bar()
}
})
plot_grid(plotlist = dist_plots)
sem_model <- '
# latent variable model "latent variable =~ observed variable"
crop_yield =~ yield
mean_temp =~ lu_mean_temp_annual
mean_rainfall =~ lu_mean_rain_annual
soil_cond =~ lu_water_capacity + lu_carbon + lu_sand
mean_solar =~ lu_mean_solar_annual
variety_area =~ chardonnay_area + shiraz_area + colombard_area + tempranillo_area + cabernet_sauvignon_area
land_use =~ grape_area_landuse
# regressions "latent variable ~ latent variable => regression fomula"
crop_yield ~ mean_temp + mean_solar + mean_rainfall + land_use + soil_cond
mean_temp ~ mean_solar + mean_rainfall
mean_solar ~ mean_rainfall
variety_area ~ land_use + soil_cond
# residual correlations (covariances) "observed variable ~~ observed variable => residuals and covariances"
lu_mean_temp_annual ~~ lu_mean_rain_annual
lu_mean_rain_annual ~~ lu_mean_solar_annual
grape_area_landuse ~~ chardonnay_area
grape_area_landuse ~~ shiraz_area
grape_area_landuse ~~ colombard_area
grape_area_landuse ~~ tempranillo_area
lu_mean_rain_annual ~~ lu_water_capacity
lu_water_capacity ~~ lu_carbon
'
fit_1 <- sem(sem_model, data = model_data, check.gradient = FALSE)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan
## WARNING: some observed variances are (at least) a factor 1000 times larger than
## others; use varTable(fit) to investigate
## Warning in lav_object_post_check(object): lavaan WARNING: some estimated ov
## variances are negative
## Warning in lav_object_post_check(object): lavaan WARNING: some estimated lv
## variances are negative
# varTable(fit_1)
summary(fit_1, standardized = TRUE)
## lavaan 0.6-7 ended normally after 984 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 43
##
## Number of observations 220
##
## Model Test User Model:
##
## Test statistic 414.599
## Degrees of freedom 48
## P-value (Chi-square) 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## crop_yield =~
## yield 1.000 5.251 1.000
## mean_temp =~
## lu_men_tmp_nnl 1.000 2.081 1.000
## mean_rainfall =~
## lu_mean_rn_nnl 1.000 234.515 1.000
## soil_cond =~
## lu_water_cpcty 1.000 0.630 0.829
## lu_carbon -1.133 0.087 -12.950 0.000 -0.714 -0.834
## lu_sand 2.904 1.124 2.584 0.010 1.830 0.161
## mean_solar =~
## lu_men_slr_nnl 1.000 1.557 1.000
## variety_area =~
## chardonnay_are 1.000 0.636 1.013
## shiraz_area 0.142 0.062 2.300 0.021 0.090 0.151
## colombard_area 0.388 0.016 24.075 0.000 0.247 0.866
## tempranillo_ar 0.425 0.017 25.443 0.000 0.270 0.879
## cabrnt_svgnn_r 0.044 0.020 2.216 0.027 0.028 0.096
## land_use =~
## grape_are_lnds 1.000 1.972 1.000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## crop_yield ~
## mean_temp -0.007 0.234 -0.030 0.976 -0.003 -0.003
## mean_solar -0.585 0.395 -1.481 0.139 -0.173 -0.173
## mean_rainfall 0.158 2.510 0.063 0.950 7.065 7.065
## land_use -0.045 12.063 -0.004 0.997 -0.017 -0.017
## soil_cond 64.528 944.853 0.068 0.946 7.745 7.745
## mean_temp ~
## mean_solar 0.941 0.085 11.061 0.000 0.704 0.704
## mean_rainfall -0.001 0.001 -2.136 0.033 -0.151 -0.151
## mean_solar ~
## mean_rainfall -0.005 0.001 -9.079 0.000 -0.715 -0.715
## variety_area ~
## land_use 0.619 0.177 3.506 0.000 1.919 1.919
## soil_cond -0.110 0.094 -1.176 0.240 -0.109 -0.109
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .lu_mean_temp_annual ~~
## .lu_mean_rn_nnl 161.882 23.099 7.008 0.000 161.882 Inf
## .lu_mean_rain_annual ~~
## .lu_men_slr_nnl 133.260 17.934 7.430 0.000 133.260 Inf
## .chardonnay_area ~~
## .grape_are_lnds -1.462 0.643 -2.274 0.023 -1.462 -Inf
## .shiraz_area ~~
## .grape_are_lnds 0.495 0.136 3.651 0.000 0.495 Inf
## .colombard_area ~~
## .grape_are_lnds -0.537 0.250 -2.144 0.032 -0.537 -Inf
## .tempranillo_area ~~
## .grape_are_lnds -0.637 0.274 -2.323 0.020 -0.637 -Inf
## .lu_mean_rain_annual ~~
## .lu_water_cpcty 14.671 10.136 1.447 0.148 14.671 Inf
## .lu_water_capacity ~~
## .lu_carbon 0.005 0.067 0.073 0.942 0.005 0.024
## mean_rainfall ~~
## soil_cond -146.948 17.125 -8.581 0.000 -0.994 -0.994
## land_use -78.316 23.553 -3.325 0.001 -0.169 -0.169
## soil_cond ~~
## land_use 0.258 0.075 3.439 0.001 0.207 0.207
## .crop_yield ~~
## .variety_area -0.577 0.479 -1.204 0.229 -0.556 -0.556
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .yield 0.000 0.000 0.000
## .lu_men_tmp_nnl 0.000 0.000 0.000
## .lu_mean_rn_nnl 0.000 0.000 0.000
## .lu_water_cpcty 0.181 0.066 2.761 0.006 0.181 0.313
## .lu_carbon 0.223 0.076 2.941 0.003 0.223 0.304
## .lu_sand 125.999 12.002 10.498 0.000 125.999 0.974
## .lu_men_slr_nnl 0.000 0.000 0.000
## .chardonnay_are -0.010 0.007 -1.431 0.153 -0.010 -0.026
## .shiraz_area 0.351 0.033 10.516 0.000 0.351 0.977
## .colombard_area 0.020 0.002 9.273 0.000 0.020 0.250
## .tempranillo_ar 0.021 0.002 8.984 0.000 0.021 0.227
## .cabrnt_svgnn_r 0.084 0.008 10.545 0.000 0.084 0.991
## .grape_are_lnds 0.000 0.000 0.000
## .crop_yield 1.022 244.550 0.004 0.997 0.037 0.037
## .mean_temp 1.430 0.139 10.318 0.000 0.330 0.330
## mean_rainfall 54997.374 5241.285 10.493 0.000 1.000 1.000
## soil_cond 0.397 0.082 4.835 0.000 1.000 1.000
## .mean_solar 1.184 0.159 7.446 0.000 0.489 0.489
## .variety_area -1.055 0.794 -1.328 0.184 -2.609 -2.609
## land_use 3.889 0.366 10.628 0.000 1.000 1.000
semPaths(fit_1, style="lisrel",
whatLabels = "std", edge.label.cex = .6, label.prop=0.9,
rotation = 4, equalizeManifests = FALSE, optimizeLatRes = TRUE,
node.width = 1.5, edge.width = 0.5, shapeMan = "rectangle",
shapeLat = "ellipse", shapeInt = "triangle",
sizeMan = 4, sizeInt = 2, sizeLat = 4,
curve=2, unCol = "#070b8c")
fitMeasures(fit_1, c("pvalue", "rmsea", "gfi", "nfi", "ifi", "cfi"))
## pvalue rmsea gfi nfi ifi cfi
## 0.000 0.186 0.785 0.848 0.863 0.862
# modindices(fit_1)
efa_data <- model_data %>%
dplyr::select(-lon, -lat, -state, -lu_bulk_density, -lu_clay, -lu_ph, -lu_phosphorus, -lu_silt, -yield, -lu_nitrogen)
# cost too much time
# parallel <- fa.parallel(cvd_vars_2, fm = 'minres', fa = 'fa')
# plot(parallel)
# Exploratory Factor Analysis (EFA) is a statistical technique that is used to identify the latent relational structure
# among a set of variables and narrow down to smaller number of variables. This essentially means that the variance of
# large number of variables can be described by few summary variables
twofactor <- fa(efa_data, nfactors = 2, rotate = "oblimin", fm="minres")
## Loading required namespace: GPArotation
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
print(twofactor)
## Factor Analysis using method = minres
## Call: fa(r = efa_data, nfactors = 2, rotate = "oblimin", fm = "minres")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 MR2 h2 u2 com
## lu_mean_temp_annual 0.73 -0.09 0.512 0.488 1.0
## lu_mean_rain_annual -0.69 -0.11 0.528 0.472 1.1
## lu_mean_solar_annual 0.78 -0.09 0.586 0.414 1.0
## grape_area_landuse 0.15 0.66 0.512 0.488 1.1
## lu_carbon -0.92 0.01 0.845 0.155 1.0
## lu_sand 0.32 -0.02 0.096 0.904 1.0
## lu_water_capacity 0.72 0.15 0.601 0.399 1.1
## chardonnay_area -0.05 1.03 1.031 -0.031 1.0
## shiraz_area 0.18 0.15 0.068 0.932 2.0
## colombard_area 0.04 0.82 0.696 0.304 1.0
## tempranillo_area -0.01 0.83 0.678 0.322 1.0
## cabernet_sauvignon_area 0.14 0.12 0.044 0.956 2.0
##
## MR1 MR2
## SS loadings 3.22 2.98
## Proportion Var 0.27 0.25
## Cumulative Var 0.27 0.52
## Proportion Explained 0.52 0.48
## Cumulative Proportion 0.52 1.00
##
## With factor correlations of
## MR1 MR2
## MR1 1.00 0.27
## MR2 0.27 1.00
##
## Mean item complexity = 1.2
## Test of the hypothesis that 2 factors are sufficient.
##
## The degrees of freedom for the null model are 66 and the objective function was 11.63 with Chi Square of 2489.85
## The degrees of freedom for the model are 43 and the objective function was 5.03
##
## The root mean square of the residuals (RMSR) is 0.15
## The df corrected root mean square of the residuals is 0.19
##
## The harmonic number of observations is 220 with the empirical chi square 674.32 with prob < 2.4e-114
## The total number of observations was 220 with Likelihood Chi Square = 1069.92 with prob < 3.7e-196
##
## Tucker Lewis Index of factoring reliability = 0.346
## RMSEA index = 0.329 and the 90 % confidence intervals are 0.313 0.348
## BIC = 838
## Fit based upon off diagonal values = 0.85
## Visualise the factor anaysis
fa.diagram(twofactor)
# ------------------------------------------------------------------------
threefactor <- fa(efa_data, nfactors = 3, rotate = "oblimin", fm="minres")
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## An ultra-Heywood case was detected. Examine the results carefully
print(threefactor)
## Factor Analysis using method = minres
## Call: fa(r = efa_data, nfactors = 3, rotate = "oblimin", fm = "minres")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 MR2 MR3 h2 u2 com
## lu_mean_temp_annual 0.76 -0.05 -0.08 0.542 0.4583 1.0
## lu_mean_rain_annual -0.69 -0.10 -0.04 0.526 0.4744 1.0
## lu_mean_solar_annual 0.80 -0.06 -0.05 0.606 0.3937 1.0
## grape_area_landuse 0.02 0.51 0.75 0.960 0.0400 1.8
## lu_carbon -0.91 0.02 -0.05 0.840 0.1598 1.0
## lu_sand 0.31 -0.03 0.03 0.096 0.9037 1.0
## lu_water_capacity 0.72 0.13 0.05 0.598 0.4021 1.1
## chardonnay_area -0.03 0.99 0.09 1.004 -0.0038 1.0
## shiraz_area 0.03 -0.14 0.96 0.904 0.0958 1.0
## colombard_area 0.08 0.90 -0.13 0.829 0.1708 1.1
## tempranillo_area 0.03 0.87 -0.09 0.759 0.2414 1.0
## cabernet_sauvignon_area 0.01 -0.12 0.78 0.588 0.4118 1.1
##
## MR1 MR2 MR3
## SS loadings 3.17 2.93 2.15
## Proportion Var 0.26 0.24 0.18
## Cumulative Var 0.26 0.51 0.69
## Proportion Explained 0.38 0.35 0.26
## Cumulative Proportion 0.38 0.74 1.00
##
## With factor correlations of
## MR1 MR2 MR3
## MR1 1.00 0.25 0.16
## MR2 0.25 1.00 0.17
## MR3 0.16 0.17 1.00
##
## Mean item complexity = 1.1
## Test of the hypothesis that 3 factors are sufficient.
##
## The degrees of freedom for the null model are 66 and the objective function was 11.63 with Chi Square of 2489.85
## The degrees of freedom for the model are 33 and the objective function was 2.13
##
## The root mean square of the residuals (RMSR) is 0.07
## The df corrected root mean square of the residuals is 0.09
##
## The harmonic number of observations is 220 with the empirical chi square 126.31 with prob < 7.6e-13
## The total number of observations was 220 with Likelihood Chi Square = 452.89 with prob < 3e-75
##
## Tucker Lewis Index of factoring reliability = 0.65
## RMSEA index = 0.24 and the 90 % confidence intervals are 0.222 0.261
## BIC = 274.9
## Fit based upon off diagonal values = 0.97
fa.diagram(threefactor)
# ------------------------------------------------------------------------
fourfactor <- fa(efa_data, nfactors = 4, rotate = "oblimin", fm="minres")
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## An ultra-Heywood case was detected. Examine the results carefully
print(fourfactor)
## Factor Analysis using method = minres
## Call: fa(r = efa_data, nfactors = 4, rotate = "oblimin", fm = "minres")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR2 MR3 MR4 MR1 h2 u2 com
## lu_mean_temp_annual 0.01 -0.03 0.93 0.03 0.83 0.1713 1.0
## lu_mean_rain_annual -0.01 0.01 0.09 1.05 1.01 -0.0093 1.0
## lu_mean_solar_annual -0.01 0.01 0.88 -0.05 0.81 0.1910 1.0
## grape_area_landuse 0.51 0.76 0.02 0.01 0.97 0.0262 1.8
## lu_carbon 0.02 -0.06 -0.40 0.63 0.80 0.1996 1.7
## lu_sand 0.01 0.06 0.42 0.08 0.16 0.8446 1.1
## lu_water_capacity 0.10 0.05 0.18 -0.67 0.67 0.3342 1.2
## chardonnay_area 0.99 0.09 -0.01 0.01 1.01 -0.0067 1.0
## shiraz_area -0.14 0.96 -0.01 -0.04 0.90 0.1014 1.0
## colombard_area 0.90 -0.13 0.02 -0.08 0.83 0.1729 1.1
## tempranillo_area 0.87 -0.09 0.02 -0.03 0.76 0.2390 1.0
## cabernet_sauvignon_area -0.12 0.78 0.00 -0.01 0.59 0.4108 1.1
##
## MR2 MR3 MR4 MR1
## SS loadings 2.91 2.16 2.13 2.12
## Proportion Var 0.24 0.18 0.18 0.18
## Cumulative Var 0.24 0.42 0.60 0.78
## Proportion Explained 0.31 0.23 0.23 0.23
## Cumulative Proportion 0.31 0.54 0.77 1.00
##
## With factor correlations of
## MR2 MR3 MR4 MR1
## MR2 1.00 0.17 0.15 -0.24
## MR3 0.17 1.00 0.09 -0.16
## MR4 0.15 0.09 1.00 -0.47
## MR1 -0.24 -0.16 -0.47 1.00
##
## Mean item complexity = 1.2
## Test of the hypothesis that 4 factors are sufficient.
##
## The degrees of freedom for the null model are 66 and the objective function was 11.63 with Chi Square of 2489.85
## The degrees of freedom for the model are 24 and the objective function was 0.79
##
## The root mean square of the residuals (RMSR) is 0.02
## The df corrected root mean square of the residuals is 0.03
##
## The harmonic number of observations is 220 with the empirical chi square 9.89 with prob < 0.99
## The total number of observations was 220 with Likelihood Chi Square = 166.52 with prob < 2.7e-23
##
## Tucker Lewis Index of factoring reliability = 0.836
## RMSEA index = 0.164 and the 90 % confidence intervals are 0.142 0.189
## BIC = 37.08
## Fit based upon off diagonal values = 1
fa.diagram(fourfactor)