SEM Analysis - Geographical Factors and Agriculture Productivity

1. Introduction

The first grapevine planting material arrived in Australia with European settlement in 1788. Today, grapes are grown commercially in all states and territories. Viticulture (wine, dried and table grapes) is Australia’s largest fruit industry, and its production environment ranges from temperate to tropical. Among the three viticulture industries, wine grape production and wine making are the largest and most important (David n.d.). According to the Australian Wine Grape Production Projections released by the Australian Bureau of Agricultural and Resource Economics and Sciences (ABARES), production is projected to increase further, to 2 million tonnes in 2050 (Department of Agriculture, Water and the Environment ABARES 2020).

2. Background

To support investors, agricultural operators and decision makers, we investigated the impact of climate change on the value of agricultural land in AT2B. The value of agricultural land is represented by crop yield, that is, the tonnes or hectolitres of product per unit area of land for a given agricultural commodity. Crop yield is an indicator of how efficiently a commodity is produced in a given area, and both internal and external factors affect it. In AT2B, Data Geeks applied a linear regression model to predict the annual crop yield of a given area from geographic factors, with yield as the response variable. The purpose of our analysis is to discover which factors are most influential and how grape production responds to changes in these factors.

3. Current Study

In AT2B, we used model-based selection, which is purely algorithmic, together with rational-based subset selection, in which the predictor variables were grouped by their nature: rain, temperature, solar radiation and soil. For the rational-based subsets we used diagnostic correlation plots, correlation values, our findings from the EDA and AIC to reduce multicollinearity. The result was acceptable for the previous research question, but there was room for improvement had more time been available. We could potentially improve the regression fit by expressing yield (the response variable) in log units, and we could standardize the predictor variables, given that their ranges vary considerably.
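The two preprocessing ideas mentioned above (log units for yield, standardized predictors) can be sketched in a few lines. This is only an illustration: the yield and rainfall values and the names `yield_t`, `rain_jan_mm` and `zscore` are hypothetical and are not taken from the AT2B dataset.

```python
import math
import statistics

# Hypothetical yields (tonnes) and January rainfall (mm) for five regions.
# Illustrative values only; they do not come from the AT2B dataset.
yield_t = [2.1, 8.4, 15.0, 4.7, 30.2]
rain_jan_mm = [12.0, 55.0, 31.0, 8.0, 94.0]


def zscore(values):
    """Centre on the mean and scale by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Log units compress the long right tail of the response variable.
log_yield = [math.log(y) for y in yield_t]

# A standardized predictor has mean 0 and standard deviation 1,
# so predictors measured on very different scales become comparable.
rain_jan_z = zscore(rain_jan_mm)

print([round(v, 3) for v in log_yield])
print([round(v, 3) for v in rain_jan_z])
```

After standardization, every predictor is expressed in standard deviations from its own mean, which makes the size of coefficients easier to compare across rain, temperature and soil variables.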

In the current study, a new model, structural equation modelling (SEM), is applied to test grape productivity. SEM is a multivariate quantitative technique used to describe the relationships among observed variables. The technique helps the researcher test or validate a theoretical model for theory testing and extension. The multivariate analysis aims to support an in-depth explanatory analysis with the required statistical efficiency (Thakkar 2020). The software package used in this study is lavaan, a structural equation modelling package implemented in the R system for statistical computing. The name lavaan stands for ‘latent variable analysis’ and reveals its long-term goal: to provide a set of tools that can be used to explore, estimate and understand a wide range of latent variable models, including factor analysis, structural equation, longitudinal and other models (Rosseel 2012).

The scope for increasing productivity varies from farm to farm, depending on the farmer’s current performance and their response to changes in the external environment. Productivity is affected by farm-level drivers such as farm size, management skills and the financial ability to invest in new technologies. It is also affected by external factors beyond the farmer’s control, including seasonal conditions, technological progress, government policies, market conditions and infrastructure usage (ABARES 2020). Because of the limitations of the dataset, the current study mainly tests how each climate factor separately influences grape yield. The research questions are:

To what extent does temperature affect grape yield?

To what extent does rainfall affect grape yield?

To what extent do soil attributes affect grape yield?

4. Dataset

The current study continues to use the data that we collected in AT2B. The data were collected from a range of publicly available sources, and an initial analysis was performed to determine the suitability of each set for the model. Table 1 provides a summary of the data selected for use in the SEM model. A further reason for not acquiring new data is that, according to Kenny (2015), the chi-square test is generally a reasonable measure of fit for models with about 75 to 200 cases. The current dataset has a fairly good number of observations (222).

5. Analysis

5.1 Fit indices

Researchers use many different fit statistics to evaluate confirmatory factor analysis and structural equation models (Stephen n.d.). In the current study, as a beginner, I mainly report the most commonly used fit statistics and their suggested cut-off values, namely CFI, RMSEA and SRMR.
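To make one of these indices concrete, the sketch below recomputes RMSEA from the chi-square statistic, degrees of freedom and sample size printed in the lavaan outputs of Section 5. It uses the usual point-estimate formula sqrt(max(chisq - df, 0) / (df * N)). The divisor N (rather than N - 1) is an assumption, chosen because it reproduces the reported values to three decimals. The function name `rmsea` is illustrative.

```python
import math


def rmsea(chisq, df, n):
    """RMSEA point estimate; dividing by n (not n - 1) is an assumption."""
    return math.sqrt(max(chisq - df, 0.0) / (df * n))


N_OBS = 222  # number of observations in every lavaan output

# (chi-square, degrees of freedom, RMSEA as printed in Section 5)
reported = {
    "Model1 v1": (4757.407, 65, 0.570),
    "Model1 v2": (3343.420, 63, 0.484),
    "Model2 v1": (6640.612, 65, 0.675),
    "Model2 v2": (5778.125, 63, 0.639),
    "Model3": (4762.745, 88, 0.489),
    "Model4": (4429.309, 95, 0.453),
}

recomputed = {
    name: rmsea(chisq, df, N_OBS) for name, (chisq, df, _) in reported.items()
}

for name, value in recomputed.items():
    print(f"{name}: recomputed {value:.3f}, reported {reported[name][2]:.3f}")
```

Because chi-square exceeds the degrees of freedom by a very large margin in every model, RMSEA stays far above any conventional cut-off regardless of which divisor is used.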

5.2 Model1 - Version1

Recent climate change has severely affected the productivity of Australian crops, especially in south-western and south-eastern Australia. In Western Australia, climatic conditions between 2000-01 and 2014-15 reduced total factor productivity (TFP) by an average of 7.7% compared with long-term average conditions (1914-15 to 2014-15). In New South Wales, climatic conditions after 2000-01 reduced productivity by an average of 6.5% (Hugh & Valle 2017).

## lavaan 0.6-7 ended normally after 31 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         26
##                                                       
##   Number of observations                           222
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                              4757.407
##   Degrees of freedom                                65
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   rainfall =~                                                           
##     lu_mean_ran_jn    1.000                               0.824    0.826
##     lu_mean_ran_fb    1.127    0.060   18.712    0.000    0.929    0.931
##     lu_mean_ran_mr    1.027    0.064   15.934    0.000    0.846    0.848
##     lu_mean_ran_pr    0.320    0.080    3.978    0.000    0.264    0.264
##     lu_mean_ran_my    0.082    0.082    1.003    0.316    0.068    0.068
##     lu_mean_ran_jn    0.003    0.082    0.042    0.966    0.003    0.003
##     lu_mean_ran_jl    0.127    0.082    1.550    0.121    0.104    0.105
##     lu_mean_rain_g    0.408    0.079    5.133    0.000    0.336    0.337
##     lu_mean_ran_sp    1.036    0.064   16.151    0.000    0.853    0.855
##     lu_mean_ran_ct    1.196    0.057   20.874    0.000    0.985    0.987
##     lu_mean_ran_nv    1.096    0.062   17.771    0.000    0.902    0.905
##     lu_mean_ran_dc   -0.564    0.077   -7.311    0.000   -0.465   -0.466
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   yield ~                                                               
##     rainfall         -0.300    0.081   -3.718    0.000   -0.247   -0.248
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .lu_mean_ran_jn    0.317    0.031   10.106    0.000    0.317    0.318
##    .lu_mean_ran_fb    0.133    0.015    9.013    0.000    0.133    0.134
##    .lu_mean_ran_mr    0.280    0.028   10.014    0.000    0.280    0.281
##    .lu_mean_ran_pr    0.926    0.088   10.522    0.000    0.926    0.930
##    .lu_mean_ran_my    0.991    0.094   10.535    0.000    0.991    0.995
##    .lu_mean_ran_jn    0.995    0.094   10.536    0.000    0.995    1.000
##    .lu_mean_ran_jl    0.985    0.093   10.534    0.000    0.985    0.989
##    .lu_mean_rain_g    0.883    0.084   10.512    0.000    0.883    0.887
##    .lu_mean_ran_sp    0.268    0.027    9.979    0.000    0.268    0.269
##    .lu_mean_ran_ct    0.026    0.008    3.291    0.001    0.026    0.026
##    .lu_mean_ran_nv    0.181    0.019    9.551    0.000    0.181    0.182
##    .lu_mean_ran_dc    0.780    0.074   10.484    0.000    0.780    0.783
##    .yield             0.934    0.089   10.524    0.000    0.934    0.939
##     rainfall          0.679    0.090    7.538    0.000    1.000    1.000

##    chisq    rmsea     srmr      cfi 
## 4757.407    0.570    0.361    0.262

Version 1 of Model1 focuses purely on rainfall data. None of the indices meets its benchmark, which means Version 1 does not fit my data, so I need to modify the model. Research on the grape-growing season suggests that grapes are planted from late winter to early spring. The plants start to grow in spring and continue to grow throughout the summer. Grapes ripen from late summer to early autumn, depending on the variety. Therefore, I separated the rainfall data into summer and winter, also because the std.all loadings for the summer months are noticeably higher than those for the winter months.

5.2.1 Model1 - Version2

## lavaan 0.6-7 ended normally after 69 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         28
##                                                       
##   Number of observations                           222
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                              3343.420
##   Degrees of freedom                                63
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                      Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   rainfall_summer =~                                                      
##     lu_mean_ran_jn      1.000                               0.874    0.876
##     lu_mean_ran_fb      1.115    0.046   24.468    0.000    0.974    0.977
##     lu_mean_ran_mr      0.990    0.054   18.418    0.000    0.865    0.867
##     lu_mean_ran_pr      0.277    0.076    3.648    0.000    0.242    0.243
##     lu_mean_ran_ct      1.081    0.048   22.564    0.000    0.944    0.946
##     lu_mean_ran_nv      1.006    0.053   19.056    0.000    0.879    0.881
##     lu_mean_ran_dc     -0.492    0.072   -6.784    0.000   -0.430   -0.430
##   rainfall_winter =~                                                      
##     lu_mean_ran_my      1.000                               0.981    0.983
##     lu_mean_ran_jn      1.007    0.016   62.785    0.000    0.988    0.990
##     lu_mean_ran_jl      0.989    0.020   48.661    0.000    0.970    0.972
##     lu_mean_rain_g      0.906    0.033   27.246    0.000    0.889    0.891
##     lu_mean_ran_sp      0.401    0.063    6.335    0.000    0.393    0.394
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   yield ~                                                               
##     rainfall_summr   -0.276    0.067   -4.102    0.000   -0.242   -0.242
##     rainfall_wintr    0.468    0.059    7.908    0.000    0.459    0.460
## 
## Covariances:
##                      Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   rainfall_summer ~~                                                      
##     rainfall_wintr      0.014    0.059    0.238    0.812    0.016    0.016
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .lu_mean_ran_jn    0.232    0.024    9.605    0.000    0.232    0.233
##    .lu_mean_ran_fb    0.046    0.010    4.712    0.000    0.046    0.046
##    .lu_mean_ran_mr    0.247    0.026    9.683    0.000    0.247    0.249
##    .lu_mean_ran_pr    0.937    0.089   10.519    0.000    0.937    0.941
##    .lu_mean_ran_ct    0.104    0.013    7.843    0.000    0.104    0.105
##    .lu_mean_ran_nv    0.223    0.023    9.551    0.000    0.223    0.224
##    .lu_mean_ran_dc    0.811    0.077   10.474    0.000    0.811    0.815
##    .lu_mean_ran_my    0.033    0.005    6.920    0.000    0.033    0.033
##    .lu_mean_ran_jn    0.019    0.004    4.669    0.000    0.019    0.019
##    .lu_mean_ran_jl    0.054    0.006    8.555    0.000    0.054    0.054
##    .lu_mean_rain_g    0.206    0.020   10.134    0.000    0.206    0.207
##    .lu_mean_ran_sp    0.841    0.080   10.517    0.000    0.841    0.845
##    .yield             0.730    0.070   10.485    0.000    0.730    0.734
##     rainfall_summr    0.763    0.092    8.258    0.000    1.000    1.000
##     rainfall_wintr    0.962    0.095   10.182    0.000    1.000    1.000

##    chisq    rmsea     srmr      cfi 
## 3343.420    0.484    0.320    0.484

## Chi-Squared Difference Test
## 
##       Df  AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)    
## std_1 63 5140 5235.3 3343.4                                  
## std   65 6550 6638.5 4757.4       1414       2  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In Version 2 of Model1, after I separated rainfall into different seasons, the model fits better than Version 1, although it still does not fit well because none of the indices meets its benchmark. One possible reason is that, under the proposed cut-off criteria, the ML-based TLI, Mc and RMSEA tend to over-reject true-population models at small sample sizes and are therefore less preferable when the sample is small (Hu & Bentler 1999). Nevertheless, the anova() function shows that the model with two latent rainfall factors fits the data significantly better than the model with a single latent rainfall factor, χ2(2)=1414, p<.001.
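The chi-square difference test can be checked by hand from the two test statistics printed above. The sketch below uses a shortcut that is valid only because the difference in degrees of freedom is exactly 2: in that case the upper-tail probability of the chi-square distribution is exp(-x / 2). The helper name `chi2_upper_tail_df2` is illustrative.

```python
import math


def chi2_upper_tail_df2(x):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-x / 2.0)


# Test statistics and degrees of freedom from the two lavaan outputs.
chisq_one_factor, df_one_factor = 4757.407, 65  # Model1 Version 1
chisq_two_factor, df_two_factor = 3343.420, 63  # Model1 Version 2

chisq_diff = chisq_one_factor - chisq_two_factor
df_diff = df_one_factor - df_two_factor

if df_diff != 2:
    raise ValueError("the closed-form tail used here only holds for 2 df")

p_value = chi2_upper_tail_df2(chisq_diff)

print(f"chi-square difference = {chisq_diff:.3f} on {df_diff} df")
print(f"p-value = {p_value:.3e}")
```

The difference of about 1414 on 2 degrees of freedom matches the anova() table, and the resulting p-value is far below the "< 2.2e-16" floor that R prints.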

5.3 Model2 - Version1

## lavaan 0.6-7 ended normally after 76 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         26
##                                                       
##   Number of observations                           222
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                              6640.612
##   Degrees of freedom                                65
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Tempreture =~                                                         
##     lu_mean_tmp_nv    1.000                               0.885    0.887
##     lu_mean_tmp_dc   -0.308    0.073   -4.188    0.000   -0.272   -0.273
##     lu_mean_tmp_jn    0.981    0.051   19.356    0.000    0.867    0.869
##     lu_mean_tmp_fb    1.073    0.044   24.315    0.000    0.949    0.951
##     lu_mean_tmp_mr    1.130    0.039   28.786    0.000    1.000    1.002
##     lu_mean_tmp_pr    1.088    0.043   25.322    0.000    0.962    0.964
##     lu_mean_tmp_my    1.016    0.048   21.013    0.000    0.899    0.901
##     lu_mean_tmp_jn    0.994    0.050   19.948    0.000    0.879    0.881
##     lu_mean_tmp_jl    1.052    0.046   22.976    0.000    0.930    0.932
##     lu_mean_temp_g    1.084    0.043   25.044    0.000    0.959    0.961
##     lu_mean_tmp_sp    1.058    0.045   23.324    0.000    0.935    0.938
##     lu_mean_tmp_ct    1.032    0.047   21.878    0.000    0.913    0.915
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   yield ~                                                               
##     Tempreture       -0.065    0.075   -0.865    0.387   -0.058   -0.058
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .lu_mean_tmp_nv    0.213    0.020   10.693    0.000    0.213    0.214
##    .lu_mean_tmp_dc    0.921    0.087   10.540    0.000    0.921    0.926
##    .lu_mean_tmp_jn    0.243    0.023   10.673    0.000    0.243    0.244
##    .lu_mean_tmp_fb    0.095    0.009   10.822    0.000    0.095    0.095
##    .lu_mean_tmp_mr   -0.004    0.001   -3.168    0.002   -0.004   -0.004
##    .lu_mean_tmp_pr    0.070    0.006   10.836    0.000    0.070    0.070
##    .lu_mean_tmp_my    0.188    0.018   10.714    0.000    0.188    0.189
##    .lu_mean_tmp_jn    0.223    0.021   10.686    0.000    0.223    0.224
##    .lu_mean_tmp_jl    0.130    0.012   10.776    0.000    0.130    0.131
##    .lu_mean_temp_g    0.077    0.007   10.836    0.000    0.077    0.077
##    .lu_mean_tmp_sp    0.120    0.011   10.789    0.000    0.120    0.121
##    .lu_mean_tmp_ct    0.162    0.015   10.740    0.000    0.162    0.162
##    .yield             0.992    0.094   10.536    0.000    0.992    0.997
##     Tempreture        0.782    0.092    8.483    0.000    1.000    1.000

##    chisq    rmsea     srmr      cfi 
## 6640.612    0.675    0.128    0.403

5.3.1 Model2 - Version2

## lavaan 0.6-7 ended normally after 91 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         28
##                                                       
##   Number of observations                           222
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                              5778.125
##   Degrees of freedom                                63
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                        Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   summer_Tempreture =~                                                      
##     lu_mean_tmp_nv        1.000                               0.970    0.972
##     lu_mean_tmp_dc       -0.381    0.064   -5.925    0.000   -0.370   -0.371
##     lu_mean_tmp_jn        1.006    0.022   46.354    0.000    0.976    0.978
##     lu_mean_tmp_fb        1.030    0.017   62.363    0.000    0.999    1.001
##     lu_mean_tmp_mr        0.977    0.027   36.537    0.000    0.948    0.950
##     lu_mean_tmp_pr        0.868    0.040   21.952    0.000    0.842    0.844
##     lu_mean_tmp_ct        0.989    0.025   39.913    0.000    0.960    0.962
##   winter_Tempreture =~                                                      
##     lu_mean_tmp_my        1.000                               0.961    0.963
##     lu_mean_tmp_jn        1.010    0.025   40.995    0.000    0.971    0.973
##     lu_mean_tmp_jl        1.042    0.018   56.529    0.000    1.002    1.004
##     lu_mean_temp_g        1.001    0.026   38.261    0.000    0.962    0.964
##     lu_mean_tmp_sp        0.881    0.040   21.993    0.000    0.847    0.849
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   yield ~                                                               
##     summer_Temprtr   -0.971    0.088  -11.096    0.000   -0.942   -0.944
##     winter_Temprtr    0.929    0.088   10.525    0.000    0.893    0.895
## 
## Covariances:
##                        Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   summer_Tempreture ~~                                                      
##     winter_Temprtr        0.729    0.081    8.960    0.000    0.782    0.782
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .lu_mean_tmp_nv    0.055    0.005   10.345    0.000    0.055    0.055
##    .lu_mean_tmp_dc    0.859    0.081   10.539    0.000    0.859    0.863
##    .lu_mean_tmp_jn    0.043    0.004   10.130    0.000    0.043    0.043
##    .lu_mean_tmp_fb   -0.002    0.001   -1.194    0.232   -0.002   -0.002
##    .lu_mean_tmp_mr    0.097    0.009   10.559    0.000    0.097    0.098
##    .lu_mean_tmp_pr    0.286    0.027   10.575    0.000    0.286    0.287
##    .lu_mean_tmp_ct    0.075    0.007   10.498    0.000    0.075    0.075
##    .lu_mean_tmp_my    0.072    0.006   11.122    0.000    0.072    0.072
##    .lu_mean_tmp_jn    0.053    0.005   10.962    0.000    0.053    0.053
##    .lu_mean_tmp_jl   -0.008    0.002   -5.130    0.000   -0.008   -0.008
##    .lu_mean_temp_g    0.070    0.006   11.115    0.000    0.070    0.070
##    .lu_mean_tmp_sp    0.278    0.026   10.851    0.000    0.278    0.279
##    .yield             0.625    0.058   10.732    0.000    0.625    0.628
##     summer_Temprtr    0.941    0.094    9.969    0.000    1.000    1.000
##     winter_Temprtr    0.924    0.094    9.807    0.000    1.000    1.000

##    chisq    rmsea     srmr      cfi 
## 5778.125    0.639    0.143    0.481

Model2 focuses on temperature data. Version 1 of Model2 gives a result similar to Version 1 of Model1, so I applied the same logic as in Model1 and separated the data into different seasons. In Version 2, RMSEA (0.675 to 0.639) and CFI (0.403 to 0.481) improve relative to Version 1, whereas SRMR worsens slightly (0.128 to 0.143), and none of the three indices meets its benchmark. Surprisingly, the summer and winter temperature factors both have large standardized coefficients on grape yield (-0.944 and 0.895), with opposite signs.
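The size of the two temperature coefficients can be put in context with a short calculation. For a standardized outcome regressed on two correlated standardized predictors, the explained variance is b1^2 + b2^2 + 2 * b1 * b2 * r. The sketch below plugs in the Std.all values from the Version 2 output and compares the result with one minus the standardized residual variance of yield. The function name is illustrative.

```python
def r_squared_two_predictors(beta1, beta2, corr12):
    """Explained variance of a standardized outcome with two correlated predictors."""
    return beta1 ** 2 + beta2 ** 2 + 2.0 * beta1 * beta2 * corr12


# Std.all values copied from the Model2 Version 2 output.
beta_summer = -0.944      # yield ~ summer_Tempreture
beta_winter = 0.895       # yield ~ winter_Tempreture
factor_corr = 0.782       # summer_Tempreture ~~ winter_Tempreture
resid_var_yield = 0.628   # standardized residual variance of yield

r2_from_paths = r_squared_two_predictors(beta_summer, beta_winter, factor_corr)
r2_from_residual = 1.0 - resid_var_yield

print(f"R^2 from path coefficients:  {r2_from_paths:.3f}")
print(f"R^2 from residual variance: {r2_from_residual:.3f}")
```

Both routes give an explained variance of roughly 0.37. Because the two factors are strongly correlated (0.782) and their coefficients have opposite signs, much of their individual effect cancels, so the large coefficients should be read together rather than one at a time.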

5.4 Model3

## lavaan 0.6-7 ended normally after 53 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         32
##                                                       
##   Number of observations                           222
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                              4762.745
##   Degrees of freedom                                88
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                        Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   summer_Tempreture =~                                                      
##     lu_mean_tmp_nv        1.000                               0.982    0.984
##     lu_mean_tmp_dc       -0.412    0.063   -6.566    0.000   -0.404   -0.405
##     lu_mean_tmp_jn        0.996    0.018   54.756    0.000    0.978    0.980
##     lu_mean_tmp_fb        1.011    0.014   70.821    0.000    0.992    0.995
##     lu_mean_tmp_mr        0.950    0.027   35.233    0.000    0.932    0.935
##     lu_mean_tmp_pr        0.831    0.041   20.417    0.000    0.816    0.817
##     lu_mean_tmp_ct        0.991    0.019   51.304    0.000    0.973    0.975
##   rainfall_summer =~                                                        
##     lu_mean_ran_jn        1.000                               0.862    0.864
##     lu_mean_ran_fb        1.113    0.050   22.403    0.000    0.959    0.961
##     lu_mean_ran_mr        0.990    0.057   17.398    0.000    0.853    0.855
##     lu_mean_ran_pr        0.267    0.077    3.442    0.001    0.230    0.230
##     lu_mean_ran_ct        1.115    0.050   22.533    0.000    0.961    0.963
##     lu_mean_ran_nv        1.042    0.054   19.289    0.000    0.898    0.900
##     lu_mean_ran_dc       -0.518    0.073   -7.044    0.000   -0.446   -0.447
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   yield ~                                                               
##     summer_Temprtr   -0.432    0.065   -6.611    0.000   -0.424   -0.425
##     rainfall_summr   -0.462    0.077   -5.992    0.000   -0.398   -0.399
## 
## Covariances:
##                        Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   summer_Tempreture ~~                                                      
##     rainfall_summr       -0.302    0.062   -4.842    0.000   -0.357   -0.357
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .lu_mean_tmp_nv    0.032    0.004    8.492    0.000    0.032    0.032
##    .lu_mean_tmp_dc    0.832    0.079   10.524    0.000    0.832    0.836
##    .lu_mean_tmp_jn    0.039    0.004    8.929    0.000    0.039    0.039
##    .lu_mean_tmp_fb    0.011    0.002    4.742    0.000    0.011    0.011
##    .lu_mean_tmp_mr    0.126    0.012   10.116    0.000    0.126    0.127
##    .lu_mean_tmp_pr    0.330    0.032   10.415    0.000    0.330    0.332
##    .lu_mean_tmp_ct    0.048    0.005    9.279    0.000    0.048    0.048
##    .lu_mean_ran_jn    0.253    0.026    9.645    0.000    0.253    0.254
##    .lu_mean_ran_fb    0.076    0.012    6.640    0.000    0.076    0.077
##    .lu_mean_ran_mr    0.268    0.028    9.713    0.000    0.268    0.269
##    .lu_mean_ran_pr    0.943    0.090   10.519    0.000    0.943    0.947
##    .lu_mean_ran_ct    0.072    0.011    6.401    0.000    0.072    0.072
##    .lu_mean_ran_nv    0.190    0.021    9.231    0.000    0.190    0.190
##    .lu_mean_ran_dc    0.797    0.076   10.461    0.000    0.797    0.800
##    .yield             0.778    0.074   10.461    0.000    0.778    0.781
##     summer_Temprtr    0.964    0.094   10.203    0.000    1.000    1.000
##     rainfall_summr    0.742    0.092    8.069    0.000    1.000    1.000

##    chisq    rmsea     srmr      cfi 
## 4762.745    0.489    0.242    0.474

Factor Loadings
Latent Factor	Indicator	B	SE	Z	p-value	Beta
summer_Tempreture	lu_mean_temp_nov	1.000	0.000	NA	NA	0.984
summer_Tempreture	lu_mean_temp_dec	-0.412	0.063	-6.566	0.000	-0.405
summer_Tempreture	lu_mean_temp_jan	0.996	0.018	54.756	0.000	0.980
summer_Tempreture	lu_mean_temp_feb	1.011	0.014	70.821	0.000	0.995
summer_Tempreture	lu_mean_temp_mar	0.950	0.027	35.233	0.000	0.935
summer_Tempreture	lu_mean_temp_apr	0.831	0.041	20.417	0.000	0.817
summer_Tempreture	lu_mean_temp_oct	0.991	0.019	51.304	0.000	0.975
rainfall_summer	lu_mean_rain_jan	1.000	0.000	NA	NA	0.864
rainfall_summer	lu_mean_rain_feb	1.113	0.050	22.403	0.000	0.961
rainfall_summer	lu_mean_rain_mar	0.990	0.057	17.398	0.000	0.855
rainfall_summer	lu_mean_rain_apr	0.267	0.077	3.442	0.001	0.230
rainfall_summer	lu_mean_rain_oct	1.115	0.050	22.533	0.000	0.963
rainfall_summer	lu_mean_rain_nov	1.042	0.054	19.289	0.000	0.900
rainfall_summer	lu_mean_rain_dec	-0.518	0.073	-7.044	0.000	-0.447
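The Beta column of the table above can be turned into the share of each indicator's variance explained by its factor by squaring the standardized loading. The sketch below does this for both factors and flags indicators whose absolute loading falls under 0.5. That threshold is an assumed rule of thumb for illustration, not a value taken from this study.

```python
# Standardized loadings (Beta column) from the Model3 factor loading table.
loadings = {
    "lu_mean_temp_nov": 0.984,
    "lu_mean_temp_dec": -0.405,
    "lu_mean_temp_jan": 0.980,
    "lu_mean_temp_feb": 0.995,
    "lu_mean_temp_mar": 0.935,
    "lu_mean_temp_apr": 0.817,
    "lu_mean_temp_oct": 0.975,
    "lu_mean_rain_jan": 0.864,
    "lu_mean_rain_feb": 0.961,
    "lu_mean_rain_mar": 0.855,
    "lu_mean_rain_apr": 0.230,
    "lu_mean_rain_oct": 0.963,
    "lu_mean_rain_nov": 0.900,
    "lu_mean_rain_dec": -0.447,
}

WEAK_THRESHOLD = 0.5  # assumed rule of thumb, not from the source

# Squared standardized loading = share of indicator variance explained.
explained = {name: beta ** 2 for name, beta in loadings.items()}

# Indicators only weakly related to their latent factor.
weak = [name for name, beta in loadings.items() if abs(beta) < WEAK_THRESHOLD]

for name in weak:
    print(f"{name}: loading {loadings[name]:+.3f}, explained {explained[name]:.3f}")
```

Under this assumed threshold, the December indicators and April rainfall are the weak ones. For April rainfall the factor explains about 5% of the variance, which agrees with the standardized residual variance of 0.947 in the Model3 output.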

In Model3, I selected only the summer rainfall and temperature data and created latent variables for summer rainfall and summer temperature. The factor loading table shows that each latent variable is highly correlated with most of its chosen indicators, and both latent variables have a negative effect on grape yield with similar std.all values (-0.425 and -0.399). However, RMSEA, SRMR and CFI still do not meet their benchmarks.

5.5 Model4

## lavaan 0.6-7 ended normally after 119 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         40
##                                                       
##   Number of observations                           222
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                              4429.309
##   Degrees of freedom                                95
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                        Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   summer_Tempreture =~                                                      
##     lu_mean_tmp_nv        1.000                               0.993    0.998
##     lu_mean_tmp_dc       -1.135    0.046  -24.559    0.000   -1.127   -0.704
##     lu_mean_tmp_jn        1.096    0.011  102.576    0.000    1.089    0.979
##     lu_mean_tmp_fb        0.982    0.014   69.936    0.000    0.976    0.979
##     lu_mean_tmp_mr        1.035    0.024   43.927    0.000    1.029    0.912
##     lu_mean_tmp_pr        0.814    0.038   21.342    0.000    0.809    0.776
##     lu_mean_tmp_ct        0.986    0.013   77.305    0.000    0.980    0.983
##   rainfall_summer =~                                                        
##     lu_mean_ran_jn        1.000                               0.770    0.821
##     lu_mean_ran_fb        1.241    0.046   27.126    0.000    0.956    0.956
##     lu_mean_ran_mr        1.128    0.047   23.802    0.000    0.869    0.867
##     lu_mean_ran_pr        0.440    0.074    5.943    0.000    0.339    0.332
##     lu_mean_ran_ct        1.261    0.043   29.090    0.000    0.972    0.972
##     lu_mean_ran_nv        1.113    0.055   20.085    0.000    0.857    0.880
##     lu_mean_ran_dc       -0.874    0.055  -15.917    0.000   -0.673   -0.594
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   yield ~                                                               
##     summer_Temprtr   -0.438    0.059   -7.439    0.000   -0.435   -0.454
##     rainfall_summr   -0.383    0.077   -4.956    0.000   -0.295   -0.308
##     lu_ph            -0.257    0.055   -4.677    0.000   -0.257   -0.267
## 
## Covariances:
##                        Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .lu_mean_temp_jan ~~                                                       
##    .lu_mean_ran_jn       -0.100    0.011   -9.179    0.000   -0.100   -0.817
##  .lu_mean_temp_feb ~~                                                       
##    .lu_mean_ran_fb        0.002    0.005    0.464    0.642    0.002    0.036
##  .lu_mean_temp_mar ~~                                                       
##    .lu_mean_ran_mr        0.166    0.020    8.452    0.000    0.166    0.717
##  .lu_mean_temp_apr ~~                                                       
##    .lu_mean_ran_pr        0.343    0.048    7.079    0.000    0.343    0.541
##  .lu_mean_temp_oct ~~                                                       
##    .lu_mean_ran_ct       -0.005    0.004   -1.457    0.145   -0.005   -0.128
##  .lu_mean_temp_nov ~~                                                       
##    .lu_mean_ran_nv        0.012    0.003    3.653    0.000    0.012    0.473
##  .lu_mean_temp_dec ~~                                                       
##    .lu_mean_ran_dc        0.856    0.091    9.437    0.000    0.856    0.824
##   summer_Tempreture ~~                                                      
##     rainfall_summr       -0.260    0.056   -4.673    0.000   -0.340   -0.340
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .lu_mean_tmp_nv    0.003    0.001    2.874    0.004    0.003    0.003
##    .lu_mean_tmp_dc    1.296    0.123   10.516    0.000    1.296    0.505
##    .lu_mean_tmp_jn    0.052    0.005    9.854    0.000    0.052    0.042
##    .lu_mean_tmp_fb    0.041    0.004    9.955    0.000    0.041    0.041
##    .lu_mean_tmp_mr    0.215    0.021   10.432    0.000    0.215    0.169
##    .lu_mean_tmp_pr    0.432    0.041   10.505    0.000    0.432    0.398
##    .lu_mean_tmp_ct    0.033    0.003    9.780    0.000    0.033    0.033
##    .lu_mean_ran_jn    0.288    0.029   10.015    0.000    0.288    0.327
##    .lu_mean_ran_fb    0.086    0.011    8.025    0.000    0.086    0.086
##    .lu_mean_ran_mr    0.248    0.025    9.802    0.000    0.248    0.247
##    .lu_mean_ran_pr    0.929    0.088   10.508    0.000    0.929    0.890
##    .lu_mean_ran_ct    0.055    0.008    6.455    0.000    0.055    0.055
##    .lu_mean_ran_nv    0.214    0.022    9.723    0.000    0.214    0.226
##    .lu_mean_ran_dc    0.833    0.080   10.412    0.000    0.833    0.648
##    .yield             0.664    0.063   10.499    0.000    0.664    0.723
##     summer_Temprtr    0.987    0.094   10.507    0.000    1.000    1.000
##     rainfall_summr    0.593    0.067    8.886    0.000    1.000    1.000

##    chisq    rmsea     srmr      cfi 
## 4429.309    0.453    0.317    0.518
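
As a rough reading aid for the fit measures above, the sketch below compares them with the cutoffs commonly attributed to Hu & Bentler (1999): CFI of at least 0.95, RMSEA of at most 0.06 and SRMR of at most 0.08. The helper name `check_fit` and the exact thresholds are assumptions made for illustration and are not part of the original analysis.

```r
# Fit measures copied from the output above
fit <- c(chisq = 4429.309, rmsea = 0.453, srmr = 0.317, cfi = 0.518)

# Hypothetical helper: flags each index against commonly cited cutoffs
# (assumed here: RMSEA <= 0.06, SRMR <= 0.08, CFI >= 0.95)
check_fit <- function(fit) {
  c(
    rmsea = unname(fit["rmsea"] <= 0.06),
    srmr  = unname(fit["srmr"]  <= 0.08),
    cfi   = unname(fit["cfi"]   >= 0.95)
  )
}

check_fit(fit)
```

Under these assumed cutoffs none of the three indices passes, which is in line with the poor fit discussed in the conclusion.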

Building on the research done in AT2B, I also wanted to test how soil attributes affect grape yield. However, an error message appeared, which might be due to a lack of soil data in the dataset. I therefore added the single variable lu_ph to the regression formula, which gives my model4. Surprisingly, lu_ph has the weakest standardized coefficient (Std.all) of the three predictors at -0.267, compared with -0.454 for summer temperature and -0.308 for summer rainfall.
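
The comparison of standardized coefficients can be sketched in a few lines of base R. The values are copied from the Regressions and Variances blocks of the output above; the object names `std_all`, `ranked` and `r_squared` are introduced only for this illustration. The last step assumes the usual reading that one minus the standardized residual variance of yield is the share of variance explained.

```r
# Std.all regression coefficients for yield, copied from the output above
std_all <- c(summer_Tempreture = -0.454,
             rainfall_summer   = -0.308,
             lu_ph             = -0.267)

# Order predictors by absolute standardized effect, strongest first
ranked <- std_all[order(abs(std_all), decreasing = TRUE)]
ranked

# The standardized residual variance of yield is 0.723, so the share of
# variance explained by the three predictors is 1 - 0.723
r_squared <- 1 - 0.723
r_squared
```

On these numbers lu_ph ranks last, and the three predictors together account for roughly 28% of the variance in yield.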

6. Conclusion and reflection

Within the simple SEM models 1 and 2, version 2 fits better than version 1 in both cases after the insignificant indicators were removed. However, even the improved versions still fit the data poorly, which suggests that monthly mean rainfall and temperature might not be the best inputs for an SEM. It could be better to capture prolonged periods of high or low temperature and rainfall, as the teaching team suggested in AT2B.
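
One possible way to capture a prolonged period is to count the longest run of consecutive months above (or below) a threshold. The sketch below is only an illustration of that idea: the helper `longest_run_above`, the threshold of 23 and the monthly values are all made up and do not come from the dataset.

```r
# Hypothetical helper: length of the longest run of consecutive months
# in which a series stays above a threshold
longest_run_above <- function(x, threshold) {
  runs <- rle(x > threshold)
  above <- runs$lengths[runs$values]
  if (length(above) == 0) 0L else max(above)
}

# Made-up monthly mean temperatures for one region (illustration only)
temps <- c(24, 26, 27, 22, 18, 14, 12, 13, 16, 21, 25, 28)
longest_run_above(temps, 23)
```

A feature of this kind, computed per region, could then replace or complement the monthly means as an indicator in the latent variable definitions.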

Lack of expert domain knowledge and repeated testing of hypotheses are the main limitations of the model. Other than climate factors, plenty of other factors have a significant impact on agricultural productivity. These include the size of vineyards, farmer management skills, types of fertilizer, method of irrigation and so on. These factors should also be considered in future analysis.

7. Reflection

With more time, broader literature should have been reviewed to achieve a better model, covering not only SEM and R package knowledge but also domain knowledge, which would have given me a better idea of how to structure data collection and build an SEM in this particular field. I relied quite heavily on the work we did in AT2, which might have limited the exploration of more advanced models. This report could have gone in a whole new and interesting direction if the above limitations had been addressed.

Reference

ABARES 2020, Productivity drivers, Department of Agriculture, Water and the Environment, https://www.agriculture.gov.au/abares/research-topics/productivity/productivity-drivers viewed 5 Nov 2020

David, O n.d. Grape Production in Australia, http://www.fao.org/3/x6897e/x6897e04.htm viewed 1 Nov 2020

Hu, L. & Bentler, P.M. 1999, 'Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives', Structural Equation Modeling: A Multidisciplinary Journal, vol. 6, no. 1, pp. 1–55.

Hughes, N, Lawson, K & Valle, H 2017, Farm performance and climate: Climate-adjusted productivity for broadacre cropping farms, Canberra, April. http://data.daff.gov.au/data/warehouse/9aas/2017/FarmPerformanceClimate/FarmPerformanceClimate_v1.0.0.pdf viewed 3 Nov 2020

Kenny, D. A. 2015, SEM: Fit, http://davidakenny.net/cm/fit.htm viewed 1 Nov 2020

Stephen, P n.d. Fit Indices commonly reported for CFA and SEM, Cornell Statistical Consulting Unit, viewed 2 Nov 2020

Thakkar, J. J. 2020, Structural Equation Modelling: Application for Research and Practice (with AMOS and R), Springer Singapore Pte. Limited, ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/uts/detail.action?docID=6134018 viewed 1 Nov 2020

Rosseel, Y 2012, 'lavaan: An R Package for Structural Equation Modeling', Journal of Statistical Software, vol. 48, no. 2, viewed 5 Nov 2020

Appendix

coding

library(lavaan)
library(semPlot)
library(tidyverse)
library(dplyr)
library(semTools)
library(qgraph)
library(tidyr)
library(knitr)
library(BBmisc)

# Load the regional yield data
yield_by_region <- read.csv(here::here("yield_by_region.csv"), header = TRUE)

# Standardize the variables before fitting the SEM models
yield_by_region_nor <- normalize(yield_by_region, method = "standardize",
                                 range = c(0, 1), margin = 1L,
                                 on.constant = "quiet")

# ----- Model1 - Version1
model1 <- '
  # latent variable model
  rainfall =~
    lu_mean_rain_jan +
    lu_mean_rain_feb +
    lu_mean_rain_mar +
    lu_mean_rain_apr +
    lu_mean_rain_may +
    lu_mean_rain_jun +
    lu_mean_rain_jul +
    lu_mean_rain_aug +
    lu_mean_rain_sep +
    lu_mean_rain_oct +
    lu_mean_rain_nov +
    lu_mean_rain_dec

  # regressions
  yield ~ rainfall

  # residual correlations (covariances)
'

std <- sem(model1, data = yield_by_region_nor)
semPaths(std, what = "paths", whatLabels = "stand", rotation = 1)
# Model1
summary(std, standardized = TRUE)
fitMeasures(std, c("chisq", "rmsea", "srmr", "cfi"))

# ----- Model1 - Version2
model1_1 <- '
  # latent variable model
  rainfall_summer =~
    lu_mean_rain_jan +
    lu_mean_rain_feb +
    lu_mean_rain_mar +
    lu_mean_rain_apr +
    lu_mean_rain_oct +
    lu_mean_rain_nov +
    lu_mean_rain_dec

  rainfall_winter =~
    lu_mean_rain_may +
    lu_mean_rain_jun +
    lu_mean_rain_jul +
    lu_mean_rain_aug +
    lu_mean_rain_sep

  # regressions
  yield ~ rainfall_summer + rainfall_winter

  # residual correlations (covariances)
'

std_1 <- sem(model1_1, data = yield_by_region_nor)
semPaths(std_1, what = "paths", whatLabels = "stand", rotation = 1)
# Model1
summary(std_1, standardized = TRUE)
fitMeasures(std_1, c("chisq", "rmsea", "srmr", "cfi"))
# Compare version 1 and version 2
anova(std, std_1)

# ----- Model2 version 1
model2 <- '
  # latent variable model
  Tempreture =~
    lu_mean_temp_nov + lu_mean_temp_dec + lu_mean_temp_jan +
    lu_mean_temp_feb + lu_mean_temp_mar + lu_mean_temp_apr +
    lu_mean_temp_may + lu_mean_temp_jun + lu_mean_temp_jul +
    lu_mean_temp_aug + lu_mean_temp_sep + lu_mean_temp_oct

  # regressions
  yield ~ Tempreture

  # residual correlations (covariances)
'

std2 <- sem(model2, data = yield_by_region_nor)
semPaths(std2, what = "paths", whatLabels = "stand", rotation = 1)
summary(std2, standardized = TRUE)
fitMeasures(std2, c("chisq", "rmsea", "srmr", "cfi"))

# ----- Model2 version 2
model2_1 <- '
  # latent variable model
  summer_Tempreture =~
    lu_mean_temp_nov + lu_mean_temp_dec + lu_mean_temp_jan +
    lu_mean_temp_feb + lu_mean_temp_mar + lu_mean_temp_apr +
    lu_mean_temp_oct

  winter_Tempreture =~
    lu_mean_temp_may + lu_mean_temp_jun + lu_mean_temp_jul +
    lu_mean_temp_aug + lu_mean_temp_sep

  # regressions
  yield ~ summer_Tempreture + winter_Tempreture

  # residual correlations (covariances)
'

std2_1 <- sem(model2_1, data = yield_by_region_nor)
semPaths(std2_1, what = "paths", whatLabels = "stand", rotation = 1)
summary(std2_1, standardized = TRUE)
fitMeasures(std2_1, c("chisq", "rmsea", "srmr", "cfi"))

# ----- Model3
model4 <- '
  # latent variable model
  summer_Tempreture =~
    lu_mean_temp_nov + lu_mean_temp_dec + lu_mean_temp_jan +
    lu_mean_temp_feb + lu_mean_temp_mar + lu_mean_temp_apr +
    lu_mean_temp_oct

  rainfall_summer =~
    lu_mean_rain_jan +
    lu_mean_rain_feb +
    lu_mean_rain_mar +
    lu_mean_rain_apr +
    lu_mean_rain_oct +
    lu_mean_rain_nov +
    lu_mean_rain_dec

  # regressions
  yield ~ summer_Tempreture + rainfall_summer

  # residual correlations (covariances)
'

std4 <- sem(model4, data = yield_by_region_nor)
semPaths(std4, what = "paths", whatLabels = "stand", rotation = 1)
summary(std4, standardized = TRUE)
fitMeasures(std4, c("chisq", "rmsea", "srmr", "cfi"))

# Table of factor loadings for Model3
parameterEstimates(std4, standardized = TRUE) %>%
  dplyr::filter(op == "=~") %>%
  dplyr::select(`Latent Factor` = lhs, Indicator = rhs, B = est, SE = se,
                Z = z, `p-value` = pvalue, Beta = std.all) %>%
  kable(digits = 3, format = "pandoc", caption = "Factor Loadings")

# ----- Model4
model4_1 <- '
  # latent variable model
  summer_Tempreture =~
    lu_mean_temp_nov + lu_mean_temp_dec + lu_mean_temp_jan +
    lu_mean_temp_feb + lu_mean_temp_mar + lu_mean_temp_apr +
    lu_mean_temp_oct

  rainfall_summer =~
    lu_mean_rain_jan +
    lu_mean_rain_feb +
    lu_mean_rain_mar +
    lu_mean_rain_apr +
    lu_mean_rain_oct +
    lu_mean_rain_nov +
    lu_mean_rain_dec

  # regressions
  yield ~ summer_Tempreture + rainfall_summer + lu_ph

  # residual correlations (covariances)
  lu_mean_temp_jan ~~ lu_mean_rain_jan
  lu_mean_temp_feb ~~ lu_mean_rain_feb
  lu_mean_temp_mar ~~ lu_mean_rain_mar
  lu_mean_temp_apr ~~ lu_mean_rain_apr
  lu_mean_temp_oct ~~ lu_mean_rain_oct
  lu_mean_temp_nov ~~ lu_mean_rain_nov
  lu_mean_temp_dec ~~ lu_mean_rain_dec
'

std4_1 <- sem(model4_1, data = yield_by_region_nor)
semPaths(std4_1, what = "paths", whatLabels = "stand", rotation = 1)
summary(std4_1, standardized = TRUE)
fitMeasures(std4_1, c("chisq", "rmsea", "srmr", "cfi"))

SEM Analysis - Geographical Factors and Agriculture Productivity

Wei Lin

8/11/2020
