1. Data Preparation

# Load packages and data (panel data)
library(stargazer)
data('RiceFarms', package = 'plm')

stargazer(RiceFarms, type = 'text', title = 'Table 1: Data Summary Statistics')
## 
## Table 1: Data Summary Statistics
## ========================================================
## Statistic    N      Mean      St. Dev.     Min     Max  
## --------------------------------------------------------
## id         1,026 374,954.100 164,378.900 101,001 609,245
## size       1,026    0.432       0.547     0.010   5.322 
## seed       1,026   18.206      45.251       1     1,250 
## urea       1,026   95.441      127.149      1     1,250 
## phosphate  1,026   33.728      47.588       0      700  
## pesticide  1,026   595.005    2,927.581     0    62,600 
## pseed      1,026   112.072     64.280    40.000  375.000
## purea      1,026   78.980       8.674    50.000  100.000
## pphosph    1,026   79.568       9.272    60.000  120.000
## hiredlabor 1,026   237.023     422.233      1     4,536 
## famlabor   1,026   151.470     148.116      1     1,526 
## totlabor   1,026   388.447     484.204     17     4,774 
## wage       1,026   80.423      42.189    30.000  175.350
## goutput    1,026  1,405.167   1,921.757    42    20,960 
## noutput    1,026  1,240.920   1,638.983    42    17,610 
## price      1,026   90.961      37.495    50.000  190.000
## --------------------------------------------------------

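Comments: The data contain no missing values (every variable in Table 1 has N = 1,026). However, some variables appear to contain outliers. Pesticide, for example, has a maximum of 62,600 against a mean of 595.0 and a standard deviation of 2,927.6, and seed and urea both reach 1,250 while their means are below 100.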
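Both claims can be checked directly. The following is a minimal sketch in base R. It assumes the common 1.5 * IQR rule for flagging candidate outliers, a convention chosen here and not one used in the original analysis. `count_iqr_outliers` is a hypothetical helper.

```r
# Load the data (requires the plm package to be installed)
data('RiceFarms', package = 'plm')

# Missing values per column (all columns, including factors);
# prints an empty vector when nothing is missing
na_counts <- colSums(is.na(RiceFarms))
print(na_counts[na_counts > 0])

# Hypothetical helper: count values outside [Q1 - k*IQR, Q3 + k*IQR]
count_iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- k * (q[2] - q[1])
  sum(x < q[1] - fence | x > q[2] + fence, na.rm = TRUE)
}

# Apply the rule to every numeric column except the farm identifier
num_vars <- setdiff(names(RiceFarms)[sapply(RiceFarms, is.numeric)], 'id')
outlier_counts <- sapply(RiceFarms[num_vars], count_iqr_outliers)
print(sort(outlier_counts, decreasing = TRUE))
```

Variables with large counts are candidates for closer inspection. They should not be removed automatically.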

2. Multiple Linear Regression

# Multiple linear regression model
# (factor(region) adds one dummy per region, relative to the first level)
mlr <- lm(goutput ~ phosphate +
            pphosph +
            size +
            seed +
            varieties +
            wage +
            totlabor +
            factor(region),
          data = RiceFarms)

stargazer(mlr, type = 'text')
## 
## =====================================================
##                               Dependent variable:    
##                           ---------------------------
##                                     goutput          
## -----------------------------------------------------
## phosphate                          10.296***         
##                                     (0.612)          
##                                                      
## pphosph                              1.841           
##                                     (3.791)          
##                                                      
## size                             1,714.079***        
##                                    (109.906)         
##                                                      
## seed                                 0.930           
##                                     (0.594)          
##                                                      
## varietieshigh                     203.785***         
##                                    (77.325)          
##                                                      
## varietiesmixed                     -109.855          
##                                    (104.478)         
##                                                      
## wage                               2.498***          
##                                     (0.849)          
##                                                      
## totlabor                           0.981***          
##                                     (0.110)          
##                                                      
## factor(region)langan               141.568*          
##                                    (84.964)          
##                                                      
## factor(region)gunungwangi           -45.203          
##                                    (102.699)         
##                                                      
## factor(region)malausma              -52.304          
##                                    (104.870)         
##                                                      
## factor(region)sukaambit             11.254           
##                                    (99.470)          
##                                                      
## factor(region)ciwangi               127.068          
##                                    (105.914)         
##                                                      
## Constant                          -508.665**         
##                                    (257.105)         
##                                                      
## -----------------------------------------------------
## Observations                         1,026           
## R2                                   0.878           
## Adjusted R2                          0.877           
## Residual Std. Error           674.987 (df = 1012)    
## F Statistic               561.280*** (df = 13; 1012) 
## =====================================================
## Note:                     *p<0.1; **p<0.05; ***p<0.01
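Written out, the fitted model from the table above is the equation below. The coefficients are copied from the output. Each D is a 0/1 dummy, and traditional varieties and Wargabinangun are the omitted reference categories. Because the model is linear, each slope equals the partial effect of its regressor with all other variables held fixed, and each dummy coefficient is a level shift against the reference category. The interpretations that follow rely on this reading.

```latex
\begin{aligned}
\widehat{\text{goutput}} ={} & -508.665 + 10.296\,\text{phosphate} + 1.841\,\text{pphosph} + 1714.079\,\text{size} + 0.930\,\text{seed} \\
& + 203.785\,D_{\text{high}} - 109.855\,D_{\text{mixed}} + 2.498\,\text{wage} + 0.981\,\text{totlabor} \\
& + 141.568\,D_{\text{langan}} - 45.203\,D_{\text{gunungwangi}} - 52.304\,D_{\text{malausma}} + 11.254\,D_{\text{sukaambit}} + 127.068\,D_{\text{ciwangi}}
\end{aligned}

\frac{\partial\,\widehat{\text{goutput}}}{\partial\,\text{phosphate}} = 10.296,
\qquad
\widehat{\text{goutput}}\big|_{D_{\text{high}}=1} - \widehat{\text{goutput}}\big|_{D_{\text{high}}=0} = 203.785
```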

Comments: The coefficients are interpreted below. Only coefficients that are statistically significant at the 10% level or better are discussed, and each effect holds all other variables in the model constant.

  1. Each additional kilogram of phosphate used in production is associated with about 10.3 more kilograms of gross rice output.
  2. Each additional hectare of farm size is associated with about 1,714 more kilograms of gross rice output.
  3. On average, farms using high-yielding varieties produce about 203.8 more kilograms of gross rice output than farms using traditional varieties.
  4. A one-Rupiah increase in the wage is associated with about 2.5 more kilograms of gross rice output.
  5. Each additional hour of total labor (not only hired labor) is associated with about 0.98 more kilograms of gross rice output.
  6. Compared with Wargabinangun (the reference region), farms in Langan have gross output that is higher by about 141.6 kilograms. This difference is significant only at the 10% level.
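The following minimal sketch in base R reproduces the selection of significant coefficients above and confirms the reference categories. It refits the same model and uses 10% as the significance cutoff, which matches the single star in the table. It also reports 90% confidence intervals. The object names `coefs`, `sig`, and `ci` are illustrative.

```r
# Refit the model from Section 2 (requires the plm package for the data)
data('RiceFarms', package = 'plm')
mlr <- lm(goutput ~ phosphate + pphosph + size + seed + varieties +
            wage + totlabor + factor(region), data = RiceFarms)

# Reference (omitted) levels that the dummy coefficients are compared with
print(levels(RiceFarms$varieties)[1])
print(levels(factor(RiceFarms$region))[1])

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(mlr))

# Keep rows significant at the 10% level; drop = FALSE keeps a matrix
sig <- coefs[coefs[, 'Pr(>|t|)'] < 0.1, c('Estimate', 'Pr(>|t|)'), drop = FALSE]

# 90% confidence intervals for the same coefficients
ci <- confint(mlr, parm = rownames(sig), level = 0.90)
print(round(cbind(sig, ci), 3))
```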

3. Residual Analysis

# plot residual graphs 
par(mfrow = c(2, 2))
plot(mlr)

Comments:

  1. Residuals vs. Fitted:
    • The residuals fan out, which suggests that heteroskedasticity is present in the model. The red smoother line is also concave up. Curvature in this panel points to a nonlinear relationship that the linear terms do not capture; it is the fanning, not the curvature, that indicates non-constant variance.
  2. Normal Q-Q:
    • Most residual points lie on the dashed reference line between theoretical quantiles of -2 and +2, a range that contains roughly 95% of a normal distribution. We can therefore conclude that the residuals are normally distributed, or close enough to be treated as approximately normal.
  3. Scale-Location:
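    • Quite a lot of points lie above 1 on the vertical axis. That axis plots the square root of the absolute standardized residuals, so these residuals are more than one standard deviation from zero. About a third of normally distributed residuals would exceed that level anyway, so the signal of influence from outliers comes from the points lying well above it.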
  4. Residuals vs. Leverage:
    • Five highly influential points in this panel pull the red smoother line of standardized residuals away from 0. The observations that correspond to these points should be inspected. If they turn out to be data errors or unrepresentative farms, they should be removed to reduce the bias they introduce into the model.
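The visual checks above can be backed by numbers. The following minimal sketch in base R computes the studentized (Koenker) Breusch-Pagan statistic by hand. The statistic is n times the R-squared from regressing the squared residuals on the model's regressors, and it should match `lmtest::bptest(mlr)` with its default `studentize = TRUE`. The sketch also lists the most influential observations by Cook's distance. The 4/n cutoff is a common rule of thumb assumed here, not a threshold from the original analysis.

```r
# Refit the model from Section 2 (requires the plm package for the data)
data('RiceFarms', package = 'plm')
mlr <- lm(goutput ~ phosphate + pphosph + size + seed + varieties +
            wage + totlabor + factor(region), data = RiceFarms)

# --- Studentized Breusch-Pagan test for heteroskedasticity ---
X <- model.matrix(mlr)[, -1]            # regressors without the intercept
aux <- lm(residuals(mlr)^2 ~ X)          # auxiliary regression on squared residuals
n <- nobs(mlr)
k <- ncol(X)                             # degrees of freedom of the test
bp_stat <- n * summary(aux)$r.squared    # LM statistic, chi-squared(k) under H0
bp_p <- pchisq(bp_stat, df = k, lower.tail = FALSE)
cat(sprintf('BP = %.2f, df = %d, p-value = %.4g\n', bp_stat, k, bp_p))

# --- Influence diagnostics with Cook's distance ---
cd <- cooks.distance(mlr)
cutoff <- 4 / n                          # rule-of-thumb threshold (assumption)
cat('Observations above 4/n:', sum(cd > cutoff), '\n')

# Five most influential rows; positions are valid because no rows were dropped
top5 <- order(cd, decreasing = TRUE)[1:5]
print(cbind(RiceFarms[top5, c('id', 'size', 'totlabor', 'goutput')],
            cooks_d = round(cd[top5], 3)))
```

A small p-value rejects constant variance, which would confirm the fanning seen in the Residuals vs. Fitted panel. The printed rows are the entries to inspect before any are removed.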

4. Gauss-Markov Assumptions