Title: “The impact of R&D expenditure on technology spillover and industrial growth”

Research Questions

  1. How has expenditure on R&D affected technological spillover across sectors and geographical areas?

  2. Is there a difference in R&D expenditure between Europe and other parts of the world?

  3. Does this difference imply a different level of technological spillover between geographical areas?

  4. Can we conclude that spillover in one geographical area is greater than in another?

Methodology

Data Collection

Importing the data set

The data set consists of 1629 observations of 7 variables.

Checking the structure of the data

## Classes 'tbl_df', 'tbl' and 'data.frame':    1629 obs. of  7 variables:
##  $ year  : num  1983 1983 1983 1983 1983 ...
##  $ fi    : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ sector: num  4 5 2 2 11 5 1 11 3 2 ...
##  $ geo   : num  3 3 3 1 4 1 3 3 3 3 ...
##  $ patent: num  18 4 29 45 1 0 1 0 0 47 ...
##  $ rdexp : num  5.29 4.31 3.76 5.87 4.21 ...
##  $ spil  : num  8.98 10.42 9.65 9.63 8.7 ...

Extract of the first 5 observations

## # A tibble: 5 x 7
##    year    fi sector   geo patent rdexp  spil
##   <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1  1983     1      4     3     18  5.29  8.98
## 2  1983     2      5     3      4  4.31 10.4 
## 3  1983     3      2     3     29  3.76  9.65
## 4  1983     4      2     1     45  5.87  9.63
## 5  1983     5     11     4      1  4.21  8.70

Checking for missing values

## [1] 0

This shows that there are no missing values.
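
The single 0 printed above is consistent with a total count of missing cells, such as `sum(is.na(data))` would return. A per-column count shows where any gaps would be. A minimal base-R sketch, using a hypothetical data frame `toy` with one deliberately missing `patent` value in place of the real data:

```r
# Hypothetical toy data with one missing value, standing in for `data`
toy <- data.frame(
  year   = c(1983, 1983, 1983),
  patent = c(18, NA, 29),
  rdexp  = c(5.29, 4.31, 3.76)
)

# Total number of missing cells across the whole data frame
total_na <- sum(is.na(toy))

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

print(total_na)
print(na_per_column)
```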

Note: fi, sector and geo are categorical variables, but they are encoded as numbers for the purpose of analysis.
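
Because `lm()` treats a numeric column as a continuous predictor, a numeric geo code gets a single slope, which implies that moving from code 1 to code 2 has the same effect as moving from code 3 to code 4. Converting the code with `factor()` makes `lm()` create one dummy variable per non-reference level instead. A minimal sketch on hypothetical geo codes, using `model.matrix()` to show the columns `lm()` would build in each case:

```r
# Hypothetical geo codes (1 to 4), as used in the data set
toy <- data.frame(geo = c(1, 2, 3, 4, 3, 3))

# Numeric coding: an intercept plus one slope for geo
x_numeric <- model.matrix(~ geo, data = toy)

# Factor coding: an intercept plus one dummy per level except the reference level "1"
toy$geo_f <- factor(toy$geo)
x_factor <- model.matrix(~ geo_f, data = toy)

print(colnames(x_numeric))
print(colnames(x_factor))
```

On the real data this would mean, for example, `data$geo <- factor(data$geo)` and `data$sector <- factor(data$sector)` before fitting; each geo coefficient is then the difference in expected spil relative to the reference area.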

Summary of the data

##       year            fi          sector           geo       
##  Min.   :1983   Min.   :  1   Min.   : 1.00   Min.   :1.000  
##  1st Qu.:1985   1st Qu.: 46   1st Qu.: 3.00   1st Qu.:3.000  
##  Median :1987   Median : 91   Median : 5.00   Median :3.000  
##  Mean   :1987   Mean   : 91   Mean   : 6.32   Mean   :2.641  
##  3rd Qu.:1989   3rd Qu.:136   3rd Qu.: 9.00   3rd Qu.:3.000  
##  Max.   :1991   Max.   :181   Max.   :15.00   Max.   :4.000  
##      patent           rdexp             spil       
##  Min.   :  0.00   Min.   :0.8651   Min.   : 6.825  
##  1st Qu.:  3.00   1st Qu.:4.1617   1st Qu.: 8.864  
##  Median : 18.00   Median :5.0752   Median : 9.619  
##  Mean   : 60.79   Mean   :5.2013   Mean   : 9.399  
##  3rd Qu.: 57.00   3rd Qu.:6.0911   3rd Qu.: 9.980  
##  Max.   :925.00   Max.   :8.7000   Max.   :10.759
##        vars    n    mean     sd  median trimmed   mad     min     max
## year      1 1629 1987.00   2.58 1987.00 1987.00  2.97 1983.00 1991.00
## fi        2 1629   91.00  52.27   91.00   91.00 66.72    1.00  181.00
## sector    3 1629    6.32   4.23    5.00    5.89  4.45    1.00   15.00
## geo       4 1629    2.64   0.73    3.00    2.79  0.00    1.00    4.00
## patent    5 1629   60.79 121.56   18.00   31.13 25.20    0.00  925.00
## rdexp     6 1629    5.20   1.26    5.08    5.12  1.43    0.87    8.70
## spil      7 1629    9.40   0.93    9.62    9.46  0.89    6.82   10.76
##         range  skew kurtosis   se
## year     8.00  0.00    -1.23 0.06
## fi     180.00  0.00    -1.20 1.29
## sector  14.00  0.71    -0.65 0.10
## geo      3.00 -1.58     0.86 0.02
## patent 925.00  3.84    17.36 3.01
## rdexp    7.83  0.46    -0.46 0.03
## spil     3.93 -0.65    -0.29 0.02

Exploratory Data Analysis

Checking the distribution of the dataset

Boxplot: To check for outliers

Correlation Coefficients

The correlation table shows a moderate positive correlation (r ≈ 0.55) between patent and rdexp, which suggests a possible multicollinearity problem if both variables are included in the model. rdexp, on the other hand, has a correlation of about 0.35 with spil, the highest among the predictors.
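
Whether a correlation of 0.55 is large enough to cause trouble can be checked with variance inflation factors (VIF), which measure how much each coefficient's variance is inflated by the other predictors. The VIFs are the diagonal elements of the inverse of the predictors' correlation matrix. The sketch below applies this to the reported patent-rdexp correlation alone, where it reduces to 1/(1 - 0.55^2), about 1.43, well under the common rule-of-thumb thresholds of 5 or 10. The helper name `vif_from_cor` is hypothetical.

```r
# VIF_j is the j-th diagonal element of the inverse predictor correlation matrix
vif_from_cor <- function(cor_mat) {
  setNames(diag(solve(cor_mat)), colnames(cor_mat))
}

# Illustration using only the reported patent-rdexp correlation (r = 0.55)
vars <- c("patent", "rdexp")
cor_mat <- matrix(c(1, 0.55, 0.55, 1), nrow = 2, dimnames = list(vars, vars))

vifs <- vif_from_cor(cor_mat)
print(round(vifs, 3))
```

On the real data the same helper could be applied to `cor(data[, c("year", "fi", "sector", "geo", "patent", "rdexp")])`; the car package's `vif()` computes the same quantities from a fitted `lm` object.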

Totals of rdexp and spil by geographical area, sector and year

## # A tibble: 4 x 3
##     geo sum_rdexp sum_spil
##   <dbl>     <dbl>    <dbl>
## 1     1    1460.    2378. 
## 2     2     707.    1083. 
## 3     3    6264.   11771. 
## 4     4      41.3     78.8
## # A tibble: 15 x 3
##    sector sum_rdexp sum_spil
##     <dbl>     <dbl>    <dbl>
##  1      1     573.     1105.
##  2      2    1295.     2477.
##  3      3     983.     1724.
##  4      4     628.     1080.
##  5      5    1514.     2764.
##  6      6     334.      638.
##  7      7     518.      787.
##  8      8     122.      210.
##  9      9     511.     1038.
## 10     10     391.      826.
## 11     11     124.      236.
## 12     12     342.      599.
## 13     13     173.      324.
## 14     14      67.5     173.
## 15     15     897.     1331.
## # A tibble: 9 x 3
##    year sum_rdexp sum_spil
##   <dbl>     <dbl>    <dbl>
## 1  1983      892.    1664.
## 2  1984      912.    1678.
## 3  1985      929.    1692.
## 4  1986      937.    1696.
## 5  1987      946.    1701.
## 6  1988      957.    1711.
## 7  1989      962.    1715.
## 8  1990      969.    1725.
## 9  1991      969.    1731.
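
The three tables above sum rdexp and spil within each geographical area, sector and year. They may have been built with dplyr's `group_by()` and `summarise()`; the sketch below does the same in base R with `aggregate()`, and also counts the observations in each group, since a group's totals grow with its size. The helper `sum_by` and the data frame `toy` are hypothetical.

```r
# Sum rdexp and spil within each level of `group`, plus the group size
sum_by <- function(df, group) {
  keys <- setNames(list(df[[group]]), group)
  out <- aggregate(df[c("rdexp", "spil")], by = keys, FUN = sum)
  names(out)[2:3] <- c("sum_rdexp", "sum_spil")
  # Number of observations per group, because totals scale with group size
  out$n <- aggregate(df["rdexp"], by = keys, FUN = length)$rdexp
  out
}

# Hypothetical toy data standing in for `data`
toy <- data.frame(
  geo   = c(1, 3, 3, 4),
  rdexp = c(5, 4, 6, 3),
  spil  = c(9, 10, 9.5, 8)
)

res <- sum_by(toy, "geo")
print(res)
```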

Bar Chart

How has expenditure on R&D affected technological spillover across sectors and geographical areas?

The chart suggests that sectors with higher total R&D expenditure also have higher total technological spillover. Sector 1, for instance, has a total R&D expenditure (sum_rdexp) of about 573 and a total spillover (sum_spil) of about 1105. Because these are sums over all firm-year observations in a sector, they also grow with the number of observations, so the chart shows an association between totals rather than direct evidence that higher spending in a sector raises its spillover.
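
To put a number on the pattern in the chart, the sketch below re-enters the 15 sector totals from the sector table (rounded as printed) and computes their Pearson correlation with `cor()`. The data frame name `sector_totals` is hypothetical. Because both totals scale with the number of firm-year observations in each sector, this correlation comes out much higher than the observation-level correlation of about 0.35 between rdexp and spil, and it should not be read as the strength of the firm-level relationship.

```r
# Sector totals copied from the sector table above (rounded as printed)
sector_totals <- data.frame(
  sector    = 1:15,
  sum_rdexp = c(573, 1295, 983, 628, 1514, 334, 518, 122, 511, 391,
                124, 342, 173, 67.5, 897),
  sum_spil  = c(1105, 2477, 1724, 1080, 2764, 638, 787, 210, 1038, 826,
                236, 599, 324, 173, 1331)
)

# Pearson correlation between total R&D expenditure and total spillover
r_sector <- cor(sector_totals$sum_rdexp, sector_totals$sum_spil)
print(round(r_sector, 3))
```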

Scatter Plot of R&D Expenditure and Spillover


Scatter Plot of R&D Expenditure and Spillover over the years under review

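library(ggplot2)
# Yr: yearly totals with columns year, sum_rdexp and sum_spil (see the year table above)

# Total spillover per year with a fitted linear trend
ggplot(Yr, aes(year, sum_spil)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, colour = "red", fill = "blue", alpha = 0.1) +
  labs(x = "Year", y = "Total spillover (sum_spil)")

# Total R&D expenditure per year with a fitted linear trend
ggplot(Yr, aes(year, sum_rdexp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, colour = "red", fill = "blue", alpha = 0.1) +
  labs(x = "Year", y = "Total R&D expenditure (sum_rdexp)")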
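
The lines drawn by `geom_smooth(method = "lm")` are ordinary least-squares trends, so their slopes can be read off numerically by fitting the same model with `lm()`. The sketch below re-enters the nine yearly totals from the year table (rounded as printed), so its slopes are approximate; the name `yr_totals` is hypothetical. With only nine points and a steady upward drift in both series, a shared time trend alone would make the two totals move together.

```r
# Yearly totals copied from the year table above (rounded as printed)
yr_totals <- data.frame(
  year      = 1983:1991,
  sum_rdexp = c(892, 912, 929, 937, 946, 957, 962, 969, 969),
  sum_spil  = c(1664, 1678, 1692, 1696, 1701, 1711, 1715, 1725, 1731)
)

# Slope of each linear trend, i.e. the estimated change in the total per year
rdexp_trend <- coef(lm(sum_rdexp ~ year, data = yr_totals))[["year"]]
spil_trend  <- coef(lm(sum_spil ~ year, data = yr_totals))[["year"]]

print(c(rdexp_per_year = rdexp_trend, spil_per_year = spil_trend))
```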

Linear Regression

## 
## Call:
## lm(formula = spil ~ ., data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.48192 -0.43267  0.05886  0.46771  1.83142 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -6.174e+01  1.467e+01  -4.209 2.71e-05 ***
## year         3.586e-02  7.389e-03   4.853 1.33e-06 ***
## fi          -4.228e-04  3.649e-04  -1.159    0.247    
## sector      -9.342e-02  4.621e-03 -20.214  < 2e-16 ***
## geo         -2.777e-01  2.858e-02  -9.714  < 2e-16 ***
## patent      -1.335e-03  1.934e-04  -6.904 7.24e-12 ***
## rdexp        2.568e-01  1.853e-02  13.855  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7639 on 1622 degrees of freedom
## Multiple R-squared:  0.3206, Adjusted R-squared:  0.3181 
## F-statistic: 127.6 on 6 and 1622 DF,  p-value: < 2.2e-16

New Model

# Refit without fi, the only predictor that was not significant (p = 0.247)
regressor <- lm(spil ~ year + sector + geo + patent + rdexp, data = data)
summary(regressor)
## 
## Call:
## lm(formula = spil ~ year + sector + geo + patent + rdexp, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.46087 -0.43013  0.06179  0.47028  1.83642 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -6.160e+01  1.467e+01  -4.199 2.83e-05 ***
## year         3.576e-02  7.389e-03   4.839 1.43e-06 ***
## sector      -9.343e-02  4.622e-03 -20.214  < 2e-16 ***
## geo         -2.743e-01  2.844e-02  -9.645  < 2e-16 ***
## patent      -1.331e-03  1.934e-04  -6.881 8.48e-12 ***
## rdexp        2.583e-01  1.849e-02  13.972  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7639 on 1623 degrees of freedom
## Multiple R-squared:   0.32,  Adjusted R-squared:  0.3179 
## F-statistic: 152.8 on 5 and 1623 DF,  p-value: < 2.2e-16
# Diagnostic plots: residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
plot(regressor)

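The model is statistically significant overall, since the p-value of its F-test is far below 0.05 (p-value: < 2.2e-16). The multiple R-squared of 0.32 (Adjusted R-squared: 0.3179) means that the model explains only about 32% of the variation in technological spillover; the remaining 68% is due to factors outside the model. Each predictor is individually significant, with a p-value far below 0.05. Holding the other variables constant, a one-unit increase in rdexp is associated with an increase of about 0.26 in spil. Because sector and geo enter the model as numeric codes, their coefficients describe a linear trend across arbitrary category codes rather than differences between particular sectors or geographical areas.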
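
The figures in the summary are linked by standard formulas: adjusted R-squared = 1 - (1 - R^2)(n - 1)/(n - p - 1) and F = (R^2/p) / ((1 - R^2)/(n - p - 1)), with n = 1629 observations and p = 5 predictors. Plugging in the reported R-squared of 0.32 reproduces the adjusted R-squared of 0.3179, the 1623 residual degrees of freedom and the F-statistic of about 152.8. The helper names `adj_r2` and `f_stat` are hypothetical; the p-value uses base R's `pf()`.

```r
# Adjusted R-squared for n observations and p predictors (excluding the intercept)
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F statistic for H0: all slope coefficients are zero
f_stat <- function(r2, n, p) (r2 / p) / ((1 - r2) / (n - p - 1))

# Values reported for the new model: R-squared 0.32, n = 1629, p = 5
n <- 1629
p <- 5
adj_new <- adj_r2(0.32, n, p)
f_new   <- f_stat(0.32, n, p)

# Upper-tail p-value of the F test with (p, n - p - 1) degrees of freedom
p_new <- pf(f_new, df1 = p, df2 = n - p - 1, lower.tail = FALSE)

print(c(adj_r2 = adj_new, F = f_new, p_value = p_new))
```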

Conclusion

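  1. Based on this model, R&D expenditure is a positive and highly significant predictor of technological spillover (coefficient 0.258, t = 13.97, p < 2e-16). This suggests that R&D expenditure has a substantial impact on spillover, although the model as a whole explains only about 32% of the variation in spillover.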