Evaluating the impact of University-Industry Collaborations (UICs) on knowledge diffusion in the UK (2015-2022)

Research question:

How do the different types of UICs compare in their efficiency at promoting knowledge transfer and innovation?

Expected outcomes (hypothesis):

Some definitions:

About the data:

For data on income from the different types of UICs, we used the Higher Education Statistics Agency (HESA) databases, available here: https://www.hesa.ac.uk/collection/c19032/hebci_a_questions

For data on publications, we downloaded records from SciVal (filtered to UK publications in English between 2015 and 2022), available here: https://www.scival.com/home

rm(list = ls())

library(MASS)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
## 
##     select
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(lmtest)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(sandwich)
library(corrplot)
## corrplot 0.95 loaded
library(stargazer)
## 
## Please cite as:
##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(broom)
library(knitr)
library(patchwork)
## Warning: package 'patchwork' was built under R version 4.4.2
## 
## Attaching package: 'patchwork'
## The following object is masked from 'package:MASS':
## 
##     area
Data <- read.csv("https://raw.githubusercontent.com/martinayazllizakalik/econometrics-project/refs/heads/main/Database%20HE%20providers%20uk%202015%202022%20%20UICs%2C%20staff%20and%20income.csv")

 Data <- as_tibble(Data)
 
colnames(Data)
## [1] "HE.Provider"                                                       
## [2] "No.of.publications"                                                
## [3] "Academic.Year..start."                                             
## [4] "Business.and.Community.Servicies..Income."                         
## [5] "Continous.Profesional.Development.and.Continous.Education..income."
## [6] "Collaborative.Research..income."                                   
## [7] "Intelectual.Property.Rights..income."                              
## [8] "Total.comprehensive.income.for.the.year"                           
## [9] "Total.staff"
colnames(Data) <- c(
  "HE.Provider", "No of publications", "Academic.Year..start.",
  "Business and Community Servicies (Income)",
  "Continous Profesional Development and Continous Education (income)",
  "Collaborative Research (income)", "Intelectual Property Rights (income)",
  "Total.comprehensive.income.for.the.year", "Total.staff"
)

A brief description of the columns:

  1. HE provider: universities in the UK (England, Northern Ireland, Scotland and Wales).
  2. Academic year (start): from 2015 to 2022.
  3. N° of publications: scholarly output for all members of staff.
  4. Business and Community Services income.
  5. Continuing Professional Development income.
  6. Collaborative Research income.
  7. Intellectual Property Rights income.
  8. Total income (including income from UICs).
  9. Total staff: number of staff employed, either part time or full time.

Let's start with some descriptive statistics.

# Keep only the outcome and the four UIC income variables
desc_vars <- Data %>%
  select(`No of publications`, `Business and Community Servicies (Income)`, `Continous Profesional Development and Continous Education (income)`, `Collaborative Research (income)`, `Intelectual Property Rights (income)`)
stargazer(as.data.frame(desc_vars),
          type = "text",
          title = "Table: Descriptive statistics of some variables")
## 
## Table: Descriptive statistics of some variables
## ==========================================================================================================
## Statistic                                                            N      Mean     St. Dev.  Min   Max  
## ----------------------------------------------------------------------------------------------------------
## No of publications                                                 1,180 1,793.122  3,080.457   1  20,402 
## Business and Community Servicies (Income)                          1,168 14,333.180 28,886.080  0  296,893
## Continous Profesional Development and Continous Education (income) 1,168 33,123.780 52,959.430  0  564,869
## Collaborative Research (income)                                    1,168 10,714.730 20,286.230  0  117,677
## Intelectual Property Rights (income)                               1,168 1,049.897  5,574.154   0  75,382 
## ----------------------------------------------------------------------------------------------------------
# covariate.labels = c("Publications", "BnC", "CPD", "ColRes", "IP") only takes effect when passed inside the stargazer() call

Adding a Russell Group column

russell_group_universities <- c(
  "The University of York", "The University of Warwick", "The University of Southampton",
  "The University of Sheffield", "Queen Mary University of London", "The University of Oxford",
  "University of Nottingham", "Newcastle University", "The University of Manchester",
  "The University of Liverpool", "The University of Leeds",
  # This provider name contains a comma, so it must stay a single string to match
  "Imperial College of Science, Technology and Medicine",
  "The University of Glasgow", "The University of Exeter", "The University of Edinburgh",
  "University of Durham", "Cardiff University", "The University of Cambridge",
  "The University of Bristol", "The University of Birmingham", "Queen's University Belfast"
)
Data <- Data %>%
  mutate(russell_group = ifelse(`HE.Provider` %in% russell_group_universities, 1, 0))
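
The indicator above relies on exact string matching with %in%. The following is a minimal sketch, using base R only, of how that matching behaves when a provider name contains a comma. "Some Other University" and the vector names are hypothetical and exist only for illustration.

```r
# Toy provider names; "Some Other University" is a made-up placeholder
providers <- c("Imperial College of Science, Technology and Medicine",
               "The University of York",
               "Some Other University")

# Membership list with the comma-containing name kept as one string
members_single <- c("Imperial College of Science, Technology and Medicine",
                    "The University of York")

# Membership list where the same name is split at the comma into two strings
members_split <- c("Imperial College of Science", "Technology and Medicine",
                   "The University of York")

dummy_single <- ifelse(providers %in% members_single, 1, 0)
dummy_split  <- ifelse(providers %in% members_split, 1, 0)

dummy_single  # 1 1 0
dummy_split   # 0 1 0: the split name no longer matches
```

Because %in% compares whole strings, any difference in spelling or punctuation silently produces a 0, so it can be worth checking sum(Data$russell_group) against the expected number of rows.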

Since we want to compare HE providers with one another, we average each variable over all the years in our database. We group the data by HE.Provider and summarise every numeric column by its mean.

grouped_data <- Data %>%
  group_by(`HE.Provider`) %>%
  summarise(across(where(is.numeric), \(x) mean(x, na.rm = TRUE)))
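
To make the averaging step concrete, here is a small sketch on a hypothetical toy panel (the data frame toy and its columns are invented for illustration). It uses the same group_by() and summarise(across(...)) pattern and shows that missing values are dropped by na.rm = TRUE, and that every numeric column is averaged, including the year.

```r
library(dplyr)

# Hypothetical toy panel: two providers observed in two years
toy <- data.frame(
  provider = c("A", "A", "B", "B"),
  year     = c(2015, 2016, 2015, 2016),
  pubs     = c(10, 20, 100, NA)
)

# Same pattern as above: one row per provider, mean of each numeric column
toy_means <- toy %>%
  group_by(provider) %>%
  summarise(across(where(is.numeric), \(x) mean(x, na.rm = TRUE)))

toy_means
# Provider A: year 2015.5, pubs 15
# Provider B: year 2015.5, pubs 100 (the NA is ignored)
```

One consequence of this design is that providers with missing years are averaged over fewer observations than the others.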

To see the general tendency of our data, we plot a correlation matrix.

selected.Data <- grouped_data %>% select(`No of publications`, `Business and Community Servicies (Income)`, `Continous Profesional Development and Continous Education (income)`, `Collaborative Research (income)`, `Intelectual Property Rights (income)`)

corr.matrix <- cor(selected.Data)
 corr.matrix
##                                                                    No of publications
## No of publications                                                          1.0000000
## Business and Community Servicies (Income)                                   0.9197700
## Continous Profesional Development and Continous Education (income)          0.3603748
## Collaborative Research (income)                                             0.7898532
## Intelectual Property Rights (income)                                        0.6080855
##                                                                    Business and Community Servicies (Income)
## No of publications                                                                                 0.9197700
## Business and Community Servicies (Income)                                                          1.0000000
## Continous Profesional Development and Continous Education (income)                                 0.3104401
## Collaborative Research (income)                                                                    0.6899733
## Intelectual Property Rights (income)                                                               0.7550289
##                                                                    Continous Profesional Development and Continous Education (income)
## No of publications                                                                                                          0.3603748
## Business and Community Servicies (Income)                                                                                   0.3104401
## Continous Profesional Development and Continous Education (income)                                                          1.0000000
## Collaborative Research (income)                                                                                             0.3180354
## Intelectual Property Rights (income)                                                                                        0.2275546
##                                                                    Collaborative Research (income)
## No of publications                                                                       0.7898532
## Business and Community Servicies (Income)                                                0.6899733
## Continous Profesional Development and Continous Education (income)                       0.3180354
## Collaborative Research (income)                                                          1.0000000
## Intelectual Property Rights (income)                                                     0.3855972
##                                                                    Intelectual Property Rights (income)
## No of publications                                                                            0.6080855
## Business and Community Servicies (Income)                                                     0.7550289
## Continous Profesional Development and Continous Education (income)                            0.2275546
## Collaborative Research (income)                                                               0.3855972
## Intelectual Property Rights (income)                                                          1.0000000
 colnames(corr.matrix) <- c("Publications", "BnC", "CPD", "ColRes", "IP")
 rownames(corr.matrix) <- c("Publications", "BnC", "CPD", "ColRes", "IP")
 
 corrplot(corr.matrix, method = 'square')

We can start to see that Business and Community Services income has the strongest correlation (0.92) with our variable of interest, “N° of Publications”. Collaborative Research income, as expected, also has a positive correlation (0.79). Now we can plot each explanatory variable against our Y.

# linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
plotBnC <- ggplot(grouped_data, aes(x = `Business and Community Servicies (Income)`, y = `No of publications`)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  labs(x = "BnC",
       y = "N° of Publications",
       title = "Linear regression line BnC")
plotCPD <- ggplot(grouped_data, aes(x = `Continous Profesional Development and Continous Education (income)`, y = `No of publications`)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  labs(x = "CPD",
       y = "N° of Publications",
       title = "Linear regression line CPD")
plotColRes <- ggplot(grouped_data, aes(x = `Collaborative Research (income)`, y = `No of publications`)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  labs(x = "ColRes",
       y = "N° of Publications",
       title = "Linear regression line ColRes")
plotIP <- ggplot(grouped_data, aes(x = `Intelectual Property Rights (income)`, y = `No of publications`)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  labs(x = "IP",
       y = "N° of Publications",
       title = "Linear regression line IP")
combined_plot <- (plotBnC | plotCPD) / (plotColRes | plotIP)
combined_plot
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

Next, we estimate our model without any control variables.

names(grouped_data)
##  [1] "HE.Provider"                                                       
##  [2] "No of publications"                                                
##  [3] "Academic.Year..start."                                             
##  [4] "Business and Community Servicies (Income)"                         
##  [5] "Continous Profesional Development and Continous Education (income)"
##  [6] "Collaborative Research (income)"                                   
##  [7] "Intelectual Property Rights (income)"                              
##  [8] "Total.comprehensive.income.for.the.year"                           
##  [9] "Total.staff"                                                       
## [10] "russell_group"
lm.multiple <- lm(`No of publications` ~ `Business and Community Servicies (Income)`+ `Continous Profesional Development and Continous Education (income)`+ `Collaborative Research (income)`+ `Intelectual Property Rights (income)`, data = grouped_data)

 summary(lm.multiple)
## 
## Call:
## lm(formula = `No of publications` ~ `Business and Community Servicies (Income)` + 
##     `Continous Profesional Development and Continous Education (income)` + 
##     `Collaborative Research (income)` + `Intelectual Property Rights (income)`, 
##     data = grouped_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3724.7  -287.8   -47.0   243.1  5181.1 
## 
## Coefficients:
##                                                                        Estimate
## (Intercept)                                                           51.443432
## `Business and Community Servicies (Income)`                            0.089911
## `Continous Profesional Development and Continous Education (income)`   0.003534
## `Collaborative Research (income)`                                      0.040861
## `Intelectual Property Rights (income)`                                -0.076532
##                                                                      Std. Error
## (Intercept)                                                          102.650013
## `Business and Community Servicies (Income)`                            0.005852
## `Continous Profesional Development and Continous Education (income)`   0.001873
## `Collaborative Research (income)`                                      0.006125
## `Intelectual Property Rights (income)`                                 0.026641
##                                                                      t value
## (Intercept)                                                            0.501
## `Business and Community Servicies (Income)`                           15.363
## `Continous Profesional Development and Continous Education (income)`   1.887
## `Collaborative Research (income)`                                      6.671
## `Intelectual Property Rights (income)`                                -2.873
##                                                                      Pr(>|t|)
## (Intercept)                                                           0.61702
## `Business and Community Servicies (Income)`                           < 2e-16
## `Continous Profesional Development and Continous Education (income)`  0.06116
## `Collaborative Research (income)`                                     4.9e-10
## `Intelectual Property Rights (income)`                                0.00468
##                                                                         
## (Intercept)                                                             
## `Business and Community Servicies (Income)`                          ***
## `Continous Profesional Development and Continous Education (income)` .  
## `Collaborative Research (income)`                                    ***
## `Intelectual Property Rights (income)`                               ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 976.4 on 146 degrees of freedom
## Multiple R-squared:  0.8998, Adjusted R-squared:  0.8971 
## F-statistic: 327.9 on 4 and 146 DF,  p-value: < 2.2e-16
lm_results <- tidy(lm.multiple)
lm_results

It would seem that:

  1. Business and Community Services income is significantly associated with N° of publications (p < 2e-16).
  2. CPD income shows only weak evidence of an association (p = 0.061).
  3. Collaborative Research income has a positive and significant association (p = 4.9e-10).
  4. Intellectual Property Rights income has a negative and significant association (p = 0.0047).
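
To give a sense of magnitude, the level coefficients can be read as the change in average publications associated with a one-unit change in income, holding the other income types fixed. The short calculation below is a sketch of that arithmetic for the Business and Community Services estimate; the increase of 1,000 income units is an arbitrary illustrative choice, and the unit is whatever the HESA income columns are reported in.

```r
# Estimate for Business and Community Services income, copied from the summary above
b_BnC <- 0.089911

# Illustrative (assumed) increase, in the same units as the income column
income_increase <- 1000

# Associated change in the average number of publications, other regressors held fixed
extra_pubs <- b_BnC * income_increase
extra_pubs  # about 89.9 publications
```

This is an association in cross-sectional averages, not an estimate of what a given university would gain from extra income.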

What happens when we control for university size, university income, and Russell Group membership?

lm.multiple2 <- lm(`No of publications` ~ `Business and Community Servicies (Income)`+ `Continous Profesional Development and Continous Education (income)`+ `Collaborative Research (income)`+ `Intelectual Property Rights (income)`+ `Total.comprehensive.income.for.the.year` + `Total.staff` + `russell_group`, data = grouped_data)

 summary(lm.multiple2)
## 
## Call:
## lm(formula = `No of publications` ~ `Business and Community Servicies (Income)` + 
##     `Continous Profesional Development and Continous Education (income)` + 
##     `Collaborative Research (income)` + `Intelectual Property Rights (income)` + 
##     Total.comprehensive.income.for.the.year + Total.staff + russell_group, 
##     data = grouped_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1959.5  -285.0    80.8   310.7  3744.2 
## 
## Coefficients:
##                                                                        Estimate
## (Intercept)                                                          -4.863e+02
## `Business and Community Servicies (Income)`                           4.239e-02
## `Continous Profesional Development and Continous Education (income)` -2.066e-03
## `Collaborative Research (income)`                                     2.730e-02
## `Intelectual Property Rights (income)`                               -7.240e-02
## Total.comprehensive.income.for.the.year                               1.175e-03
## Total.staff                                                           2.279e-02
## russell_group                                                        -7.980e+02
##                                                                      Std. Error
## (Intercept)                                                           9.787e+01
## `Business and Community Servicies (Income)`                           6.913e-03
## `Continous Profesional Development and Continous Education (income)`  1.501e-03
## `Collaborative Research (income)`                                     5.151e-03
## `Intelectual Property Rights (income)`                                2.069e-02
## Total.comprehensive.income.for.the.year                               2.410e-04
## Total.staff                                                           1.008e-02
## russell_group                                                         2.559e+02
##                                                                      t value
## (Intercept)                                                           -4.969
## `Business and Community Servicies (Income)`                            6.132
## `Continous Profesional Development and Continous Education (income)`  -1.377
## `Collaborative Research (income)`                                      5.299
## `Intelectual Property Rights (income)`                                -3.499
## Total.comprehensive.income.for.the.year                                4.875
## Total.staff                                                            2.262
## russell_group                                                         -3.118
##                                                                      Pr(>|t|)
## (Intercept)                                                          1.98e-06
## `Business and Community Servicies (Income)`                          8.73e-09
## `Continous Profesional Development and Continous Education (income)`  0.17088
## `Collaborative Research (income)`                                    4.53e-07
## `Intelectual Property Rights (income)`                                0.00063
## Total.comprehensive.income.for.the.year                              2.97e-06
## Total.staff                                                           0.02527
## russell_group                                                         0.00222
##                                                                         
## (Intercept)                                                          ***
## `Business and Community Servicies (Income)`                          ***
## `Continous Profesional Development and Continous Education (income)`    
## `Collaborative Research (income)`                                    ***
## `Intelectual Property Rights (income)`                               ***
## Total.comprehensive.income.for.the.year                              ***
## Total.staff                                                          *  
## russell_group                                                        ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 719.1 on 137 degrees of freedom
##   (6 observations deleted due to missingness)
## Multiple R-squared:  0.9484, Adjusted R-squared:  0.9457 
## F-statistic: 359.6 on 7 and 137 DF,  p-value: < 2.2e-16
 lm_results <- tidy(lm.multiple2)
lm_results

Some conclusions:

  1. R² = 0.9484: The model explains 94.84% of the variance in No of publications.
  2. Business and Community Services (Income): Positive coefficient (0.04239, p < 0.001); significantly associated with more publications.
  3. Collaborative Research (Income): Positive coefficient (0.0273, p < 0.001); also significant.
  4. Intellectual Property Rights (Income): Negative coefficient (-0.0724, p < 0.001); this may point to a trade-off with academic publishing.
  5. Continuing Professional Development (Income): Negative coefficient (-0.002066, p = 0.17); not statistically significant.
  6. Total Comprehensive Income: Positive coefficient (0.001175, p < 0.001); small per unit of income but significant.
  7. Total Staff: Positive coefficient (0.02279, p < 0.05); more staff is associated with higher publication output.
  8. Russell Group: Negative coefficient (-798.0, p < 0.01); conditional on income and staff, membership is associated with fewer publications, possibly reflecting different priorities.

Let's show both models in the same table.

stargazer(lm.multiple, lm.multiple2, type = "text", align = TRUE)
## 
## ======================================================================================================================
##                                                                                     Dependent variable:               
##                                                                      -------------------------------------------------
##                                                                                    `No of publications`               
##                                                                                (1)                      (2)           
## ----------------------------------------------------------------------------------------------------------------------
## `Business and Community Servicies (Income)`                                  0.090***                 0.042***        
##                                                                              (0.006)                  (0.007)         
##                                                                                                                       
## `Continous Profesional Development and Continous Education (income)`          0.004*                   -0.002         
##                                                                              (0.002)                  (0.002)         
##                                                                                                                       
## `Collaborative Research (income)`                                            0.041***                 0.027***        
##                                                                              (0.006)                  (0.005)         
##                                                                                                                       
## `Intelectual Property Rights (income)`                                      -0.077***                -0.072***        
##                                                                              (0.027)                  (0.021)         
##                                                                                                                       
## Total.comprehensive.income.for.the.year                                                               0.001***        
##                                                                                                       (0.0002)        
##                                                                                                                       
## Total.staff                                                                                           0.023**         
##                                                                                                       (0.010)         
##                                                                                                                       
## russell_group                                                                                       -798.015***       
##                                                                                                      (255.939)        
##                                                                                                                       
## Constant                                                                      51.443                -486.282***       
##                                                                             (102.650)                 (97.867)        
##                                                                                                                       
## ----------------------------------------------------------------------------------------------------------------------
## Observations                                                                   151                      145           
## R2                                                                            0.900                    0.948          
## Adjusted R2                                                                   0.897                    0.946          
## Residual Std. Error                                                     976.449 (df = 146)       719.096 (df = 137)   
## F Statistic                                                          327.893*** (df = 4; 146) 359.600*** (df = 7; 137)
## ======================================================================================================================
## Note:                                                                                      *p<0.1; **p<0.05; ***p<0.01

To check the robustness of our models, we plot the residuals against the fitted values.

 plot(lm.multiple, which = c(1))

 plot(lm.multiple2, which = c(1))

The residual plots suggest heteroskedasticity: a non-constant variance in the residuals. We therefore test for it more formally.

bptest(lm.multiple2)
## 
##  studentized Breusch-Pagan test
## 
## data:  lm.multiple2
## BP = 58.336, df = 7, p-value = 3.241e-10

Since the null hypothesis is that there is no heteroskedasticity and our observed p-value is very small (3.241e-10), we reject the null hypothesis and conclude that heteroskedasticity is present.
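
For intuition, the studentized Breusch-Pagan statistic can be reproduced by hand: regress the squared residuals on the model's regressors and multiply the R² of that auxiliary regression by the sample size. The sketch below does this in base R on the built-in mtcars data, which stands in for the project data; the model mpg ~ wt + hp is an arbitrary illustrative choice.

```r
# Illustrative model on built-in data (a stand-in, not the project data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Auxiliary regression: squared residuals on the same regressors
aux_data <- transform(mtcars, e2 = resid(fit)^2)
aux <- lm(e2 ~ wt + hp, data = aux_data)

n <- nobs(fit)
bp_stat <- n * summary(aux)$r.squared   # LM statistic = n * R^2
bp_df   <- length(coef(aux)) - 1        # number of regressors, intercept excluded
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

c(BP = bp_stat, df = bp_df, p.value = bp_p)
```

On the same model, bptest(fit) with its default settings should report the same statistic, since the studentized version is its default. A large statistic means the regressors help predict the size of the squared residuals, which is what heteroskedasticity implies.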

To address this, we tried a new approach: log-transforming the variables. The new columns are stored in a new data frame, Data.log.

Data.log <- grouped_data %>%
  mutate(logPub = log(ifelse(`No of publications`== 0, 1, `No of publications`))) %>%
  mutate(logBnC = log(ifelse(`Business and Community Servicies (Income)` == 0, 1, `Business and Community Servicies (Income)`))) %>%
  mutate(logCPE = log(ifelse(`Continous Profesional Development and Continous Education (income)` == 0, 1, `Continous Profesional Development and Continous Education (income)`))) %>%
  mutate(logColRes = log(ifelse(`Collaborative Research (income)` == 0, 1, `Collaborative Research (income)`))) %>%
  mutate(logIP = log(ifelse(`Intelectual Property Rights (income)` == 0, 1, `Intelectual Property Rights (income)`))) %>%
  mutate(logTotal = log(ifelse(`Total.comprehensive.income.for.the.year` == 0, 1, `Total.comprehensive.income.for.the.year`))) %>%
  mutate(logStaff = log(ifelse(`Total.staff` == 0, 1, `Total.staff`)))
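
The transformation above replaces exact zeros with 1 before taking logs, so that log(0) never occurs. The small sketch below, on made-up values, shows what that implies: zeros and ones both map to 0, and values between 0 and 1 become negative. log1p(), shown for comparison only, is a common alternative that is not used in this analysis.

```r
# Made-up values covering the interesting cases
x <- c(0, 0.5, 1, 10)

# Same rule as in the code above: treat 0 as 1, then take the log
log_zero_to_one <- log(ifelse(x == 0, 1, x))

# Alternative for comparison (an assumption, not the approach used here): log(1 + x)
log_plus_one <- log1p(x)

data.frame(x, log_zero_to_one, log_plus_one)
```

With the zero-to-one rule, a provider with no income of a given type is treated the same as one with exactly one unit of it, which is worth keeping in mind when reading the log coefficients.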

We now re-estimate the models with the log values. Model 3 has no control variables, while model 4 includes them.

names(Data.log)
##  [1] "HE.Provider"                                                       
##  [2] "No of publications"                                                
##  [3] "Academic.Year..start."                                             
##  [4] "Business and Community Servicies (Income)"                         
##  [5] "Continous Profesional Development and Continous Education (income)"
##  [6] "Collaborative Research (income)"                                   
##  [7] "Intelectual Property Rights (income)"                              
##  [8] "Total.comprehensive.income.for.the.year"                           
##  [9] "Total.staff"                                                       
## [10] "russell_group"                                                     
## [11] "logPub"                                                            
## [12] "logBnC"                                                            
## [13] "logCPE"                                                            
## [14] "logColRes"                                                         
## [15] "logIP"                                                             
## [16] "logTotal"                                                          
## [17] "logStaff"
lm.multiple3 <- lm(`logPub`~ `logBnC` + `logColRes`+`logCPE`+`logIP`, data = Data.log)
summary(lm.multiple3)
## 
## Call:
## lm(formula = logPub ~ logBnC + logColRes + logCPE + logIP, data = Data.log)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4532 -0.3700  0.1209  0.4758  6.5563 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.47754    0.34333   4.304 3.06e-05 ***
## logBnC       0.24555    0.07615   3.225  0.00156 ** 
## logColRes    0.27469    0.04909   5.596 1.05e-07 ***
## logCPE       0.04542    0.04827   0.941  0.34827    
## logIP        0.12515    0.03822   3.274  0.00132 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.06 on 146 degrees of freedom
## Multiple R-squared:  0.7631, Adjusted R-squared:  0.7566 
## F-statistic: 117.6 on 4 and 146 DF,  p-value: < 2.2e-16
lm.multiple4 <- lm(`logPub`~ `logBnC` + `logColRes`+`logCPE`+`logIP`+`logTotal`+`logStaff`+`russell_group`, data = Data.log)
summary(lm.multiple4)
## 
## Call:
## lm(formula = logPub ~ logBnC + logColRes + logCPE + logIP + logTotal + 
##     logStaff + russell_group, data = Data.log)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.00728 -0.27304  0.05615  0.39983  1.51164 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -9.39006    1.24026  -7.571 4.93e-12 ***
## logBnC         0.13457    0.06135   2.194 0.029943 *  
## logColRes      0.11916    0.03328   3.581 0.000475 ***
## logCPE         0.07279    0.03698   1.968 0.051035 .  
## logIP          0.03045    0.02526   1.206 0.230006    
## logTotal       0.63687    0.18785   3.390 0.000912 ***
## logStaff       0.46321    0.17119   2.706 0.007681 ** 
## russell_group -0.05264    0.20490  -0.257 0.797638    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6488 on 137 degrees of freedom
##   (6 observations deleted due to missingness)
## Multiple R-squared:  0.9077, Adjusted R-squared:  0.903 
## F-statistic: 192.6 on 7 and 137 DF,  p-value: < 2.2e-16

Let’s check again for heteroskedasticity, starting with the residuals vs. fitted plots of the two log models.

 plot(lm.multiple3, which = c(1))

 plot(lm.multiple4, which = c(1))

And now the Breusch-Pagan (BP) test:

bptest(lm.multiple3)
## 
##  studentized Breusch-Pagan test
## 
## data:  lm.multiple3
## BP = 26.379, df = 4, p-value = 2.653e-05
bptest(lm.multiple4)
## 
##  studentized Breusch-Pagan test
## 
## data:  lm.multiple4
## BP = 38.475, df = 7, p-value = 2.46e-06
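
To make clear what the studentized BP statistic measures, here is a minimal base R sketch on simulated data. The variables `x1`, `x2` and `y` are hypothetical and are not taken from our dataset. The statistic is the sample size times the R-squared of an auxiliary regression of the squared residuals on the original regressors.

```r
# Simulated data (hypothetical): the error spread grows with x1
set.seed(1)
n  <- 200
x1 <- runif(n, 1, 10)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n, sd = 0.4 * x1)

fit <- lm(y ~ x1 + x2)

# Auxiliary regression: squared residuals on the original regressors
aux <- lm(resid(fit)^2 ~ x1 + x2)

# Studentized (Koenker) BP statistic: n * R^2 of the auxiliary regression
bp_stat <- n * summary(aux)$r.squared
bp_df   <- length(coef(aux)) - 1   # number of regressors, intercept excluded
bp_pval <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

c(BP = bp_stat, df = bp_df, p.value = bp_pval)
```

A small p-value means the regressors help to predict the squared residuals, which is evidence against constant error variance.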

The BP test again points to heteroskedasticity in the two new log models, just as in the two previous models. Let’s confirm this with the White test.

library(whitestrap)
## 
## Please cite as:
## Lopez, J. (2020), White's test and Bootstrapped White's test under the methodology of Jeong, J., Lee, K. (1999) package version 0.0.1
 white_test(lm.multiple)
## White's test results
## 
## Null hypothesis: Homoskedasticity of the residuals
## Alternative hypothesis: Heteroskedasticity of the residuals
## Test Statistic: 30.37
## P-value: 0
 white_test(lm.multiple2)
## White's test results
## 
## Null hypothesis: Homoskedasticity of the residuals
## Alternative hypothesis: Heteroskedasticity of the residuals
## Test Statistic: 32.95
## P-value: 0
 white_test(lm.multiple3)
## White's test results
## 
## Null hypothesis: Homoskedasticity of the residuals
## Alternative hypothesis: Heteroskedasticity of the residuals
## Test Statistic: 24.77
## P-value: 4e-06
 white_test(lm.multiple4)
## White's test results
## 
## Null hypothesis: Homoskedasticity of the residuals
## Alternative hypothesis: Heteroskedasticity of the residuals
## Test Statistic: 39.6
## P-value: 0
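
As a rough illustration of what this test does, the sketch below computes a White statistic by hand in base R on simulated data. It assumes the "special form" of the test (squared residuals regressed on the fitted values and their squares, 2 degrees of freedom), which is our reading of how `white_test` works. The variables are hypothetical.

```r
# Simulated data (hypothetical): the error spread grows with x1
set.seed(2)
n  <- 200
x1 <- runif(n, 1, 10)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n, sd = 0.4 * x1)

fit  <- lm(y ~ x1 + x2)
u2   <- resid(fit)^2
yhat <- fitted(fit)

# Auxiliary regression on the fitted values and their squares
aux <- lm(u2 ~ yhat + I(yhat^2))

# LM statistic: n * R^2, compared with a chi-squared with 2 df
white_stat <- n * summary(aux)$r.squared
white_pval <- pchisq(white_stat, df = 2, lower.tail = FALSE)

# Consistency check against the output for lm.multiple3:
# a statistic of 24.77 with 2 df gives a p-value of about 4e-06
p_reported <- pchisq(24.77, df = 2, lower.tail = FALSE)
```

The last line reproduces the p-value printed for lm.multiple3, which is consistent with the 2 degrees of freedom assumed here.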

The p-values for all four models are very small, so we reject homoskedasticity in every case. Hence, we have to estimate heteroskedasticity-robust standard errors with the ‘vcovHC’ function.

coeftest(lm.multiple, vcov = vcovHC(lm.multiple, type = "HC0"))
## 
## t test of coefficients:
## 
##                                                                        Estimate
## (Intercept)                                                          51.4434318
## `Business and Community Servicies (Income)`                           0.0899108
## `Continous Profesional Development and Continous Education (income)`  0.0035339
## `Collaborative Research (income)`                                     0.0408612
## `Intelectual Property Rights (income)`                               -0.0765318
##                                                                      Std. Error
## (Intercept)                                                          88.3610388
## `Business and Community Servicies (Income)`                           0.0088900
## `Continous Profesional Development and Continous Education (income)`  0.0031208
## `Collaborative Research (income)`                                     0.0107121
## `Intelectual Property Rights (income)`                                0.0360715
##                                                                      t value
## (Intercept)                                                           0.5822
## `Business and Community Servicies (Income)`                          10.1137
## `Continous Profesional Development and Continous Education (income)`  1.1324
## `Collaborative Research (income)`                                     3.8145
## `Intelectual Property Rights (income)`                               -2.1217
##                                                                       Pr(>|t|)
## (Intercept)                                                          0.5613327
## `Business and Community Servicies (Income)`                          < 2.2e-16
## `Continous Profesional Development and Continous Education (income)` 0.2593338
## `Collaborative Research (income)`                                    0.0002008
## `Intelectual Property Rights (income)`                               0.0355544
##                                                                         
## (Intercept)                                                             
## `Business and Community Servicies (Income)`                          ***
## `Continous Profesional Development and Continous Education (income)`    
## `Collaborative Research (income)`                                    ***
## `Intelectual Property Rights (income)`                               *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(lm.multiple2, vcov = vcovHC(lm.multiple2, type = "HC0"))
## 
## t test of coefficients:
## 
##                                                                         Estimate
## (Intercept)                                                          -4.8628e+02
## `Business and Community Servicies (Income)`                           4.2387e-02
## `Continous Profesional Development and Continous Education (income)` -2.0662e-03
## `Collaborative Research (income)`                                     2.7298e-02
## `Intelectual Property Rights (income)`                               -7.2400e-02
## Total.comprehensive.income.for.the.year                               1.1746e-03
## Total.staff                                                           2.2791e-02
## russell_group                                                        -7.9801e+02
##                                                                       Std. Error
## (Intercept)                                                           1.4468e+02
## `Business and Community Servicies (Income)`                           1.0822e-02
## `Continous Profesional Development and Continous Education (income)`  2.2475e-03
## `Collaborative Research (income)`                                     8.7215e-03
## `Intelectual Property Rights (income)`                                2.5919e-02
## Total.comprehensive.income.for.the.year                               4.4005e-04
## Total.staff                                                           1.6771e-02
## russell_group                                                         5.5701e+02
##                                                                      t value
## (Intercept)                                                          -3.3612
## `Business and Community Servicies (Income)`                           3.9166
## `Continous Profesional Development and Continous Education (income)` -0.9194
## `Collaborative Research (income)`                                     3.1299
## `Intelectual Property Rights (income)`                               -2.7933
## Total.comprehensive.income.for.the.year                               2.6692
## Total.staff                                                           1.3590
## russell_group                                                        -1.4327
##                                                                       Pr(>|t|)
## (Intercept)                                                          0.0010059
## `Business and Community Servicies (Income)`                          0.0001411
## `Continous Profesional Development and Continous Education (income)` 0.3595261
## `Collaborative Research (income)`                                    0.0021373
## `Intelectual Property Rights (income)`                               0.0059644
## Total.comprehensive.income.for.the.year                              0.0085221
## Total.staff                                                          0.1763780
## russell_group                                                        0.1542278
##                                                                         
## (Intercept)                                                          ** 
## `Business and Community Servicies (Income)`                          ***
## `Continous Profesional Development and Continous Education (income)`    
## `Collaborative Research (income)`                                    ** 
## `Intelectual Property Rights (income)`                               ** 
## Total.comprehensive.income.for.the.year                              ** 
## Total.staff                                                             
## russell_group                                                           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(lm.multiple3, vcov = vcovHC(lm.multiple3, type = "HC0"))
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 1.477542   0.799330  1.8485 0.0665564 .  
## logBnC      0.245550   0.103215  2.3790 0.0186518 *  
## logColRes   0.274687   0.064160  4.2813 3.349e-05 ***
## logCPE      0.045424   0.069502  0.6536 0.5144274    
## logIP       0.125146   0.035894  3.4865 0.0006467 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(lm.multiple4, vcov = vcovHC(lm.multiple4, type = "HC0"))
## 
## t test of coefficients:
## 
##                Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)   -9.390057   1.689123 -5.5591 1.368e-07 ***
## logBnC         0.134575   0.073323  1.8354  0.068619 .  
## logColRes      0.119156   0.037417  3.1845  0.001795 ** 
## logCPE         0.072786   0.047611  1.5288  0.128623    
## logIP          0.030451   0.022445  1.3567  0.177109    
## logTotal       0.636870   0.226319  2.8140  0.005613 ** 
## logStaff       0.463208   0.195498  2.3694  0.019214 *  
## russell_group -0.052639   0.142919 -0.3683  0.713210    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
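
To show where the robust standard errors come from, here is a minimal base R sketch of the HC0 "sandwich" estimator on simulated data. The variables are hypothetical and the code is only meant to mirror what `vcovHC(..., type = "HC0")` computes, not to replace it.

```r
# Simulated data (hypothetical): the error spread grows with x1
set.seed(3)
n  <- 200
x1 <- runif(n, 1, 10)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n, sd = 0.4 * x1)

fit <- lm(y ~ x1 + x2)
X   <- model.matrix(fit)   # n x p design matrix, intercept included
u   <- resid(fit)          # OLS residuals

# Sandwich: (X'X)^-1  X' diag(u^2) X  (X'X)^-1
bread <- solve(crossprod(X))
meat  <- crossprod(X * u)  # each row of X is scaled by its own residual
V_hc0 <- bread %*% meat %*% bread

se_classical <- sqrt(diag(vcov(fit)))
se_hc0       <- sqrt(diag(V_hc0))

# t tests with robust standard errors; the estimates themselves are unchanged
t_hc0 <- coef(fit) / se_hc0
p_hc0 <- 2 * pt(abs(t_hc0), df = df.residual(fit), lower.tail = FALSE)

cbind(Estimate = coef(fit), se_classical, se_hc0, t_hc0, p_hc0)
```

Only the standard errors, t values and p-values change. This is why the estimates in the robust tables above are identical to those in the original summaries.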

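Finally, we want to check for endogeneity in our “strongest” variable: Collaborative Research. To do this we look for an Instrumental Variable (IV) that fulfils the two standard conditions: relevance (the instrument is correlated with the endogenous regressor) and exogeneity (the instrument is uncorrelated with the error term of the main regression).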

We believe income from research grants may be a good IV.

library(AER)
## Loading required package: car
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## Loading required package: survival
Data.income <- read.csv("https://raw.githubusercontent.com/martinayazllizakalik/econometrics-project/refs/heads/main/research%20grant%20data%20hesa.csv")

Data.income <- Data.income %>%
  group_by(HE.Provider) %>%
  summarize(mean_value = mean(Value..000s., na.rm = TRUE))  

Data.income <- left_join(grouped_data, Data.income, by = "HE.Provider")

colnames(Data.income)[11] <- "Research grant"

ivreg_model <- ivreg(`Collaborative Research (income)` ~ `Research grant`, data = Data.income)


  stargazer(ivreg_model, type = "text",
           dep.var.labels = c("Collaborative Research"),
           covariate.labels = c("Research grant income"))
## 
## =================================================
##                           Dependent variable:    
##                       ---------------------------
##                         Collaborative Research   
## -------------------------------------------------
## Research grant income          0.151***          
##                                 (0.011)          
##                                                  
## Constant                     4,050.595***        
##                               (1,107.931)        
##                                                  
## -------------------------------------------------
## Observations                      151            
## R2                               0.572           
## Adjusted R2                      0.570           
## Residual Std. Error      12,458.430 (df = 149)   
## =================================================
## Note:                 *p<0.1; **p<0.05; ***p<0.01

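Our IV seems to be relevant: in this regression of collaborative research income on research grant income, the coefficient is 0.151 and it is statistically significant at the 1% level (indicated by ***). Note that this only speaks to the relevance condition; exogeneity cannot be checked from this regression.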
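
For reference, a full IV estimate would take two stages. The sketch below does two-stage least squares by hand in base R on simulated data: `z` (instrument), `v` (unobserved confounder), `x` (endogenous regressor) and `y` (outcome) are all hypothetical, and the true effect of `x` on `y` is set to 0.5 by construction.

```r
# Simulated data (hypothetical): v drives both x and y, so OLS is biased
set.seed(4)
n <- 500
z <- rnorm(n)                        # instrument
v <- rnorm(n)                        # unobserved confounder
x <- 0.8 * z + v + rnorm(n)          # endogenous regressor
y <- 1 + 0.5 * x + 2 * v + rnorm(n)  # outcome

# First stage: regress the endogenous regressor on the instrument
first  <- lm(x ~ z)
f_stat <- summary(first)$fstatistic[["value"]]  # relevance check

# Second stage: replace x by its first-stage fitted values
x_hat     <- fitted(first)
second    <- lm(y ~ x_hat)
beta_2sls <- coef(second)[["x_hat"]]

# Naive OLS for comparison
beta_ols <- coef(lm(y ~ x))[["x"]]

c(first_stage_F = f_stat, OLS = beta_ols, TSLS = beta_2sls)
```

The standard errors printed by the manual second stage are not valid, because they ignore that `x_hat` is itself estimated. With the AER package, `ivreg(y ~ x | z)` gives the same point estimate with correct standard errors.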

To conclude: