Evaluating the impact of University-Industry Collaborations (UICs) on knowledge diffusion in the UK (2015-2022).
Research question:
Comparatively, how efficient are these different types of UICs in promoting knowledge transfer and innovation?
Expected outcomes (hypothesis):
Some definitions:
About the data:
For data about income from different types of UICs, we used the Higher Education Statistics Agency (HESA) databases, available here: https://www.hesa.ac.uk/collection/c19032/hebci_a_questions
For data about publications, we downloaded publication counts from SciVal (filtered to UK publications in English between 2015 and 2022), available here: https://www.scival.com/home
rm(list = ls())
library(MASS)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
##
## select
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(sandwich)
library(corrplot)
## corrplot 0.95 loaded
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(broom)
library(knitr)
library(patchwork)
## Warning: package 'patchwork' was built under R version 4.4.2
##
## Attaching package: 'patchwork'
## The following object is masked from 'package:MASS':
##
## area
Data <- read.csv("https://raw.githubusercontent.com/martinayazllizakalik/econometrics-project/refs/heads/main/Database%20HE%20providers%20uk%202015%202022%20%20UICs%2C%20staff%20and%20income.csv")
Data <- as_tibble(Data)
colnames(Data)
## [1] "HE.Provider"
## [2] "No.of.publications"
## [3] "Academic.Year..start."
## [4] "Business.and.Community.Servicies..Income."
## [5] "Continous.Profesional.Development.and.Continous.Education..income."
## [6] "Collaborative.Research..income."
## [7] "Intelectual.Property.Rights..income."
## [8] "Total.comprehensive.income.for.the.year"
## [9] "Total.staff"
colnames(Data) <- c("HE.Provider", "No of publications", "Academic.Year..start.", "Business and Community Servicies (Income)", "Continous Profesional Development and Continous Education (income)", "Collaborative Research (income)", "Intelectual Property Rights (income)","Total.comprehensive.income.for.the.year", "Total.staff")
A brief description of the columns:
HE provider: universities in the UK (England, Northern Ireland, Scotland and Wales)
Academic year (start): from 2015 to 2022
N° of publications: scholarly output for all members of staff
Business and Community Services income
Continuous Professional Development and Continuous Education income
Collaborative Research income
Intellectual Property Rights income
Total income (including income from UICs)
Total staff: number of employed staff, either part-time or full-time
Let's start with some descriptive statistics.
xxx <- Data %>%
select(`No of publications`, `Business and Community Servicies (Income)`, `Continous Profesional Development and Continous Education (income)`, `Collaborative Research (income)`, `Intelectual Property Rights (income)`)
stargazer(as.data.frame(xxx),
type = "text",
title = "Table: Descriptive statistics of some variables")
##
## Table: Descriptive statistics of some variables
## ==========================================================================================================
## Statistic N Mean St. Dev. Min Max
## ----------------------------------------------------------------------------------------------------------
## No of publications 1,180 1,793.122 3,080.457 1 20,402
## Business and Community Servicies (Income) 1,168 14,333.180 28,886.080 0 296,893
## Continous Profesional Development and Continous Education (income) 1,168 33,123.780 52,959.430 0 564,869
## Collaborative Research (income) 1,168 10,714.730 20,286.230 0 117,677
## Intelectual Property Rights (income) 1,168 1,049.897 5,574.154 0 75,382
## ----------------------------------------------------------------------------------------------------------
covariate.labels = c("Publications", "BnC", "CPD", "ColRes", "IP")
Adding a Russell Group column
russell_group_universities <- c(
"The University of York", "The University of Warwick", "The University of Southampton", "The University of Sheffield", "Queen Mary University of London", "The University of Oxford", "University of Nottingham", "Newcastle University", "The University of Manchester", "The University of Liverpool", "The University of Leeds",
# Single institution name that contains a comma: keep it as one string
"Imperial College of Science, Technology and Medicine",
"The University of Glasgow", "The University of Exeter", "The University of Edinburgh", "University of Durham", "Cardiff University", "The University of Cambridge", "The University of Bristol", "The University of Birmingham", "Queen's University Belfast"
)
Data <- Data %>%
mutate(russell_group = ifelse(`HE.Provider` %in% russell_group_universities, 1, 0))
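Because the dummy relies on exact string matches, a misspelled or split name silently produces a 0 rather than an error. The following is a minimal sketch of a sanity check on made-up data; `providers` and `wanted` are hypothetical vectors used only for illustration, not objects from the analysis.

```r
# Hypothetical toy data: provider names as they might appear in the data set
providers <- c("The University of York", "Cardiff University",
               "Imperial College of Science, Technology and Medicine")

# Names we intend to flag; the last two are one name split in two by mistake
wanted <- c("The University of York", "Cardiff University",
            "Imperial College of Science", "Technology and Medicine")

# Names in the list that match no provider at all
unmatched <- setdiff(wanted, providers)
print(unmatched)

# Number of providers actually flagged by the dummy
flagged <- sum(providers %in% wanted)
print(flagged)
```

In the analysis itself, the analogous check would be `setdiff(russell_group_universities, Data$HE.Provider)`; an empty result means every listed name was found in the data.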
Since we want to compare HE providers with one another, we average each variable over all the years in our database. We group the data by HE.Provider and report the mean value of each numeric column.
grouped_data <- Data %>%
group_by(`HE.Provider`) %>%
summarise(across(where(is.numeric), \(x) mean(x, na.rm = TRUE)))
To see the general tendency of our data, we plot a correlation matrix.
selected.Data <- grouped_data %>% select(`No of publications`, `Business and Community Servicies (Income)`, `Continous Profesional Development and Continous Education (income)`, `Collaborative Research (income)`, `Intelectual Property Rights (income)`)
corr.matrix <- cor(selected.Data)
corr.matrix
## No of publications
## No of publications 1.0000000
## Business and Community Servicies (Income) 0.9197700
## Continous Profesional Development and Continous Education (income) 0.3603748
## Collaborative Research (income) 0.7898532
## Intelectual Property Rights (income) 0.6080855
## Business and Community Servicies (Income)
## No of publications 0.9197700
## Business and Community Servicies (Income) 1.0000000
## Continous Profesional Development and Continous Education (income) 0.3104401
## Collaborative Research (income) 0.6899733
## Intelectual Property Rights (income) 0.7550289
## Continous Profesional Development and Continous Education (income)
## No of publications 0.3603748
## Business and Community Servicies (Income) 0.3104401
## Continous Profesional Development and Continous Education (income) 1.0000000
## Collaborative Research (income) 0.3180354
## Intelectual Property Rights (income) 0.2275546
## Collaborative Research (income)
## No of publications 0.7898532
## Business and Community Servicies (Income) 0.6899733
## Continous Profesional Development and Continous Education (income) 0.3180354
## Collaborative Research (income) 1.0000000
## Intelectual Property Rights (income) 0.3855972
## Intelectual Property Rights (income)
## No of publications 0.6080855
## Business and Community Servicies (Income) 0.7550289
## Continous Profesional Development and Continous Education (income) 0.2275546
## Collaborative Research (income) 0.3855972
## Intelectual Property Rights (income) 1.0000000
colnames(corr.matrix) <- c("Publications", "BnC", "CPD", "ColRes", "IP")
rownames(corr.matrix) <- c("Publications", "BnC", "CPD", "ColRes", "IP")
corrplot(corr.matrix, method = 'square')
We can start to see that Business and Community Services income has the strongest correlation with our variable of interest, “N° of Publications” (0.92). Collaborative Research, as expected, also has a strong positive correlation (0.79). Now we can plot each explanatory variable against our Y.
plotBnC <- ggplot(grouped_data, aes(x = `Business and Community Servicies (Income)`, y = `No of publications`)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
labs(x = "BnC",
y = "N° of Publications",
title = "Linear regression line BnC")
plotCPD <- ggplot(grouped_data, aes(x = `Continous Profesional Development and Continous Education (income)`, y = `No of publications`)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
labs(x = "CPD",
y = "N° of Publications",
title = "Linear regression line CPD")
plotColRes <- ggplot(grouped_data, aes(x = `Collaborative Research (income)`, y = `No of publications`)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
labs(x = "ColRes",
y = "N° of Publications",
title = "Linear regression line ColRes")
plotIP <- ggplot(grouped_data, aes(x = `Intelectual Property Rights (income)`, y = `No of publications`)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
labs(x = "IP",
y = "N° of Publications",
title = "Linear regression line IP")
combined_plot <- (plotBnC | plotCPD) / (plotColRes | plotIP)
combined_plot
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
Next, we estimate our model without any control variables.
names(grouped_data)
## [1] "HE.Provider"
## [2] "No of publications"
## [3] "Academic.Year..start."
## [4] "Business and Community Servicies (Income)"
## [5] "Continous Profesional Development and Continous Education (income)"
## [6] "Collaborative Research (income)"
## [7] "Intelectual Property Rights (income)"
## [8] "Total.comprehensive.income.for.the.year"
## [9] "Total.staff"
## [10] "russell_group"
lm.multiple <- lm(`No of publications` ~ `Business and Community Servicies (Income)`+ `Continous Profesional Development and Continous Education (income)`+ `Collaborative Research (income)`+ `Intelectual Property Rights (income)`, data = grouped_data)
summary(lm.multiple)
##
## Call:
## lm(formula = `No of publications` ~ `Business and Community Servicies (Income)` +
## `Continous Profesional Development and Continous Education (income)` +
## `Collaborative Research (income)` + `Intelectual Property Rights (income)`,
## data = grouped_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3724.7 -287.8 -47.0 243.1 5181.1
##
## Coefficients:
## Estimate
## (Intercept) 51.443432
## `Business and Community Servicies (Income)` 0.089911
## `Continous Profesional Development and Continous Education (income)` 0.003534
## `Collaborative Research (income)` 0.040861
## `Intelectual Property Rights (income)` -0.076532
## Std. Error
## (Intercept) 102.650013
## `Business and Community Servicies (Income)` 0.005852
## `Continous Profesional Development and Continous Education (income)` 0.001873
## `Collaborative Research (income)` 0.006125
## `Intelectual Property Rights (income)` 0.026641
## t value
## (Intercept) 0.501
## `Business and Community Servicies (Income)` 15.363
## `Continous Profesional Development and Continous Education (income)` 1.887
## `Collaborative Research (income)` 6.671
## `Intelectual Property Rights (income)` -2.873
## Pr(>|t|)
## (Intercept) 0.61702
## `Business and Community Servicies (Income)` < 2e-16
## `Continous Profesional Development and Continous Education (income)` 0.06116
## `Collaborative Research (income)` 4.9e-10
## `Intelectual Property Rights (income)` 0.00468
##
## (Intercept)
## `Business and Community Servicies (Income)` ***
## `Continous Profesional Development and Continous Education (income)` .
## `Collaborative Research (income)` ***
## `Intelectual Property Rights (income)` **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 976.4 on 146 degrees of freedom
## Multiple R-squared: 0.8998, Adjusted R-squared: 0.8971
## F-statistic: 327.9 on 4 and 146 DF, p-value: < 2.2e-16
lm_results <- tidy(lm.multiple)
lm_results
It would seem that:
1. Business and Community Services income is significantly associated with the number of publications (p < 2e-16).
2. CPD shows only weak evidence of an association (p = 0.061).
3. Collaborative Research has a positive and significant association.
4. Intellectual Property Rights income has a negative and significant association.
What happens when we control for university size, university income, and Russell Group membership?
lm.multiple2 <- lm(`No of publications` ~ `Business and Community Servicies (Income)`+ `Continous Profesional Development and Continous Education (income)`+ `Collaborative Research (income)`+ `Intelectual Property Rights (income)`+ `Total.comprehensive.income.for.the.year` + `Total.staff` + `russell_group`, data = grouped_data)
summary(lm.multiple2)
##
## Call:
## lm(formula = `No of publications` ~ `Business and Community Servicies (Income)` +
## `Continous Profesional Development and Continous Education (income)` +
## `Collaborative Research (income)` + `Intelectual Property Rights (income)` +
## Total.comprehensive.income.for.the.year + Total.staff + russell_group,
## data = grouped_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1959.5 -285.0 80.8 310.7 3744.2
##
## Coefficients:
## Estimate
## (Intercept) -4.863e+02
## `Business and Community Servicies (Income)` 4.239e-02
## `Continous Profesional Development and Continous Education (income)` -2.066e-03
## `Collaborative Research (income)` 2.730e-02
## `Intelectual Property Rights (income)` -7.240e-02
## Total.comprehensive.income.for.the.year 1.175e-03
## Total.staff 2.279e-02
## russell_group -7.980e+02
## Std. Error
## (Intercept) 9.787e+01
## `Business and Community Servicies (Income)` 6.913e-03
## `Continous Profesional Development and Continous Education (income)` 1.501e-03
## `Collaborative Research (income)` 5.151e-03
## `Intelectual Property Rights (income)` 2.069e-02
## Total.comprehensive.income.for.the.year 2.410e-04
## Total.staff 1.008e-02
## russell_group 2.559e+02
## t value
## (Intercept) -4.969
## `Business and Community Servicies (Income)` 6.132
## `Continous Profesional Development and Continous Education (income)` -1.377
## `Collaborative Research (income)` 5.299
## `Intelectual Property Rights (income)` -3.499
## Total.comprehensive.income.for.the.year 4.875
## Total.staff 2.262
## russell_group -3.118
## Pr(>|t|)
## (Intercept) 1.98e-06
## `Business and Community Servicies (Income)` 8.73e-09
## `Continous Profesional Development and Continous Education (income)` 0.17088
## `Collaborative Research (income)` 4.53e-07
## `Intelectual Property Rights (income)` 0.00063
## Total.comprehensive.income.for.the.year 2.97e-06
## Total.staff 0.02527
## russell_group 0.00222
##
## (Intercept) ***
## `Business and Community Servicies (Income)` ***
## `Continous Profesional Development and Continous Education (income)`
## `Collaborative Research (income)` ***
## `Intelectual Property Rights (income)` ***
## Total.comprehensive.income.for.the.year ***
## Total.staff *
## russell_group **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 719.1 on 137 degrees of freedom
## (6 observations deleted due to missingness)
## Multiple R-squared: 0.9484, Adjusted R-squared: 0.9457
## F-statistic: 359.6 on 7 and 137 DF, p-value: < 2.2e-16
lm_results <- tidy(lm.multiple2)
lm_results
Some conclusions:
Let's show both models in the same table.
stargazer(lm.multiple, lm.multiple2, type = "text", align = TRUE)
##
## ======================================================================================================================
## Dependent variable:
## -------------------------------------------------
## `No of publications`
## (1) (2)
## ----------------------------------------------------------------------------------------------------------------------
## `Business and Community Servicies (Income)` 0.090*** 0.042***
## (0.006) (0.007)
##
## `Continous Profesional Development and Continous Education (income)` 0.004* -0.002
## (0.002) (0.002)
##
## `Collaborative Research (income)` 0.041*** 0.027***
## (0.006) (0.005)
##
## `Intelectual Property Rights (income)` -0.077*** -0.072***
## (0.027) (0.021)
##
## Total.comprehensive.income.for.the.year 0.001***
## (0.0002)
##
## Total.staff 0.023**
## (0.010)
##
## russell_group -798.015***
## (255.939)
##
## Constant 51.443 -486.282***
## (102.650) (97.867)
##
## ----------------------------------------------------------------------------------------------------------------------
## Observations 151 145
## R2 0.900 0.948
## Adjusted R2 0.897 0.946
## Residual Std. Error 976.449 (df = 146) 719.096 (df = 137)
## F Statistic 327.893*** (df = 4; 146) 359.600*** (df = 7; 137)
## ======================================================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
To check the robustness of our models, we plot the residuals against the fitted values.
plot(lm.multiple, which = c(1))
plot(lm.multiple2, which = c(1))
We seem to be in a situation of heteroskedasticity: the variance of the residuals is not constant. So we test this more formally.
bptest(lm.multiple2)
##
## studentized Breusch-Pagan test
##
## data: lm.multiple2
## BP = 58.336, df = 7, p-value = 3.241e-10
Since the null hypothesis is that there is NO heteroskedasticity and our observed p-value is very small, we reject the null hypothesis and conclude that we DO have heteroskedasticity.
To address this, we try a new approach: log-transforming the variables. The new log columns are added to the data; zeros are replaced by 1 before taking logs, so that log(0) is avoided.
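To make the logic of the test concrete, here is a minimal base-R sketch of the studentized (Koenker) form of the Breusch-Pagan statistic on simulated data. The variables `x1`, `x2` and `y` are invented for illustration and have nothing to do with the HESA data; the statistic is n times the R-squared of an auxiliary regression of the squared residuals on the regressors.

```r
set.seed(1)
n <- 200
x1 <- runif(n, 1, 10)
x2 <- rnorm(n)
# Error standard deviation grows with x1, so the data are heteroskedastic by construction
y <- 2 + 0.5 * x1 - x2 + rnorm(n, sd = x1)

fit <- lm(y ~ x1 + x2)

# Auxiliary regression: squared residuals on the same regressors
aux <- lm(resid(fit)^2 ~ x1 + x2)

# LM statistic = n * R^2 of the auxiliary regression,
# compared with a chi-squared distribution with k = 2 degrees of freedom
bp_stat <- n * summary(aux)$r.squared
bp_df <- 2
bp_pval <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

c(statistic = bp_stat, df = bp_df, p.value = bp_pval)
```

A small p-value here has the same reading as in the output above: the squared residuals are predictable from the regressors, which is evidence against constant variance.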
Data.log <- grouped_data %>%
mutate(logPub = log(ifelse(`No of publications`== 0, 1, `No of publications`))) %>%
mutate(logBnC = log(ifelse(`Business and Community Servicies (Income)` == 0, 1, `Business and Community Servicies (Income)`))) %>%
mutate(logCPE = log(ifelse(`Continous Profesional Development and Continous Education (income)` == 0, 1, `Continous Profesional Development and Continous Education (income)`))) %>%
mutate(logColRes = log(ifelse(`Collaborative Research (income)` == 0, 1, `Collaborative Research (income)`))) %>%
mutate(logIP = log(ifelse(`Intelectual Property Rights (income)` == 0, 1, `Intelectual Property Rights (income)`))) %>%
mutate(logTotal = log(ifelse(`Total.comprehensive.income.for.the.year` == 0, 1, `Total.comprehensive.income.for.the.year`))) %>%
mutate(logStaff = log(ifelse(`Total.staff` == 0, 1, `Total.staff`)))
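The same zero-handling rule is repeated for every column above. As a sketch, it can be written once as a small helper; `safe_log` is a hypothetical name introduced here for illustration, and `income` is made-up data.

```r
# Hypothetical helper: take logs, mapping zeros to log(1) = 0 instead of -Inf
safe_log <- function(x) log(ifelse(x == 0, 1, x))

income <- c(0, 1, 250, NA)
safe_log(income)

# A common alternative that also avoids -Inf: log(1 + x)
log1p(income)
```

Note that under this rule a value of 0 and a value of 1 both map to 0, so the two become indistinguishable after the transformation; missing values stay missing.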
We now fit the new models with the log values. Model 3 has no control variables, while model 4 does.
names(Data.log)
## [1] "HE.Provider"
## [2] "No of publications"
## [3] "Academic.Year..start."
## [4] "Business and Community Servicies (Income)"
## [5] "Continous Profesional Development and Continous Education (income)"
## [6] "Collaborative Research (income)"
## [7] "Intelectual Property Rights (income)"
## [8] "Total.comprehensive.income.for.the.year"
## [9] "Total.staff"
## [10] "russell_group"
## [11] "logPub"
## [12] "logBnC"
## [13] "logCPE"
## [14] "logColRes"
## [15] "logIP"
## [16] "logTotal"
## [17] "logStaff"
lm.multiple3 <- lm(`logPub`~ `logBnC` + `logColRes`+`logCPE`+`logIP`, data = Data.log)
summary(lm.multiple3)
##
## Call:
## lm(formula = logPub ~ logBnC + logColRes + logCPE + logIP, data = Data.log)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4532 -0.3700 0.1209 0.4758 6.5563
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.47754 0.34333 4.304 3.06e-05 ***
## logBnC 0.24555 0.07615 3.225 0.00156 **
## logColRes 0.27469 0.04909 5.596 1.05e-07 ***
## logCPE 0.04542 0.04827 0.941 0.34827
## logIP 0.12515 0.03822 3.274 0.00132 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.06 on 146 degrees of freedom
## Multiple R-squared: 0.7631, Adjusted R-squared: 0.7566
## F-statistic: 117.6 on 4 and 146 DF, p-value: < 2.2e-16
lm.multiple4 <- lm(`logPub`~ `logBnC` + `logColRes`+`logCPE`+`logIP`+`logTotal`+`logStaff`+`russell_group`, data = Data.log)
summary(lm.multiple4)
##
## Call:
## lm(formula = logPub ~ logBnC + logColRes + logCPE + logIP + logTotal +
## logStaff + russell_group, data = Data.log)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.00728 -0.27304 0.05615 0.39983 1.51164
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.39006 1.24026 -7.571 4.93e-12 ***
## logBnC 0.13457 0.06135 2.194 0.029943 *
## logColRes 0.11916 0.03328 3.581 0.000475 ***
## logCPE 0.07279 0.03698 1.968 0.051035 .
## logIP 0.03045 0.02526 1.206 0.230006
## logTotal 0.63687 0.18785 3.390 0.000912 ***
## logStaff 0.46321 0.17119 2.706 0.007681 **
## russell_group -0.05264 0.20490 -0.257 0.797638
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6488 on 137 degrees of freedom
## (6 observations deleted due to missingness)
## Multiple R-squared: 0.9077, Adjusted R-squared: 0.903
## F-statistic: 192.6 on 7 and 137 DF, p-value: < 2.2e-16
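In a log-log specification, slope coefficients can be read as elasticities. The short worked example below applies that reading to the model 4 estimate for logColRes (0.11916, taken from the output above). It is arithmetic only and describes an association, not a causal effect.

```r
# Coefficient on logColRes from model 4 (copied from the summary output)
beta_colres <- 0.11916

# Approximate reading: a 1% rise in ColRes income goes with about beta% more publications
approx_pct_1 <- beta_colres * 1

# Exact implied change for a 10% rise: (1.10^beta - 1) * 100
exact_pct_10 <- (1.10^beta_colres - 1) * 100

round(c(one_percent = approx_pct_1, ten_percent = exact_pct_10), 3)
```

The approximation and the exact formula diverge as the percentage change grows, which is why the 10% case is computed with the power form.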
Let’s check again for heteroskedasticity.
plot(lm.multiple3, which = c(1))
plot(lm.multiple4, which = c(1))
And the Breusch-Pagan (BP) test:
bptest(lm.multiple3)
##
## studentized Breusch-Pagan test
##
## data: lm.multiple3
## BP = 26.379, df = 4, p-value = 2.653e-05
bptest(lm.multiple4)
##
## studentized Breusch-Pagan test
##
## data: lm.multiple4
## BP = 38.475, df = 7, p-value = 2.46e-06
The BP test again indicates heteroskedasticity in the two new log models, as it did for model 2. Let’s confirm this for all four models with the White test.
library(whitestrap)
##
## Please cite as:
## Lopez, J. (2020), White's test and Bootstrapped White's test under the methodology of Jeong, J., Lee, K. (1999) package version 0.0.1
white_test(lm.multiple)
## White's test results
##
## Null hypothesis: Homoskedasticity of the residuals
## Alternative hypothesis: Heteroskedasticity of the residuals
## Test Statistic: 30.37
## P-value: 0
white_test(lm.multiple2)
## White's test results
##
## Null hypothesis: Homoskedasticity of the residuals
## Alternative hypothesis: Heteroskedasticity of the residuals
## Test Statistic: 32.95
## P-value: 0
white_test(lm.multiple3)
## White's test results
##
## Null hypothesis: Homoskedasticity of the residuals
## Alternative hypothesis: Heteroskedasticity of the residuals
## Test Statistic: 24.77
## P-value: 4e-06
white_test(lm.multiple4)
## White's test results
##
## Null hypothesis: Homoskedasticity of the residuals
## Alternative hypothesis: Heteroskedasticity of the residuals
## Test Statistic: 39.6
## P-value: 0
The p-values for all four models are very small, so heteroskedasticity is present in every specification. Hence, we estimate heteroskedasticity-robust standard errors with the ‘vcovHC’ function.
coeftest(lm.multiple, vcov = vcovHC(lm.multiple, type = "HC0"))
##
## t test of coefficients:
##
## Estimate
## (Intercept) 51.4434318
## `Business and Community Servicies (Income)` 0.0899108
## `Continous Profesional Development and Continous Education (income)` 0.0035339
## `Collaborative Research (income)` 0.0408612
## `Intelectual Property Rights (income)` -0.0765318
## Std. Error
## (Intercept) 88.3610388
## `Business and Community Servicies (Income)` 0.0088900
## `Continous Profesional Development and Continous Education (income)` 0.0031208
## `Collaborative Research (income)` 0.0107121
## `Intelectual Property Rights (income)` 0.0360715
## t value
## (Intercept) 0.5822
## `Business and Community Servicies (Income)` 10.1137
## `Continous Profesional Development and Continous Education (income)` 1.1324
## `Collaborative Research (income)` 3.8145
## `Intelectual Property Rights (income)` -2.1217
## Pr(>|t|)
## (Intercept) 0.5613327
## `Business and Community Servicies (Income)` < 2.2e-16
## `Continous Profesional Development and Continous Education (income)` 0.2593338
## `Collaborative Research (income)` 0.0002008
## `Intelectual Property Rights (income)` 0.0355544
##
## (Intercept)
## `Business and Community Servicies (Income)` ***
## `Continous Profesional Development and Continous Education (income)`
## `Collaborative Research (income)` ***
## `Intelectual Property Rights (income)` *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(lm.multiple2, vcov = vcovHC(lm.multiple2, type = "HC0"))
##
## t test of coefficients:
##
## Estimate
## (Intercept) -4.8628e+02
## `Business and Community Servicies (Income)` 4.2387e-02
## `Continous Profesional Development and Continous Education (income)` -2.0662e-03
## `Collaborative Research (income)` 2.7298e-02
## `Intelectual Property Rights (income)` -7.2400e-02
## Total.comprehensive.income.for.the.year 1.1746e-03
## Total.staff 2.2791e-02
## russell_group -7.9801e+02
## Std. Error
## (Intercept) 1.4468e+02
## `Business and Community Servicies (Income)` 1.0822e-02
## `Continous Profesional Development and Continous Education (income)` 2.2475e-03
## `Collaborative Research (income)` 8.7215e-03
## `Intelectual Property Rights (income)` 2.5919e-02
## Total.comprehensive.income.for.the.year 4.4005e-04
## Total.staff 1.6771e-02
## russell_group 5.5701e+02
## t value
## (Intercept) -3.3612
## `Business and Community Servicies (Income)` 3.9166
## `Continous Profesional Development and Continous Education (income)` -0.9194
## `Collaborative Research (income)` 3.1299
## `Intelectual Property Rights (income)` -2.7933
## Total.comprehensive.income.for.the.year 2.6692
## Total.staff 1.3590
## russell_group -1.4327
## Pr(>|t|)
## (Intercept) 0.0010059
## `Business and Community Servicies (Income)` 0.0001411
## `Continous Profesional Development and Continous Education (income)` 0.3595261
## `Collaborative Research (income)` 0.0021373
## `Intelectual Property Rights (income)` 0.0059644
## Total.comprehensive.income.for.the.year 0.0085221
## Total.staff 0.1763780
## russell_group 0.1542278
##
## (Intercept) **
## `Business and Community Servicies (Income)` ***
## `Continous Profesional Development and Continous Education (income)`
## `Collaborative Research (income)` **
## `Intelectual Property Rights (income)` **
## Total.comprehensive.income.for.the.year **
## Total.staff
## russell_group
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(lm.multiple3, vcov = vcovHC(lm.multiple3, type = "HC0"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.477542 0.799330 1.8485 0.0665564 .
## logBnC 0.245550 0.103215 2.3790 0.0186518 *
## logColRes 0.274687 0.064160 4.2813 3.349e-05 ***
## logCPE 0.045424 0.069502 0.6536 0.5144274
## logIP 0.125146 0.035894 3.4865 0.0006467 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(lm.multiple4, vcov = vcovHC(lm.multiple4, type = "HC0"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.390057 1.689123 -5.5591 1.368e-07 ***
## logBnC 0.134575 0.073323 1.8354 0.068619 .
## logColRes 0.119156 0.037417 3.1845 0.001795 **
## logCPE 0.072786 0.047611 1.5288 0.128623
## logIP 0.030451 0.022445 1.3567 0.177109
## logTotal 0.636870 0.226319 2.8140 0.005613 **
## logStaff 0.463208 0.195498 2.3694 0.019214 *
## russell_group -0.052639 0.142919 -0.3683 0.713210
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
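For intuition about what type = "HC0" computes, here is a minimal base-R sketch of the White sandwich estimator, (X'X)^-1 X' diag(e^2) X (X'X)^-1, on simulated data. The variables `x` and `y` are invented for illustration; this is a sketch of the formula, not a replacement for the sandwich package.

```r
set.seed(42)
n <- 150
x <- runif(n, 1, 10)
y <- 1 + 2 * x + rnorm(n, sd = x)   # heteroskedastic by construction

fit <- lm(y ~ x)
X <- model.matrix(fit)              # n x 2 design matrix (intercept, x)
e <- resid(fit)

bread <- solve(crossprod(X))        # (X'X)^-1
meat  <- crossprod(X * e)           # X' diag(e^2) X: each row of X scaled by its residual
vcov_hc0 <- bread %*% meat %*% bread

robust_se  <- sqrt(diag(vcov_hc0))
classic_se <- sqrt(diag(vcov(fit)))
cbind(classic_se, robust_se)
```

The coefficient estimates are untouched; only the standard errors, and therefore the t values and p-values, change, which matches the pattern in the coeftest outputs above.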
Finally, we want to check for endogeneity in our “strongest” variable: Collaborative Research. To do this, we look for an instrumental variable (IV) that fulfils the following conditions:
We believe income from research grants may be a good IV.
library(AER)
## Loading required package: car
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## Loading required package: survival
Data.income <- read.csv("https://raw.githubusercontent.com/martinayazllizakalik/econometrics-project/refs/heads/main/research%20grant%20data%20hesa.csv")
Data.income <- Data.income %>%
group_by(HE.Provider) %>%
summarize(mean_value = mean(Value..000s., na.rm = TRUE))
Data.income <- left_join(grouped_data, Data.income, by = "HE.Provider")
colnames(Data.income)[11] <- "Research grant"
ivreg_model <- ivreg(`Collaborative Research (income)` ~ `Research grant`, data = Data.income)
stargazer(ivreg_model, type = "text",
dep.var.labels = c("Collaborative Research"),
covariate.labels = c("Research grant income"))
##
## =================================================
## Dependent variable:
## ---------------------------
## Collaborative Research
## -------------------------------------------------
## Research grant income 0.151***
## (0.011)
##
## Constant 4,050.595***
## (1,107.931)
##
## -------------------------------------------------
## Observations 151
## R2 0.572
## Adjusted R2 0.570
## Residual Std. Error 12,458.430 (df = 149)
## =================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
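Our IV appears to be relevant: in this regression of Collaborative Research income on research grant income, the coefficient is 0.151 and is statistically significant at the 1% level (indicated by ***), with an R2 of 0.572.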
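The regression above relates the suspected endogenous variable to the instrument. As a sketch of how such a first stage feeds into a full two-stage least squares (2SLS) estimate, the base-R example below uses simulated data in which the true effect is known. All variables (`z`, `u`, `x`, `y`) are invented for illustration, and the second-stage standard errors from this manual approach are not valid.

```r
set.seed(123)
n <- 500
z <- rnorm(n)                     # instrument
u <- rnorm(n)                     # unobserved confounder
x <- 0.8 * z + u + rnorm(n)       # endogenous regressor (depends on u)
y <- 1 + 0.5 * x + u + rnorm(n)   # true effect of x is 0.5

ols <- lm(y ~ x)                  # biased: x is correlated with the error through u

first_stage <- lm(x ~ z)          # relevance check: does z predict x?
x_hat <- fitted(first_stage)
second_stage <- lm(y ~ x_hat)     # 2SLS point estimate only

ols_coef  <- unname(coef(ols)["x"])
tsls_coef <- unname(coef(second_stage)["x_hat"])
first_stage_F <- unname(summary(first_stage)$fstatistic["value"])

c(ols = ols_coef, tsls = tsls_coef, first_stage_F = first_stage_F)
```

With AER, the equivalent estimate with valid standard errors uses the three-part formula, for example `ivreg(y ~ x | z)`, where the instruments follow the `|`.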
To conclude: