V. Statistical Analysis
Regression Analysis
Setup
Clean the environment
A basic setup routine in R includes clearing memory, removing datasets or objects, and resetting the environment so that you start fresh. This is useful at the beginning of an analysis or a new session to ensure that no old variables or data interfere with your work.

# Clear memory
rm(list = ls())
# Optionally, remove specific objects individually
# rm(data1, model1)
# Clear all graphics devices
graphics.off()
# Detach packages (replace with actual package names)
# ?detach
# require(splines) # load a package
# detach(package:splines)
# Clear the console
cat("\014")

- The default working directory in R is not necessarily the directory where your script is located (see the sketch below).
- Your R session is now clean and ready to go!
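If you want the working directory to match your script or project location, you can set it explicitly. A minimal sketch (the path is hypothetical; the rstudioapi line assumes you are running the script inside RStudio):

# Set the working directory explicitly (hypothetical path)
# setwd("~/projects/my_analysis")
# Or, in RStudio, use the location of the currently open script
# setwd(dirname(rstudioapi::getActiveDocumentContext()$path))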
Load the packages
Again, there are multiple ways to do this. I am assuming these basic packages are installed already and you just have to load them.
It is good practice to list the packages in the order you use them and to add a comment describing what each one does.
# Load essential packages
packages <- c(
"conflicted", # Resolves function name conflicts between packages
"knitr", # For dynamic report generation and R Markdown
"stargazer", # Formats regression tables and summary statistics
"visdat", # Visualizes missing data and data structures
"tidyverse", # Collection of packages for data manipulation and analysis
"ggplot2", # Creates customizable data visualizations
"gridExtra", # Arranges multiple plots on one page
"car", # Provides regression diagnostics and statistical tools
"AER" # Applied Econometrics with R
)
?library
for (pkg in packages) {
  library(package = pkg,
          character.only = TRUE # treat pkg as a character string naming the package
          )
}
If you do not have some of these packages, you may want to install them first, or use the code chunk below to install and load them in one step.
# Install and load packages
for (pkg in packages) {
  if (!require(pkg, character.only = TRUE)) {
    install.packages(pkg)
    library(pkg, character.only = TRUE)
  }
}
In R, it’s common to use multiple packages in a single project. However, sometimes different packages contain functions with the same name, leading to conflicts. For example, both the dplyr and plyr packages have a summarize() function, which could cause issues if both packages are loaded simultaneously.
How to Manage Function Conflicts
- Namespace Specification: The most direct way to manage function conflicts is to specify the package namespace explicitly when calling a function (e.g., dplyr::summarize()). This avoids ambiguity and ensures you’re using the correct function from the desired package.
- conflicted Package: The conflicted package is a helpful tool for managing function conflicts. When a conflict occurs, this package forces you to choose which function to use, preventing accidental usage of the wrong one.

# install.packages("conflicted")
# Load the conflicted package
library(conflicted)
# Set preference for dplyr's summarize function
conflict_prefer(name = "summarize",
                winner = "dplyr"
                )

[conflicted] Will prefer dplyr::summarize over any other package.

# Now, you can use summarize without specifying the namespace
result <- summarize(mtcars,
                    mean_value = mean(mpg)
                    )

- Load Order: The order in which packages are loaded can influence which function is used by default. If two functions have the same name, R will use the function from the package that was loaded last. However, relying on load order can be error-prone, so it’s generally better to use namespace specification or the conflicted package.
By managing function conflicts carefully, you can avoid unexpected behavior and ensure that your code runs smoothly, even when using multiple packages with overlapping functionality.
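For example, an explicit namespace call looks like this (a small illustration; the object name result2 is not from the original):

# Call dplyr's summarize explicitly via its namespace
result2 <- dplyr::summarize(mtcars, mean_value = mean(mpg))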
Alternatively -
?lapply
# Load each package individually using lapply
lapply(X = packages,
FUN = library,
character.only = TRUE
)
Load the data
Data can come from multiple sources:
- APIs, e.g., the FRED API for macroeconomic data.
- Flat files like .csv or .xlsx (examples sketched below).
- Base R datasets via utils::data(), or datasets from user-written packages like AER (data(package = "AER")). Run data(package = .packages(all.available = TRUE)) to explore the data sets available across your installed packages.
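A few hedged examples of these loading approaches (the file names are hypothetical; CPS1985 is one of the datasets shipped with the AER package):

# Flat files (hypothetical file names)
# my_data <- read.csv("my_data.csv")
# my_data <- readxl::read_excel("my_data.xlsx")

# Datasets shipped with packages
data("CPS1985", package = "AER") # loads the CPS1985 data frame into the workspace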
Data
We will use a simple dataset, mtcars, from base R for now.
To read up on the variable definitions, let's open the mtcars help file -
?mtcars
Motor Trend Car Road Tests
Description
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
Format
A data frame with 32 observations on 11 (numeric) variables.
[, 1] | mpg  | Miles/(US) gallon
[, 2] | cyl  | Number of cylinders
[, 3] | disp | Displacement (cu.in.)
[, 4] | hp   | Gross horsepower
[, 5] | drat | Rear axle ratio
[, 6] | wt   | Weight (1000 lbs)
[, 7] | qsec | 1/4 mile time
[, 8] | vs   | Engine (0 = V-shaped, 1 = straight)
[, 9] | am   | Transmission (0 = automatic, 1 = manual)
[,10] | gear | Number of forward gears
[,11] | carb | Number of carburetors
Summary Statistics
Check for missing values.
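The code below refers to a data frame df that is not created in this section; a minimal assumption is that it is simply a working copy of mtcars:

# Assumption: df is a working copy of the built-in mtcars data
df <- mtcars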
vis_dat(df)
Before we run regressions, it is always a good idea to look at summary statistics (this also helps you understand the variables better).
With kable -
# Generate descriptive statistics using psych::describe
summary_stats <- psych::describe(mtcars)
# Create a nicely formatted table using kable
kable(x = summary_stats,
caption = "Descriptive Statistics for mtcars Dataset",
digits = 2 )
 | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
---|---|---|---|---|---|---|---|---|---|---|---|---|---
mpg | 1 | 32 | 20.09 | 6.03 | 19.20 | 19.70 | 5.41 | 10.40 | 33.90 | 23.50 | 0.61 | -0.37 | 1.07 |
cyl | 2 | 32 | 6.19 | 1.79 | 6.00 | 6.23 | 2.97 | 4.00 | 8.00 | 4.00 | -0.17 | -1.76 | 0.32 |
disp | 3 | 32 | 230.72 | 123.94 | 196.30 | 222.52 | 140.48 | 71.10 | 472.00 | 400.90 | 0.38 | -1.21 | 21.91 |
hp | 4 | 32 | 146.69 | 68.56 | 123.00 | 141.19 | 77.10 | 52.00 | 335.00 | 283.00 | 0.73 | -0.14 | 12.12 |
drat | 5 | 32 | 3.60 | 0.53 | 3.70 | 3.58 | 0.70 | 2.76 | 4.93 | 2.17 | 0.27 | -0.71 | 0.09 |
wt | 6 | 32 | 3.22 | 0.98 | 3.33 | 3.15 | 0.77 | 1.51 | 5.42 | 3.91 | 0.42 | -0.02 | 0.17 |
qsec | 7 | 32 | 17.85 | 1.79 | 17.71 | 17.83 | 1.42 | 14.50 | 22.90 | 8.40 | 0.37 | 0.34 | 0.32 |
vs | 8 | 32 | 0.44 | 0.50 | 0.00 | 0.42 | 0.00 | 0.00 | 1.00 | 1.00 | 0.24 | -2.00 | 0.09 |
am | 9 | 32 | 0.41 | 0.50 | 0.00 | 0.38 | 0.00 | 0.00 | 1.00 | 1.00 | 0.36 | -1.92 | 0.09 |
gear | 10 | 32 | 3.69 | 0.74 | 4.00 | 3.62 | 1.48 | 3.00 | 5.00 | 2.00 | 0.53 | -1.07 | 0.13 |
carb | 11 | 32 | 2.81 | 1.62 | 2.00 | 2.65 | 1.48 | 1.00 | 8.00 | 7.00 | 1.05 | 1.26 | 0.29 |
With stargazer -
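The call that produced the table below is not shown in this extract; a sketch that generates this kind of output (the variable labels are assumptions read off the table):

# Summary statistics table with stargazer (labels are assumptions)
stargazer(mtcars,
          type = "text",
          title = "Summary Statistics for mtcars Dataset",
          covariate.labels = c("MPG", "Cylinders", "Disp.", "HP",
                               "Rear Axle Ratio", "Weight", "Q-Mile Time",
                               "V/S", "Transmission", "Gears", "Carbs")
          )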
Summary Statistics for mtcars Dataset
===============================================
Statistic N Mean St. Dev. Min Max
-----------------------------------------------
MPG 32 20.09 6.03 10.40 33.90
Cylinders 32 6.19 1.79 4 8
Disp. 32 230.72 123.94 71.10 472.00
HP 32 146.69 68.56 52 335
Rear Axle Ratio 32 3.60 0.53 2.76 4.93
Weight 32 3.22 0.98 1.51 5.42
Q-Mile Time 32 17.85 1.79 14.50 22.90
V/S 32 0.44 0.50 0 1
Transmission 32 0.41 0.50 0 1
Gears 32 3.69 0.74 3 5
Carbs 32 2.81 1.62 1 8
-----------------------------------------------
Simple Linear Regression
Estimating Equation
Let's write out the estimating equation before we run the regression -
\[ mpg_i = \beta_0 + \beta_1 hp_i + \epsilon_i \]
Now, let's implement it in R -
?lm
# Simple Linear Regression: mpg vs hp
lm(data = df, formula = mpg ~ hp)
Call:
lm(formula = mpg ~ hp, data = df)
Coefficients:
(Intercept) hp
30.09886 -0.06823
Usually, we store the lm output as an object in R -
?lm
# Simple Linear Regression: mpg vs hp
model_bivariate <- lm(data = df,
formula = mpg ~ hp
)
We saved the model output in the object model_bivariate. Let's print it out and explore the output.
summary(object = model_bivariate)
Call:
lm(formula = mpg ~ hp, data = df)
Residuals:
Min 1Q Median 3Q Max
-5.7121 -2.1122 -0.8854 1.5819 8.2360
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.09886 1.63392 18.421 < 2e-16 ***
hp -0.06823 0.01012 -6.742 1.79e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.863 on 30 degrees of freedom
Multiple R-squared: 0.6024, Adjusted R-squared: 0.5892
F-statistic: 45.46 on 1 and 30 DF, p-value: 1.788e-07
- We can call upon specific parts of the output as well if we want.
model_bivariate$coefficients
(Intercept) hp
30.09886054 -0.06822828
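Besides indexing with $, base R extractor functions return the same information (a small addition, output not shown in the original):

# Extractor functions for lm objects
coef(model_bivariate)    # coefficient estimates
confint(model_bivariate) # 95% confidence intervals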
Plot the regression line
Base R
In the plot below, the three regression lines fall exactly on top of each other, which is why you only see the last one drawn.
plot(x = mtcars$hp,
y = mtcars$mpg,
main = "Simple Linear Regression: mpg vs hp",
xlab = "Horsepower (hp)",
ylab = "Miles per Gallon (mpg)"
)
?abline # adds one or more straight lines to the current plot
# Option 1: hard-code the estimated intercept and slope
abline(a = 30.09886, # intercept
       b = -0.06823, # slope
       col = "red"
       )
# Option 2: pass the fitted model object directly
abline(model_bivariate,
       col = "blue")
# Option 3: compute the OLS intercept and slope by hand
abline(a = mean(mtcars$mpg) - model_bivariate$coefficients[2] * mean(mtcars$hp),
       b = cov(mtcars$mpg, mtcars$hp) / var(mtcars$hp),
       col = "green"
       )
You can use this to confirm your intuition for the OLS formulas.
\[ \beta_1 = \dfrac{Cov(y,x)}{Var(x)} \]
We can average and rearrange the original estimating equation to solve for \(\beta_0\) if we know \(\beta_1\) -
\[\begin{eqnarray} y_i &=& \beta_0 + \beta_1 * x_i + \epsilon_i \\ => \bar y &=& \beta_0 + \beta_1 * \bar x \\ => \beta_0 &=& \bar y- \beta_1 * \bar x \end{eqnarray}\]
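A quick numeric check of these formulas against lm()'s estimates (the object names b0 and b1 are just for illustration):

# Closed-form OLS estimates for the bivariate regression
b1 <- cov(mtcars$mpg, mtcars$hp) / var(mtcars$hp)
b0 <- mean(mtcars$mpg) - b1 * mean(mtcars$hp)
c(intercept = b0, slope = b1)
# Compare with the fitted model
coef(model_bivariate)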
This also shows how to type out multiple equations and align them under the = sign (see the Appendix on writing math in R Markdown).
ggplot
?geom_smooth
# Create a scatter plot with a regression line
ggplot(data = mtcars,
mapping = aes(x = hp, y = mpg)) +
geom_point() + # Add scatter plot points
geom_smooth(method = "lm",
col = "red") + # Add a linear regression line
labs(
title = "Simple Linear Regression: mpg vs hp",
x = "Horsepower (hp)",
y = "Miles per Gallon (mpg)"
) +
theme_minimal() # Use a minimal theme for cleaner appearance
`geom_smooth()` using formula = 'y ~ x'
Explore the model output
- The lm output is a list of 12 objects (see the Environment tab).
model_bivariate
Call:
lm(formula = mpg ~ hp, data = df)
Coefficients:
(Intercept) hp
30.09886 -0.06823
- Coefficients
model_bivariate$coefficients
(Intercept) hp
30.09886054 -0.06822828
- Residuals - these should sum to zero (up to floating-point error).
head(model_bivariate$residuals)
Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
-1.5937500 -1.5937500 -0.9536307 -1.1937500
Hornet Sportabout Valiant
0.5410881 -4.8348913
sum(model_bivariate$residuals) # should sum to (numerically) zero
[1] 1.487699e-14
[1] 0
All kinds of model fit metrics can be calculated from residuals.
# Calculate Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE)
mae <- mean(abs(model_bivariate$residuals))
rmse <- sqrt(mean(model_bivariate$residuals^2))
# Print performance metrics
print(paste("Mean Absolute Error (MAE):", round(mae, 2)))
print(paste("Root Mean Squared Error (RMSE):", round(rmse, 2)))

[1] "Mean Absolute Error (MAE): 2.91"
[1] "Root Mean Squared Error (RMSE): 3.74"
- Predicted values - points on the best-fit line for new data.
- Fitted values - points on the best-fit line for the data used to estimate the model.
When using a regression model to make predictions on new data, the predicted values represent the points on the best fit line (or plane, in the case of multiple regression) corresponding to the new observations. To ensure accurate predictions, the new data must have variables with the same names as those used to fit the model.
Example 1: Split the original data into training and testing sets, fit the model on the training data, and predict on the testing data.
# Set seed for reproducibility
set.seed(123)
?sample
# Define proportion for training data (80% for training, 20% for testing)
train_proportion <- 0.8
# Number of observations in the dataset
num_observations <- nrow(mtcars)
?seq_len
# Create random indices for training set
train_indices <- sample(x = seq_len(num_observations), # sequence of row indices 1, ..., n
                        size = floor(train_proportion * num_observations) # floor ensures a whole number
                        )
# Split the data into training and testing sets
mtcars_train <- mtcars[ train_indices, ]
mtcars_test <- mtcars[-train_indices, ]
model_bivariate_train <-
lm(data = mtcars_train,
formula = mpg ~ hp)
head(predict(object = model_bivariate_train,
newdata = mtcars_test)
)
Mazda RX4 Wag Valiant Merc 450SE Merc 450SL
23.15635 23.49597 18.40169 18.40169
Lincoln Continental Toyota Corona
16.02436 24.03936
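To gauge out-of-sample fit, you can compare these predictions with the actual mpg values in the test set; a minimal sketch (not in the original):

# Out-of-sample RMSE on the held-out test set
mtcars_test_pred <- predict(object = model_bivariate_train,
                            newdata = mtcars_test)
sqrt(mean((mtcars_test$mpg - mtcars_test_pred)^2))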
Multivariate Regression
Let's write out the multivariate regression equation first.
\[ mpg_i = \beta_0 + \beta_1 hp_i + \beta_2 wt_i + \epsilon_i \]
- Implementation in R is the same, just add more independent variables.
model_mult1 <-
lm(data = df,
formula = mpg ~ hp + wt
)
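To compare the bivariate and multivariate fits side by side, you could print both with stargazer (a small addition; output omitted):

# Compare the bivariate and multivariate models
stargazer(model_bivariate, model_mult1, type = "text")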
Quick way to create higher order terms
- Suppose you think mpg is quadratically related to wt. You do not have to create a new variable and add it to your df data frame; you can simply ask R to create one on the spot with the I() function.
I() function
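The output below presumably comes from a model defined along these lines (a sketch; the object name model_mult2 is taken from the stargazer call further down):

model_mult2 <- lm(data = df,
                  formula = mpg ~ hp + wt + I(wt^2)
                  )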
Call:
lm(formula = mpg ~ hp + wt + I(wt^2), data = df)
Coefficients:
(Intercept) hp wt I(wt^2)
47.83728 -0.02728 -10.82217 0.98181
To generate variables "on the go" in R, especially when you want to create polynomial or interaction terms that do not exist as separate variables in your data frame, you can use the I() function within formulas. The I() function allows you to include mathematical expressions, such as squares or products of variables, directly within model formulas or when creating new variables. If you think mpg is related to wt even more non-linearly (i.e., through higher powers), you can try the poly() function as well.
Make sure the variables you want to add are actually being generated!

lm(data = df,
   formula = mpg ~ hp + wt + wt^2
   ) # does not work: inside a formula, ^ denotes interaction crossing, so wt^2 collapses to just wt
Call:
lm(formula = mpg ~ hp + wt + wt^2, data = df)
Coefficients:
(Intercept) hp wt
37.22727 -0.03177 -3.87783
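The two poly() variants compared in the table below were presumably defined as follows (a sketch; the object names are taken from the stargazer call, raw = TRUE reproduces the I(wt^2) parameterization, and the default gives orthogonal polynomials):

# Raw (ordinary) second-degree polynomial in wt
model_mult2p <- lm(data = df,
                   formula = mpg ~ hp + poly(x = wt, degree = 2, raw = TRUE)
                   )
# Orthogonal second-degree polynomial in wt
model_mult2po <- lm(data = df,
                    formula = mpg ~ hp + poly(x = wt, degree = 2)
                    )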
- Print the models side by side with the stargazer package.
stargazer(model_mult2, model_mult2p, model_mult2po,
type = "text"
)
======================================================================
Dependent variable:
--------------------------------
mpg
(1) (2) (3)
----------------------------------------------------------------------
hp -0.027*** -0.027*** -0.027***
(0.008) (0.008) (0.008)
wt -10.822***
(2.281)
I(wt2) 0.982***
(0.313)
poly(x = wt, degree = 2, raw = TRUE)1 -10.822***
(2.281)
poly(x = wt, degree = 2, raw = TRUE)2 0.982***
(0.313)
poly(x = wt, degree = 2)1 -22.255***
(3.039)
poly(x = wt, degree = 2)2 7.240***
(2.307)
Constant 47.837*** 47.837*** 24.093***
(3.659) (3.659) (1.245)
----------------------------------------------------------------------
Observations 32 32 32
R2 0.872 0.872 0.872
Adjusted R2 0.858 0.858 0.858
Residual Std. Error (df = 28) 2.270 2.270 2.270
F Statistic (df = 3; 28) 63.503*** 63.503*** 63.503***
======================================================================
Note: *p<0.1; **p<0.05; ***p<0.01
- stargazer lets you control lots of settings.
stargazer(model_mult2, model_mult2p, model_mult2po,
type = "text",
title = "My regression table",
omit.stat = c("n"),
add.lines = list(c("F-Statistics",
round(summary(model_mult2)$fstatistic[1], digits = 1),
round(summary(model_mult2p)$fstatistic[1], digits = 1),
round(summary(model_mult2po)$fstatistic[1], digits = 1)))
)
My regression table
======================================================================
Dependent variable:
--------------------------------
mpg
(1) (2) (3)
----------------------------------------------------------------------
hp -0.027*** -0.027*** -0.027***
(0.008) (0.008) (0.008)
wt -10.822***
(2.281)
I(wt2) 0.982***
(0.313)
poly(x = wt, degree = 2, raw = TRUE)1 -10.822***
(2.281)
poly(x = wt, degree = 2, raw = TRUE)2 0.982***
(0.313)
poly(x = wt, degree = 2)1 -22.255***
(3.039)
poly(x = wt, degree = 2)2 7.240***
(2.307)
Constant 47.837*** 47.837*** 24.093***
(3.659) (3.659) (1.245)
----------------------------------------------------------------------
F-Statistics 63.5 63.5 63.5
R2 0.872 0.872 0.872
Adjusted R2 0.858 0.858 0.858
Residual Std. Error (df = 28) 2.270 2.270 2.270
F Statistic (df = 3; 28) 63.503*** 63.503*** 63.503***
======================================================================
Note: *p<0.1; **p<0.05; ***p<0.01
Quick way to create interaction terms
- Shorthand notation can be useful, for example when setting up a Diff-in-Diff model.

# interaction term plus the original (main-effect) terms
model_mult5 <- lm(data = df,
                  formula = mpg ~ hp * wt
                  )
# only the interaction term
model_mult5a <- lm(data = df,
                   formula = mpg ~ hp : wt
                   )
stargazer(model_mult5, model_mult5a, type = "text")
=================================================================
Dependent variable:
---------------------------------------------
mpg
(1) (2)
-----------------------------------------------------------------
hp -0.120***
(0.025)
wt -8.217***
(1.270)
hp:wt 0.028*** -0.015***
(0.007) (0.002)
Constant 49.808*** 27.746***
(3.605) (1.062)
-----------------------------------------------------------------
Observations 32 32
R2 0.885 0.712
Adjusted R2 0.872 0.702
Residual Std. Error 2.153 (df = 28) 3.288 (df = 30)
F Statistic 71.660*** (df = 3; 28) 74.136*** (df = 1; 30)
=================================================================
Note: *p<0.1; **p<0.05; ***p<0.01
Declaring factor variables
- You might have to decide whether a quantitative variable should be treated as continuous or categorical!

# continuous variable
model_mult7 <- lm(data = df,
                  formula = mpg ~ hp + wt + cyl
                  )
# categorical variable
model_mult7f <- lm(data = df,
                   formula = mpg ~ hp + wt + as.factor(cyl)
                   )
stargazer(model_mult7 , model_mult7f, type = "text")
=================================================================
Dependent variable:
---------------------------------------------
mpg
(1) (2)
-----------------------------------------------------------------
hp -0.018 -0.023*
(0.012) (0.012)
wt -3.167*** -3.181***
(0.741) (0.720)
cyl -0.942*
(0.551)
as.factor(cyl)6 -3.359**
(1.402)
as.factor(cyl)8 -3.186
(2.170)
Constant 38.752*** 35.846***
(1.787) (2.041)
-----------------------------------------------------------------
Observations 32 32
R2 0.843 0.857
Adjusted R2 0.826 0.836
Residual Std. Error 2.512 (df = 28) 2.440 (df = 27)
F Statistic 50.171*** (df = 3; 28) 40.525*** (df = 4; 27)
=================================================================
Note: *p<0.1; **p<0.05; ***p<0.01
Residual Analysis
Residual analysis (diagnostics) is a crucial step in evaluating the performance and validity of a regression model. It helps in diagnosing potential issues like non-linearity, heteroscedasticity, and the presence of outliers, which can affect the model’s accuracy and reliability.
- It is common to check whether the Gauss-Markov assumptions hold.
- The par(mfrow = c(nrows, ncols)) function in R is used to set up a multi-panel plotting layout. It divides the plotting area into a grid with nrows rows and ncols columns, allowing you to display multiple plots in a single plotting window.
par(mfrow = c(2, 2)) # sets up the plotting area as a 2-by-2 grid
# Residuals vs Fitted Plot
plot(model_mult1, which = 1)
# Normal Q-Q Plot
plot(model_mult1, which = 2)
# Scale-Location Plot
plot(model_mult1, which = 3)
# Cook's Distance
plot(model_mult1, which = 4)
- After creating the plots, it's good practice to reset the plotting layout to the default single plot, e.g. with par(mfrow = c(1, 1)).
- If you are using ggplot2, you can use the gridExtra package to achieve a similar layout.
# Residuals vs Fitted
p1 <- ggplot(model_mult1, aes(.fitted, .resid)) +
geom_point() +
ggtitle("Residuals vs Fitted") +
xlab("Fitted values") +
ylab("Residuals") +
theme_minimal()
# Normal Q-Q Plot
p2 <- ggplot(model_mult1, aes(sample = .resid)) +
stat_qq() +
stat_qq_line() +
ggtitle("Normal Q-Q") +
xlab("Theoretical Quantiles") +
ylab("Sample Quantiles") +
theme_minimal()
# Scale-Location Plot
p3 <- ggplot(model_mult1, aes(.fitted, sqrt(abs(.resid)))) +
geom_point() +
ggtitle("Scale-Location") +
xlab("Fitted values") +
ylab("Square Root of |Residuals|") +
theme_minimal()
# Cook's Distance
p4 <- ggplot(model_mult1, aes(seq_along(.cooksd), .cooksd)) +
geom_point() +
ggtitle("Cook's Distance") +
xlab("Index") +
ylab("Cook's Distance") +
theme_minimal()
# Combine plots using gridExtra
grid.arrange(p1, p2, p3, p4, nrow = 2, ncol = 2)
Statistical tests
- To check for heteroscedasticity (non-constant variance) in regression models, you can also run formal statistical tests if you prefer them to visually inspecting the residuals. Related numerical diagnostics, such as variance inflation factors (VIF) for multicollinearity, are shown below as well.
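For instance, the car package (already loaded) provides a score test for non-constant error variance; a minimal sketch:

# Breusch-Pagan-type test for non-constant variance
car::ncvTest(model_mult1)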
# VIF to check for multicollinearity
car::vif(model_mult1)
hp wt
1.766625 1.766625
Appendix
1. Writing Math in R Markdown
2. Citing R.
citation()
To cite R in publications use:
R Core Team (2025). _R: A Language and Environment for Statistical
Computing_. R Foundation for Statistical Computing, Vienna, Austria.
<https://www.R-project.org/>.
A BibTeX entry for LaTeX users is
@Manual{,
title = {R: A Language and Environment for Statistical Computing},
author = {{R Core Team}},
organization = {R Foundation for Statistical Computing},
address = {Vienna, Austria},
year = {2025},
url = {https://www.R-project.org/},
}
We have invested a lot of time and effort in creating R, please cite it
when using it for data analysis. See also 'citation("pkgname")' for
citing R packages.
3. Citing specific R packages.
- To generate citations for multiple R packages using citation(), you can't directly pass a vector of package names. Instead, you need to loop through each package name and call citation() for each one.
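A minimal sketch of such a loop (the original code is not shown; this assumes the packages vector defined earlier):

# Print a citation for each package used
for (pkg in packages) {
  print(citation(package = pkg))
}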
To cite package 'conflicted' in publications use:
Wickham H (2023). _conflicted: An Alternative Conflict Resolution
Strategy_. doi:10.32614/CRAN.package.conflicted
<https://doi.org/10.32614/CRAN.package.conflicted>, R package version
1.2.0, <https://CRAN.R-project.org/package=conflicted>.
A BibTeX entry for LaTeX users is
@Manual{,
title = {conflicted: An Alternative Conflict Resolution Strategy},
author = {Hadley Wickham},
year = {2023},
note = {R package version 1.2.0},
url = {https://CRAN.R-project.org/package=conflicted},
doi = {10.32614/CRAN.package.conflicted},
}
To cite package 'knitr' in publications use:
Xie Y (2025). _knitr: A General-Purpose Package for Dynamic Report
Generation in R_. R package version 1.50, <https://yihui.org/knitr/>.
Yihui Xie (2015) Dynamic Documents with R and knitr. 2nd edition.
Chapman and Hall/CRC. ISBN 978-1498716963
Yihui Xie (2014) knitr: A Comprehensive Tool for Reproducible
Research in R. In Victoria Stodden, Friedrich Leisch and Roger D.
Peng, editors, Implementing Reproducible Computational Research.
Chapman and Hall/CRC. ISBN 978-1466561595
To see these entries in BibTeX format, use 'print(<citation>,
bibtex=TRUE)', 'toBibtex(.)', or set
'options(citation.bibtex.max=999)'.
Please cite stargazer in publications as:
Hlavac, Marek (2022). stargazer: Well-Formatted Regression and
Summary Statistics Tables. R package version 5.2.3.
https://CRAN.R-project.org/package=stargazer
A BibTeX entry for LaTeX users is
@Manual{,
title = {stargazer: Well-Formatted Regression and Summary Statistics Tables},
author = {Marek Hlavac},
year = {2022},
note = {R package version 5.2.3},
organization = {Social Policy Institute},
address = {Bratislava, Slovakia},
url = {https://CRAN.R-project.org/package=stargazer},
}
To cite package 'visdat' in publications use:
Tierney N (2017). "visdat: Visualising Whole Data Frames." _JOSS_,
*2*(16), 355. doi:10.21105/joss.00355
<https://doi.org/10.21105/joss.00355>,
<http://dx.doi.org/10.21105/joss.00355>.
A BibTeX entry for LaTeX users is
@Article{,
title = {visdat: Visualising Whole Data Frames},
author = {Nicholas Tierney},
doi = {10.21105/joss.00355},
url = {http://dx.doi.org/10.21105/joss.00355},
year = {2017},
publisher = {Journal of Open Source Software},
volume = {2},
number = {16},
pages = {355},
journal = {JOSS},
}
To cite package 'tidyverse' in publications use:
Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R,
Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller
E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V,
Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). "Welcome to
the tidyverse." _Journal of Open Source Software_, *4*(43), 1686.
doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.
A BibTeX entry for LaTeX users is
@Article{,
title = {Welcome to the {tidyverse}},
author = {Hadley Wickham and Mara Averick and Jennifer Bryan and Winston Chang and Lucy D'Agostino McGowan and Romain François and Garrett Grolemund and Alex Hayes and Lionel Henry and Jim Hester and Max Kuhn and Thomas Lin Pedersen and Evan Miller and Stephan Milton Bache and Kirill Müller and Jeroen Ooms and David Robinson and Dana Paige Seidel and Vitalie Spinu and Kohske Takahashi and Davis Vaughan and Claus Wilke and Kara Woo and Hiroaki Yutani},
year = {2019},
journal = {Journal of Open Source Software},
volume = {4},
number = {43},
pages = {1686},
doi = {10.21105/joss.01686},
}
To cite ggplot2 in publications, please use
H. Wickham. ggplot2: Elegant Graphics for Data Analysis.
Springer-Verlag New York, 2016.
A BibTeX entry for LaTeX users is
@Book{,
author = {Hadley Wickham},
title = {ggplot2: Elegant Graphics for Data Analysis},
publisher = {Springer-Verlag New York},
year = {2016},
isbn = {978-3-319-24277-4},
url = {https://ggplot2.tidyverse.org},
}
To cite package 'gridExtra' in publications use:
Auguie B (2017). _gridExtra: Miscellaneous Functions for "Grid"
Graphics_. doi:10.32614/CRAN.package.gridExtra
<https://doi.org/10.32614/CRAN.package.gridExtra>, R package version
2.3, <https://CRAN.R-project.org/package=gridExtra>.
A BibTeX entry for LaTeX users is
@Manual{,
title = {gridExtra: Miscellaneous Functions for "Grid" Graphics},
author = {Baptiste Auguie},
year = {2017},
note = {R package version 2.3},
url = {https://CRAN.R-project.org/package=gridExtra},
doi = {10.32614/CRAN.package.gridExtra},
}
To cite the car package in publications use:
Fox J, Weisberg S (2019). _An R Companion to Applied Regression_,
Third edition. Sage, Thousand Oaks CA.
<https://www.john-fox.ca/Companion/>.
A BibTeX entry for LaTeX users is
@Book{,
title = {An {R} Companion to Applied Regression},
edition = {Third},
author = {John Fox and Sanford Weisberg},
year = {2019},
publisher = {Sage},
address = {Thousand Oaks {CA}},
url = {https://www.john-fox.ca/Companion/},
}
To cite AER, please use:
Kleiber C, Zeileis A (2008). _Applied Econometrics with R_.
Springer-Verlag, New York. doi:10.1007/978-0-387-77318-6
<https://doi.org/10.1007/978-0-387-77318-6>,
<https://CRAN.R-project.org/package=AER>.
A BibTeX entry for LaTeX users is
@Book{,
title = {Applied Econometrics with {R}},
author = {Christian Kleiber and Achim Zeileis},
year = {2008},
publisher = {Springer-Verlag},
address = {New York},
doi = {10.1007/978-0-387-77318-6},
url = {https://CRAN.R-project.org/package=AER},
}