Generate a synthetic data set from a simple model:
genDataSet <- function(n) {
  x <- rnorm(n)          # predictor: standard normal draws
  y <- 2 * x + rnorm(n)  # response: true slope 2, intercept 0, unit-variance noise
  data.frame(x, y)
}
Calling genDataSet(n) creates a synthetic data set of n points. As an example:
set.seed(1)
test = genDataSet(100)
plot(test)
This data set can be fit quite easily with a linear model:
summary(lm(y~x, test))
##
## Call:
## lm(formula = y ~ x, data = test)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.8768 -0.6138 -0.1395  0.5394  2.3462
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.03769    0.09699  -0.389    0.698
## x            1.99894    0.10773  18.556   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9628 on 98 degrees of freedom
## Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762
## F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16
The fitted slope is 2.0 ± 0.1 (estimate ± standard error), consistent with the true slope of 2 used to generate the data.
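The slope and its standard error can also be pulled out of the fit programmatically instead of being read off the printed summary. The following is a minimal sketch, assuming the same genDataSet() definition as in the text; the names fit, coefTable, slopeSE, and manualSE are introduced here only for illustration. It also recomputes the standard error from the textbook formula for simple linear regression (residual standard error divided by the square root of the sum of squared deviations of x) as a sanity check.

```r
# Same data-generating function as in the text (repeated so the block is self-contained)
genDataSet <- function(n) {
  x <- rnorm(n)
  y <- 2 * x + rnorm(n)
  data.frame(x, y)
}

set.seed(1)
test <- genDataSet(100)
fit <- lm(y ~ x, data = test)

# coef(summary(fit)) is a matrix with one row per model term
coefTable <- coef(summary(fit))
slope   <- coefTable["x", "Estimate"]
slopeSE <- coefTable["x", "Std. Error"]

# Textbook formula for the slope standard error in simple regression:
# residual standard error / sqrt(sum of squared deviations of x)
sigmaHat <- summary(fit)$sigma
manualSE <- sigmaHat / sqrt(sum((test$x - mean(test$x))^2))
```

Here slopeSE and manualSE should agree to floating-point precision, since both are the same quantity computed two ways.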
Now let's generate many synthetic data sets and fit the slope each time to get a feel for what the 0.1 standard error means. Each data set is one possible realization of the model.
genSlope <- function(n) {
  fit <- lm(y ~ x, genDataSet(n))  # fit a fresh synthetic data set of size n
  fit$coefficients[2]              # return only the slope estimate
}
slopeData = sapply(rep(100, 100), genSlope)  # 100 data sets of 100 points each
hist(slopeData)
mean(slopeData)
## [1] 2.001821
sd(slopeData)
## [1] 0.1028822
So the standard deviation of the fitted slopes across realizations is about 0.1, in agreement with the standard error reported by summary(lm).
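To make the comparison more direct, each simulated fit can record both its slope estimate and the standard error that summary(lm) reports for it. The sketch below assumes the same genDataSet() as in the text; the helper genSlopeAndSE() and the choice of 1000 replications are hypothetical, added here only for illustration.

```r
# Same data-generating function as in the text (repeated so the block is self-contained)
genDataSet <- function(n) {
  x <- rnorm(n)
  y <- 2 * x + rnorm(n)
  data.frame(x, y)
}

# Hypothetical helper: slope estimate and its reported standard error for one realization
genSlopeAndSE <- function(n) {
  fit <- lm(y ~ x, data = genDataSet(n))
  coef(summary(fit))["x", c("Estimate", "Std. Error")]
}

set.seed(1)
# replicate() returns a 2 x 1000 matrix; transpose so each row is one realization
sims <- t(replicate(1000, genSlopeAndSE(100)))

empiricalSD    <- sd(sims[, "Estimate"])      # spread of slopes across realizations
meanReportedSE <- mean(sims[, "Std. Error"])  # typical standard error from a single fit
```

If the standard error is doing its job, empiricalSD and meanReportedSE should be close to each other, both near 0.1 for this model and sample size.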