STD_ID PST_ID TIME GRD_LVL GENDER ETHNICITY MINORITY SES
1 F16_T1_1 F2016_1 0 3 Female White non_Minority low_SES
2 F16_T1_2 F2016_1 0 3 Female White non_Minority low_SES
3 F16_T1_3 F2016_1 0 3 Female White non_Minority low_SES
4 F16_T1_4 F2016_1 0 3 Male Hispanics Minority low_SES
5 F16_T1_5 F2016_1 0 3 Male White non_Minority low_SES
6 F16_T1_6 F2016_1 0 3 Female Hispanics Minority low_SES
ESE ESOL PRETEST POSTTEST
1 non_Exceptional Students ELs 31.94 68.06
2 Gifted Students Non ELs 59.72 91.67
3 non_Exceptional Students Non ELs 47.22 83.33
4 non_Exceptional Students ELs 51.39 87.50
5 non_Exceptional Students Non ELs 51.39 87.50
6 Gifted Students Non ELs 59.72 100.00
[1] 13163 12
'data.frame': 13163 obs. of 12 variables:
$ STD_ID : chr "F16_T1_1" "F16_T1_2" "F16_T1_3" "F16_T1_4" ...
$ PST_ID : chr "F2016_1" "F2016_1" "F2016_1" "F2016_1" ...
$ TIME : int 0 0 0 0 0 0 0 0 0 0 ...
$ GRD_LVL : int 3 3 3 3 3 3 3 3 3 3 ...
$ GENDER : chr "Female" "Female" "Female" "Male" ...
$ ETHNICITY: chr "White" "White" "White" "Hispanics" ...
$ MINORITY : chr "non_Minority" "non_Minority" "non_Minority" "Minority" ...
$ SES : chr "low_SES" "low_SES" "low_SES" "low_SES" ...
$ ESE : chr "non_Exceptional Students" "Gifted Students" "non_Exceptional Students" "non_Exceptional Students" ...
$ ESOL : chr "ELs" "Non ELs" "Non ELs" "ELs" ...
$ PRETEST : num 31.9 59.7 47.2 51.4 51.4 ...
$ POSTTEST : num 68.1 91.7 83.3 87.5 87.5 ...
STD_ID PST_ID GRD_LVL GENDER ETHNICITY MINORITY SES ESE
1 F16_T1_1 F2016_1 3 Female White non_Minority low_SES non_Exceptional Students
2 F16_T1_2 F2016_1 3 Female White non_Minority low_SES Gifted Students
3 F16_T1_3 F2016_1 3 Female White non_Minority low_SES non_Exceptional Students
4 F16_T1_4 F2016_1 3 Male Hispanics Minority low_SES non_Exceptional Students
5 F16_T1_5 F2016_1 3 Male White non_Minority low_SES non_Exceptional Students
6 F16_T1_6 F2016_1 3 Female Hispanics Minority low_SES Gifted Students
ESOL PRETEST POSTTEST DATE
1 ELs 31.94 68.06 2016-08-16
2 Non ELs 59.72 91.67 2016-08-16
3 Non ELs 47.22 83.33 2016-08-16
4 ELs 51.39 87.50 2016-08-16
5 Non ELs 51.39 87.50 2016-08-16
6 Non ELs 59.72 100.00 2016-08-16
'data.frame': 13163 obs. of 12 variables:
$ STD_ID : chr "F16_T1_1" "F16_T1_2" "F16_T1_3" "F16_T1_4" ...
$ PST_ID : chr "F2016_1" "F2016_1" "F2016_1" "F2016_1" ...
$ GRD_LVL : int 3 3 3 3 3 3 3 3 3 3 ...
$ GENDER : chr "Female" "Female" "Female" "Male" ...
$ ETHNICITY: chr "White" "White" "White" "Hispanics" ...
$ MINORITY : chr "non_Minority" "non_Minority" "non_Minority" "Minority" ...
$ SES : chr "low_SES" "low_SES" "low_SES" "low_SES" ...
$ ESE : chr "non_Exceptional Students" "Gifted Students" "non_Exceptional Students" "non_Exceptional Students" ...
$ ESOL : chr "ELs" "Non ELs" "Non ELs" "ELs" ...
$ PRETEST : num 31.9 59.7 47.2 51.4 51.4 ...
$ POSTTEST : num 68.1 91.7 83.3 87.5 87.5 ...
$ DATE : Date, format: "2016-08-16" "2016-08-16" ...
STD_ID PST_ID GRD_LVL GENDER ETHNICITY MINORITY SES ESE ESOL
0 0 0 0 352 0 173 0 0
PRETEST POSTTEST DATE
0 0 0
'data.frame': 13163 obs. of 4 variables:
$ esol : chr "ELs" "Non ELs" "Non ELs" "ELs" ...
$ pretest : num 31.9 59.7 47.2 51.4 51.4 ...
$ posttest: num 68.1 91.7 83.3 87.5 87.5 ...
$ date : Date, format: "2016-08-16" "2016-08-16" ...
ELs Exited ELs Non ELs
1416 354 11393
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 3.000 3.000 2.758 3.000 3.000
esol pretest posttest date
0 0 0 0
[1] "ts"
Augmented Dickey-Fuller Test
data: time_series
Dickey-Fuller = -17.222, Lag order = 20, p-value = 0.01
alternative hypothesis: stationary
The Augmented Dickey-Fuller (ADF) test determines whether a time series is stationary. Stationarity means the series has a constant mean, variance, and autocovariance structure over time.
In this output, the test statistic (Dickey-Fuller) is -17.222 and the reported p-value is 0.01; adf.test() truncates smaller p-values at 0.01, so this should be read as p ≤ 0.01. The lag order used for the test is 20.
The null hypothesis of the ADF test is that the time series is non-stationary (contains a unit root). The alternative hypothesis is that the time series is stationary.
Since the p-value (≤ 0.01) is below the 0.05 significance level, we reject the null hypothesis and conclude that the time series is stationary: its mean, variance, and autocovariance structure do not appear to change over time.
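For reference, a minimal sketch of how this test is typically run with the tseries package, assuming the series object is named time_series as in the output above:

library(tseries)

adf_result <- adf.test(time_series)   # H0: unit root (non-stationary)
adf_result$statistic                  # Dickey-Fuller test statistic
adf_result$p.value                    # reported as 0.01 whenever p <= 0.01

By default adf.test() uses alternative = "stationary" and picks the lag order from the series length; it prints a warning when the true p-value falls below the smallest tabulated value.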
[1] "numeric"
[1] "numeric"
MSE: 366.7782
RMSE: 19.15145
MAPE: Inf %
The MSE (Mean Squared Error) is a measure of the average squared differences between the predicted values and the actual values. The lower the MSE, the better the model fits the data. In this case, the MSE is 366.7782.
The RMSE (Root Mean Squared Error) is the square root of the MSE, and is a measure of the average distance between the predicted values and the actual values. Like the MSE, the lower the RMSE, the better the model fits the data. In this case, the RMSE is 19.15145.
The MAPE (Mean Absolute Percentage Error) expresses the model's error as a percentage of the actual values; a value of 0% would indicate a perfect fit. In this case the MAPE is Inf %, which does not by itself mean the model is inaccurate: the metric divides by the actual values, so any actual value of 0 drives the MAPE to infinity and makes it uninformative for these data.
The Mean Absolute Error (MAE) between the predicted values and the actual values in the test dataset is approximately 15.02. In other words, the average difference between the predicted and actual values is around 15 points; a lower MAE indicates better accuracy.
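For context, these metrics can be computed directly in base R. The sketch below uses toy vectors (actual and predicted are illustrative names, not objects from the original script) and shows how a single zero among the actual values drives the MAPE to infinity:

actual    <- c(70, 0, 85, 90)    # toy test-set values; note the zero
predicted <- c(68, 5, 80, 95)

mae  <- mean(abs(actual - predicted))
mse  <- mean((actual - predicted)^2)
rmse <- sqrt(mse)
mape <- mean(abs((actual - predicted) / actual)) * 100  # division by zero -> Inf

cat("MAE:", mae, "MSE:", mse, "RMSE:", rmse, "MAPE:", mape, "%\n")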
'data.frame': 1416 obs. of 4 variables:
$ esol : num 1 1 1 1 1 1 1 1 1 1 ...
$ pretest : num 31.9 51.4 43.1 11.1 23.6 ...
$ posttest: num 68.1 87.5 87.5 72.2 87.5 ...
$ date : Date, format: "2016-08-16" "2016-08-16" ...
'data.frame': 11393 obs. of 4 variables:
$ esol : num 3 3 3 3 3 3 3 3 3 3 ...
$ pretest : num 59.7 47.2 51.4 59.7 27.8 ...
$ posttest: num 91.7 83.3 87.5 100 79.2 ...
$ date : Date, format: "2016-08-16" "2016-08-16" ...
'data.frame': 354 obs. of 4 variables:
$ esol : num 2 2 2 2 2 2 2 2 2 2 ...
$ pretest : num 37.5 42.5 50 60 33.3 ...
$ posttest: num 90 90 90 93.3 75 ...
$ date : Date, format: "2016-08-16" "2016-08-16" ...
ELs Pretest Model Metrics:
MAE: 33.81893
MSE: 1447.849
RMSE: 38.05061
MAPE: Inf %
ELs Posttest Model Metrics:
MAE: 18.37413
MSE: 509.6918
RMSE: 22.57635
MAPE: Inf %
The metrics printed for each model represent different measures of error between the predicted and actual values of the response variable for the ELs group's pretest and posttest models.
MAE (Mean Absolute Error) is the average of the absolute difference between the predicted values and the actual values. In the case of the ELs Pretest Model, the average absolute difference is 33.82, and for the ELs Posttest Model, it is 18.37.
MSE (Mean Squared Error) is the average of the squared difference between the predicted values and the actual values. In the case of the ELs Pretest Model, the average squared difference is 1447.85, and for the ELs Posttest Model, it is 509.69.
RMSE (Root Mean Squared Error) is the square root of the average of the squared difference between the predicted values and the actual values. In the case of the ELs Pretest Model, the square root of the average squared difference is 38.05, and for the ELs Posttest Model, it is 22.58.
MAPE (Mean Absolute Percentage Error) is the average of the absolute percentage difference between the predicted values and the actual values. In the case of both models, the MAPE is calculated as Inf% due to the presence of 0 values in the actual values.
Therefore, these metrics let us compare the performance of the two models and determine which provides better predictions for the ELs group on the pretest and on the posttest.
The posttest model for ELs performed better than the pretest model, as its MAE, MSE, and RMSE are all lower. The MAPE is infinite for both models, but as noted above this reflects zero values among the actuals rather than poor accuracy, so MAPE should be disregarded here (or replaced with a zero-safe variant, as sketched below).
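If a percentage-style metric is still wanted despite the zeros, one workaround, which is not part of the original analysis, is to exclude the zero actuals or to use the symmetric MAPE:

mape_nonzero <- function(actual, predicted) {
  keep <- actual != 0                      # drop cases that would divide by zero
  mean(abs((actual[keep] - predicted[keep]) / actual[keep])) * 100
}

smape <- function(actual, predicted) {     # symmetric MAPE, bounded above by 200%
  mean(2 * abs(predicted - actual) / (abs(actual) + abs(predicted))) * 100
}

Both return a finite percentage as long as at least one actual value is nonzero (mape_nonzero) or no actual-prediction pair is simultaneously zero (smape).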
esol pretest posttest date
Min. :1.000 Min. : 0.00 Min. : 0.00 Min. :2016-08-16
1st Qu.:3.000 1st Qu.: 30.00 1st Qu.: 70.83 1st Qu.:2017-01-01
Median :3.000 Median : 48.81 Median : 85.42 Median :2017-08-01
Mean :2.758 Mean : 48.66 Mean : 80.65 Mean :2017-07-21
3rd Qu.:3.000 3rd Qu.: 66.67 3rd Qu.: 95.83 3rd Qu.:2018-01-01
Max. :3.000 Max. :100.00 Max. :100.00 Max. :2018-08-01
Call:
lm(formula = ts_data[, 2] ~ ts_data[, 1])
Residuals:
Min 1Q Median 3Q Max
-21.0536 -3.2297 -0.9865 2.9144 15.0461
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 64.74101 3.81267 16.980 1.66e-14 ***
ts_data[, 1] 0.46206 0.07788 5.933 4.77e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.249 on 23 degrees of freedom
Multiple R-squared: 0.6048, Adjusted R-squared: 0.5876
F-statistic: 35.2 on 1 and 23 DF, p-value: 4.766e-06
This output is from a linear regression model that regresses the pretest scores on the date.
The intercept of 64.74 means that the predicted pretest score is 64.74 when the time index is 0 (the start of the observed period). The coefficient of 0.46 means that for each one-unit increase in the time index, the predicted pretest score increases by 0.46 units.
The p-value of 4.77e-06 is less than 0.05, so the coefficient on time is statistically significant. The R-squared of 0.6048 indicates that 60.48% of the variance in pretest scores is explained by the linear relationship with time, and the Adjusted R-squared of 0.5876 is close to it, as expected with a single predictor.
The residuals range from -21.05 to 15.05 with a median near zero (-0.99), and none is extreme relative to the residual standard error of 8.25, which suggests a reasonable fit.
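To probe this fit beyond the printed summary, one might add interval estimates and a residual plot; a minimal sketch, assuming ts_data is the series regressed above with the column roles interpreted in the text:

fit <- lm(ts_data[, 2] ~ ts_data[, 1])
confint(fit, level = 0.95)               # interval estimates for intercept and slope
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")   # look for non-random patterns
abline(h = 0, lty = 2)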
Call:
lm(formula = ts_data[, 2] ~ ts_data[, 1] + ts_data[, 3])
Residuals:
Min 1Q Median 3Q Max
-18.533 -4.195 -1.192 5.465 14.777
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 60.32892 4.89498 12.325 2.38e-11 ***
ts_data[, 1] 0.40356 0.08708 4.634 0.000128 ***
ts_data[, 3] 2.86639 2.05433 1.395 0.176855
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.084 on 22 degrees of freedom
Multiple R-squared: 0.6369, Adjusted R-squared: 0.6039
F-statistic: 19.3 on 2 and 22 DF, p-value: 1.445e-05
The linear model was fitted using the lm() function, with posttest as the response variable and pretest and esol as predictor variables. Pretest has a statistically significant relationship with posttest (p = 0.000128), while esol does not (p = 0.177); the intercept is also statistically significant (p = 2.38e-11). The multiple R-squared value of 0.6369 indicates that 63.69% of the variation in posttest is explained by pretest and esol. The residual standard error is 8.084, and the F-statistic has a p-value of 1.445e-05, indicating that the model is significant overall. The intercept of 60.32892 means that if both the pretest score and the esol code were zero, the predicted posttest score would be 60.33.
The coefficient of the pretest score is 0.40356, indicating that for every one-unit increase in the pretest score, the posttest score is expected to increase by 0.40356 units, holding other variables constant. This coefficient is statistically significant at the 0.001 level, meaning that the relationship between pretest and posttest scores is highly likely not due to chance.
The coefficient of the esol code is 2.86639, suggesting that the posttest score increases by 2.87 units for every one-unit increase in the code, holding other variables constant. However, this coefficient is not statistically significant at the 0.05 level, meaning the apparent relationship between ESOL status and posttest scores could be due to chance. Note also that esol is a numeric coding of a three-level category (1 = ELs, 2 = Exited ELs, 3 = Non ELs), so the per-unit reading imposes an equal-spacing assumption on the groups.
The multiple R-squared value of 0.6369 suggests that the model explains 63.69% of the variance in the posttest scores, and the adjusted R-squared value of 0.6039 takes into account the number of variables in the model. The F-statistic value of 19.3 with a p-value of 1.445e-05 suggests that the overall model is statistically significant at the 0.001 level.
Therefore, based on this model, pretest score is a significant predictor of posttest score, while the esol code is not: any effect of ESOL status is weak in this specification and cannot be distinguished from chance.
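Because esol is really categorical, a safer specification treats it as a factor so that R creates dummy variables instead of assuming an equally spaced ordering. A sketch, with scores standing in for the student-level data frame (the original object name is not shown):

scores$esol <- factor(scores$esol, levels = c(1, 2, 3),
                      labels = c("ELs", "Exited ELs", "Non ELs"))
fit_factor <- lm(posttest ~ pretest + esol, data = scores)
summary(fit_factor)   # "ELs" becomes the baseline level absorbed into the intercept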
Call:
lm(formula = ts_data_time[, 2] ~ ts_data_time[, 1] + dummies)
Residuals:
Min 1Q Median 3Q Max
-97.095 -8.646 2.905 11.601 40.163
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 66.654466 0.343871 193.835 < 2e-16 ***
ts_data_time[, 1] 0.304410 0.006087 50.012 < 2e-16 ***
dummiesesolel -6.817758 0.490405 -13.902 < 2e-16 ***
dummiesesolexited-el -3.263711 0.933799 -3.495 0.000475 ***
dummiesesolnon-els NA NA NA NA
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 17.29 on 13159 degrees of freedom
Multiple R-squared: 0.1813, Adjusted R-squared: 0.1811
F-statistic: 971.2 on 3 and 13159 DF, p-value: < 2.2e-16
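The note "(1 not defined because of singularities)" and the NA row for dummiesesolnon-els are the dummy-variable trap: with an intercept in the model, only two of the three group dummies are estimable, so lm() drops the third, which then serves as the baseline. Letting R derive the dummies from a factor avoids building the redundant column by hand; a sketch, assuming the third column of ts_data_time carries the esol code:

fit_time <- lm(ts_data_time[, 2] ~ ts_data_time[, 1] + factor(ts_data_time[, 3]))
summary(fit_time)   # factor() yields k - 1 dummies automatically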
[,1] [,2] [,3] [,4]
[1,] "esol" "pretest" "posttest" "date"
[2,] "esol" "pretest" "posttest" "date"
[3,] "esol" "pretest" "posttest" "date"
[,1] [,2]
[1,] 1416 4
[2,] 11393 4
[3,] 354 4
Call:
lm(formula = ts_el[, 2] ~ ts_el[, 1])
Residuals:
Min 1Q Median 3Q Max
-24.507 -10.501 3.617 7.697 22.029
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 53.2903 6.2012 8.594 1.23e-08 ***
ts_el[, 1] 0.5159 0.1388 3.718 0.00113 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12.49 on 23 degrees of freedom
Multiple R-squared: 0.3754, Adjusted R-squared: 0.3483
F-statistic: 13.83 on 1 and 23 DF, p-value: 0.001129
The output shows the estimated coefficients of the model. The intercept is 53.2903 and the slope of the regression line is 0.5159. The p-value associated with the slope coefficient is very small (0.00113), which means that the slope is significantly different from zero at the 5% significance level. This indicates that there is a significant linear relationship between the two variables in the model.
The multiple R-squared value of 0.3754 indicates that approximately 37.54% of the variability in the dependent variable is explained by the independent variable in the model. The adjusted R-squared value of 0.3483 suggests that the independent variable explains a moderate amount of the variation in the dependent variable after adjusting for the number of predictor variables.
The F-statistic value of 13.83 and its associated p-value of 0.001129 indicate that the model as a whole is significant. Finally, the residual standard error value of 12.49 represents the standard deviation of the error term and gives an estimate of the amount by which the response variable deviates from the true regression line.
Call:
lm(formula = ts_exited_els[, 2] ~ ts_exited_els[, 1])
Residuals:
Min 1Q Median 3Q Max
-28.774 -6.447 3.313 6.865 20.598
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 70.4437 4.8035 14.665 3.67e-13 ***
ts_exited_els[, 1] 0.2872 0.1114 2.579 0.0168 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11.56 on 23 degrees of freedom
Multiple R-squared: 0.2243, Adjusted R-squared: 0.1906
F-statistic: 6.65 on 1 and 23 DF, p-value: 0.01679
The Intercept coefficient is 70.4437, meaning that when the independent variable is zero, the dependent variable is expected to have a value of 70.44. The coefficient for the independent variable is 0.2872, indicating that for every one unit increase in the independent variable, the dependent variable is expected to increase by 0.2872 units.
The p-value associated with the coefficient for the independent variable is 0.0168, indicating that it is statistically significant at the 0.05 level. This suggests that there is evidence to support the conclusion that the independent variable has a statistically significant relationship with the dependent variable.
The R-squared value is 0.2243, meaning that approximately 22.43% of the variation in the dependent variable is explained by the variation in the independent variable. The Adjusted R-squared value of 0.1906 adjusts for the number of predictors; its drop relative to the R-squared reflects the small number of time points (25) behind the model. The residual standard error, which measures the average amount by which the observed values deviate from the predicted values, is 11.56 in this model.
Call:
lm(formula = ts_non_els[, 2] ~ ts_non_els[, 1])
Residuals:
Min 1Q Median 3Q Max
-12.0907 -3.9488 -0.9474 4.1931 13.1336
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 69.79407 3.84996 18.129 4.07e-15 ***
ts_non_els[, 1] 0.37161 0.06948 5.349 1.97e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.304 on 23 degrees of freedom
Multiple R-squared: 0.5543, Adjusted R-squared: 0.5349
F-statistic: 28.61 on 1 and 23 DF, p-value: 1.973e-05
The intercept (69.79) represents the predicted posttest score when the pretest score is 0. The coefficient for the pretest score (0.37) indicates that, on average, for every one-unit increase in pretest score, the posttest score increased by 0.37 units.
The p-value for the pretest coefficient is <0.001, so the coefficient is statistically significant. Therefore, we can conclude that pretest scores are a significant predictor of posttest scores for the non-EL group.
The Multiple R-squared value (0.5543) indicates that about 55.43% of the variability in the posttest scores is explained by the pretest scores. The Adjusted R-squared (0.5349) is only slightly lower, which is expected with a single predictor and indicates the fit is not inflated by the number of variables.
The F-statistic (28.61) and the p-value (1.973e-05) for the model as a whole indicate that the model is statistically significant and that the predictor (pretest score) contributes significantly to the model’s explanatory power.
The residual standard error (6.304) represents the standard deviation of the errors or residuals of the model, which is an estimate of the variability of the posttest scores that is not explained by the pretest scores.
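To make the comparison that follows concrete, the key fit statistics can be pulled from the three model objects side by side; a sketch, assuming fit_el, fit_exited, and fit_non_el hold the lm fits above (the original object names are not shown):

models <- list(ELs = fit_el, Exited_ELs = fit_exited, Non_ELs = fit_non_el)
sapply(models, function(m) {
  s <- summary(m)
  c(r.squared = s$r.squared, adj.r.squared = s$adj.r.squared, sigma = s$sigma)
})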
The three outputs all show the results of simple regression models with one independent variable and one dependent variable, but they differ in strength and significance.
In the first output (ELs), the intercept is 53.2903 and the slope coefficient is 0.5159. The p-value associated with the slope (0.00113) shows it is significantly different from zero at the 5% level, and approximately 37.54% of the variability in the dependent variable is explained by the independent variable, a moderate linear relationship.
In the second output (Exited ELs), the intercept is 70.4437 and the slope coefficient is 0.2872, with a p-value of 0.0168, statistically significant at the 5% level. However, the R-squared (0.2243) is the lowest of the three, and the Adjusted R-squared (0.1906) indicates an even weaker fit once the penalty for the predictor is applied.
In the third output (non-ELs), the intercept is 69.7941 and the slope coefficient is 0.3716, with the smallest p-value of the three (1.97e-05). Its R-squared (0.5543) is the highest of the three models, and it also has the smallest residual standard error (6.304).
Overall, the non-EL model shows the strongest linear relationship, the EL model a moderate one, and the Exited-EL model the weakest, although the slope is statistically significant in all three models.