Evaluating equation (3.4) with \(N_{1,2}\), \(\bar{x}_{1,2}\), and \(s_{1,2}\) gives a t-value of \(t_{1,2}=-2.740403\). The following R snippet shows how the two samples were generated and how the t-value was calculated with the user function t.val() (note that the two samples truly differ by \(\Delta\mu=5\) with "true" uncertainty \(\sigma=5\)).
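The snippet itself did not survive extraction; the following is a minimal sketch. The seed and the true means (130 and 135) are assumptions chosen only to match the format of the printed output, and t.val() implements the pooled-variance t-statistic of equation (3.4):

```r
set.seed(123)                            # hypothetical seed
old <- rnorm(10, mean = 130, sd = 5)     # "old" sample (mean is an assumption)
new <- rnorm(10, mean = 135, sd = 5)     # "new" sample, shifted by delta mu = 5

# Two-sample t-statistic with pooled standard deviation, equation (3.4)
t.val <- function(x1, x2) {
  n1 <- length(x1); n2 <- length(x2)
  sp <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))
  (mean(x1) - mean(x2)) / (sp * sqrt(1 / n1 + 1 / n2))
}

cat(sprintf("Old(N=%d): %.2f +/- %.2f\n", length(old), mean(old), sd(old)))
cat(sprintf("New(N=%d): %.2f +/- %.2f\n", length(new), mean(new), sd(new)))
cat("t-value=", t.val(old, new), "\n")
```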

## Old(N=10): 133.08 +/- 4.98
## New(N=10): 139.41 +/- 5.34
## t-value= -2.740403

As such, the value of \(t=-2.74\) does not tell us much, but it would tell more if the distribution of \(t\) were known under the assumption that both samples come from the same distribution. This distribution can easily be obtained by Monte Carlo simulation, as shown with the following R code. The code generates 100000 t-values by sampling from a normal distribution, \(N(\mu=0,\sigma^2=1)\), with sample sizes \(N_{\text{old,new}}=10\), and shows the resulting distribution as a histogram, figure 3.10. It then counts the number of t-values exceeding or falling short of \(\pm 2.74\) (that is, \(N_{|t|>2.74}\)) and reports this count as a fraction, \(\alpha=0.01312\).
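The simulation code is not reproduced in the source; a minimal sketch, reusing t.val() from above (the seed and the histogram styling are assumptions):

```r
set.seed(123)                                         # hypothetical seed
n.mc <- 100000
t.mc <- replicate(n.mc, t.val(rnorm(10), rnorm(10)))  # t under H0: both samples from N(0, 1)

hist(t.mc, breaks = 100, main = "Monte Carlo t-distribution", xlab = "t")
abline(v = c(-2.74, 2.74), col = "red")               # mark the observed test value

table(abs(t.mc) > 2.74) / n.mc                        # fraction of |t| > 2.74
```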

Figure 3.10: Monte Carlo t-distribution for sample size \(N=10\), obtained by 100000 resampling steps from \(N(0,1)\); the test value \(\pm 2.74\) of the present example is marked in red.

## 
##   FALSE    TRUE 
## 0.98688 0.01312

Of course, there are ready-made functions for performing t-tests in all statistical software packages; in R, e.g., the base function t.test():
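The call producing the output below is presumably the following; var.equal = TRUE selects the classical pooled-variance two-sample test, consistent with the reported \(df=18\):

```r
t.test(old, new, var.equal = TRUE)   # classical two-sample t-test
```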

## 
##  Two Sample t-test
## 
## data:  old and new
## t = -2.7404, df = 18, p-value = 0.01344
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.173926  -1.475941
## sample estimates:
## mean of x mean of y 
##  133.0842  139.4091

The t-test reports an \(\alpha\)-value\(^{14}\) of \(0.01344\), in good agreement with the result obtained from the Monte Carlo simulation. In addition, the parameter df in the output reports the degrees of freedom of the test, \(df=18\). The t-test is based on two "internal" parameters, \(\bar{X}_1\) and \(\bar{X}_2\), and these two parameters must be subtracted from the total sample size \(N=20\), hence \(df = 20 - 2 = 18\).\(^{15}\) By convention, a test result is called significant in statistics when \(0.01 \le \alpha \le 0.05\) and highly significant when \(\alpha < 0.01\). When a test result is judged significant given \(\alpha\), \(H_0\) is rejected in favour of \(H_1\), while silently accepting that in \(100\alpha\,\%\) of the cases \(H_0\) will be true.
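The following power calculation can be reproduced with, e.g., pwr.t.test() from the pwr package; the exact call is an assumption inferred from the printed parameters:

```r
library(pwr)
# Sample size needed to detect an effect of d = 1 (Cohen's d) with 90% power
# at a one-sided 5% significance level
pwr.t.test(d = 1, sig.level = 0.05, power = 0.9,
           type = "two.sample", alternative = "greater")
```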

## 
##      Two-sample t test power calculation 
## 
##               n = 17.84712
##               d = 1
##       sig.level = 0.05
##           power = 0.9
##     alternative = greater
## 
## NOTE: n is number in *each* group

Least squares problems can be solved conveniently with statistical software, making Ordinary Least Squares (OLS) the most widely used statistical method in science. In R, the lm() function provides all the functionality needed for solving OLS problems, e.g. with the following R example code:
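The example code is not reproduced in the source; the following sketch matches the structure of the output below. The seed, the x grid, and the true parameters \(a_0=2.5\) and \(a_1=5\) are assumptions:

```r
set.seed(42)                                     # hypothetical seed
x.df <- data.frame(x = seq(0.1, 5, length.out = 30))
x.df$y <- 2.5 + 5 * x.df$x + rnorm(30)           # straight line plus N(0, 1) noise

x.lm <- lm(y ~ x, data = x.df)                   # ordinary least squares fit
summary(x.lm)                                    # coefficient table shown below
setNames(coef(x.lm), c("a0", "a1"))              # estimates renamed a0, a1
cor(x.df[c("y", "x")])                           # correlation matrix of y and x
summary(x.lm)$sigma                              # residual standard error
summary(x.lm)$r.squared                          # R-squared
names(anova(x.lm))                               # columns of the ANOVA table
ss <- anova(x.lm)$"Sum Sq"
ss[1] / sum(ss)                                  # R-squared recomputed from sums of squares
anova(x.lm)$"F value"[1]                         # F statistic
anova(x.lm)$"Pr(>F)"[1]                          # its p-value
```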

## 
## Call:
## lm(formula = y ~ x, data = x.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0306 -0.5751 -0.2109  0.5522  2.7050 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.6800     0.3273   8.188 6.51e-09 ***
## x             5.0094     0.1124  44.562  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9189 on 28 degrees of freedom
## Multiple R-squared:  0.9861, Adjusted R-squared:  0.9856 
## F-statistic:  1986 on 1 and 28 DF,  p-value: < 2.2e-16
##       a0       a1 
## 2.680009 5.009426
##           y         x
## y 1.0000000 0.9930235
## x 0.9930235 1.0000000
## [1] 0.9188509
## [1] 0.9860958
## [1] "Df"      "Sum Sq"  "Mean Sq" "F value" "Pr(>F)"
## [1] 0.9860958
## [1] 1985.773
## [1] 0

The R code for estimating and testing the contrast (new − old) with OLS then becomes:
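A sketch of the corresponding model fit; the data frame layout and the factor coding (reference level "new", so that the coefficient treatold estimates the difference old − new) are assumptions consistent with the output:

```r
x <- data.frame(y = c(old, new),
                treat = factor(rep(c("old", "new"), each = 10),
                               levels = c("new", "old")))
summary(lm(y ~ treat, data = x))   # treatold = mean(old) - mean(new)
```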

## 
## Call:
## lm(formula = y ~ treat, data = x)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.815 -3.365 -0.930  3.495 12.672 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  139.408      1.632  85.404   <2e-16 ***
## treatold      -6.323      2.308  -2.739   0.0135 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.162 on 18 degrees of freedom
## Multiple R-squared:  0.2942, Adjusted R-squared:  0.255 
## F-statistic: 7.502 on 1 and 18 DF,  p-value: 0.01348

Outliers are a reality in empirical data and might reflect a lack of experimental control or, alternatively, model inadequacy, i.e. a non-linearity not appropriately described by the empirical model. Depending on the objective of the project\(^{20}\), outliers should be checked by repeating the outlying experiments.

Effects of outlying observations on OLS estimation.
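The code producing the following fit is missing; a minimal sketch that injects one gross outlier into the data of the previous OLS example (the outlier position and magnitude are assumptions):

```r
x <- x.df$x
y <- x.df$y
y[28] <- y[28] - 25    # single gross outlier (hypothetical position and size)
summary(lm(y ~ x))     # note the inflated residual error and the reduced R-squared
```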

## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -23.4949  -1.3215   0.6568   3.2538  16.0924 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.6366     2.6802   0.984 0.333666    
## x             3.8858     0.9205   4.221 0.000232 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.524 on 28 degrees of freedom
## Multiple R-squared:  0.3889, Adjusted R-squared:  0.3671 
## F-statistic: 17.82 on 1 and 28 DF,  p-value: 0.0002315