“It seems as simple as A, B, C. They know the direction by compass, the distance, and the speed. I should not call it anything more than mathematical certainty.”
The Sea-Wolf, Jack London

Introduction

The Network-Centric Warfare Technology program element PE 0603766E addresses high payoff opportunities to develop and rapidly mature advanced technologies and systems required for today’s network-centric warfare concepts. It is imperative for the future of the U.S. forces to operate flawlessly with each other, regardless of which services and systems are involved in any particular mission.

The overarching goal of this program element is to enable technologies at all levels, regardless of service component, to operate as one system.

Research question

We want to estimate the correlations among the parts of the PE 0603766E budget and to predict future budget trends.

library(plotly)
load("netcentric.dat")
load("nc.dat")
f <- list(
  family = "Courier New, monospace",
  size = 12,
  color = "#7f7f7f"
)



plot_ly(data=nc4,x = Year, y = money, group = PE, name="Total", type = "scatter") %>% 
  layout(title = "DARPA NETWORK CENTRIC WARFARE TECHNOLOGY PE 0603766E", font = f, yaxis = list(title = "Million dollars" ))

ANOVA

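A one-way analysis of variance of the PE 0603766E R&D funding data shows that mean funding differs significantly across the parts of the program (F = 17.45 on 2 and 39 degrees of freedom, p < 0.001). The box plot indicates that the classified part (“Classified”) is the one that stands apart from the others.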

net<-stack(netcentric,select = c("Classified","Maritime","Joint"))
plot_ly(net, y = values, x=ind,type = "box",color = ind) %>%
  layout(title = "DARPA NETWORK CENTRIC WARFARE TECHNOLOGY PE 0603766E", font = f, yaxis = list(title = "Million dollars" ), xaxis = list(title = "Program" ))
## Warning in doColorRamp(colorMatrix, x, alpha, ifelse(is.na(na.color),
## "", : '.Random.seed' is not an integer vector but of type 'NULL', so
## ignored
summary(aov(data=net,values~ind))
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## ind          2  17418    8709   17.45 3.86e-06 ***
## Residuals   39  19462     499                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
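
As a quick check, the F statistic and p-value in the table can be reproduced from the reported sums of squares alone. The sketch below uses only the numbers printed above; the variable names and the eta-squared effect size are additions for illustration and are not part of the original analysis.

```r
# Values copied from the ANOVA table above
ss_between <- 17418; df_between <- 2
ss_within  <- 19462; df_within  <- 39

# Mean squares, F statistic and its upper-tail p-value
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_value <- ms_between / ms_within
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

# Eta squared: share of total variation attributable to the program part
eta_sq <- ss_between / (ss_between + ss_within)

round(c(F = f_value, eta_sq = eta_sq), 3)
p_value
```

Eta squared comes out near 0.47, so the program part accounts for a little under half of the variation in funding. The F-test by itself only says that at least one group mean differs; a post-hoc comparison such as TukeyHSD() on the fitted aov object would be needed to confirm which one.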

Linear and nonlinear regression

As the scatterplot matrix below shows, all parts are strongly correlated with total funding.

library(car)
## Warning: package 'car' was built under R version 3.2.5
scatterplotMatrix(netcentric[, -1], smoother = FALSE, col = "red", ellipse = TRUE)

The fitted linear regression model describes the relationship between total funding and funding for the classified part of PE 0603766E well. Over the period 2004-2017, classified funding is estimated at about 55% of Total minus $12.5 million: \(y = -12.45398 + 0.5485\,x\), where \(x\) is Total and \(y\) is Classified, both in millions of dollars. The model explains about 92% of the variance (adjusted R-squared 0.91). The slope is highly significant (t = 11.57, p < 0.001), while the intercept is not significantly different from zero (p = 0.188). In other words, DARPA has spent roughly half of the total amount allocated to PE 0603766E R&D on classified projects and, if the relationship holds, will continue to do so. The relationship appears stable over the whole period.
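
Because the intercept is negative, the slope of 0.55 is not the same thing as the classified share of the budget. The sketch below works this out from the reported coefficients; the helper name `classified_share` and the choice of Total values (taken from the 2012 and 2022 figures quoted later in this report) are assumptions made for illustration.

```r
# Coefficients copied from the summary(fit) output
b0 <- -12.45398
b1 <- 0.54850

# Predicted classified funding as a fraction of Total (both in million dollars)
classified_share <- function(total) (b0 + b1 * total) / total

totals <- c(208, 327)   # Total values quoted in the extrapolation section
round(classified_share(totals), 3)

# Total at which the predicted share reaches exactly one half
break_even <- -b0 / (b1 - 0.5)
round(break_even, 1)
```

Under these coefficients the predicted share is about 49% at a Total of $208 million and about 51% at $327 million, crossing one half near $257 million. The slope is therefore the marginal share of each additional dollar, not the average share.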

fit<-lm(Classified~Total,data = netcentric)
summary(fit)
## 
## Call:
## lm(formula = Classified ~ Total, data = netcentric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.949  -5.219  -2.745   8.813  15.422 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -12.45398    8.91679  -1.397    0.188    
## Total         0.54850    0.04741  11.568 7.27e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.812 on 12 degrees of freedom
## Multiple R-squared:  0.9177, Adjusted R-squared:  0.9109 
## F-statistic: 133.8 on 1 and 12 DF,  p-value: 7.266e-08
par(mfrow=c(2,2))
plot(fit)

fit22<-lm(Classified~poly(Total,2),data = netcentric)
summary(fit22)
## 
## Call:
## lm(formula = Classified ~ poly(Total, 2), data = netcentric)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -16.6277  -4.9569  -0.8381   6.6356  12.1600 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       86.136      2.404   35.83 9.64e-13 ***
## poly(Total, 2)1  113.512      8.996   12.62 6.93e-08 ***
## poly(Total, 2)2   16.284      8.996    1.81   0.0976 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.996 on 11 degrees of freedom
## Multiple R-squared:  0.9366, Adjusted R-squared:  0.9251 
## F-statistic: 81.25 on 2 and 11 DF,  p-value: 2.58e-07
plot(fit22)

par(mfrow=c(1,1))
plot(netcentric$Total,netcentric$Classified,main = "Linear versus nonlinear trend",xlab = "Total",ylab = "Classified")
lines(netcentric$Total,predict(fit),lty=2,col="blue",lwd=3)
lines(netcentric$Total,predict(fit22),lty=2,col="red",lwd=3)
grid()

cor(netcentric$Classified,predict(fit))
## [1] 0.9579717
cor(netcentric$Classified,predict(fit22))
## [1] 0.9677792
anova(fit,fit22)
## Analysis of Variance Table
## 
## Model 1: Classified ~ Total
## Model 2: Classified ~ poly(Total, 2)
##   Res.Df     RSS Df Sum of Sq      F  Pr(>F)  
## 1     12 1155.39                              
## 2     11  890.21  1    265.18 3.2767 0.09764 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AIC(fit,fit22)
##       df      AIC
## fit    3 107.5141
## fit22  4 105.8639
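
For a least-squares model with an intercept, the squared correlation between observed and fitted values equals the multiple R-squared. A minimal check using only the two correlations printed above (the variable names are illustrative):

```r
# Correlations between observed and fitted values, copied from the output above
r_linear    <- 0.9579717
r_quadratic <- 0.9677792

# Squaring them should recover the Multiple R-squared of each summary
round(c(linear = r_linear^2, quadratic = r_quadratic^2), 4)
```

The results agree with the Multiple R-squared values of 0.9177 and 0.9366 reported in the two model summaries.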

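The quadratic model fits slightly better: its AIC is lower (105.9 versus 107.5) and its adjusted R-squared is higher (0.925 versus 0.911). The improvement is only marginal, however: the F-test comparing the two models gives p = 0.098, so the quadratic term is not significant at the 5% level.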
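
The comparison statistics above can be reproduced from the residual sums of squares alone, which makes it easier to see how strong the evidence is. This is a sketch under the assumption of Gaussian errors (the same assumption lm and AIC make); the variable names are illustrative, and the inputs are copied from the anova and AIC output.

```r
n <- 14                                   # observations: 12 residual df + 2 coefficients
rss    <- c(fit = 1155.39, fit22 = 890.21)
res_df <- c(fit = 12, fit22 = 11)
k      <- c(fit = 3, fit22 = 4)           # parameters counted by AIC, incl. error variance

# Partial F-test for the added quadratic term
f_value <- ((rss[["fit"]] - rss[["fit22"]]) / (res_df[["fit"]] - res_df[["fit22"]])) /
  (rss[["fit22"]] / res_df[["fit22"]])
p_value <- pf(f_value, 1, res_df[["fit22"]], lower.tail = FALSE)

# AIC of a Gaussian linear model: -2 * logLik + 2 * k
aic <- n * (log(2 * pi) + log(rss / n) + 1) + 2 * k

round(c(F = f_value, p = p_value), 4)
round(aic, 2)
```

The AIC difference of about 1.65 favours the quadratic model only weakly, which is consistent with an F-test p-value of roughly 0.098.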

A linear extrapolation of the trend

We now estimate how PE 0603766E R&D will be funded over the next 10 years by linearly extrapolating the trend in Total over time. We treat the linear trend as the conservative scenario. Under this model, the Total cost of PE 0603766E R&D in 2022 would be about $327 million (95% confidence interval for the mean: $290 million to $365 million), compared to $208 million in 2012.

fit.2<-lm(Total~Year,data = netcentric)
summary(fit.2)
## 
## Call:
## lm(formula = Total ~ Year, data = netcentric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -28.723 -16.914   3.714  14.264  33.026 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -25610.396   2825.905  -9.063 1.03e-06 ***
## Year            12.828      1.406   9.126 9.53e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.2 on 12 degrees of freedom
## Multiple R-squared:  0.8741, Adjusted R-squared:  0.8636 
## F-statistic: 83.29 on 1 and 12 DF,  p-value: 9.527e-07
predict(fit.2,data.frame(Year=2022),interval = "conf")
##        fit      lwr      upr
## 1 327.2643 289.9449 364.5838
ggplot(netcentric, aes(x=Year, y=Total)) +
  geom_point(shape=1) +    
  geom_smooth(method=lm)

par(mfrow=c(2,2))
plot(fit.2)  # diagnostics for the trend model fitted in this section
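
The interval reported by predict with interval = "conf" describes the uncertainty of the mean trend, not of a single year's budget. As a rough sketch, a 95% prediction interval for 2022 can be recovered from the printed output alone; the variable names are illustrative, and the residual standard error is used as rounded in the summary.

```r
# Values copied from the output above
fit_2022  <- 327.2643
ci        <- c(lwr = 289.9449, upr = 364.5838)
sigma_hat <- 21.2      # residual standard error
res_df    <- 12        # residual degrees of freedom

# Back out the standard error of the fitted mean from the confidence interval
t_crit  <- qt(0.975, res_df)
se_mean <- (ci[["upr"]] - ci[["lwr"]]) / 2 / t_crit

# A prediction interval adds the residual variance to the uncertainty of the mean
se_pred <- sqrt(se_mean^2 + sigma_hat^2)
pi_2022 <- fit_2022 + c(lwr = -1, upr = 1) * t_crit * se_pred

round(pi_2022, 1)
```

With these inputs the interval is roughly $268 million to $387 million, noticeably wider than the confidence interval. The same result should come from predict(fit.2, data.frame(Year = 2022), interval = "prediction"), up to rounding of the residual standard error.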

Conclusions

It remains only to substitute the projected 2022 Total ($327 million) into the linear and quadratic regression models using predict:

predict(fit,data.frame(Total=327),interval = "conf")
##        fit      lwr      upr
## 1 166.9049 150.6549 183.1548
predict(fit22,data.frame(Total=327),interval = "conf")
##        fit      lwr      upr
## 1 209.2065 155.6152 262.7979

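For 2022, the linear regression predicts a mean of \(166.9049\) million dollars (95% confidence interval: 150.7 to 183.2) spent on the classified part of DARPA's PE 0603766E R&D, while the quadratic regression predicts \(209.2065\) million dollars (95% confidence interval: 155.6 to 262.8). The quadratic estimate is far less certain, as its much wider interval shows.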
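
To compare the two forecasts side by side, the sketch below computes the width of each confidence interval and the implied classified share of the projected Total. It uses only the predict output above; the object names are illustrative.

```r
# Predictions at Total = 327, copied from the predict() output above
pred <- rbind(
  linear    = c(fit = 166.9049, lwr = 150.6549, upr = 183.1548),
  quadratic = c(fit = 209.2065, lwr = 155.6152, upr = 262.7979)
)
total_2022 <- 327

ci_width <- pred[, "upr"] - pred[, "lwr"]   # width of each 95% interval
share    <- pred[, "fit"] / total_2022      # classified share of projected Total

round(cbind(ci_width, share), 3)
```

The quadratic interval is more than three times as wide as the linear one, and its point estimate implies a classified share of about 64% rather than 51%. Both estimates also inherit the uncertainty of the projected Total itself, which this calculation treats as fixed.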