There has been much publicity about the ability of artificial neural networks to learn and generalize. In fact, the most commonly used artificial neural networks, called multilayer perceptrons, are nothing more than nonlinear regression and discriminant models that can be implemented with standard statistical software. This paper explains what neural networks are, translates neural network jargon into statistical jargon, and shows the relationships between neural networks and statistical models such as generalized linear models, maximum redundancy analysis, projection pursuit, and cluster analysis.
Neural networks are a wide class of flexible nonlinear regression and discriminant models, data reduction models, and nonlinear dynamical systems. They consist of an often large number of “neurons,” i.e., simple linear or nonlinear computing elements, interconnected in often complex ways and often organized into layers.
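To make the regression analogy concrete, here is a minimal sketch (in Python with NumPy, with illustrative variable names) of the most common NN model, a multilayer perceptron with one hidden layer of tanh neurons. The output is simply a linear combination of nonlinear transforms of linear combinations of the inputs, i.e. a nonlinear regression function.

```python
import numpy as np

def mlp_predict(X, A, b, w, w0):
    """One-hidden-layer perceptron viewed as a nonlinear regression function.
    X: (n, p) inputs; A: (p, h) input-to-hidden weights; b: (h,) hidden biases;
    w: (h,) hidden-to-output weights; w0: scalar output bias."""
    H = np.tanh(X @ A + b)   # hidden "neurons": nonlinear transforms of linear combinations
    return H @ w + w0        # output "neuron": a linear combination of the hidden values

# Example with 3 inputs and 2 hidden neurons (random weights, purely illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y_hat = mlp_predict(X, A=rng.normal(size=(3, 2)), b=np.zeros(2),
                    w=rng.normal(size=2), w0=0.0)
```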
Artificial neural networks are used in three main ways:

- as models of biological nervous systems and “intelligence”
- as real-time adaptive signal processors or controllers implemented in hardware for applications such as robots
- as data analytic methods
This blog is concerned with artificial neural networks for data analysis. The development of artificial neural networks arose from the attempt to simulate biological nervous systems by combining many simple computing elements (neurons) into a highly interconnected system and hoping that complex phenomena such as “intelligence” would emerge as the result of self-organization or learning. The alleged potential intelligence of neural networks led to much research in implementing artificial neural networks in hardware such as VLSI chips. The literature remains confused as to whether artificial neural networks are supposed to be realistic biological models or practical machines. For data analysis, biological plausibility and hardware implementability are irrelevant. The alleged intelligence of artificial neural networks is a matter of dispute. Artificial neural networks rarely have more than a few hundred or a few thousand neurons, while the human brain has about one hundred billion neurons. Networks comparable to a human brain in complexity are still far beyond the capacity of the fastest, most highly parallel computers in existence.
Artificial neural networks, like many statistical methods, are capable of processing vast amounts of data and making predictions that are sometimes surprisingly accurate; this does not make them “intelligent” in the usual sense of the word. Artificial neural networks “learn” in much the same way that many statistical algorithms do estimation, but usually much more slowly than statistical algorithms.
When neural networks (henceforth NNs, with the adjective “artificial” implied) are used for data analysis, it is important to distinguish between NN models and NN algorithms.
Many NN models are similar or identical to popular statistical techniques such as generalized linear models, polynomial regression, nonparametric regression and discriminant analysis, projection pursuit regression, principal components, and cluster analysis, especially where the emphasis is on prediction of complicated phenomena rather than on explanation. These NN models can be very useful. There are also a few NN models, such as counterpropagation, learning vector quantization, and self-organizing maps, that have no precise statistical equivalent but may be useful for data analysis.
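As one example of these equivalences, a single neuron with a logistic activation function computes exactly the mean function of logistic regression, a generalized linear model with a logit link. The sketch below (hypothetical names, NumPy only) makes the identity explicit; the two functions differ only in vocabulary.

```python
import numpy as np

# A single "neuron" with a logistic activation function ...
def neuron(x, weights, bias):
    return 1.0 / (1.0 + np.exp(-(x @ weights + bias)))

# ... computes the mean function of logistic regression,
# a generalized linear model with a logit link:
def logistic_regression_mean(x, beta, intercept):
    eta = intercept + x @ beta          # linear predictor
    return 1.0 / (1.0 + np.exp(-eta))   # inverse logit link

x = np.array([1.2, -0.7])
w = np.array([0.5, 2.0])
assert np.isclose(neuron(x, w, 0.3), logistic_regression_mean(x, w, 0.3))
```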
Many NN researchers are engineers, physicists, neurophysiologists, psychologists, or computer scientists who know little about statistics and nonlinear optimization. They routinely reinvent methods that have been known in the statistical or mathematical literature for decades or centuries, but they often fail to understand how these methods work. The common implementations of NNs are based on biological or engineering criteria, such as how easy it is to fit the net on a chip, rather than on well-established statistical and optimization criteria.
Standard NN learning algorithms are inefficient because they are designed to be implemented on massively parallel computers but are, in fact, usually implemented on common serial computers such as ordinary PCs. On a serial computer, NNs can be trained more efficiently by standard numerical optimization algorithms such as those used for nonlinear regression. Nonlinear regression algorithms can fit most NN models orders of magnitude faster than the standard NN algorithms. Another reason for the inefficiency of NN algorithms is that they are often designed for situations where the data are not stored, but each observation is available transiently in a real-time environment. Transient data are inappropriate for most types of statistical analysis. In statistical applications, the data are usually stored and are repeatedly accessible, so statistical algorithms can be faster and more stable than NN algorithms. Hence, for most practical data analysis applications, the usual NN algorithms are not useful. You do not need to know anything about NN training methods such as backpropagation to use NNs.
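To illustrate the point about serial computers, the sketch below fits a small one-hidden-layer network by ordinary least squares using a general-purpose quasi-Newton optimizer (scipy.optimize.minimize with BFGS), the same kind of routine used for nonlinear regression, with no backpropagation-style online updates. The data, network size, and variable names are invented for the example.

```python
import numpy as np
from scipy.optimize import minimize

# Fake data: two inputs, one noisy nonlinear response
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=100)

n_hidden = 4
p = X.shape[1]

def unpack(theta):
    """Split the flat parameter vector into the network's weights and biases."""
    A = theta[:p * n_hidden].reshape(p, n_hidden)    # input-to-hidden weights
    b = theta[p * n_hidden:p * n_hidden + n_hidden]  # hidden biases
    w = theta[-n_hidden - 1:-1]                      # hidden-to-output weights
    w0 = theta[-1]                                   # output bias
    return A, b, w, w0

def sse(theta):
    """Least-squares objective, as in ordinary nonlinear regression."""
    A, b, w, w0 = unpack(theta)
    pred = np.tanh(X @ A + b) @ w + w0
    return np.sum((y - pred) ** 2)

theta0 = 0.1 * rng.normal(size=p * n_hidden + 2 * n_hidden + 1)
fit = minimize(sse, theta0, method="BFGS")  # standard quasi-Newton optimization
print(fit.fun)  # residual sum of squares at the fitted weights
```

The design choice here is simply to treat the network weights as parameters of a nonlinear regression model and hand the whole objective to an off-the-shelf optimizer, rather than updating weights one observation at a time.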