Networks and Learning

Barry Kort

Lead Scientist

MITRE Network Center

On Wednesday and Thursday, November 15-16, 1989, I attended the MIT Industrial Liaison Program symposium entitled "Networks and Learning". Here is my report...

Professor Tomaso Poggio of the MIT Department of Brain and Cognitive Sciences opened the symposium by reviewing the history of advances in the field. About every 20 years there is an "epidemic" of activity lasting about 12 years, followed by about 8 years of inactivity. Sixty years ago there began the Gestalt school in Europe. Forty years ago Cybernetics emerged in the US. Twenty years ago Perceptrons generated a flurry of research. Today, Neural Networks represent the latest breakthrough in this series. [Neural Networks are highly interconnected structures of relatively simple units, with algebraic connection weights.]

Professor Leon Cooper, Co-Director of the Center for Neural Science at Brown University, spoke on "Neural Networks in Real-World Applications." Neural Nets learn from examples. Give them lots of examples of Input/Output pairs, and they build a smooth mapping from the input space to the output space. Neural Nets work best when the rules are vague or unknown. The classical 3-stage neural net makes a good classifier. It can divide up the input space into arbitrarily shaped regions. At first the network just divides the space in halves and quarters, using straight line boundaries ("hyperplanes" for the mathematically minded). Eventually (and with considerable training) the network can form arbitrarily curved boundaries to achieve arbitrarily general classification. Given enough of the critical features upon which to reach a decision, networks have been able to recognize and categorize diseased hearts from heartbeat patterns. With a sufficiently rich supply of clues, the accuracy of such classifiers can approach 100%. Accuracy depends on the sample length of the heartbeat pattern--a hurried decision is an error-prone decision.

Professor Ron Rivest, Associate Director of MIT's Laboratory for Computer Science, surveyed "The Theoretical Aspects of Learning and Networks." He addresses the question, "How do we discover good methods of solution for the problems we wish to solve?" In studying Neural Networks, he notes their strengths and characteristics: learning from example, expressiveness, computational complexity, sample space complexity, learning a mapping. The fundamental unit of a neural network is a linear adder followed by a threshold trigger. If the algebraic sum of the input signals exceeds threshold, the output signal fires. Neural nets need not be constrained to boolean signals (zero/one), but can handle continuous analog signal levels. And the threshold trigger can be relaxed to an S-shaped response. Rivest tells us that any continuous function mapping the interval [-1, 1] into itself can be approximated arbitrarily well with a 3-stage neural network. (The theorem extends to the Cartesian product: the mapping can be from an m-fold hypercube into an n-fold hypercube.) Training the neural net amounts to finding the coefficients which minimize the error between the examples and the neural network's approximation. The so-called Error Backpropagation algorithm is mathematically equivalent to least squares curve fitting using steepest descent. While this method works, it can be very slow. In fact, training a 3-stage neural network is an NP-complete problem--for all known methods, the work can increase exponentially with the size of the network. The classical solution to this dilemma is to decompose the problem down into smaller subproblems, each solvable by a smaller system. Open issues in neural network technology include the incorporation of prior domain knowledge, and the inapplicability of powerful learning methods such as Socratic-style guided discovery and experimentation. There is a need to merge the statistical paradigm of neural networks with the more traditional knowledge representation techniques of analytical and symbolic approaches.
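
As a minimal sketch of the unit described in Rivest's talk (not code from the symposium), the following Python models a linear adder followed by a threshold trigger, then relaxes the trigger to an S-shaped response and fits the weights by steepest descent on the squared error. The function names, the boolean OR training set, the learning rate, and the epoch count are all illustrative assumptions.

```python
import math

def threshold_unit(weights, bias, inputs):
    """Linear adder followed by a hard threshold: fires (1) if the sum exceeds 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def sigmoid(z):
    """S-shaped relaxation of the threshold trigger."""
    return 1.0 / (1.0 + math.exp(-z))

def train_sigmoid_unit(samples, rate=0.5, epochs=5000):
    """Steepest descent on the summed squared error of a single sigmoid unit."""
    n = len(samples[0][0])
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        for inputs, target in samples:
            out = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
            # Derivative of 0.5 * (out - target)**2 with respect to the weighted sum.
            delta = (out - target) * out * (1.0 - out)
            for i, x in enumerate(inputs):
                grad_w[i] += delta * x
            grad_b += delta
        weights = [w - rate * g for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b
    return weights, bias

# Hypothetical training set: the boolean OR function as input/output pairs.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train_sigmoid_unit(OR_SAMPLES)
# Read the trained weights back through the hard threshold.
predictions = [threshold_unit(weights, bias, x) for x, _ in OR_SAMPLES]
print(predictions)
```

A single unit like this can only draw a straight-line boundary; the curved boundaries mentioned in the talks require an intermediate stage of such units.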

Professor Terry Sejnowski, Director of the Computational Neurobiology Laboratory at the Salk Institute for Biological Studies, gave a captivating lecture on "Learning Algorithms in the Brain." Terry, who studies biological neural networks, has witnessed the successful "reverse engineering" of several complete systems. The Vestibulo-Ocular Reflex is the feedforward circuit from the semicircular canals of the inner ear to the eye muscles which allows us to fixate on a target even as we move and bob our heads. If you shake your head as you read this sentence, your eyes can remain fixed on the text. This very old circuit has been around for hundreds of millions of years, going back to our reptilian ancestors. It is found in the brain stem, and operates with only a 7-ms delay. (Tracking a moving target is more complex, requiring a feedback circuit that taps into the higher cognitive centers.) The Vestibulo-Ocular Reflex appears to be overdesigned, generating opposing signals which at first appear to serve no function. Only last week, a veteran researcher finally explained how the dynamic tension between opposing signals allows the long-term adaptation to growth of the body and other factors (such as new eyeglasses) which could otherwise defeat the performance of the reflex. Terry also described the operation of one of the simplest neurons, found in the hippocampus, which mediates long-term memory. The Hebb Synapse is one that undergoes a physiological change when the neuron happens to fire during simultaneous occurrence of stimuli representing the input/output pair of a training sample. After the physiological change, the neuron becomes permanently sensitized to the input stimulus. The Hebb Synapse would seem to be the foundation for superstitious learning.
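
The textbook form of the Hebb rule can be sketched in a few lines of Python. This is an illustrative simplification, not a model presented in the lecture: the function name, the learning rate, and the coincidence history are assumptions, and the weight has no decay term, which loosely mirrors the permanent sensitization described above.

```python
def hebbian_update(weight, pre, post, rate=0.1):
    """Strengthen the connection only when input and output are active together."""
    return weight + rate * pre * post

weight = 0.0
# Hypothetical history of (input stimulus present, neuron fired) observations.
history = [(1, 1), (1, 0), (0, 1), (1, 1), (0, 0)]
for pre, post in history:
    weight = hebbian_update(weight, pre, post)

print(weight)  # only the two coincidences, (1, 1), contributed
```

The rule rewards coincidence rather than causation, which is one way to read the remark about superstitious learning.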

After a refreshing lunch of cold roast beef and warm conversation, Professor Tomaso Poggio returned to the podium to speak on "Networks for Learning: A Vision Application." He began by reviewing the theoretical result that equates the operation of a 2-layer neural network to linear regression. To achieve polynomial regression, one needs a 3-layer neural network. Such a neural net can reconstruct a (smooth) hypersurface from sparse data. (An example of a non-smooth map would be a telephone directory which maps names into numbers. No smooth interpolation will enable you to estimate the telephone number of someone whose name is not in the directory.) Professor Poggio explored the deep connection between classical curve fitting and 3-stage neural networks. The architecture of the neural net corresponds to the so-called HyperBasis Functions which are fitted to the training data. A particularly simple but convenient basis function is a Gaussian centered around each sample x-value. The interpolated y-value is then just the average of all the sample y-values weighted by their Gaussian multipliers. In other words, the nearest neighbors to x are averaged to estimate the output, y(x). For smooth maps, such a scheme works well.
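
The Gaussian-weighted average described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Professor Poggio's implementation: the function name, the `width` parameter (the standard deviation of each Gaussian), and the sample data are hypothetical.

```python
import math

def gaussian_weighted_estimate(x, sample_xs, sample_ys, width=0.5):
    """Estimate y(x) as the average of sample y-values, weighted by a
    Gaussian centered on each sample x-value."""
    weights = [math.exp(-((x - sx) ** 2) / (2.0 * width ** 2)) for sx in sample_xs]
    total = sum(weights)
    # Assumes x lies near enough to the samples that the weights do not all
    # underflow to zero; otherwise this division fails.
    return sum(w * sy for w, sy in zip(weights, sample_ys)) / total

# Hypothetical sparse samples of a smooth map (here y = x squared).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]

estimate = gaussian_weighted_estimate(1.5, xs, ys)
print(estimate)
```

A narrow `width` makes the estimate follow the nearest sample closely, while a wide one averages over many neighbors and smooths the map.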

Dr. Richard Lippmann of the MIT Lincoln Laboratory spoke on "Neural Network Pattern Classifiers for Speech Recognition." Historically, classification has progressed through four stages--Probabilistic Classifiers using linear discriminant functions, Hyperplane Separation using piecewise linear boundaries, Receptive Field Classification using radial basis functions, and the new Exemplar Method using multilayer Perceptrons and feature maps. Surveying alternative architectures and algorithms for speech recognition, Dr. Lippmann reviewed the diversity of techniques, comparing results, accuracy, speed, and computational resources required. From the best to the worst, they can differ by orders of magnitude in cost and performance.

Professor Michael Jordan of MIT's Department of Brain and Cognitive Sciences spoke on "Adaptive Networks for Motor Control and Robotics." There has been much progress in this field over the last five years, but neural nets do not represent a revolutionary breakthrough. The "Inverse Problem" in control theory is classical: find the control sequence which will drive the system from the current state to the goal state. It is well known from Cybernetics that the controller must compute (directly or recursively) an inverse model of the forward system. This is equivalent to the problem of diagnosing cause from effect. The classical solution is to build a model of the forward system and let the controller learn the inverse through unsupervised learning (playing with the model). The learning proceeds incrementally, corresponding to backpropagation or gradient descent based on the transposed Jacobian (first derivative). This is essentially how humans learn to fly and drive using simulators.
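
To make the transposed-Jacobian idea concrete, here is a small Python sketch that inverts a forward model by gradient descent. The forward system is a hypothetical two-link planar arm, chosen purely for illustration; the link lengths, starting angles, step size, and iteration count are assumptions and do not come from the talk.

```python
import math

LINK1, LINK2 = 1.0, 1.0  # hypothetical link lengths

def forward(theta1, theta2):
    """Forward model: joint angles to hand position."""
    x = LINK1 * math.cos(theta1) + LINK2 * math.cos(theta1 + theta2)
    y = LINK1 * math.sin(theta1) + LINK2 * math.sin(theta1 + theta2)
    return x, y

def jacobian(theta1, theta2):
    """First derivative of the forward model with respect to the joint angles."""
    s1, c1 = math.sin(theta1), math.cos(theta1)
    s12, c12 = math.sin(theta1 + theta2), math.cos(theta1 + theta2)
    return [[-LINK1 * s1 - LINK2 * s12, -LINK2 * s12],
            [LINK1 * c1 + LINK2 * c12, LINK2 * c12]]

def solve_inverse(target, start=(0.3, 0.3), rate=0.1, steps=2000):
    """Gradient descent on the squared position error.
    Each step moves the joint angles along (transposed Jacobian) times (error)."""
    theta1, theta2 = start
    for _ in range(steps):
        x, y = forward(theta1, theta2)
        ex, ey = target[0] - x, target[1] - y
        J = jacobian(theta1, theta2)
        theta1 += rate * (J[0][0] * ex + J[1][0] * ey)
        theta2 += rate * (J[0][1] * ex + J[1][1] * ey)
    return theta1, theta2

goal = (1.0, 1.0)
angles = solve_inverse(goal)
print(angles, forward(*angles))
```

Only the forward model and its derivative are needed; the inverse is never written down explicitly, which is the sense in which the controller learns it by playing with the model.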

Danny Hillis, Founding Scientist of Thinking Machines Corporation, captured the audience with a spellbinding talk on "Intelligence as an Emergent Phenomenon." Danny began with a survey of computational problems well-suited to massively parallel architectures--matrix algebra and parallel search. He uses the biological metaphor of evolution as his model for massively parallel computation and search. Since the evolution of intelligence is not studied as much as the engineering approach (divide and conquer) or the biological approach (reverse engineer nature's best ideas), Danny chose to apply his Connection Machine to the exploration of evolutionary processes. He invented a mathematical organism (called a "ramp") which seeks to evolve and perfect itself. A population cloud of these ramps inhabits his Connection Machine, mutating, evolving, and competing for survival of the fittest. Danny's color videos show the evolution of the species under different circumstances. He found that the steady state did not generally lead to a 100 percent population of perfect ramps. Rather, two or more immiscible populations of suboptimal ramps formed pockets with seething boundaries. He then introduced a species of parasites which attacked ramps at their weakest points, so that stable populations would eventually succumb to a destructive epidemic. The parasites did not clear the way for the emergence of perfect and immune ramps. Rather, the populations cycled through a roiling rise and fall of suboptimal ramps, still sequestered into camps of Gog and Magog. The eerie resemblance to modern geopolitics and classical mythology was palpable and profound.

Professor John Wyatt of the MIT Department of Electrical Engineering and Computer Science closed the first day's program with a talk on "Analog VLSI Hardware for Early Vision: Parallel Distributed Computation without Learning." Professor Wyatt's students are building analog devices that can be stimulated by focusing a scene image onto the surface of a chip. His devices for image processing use low precision (about 8 bits) analog processing based on the inherent bulk properties of silicon. His goal is to produce chips costing $4.95. One such chip can find the fixed point when the scene is zoomed. (Say you are approaching the back of a slow moving truck. As the back of the truck looms larger in your field of view, the fixed point in the scene corresponds to the point of impact if you fail to slow down.) Identification of the coordinates of the fixed point and the estimated time to impact are the output of this chip. Charge-coupled devices and other technologies are being adapted into image processing devices for stereo depth estimation, image smoothing and segmentation, and motion vision.
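
The geometry behind the fixed point and the time to impact can be sketched digitally in Python, although the chip itself computes in analog and its circuit was not described in these terms. The sketch assumes a pure zoom q = c + s(p - c) about a fixed point c with scale factor s, a constant closing speed, and two tracked image points; the function names and the sample coordinates are hypothetical.

```python
import math

def zoom_fixed_point(p1, q1, p2, q2):
    """Recover the scale s and fixed point c of a pure zoom q = c + s * (p - c)
    from two image points (p1, p2) and their zoomed positions (q1, q2)."""
    # The separation between the two points grows by the scale factor.
    s = math.hypot(q2[0] - q1[0], q2[1] - q1[1]) / math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    # Solving q = c + s * (p - c) for c gives c = (q - s * p) / (1 - s).
    # Assumes s differs from 1, that is, the scene really did zoom.
    cx = (q1[0] - s * p1[0]) / (1.0 - s)
    cy = (q1[1] - s * p1[1]) / (1.0 - s)
    return s, (cx, cy)

def time_to_impact(s, frame_interval):
    """Time remaining after the second frame, assuming constant closing speed
    and image size inversely proportional to distance."""
    return frame_interval / (s - 1.0)

# Hypothetical measurements from two frames taken 0.1 seconds apart.
s, center = zoom_fixed_point((0.0, 0.0), (-0.75, -0.5), (4.0, 0.0), (4.25, -0.5))
print(s, center, time_to_impact(s, 0.1))
```

The time estimate follows from the scale factor alone: if distance shrinks from Z to Z/s in one frame interval at constant speed, the remaining time is the interval divided by (s - 1).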

The second day of the symposium focused on the Japanese, European, and American perspectives for the development and application of neural nets.

Professor Shun-Ichi Amari of the Department of Mathematical Engineering and Information Physics at the University of Tokyo explored the mathematical theory of neural nets. Whereas conventional computers operate on symbols using programmed sequential logic, neural nets correspond more to intuitive styles of information processing--pattern recognition, dynamic parallel processing, and learning. Professor Amari explored neural network operation in terms of mathematical mapping theory and fixed points. Here, the fixed points represent the set of weights corresponding to the stable state after extensive training.

Dr. Wolfram Buttner of Siemens Corporate Research and Development discussed several initiatives in Europe to develop early commercial applications of neural net technology. Workpiece recognition in the robotic factory and classification of stimuli into categories are recurring themes here. There is also interest in unsupervised learning (playing with models or exploring complex environments), decision support systems (modeling, prediction, diagnosis, scenario analysis, optimal decision making with imperfect information) and computer languages for neural network architectures. Dr. Buttner described NeuroPascal, an extension to Pascal for parallel neurocomputing architectures.

Dr. Scott Kirkpatrick, Manager of Workstation Design at IBM's Thomas J. Watson Research Center, explored numerous potential applications of neural nets as information processing elements. They can be viewed as filters, transformers, classifiers, and predictors. Commercial applications include routine processing of high-volume data streams such as credit-checking and programmed arbitrage trading. They are also well-suited to adaptive equalization, echo cancellation, and other signal processing tasks. SAIC is using them in its automated luggage inspection system to recognize the telltale signs of suspect contents of checked luggage. Neurogammon 1.0, which took two years to build, plays a mean game of backgammon, beating all other machines and giving world class humans a run for their money. Hard problems for neural nets include 3D object recognition in complex scenes, natural language understanding, and "database mining" (theory construction). Today's commercially viable applications of neural nets could only support about 200 people. It will be many years before neurocomputing becomes a profitable industry.

Marvin Minsky, MIT's Donner Professor of Science, gave an entertaining talk on "Future Models". The human brain has over 400 specialized architectures, and is equivalent in capacity to about 200 Connection Machines (Model CM-2). There are about 2000 data buses interconnecting the various departments of the brain. As one moves up the hierarchy of information processing, one begins at Sensory-Motor and advances through Concrete Thinking, Operational Thinking, "Other Stages", and arrives at Formal Thinking as the highest cognitive stage. A human subject matter expert who is a world class master in his field has about 20-50 thousand discrete "chunks" of knowledge. Among the computational paradigms found in the brain, there are Space Frames (for visual information), Script Frames (for stories), Trans-Frames (for mapping between frames), K-Lines (recurring thematic threads in a person's lifetime experiential learning curve), Semantic Networks (for vocabulary and ideas), Trees (for hierarchical and taxonomical knowledge), and Rule-Based Systems (for bureaucrats). Minsky's theory is summarized in his latest book, The Society of Mind. Results with neural networks solving "interesting" problems such as playing backgammon or doing freshman calculus reveal that we don't always know which problems are hard. It appears that a problem is hard until somebody shows an easy way to solve it. After that, it's deemed trivial. As to intelligence, Minsky says that humans are good at what humans do. He says, "A frog is very good at catching flies. And you're not."

The afternoon panel discussion, led by Patrick Winston, provided the speakers and audience another chance to visit and revisit topics of interest. That commercial neural networks are not solving profoundly deep and important problems was a source of dismay to some, who thought that we had enough programmed trading and credit checking going on already, and we don't need more robots turning down our loans and sending the stock markets into instability.

The deeper significance of the symposium is that research in neural networks is stimulating the field of brain and cognitive science and giving us new insights into who we are, how we came to be that way, and where we can go, if we use our higher cognitive functions to best advantage.

Barry Kort

This report originally appeared in November 1989 on UseNet and was reprinted in AI Magazine, Volume 11, Number 3, Fall 1990.