About:
Exploring new approaches to machine hosted
neural-network simulation, and the science
behind them.
Your moderator:
John Repici
A programmer who is obsessed with giving experimenters
a better environment for developing biologically-guided
neural network designs. Author of
an introductory book on the subject titled:
"Netlab Loligo: New Approaches to Neural Network
Simulation".
“Certainly, one of the most relevant and obvious characteristics of a present moment is that it goes away, and that characteristic must be represented internally.”
Stated plainly[1], the principle behind multitemporal synapses is that we maintain the blunt “residue” of past lessons in long-term connections, while everything else is quickly forgotten, and learned over again, in the instant. In other words, we re-learn the detailed parts of our responses as we are confronted with each new current situation.[2]
One of the primary benefits of applying this principle, in the form of multitemporal synapses, is a neural network construct that is completely free of the usual problems associated with catastrophic forgetting. When you eliminate catastrophic forgetting from your neural network structure, the practical result is the ability to develop networks that continuously learn from their surroundings, just like their natural counterparts.
One major challenge with conventional neural network models has been how to maintain connections that store enough intricate in-the-moment response details to deal with any contingency the system may encounter. Conventionally, such details would overwhelm the long-term lessons stored in permanent connection weights. This characteristic of conventional neural network models is known as the stability-plasticity problem, and it is the underlying cause of "catastrophic forgetting."
When an artificial neural network that has already learned a training set of responses encounters a new response to be learned, the result is usually ‘catastrophic forgetting’ of all earlier learning. Training on the new detail alters connections that the network maintains in a holistic (global) fashion. Because of this, it is almost certain that such a change will radically alter the outputs that were desired for the original training set.
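The interference described above can be seen in a toy case. The following is only a minimal sketch, assuming a single linear unit trained with the delta rule; the patterns, targets, learning rate, and the names `train` and `respond` are hypothetical choices for illustration and do not come from any particular network model. The two patterns share one input line, so training on the second pattern moves a weight that the first pattern depends on.

```python
def respond(weights, inputs):
    """Output of a single linear unit: the weighted sum of its inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


def train(weights, inputs, target, rate=0.1, steps=200):
    """Repeatedly apply the delta rule for one input/target pair."""
    w = list(weights)
    for _ in range(steps):
        error = target - respond(w, inputs)
        # Every weight with a non-zero input is adjusted, whether or not
        # an earlier lesson also relies on it.
        w = [wi + rate * error * xi for wi, xi in zip(w, inputs)]
    return w


# Two hypothetical lessons that overlap on the middle input line.
pattern_a, target_a = [1.0, 1.0, 0.0], 1.0
pattern_b, target_b = [0.0, 1.0, 1.0], -1.0

w = train([0.0, 0.0, 0.0], pattern_a, target_a)
response_a_before = respond(w, pattern_a)   # lesson A has been learned

w = train(w, pattern_b, target_b)           # now learn lesson B only
response_a_after = respond(w, pattern_a)    # lesson A is degraded
response_b_after = respond(w, pattern_b)

print(response_a_before, response_a_after, response_b_after)
```

In this toy case the response to the first pattern falls well short of its target once the second pattern has been learned, even though the first pattern was never presented again. Larger networks with fully shared weights tend to show the same effect in a more severe form.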
A primary goal for any adaptive system is to be able to continuously learn and adapt to the complex environment with which it interacts. Because of catastrophic forgetting, traditional approaches to neural network modeling have had difficulty adapting to the intricacies of each new moment the system encounters.
A given present moment situation will be, at once, very much like, and completely unlike, similar situations that have been previously experienced.
The term "multi-temporal synapse" is the name for a new computational learning model. It sharply simplifies the underlying biology, which is not limited to multitemporal connections within single synapses[3]. For this reason, it is probably more helpful to think in terms of multi-temporal connection strengths, rather than multi-temporal synapses.
The underlying theory of learning and behavior, which is described in this section and the section below, predicts, and may explain, our experience of a “present moment” in time. It is based on the recognition that learning is ubiquitous. That is, that any interaction with an adaptive entity [4] can be counted on to result in learning.
Connection strengths stored in long-term connection components immediately drive behavior. This makes it easier for short-term weights to quickly form detailed responses to the present situation. Short-term weights are able to adapt more quickly than they would on their own, because of the responses already started by the long-term weights at the same connection-points[5].
Think of this as being like hand-over-hand training of an autistic student. In this analogy, short-term weights serve as the student. The beginnings of responses driven by long-term weights act as the teacher prompting the student, or starting to move the student's hands in the right direction. This is done in order to prompt further activity by the student. Of course, long-term connections are able to drive internal neural responses that would not be accessible to a literal teacher.
The theory is not concerned with the exact mechanisms underlying the formation of the connection. It can be applied equally well whether your underlying learning model is based on Hebb's postulate, or a non-associative mechanism, such as habituation, or sensitization. You could also go with backpropagation, or better yet, use a really cool new extension of associative learning called Influence Based Learning.
The method only specifies that a single connection point be multitemporal. That is, it must have multiple distinct connection-strengths (weights), each having distinct acquisition and retention times.
As an example, consider a network in which each connection point can be assigned two weights representing connection strengths at that point. The two weights are specified to have two different time-spans: the first is a fast-learning, short-term weight, and the second is a slow-learning, long-term weight.
A short-term connection-strength (weight), which will learn quickly — The time it takes for this weight to form will be measured in sub-seconds to seconds. It will decay (~14 dB) almost as quickly as it forms, within seconds to a minute. It can be set to learn using, say, Influence Learning, or, if you'd prefer, some more conventional Hebbian-like learning method. In the absence of factors affecting its strength-value, this weight will quickly drop back to zero and stay there. (This is simplified; there's actually more to it.)
A good metaphor for the fast strength-values is of boats rowing against a stream. In this metaphor the forward effort applied through the oars represents the stimulus factors affecting weight values, while the water-current pushing back against the boats represents the forgetting/decay mechanism.
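The rowing metaphor can be put into numbers with a small sketch. It assumes, purely for illustration, that the fast weight behaves like a leaky integrator (stimulus pushes it up, decay pulls it back toward zero), that the "~14 dB" figure uses the amplitude convention (dB = 20·log10 of the ratio), and that the decay happens over 10 seconds. None of these choices is specified in the text, and the function names are hypothetical.

```python
import math


def decay_rate_for(db_drop, seconds):
    """Exponential decay constant (per second) that loses `db_drop` dB
    in `seconds`, assuming the amplitude convention dB = 20*log10(ratio)."""
    remaining_fraction = 10 ** (-db_drop / 20.0)
    return -math.log(remaining_fraction) / seconds


def step_fast_weight(weight, stimulus, decay_rate, dt):
    """One Euler step of dw/dt = stimulus - decay_rate * w.
    The stimulus is the rowing effort; the decay term is the current."""
    return weight + dt * (stimulus - decay_rate * weight)


decay_rate = decay_rate_for(14.0, 10.0)  # hypothetical: 14 dB lost in 10 s
dt = 0.01                                # simulation step, in seconds

# Row steadily for 60 s: the weight settles where effort balances current.
w = 0.0
for _ in range(6000):
    w = step_fast_weight(w, 1.0, decay_rate, dt)
held = w

# Stop rowing for 10 s: the current carries the weight back toward zero.
for _ in range(1000):
    w = step_fast_weight(w, 0.0, decay_rate, dt)
released = w

drop_db = 20 * math.log10(released / held)
print(held, released, drop_db)
```

Under these assumptions a constant stimulus holds the weight at `stimulus / decay_rate`, and removing the stimulus lets it fall by roughly 14 dB (to about a fifth of its held value) in the chosen 10 seconds. If the dB figure were meant as a power ratio instead, the remaining fraction would be smaller.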
A long-term connection-strength (weight), which will learn slowly — This weight will learn from the value of its respective fast weight. For this two-weight example, the long-term weight will learn very slowly and be permanent or nearly permanent — perhaps decaying over a period of many months, simulating: “what you don't use you lose”.
If the respective fast weight has converged for a given stimulus set, it will be spending more of its time at a specific value or values. For a given stimulus-response vector, this will cause the slow-learning weight value to gravitate, very slowly, toward those values where the fast weight is spending more time. Otherwise, the slow weights won't see any values in the fast weights for long enough to cause them to change significantly (again, this is simplified a bit).
For the sake of this explanation, every connection that matters will have both of these connection-strength weights associated with it.
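The two-weight connection point described above might be sketched as follows. This is a simplified illustration under several assumptions that are not part of the description: the effective strength is taken to be the sum of the two weights, the fast weight is driven by a generic `drive` signal rather than any specific learning rule, the slow weight only moves toward the fast weight while the fast weight holds a non-negligible value, and all rate constants are arbitrary. The class name and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MultitemporalConnection:
    fast: float = 0.0              # short-term weight: quick to form and fade
    slow: float = 0.0              # long-term weight: slow to form, nearly permanent
    fast_rate: float = 0.5         # hypothetical acquisition rate of the fast weight
    fast_decay: float = 0.2        # hypothetical per-step decay of the fast weight
    slow_rate: float = 0.001       # hypothetical acquisition rate of the slow weight
    slow_decay: float = 0.00001    # hypothetical per-step decay of the slow weight
    activity_floor: float = 0.05   # assumed threshold for "the fast weight is active"

    def strength(self) -> float:
        """Effective connection strength (assumed here to be a simple sum)."""
        return self.fast + self.slow

    def update(self, drive: float) -> None:
        # Fast weight: pushed by the current drive, pulled back toward zero.
        self.fast += self.fast_rate * drive - self.fast_decay * self.fast
        # Slow weight: learns only from the fast weight, drifting toward the
        # value where the fast weight is currently dwelling.
        if abs(self.fast) > self.activity_floor:
            self.slow += self.slow_rate * (self.fast - self.slow)
        # Slow weight: "what you don't use you lose", but very slowly.
        self.slow -= self.slow_decay * self.slow


conn = MultitemporalConnection()

for _ in range(500):               # a sustained lesson
    conn.update(drive=1.0)
fast_while_driven, slow_after_training = conn.fast, conn.slow

for _ in range(500):               # the situation goes away
    conn.update(drive=0.0)
fast_after_rest, slow_after_rest = conn.fast, conn.slow

print(fast_while_driven, slow_after_training)
print(fast_after_rest, slow_after_rest)
```

With these settings the fast weight converges quickly during the lesson and vanishes soon after it ends, while the slow weight picks up only a blunt fraction of the lesson and keeps almost all of it during the rest period.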
When an adaptive system first encounters a situation similar to one previously experienced, the slow-learning connection-strengths will have some small, weak connectivity, learned during those previous experiences. At this first blush, these slow weights are the only connections driving neuronal responses [6]. These responses, caused by long-term connections, are residual (for lack of a better word).
In this case, the long-term connections will immediately begin directing the system's responses in the right general way. This residual, but immediate, response, in turn, allows the fast weights to more quickly re-learn “in the moment” how to respond to the intricacies of their current situation. This continuous re-learning of short-term responses—in-the-moment—provides the adaptive system with a perception of a present moment in time.
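The head start that a residual response gives to the fast weights can be illustrated with a few lines of simulation. This is only a sketch under stated assumptions: the response is taken to be the sum of a fixed long-term residual and a fast weight, the fast weight uses a simple error-correcting update with no decay, and the target, rate, and tolerance values are arbitrary. The function name `steps_to_settle` is hypothetical.

```python
def steps_to_settle(residual, target=1.0, rate=0.2, tolerance=0.05, max_steps=1000):
    """Count the updates the fast weight needs before the combined
    response (long-term residual + fast weight) is within `tolerance`
    of `target`. The residual stays fixed during the encounter."""
    fast = 0.0
    for step in range(max_steps):
        response = residual + fast       # the residual acts immediately
        error = target - response
        if abs(error) <= tolerance:
            return step
        fast += rate * error             # fast weight re-learns in the moment
    return max_steps


cold_start = steps_to_settle(residual=0.0)   # no prior experience
warm_start = steps_to_settle(residual=0.7)   # blunt residue of past lessons

print(cold_start, warm_start)
```

Under these assumptions the connection that starts with a residual settles in fewer updates, because the fast weight only has to learn the difference between the blunt long-term response and what the present situation actually requires.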
The theory asserts that a biological entity must re-learn to respond to the details of each new situation it encounters. This will be true even if the encounter is nearly identical to situations experienced—and learned—many times in the past.
At first blush, this seems to fly in the face of parsimony. This approach, however, imparts its own set of benefits to the organism, some of which embody considerable parsimony in their own right. One such benefit is making the best use of limited physical resources. By employing a forgetful strategy, the organism is able to retain just enough, so that it can easily re-learn detailed responses in the future. This eliminates the need to support the immense resources that would otherwise be required to retain every tiny detail ever encountered.
A strategy of re-learning immediate responses also gives the organism the intrinsic ability to address the problem of constant learning discussed above. That is, the notion that every new situation is at once like, and not like, past situations. This hints at other, perhaps less obvious, benefits, as well, such as the ability to deal with massive amounts of noise and uncertainty in training information [7].
It works because learning is ubiquitous. — Learning seems to be something that is always occurring. That is, it is a phenomenon which can be counted on to occur, like chemical interactions, or gravity.
There is a paradoxical characteristic of the relationship between long- and short-term connections. The longer a connection takes to form and decay, the faster the responses it provides. That is, the long-term connections cause immediate responses, because they are already there when incoming stimuli first arrive. Short-term connections, on the other hand, must first form, and so their responses are slower to onset (milliseconds to many seconds).
“Certainly, one of the most relevant and obvious characteristics of a present moment is that it goes away, and that characteristic must be represented internally.
In order to represent this particular characteristic of a present moment phenomenon, its internal representation must include a component that continually falls away. The faster weights within multitemporal synapses provide this representational facility for the immediate present, while longer-term present moments can be represented by connection-weights that decay more slowly.”
This statement alludes to the notion that the present moment perception could be measured in sub-seconds (as demonstrated in the "Interesting Visualization" below), seconds, minutes, hours, or longer. We have also learned that the brain employs a variety of underlying biological mechanisms for representing present moments of different time-frames.
An earlier blog entry makes various attempts—using statically presented explanations—to have readers visualize the concept. For the most part, those attempts seem to miss the mark.
The following video, however, was produced by people who have probably never heard of multitemporal synapses. Their amazing experiment inadvertently does a much better job of relating the concept of multitemporal learning than I ever could with static explanations.
Long face?: What you are viewing in this video may be your immediate responses—driven by long-term connections—before your short-term connection-components have had a chance to form/learn finer “present moment” responses.
The theory of multitemporal learning states that interactive behavior's essential characteristic is that it is always going away. Further, it states that this characteristic must be represented within any interactive/adapting entity.
There is a temporal component of learning and behavior — Your long-term connections (those that are always there driving your neural responses) are made manifest in the above video, because your short term—learned in the moment—responses are never given a chance to form. The speed at which the video is changing makes it impossible for you to form “normal” internal responses to the present situation. Instead you only see the blunt representations that are impressed in long-term connections. These long-term connections are driving your immediate internal responses. In the case of the above video, the information encoded in the long term connections seems to be related to encoding those features that are most important to recognition and discernment of the face in a world with many similar faces.
Doesn't the theory predict that this should also work with only one flashing face? — Yes, it predicts that. And yes, it does work, but to a lesser extent. This may be because short-term responses to relational information (not just the facial features, but the differences between the two faces) are more complex and require more time to form. This is a guess, though.
UPDATE: Time may not be the only determiner in this video — A second video from the lab that produced this one flashes faces much more slowly, with similar results. You can also pause their new video and maintain the 'distorted' perception for a very long time, if you keep the image(s) in your peripheral vision. [8]
This helps to highlight the point, that in biological terms, the multi-temporal components cited in this theory are not limited to individual synapses, or even to individual neurons. They may involve temporally diverse acquisition and retention rates spread out over multiple neurons and sub-networks within the brain.
On the other hand, it may just be that neurons responsible for processing the peripheral areas of vision have much sparser (or non-existent) short-term components, while their long-term connections are correspondingly more genetically hard-wired.
[pdf] Facing Up to the Problem of Consciousness
I've included this as background, because much of the material in this entry seems (intuitively) to be related somehow to consciousness. This is an introductory-level explanation of consciousness and where we are in dealing with the hard problem of explaining it. It is authored by one of the premier thinkers in the field, David Chalmers.
[1] - E-mail conversations with others have indicated a need for more clarity about these concepts. Part of the problem (okay, most of the problem) may be that I have not done a good job of explaining it. This entry represents my ongoing attempt to do a better job of relating the underlying concepts and how they apply and relate to surrounding concepts.
[2] - Going a little farther out on a limb, what we perceive as remembering may simply be how we experience the process of re-learning, anew, how to deal with the situation currently presented to us. In other words, the sensation of remembering within a present moment, may be caused by newly forming short-term connections, as they develop, with guidance from fast residual responses generated by existing long-term connections.
[3] - In the underlying biology, the different rates of acquisition and forgetting can certainly be at the same synapse. There is a great deal of observational support for this, but it is not the only observed cause for the phenomenon. Alternatively, different learning and retention rates may be based on differences between different synapses on the same neuron, or differences in learning and retention times between closely related neurons. Widening the scope even further, the concept of "closely related" neurons may be based on vicinity, or it may be based on function or effect, just to name two examples. See also Pathfinding, which produces long-term learning based on the formation of new axons and synapses.
[4] - Apparently, based on understanding gained from Quantum Mechanics, all volumetric phenomena are interactively adaptive to a lesser or greater extent.
[5] - The term, “same connection-point” is taken to be abstracted here, and does not necessarily refer to a single terminal or synapse. Two completely different axons from the same source-function to the same destination-function would suffice.
[6] - The word “responses” is used in a general sense here. It refers to all types of responses, including internal events, such as firing neurons.
[7] - For an example of this, see U.S. patent # 7,904,398. This patent is based on the concepts discussed in this blog entry. The final embodiment demonstrates this benefit and gives a chart showing just how effective it is at the task of learning from uncertain training cues.
[8] - They are currently questioning whether the result is due to processing the differences between the two faces for the purpose of accentuating distinguishing features. Because their second video also works well if you hold a hand over one side, I am skeptical of that interpretation.
Editing note: - This article was originally posted in November of 2011. It has been edited and reorganized to be more readable in this re-posted edition.