A programmer who is obsessed with giving experimenters
a better environment for developing biologically-guided
neural network designs. Author of
an introductory book on the subject titled:
"Netlab Loligo: New Approaches to Neural Network
Simulation".
“Certainly, one of the most relevant and obvious characteristics of a present moment is that it goes away, and that characteristic must be represented internally.”
Stated plainly[1], the principle behind multitemporal synapses is that we maintain the blunt “residue” of past lessons in long-term connections, while everything else is learned in the instant. In other words, we re-learn the detailed parts of our responses as we are confronted with each new current situation.[2]
An earlier blog entry makes various attempts—using statically presented explanations—to have readers visualize the concept. For the most part, those attempts seem to miss the mark.
The following video, however, was produced by people who probably have never heard of multitemporal synapses. Their amazing experiment inadvertently does a much better job of relating the concept of multitemporal learning than I ever could with static presentations.
Long face?: What you are viewing in this video may be your immediate responses—driven by long-term connections—before your short-term connection-components have had a chance to form/learn finer “present moment” responses.
There is a temporal component of learning and behavior. Your long-term connections (those that are always there, driving your neural responses) are made manifest in the above video because your short-term responses, the ones learned in the moment, are never given a chance to form. The video changes too quickly for you to form “normal” internal responses to the present situation. Instead, you see only the blunt representations that are impressed in long-term connections, which are driving your immediate internal responses. In the case of the above video, the information encoded in the long-term connections seems to capture those features that are most important for recognizing and discerning a face in a world with many similar faces.
Doesn't the theory predict that this should also work with only one flashing face? Yes, it predicts that. And yes, it does work, but to a lesser extent. This may be because forming short-term responses to relational information (not just the facial features but the differences between the two faces) is more complex and takes more time. This is a guess, though.
A primary goal for any adaptive system is for it to be able to continuously learn and adapt to the complex environment with which it interacts. Traditional approaches to neural network modeling have had difficulty adapting to the intricacies of each new moment the system encounters.
The crux of the problem is demonstrated in the following two statements, which seem to be mutually exclusive. That is, the only difference between these two statements is that the second assertion has had the word NOT added to it.
Which of these two statements is true?
Every present situation is like past similar situations.
Every present situation is not like past similar situations.
The answer is that they are both true and logically consistent statements, but that is not the problem. It is the solution.
The difficulty has been in how to maintain connections that store enough intricate in-the-moment response details to deal with any contingency that the system may encounter. Conventionally, such details would overwhelm long-term lessons stored in permanent weights.
The underlying theory of learning and behavior, which is discussed here, explains and predicts our experience of a “present moment” in time. It is based on the recognition that learning is ubiquitous. That is, any interaction with an adaptive entity[3] can be counted on to result in learning.
Connection strengths stored in long-term connection components immediately drive behavior. This makes it easier for short-term weights to quickly form detailed responses to the present situation. Short-term weights are able to adapt more quickly on their own because of the responses already started by the long-term weights at the same connection-points[4].
Think of this as being like hand-over-hand training of an autistic student. In this analogy, short term weights serve as the student. The beginnings of responses driven by long-term weights act as the teacher prompting the student, or starting to move the student's hands in the right direction. This is done in order to prompt further activity by the student. Of course, long term connections are able to drive internal neural responses that would not be accessible to a literal teacher.
The theory is not concerned so much with the exact mechanisms underlying the formation of the connection. It can be applied equally well whether your underlying model of connection formation is based on Hebb's postulate, or a non-associative mechanism, such as habituation, or sensitization. You could also go with backpropagation, or better yet, use a really cool new extension of associative learning called Influence Based Learning.
The theory/method only specifies that a single connection point be multitemporal. That is, it must have multiple distinct connection-strengths (weights), each having distinct acquisition and retention times.
For example, consider a network in which connection points can each be assigned two weights representing connection strengths at that point. The two weights will be specified for two different time-spans.
A short-term connection-strength (weight), which will learn quickly: The time it takes for this weight to form will be measured in sub-seconds to seconds. It will decay (~14 dB) almost as quickly as it forms, within seconds to a minute. It can be set to learn using, say, Influence Learning, or, if you'd prefer, some more conventional Hebbian-like learning method. In the absence of factors affecting its strength-value, this weight will quickly drop back to zero and stay there. (This is simplified; there's actually more to it.)
A good metaphor for the fast strength-values is of boats rowing against a stream. In this metaphor the forward effort applied through the oars represents the stimulus factors affecting weight values, while the water-current pushing back against the boats represents the forgetting/decay mechanism.
A long-term connection-strength (weight), which will learn slowly: This weight will learn from the value of its respective fast weight. For this two-weight example, the long-term weight will learn very slowly and be permanent or nearly permanent, perhaps decaying over a period of many months, simulating “what you don't use you lose”.
If the respective fast weight has converged for a given stimulus set, it will spend more of its time at a specific value or values. This will cause the slow-learning weight value to gravitate, very slowly, toward those values where the fast weight is spending more time. Otherwise, the slow weights won't see any values in the fast weights for long enough to cause them to change significantly (again, this is simplified a bit).
Every connection that matters to this explanation will have both of these connection-strength mechanisms associated with it.
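To make the two-weight example a little more concrete, here is a minimal Python sketch of a single connection point. Everything in it is an assumption made for illustration: the names (`MultitemporalConnection`, `decay_factor_per_step`, `run`), the learning rates, the 30-second decay time, and the simple leaky update rule, which stands in for Influence Learning or a Hebbian-like rule. It also reads “~14 dB” as an amplitude ratio (20·log10), which the text above does not specify, and it compresses the slow weight's time scale drastically so the effect shows up in a short run.

```python
def decay_factor_per_step(drop_db, decay_time_s, dt_s):
    """Per-step multiplier that makes a weight fall by drop_db over decay_time_s.

    Assumption: dB is treated as an amplitude ratio (20 * log10).
    """
    total_ratio = 10.0 ** (-drop_db / 20.0)
    steps = decay_time_s / dt_s
    return total_ratio ** (1.0 / steps)


class MultitemporalConnection:
    """One connection point holding a fast and a slow weight (illustrative only)."""

    def __init__(self, fast_rate=0.05, fast_decay=0.99, slow_rate=1e-4):
        self.fast = 0.0  # short-term weight: forms quickly, decays quickly
        self.slow = 0.0  # long-term weight: learns slowly from the fast weight
        self.fast_rate = fast_rate
        self.fast_decay = fast_decay
        self.slow_rate = slow_rate

    def step(self, drive):
        """Advance one time step; drive is the learning signal for the fast weight."""
        # Fast weight: the stimulus pushes it forward (the oars) while decay
        # pulls it back toward zero (the current).
        self.fast = self.fast_decay * self.fast + self.fast_rate * drive
        # Slow weight: moves a tiny fraction of the way toward the fast weight,
        # so it changes noticeably only where the fast weight lingers.
        self.slow += self.slow_rate * (self.fast - self.slow)


def run(conn, drive, steps):
    """Apply a constant drive for a number of steps; return (fast, slow)."""
    for _ in range(steps):
        conn.step(drive)
    return conn.fast, conn.slow


# Hypothetical run: 0.1 s time steps, fast weight losing 14 dB every 30 s.
decay = decay_factor_per_step(drop_db=14.0, decay_time_s=30.0, dt_s=0.1)
conn = MultitemporalConnection(fast_decay=decay)
fast_trained, slow_trained = run(conn, drive=1.0, steps=20000)  # stimulus present
fast_rested, slow_rested = run(conn, drive=0.0, steps=1200)     # stimulus absent
print("fast:", fast_trained, "->", fast_rested)
print("slow:", slow_trained, "->", slow_rested)
```

In this sketch the fast weight collapses toward zero once the stimulus stops, while the slow weight keeps most of what it absorbed. A slow weight meant to last months would need a far smaller `slow_rate` than the one used here.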
When an adaptive system first encounters a situation, the slow-learning connection-strengths are the only connections driving neuronal responses [5]. These responses caused by long-term connections are residual (for lack of a better word). We'll assume that they have learned from short-term weight values, which were formed during similar situations experienced a number of times in the past.
In this case, the long-term connections will immediately begin directing the system's responses in the right general way. This general, but immediate, response, in turn, allows the fast weights to more quickly re-learn “in the moment” how to respond to the intricacies of their current situation. This continuous re-learning of short-term responses—in-the-moment—provides the adaptive system with a perception of a present moment in time.
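The head start described above can be illustrated with a toy comparison. This is only a sketch under assumptions of my own choosing: the response is taken to be the sum of the slow and fast weights, the fast weight learns with a plain error-correcting (delta) rule, and the slow weight is held fixed during the episode because it changes so slowly. The function name `steps_to_relearn` and all of the numbers are hypothetical.

```python
def steps_to_relearn(slow_weight, target, fast_rate=0.1, tolerance=0.05,
                     max_steps=1000):
    """Count the steps the fast weight needs to bring the response near target.

    Assumptions: response = slow + fast, and the fast weight follows a
    simple delta rule. The slow weight stays constant for the episode.
    """
    fast = 0.0  # the fast weight starts from zero in each new situation
    for step in range(max_steps):
        response = slow_weight + fast
        error = target - response
        if abs(error) <= tolerance:
            return step
        fast += fast_rate * error  # re-learn the remaining detail in the moment
    return max_steps


# No long-term residue versus a blunt residue that covers most of the response.
cold = steps_to_relearn(slow_weight=0.0, target=1.0)
primed = steps_to_relearn(slow_weight=0.8, target=1.0)
print("steps without long-term residue:", cold)
print("steps with long-term residue:", primed)
```

With the long-term weight already supplying most of the response, the fast weight only has to learn the remaining difference, so it gets within tolerance in fewer steps.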
The theory asserts that a biological entity must re-learn to respond to the details of each new situation it encounters. This will be true even if the encounter is nearly identical to situations experienced—and learned—many times in the past.
At first blush, this seems to fly in the face of parsimony. This approach, however, imparts its own set of benefits to the organism, some of which embody considerable parsimony in their own right. One such benefit is making the best use of limited physical resources. By employing a forgetful strategy, the organism is able to retain just enough, so that it can easily re-learn detailed responses in the future. This eliminates the need to support the immense resources that would otherwise be required to retain every tiny detail ever encountered.
A strategy of re-learning immediate responses also gives the organism the intrinsic ability to address the problem of constant learning discussed above, namely that every new situation is at once like, and not like, past situations. This hints at other, perhaps less obvious, benefits as well, such as the ability to deal with massive amounts of noise and uncertainty in training information[6].
It works because learning is ubiquitous. — Learning seems to be something that is always occurring. That is, it is a phenomenon which can be counted on to occur, like chemical interactions, or gravity.
There is a paradoxical characteristic of the relationship between long- and short-term connections. The longer a connection takes to form, the faster are the responses it provides. That is, the long-term connections cause immediate responses, because they are already there when incoming stimuli first arrive. Short-term connections, on the other hand, must first form, and so their responses are slower in onset (milliseconds to many seconds).
“Certainly, one of the most relevant and obvious characteristics of a present moment is that it goes away, and that characteristic must be represented internally.
In order to represent this particular characteristic of a present moment phenomenon, its internal representation must include a component that continually falls away. The faster weights within multitemporal synapses provide this representational facility for the immediate present, while longer-term present moments can be represented by connection-weights that decay more slowly.”
This statement alludes to the notion that the present moment perception could be measured in sub-seconds (as demonstrated in the above video), seconds, minutes, hours, or longer. We have also learned that the brain employs a variety of underlying biological mechanisms for representing present moments of different time-frames.
[pdf] Facing Up to the Problem of Consciousness
I've included this as background, because much of the material in this entry seems (intuitively) to be related somehow to consciousness. This is an introductory-level explanation of consciousness and where we are in dealing with the hard problem of explaining it. It is authored by one of the premier thinkers in the field, David Chalmers.
[1] - E-mail conversations with others have indicated a need for more clarity about these concepts. Part of the problem (okay, most of the problem) may be that I have not done a good job of explaining it. This entry represents my ongoing attempt to do a better job of relating the underlying concepts and how they apply and relate to surrounding concepts.
[2] - Going a little farther out on a limb, what we perceive as remembering may simply be how we experience the process of re-learning, anew, how to deal with the situation currently presented to us. In other words, the sensation of remembering within a present moment, may be caused by newly forming short-term connections, as they develop, with guidance from residual responses generated by existing long-term connections.
[3] - Apparently, based on understanding gained from Quantum Mechanics, all volumetric phenomena are adaptive to a lesser or greater extent.
[4] - The term, “same connection-point” is taken to be abstracted here, and does not necessarily refer to a single terminal or synapse. Two completely different axons from the same source-function to the same destination-function would suffice.
[5] - The word “responses” is used in a general sense here. It refers to all types of responses, including internal events, such as firing neurons.
[6] - For an example of this, see U.S. patent # 7,904,398. This patent is based on the concepts discussed in this blog entry. The final embodiment demonstrates this benefit and gives a chart showing just how effective it is at the task of learning from uncertain training cues.