
K&A Analysts: You'll find this fascinating. Promoters can ignore this and go back to the party!

One of my beloved mentors, Brian Klemmer, taught people that 99% of our decisions are actually being made for us. Over time and in various frameworks, the reason has been called programs, tapes, baggage, life experiences, sunglasses, and a host of other names, but the basic concept is that we are being driven by our subconscious belief systems.

What captured my attention today was an article over at Edge, publishing 191 responses to its annual open-ended question. This year's was "What is your favorite deep, elegant, or beautiful explanation?" Among the top hits were Darwin's natural selection and Einstein's general relativity. The one that jumped out and bit me, however, was a response by Terrence J. Sejnowski. Sejnowski is a computational neuroscientist, Francis Crick Professor at the Salk Institute, and coauthor of The Computational Brain.

I reproduce his submission here in its entirety:

Nature is More Clever Than We Are

We have the clear impression that our deliberative mind makes the most important decisions in our life: What work we do, where we live, who we marry. But contrary to this belief the biological evidence points toward a decision process in an ancient brain system called the basal ganglia, brain circuits that consciousness cannot access. Nonetheless, the mind dutifully makes up plausible explanations for the decisions.

The scientific trail that led to this conclusion began with honeybees. Worker bees forage the spring fields for nectar, which they identify with the color, fragrance and shape of a flower. The learning circuit in the bee brain converges on VUMmx1, a single neuron that receives the sensory input and, a bit later, the value of the nectar, and learns to predict the nectar value of that flower the next time the bee encounters it. The delay is important because the key is prediction, rather than a simple association. This is also the central core of temporal-difference (TD) learning, which can learn a sequence of decisions leading to a goal and is particularly effective in uncertain environments like the world we live in.

Buried deep in your midbrain there is a small collection of neurons, found in our earliest vertebrate ancestors, that project throughout the cortical mantle and basal ganglia that are important for decision making. These neurons release a neurotransmitter called dopamine, which has a powerful influence on our behavior. Dopamine has been called a "reward" molecule, but more important than reward itself is the ability of these neurons to predict reward: If I had that job, how happy would I be? Dopamine neurons, which are central to motivation, implement TD learning, just like VUMmx1.

TD learning solves the problem of finding the shortest path to a goal. It is an "online" algorithm because it learns by exploring and discovers the value of intermediate decisions in reaching the goal. It does this by creating an internal value function, which can be used to predict the consequences of actions. Dopamine neurons evaluate the current state of the entire cortex and inform the brain about the best course of action from the current state. In many cases the best course of action is a guess, but because guesses can be improved, over time TD learning creates a value function of oracular powers. Dopamine may be the source of the "gut feeling" you sometimes experience, the stuff that intuition is made from.

When you consider various options, prospective brain circuits evaluate each scenario and the transient level of dopamine registers the predicted value of each decision. The level of dopamine is also related to your level of motivation, so not only will a high level of dopamine indicate a high expected reward, but you will also have a higher level of motivation to pursue it. This is quite literally the case in the motor system, where a higher tonic dopamine level produces faster movements. The addictive power of cocaine and amphetamines is a consequence of increased dopamine activity, hijacking the brain's internal motivation system. Reduced levels of dopamine lead to anhedonia, an inability to experience pleasure, and the loss of dopamine neurons results in Parkinson's Disease, an inability to initiate actions and thoughts.

TD learning is powerful because it combines information about value along many different dimensions, in effect comparing apples and oranges in achieving distant goals. This is important because rational decision-making is very difficult when there are many variables and unknowns, so having an internal system that quickly delivers good guesses is a great advantage, and may make the difference between life and death when a quick decision is needed. TD learning depends on the sum of your life experiences. It extracts what is essential from these experiences long after the details of the individual experiences are no longer remembered.

TD learning also explains many of the experiments that were performed by psychologists who trained rats and pigeons on simple tasks. Reinforcement learning algorithms have traditionally been considered too weak to explain more complex behaviors because the feedback from the environment is sparse and minimal. Nonetheless reinforcement learning is universal among nearly all species and is responsible for some of the most complex forms of sensorimotor coordination, such as piano playing and speech. Reinforcement learning has been honed by hundreds of millions of years of evolution. It has served countless species well, in particular our own.

How complex a problem can TD learning solve? TD gammon is a computer program that learned how to play backgammon by playing itself. The difficulty with this approach is that the reward comes only at the end of the game, so it is not clear which were the good moves that led to the win. TD gammon started out with no knowledge of the game, except for the rules. By playing itself many times and applying TD learning to create a value function to evaluate game positions, TD gammon climbed from beginner to expert level, along the way picking up subtle strategies similar to ones that humans use. After playing itself a million times it reached championship level and was discovering new positional play that astonished human experts. Similar approaches to the game of Go have achieved impressive levels of performance and are on track to reach professional levels.

When there is a combinatorial explosion of possible outcomes, selective pruning is helpful. Attention and working memory allow us to focus on the most important parts of a problem. Reinforcement learning is also supercharged by our declarative memory system, which tracks unique objects and events. When large brains evolved in primates, the increased memory capacity greatly enhanced the ability to make more complex decisions, leading to longer sequences of actions to achieve goals. We are the only species to create an educational system and to consign ourselves to years of instruction and tests. Delayed gratification can extend into the distant future, in some cases extending into an imagined afterlife, a tribute to the power of dopamine to control behavior.

At the beginning of the cognitive revolution in the 1960s the brightest minds could not imagine that reinforcement learning could underlie intelligent behavior. Minds are not reliable. Nature is more clever than we are.

In essence, this particular theory proposes a biological link to why belief systems are so strong. It is, in the end, a theory, but one based on observation, and I find it intriguing. For my colleagues at Klemmer and Associates, I offer this as some biological backing for why "Harry the Worm" is so doggone persistent; in my own experience, I find that knowing why something works is often another tool in the toolbox for those who are about taking personal responsibility for their choices.
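
For the analysts who like to see the gears turn: below is a minimal sketch of the temporal-difference idea Sejnowski describes, in which a value function is learned from predictions even though the reward only shows up at the end. It is my own toy illustration, not anything from his essay or from TD gammon. The setup (a five-state random walk), the function name td0_chain, and the parameters alpha and gamma are all assumptions chosen for the example.

```python
import random


def td0_chain(n_states=5, episodes=5000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values on a small random-walk chain with TD(0).

    Hypothetical toy setup: states 0..n_states-1, the walker starts in the
    middle and steps left or right with equal probability. Walking off the
    right end pays a reward of 1; walking off the left end pays 0.
    """
    rng = random.Random(seed)
    values = [0.0] * n_states  # initial guesses: every state is worth 0

    for _ in range(episodes):
        state = n_states // 2
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:                # fell off the left end
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n_states:      # fell off the right end
                reward, next_value, done = 1.0, 0.0, True
            else:                             # still on the chain, no reward yet
                reward, next_value, done = 0.0, values[next_state], False

            # TD error: how much better or worse things look one step later
            # than the current guess for this state predicted.
            td_error = reward + gamma * next_value - values[state]
            values[state] += alpha * td_error  # nudge the guess toward the target

            if done:
                break
            state = next_state

    return values


if __name__ == "__main__":
    print([round(v, 2) for v in td0_chain()])
```

In this toy world the true value of each state is simply its chance of exiting on the right, so from left to right the estimates should hover near 1/6, 2/6, 3/6, 4/6, and 5/6. The point of the sketch is that no state is ever told its value directly; each guess is corrected only by the next guess, which is the "prediction rather than simple association" part of the story.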

The Old Wolf
