WAMM - Positive Reinforcement by Ruth Kellogg

Teaching our dogs through the use of positive reinforcement is the most popular method of operant conditioning. The biggest reason it has become so widespread, I feel, is that it lets a person teach a dog what is desired in an easy way that is pleasant for both. Many of us do not enjoy causing our companions discomfort or teaching them in a way that creates fear and insecurity.

People who use positive reinforcement in their dog training and daily life soon become aware of the positive effects it creates. Who among us does not like receiving a “well done”, a reward or surprise, or the opportunity to enjoy a favorite activity when we have done a task well? Our dogs are no different from us. They, too, like the praise, rewards, activities, and recognition that we strive for in our daily lives.

To review, in positive reinforcement (a form of operant conditioning) a subject chooses to change its behavior to receive a positive effect (such as praise, a reward, or a favorite activity) in its environment. The reinforcement occurs during or immediately after the behavior. The subject wants the reward and changes its behavior until it gets it. With repeated attempts in which similar behavior changes yield rewards, the behavior is changed. This is pure behavior modification using incentives.

What can be used as reinforcements? The answer is simple: whatever works! The reinforcement must be species appropriate (e.g., a dolphin would work happily for a fish, but a horse wouldn’t) and be something that the subject enjoys. Attempting to use food as an incentive for a ball-crazy dog is as useless as trying to use a ball as an incentive for a strongly food-orientated dog. Changing the reinforcements, even during a training session, adds variety and increases the subject’s attention and enjoyment.

The size of the reinforcement in a training session is very important. The premise is to use as small an amount of the motivator as will still get the subject (e.g., the dog) to do the behavior. Small amounts of the motivator increase the subject’s interest in getting more of it and keep its focus on the training session. When using food, use tiny amounts, as the idea is to get as many behaviors as possible, not to feed a meal during the training session. If the dog becomes satiated, he won’t have the same desire to work. When using a toy or activity as the motivation, keep the sessions very brief; otherwise the dog won’t be interested in working at all.

There is one exception to using small amounts of the motivator. Karen Pryor terms this exception a “jackpot”. A jackpot is much larger than the normal motivator. (For instance, instead of just giving a tiny piece of a hot dog, a whole hot dog is given.) Jackpots are big surprises and should be used sparingly so that they keep their surprise effect. They are wonderful for marking a large breakthrough in a training session and are extremely effective.

The timing of the reinforcement is extremely important. If a new trainer is having difficulty teaching his dog, the problem is usually with the timing of the reinforcement. Given too early, it doesn’t reinforce the actual behavior desired but rather the preceding behavior. Too early a reinforcement is also bribery, which is highly ineffective. If the reinforcement is given too late, the opportunity for acknowledging the actual behavior desired has been lost, so it is again ineffective. Correctly timed reinforcements do change behavior. Reinforcements communicate information to the subject about its behavior and must be given during or immediately after the desired behavior.

There are three types of schedules that outline how reinforcements should be given to the subject.

The first schedule, constant reinforcement, is to be used in the learning phase. This means the subject (dog) receives the positive reinforcement (treat) each time it changes its behavior on cue (lies down). The reinforcer acts as information, and the dog must learn that when the cue (in this case, “down”) is given, he must change his behavior (i.e., lie down) in order to receive the positive reinforcement (treat). Studies have shown that while the subject continues to learn at a steady and moderate rate, overall there is a gradual slowing of the subject’s response times, with brief and unpredictable pauses between the reward and the next behavior shown.

The second type of reinforcement schedule is the fixed ratio schedule. The subject must give a fixed number of correct responses before it receives a reward. This could be as often as two responses to one reward or less often, such as five responses to one reward. Studies have shown that the subject responds at a high and steady rate except for a pause immediately after the reinforcement, before the next behavior is given. This is termed the post-reinforcement pause. The more responses the subject must make before it is rewarded, the longer the post-reinforcement pauses become.

The third schedule is the variable ratio schedule. In this, there is no set ratio of responses required for rewards. It allows great spontaneity for the trainer in rewarding the dog. For example, such a schedule might be one response/one reward; three responses/one reward; one response/one reward; five responses/one reward; two responses/one reward... The strength of the variable ratio schedule is its unpredictability. Studies have shown that the subject responds at a high, steady rate with a minimal post-reinforcement pause. A classic example is a slot machine, which pays out after an unpredictable number of plays.
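For readers who like to see the three schedules spelled out step by step, here is a minimal sketch in Python. It is only an illustration of the descriptions above, not a training tool. The function names and the mean_ratio parameter are made up for this example, and drawing the number of required responses uniformly at random is an assumption; the schedule itself only requires that the number be unpredictable.

```python
import random


def constant_schedule():
    """Constant reinforcement: every correct response is rewarded."""
    while True:
        yield True


def fixed_ratio_schedule(ratio):
    """Fixed ratio: reward every `ratio`-th correct response."""
    count = 0
    while True:
        count += 1
        if count == ratio:
            count = 0
            yield True
        else:
            yield False


def variable_ratio_schedule(mean_ratio, rng):
    """Variable ratio: reward after an unpredictable number of responses.

    Assumption for this sketch: the number of responses needed is drawn
    uniformly from 1 to 2 * mean_ratio - 1, so it averages mean_ratio.
    """
    while True:
        needed = rng.randint(1, 2 * mean_ratio - 1)
        for i in range(needed):
            # Only the last response of each run earns the reward.
            yield i == needed - 1


def rewards(schedule, n):
    """Return reward / no reward for the next n correct responses."""
    return [next(schedule) for _ in range(n)]


if __name__ == "__main__":
    print(rewards(constant_schedule(), 4))
    print(rewards(fixed_ratio_schedule(3), 6))
    print(rewards(variable_ratio_schedule(3, random.Random()), 12))
```

Each True stands for a reward and each False for a correct response that goes unrewarded. On the fixed ratio schedule the dog could, in principle, learn to count; on the variable ratio schedule it cannot tell which response will pay off, which is the unpredictability described above.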

Generally speaking, once a subject (e.g., a dog) has learned a behavior, it should be put on a variable ratio schedule. There is, however, one exception to this. Any time a subject must make a choice or solve a puzzle (e.g., scent discrimination), the subject must be rewarded every time (constant reinforcement schedule). This is the only exception and should be adhered to without question.
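The rule of thumb above can be written as a small decision function. This is only a sketch of the rule as stated in this article; the function name and its two yes/no inputs are hypothetical and chosen for illustration.

```python
def choose_schedule(behavior_learned, requires_choice):
    """Pick a reinforcement schedule using the article's rule of thumb.

    behavior_learned: True once the dog reliably performs the behavior on cue.
    requires_choice:  True when the task involves a choice or puzzle,
                      such as scent discrimination.
    """
    if requires_choice:
        # The one exception: reward every correct choice, every time.
        return "constant"
    if not behavior_learned:
        # Learning phase: reward each correct response.
        return "constant"
    # Learned behavior with no choice involved.
    return "variable ratio"
```

For example, choose_schedule(True, False) returns "variable ratio" for a well-learned "down", while choose_schedule(True, True) returns "constant" for scent discrimination no matter how well it has been learned.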

Coupling the positive reinforcement techniques with an attitude of kindness, love, and clarity of purpose will give the trainer an obedient and educated dog. But this dog, unlike those trained exclusively with negative reinforcement, will be an individual who is confident and flexible, has a well-developed sense of humor, can think and reason, and wants to learn more.

When a trainer has developed a clear and concise method of communicating to his dog exactly what is desired, the dog’s education will take a quantum leap forward. Such a method is clicker training, which, when used with shaping techniques, enables a trainer to teach his dog easily and without force. “Shaping” will be discussed in the next article.