The operant method of dog training is training based on the laws by which an operant (conditioned) reflex forms. The same reflex is sometimes called an instrumental reflex, sometimes a type II conditioned reflex, and sometimes even a heterogeneous conditioned reflex.
It differs from the classical conditioned reflex described by I. P. Pavlov in that it is based on the animal's active, purposeful activity, driven by some need, and the reinforcement is the result of that very activity. In the classical conditioned reflex, by contrast, the reinforcement is an unconditioned stimulus, or simply a second, more significant stimulus.
Operant learning was discovered by the American scientist E. L. Thorndike, thanks to the cleverness of cats and dogs. While studying animals' ability to learn, Thorndike built a special cage with a door fastened by a simple latch. He shut cats and dogs inside and watched, with a scientist's wholesome glee, as the animals learned to open the door. They learned by making various attempts, some of which succeeded and some of which did not. That is why Thorndike called the form of learning he discovered “trial and error.”
It was only much later that this form of learning was named a reflex, by another famous American scientist, B. F. Skinner, who devoted his entire scientific life to it. That is why, of the operant reflex's several fathers, Skinner is considered the principal one. In fairness, however, the first description anywhere in the world of training based on operant learning was given by our remarkable trainer Vladimir Durov in his book “Animal Training. Psychological Observations of Animals Trained by My Method. 40 Years of Experience.” So you can read about the Russian version of operant training in Durov's book, while the American version is well described in “Don’t Shoot the Dog!” by psychologist and trainer Karen Pryor, which, by the way, I also recommend.
Skinner's general technique of operant training can be described as the following stages:
The deprivation stage. That is what Skinner called this stage in the 1930s. Today, however, it would be more accurate to call it the “stage of selecting and creating the basic need.”
Almost any need a dog has can be used to form an operant conditioned reflex, but Skinner most often used the need for food. The point of the deprivation stage was that Skinner either underfed the animals for a while or starved them outright. It was believed that food reinforcement became meaningful to the animal, and effective for learning, only once the animal had lost about 20% of its body weight. O tempora, o mores!
The stage of forming a conditioned food reinforcer. In his research Skinner used automatic feeders, and the sound of the feeder was meant to become a signal to the animal that a pellet had been delivered. Establishing this took time. The stage was considered complete when the rat ran to the food tray immediately at the sound of the feeding mechanism.
In effect, this stage forms a classical conditioned reflex to a sound, reinforced with food. It is also the foundation of so-called clicker training, a method that uses a conditioned sound signal, backed by food, as positive reinforcement.
It must be admitted that the operant school compares favorably with traditional Russian training in the attention it pays to reinforcement, especially positive and probabilistic reinforcement.
The response formation stage. As model behaviors, Skinner taught his rats to press a pedal and his pigeons to peck a key. The pedal-pressing response was formed in one of three ways: by trial and error (spontaneous formation), by directed, or successive, shaping, and by the target method.
In spontaneous formation, the animal, while exploring the Skinner box, pressed the pedal by accident and gradually associated pressing it with the feeder being triggered.
In directed shaping, the researcher triggered the feeder himself, reinforcing first any orientation toward the pedal, then approaching it, and finally pressing it. Isn't that clicker training!
In the target method, a food pellet was glued to the key or lever, and the animal's attempts to pull it off resulted in pressing the lever.
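Directed shaping is essentially a loop with a moving criterion: reinforce whatever meets the current approximation, and raise the bar once it is met reliably. The Python sketch below is only a toy model under stated assumptions: the step labels in `STEPS` and the rule “advance after three reinforced trials” (`successes_to_advance`) are hypothetical choices for illustration, not Skinner's actual protocol.

```python
# Hypothetical successive approximations toward the target response,
# ordered from easiest to hardest (illustrative labels only).
STEPS = ["orient_to_pedal", "approach_pedal", "press_pedal"]


def shape(behaviours, successes_to_advance=3):
    """Toy shaping loop: reinforce only behaviour meeting the current criterion.

    behaviours: iterable of observed behaviour labels, one per trial.
    Returns a list of (trial, behaviour, reinforced, criterion) tuples.
    After `successes_to_advance` reinforced trials at one step, the
    criterion moves on to the next, closer approximation.
    """
    level = 0   # index of the current criterion in STEPS
    streak = 0  # reinforced trials at the current criterion
    log = []
    for trial, b in enumerate(behaviours):
        criterion = STEPS[level]
        # Behaviour at or beyond the current step meets the criterion;
        # earlier approximations stop paying off once the bar is raised.
        meets = b in STEPS and STEPS.index(b) >= level
        if meets:
            streak += 1
            if streak >= successes_to_advance and level < len(STEPS) - 1:
                level += 1
                streak = 0
        log.append((trial, b, meets, criterion))
    return log
```

For example, feeding `shape()` a sequence that starts with orienting and ends with pressing shows orienting being reinforced at first, then no longer reinforced once the criterion moves to approaching, and finally only presses paying off.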
To elicit the desired behavior, modern operant training allows almost any known method of influencing an animal, although aversive stimuli (those causing pain or discomfort) are considered ineffective.
The stage of bringing behavior under stimulus control, or introducing a discriminative (differentiating) stimulus. In other words, introducing a conditioned stimulus, or command.
Skinner and his followers believed that forming an action and building its link to a conditioned stimulus (command) are two different processes, and that learning two different things at once makes learning harder. That is why traditional operant trainers first shape the behavior and only then introduce the command.
It should be stressed that in operant learning the discriminative stimulus is, by and large, not a command as we understand it. A command is a kind of order, isn’t it? That is how we usually interpret it. The discriminative stimulus, by contrast, is information that right now performing the behavior is the most effective thing to do, and indeed that it is possible at all. So the “command” in operant training serves to permit and enable behavior, not to demand it.
To make this clearer, consider introducing a light bulb into the experiment as a discriminative stimulus. The rat has already learned to go to the pedal and press it when it wants to eat. Now the researcher switches the light on for a couple of seconds at a time and arranges things so that pressing the pedal leads to food only while the light is on. When the light is off, no matter how much the rat presses, it gets zilch! In other words, switching on the light creates, separates, distinguishes, differentiates two different sets of conditions. The rat soon grasps this. And since she really wants to eat (she has a food need!), as soon as she sees the light come on she runs to the pedal and presses away! To an onlooker it seems that the light makes the rat, orders it, to press the pedal. But now you understand that this is not so. The light only says: now you may press the pedal. Nothing more!
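The light-bulb experiment can be written as a contingency plus a learner. In the Python sketch below, `reinforced()` encodes the rule from the text: a press pays off only while the light is on. The learner is a hypothetical simplification, assumed purely for illustration: it keeps a separate press probability for “light on” and “light off” and nudges it toward 1 after a rewarded press and toward 0 after an unrewarded one, with learning rate `alpha`. Nothing in the model orders a press; the light only marks when pressing pays.

```python
import random


def reinforced(light_on: bool, pressed: bool) -> bool:
    # The contingency from the experiment: food only for a press while the light is on.
    return light_on and pressed


def simulate(trials=2000, alpha=0.1, seed=0):
    """Toy learner with one press probability per stimulus condition.

    The update rule (move toward 1 after a rewarded press, toward 0 after
    an unrewarded press) is an illustrative assumption, not a model from
    the article. Trials without a press leave the probabilities unchanged.
    """
    rng = random.Random(seed)
    p_press = {True: 0.5, False: 0.5}  # keyed by "is the light on?"
    for _ in range(trials):
        light = rng.random() < 0.5  # light on in roughly half the trials
        if rng.random() < p_press[light]:  # the rat decides to press
            target = 1.0 if reinforced(light, True) else 0.0
            p_press[light] += alpha * (target - p_press[light])
    return p_press
```

Running `simulate()` with the defaults, the press probability with the light on climbs close to 1, while the press probability in the dark falls close to 0: the rat has learned when pressing is worth it, not been commanded to press.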
The consolidation stage. The newly formed behavior is consolidated into a skill through repetition with probabilistic reinforcement. It also helps to draw on different needs and, accordingly, to use different reinforcers.
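Probabilistic reinforcement with varied reinforcers can be sketched as a schedule that decides, for each correct response, whether to reinforce it and with what. This is a minimal Python sketch under assumptions: each response is reinforced independently with probability `p` (a random-ratio schedule), and the reinforcer names `"food"`, `"play"` and `"praise"` are hypothetical placeholders for whichever needs you are working with.

```python
import random


def probabilistic_schedule(n_responses, p=0.5,
                           reinforcers=("food", "play", "praise"), seed=None):
    """Plan reinforcement for n_responses correct responses.

    Each response is reinforced independently with probability p; when it
    is, one reinforcer is drawn at random from `reinforcers`. Returns a list
    with a reinforcer name, or None for an unreinforced response.
    The value of p and the reinforcer list are illustrative assumptions.
    """
    rng = random.Random(seed)
    plan = []
    for _ in range(n_responses):
        if rng.random() < p:
            plan.append(rng.choice(reinforcers))
        else:
            plan.append(None)  # response performed, no reinforcement this time
    return plan
```

With `p=1.0` every response is reinforced (continuous reinforcement); lowering `p` thins the schedule as the skill consolidates. The `seed` argument only makes a run reproducible.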
The Russian version of operant training, which goes back to Vladimir Durov, differs only in that it lets you introduce the cue (the command, discriminative stimulus, or conditioned stimulus) right from the start. Practice shows that the skill forms no more slowly than with the imported method, and since a whole stage is dropped, it saves time. So it makes sense to support domestic producers of training methods!