Slot Machine Variable Reward

  


The average person checks their phone 150 times a day. Why do we do this? One major reason is the #1 psychological ingredient in slot machines: intermittent variable rewards. Consider the machines themselves. Las Vegas, the City of Sin, is home to 197,144 slot machines in total, and on average these devices have a 93.77% payback rate. This means that every time someone inserts a dollar and pulls the lever, they get $0.9377 back on average, effectively losing a little over 6 cents with each pull. So if this is such a losing bet, why on earth do people keep coming back?
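As a rough worked example of the payback arithmetic, here is a minimal Python sketch. The function name, the $1 bet, and the $100 starting bankroll are illustrative assumptions; only the 93.77% payback rate comes from the text.

```python
PAYBACK = 0.9377  # average payback rate quoted in the text


def expected_bankroll(start: float, bet: float, payback: float, pulls: int) -> float:
    """Expected money left after `pulls` bets of size `bet` (hypothetical helper)."""
    expected_loss_per_pull = bet * (1.0 - payback)
    return start - pulls * expected_loss_per_pull


# Expected loss on a single $1 pull: a little over 6 cents.
loss_per_dollar = 1.0 * (1.0 - PAYBACK)
print(f"Expected loss per $1 pull: ${loss_per_dollar:.4f}")

# Expected bankroll after 100 pulls of $1, starting from an assumed $100.
print(f"Expected bankroll after 100 pulls: ${expected_bankroll(100.0, 1.0, PAYBACK, 100):.2f}")
```

These are averages only. Any single session can end well above or below the expected figure, which is part of what makes the reward variable.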

The psychologist B. F. Skinner studied this question in animals. By closely monitoring the occurrence of behaviors and the frequency of rewards, Skinner was able to look for patterns. Receiving a reward each time the lever is pressed would be an example of continuous reinforcement. But Skinner also wanted to know how behavior might change if the reward wasn’t always present. This is known as intermittent reinforcement (or partial reinforcement). By tracking the accumulated behavioral responses of animals in his operant boxes over time, Skinner could see how different reward schedules influenced the timing and frequency of behavior. Though each of these approaches could be varied in countless ways, there were 4 general types of schedules that Skinner tested.

Fixed-Ratio (The Vending Machine)

A fixed-ratio schedule follows a consistent pattern of reinforcing a certain number of behaviors. This may come in the form of rewarding every behavior (1:1) or only rewarding every 5th response (5:1), according to some set rule. Just as nobody keeps feeding coins to a broken vending machine, when the set ratio is violated (like when each lever press no longer delivers food), animals quickly learn to reduce their behavior.
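The fixed-ratio rule can be sketched in a few lines of Python. The class and method names here are illustrative assumptions, not anything from Skinner's work; the sketch only shows that the reward arrives on every Nth response, with no randomness.

```python
class FixedRatioSchedule:
    """Reward every `ratio`-th response (hypothetical illustration)."""

    def __init__(self, ratio: int) -> None:
        if ratio < 1:
            raise ValueError("ratio must be at least 1")
        self.ratio = ratio
        self.count = 0  # responses since the last reward

    def respond(self) -> bool:
        """Register one response; return True if it is rewarded."""
        self.count += 1
        if self.count == self.ratio:
            self.count = 0
            return True
        return False


schedule = FixedRatioSchedule(ratio=5)
outcomes = [schedule.respond() for _ in range(10)]
print(outcomes)  # the 5th and 10th presses are the rewarded ones
```

Because the pattern is perfectly regular, a single missed reward is enough to signal that the rule has changed.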

Variable-Ratio (The Slot Machine)

A variable-ratio schedule rewards a particular behavior but does so in an unpredictable fashion. The reinforcement may come after the 1st lever press or the 15th, and then may follow immediately with the next press or perhaps not follow for another 10 presses. The unpredictable nature of a variable-ratio schedule can lead to a high frequency of behavior, as the animal (or human) may believe that the next press will “be the one” that delivers the reward.

This is the type of reinforcement seen in gambling, as each next play could provide the big payoff. Skinner found that behaviors rewarded with a variable-ratio schedule were most resistant to extinction. To illustrate this, consider a broken vending machine (fixed-ratio) versus a broken slot machine (variable-ratio). How long would you keep putting money into a broken vending machine? You’d probably give up after your first or maybe second try didn’t result in a delicious Snickers bar. But now imagine playing a slot machine that is broken and unable to pay out (though everything else appears to be working). You might play 15 times or more before you cease your coin-inserting and button-pressing behavior.
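A variable-ratio schedule can be sketched the same way, with the number of responses required for each reward drawn at random. The sketch below assumes a uniform draw between 1 and 2 × mean − 1 so that the average requirement equals the chosen mean; that distribution, like the class name, is an assumption made for illustration.

```python
import random


class VariableRatioSchedule:
    """Reward after an unpredictable number of responses (hypothetical illustration)."""

    def __init__(self, mean_ratio: int, rng: random.Random) -> None:
        if mean_ratio < 1:
            raise ValueError("mean_ratio must be at least 1")
        self.mean_ratio = mean_ratio
        self.rng = rng
        self.remaining = self._draw()  # responses still needed for the next reward

    def _draw(self) -> int:
        # Assumed distribution: uniform on 1..(2 * mean - 1), which averages to the mean.
        return self.rng.randint(1, 2 * self.mean_ratio - 1)

    def respond(self) -> bool:
        """Register one response; return True if it is rewarded."""
        self.remaining -= 1
        if self.remaining == 0:
            self.remaining = self._draw()
            return True
        return False


schedule = VariableRatioSchedule(mean_ratio=10, rng=random.Random(0))
presses = [schedule.respond() for _ in range(10_000)]
print("rewards in 10,000 presses:", sum(presses))
```

Under this schedule a long run of unrewarded presses is normal, so a machine that has stopped paying out looks, for a while, exactly like one that is about to pay.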


Fixed-Interval (The Paycheck)

In a fixed-interval schedule, reinforcement for a behavior is provided only at fixed time intervals. The reward may be given after 1 minute, every 5 minutes, once an hour, etc. What Skinner found when implementing this schedule was that the frequency of behavior would increase as the time for the reward approached (ensuring that the animal gets the reward), but would then decrease immediately following the reward, as if the animal knew that another reward wouldn’t be arriving any time soon.

This may be of concern for human fixed-interval situations like biweekly or monthly paychecks, as work effort may be reduced immediately after a paycheck has been received (just as most students reduce studying effort in the days immediately following exams, because the next exams aren’t coming for a while).
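For comparison, here is a minimal sketch of a fixed-interval rule in Python. Time is passed in explicitly as a number of seconds so the example stays deterministic; the class name and the 60-second interval are assumptions chosen for illustration.

```python
class FixedIntervalSchedule:
    """Reward the first response made once `interval` seconds have
    elapsed since the last reward (hypothetical illustration)."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self.last_reward_time = 0.0

    def respond(self, now: float) -> bool:
        """Register a response at time `now`; return True if it is rewarded."""
        if now - self.last_reward_time >= self.interval:
            self.last_reward_time = now
            return True
        return False


schedule = FixedIntervalSchedule(interval=60.0)
for t in [10, 30, 59, 60, 61, 100, 120, 125]:
    print(t, schedule.respond(t))
```

Responses made shortly after a reward can never pay off under this rule, which matches the drop in effort described above.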


Variable-Interval (The Pop-Quiz)

In a variable-interval schedule, reinforcement of a behavior is provided at a varying time interval since the last reinforcement. This means a pigeon might be rewarded for pecking after 10 seconds, then after 1 minute, then after 5 minutes, then after 5 seconds; the time interval between reinforcements is always changing. This schedule produces a slow and steady rate of response. The pigeon pecks steadily so it doesn’t miss any opportunities for reinforcement, but there’s no need to rush, since that won’t influence the length of the delays.

A human comparison might be a class with pop-quizzes for extra credit given at varying and unpredictable times. These would encourage students to study a little each day to always be prepared to earn some points, though they probably wouldn’t cram for hours and hours every night.
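The claim that rushing doesn't help under a variable-interval schedule can be checked with a small simulation. This is a sketch under stated assumptions: waits are drawn uniformly between 0 and twice the mean, and the names `VariableIntervalSchedule` and `count_rewards` are invented for the example.

```python
import random


class VariableIntervalSchedule:
    """Reward the first response after an unpredictable wait (hypothetical illustration)."""

    def __init__(self, mean_interval: float, rng: random.Random) -> None:
        self.mean_interval = mean_interval
        self.rng = rng
        self.next_available = self._draw_wait()  # time at which a reward becomes available

    def _draw_wait(self) -> float:
        # Assumed distribution: uniform on [0, 2 * mean], which averages to the mean.
        return self.rng.uniform(0.0, 2.0 * self.mean_interval)

    def respond(self, now: float) -> bool:
        """Register a response at time `now`; return True if it is rewarded."""
        if now >= self.next_available:
            self.next_available = now + self._draw_wait()
            return True
        return False


def count_rewards(presses_per_second: int, duration_s: int, seed: int) -> int:
    """Count rewards earned by pressing at a constant rate for `duration_s` seconds."""
    schedule = VariableIntervalSchedule(mean_interval=60.0, rng=random.Random(seed))
    total_presses = presses_per_second * duration_s
    return sum(
        schedule.respond(i / presses_per_second) for i in range(1, total_presses + 1)
    )


steady = count_rewards(presses_per_second=1, duration_s=3600, seed=1)
frantic = count_rewards(presses_per_second=10, duration_s=3600, seed=1)
print("steady:", steady, "frantic:", frantic)
```

Pressing ten times as often earns roughly the same number of rewards, because the schedule, not the response rate, sets the pace.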

Superstitious Minds

Skinner also tried rewarding the animals at random, dropping food into the box at unpredictable times that didn’t correspond to any particular desired behavior. Rather than doing nothing and just waiting for the food to arrive, the animals who were rewarded randomly developed bizarre “superstitious” behaviors.

If the animal was lifting a leg or turning his head in the moment preceding the reward, this behavior would be reinforced, making it more likely to be repeated. If, by chance, this behavior was repeated as the reward was delivered again (randomly), this would further serve to reinforce the behavior. As a result, Skinner found pigeons turning in circles or hopping on one leg, simply as a result of this random reinforcement. From this we may view all sorts of superstitious human behaviors, from rain dances to lucky charms to salt thrown over the shoulder, as the result of chance occurrences of reinforcement.
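The feedback loop described above can be illustrated with a toy simulation. This is not Skinner's procedure or a validated model of learning. It assumes that each behavior has a weight, that the animal picks behaviors in proportion to those weights, and that food arriving at random boosts whichever behavior happened to be under way. All names and parameters are hypothetical.

```python
import random


def simulate_superstition(behaviors, ticks, reward_prob, boost, seed):
    """Toy model: random rewards strengthen whatever behavior coincides with them."""
    rng = random.Random(seed)
    weights = {b: 1.0 for b in behaviors}  # every behavior starts equally likely
    rewards = 0
    for _ in range(ticks):
        # The animal does one thing, chosen in proportion to current weights.
        current = rng.choices(list(weights), weights=list(weights.values()))[0]
        # Food arrives at random, regardless of what the animal is doing.
        if rng.random() < reward_prob:
            weights[current] += boost
            rewards += 1
    return weights, rewards


weights, rewards = simulate_superstition(
    ["turn in circle", "hop on one leg", "bob head", "lift wing"],
    ticks=1000, reward_prob=0.05, boost=1.0, seed=1,
)
for behavior, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{behavior}: {weight:.0f}")
```

In a model like this, a behavior that gets lucky early becomes more likely to be in progress when the next reward arrives, so small chance differences can compound.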


Looking for more information on learning theory and behaviorism? This post is an excerpt from the learning chapter of Master Introductory Psychology. You can find this chapter in the ebook version of Volume 2 or in the complete print edition.