Reinforcement Training for Leaders

Like our AI models, we trained our leaders with thumbs up and thumbs down.

The Spark

Last night I had a conversation with Zai, my chatbot, about the three bad-luck incidents that befell Trump at the UN. Somewhat sarcastically, I asked Zai: “What’s the big deal? Everyone knows bad luck comes in threes.”

Zai missed the sarcasm. That launched a statistical dissertation on urban legends and myths, which I will spare you unless you ask. And then I asked: Isn’t this how we train AI models?

The Realization

Over time, our leaders’ behavior starts to resemble that of a GenAI model. They adapt like reinforcement-learning agents: the environment is the crowd and the media, the rewards are attention, outrage, and money, and their policy shifts to maximize those rewards. Eventually, polite norms disappear, because the loudest, wildest behaviors pay best. Trump is just one vivid example. But the truth is, it’s always been this way. Fictional worlds, from Springfield to South Park, have been showing us the pattern all along.
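The agent-and-reward loop described above can be sketched with the simplest reinforcement-learning setup, an epsilon-greedy multi-armed bandit. This is a minimal Python sketch under loudly stated assumptions: the two behaviors ("polite" and "outrage") and their average payoffs are hypothetical numbers invented for illustration, the noisy reward stands in for the crowd's thumbs up and thumbs down, and the function name `train_leader` is likewise hypothetical.

```python
import random

# HYPOTHETICAL average payoff per behavior (attention, donations, airtime).
# These numbers are illustrative assumptions, not measurements.
TRUE_REWARD = {"polite": 1.0, "outrage": 3.0}


def train_leader(steps: int = 1000, epsilon: float = 0.1, seed: int = 0) -> dict:
    """Epsilon-greedy bandit: the 'leader' learns which behavior the crowd pays for."""
    rng = random.Random(seed)
    actions = list(TRUE_REWARD)
    value = {a: 0.0 for a in actions}  # running estimate of each behavior's payoff
    count = {a: 0 for a in actions}    # how often each behavior was tried

    for _ in range(steps):
        # Explore with probability epsilon; otherwise exploit the best-looking behavior.
        if rng.random() < epsilon:
            action = rng.choice(actions)
        else:
            action = max(actions, key=lambda a: value[a])

        # The crowd's feedback: the behavior's average payoff plus random noise.
        reward = TRUE_REWARD[action] + rng.gauss(0.0, 1.0)

        # Incremental-mean update of the estimate for the chosen behavior.
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]

    return {"value": value, "count": count}


if __name__ == "__main__":
    result = train_leader()
    print(result["count"])  # how often each behavior was chosen
```

Because the hypothetical "outrage" arm pays more on average, the greedy step picks it almost every time once it has been tried; the occasional exploration step is the only reason "polite" still gets a turn. Change the payoffs, and the learned behavior changes with them.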

---

30 Minutes Later: A Simpsons Short

INT. SPRINGFIELD TOWN HALL – DAY

Mayor Quimby steps to the podium, fiddling with his tie.

QUIMBY:

“My fellow Springfieldians, the escalator at the Kwik-E-Mart stopped. Clearly… sabotage!”

The crowd gasps. News cameras flash. A “BREAKING NEWS” banner floods Channel 6. Donations pour into a “Save Quimby” jar.

LISA (facepalming):

“He’s just making things up. They’re random malfunctions.”

BART (grinning):

“Yeah, but watch what happens.”

Quimby eyes the jar, smirks, and tries again.

QUIMBY:

“And then the teleprompter! Also sabotage! Triple sabotage!”

The jar overflows. The crowd roars. The chyron blinks “EXTRA BREAKING NEWS.”

CUT TO – A giant Skinner box in the corner. Quimby scrambles inside, hammering a lever labeled “OUTRAGE.” Each press drops cheese into his mouth as the town cheers like it’s a pep rally.

LISA (deadpan):

“We trained him. He’s just doing what works.”

HOMER (clapping):

“Mmm… cheese pellets. Wish I got rewards for yelling nonsense!”

MARGE (concerned):

“Homer, you do. Every day.”

Cue laugh track. Credits roll over Quimby furiously pounding the lever.

---

Conclusion

Lisa’s words hang in the air: we trained him. Leaders don’t emerge from nowhere; they mirror the feedback we give. If outrage and chaos get the biggest rewards, that’s what we’ll get more of. Maybe the challenge isn’t asking why they behave this way, but asking why we keep paying them in cheese pellets. If we want better leaders, we need to train them better.