When AI Becomes the Judge: Inside the Plaid Pattern Elo Experiment

There's something deeply satisfying about a great plaid. The way colors interlock. The rhythm of thick and thin lines. The unexpected harmony of a single accent stripe cutting through a sea of muted tones. But here's the question nobody really has a good answer to: what makes one plaid objectively better than another?

I've been running an experiment to find out — and the live leaderboard is starting to get really interesting.

The Setup: Infinite Plaids, One Brutal Tournament

The premise is simple, but the implications are wild.

A PHP generator produces plaid patterns from integer seeds, and every seed maps to one deterministic design. Seed 187987 will always render the same plaid. Seed 113057 will always render a different one. The space of possible patterns is effectively unlimited.
An automated battle page pulls two random seeds at a time and renders them side by side.
Instead of a human picking the winner, a JavaScript scoring formula authored with ChatGPT evaluates each pattern across multiple aesthetic dimensions and crowns a victor.
The result feeds into an Elo rating system — the same math used to rank chess grandmasters — so patterns rise and fall based on who they beat (and who beats them).
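
To make "same seed, same plaid" concrete, here is a minimal JavaScript sketch of a seeded generator. The real generator is PHP and its internals aren't shown in this post, so everything below is an assumption for illustration: the mulberry32 PRNG, the plaidFromSeed name, and the palette and stripe fields are all hypothetical stand-ins.

```javascript
// Hypothetical sketch: NOT the project's PHP generator.
// mulberry32 is a small seeded PRNG; the same seed always yields the same sequence.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// Build a plaid "spec" (colors plus a stripe sequence) from an integer seed.
// The field names and value ranges are made up for this example.
function plaidFromSeed(seed) {
  const rand = mulberry32(seed);
  const colorCount = 3 + Math.floor(rand() * 4); // 3 to 6 colors
  const palette = [];
  for (let i = 0; i < colorCount; i++) {
    palette.push({
      h: Math.floor(rand() * 360), // hue in degrees
      s: 30 + Math.floor(rand() * 50), // saturation in percent
      l: 25 + Math.floor(rand() * 50), // lightness in percent
    });
  }
  const stripeCount = 4 + Math.floor(rand() * 9); // 4 to 12 stripes
  const stripes = [];
  for (let i = 0; i < stripeCount; i++) {
    stripes.push({
      color: Math.floor(rand() * colorCount), // index into palette
      width: 2 + Math.floor(rand() * 30), // width in pixels
    });
  }
  return { seed, palette, stripes };
}

const first = plaidFromSeed(187987);
const again = plaidFromSeed(187987);
console.log(JSON.stringify(first) === JSON.stringify(again)); // true: same seed, same spec
```

The point of the sketch is only that all randomness flows from the seed, so a seed number is enough to reproduce a design exactly.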

Let it run. Walk away. Come back. The cream rises.

What the AI Actually Looks At

The scoring formula isn't a vibe check — it's a weighted combination of measurable aesthetic properties:

Harmony — do the colors actually belong together, or are they fighting?
Color palette quality — are the hues pleasing in isolation?
Unique colors — too few feels flat, too many feels chaotic. There's a sweet spot.
Density — how busy is the weave? Is there visual breathing room?
Brightness — is the contrast working for or against the eye?
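
As a rough illustration of what a weighted combination like this might look like, here is a JavaScript sketch. The actual weights and metric definitions aren't published in this post, so the numbers, the WEIGHTS table, and the sweetSpot, aiScore, and battle names are all assumptions. It also assumes each metric has already been normalized to a 0 to 1 range.

```javascript
// Hypothetical weights: the real formula's weights are not given in the post.
const WEIGHTS = {
  harmony: 0.3,
  paletteQuality: 0.25,
  uniqueColors: 0.15,
  density: 0.15,
  brightness: 0.15,
};

// Score a "sweet spot" property: 1 at `ideal`, falling linearly to 0
// once the value is `tolerance` or more away from it.
function sweetSpot(value, ideal, tolerance) {
  return Math.max(0, 1 - Math.abs(value - ideal) / tolerance);
}

// Weighted sum of metrics, each assumed to be normalized to 0..1.
function aiScore(metrics) {
  let total = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    total += weight * metrics[name];
  }
  return total;
}

// Higher score wins; in this sketch a tie goes to the first pattern.
function battle(a, b) {
  return aiScore(a.metrics) >= aiScore(b.metrics) ? a : b;
}

// Two made-up patterns: one calm with 4 colors, one busy with 9.
const calm = {
  seed: 1,
  metrics: { harmony: 0.9, paletteQuality: 0.8, uniqueColors: sweetSpot(4, 4, 3), density: 0.6, brightness: 0.7 },
};
const loud = {
  seed: 2,
  metrics: { harmony: 0.4, paletteQuality: 0.5, uniqueColors: sweetSpot(9, 4, 3), density: 0.9, brightness: 0.5 },
};
console.log(battle(calm, loud).seed); // 1
```

The sweetSpot helper is one simple way to encode "too few feels flat, too many feels chaotic": the score peaks at an ideal count and drops off on both sides.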

Each pattern gets an AI score; the higher score wins the battle, and Elo handles the rest. The beauty of Elo is that beating a higher-rated pattern earns more points than beating a lower-rated one, so the leaderboard self-corrects over time. A lucky early streak gets corrected as more battles come in. Genuinely strong patterns climb and stay.
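
For readers who haven't seen the Elo math, here is the standard update rule as a short JavaScript sketch. The expected-score formula is the usual Elo one. The K-factor of 32 and the example ratings are assumptions for illustration, since the post doesn't state the leaderboard's actual settings.

```javascript
const K = 32; // assumed K-factor; the leaderboard's real value isn't stated

// Standard Elo expected score: probability-like estimate that A beats B.
function expectedScore(ratingA, ratingB) {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

// Apply one battle result. The winner gains exactly what the loser drops.
function updateElo(winnerRating, loserRating, k = K) {
  const delta = k * (1 - expectedScore(winnerRating, loserRating));
  return { winner: winnerRating + delta, loser: loserRating - delta };
}

console.log(updateElo(1000, 1000)); // { winner: 1016, loser: 984 }
console.log(updateElo(1000, 1100)); // upset: winner gains more than 16
console.log(updateElo(1100, 1000)); // favorite wins: winner gains less than 16
```

The last two calls show the self-correcting part: the same win is worth more against a higher-rated opponent and less against a lower-rated one.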

Why I Trust the Judge (Mostly)

Here's the part that pushed me from "this is a cute experiment" to "this is actually working": the AI agrees with me roughly 90% of the time.

I spot-checked dozens of head-to-head battles by hand. When the formula picked a winner, I almost always nodded and said "yeah, that one." The handful of disagreements were close calls, the kind where two designers could legitimately argue over coffee. A few dozen battles judged by one person is a small sample, but it's a strong signal the formula has captured something real about taste (mine, at least), not just noise.
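
If you want to run the same kind of spot check yourself, the bookkeeping is simple. Here is a small JavaScript sketch; the record shape (formulaPick, humanPick) and the sample data are made up for illustration and are not the battles I actually checked.

```javascript
// Fraction of battles where the human picked the same seed as the formula.
function agreementRate(records) {
  if (records.length === 0) return NaN; // nothing checked yet
  const agreed = records.filter((r) => r.formulaPick === r.humanPick).length;
  return agreed / records.length;
}

// The battles worth a second look: where human and formula diverged.
function disagreements(records) {
  return records.filter((r) => r.formulaPick !== r.humanPick);
}

// Made-up spot-check log: each entry is one battle between two seeds.
const spotChecks = [
  { seeds: [101, 202], formulaPick: 101, humanPick: 101 },
  { seeds: [303, 404], formulaPick: 404, humanPick: 404 },
  { seeds: [505, 606], formulaPick: 505, humanPick: 606 },
  { seeds: [707, 808], formulaPick: 808, humanPick: 808 },
];

console.log(agreementRate(spotChecks)); // 0.75
console.log(disagreements(spotChecks).map((r) => r.seeds)); // [ [ 505, 606 ] ]
```

Keeping the list of disagreements matters as much as the headline rate, since those are the battles that show where the formula and a human eye part ways.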

And once you trust the judge, you can let it work at a scale no human could match. Thousands of battles. Tens of thousands. The leaderboard becomes a curated gallery of patterns that have earned their spot.

The Current Leaderboard Tells a Story

Look at the top of the rankings right now:

#1 — Seed 187987 — 11 wins, 1 loss, Elo 1127. The reigning champion has actually been tested. A dozen battles, only one defeat. This isn't a fluke.
#2 — Seed 113057 — 8-0, Elo 1108. Undefeated and climbing.
#3 — Seed 536870 — 8-0, Elo 1106. Right on its heels.
#11 — Seed 197188 — 9 wins, 4 losses. A scrappy contender — battle-hardened in 13 matchups.
#13 — Seed 723781 — 10 wins, 5 losses across 15 battles. The most-tested pattern in the top 20.

That last one is fascinating. A pattern with more losses than several lower-ranked seeds is still ranked #13, because Elo rewards who you beat, not just how often: a win over a higher-rated pattern earns more points, and a loss to one costs fewer. It's been through the meat grinder and survived.

Why Designers Should Care

If you design plaids, fabrics, or any pattern-based work, this is a sandbox you can actually learn from:

Reverse-engineer winners. Click into a top-ranked seed. Study the color relationships, the stripe widths, the density. What is the formula seeing that you can steal?
Test your intuition. Pick a battle. Guess the winner before checking the score. Are you in the 90%? Where do you diverge — and is the AI wrong, or are you the one with the unusual taste?
Find inspiration at scale. No human curator could surface the best of a million randomly generated plaids. The AI just… does it, in the background, while you sleep.
Argue with the algorithm. Disagreement is the most interesting outcome. Every time the formula picks a pattern you'd reject, that's data about what "harmony" or "density" should mean — and a chance to refine the model.

The Bigger Idea

What I love about this project is that it inverts the usual AI-and-design conversation. Most tools ask AI to generate the art. This one lets a deterministic PHP function generate infinite art, and asks AI to do something humans are actually worse at: evaluating thousands of options consistently, without fatigue, at 3 a.m.

Generation is cheap. Taste is expensive. Automating taste — even imperfectly — is where things get interesting.

Go Watch the Cream Rise

The leaderboard updates as battles run. The top 10 today won't be exactly the top 10 next week. New seeds will challenge the throne. Some current favorites will fall. Some dark horses will climb.

👉 Check the live Plaid Rankings and tell me: do you agree with the AI's #1? Or would you have crowned a different champion?

That disagreement, if you find it, is exactly the thing worth designing around.
