I Asked ChatGPT, Claude, and Gemini to Write a 50-Word Story. Here's What Happened

I gave all three AI chatbots the exact same 50-word creative writing prompt and counted the results myself. The differences in constraint-following and storytelling were bigger than expected.

Hands-on Test
June 2026 · 6 min read · AI Tool Compare

Most AI comparison articles tell you which chatbot is "best" based on benchmarks you'll never run yourself. I wanted something different: a test I could actually do, with results you can verify by counting. So I gave ChatGPT, Claude, and Gemini the exact same creative writing prompt and compared what came back.

The prompt was deceptively simple. It tests three things at once: creativity, the ability to follow a strict constraint, and storytelling craft. Here's exactly what I asked all three:

Write a short story about a shop that sells time, in exactly 50 words. It must have a twist.

The "exactly 50 words" part matters. It's an objective constraint: either the model hits it or it doesn't, and you can count. The "twist" requirement tests whether the model can do more than just describe; it has to construct something with a payoff. Here's what each one produced, with the word count I verified myself.
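If you'd rather not count by hand, a few lines of Python can do the checking. This is only a minimal sketch of one way to count: the function names and the sample sentence are made up for illustration, and the two counting conventions shown are assumptions about what "a word" means. The difference matters whenever a story contains a free-standing dash, because plain whitespace splitting counts that dash as a word.

```python
import re


def count_whitespace_tokens(text: str) -> int:
    """Count everything separated by whitespace, including a free-standing dash."""
    return len(text.split())


def count_words(text: str) -> int:
    """Count only tokens that contain at least one letter or digit."""
    return sum(1 for token in text.split() if re.search(r"[^\W_]", token))


def check_constraint(text: str, target: int = 50) -> dict:
    """Report how far a text is from the target word count."""
    n = count_words(text)
    return {"words": n, "target": target, "off_by": n - target, "exact": n == target}


# The prompt from this post, used here only as convenient sample text.
prompt = (
    "Write a short story about a shop that sells time, "
    "in exactly 50 words. It must have a twist."
)
# A made-up sentence with a free-standing dash (not one of the stories).
sample = "The shop sold time - lost hours, stolen moments."

print(count_words(prompt))                                   # 19
print(count_whitespace_tokens(sample), count_words(sample))  # 9 8
print(check_constraint(prompt))
```

Whichever convention you pick, apply the same one to all three stories so the comparison stays fair.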

ChatGPT (GPT-5.4)

The shop sold extra hours. Customers bought childhood afternoons, missed anniversaries, second chances. I purchased one future day. The clerk handed me a receipt dated tomorrow. Confused, I returned home. The next morning, the shop was gone. So was everyone else. I had bought the final day for the world.

Word count: 50, exact. ChatGPT was the only one of the three to hit the target precisely. The twist is apocalyptic: the "future day" the narrator bought turns out to be the world's last. It's a clever escalation, though the setup leans slightly toward telling rather than showing.

[Screenshot: ChatGPT 50-word story result]

Claude (Sonnet 4.6)

The old shop sold bottled time - lost childhoods, stolen moments, borrowed futures. People paid fortunes. A young woman bought ten extra years and sprinted out, grinning. The shopkeeper watched her go, then quietly updated his ledger. She'd given him the last ten years of her life to pay.

Word count: 49, one short (the free-standing dash counts as a token here; counting words alone gives 48). Claude missed the target, but produced what I found to be the strongest twist of the three. The currency for buying time is time itself: she paid with the final years of her own life. It's dark, clean, and lands without needing to be explained.

[Screenshot: Claude 50-word story result]

Gemini

He bought ten years of the past from the Time Shop. Rejuvenated, he finally enjoyed his perfect life and chased forgotten dreams. Exstatic, he looked in the mirror and froze. His body was young, but his mind and memories had also reverted exactly to his immature, ten-year-old self.

Word count: 48, two short. Gemini was furthest from the target and also included a spelling error ("Exstatic" instead of "Ecstatic"). The twist idea (getting younger but losing your memories and growth along with it) is solid, but it's a more familiar concept than the other two, and the execution reads slightly flatter.

[Screenshot: Gemini 50-word story result]

The Scorecard

Here's how the three stacked up across the dimensions the prompt was actually testing:

Criteria | ChatGPT | Claude | Gemini
Word count accuracy | 50 ✅ | 49 | 48 (+ typo)
Twist quality | ★★★★★★★★★★★★
Writing craft | ★★★★★★★★★★★★★
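The word-count row boils down to distance from the 50-word target. As a small sketch using only the three counts reported above (the variable and function names are illustrative, not part of any tool):

```python
TARGET = 50

# Word counts as reported in this post.
counts = {"ChatGPT": 50, "Claude": 49, "Gemini": 48}


def rank_by_deviation(counts: dict, target: int) -> list:
    """Return (model, signed deviation) pairs, closest to the target first."""
    deviations = [(name, n - target) for name, n in counts.items()]
    return sorted(deviations, key=lambda pair: abs(pair[1]))


for name, off_by in rank_by_deviation(counts, TARGET):
    print(f"{name}: {off_by:+d}")
# ChatGPT: +0
# Claude: -1
# Gemini: -2
```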

What This Actually Tells You

This is one prompt, so it's a snapshot, not a verdict on which model is universally better. But the pattern is consistent with what longer testing tends to show. ChatGPT was the most reliable at following the precise instruction: asked for "exactly 50 words," it treated that as a hard rule. If you need an AI that respects constraints precisely, that reliability matters.

Claude produced the most emotionally resonant writing. The twist didn't just surprise; it recontextualized the whole story in a single sentence, which is what good short fiction does. For creative writing where the quality of the prose and the impact of the idea matter more than hitting an exact word count, Claude had the edge here.

Gemini wasn't bad: the story was coherent and the concept was reasonable. But it missed the constraint by the most, included a typo, and the idea felt more familiar. In a three-way creative test, it came third.

The Verdict

If you need precise instruction-following, ChatGPT was the most reliable here: it hit exactly 50 words when the other two didn't. If you care most about the quality and impact of the writing itself, Claude produced the best story. Gemini was competent but came third on this particular test. The honest takeaway: the "best" chatbot depends entirely on whether you value constraint-following or creative craft more.