r/wallstreetbets May 3, 04:34 PM
Every Layer of the AI Money Printer Got Front-Run. Except One.

Everyone called GPU bulls idiots in 2022. NVDA went from $150 to $950. Those idiots bought boats.
Then memory. Turns out every AI forward pass is basically a memory bandwidth problem with a GPU stapled to it. MU ran, SK Hynix became the most important company on earth, and muricans still can't spell it. Then it was power's turn. VST CEG NRG went stupid because someone finally asked what the fuck these data centers actually plug into. Then it was on to cooling. Then helium of all things, because chip fabs need it at every advanced node and Qatar just got blown up.
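For the nerds who want to see why "memory bandwidth problem with a GPU stapled to it" isn't just a meme, here's some napkin math. The model size, bandwidth, and FLOP numbers below are rough ballpark assumptions (roughly a 70B fp16 model on an H100-class card), not spec-sheet gospel:

```python
# Rough back-of-envelope: why batch-1 LLM decode is memory-bandwidth bound.
# All numbers are illustrative ballpark figures, not exact spec-sheet values.

params = 70e9            # 70B-parameter model
bytes_per_param = 2      # fp16 weights
hbm_bandwidth = 3.3e12   # ~3.3 TB/s HBM (roughly an H100-class part)
peak_flops = 1.0e15      # ~1 PFLOP/s dense fp16 (same class of GPU, roughly)

# Generating one token means streaming every weight through the chip once
# and doing about 2 FLOPs (multiply + add) per weight.
bytes_moved = params * bytes_per_param
flops_needed = 2 * params

t_memory = bytes_moved / hbm_bandwidth   # time if only bandwidth mattered
t_compute = flops_needed / peak_flops    # time if only FLOPs mattered

print(f"memory-limited time per token : {t_memory * 1e3:.1f} ms")
print(f"compute-limited time per token: {t_compute * 1e3:.2f} ms")
print(f"bandwidth is the bottleneck by ~{t_memory / t_compute:.0f}x at batch size 1")
```

With those assumed numbers the chip spends ~40 ms per token waiting on memory and well under 1 ms actually doing math. That ratio is the whole MU / SK Hynix trade in three lines.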
Every single time: one guy figures out the constraint, buys early, posts gains, everyone else piles in six months late.
Every layer of this AI money machine has been discovered and front-run at this point, except the data layer, where RDDT is the biggest swinging dick. The human-generated, unfiltered, sometimes-completely-unhinged signal that makes these models not completely brain-dead. You cannot fake this stuff synthetically without the model quality going to shit.
You’re literally generating it right now by reading this.
"Okay dumbass, what about YouTube and Wikipedia?" you ask. YouTube transcripts are garbage-in, garbage-out. Wikipedia is a curated encyclopedia written by cautious virgins, useful but static and thin. Reddit generates Wikipedia's entire catalogue every single month. Steve Huffman said it himself on the earnings call last Thursday. And it's not just volume, it's the structure. Actual threaded debate, with domain experts in niche subs saying things they'd never put on a resume, and upvotes naturally filtering signal from noise. That's exactly what frontier model training needs, and you can't replicate it anywhere else.
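If you want to see what "upvotes filtering signal from noise" actually buys a lab, here's a toy sketch: under the (hypothetical) assumption that higher-voted replies to the same parent are better answers, every thread spits out ranked preference pairs, the raw material for reward-model style training. The thread, the scores, and the schema below are made up for illustration; this is not any lab's actual pipeline:

```python
# Toy example: turning an upvoted thread into preference pairs.
# Assumption (hypothetical): score ordering approximates answer quality.
from itertools import combinations

thread = {
    "prompt": "How do I pressure-test a home espresso boiler safely?",
    "replies": [
        {"text": "Use a hand pump with a gauge, stay under 1.5x rated PSI.", "score": 412},
        {"text": "Just max it out, the safety valve will catch it.",          "score": -38},
        {"text": "Hydrostatic test with water, never compressed air.",        "score": 951},
    ],
}

def preference_pairs(thread):
    """Yield (prompt, preferred_reply, rejected_reply) tuples from upvote ordering."""
    ranked = sorted(thread["replies"], key=lambda r: r["score"], reverse=True)
    for better, worse in combinations(ranked, 2):
        yield (thread["prompt"], better["text"], worse["text"])

for prompt, chosen, rejected in preference_pairs(thread):
    print(f"CHOSEN  : {chosen}\nREJECTED: {rejected}\n")
```

One thread, three replies, three ranked pairs for free. Wikipedia gives you one consensus answer with the debate scrubbed out; Reddit gives you the debate plus a vote count on every branch of it.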
So why hasn't the market figured this out? Because there's no physical bottleneck to photograph. There's no empty helium tank or GPU waitlist you can point a camera at. So Wall Street just models Reddit as a mid ad-tech company with a weird side hustle and moves on.
Early Google licensing deals got signed before anyone understood what this data was worth. Those renewals are coming. Reddit is also actively suing Anthropic and Perplexity for scraping without paying. If they win or settle favorably, it doesn’t just help Reddit, it sets legal precedent that reprices training data across the entire industry.
The frontier labs are data-constrained now, not compute-constrained. Altman has basically said it out loud. Reddit generates exactly what they need, every day, forever.
Every bottleneck had its moment. Data's next, and there's only one pure play.
Position: 5000 shares
submitted by /u/early-retirement-plz