THE 5W RETRIEVAL INDEX — VOLUME I

The Ten Principles of AI Retrieval

Open archives outperform closed prestige.

Paywalled prestige publications consistently rank below their authority would predict. Open-access archives — even on lower-prestige domains — consistently rank above theirs.

Structured data compounds retrieval.

Clean HTML, named-entity schema, stable taxonomies, and consistent metadata raise extractability. Engines retrieve from sources they can parse cleanly.

Persistent URLs outperform ephemeral publishing.

Sources with stable URLs accumulate authority through co-citation over time. Refresh-and-replace platforms forfeit the compounding.

Community consensus frequently outranks editorial declaration.

Reddit, Stack Exchange, and sector-specific forums carry retrieval weight on opinion, experience, and consensus queries that editorial publishers cannot match through declaration alone.

Institutional datasets anchor factual retrieval.

Government databases (CISA, FDA, SEC EDGAR, NAEP), trade-body publications (IAB, OWASP, NAR), and commercial measurement firms (Nielsen, Circana, A.M. Best, STR) function as primary citation tiers across sectors.

Named entities improve extractability.

Sources that name brands, people, products, and locations with consistent taxonomy are retrieved more reliably than sources that describe them in prose without entity anchors.

Retrieval compounds historically.

Authority is cumulative. Long-tenured publications on stable domains gain citation share that newer entrants cannot match through quality alone in short time horizons.

Forums increasingly function as distributed editorial layers.

Subreddits, Discord exports, and Stack Exchange communities operate as the consensus layer for sectors where editorial publishing has not caught up to the industry's pace.

AI systems reward accessibility over prestige.

Engines retrieve from what they can reach. Access controls — paywalls, registration walls, geographic gates — translate directly into retrieval forfeiture.

The citation economy is diverging from the readership economy.

The most-read journalism is not always the most-cited journalism. The training-data economy and the paywall economy are running in opposite directions, and the gap is the new retrieval map.

Get in touch

Let's build your next chapter.

Tell us what you're working on. A senior strategist will respond within one business day.

Email: info@5wpr.com
Phone: 212.999.5585
Address: 469 7th Avenue, Floor 8, New York, NY 10018