All Articles

The Knowledge Base

Every article in one place. Structured articles on how search engines crawl, index, rank, and everything in between.

Diagnosing Crawl Problems: A Complete Audit Workflow Using Search Console, Log Files, and Third-Party Tools
Crawling · 18 min

Crawl problems are invisible from the outside. A site can look functional to users while Googlebot is silently wasting budget on redirect chains, failing to reach important pages buried six clicks deep, or crawling JavaScript shells...

Foundation · Practical · Official doc · Interactive
Jun 8, 2026
Internal Link Architecture Explained: Hub-and-Spoke, Link Depth, and PageRank Flow
Crawling · 11 min

Site architecture is the mechanism by which a website distributes two resources that are always finite: PageRank and crawl budget. Every internal link is both a crawl pathway and an authority transfer. The hub-and-spoke model...

Foundation · Practical · Official doc · Interactive
Jun 8, 2026
JavaScript SEO Explained: Googlebot's Two-Phase Crawl, SSR, and Dynamic Rendering
Crawling · 11 min

Googlebot can execute JavaScript. That fact alone has misled more development teams than almost any other statement in SEO. The ability to render is not the same as reliable, timely indexing. Googlebot crawls and renders in two...

Foundation · Practical · Official doc · Interactive
Jun 8, 2026
Near-Duplicate Detection Explained: Hashing, Shingling, and Canonical Consolidation
Crawling · 11 min

Search engines cannot afford to store or rank multiple copies of the same content. By some estimates, as many as 40 percent of pages on the web are duplicates or near-duplicates of other pages. Crawlers solve this at scale using two...

Foundation · Practical · Official doc · Interactive
Jun 8, 2026
XML Sitemaps Explained: Schema, What to Include, What to Exclude, and Submission
Crawling · 10 min

An XML sitemap is a structured file that lists the URLs you want search engines to consider for crawling and indexing. It is a declaration of intent, not a command. Google's own documentation is explicit on this point: submitting a...

Foundation · Practical · Official doc · Interactive
Jun 8, 2026
The robots.txt Protocol Explained: History, Syntax, Logic, and Real-World Traps
Crawling · 11 min

robots.txt is a plain text file at the root of a host that tells web crawlers which paths they are permitted to fetch. Proposed by Dutch software engineer Martijn Koster in February 1994 and refined into an IETF standard 28...

Foundation · Practical · Official doc · Interactive
Jun 8, 2026
Crawl Budget Explained: Rate Limit, Crawl Demand, and What Wastes It
Crawling · 10 min

Crawl budget is the number of URLs Googlebot can and wants to crawl on your site within a given period. It is not a fixed number, and it is not a setting you can directly configure. It emerges from the interaction of two forces: how...

Foundation · Practical · Official doc · Interactive
Jun 8, 2026
URL Discovery Explained: How Googlebot Finds Pages Through Links, Sitemaps, and Search Console
Crawling · 10 min

There is no central registry of web pages. Googlebot must continuously search for new and updated URLs on its own, using a process Google calls "URL discovery." There are three pathways into Googlebot's crawl frontier: following...

Foundation · Practical · Official doc · Interactive
Jun 8, 2026
Crawl Strategies Explained: Breadth-First, Depth-First, and Focused Crawling
Crawling · 10 min

A web crawler's strategy determines which pages it discovers and in what order. The three foundational approaches (breadth-first, depth-first, and focused crawling) produce radically different outcomes from the exact same seed URL....

Foundation · Practical · Official doc · Interactive
Jun 8, 2026
How Web Crawlers Work: Seeds, URL Frontiers & Crawl Rate
Crawling · 18 min

A web crawler is a program that discovers pages on the web by fetching URLs, reading their HTML, extracting links, and adding those new links to a queue of pages to visit next. Tha...

Foundation · Practical · Interactive · Official doc
May 26, 2026
The Ethics of Search, the Business Model That Funds It, and What SEO Actually Is
SEO Basics · 22 min

Brin and Page wrote in 1998 that ad-funded search engines have incentives misaligned with users' needs. Google's guidelines explicitly separate organic ranking (algorithmic, unpaid) from ads. This lesson covers the ethical framework of SEO (quality, user experience, and long-term trust) and debunks the most persistent myths before they take root. It also sets up Google Search Console as the student's ground-truth monitoring tool.

Foundation · Practical · Official doc · Data study · Interactive
May 20, 2026
MAP, MRR, and NDCG: The Metrics That Define What “Better Rankings” Actually Mean
SEO Basics · 18 min

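Before you can improve a ranking system, you need to measure it. Mean Average Precision (MAP) averages, across queries, the precision measured at the rank of each relevant result. Mean Reciprocal Rank (MRR) averages the reciprocal rank of the first relevant result, so it rewards systems that surface a correct answer early. Normalized Discounted Cumulative Gain (NDCG) accounts for graded relevance and discounts each result's gain by its position, so a highly relevant result in position 1 is worth more than the same result in position 5. These metrics underpin offline ranking evaluation, and LambdaRank-style training objectives are built around them.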

Foundation · Technical · Research paper · Data study · Math formula · Interactive
May 20, 2026
Contributors

Reviewed by people who know the system.
