All Series

Structured topic learning paths.

Browse each category as a reading series, with articles arranged into a clean path from first idea to deeper technical detail.

20 articles across 4 series

01

SEO Basics

A complete learning series for search engine basics, organized from information retrieval and document models to PageRank, ranking metrics, machine learning, and SEO ethics.

1.1
What Is Information Retrieval? The Core Problem Every Search Engine Solves

Before search engines existed, IR researchers were solving the same core problem: how do you retrieve a relevant document from a large collection? This article defines the field's core concepts (precision, recall, relevance, and the precision-recall tradeoff) and grounds every later topic in a rigorous framework that directly shapes your SEO strategy and thinking.

Foundation · Research paper · Interactive
10 min · May 15, 2026
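
As a rough illustration of the precision and recall definitions mentioned in this teaser, here is a minimal sketch. The document IDs and the `precision_recall` helper are hypothetical, chosen only for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four documents returned, three relevant ones exist.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d7"],
)
```

Returning more documents can only keep recall the same or raise it, but it usually lowers precision, which is the tradeoff the article refers to.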
1.2
What Is the Vector Space Model? How Documents Become Numbers (and Why That Changes Everything)

The Vector Space Model represents documents and queries as mathematical vectors, making it possible to compare meaning through distance, angle, and weighted terms instead of simple keyword presence.

Foundation · Technical · Research paper · Data study · Math formula · Interactive
12 min · May 16, 2026
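
To make the "angle between vectors" idea concrete, the sketch below compares a query with two documents using cosine similarity over raw term counts. The whitespace tokeniser and the sample texts are assumptions for illustration; a real system would typically use weighted terms rather than raw counts.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words vector: term -> raw count (lowercased, whitespace split)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = term_vector("vector space model")
doc_a = term_vector("the vector space model turns documents into vectors")
doc_b = term_vector("a recipe for tomato soup")

score_a = cosine_similarity(query, doc_a)
score_b = cosine_similarity(query, doc_b)
```

Here `doc_a` shares terms with the query and scores above zero, while `doc_b` shares none and scores exactly zero.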
1.3
TF-IDF and BM25: The Mathematics of Keyword Relevance (And Why Repetition Stops Helping)

TF-IDF rewards terms that appear often in a document but rarely across the collection. BM25 (Best Match 25) extends this with diminishing returns on term frequency and document-length normalisation. Both remain the baseline every modern ranking model is measured against, and understanding them explains why keyword stuffing quickly stops paying off.

Foundation · Technical · Research paper · Math formula · Interactive
11 min · May 17, 2026
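
The diminishing returns described above can be seen in a small sketch of the per-term BM25 score. The parameter values `k1=1.2` and `b=0.75` are commonly used defaults assumed here, and the IDF form is one common smoothed variant, not necessarily the one the article uses.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """BM25 contribution of a single query term to one document's score.

    tf        -- occurrences of the term in the document
    doc_len   -- document length in tokens
    doc_freq  -- number of documents that contain the term
    """
    # Smoothed, non-negative IDF (an assumed variant).
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Longer-than-average documents are penalised when b > 0.
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    # The tf component saturates: it can never exceed (k1 + 1).
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)


def score_at(tf):
    """Score for an average-length document in a hypothetical 1,000-doc corpus."""
    return bm25_term_score(tf, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=50)


gain_second_mention = score_at(2) - score_at(1)
gain_twentieth_mention = score_at(20) - score_at(19)
```

The second mention of a term adds far more than the twentieth, which is the saturation effect that makes heavy repetition pointless under this model.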
1.4
PageRank: How Brin and Page Replaced Word-Counting with Link-Counting

In 1998, Brin and Page made the leap from word-counting to link-counting. PageRank models a "random surfer" who clicks links with probability d (the damping factor, ~0.85) and occasionally jumps to a random page; the probability of ending up on any page is its rank score. A link from a high-PageRank page passes more authority than one from a low-PageRank page. This lesson covers the formula, convergence, and why this changed the web.

Foundation · Technical · Research paper · Math formula · Interactive
14 min · May 18, 2026
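
The random-surfer model can be sketched with a few lines of power iteration. This is a minimal illustration, assuming the normalised formulation in which ranks sum to 1 and dangling pages spread their rank evenly; the four-page graph is hypothetical, and every link target must also appear as a key.

```python
def pagerank(links, d=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> list of outlinked pages."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Random-jump share: with probability (1 - d) the surfer teleports.
        new_rank = {page: (1.0 - d) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Each outlink receives an equal share of the page's rank.
                share = d * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: treat it as linking to every page.
                share = d * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: "c" is linked from three pages, "d" from none.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
```

In this graph "c" ends up with the highest score and "d", which nobody links to, keeps only the random-jump share.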
1.5
Hubs and Authorities: How Kleinberg’s HITS Algorithm Explains Why Niche Links Beat Generic Ones

Published the same year as PageRank, HITS computes two scores per page iteratively: an authority score (pages pointed to by many good hubs) and a hub score (pages that point to many good authorities). The eigenvector update converges to a stable ranking. HITS explains why topical link clusters matter, and why a link from a domain authority in your niche outweighs a generic high-PR link.

Foundation · Technical · Research paper · Practical · Interactive
15 min · May 19, 2026
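
The mutually reinforcing hub and authority updates can be sketched as follows. This is a simplified illustration on a hypothetical five-page graph; it runs a fixed number of iterations with L2 normalisation and omits the query-focused subgraph construction that the full algorithm starts from.

```python
import math


def hits(links, iterations=50):
    """Iterative HITS: returns (hub_scores, authority_scores)."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the pages linking in.
        auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub update: sum of authority scores of the pages linked to.
        hub = {page: sum(auth[t] for t in links.get(page, [])) for page in pages}
        # Normalise so the scores stay bounded between iterations.
        auth_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        hub_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {page: v / auth_norm for page, v in auth.items()}
        hub = {page: v / hub_norm for page, v in hub.items()}
    return hub, auth


# Hypothetical graph: three hub pages pointing at two authority pages.
graph = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a2"]}
hub_scores, auth_scores = hits(graph)
```

Page "a2" is cited by all three hubs and becomes the top authority, while "h1" and "h2" outrank "h3" as hubs because they point to both authorities.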
1.6
Crawl, Index, Rank: The Search Engine Pipeline That Decides Whether Your Page Exists to Google

Google officially describes three stages: crawling (URL discovery and page fetching), indexing (analysis and storage), and serving (ranking and result delivery). This lesson treats the pipeline as an engineering system with inputs, processes, queues, and failure modes, not just a list of stages. Understanding the whole system before studying each part prevents the tunnel vision that most SEO courses suffer from.

Foundation · Technical · Official doc · Data study · Practical · Interactive
16 min · May 20, 2026
1.7
From Strings to Things: How Google’s Knowledge Graph and Hummingbird Update Changed What “Relevant” Means

The 2012 Knowledge Graph and 2013 Hummingbird update marked the transition from keyword matching to entity understanding. Google now models people, places, organisations, and concepts as nodes in a graph: a query about "Einstein" retrieves the entity, not the string. This lesson explains what entity-based search means for content strategy: topic authority replaces keyword density.

Foundation · Technical · Official doc · Data study · Practical · Interactive
16 min · May 20, 2026
1.8
Learning-to-Rank: How Machine Learning Replaced the 200-Factor Checklist

Modern search engines don't hard-code ranking rules; they train machine-learning models on query-document pairs. The learning-to-rank (LTR) field divides into three approaches: pointwise (score each document independently), pairwise (learn which of two documents is better), and listwise (optimise the entire ranked list). RankNet (2005) was the first major neural pairwise model. This lesson introduces the framework that modules 4.4 and 4.5 build on.

Foundation · Technical · Research paper · Practical · Interactive
16 min · May 20, 2026
1.9
MAP, MRR, and NDCG: The Metrics That Define What “Better Rankings” Actually Mean

Before you can improve a ranking system you need to measure it. Mean Average Precision (MAP) averages, across queries, the precision measured at each rank where a relevant result appears. Mean Reciprocal Rank (MRR) measures how high the first correct result appears. Normalized Discounted Cumulative Gain (NDCG) accounts for graded relevance and position: a highly relevant result in position 1 is worth more than the same result in position 5. These metrics underpin search-quality evaluation and LambdaRank-style training objectives.

Foundation · Technical · Research paper · Data study · Math formula · Interactive
18 min · May 20, 2026
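
The three metrics can be sketched for a single query as below; MAP and MRR are then the means of the per-query values. The sketch assumes every relevant document appears somewhere in the ranked list (so average precision divides by the number of relevant results seen) and uses the linear-gain form of DCG, which is one of several variants.

```python
import math


def average_precision(relevances):
    """AP for one ranked list of binary labels (1 = relevant, 0 = not)."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant result
    return total / hits if hits else 0.0


def reciprocal_rank(relevances):
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def dcg(gains):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0


ap = average_precision([1, 0, 1, 0])   # relevant results at ranks 1 and 3
rr = reciprocal_rank([0, 0, 1])        # first relevant result at rank 3
perfect = ndcg([3, 2, 1])              # graded labels already in ideal order
reversed_order = ndcg([1, 2, 3])       # best result buried at rank 3
```

The same three graded results score lower under NDCG when the best one sits at rank 3, which is the position discount at work.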
1.10
The Ethics of Search, the Business Model That Funds It, and What SEO Actually Is

Brin and Page wrote in 1998 that ad-funded search engines have incentives misaligned with user quality. Google's guidelines explicitly separate organic ranking (algorithmic, unpaid) from ads. This lesson covers the ethical framework of SEO (quality, user experience, long-term trust) and debunks the most persistent myths before they take root. It also sets up Google Search Console as the student's ground-truth monitoring tool.

Foundation · Practical · Official doc · Data study · Interactive
22 min · May 20, 2026
02

Crawling

How Googlebot discovers your pages by following links across the web.

2.1
How Web Crawlers Work: Seeds, URL Frontiers & Crawl Rate

A web crawler is a program that discovers pages on the web by fetching URLs, reading their HTML, extracting links, and adding those new links to a queue of pages to visit next. Tha...

Foundation · Practical · Interactive · Official doc
18 min · May 26, 2026
2.2
Crawl Strategies Explained: Breadth-First, Depth-First, and Focused Crawling

A web crawler's strategy determines which pages it discovers and in what order. The three foundational approaches (breadth-first, depth-first, and focused crawling) produce radically different outcomes from the exact same seed URL....

Foundation · Practical · Official doc · Interactive
10 min · Jun 8, 2026
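
To show how the same seed can yield different visit orders, the sketch below walks a small in-memory link graph breadth-first and depth-first. The site structure and the `crawl_order` helper are hypothetical, and no network requests are made; focused crawling, which also needs a relevance score per URL, is left out.

```python
from collections import deque


def crawl_order(link_graph, seed, strategy="bfs"):
    """Return the order in which URLs are visited from one seed.

    "bfs" takes the oldest URL in the frontier (queue behaviour);
    any other value takes the newest one (stack behaviour, depth-first).
    """
    frontier = deque([seed])
    seen = {seed}
    order = []
    while frontier:
        url = frontier.popleft() if strategy == "bfs" else frontier.pop()
        order.append(url)
        for link in link_graph.get(url, []):
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                frontier.append(link)
    return order


# Hypothetical site: home page, two sections, three leaf pages.
site = {"/": ["/a", "/b"], "/a": ["/a/1", "/a/2"], "/b": ["/b/1"]}

bfs_order = crawl_order(site, "/", strategy="bfs")
dfs_order = crawl_order(site, "/", strategy="dfs")
```

Breadth-first reaches both section pages before any leaf page, while depth-first finishes one branch completely before it touches the other.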
2.3
URL Discovery Explained: How Googlebot Finds Pages Through Links, Sitemaps, and Search Console

There is no central registry of web pages. Googlebot must continuously search for new and updated URLs on its own, using a process Google calls "URL discovery." There are three pathways into Googlebot's crawl frontier: following...

Foundation · Practical · Official doc · Interactive
10 min · Jun 8, 2026
2.4
Crawl Budget Explained: Rate Limit, Crawl Demand, and What Wastes It

Crawl budget is the number of URLs Googlebot can and wants to crawl on your site within a given period. It is not a fixed number, and it is not a setting you can directly configure. It emerges from the interaction of two forces: how...

Foundation · Practical · Official doc · Interactive
10 min · Jun 8, 2026
2.5
The robots.txt Protocol Explained: History, Syntax, Logic, and Real-World Traps

robots.txt is a plain text file at the root of a domain that instructs web crawlers which paths they are permitted to fetch. Proposed by Dutch software engineer Martijn Koster in February 1994 and refined into an IETF standard 28...

Foundation · Practical · Official doc · Interactive
11 min · Jun 8, 2026
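
A quick way to experiment with robots.txt rules is Python's standard-library parser, sketched below. The rules, the `ExampleBot` user agent, and the example.com URLs are made up for illustration, and this parser's matching logic may differ in edge cases from how Googlebot interprets the same file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: block one directory for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that is already in memory (no HTTP fetch)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
public_ok = parser.can_fetch("ExampleBot", "https://example.com/public/page")
private_ok = parser.can_fetch("ExampleBot", "https://example.com/private/report")
```

Here the public URL is allowed and the URL under `/private/` is not; the file only tells compliant crawlers what they may fetch.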
2.6
XML Sitemaps Explained: Schema, What to Include, What to Exclude, and Submission

An XML sitemap is a structured file that lists the URLs you want search engines to consider for crawling and indexing. It is a declaration of intent, not a command. Google's own documentation is explicit on this point: submitting a...

Foundation · Practical · Official doc · Interactive
10 min · Jun 8, 2026
2.7
Near-Duplicate Detection Explained: Hashing, Shingling, and Canonical Consolidation

Search engines cannot afford to store or rank multiple copies of the same content. By some estimates, as many as 40 percent of pages on the web are duplicates or near-duplicates of other pages. Crawlers solve this at scale using two...

Foundation · Practical · Official doc · Interactive
11 min · Jun 8, 2026
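
Shingling, named in the title above, can be sketched as breaking each text into overlapping word sequences and comparing the resulting sets with Jaccard similarity. The shingle size of three words and the sample sentences are assumptions for illustration; production systems typically hash the shingles and compare compact sketches instead of full sets.

```python
def shingles(text, k=3):
    """Set of overlapping k-word shingles from lowercased, whitespace-split text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


original = shingles("the quick brown fox jumps over the lazy dog")
near_copy = shingles("the quick brown fox leaps over the lazy dog")
unrelated = shingles("search engines crawl the web every day")

near_score = jaccard(original, near_copy)
unrelated_score = jaccard(original, unrelated)
```

Changing one word alters every shingle that contains it, so the near-copy keeps only part of its overlap with the original, while the unrelated sentence shares no shingles at all.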
2.8
JavaScript SEO Explained: Googlebot's Two-Phase Crawl, SSR, and Dynamic Rendering

Googlebot can execute JavaScript. That fact alone has misled more development teams than almost any other statement in SEO. The ability to render is not the same as reliable, timely indexing. Googlebot crawls and renders in two...

Foundation · Practical · Official doc · Interactive
11 min · Jun 8, 2026
2.9
Internal Link Architecture Explained: Hub-and-Spoke, Link Depth, and PageRank Flow

Site architecture is the mechanism by which a website distributes two resources that are always finite: PageRank and crawl budget. Every internal link is both a crawl pathway and an authority transfer. The hub-and-spoke model...

Foundation · Practical · Official doc · Interactive
11 min · Jun 8, 2026
2.10
Diagnosing Crawl Problems: A Complete Audit Workflow Using Search Console, Log Files, and Third-Party Tools

Crawl problems are invisible from the outside. A site can look functional to users while Googlebot is silently wasting budget on redirect chains, failing to reach important pages buried six clicks deep, or crawling JavaScript shells...

Foundation · Practical · Official doc · Interactive
18 min · Jun 8, 2026
03

Indexing

Coming Soon

What happens after a page is crawled — and why some pages never make it into the index.

Articles will appear here when this series is ready.
04

Ranking

Coming Soon

The signals, weights, and machine learning systems that decide which page wins position #1.

Articles will appear here when this series is ready.
Contributors

Reviewed by people who know the system.
