Master MediaWiki database schema and write optimized queries. Covers all 64 core tables with field definitions, indexes, relationships, and query optimization techniques. Includes replica vs primary strategies, JOIN patterns, pagination, caching, and 50+ real-world examples for Wikimedia/MediaWiki development.
Master the MediaWiki database schema and write optimized queries for wiki data. This skill provides comprehensive documentation of all 64 core database tables, relationships, and best practices for querying wiki data efficiently.
Use this skill when you need to:
// For READ operations (use replicas)
$services = MediaWikiServices::getInstance();
$dbr = $services->getDBLoadBalancer()->getConnection( DB_REPLICA );
// For WRITE operations (use primary)
$dbw = $services->getDBLoadBalancer()->getConnection( DB_PRIMARY );
// Simple SELECT with WHERE and LIMIT
$result = $dbr->select(
'page', // table
[ 'page_id', 'page_title' ], // fields to select
[ 'page_namespace' => 0 ], // WHERE conditions
__METHOD__, // method name for logging
[ 'LIMIT' => 10 ] // options
);
// Process results
foreach ( $result as $row ) {
echo $row->page_title . "\n";
}
// Get pages with their latest revision timestamp
$result = $dbr->select(
[ 'page', 'revision' ],
[ 'page_id', 'page_title', 'rev_timestamp' ],
[ 'page_namespace' => 0 ],
__METHOD__,
[],
[ 'revision' => [ 'LEFT JOIN', 'page_id = rev_page AND rev_id = page_latest' ] ]
);
$dbw->insert(
'page',
[
'page_namespace' => 0,
'page_title' => 'New_Page',
'page_is_redirect' => 0,
'page_latest' => 1,
'page_len' => 100,
'page_random' => wfRandom()
],
__METHOD__
);
MediaWiki's 64 tables are organized into logical categories:
Goal: Find the right table for your data and understand what it contains.
Steps:
references/schema-complete.mdExample: You need to find the page ID for a specific wiki page.
Data type: A wiki page
Table: page
Fields needed: page_id, page_namespace, page_title
Index to use: page_name_title (unique index on namespace + title)
Why: Pages are uniquely identified by namespace + title, not title alone.
The page_name_title index makes this lookup very fast.
Best Practices:
references/schema-complete.md before writing queriestext)*_deleted fields)Goal: Write queries that use indexes efficiently and return only needed data.
Steps:
SELECT *Example: Get recently edited pages in the main namespace
$result = $dbr->select(
'page',
[ 'page_id', 'page_title', 'page_touched' ], // Only needed columns
[
'page_namespace' => 0, // Use indexed column
'page_touched >= ' . $dbr->addQuotes(
wfTimestamp( TS_MW, time() - 86400 ) // Last 24 hours
)
],
__METHOD__,
[
'ORDER BY' => 'page_touched DESC',
'LIMIT' => 100 // Always limit
]
);
Performance Tips:
schema-complete.md for indexesWHERE YEAR(timestamp) = 2024 won't use indexCommon Anti-Patterns to Avoid:
SELECT * on large tables - wastes resourcesGoal: Use the right database connection for your operation.
Decision Tree:
Are you reading data?
├─ Yes, will read immediately after writing in same request?
│ └─ Use PRIMARY (replica lag consideration)
├─ Yes, just reading without writing?
│ └─ Use REPLICA (DB_REPLICA)
└─ No, you're writing/updating?
└─ Use PRIMARY (DB_PRIMARY)
In a transaction?
└─ Always use PRIMARY (keep transaction on same connection)
Why Replicas?
Why Primary?
Code Examples:
// Correct: Read from replica
$dbr = $services->getDBLoadBalancer()->getConnection( DB_REPLICA );
$row = $dbr->selectRow( 'page', '*', [ 'page_id' => 1 ] );
// Correct: Write to primary
$dbw = $services->getDBLoadBalancer()->getConnection( DB_PRIMARY );
$dbw->insert( 'page', $pageData );
// Correct: Read immediately after write (same connection)
$dbw = $services->getDBLoadBalancer()->getConnection( DB_PRIMARY );
$dbw->insert( 'page', $pageData );
$newRow = $dbw->selectRow( 'page', '*', [ 'page_id' => $newId ] );
// WRONG: Writing to replica
$dbr = $services->getDBLoadBalancer()->getConnection( DB_REPLICA );
$dbr->insert( 'page', $pageData ); // ERROR!
// WRONG: Assuming immediate replica consistency
$dbw->insert( 'page', $pageData );
$dbr = $services->getDBLoadBalancer()->getConnection( DB_REPLICA );
$row = $dbr->selectRow( 'page', '*', [ 'page_id' => $newId ] ); // May not exist yet!
Goal: Combine data from multiple tables efficiently.
Steps:
Example: Get a user's contributions with page titles
// Join: user → actor → revision → page
$result = $dbr->select(
[ 'actor', 'revision', 'page' ],
[ 'actor_name', 'rev_timestamp', 'page_namespace', 'page_title' ],
[ 'actor_name' => 'Example' ],
__METHOD__,
[ 'ORDER BY' => 'rev_timestamp DESC', 'LIMIT' => 50 ],
[
'revision' => [ 'INNER JOIN', 'actor_id = rev_actor' ],
'page' => [ 'INNER JOIN', 'rev_page = page_id' ]
]
);
Common Join Patterns:
page_id = rev_pagepage_namespace, page_title match link targetrev_id = slot_revision_id → slot_content_id = content_iduser_id = actor_useractor_id = rev_actor or log_actorSee: references/table-relationships.md for more join patterns.
Goal: Identify slow queries and understand why they're slow.
Steps:
EXPLAIN Example:
// Run EXPLAIN on your query
$dbr = $services->getDBLoadBalancer()->getConnection( DB_REPLICA );
// Get the query
$query = $dbr->selectQueryBuilder()
->select( [ 'page_id', 'page_title' ] )
->from( 'page' )
->where( [ 'page_namespace' => 0, 'page_is_redirect' => 0 ] )
->limit( 100 )
->getSQL();
// Run EXPLAIN on it
$explainResult = $dbr->query( "EXPLAIN " . $query );
// Check the type column:
// - "const" = one row (best)
// - "ref" = index lookup (good)
// - "range" = index range scan (okay)
// - "ALL" = full table scan (bad)
What to Look For:
Optimization Strategies:
type = ALL and you have a WHERE, add an index on the WHERE columnrows is very high, add LIMIT or more specific WHERE conditionsfiltered is low, your WHERE clause is inefficientGoal: Use proven query patterns for common tasks.
Common Patterns:
1. Get a page by title
$page = $dbr->selectRow(
'page',
[ 'page_id', 'page_latest', 'page_len' ],
[ 'page_namespace' => 0, 'page_title' => 'Main_Page' ],
__METHOD__
);
2. Get recent changes to a page
$revisions = $dbr->select(
[ 'revision', 'actor' ],
[ 'rev_id', 'rev_timestamp', 'actor_name' ],
[ 'rev_page' => $pageId ],
__METHOD__,
[ 'ORDER BY' => 'rev_timestamp DESC', 'LIMIT' => 20 ],
[ 'actor' => [ 'JOIN', 'rev_actor = actor_id' ] ]
);
3. Get user contributions
$contributions = $dbr->select(
[ 'actor', 'revision', 'page' ],
[ 'rev_timestamp', 'page_namespace', 'page_title', 'rev_minor_edit' ],
[ 'actor_name' => $username ],
__METHOD__,
[ 'ORDER BY' => 'rev_timestamp DESC', 'LIMIT' => 50 ],
[
'revision' => [ 'JOIN', 'actor_id = rev_actor' ],
'page' => [ 'JOIN', 'rev_page = page_id' ]
]
);
4. Get pages in a category
$pages = $dbr->select(
[ 'categorylinks', 'page' ],
[ 'page_id', 'page_namespace', 'page_title' ],
[ 'cl_to' => $categoryTitle ],
__METHOD__,
[ 'LIMIT' => 100 ],
[ 'page' => [ 'JOIN', 'cl_from = page_id' ] ]
);
5. Get all pages linking to a target
$links = $dbr->select(
[ 'pagelinks', 'page' ],
[ 'page_namespace', 'page_title' ],
[],
__METHOD__,
[ 'LIMIT' => 100 ],
[
'page' => [ 'JOIN', 'pl_from = page_id' ],
// Filter by target - use linktarget table
]
);
See: references/common-tables.md for detailed examples of each table.
// Good
$dbr = $services->getDBLoadBalancer()->getConnection( DB_REPLICA );
$result = $dbr->select( 'page', '*', [] );
// Avoid
$dbw = $services->getDBLoadBalancer()->getConnection( DB_PRIMARY );
$result = $dbw->select( 'page', '*', [] ); // Unnecessary primary load
// Good
$dbw = $services->getDBLoadBalancer()->getConnection( DB_PRIMARY );
$dbw->insert( 'page', $data );
// Avoid
$dbr = $services->getDBLoadBalancer()->getConnection( DB_REPLICA );
$dbr->insert( 'page', $data ); // Will fail - replicas are read-only
// Good
$dbr->select( 'page', [ 'page_id', 'page_title' ], [] );
// Avoid
$dbr->select( 'page', '*', [] ); // Wastes memory and bandwidth
// Avoid
$dbr->select( 'page', [ '*' ], [] ); // Same as above
// Good (uses index)
$dbr->select( 'page', '*', [ 'page_namespace' => 0, 'page_title' => 'Test' ] );
// Avoid (no index on page_touched for this query)
$dbr->select( 'page', '*', [ 'page_touched > ' . time() - 86400 ] );
// Better (add index or avoid condition)
$dbr->select( 'page', '*', [ 'page_is_redirect' => 0 ], __METHOD__, [ 'LIMIT' => 100 ] );
// Good (safe limit)
$dbr->select( 'page', '*', [], __METHOD__, [ 'LIMIT' => 100 ] );
// Avoid (no limit - could get millions of rows)
$dbr->select( 'page', '*', [] );
// Avoid (very large limit)
$dbr->select( 'page', '*', [], __METHOD__, [ 'LIMIT' => 1000000 ] );
// Good (uses index)
$dbr->select( 'revision', '*', [ 'rev_timestamp >= ' . $cutoff ] );
// Avoid (function prevents index usage)
$dbr->select( 'revision', '*', [ 'YEAR(rev_timestamp) = 2024' ] );
// Good alternative
$start = wfTimestamp( TS_MW, mktime( 0, 0, 0, 1, 1, 2024 ) );
$end = wfTimestamp( TS_MW, mktime( 0, 0, 0, 1, 1, 2025 ) );
$dbr->select( 'revision', '*', [
'rev_timestamp >= ' . $start,
'rev_timestamp < ' . $end
] );
// Slow (OFFSET scans all rows up to the offset)
// SELECT * FROM page LIMIT 10 OFFSET 5000; // Scans 5010 rows!
$dbr->select( 'page', '*', [], __METHOD__,
[ 'LIMIT' => 10, 'OFFSET' => 5000 ]
);
// Fast (keyset pagination - only scans needed rows)
// SELECT * FROM page WHERE page_id > ? LIMIT 10;
$dbr->select( 'page', '*', [ 'page_id >' . $lastSeenId ], __METHOD__,
[ 'LIMIT' => 10, 'ORDER BY' => 'page_id' ]
);
// Good (uses proper table naming)
$dbr->select( 'page', '*', [] );
// Also good (explicit table name)
$dbr->select( $dbr->tableName( 'page' ), '*', [] );
// Avoid (hardcoding prefix)
$dbr->select( 'wiki_page', '*', [] ); // What if prefix is different?
// Good (exclude deleted revisions)
$dbr->select( 'revision', '*', [ 'rev_deleted' => 0 ] );
// Good (include all, then check in PHP)
$result = $dbr->select( 'revision', [ 'rev_id', 'rev_deleted' ], [] );
foreach ( $result as $row ) {
if ( $row->rev_deleted ) continue; // Skip deleted
// Process row
}
// Deleted flags exist on many tables:
// - revision: rev_deleted
// - archive: ar_deleted
// - comment: comment_data (for suppressed text)
// - file: img_deleted, oi_deleted
// Good (includes method name for logging)
$dbr->select( 'page', '*', [], __METHOD__ );
// Less helpful (no method context)
$dbr->select( 'page', '*', [] );
MediaWiki's normalized design means understanding how tables connect is crucial.
Key Relationships:
See: references/table-relationships.md for visual diagrams and more examples.
Some tables are deprecated and should be avoided in new code:
content table instead (part of Modular Content Representation)filerevision tableMany queries can be cached to improve performance:
// Cache a query result
$cache = MediaWikiServices::getInstance()->getMainWANObjectCache();
$key = $cache->makeKey( 'page', 'title', $title );
$row = $cache->getWithSetCallback(
$key,
3600, // Cache for 1 hour
function() use ( $dbr, $title ) {
return $dbr->selectRow( 'page', '*', [ 'page_title' => $title ] );
}
);
This skill includes detailed reference documentation:
page_is_redirect flagpage_namespace + page_title*_deleted fieldscontent not textreferences/schema-complete.md to understand the tablesreferences/table-relationships.md to see how they connectreferences/common-tables.md for examples of common queriesreferences/optimization-guide.md for performance techniquesIf you encounter queries that don't work as expected, check: