Name: Mediawiki Database Tables
Author: santhoshtr

搵技能.../

Mediawiki Database Tables | Skills Pool

// For READ operations (use replicas)
$services = MediaWikiServices::getInstance();
$dbr = $services->getDBLoadBalancer()->getConnection( DB_REPLICA );

// For WRITE operations (use primary)
$dbw = $services->getDBLoadBalancer()->getConnection( DB_PRIMARY );

// Simple SELECT with WHERE and LIMIT
$result = $dbr->select(
    'page',                           // table
    [ 'page_id', 'page_title' ],     // fields to select
    [ 'page_namespace' => 0 ],       // WHERE conditions
    __METHOD__,                       // method name for logging
    [ 'LIMIT' => 10 ]                // options
);

// Process results
foreach ( $result as $row ) {
    echo $row->page_title . "\n";
}

// Get pages with their latest revision timestamp
$result = $dbr->select(
    [ 'page', 'revision' ],
    [ 'page_id', 'page_title', 'rev_timestamp' ],
    [ 'page_namespace' => 0 ],
    __METHOD__,
    [],
    [ 'revision' => [ 'LEFT JOIN', 'page_id = rev_page AND rev_id = page_latest' ] ]
);

$dbw->insert(
    'page',
    [
        'page_namespace' => 0,
        'page_title' => 'New_Page',
        'page_is_redirect' => 0,
        'page_latest' => 1,
        'page_len' => 100,
        'page_random' => wfRandom()
    ],
    __METHOD__
);

Data type: A wiki page
Table: page
Fields needed: page_id, page_namespace, page_title
Index to use: page_name_title (unique index on namespace + title)

Why: Pages are uniquely identified by namespace + title, not title alone.
The page_name_title index makes this lookup very fast.

$result = $dbr->select(
    'page',
    [ 'page_id', 'page_title', 'page_touched' ],  // Only needed columns
    [
        'page_namespace' => 0,                      // Use indexed column
        'page_touched >= ' . $dbr->addQuotes(
            wfTimestamp( TS_MW, time() - 86400 )  // Last 24 hours
        )
    ],
    __METHOD__,
    [
        'ORDER BY' => 'page_touched DESC',
        'LIMIT' => 100                             // Always limit
    ]
);

Are you reading data?
├─ Yes, will read immediately after writing in same request?
│  └─ Use PRIMARY (replica lag consideration)
├─ Yes, just reading without writing?
│  └─ Use REPLICA (DB_REPLICA)
└─ No, you're writing/updating?
   └─ Use PRIMARY (DB_PRIMARY)

In a transaction?
└─ Always use PRIMARY (keep transaction on same connection)

// Correct: Read from replica
$dbr = $services->getDBLoadBalancer()->getConnection( DB_REPLICA );
$row = $dbr->selectRow( 'page', '*', [ 'page_id' => 1 ] );

// Correct: Write to primary
$dbw = $services->getDBLoadBalancer()->getConnection( DB_PRIMARY );
$dbw->insert( 'page', $pageData );

// Correct: Read immediately after write (same connection)
$dbw = $services->getDBLoadBalancer()->getConnection( DB_PRIMARY );
$dbw->insert( 'page', $pageData );
$newRow = $dbw->selectRow( 'page', '*', [ 'page_id' => $newId ] );

// WRONG: Writing to replica
$dbr = $services->getDBLoadBalancer()->getConnection( DB_REPLICA );
$dbr->insert( 'page', $pageData );  // ERROR!

// WRONG: Assuming immediate replica consistency
$dbw->insert( 'page', $pageData );
$dbr = $services->getDBLoadBalancer()->getConnection( DB_REPLICA );
$row = $dbr->selectRow( 'page', '*', [ 'page_id' => $newId ] );  // May not exist yet!

// Join: user → actor → revision → page
$result = $dbr->select(
    [ 'actor', 'revision', 'page' ],
    [ 'actor_name', 'rev_timestamp', 'page_namespace', 'page_title' ],
    [ 'actor_name' => 'Example' ],
    __METHOD__,
    [ 'ORDER BY' => 'rev_timestamp DESC', 'LIMIT' => 50 ],
    [
        'revision' => [ 'INNER JOIN', 'actor_id = rev_actor' ],
        'page' => [ 'INNER JOIN', 'rev_page = page_id' ]
    ]
);

// Run EXPLAIN on your query
$dbr = $services->getDBLoadBalancer()->getConnection( DB_REPLICA );

// Get the query
$query = $dbr->selectQueryBuilder()
    ->select( [ 'page_id', 'page_title' ] )
    ->from( 'page' )
    ->where( [ 'page_namespace' => 0, 'page_is_redirect' => 0 ] )
    ->limit( 100 )
    ->getSQL();

// Run EXPLAIN on it
$explainResult = $dbr->query( "EXPLAIN " . $query );

// Check the type column:
// - "const" = one row (best)
// - "ref" = index lookup (good)
// - "range" = index range scan (okay)
// - "ALL" = full table scan (bad)

$page = $dbr->selectRow(
    'page',
    [ 'page_id', 'page_latest', 'page_len' ],
    [ 'page_namespace' => 0, 'page_title' => 'Main_Page' ],
    __METHOD__
);

$revisions = $dbr->select(
    [ 'revision', 'actor' ],
    [ 'rev_id', 'rev_timestamp', 'actor_name' ],
    [ 'rev_page' => $pageId ],
    __METHOD__,
    [ 'ORDER BY' => 'rev_timestamp DESC', 'LIMIT' => 20 ],
    [ 'actor' => [ 'JOIN', 'rev_actor = actor_id' ] ]
);

$contributions = $dbr->select(
    [ 'actor', 'revision', 'page' ],
    [ 'rev_timestamp', 'page_namespace', 'page_title', 'rev_minor_edit' ],
    [ 'actor_name' => $username ],
    __METHOD__,
    [ 'ORDER BY' => 'rev_timestamp DESC', 'LIMIT' => 50 ],
    [
        'revision' => [ 'JOIN', 'actor_id = rev_actor' ],
        'page' => [ 'JOIN', 'rev_page = page_id' ]
    ]
);

$pages = $dbr->select(
    [ 'categorylinks', 'page' ],
    [ 'page_id', 'page_namespace', 'page_title' ],
    [ 'cl_to' => $categoryTitle ],
    __METHOD__,
    [ 'LIMIT' => 100 ],
    [ 'page' => [ 'JOIN', 'cl_from = page_id' ] ]
);

$links = $dbr->select(
    [ 'pagelinks', 'page' ],
    [ 'page_namespace', 'page_title' ],
    [],
    __METHOD__,
    [ 'LIMIT' => 100 ],
    [
        'page' => [ 'JOIN', 'pl_from = page_id' ],
        // Filter by target - use linktarget table
    ]
);

// Good
$dbr = $services->getDBLoadBalancer()->getConnection( DB_REPLICA );
$result = $dbr->select( 'page', '*', [] );

// Avoid
$dbw = $services->getDBLoadBalancer()->getConnection( DB_PRIMARY );
$result = $dbw->select( 'page', '*', [] );  // Unnecessary primary load

// Good
$dbw = $services->getDBLoadBalancer()->getConnection( DB_PRIMARY );
$dbw->insert( 'page', $data );

// Avoid
$dbr = $services->getDBLoadBalancer()->getConnection( DB_REPLICA );
$dbr->insert( 'page', $data );  // Will fail - replicas are read-only

// Good
$dbr->select( 'page', [ 'page_id', 'page_title' ], [] );

// Avoid
$dbr->select( 'page', '*', [] );  // Wastes memory and bandwidth

// Avoid
$dbr->select( 'page', [ '*' ], [] );  // Same as above

// Good (uses index)
$dbr->select( 'page', '*', [ 'page_namespace' => 0, 'page_title' => 'Test' ] );

// Avoid (no index on page_touched for this query)
$dbr->select( 'page', '*', [ 'page_touched > ' . time() - 86400 ] );

// Better (add index or avoid condition)
$dbr->select( 'page', '*', [ 'page_is_redirect' => 0 ], __METHOD__, [ 'LIMIT' => 100 ] );

// Good (safe limit)
$dbr->select( 'page', '*', [], __METHOD__, [ 'LIMIT' => 100 ] );

// Avoid (no limit - could get millions of rows)
$dbr->select( 'page', '*', [] );

// Avoid (very large limit)
$dbr->select( 'page', '*', [], __METHOD__, [ 'LIMIT' => 1000000 ] );

// Good (uses index)
$dbr->select( 'revision', '*', [ 'rev_timestamp >= ' . $cutoff ] );

// Avoid (function prevents index usage)
$dbr->select( 'revision', '*', [ 'YEAR(rev_timestamp) = 2024' ] );

// Good alternative
$start = wfTimestamp( TS_MW, mktime( 0, 0, 0, 1, 1, 2024 ) );
$end = wfTimestamp( TS_MW, mktime( 0, 0, 0, 1, 1, 2025 ) );
$dbr->select( 'revision', '*', [
    'rev_timestamp >= ' . $start,
    'rev_timestamp < ' . $end
] );

// Slow (OFFSET scans all rows up to the offset)
// SELECT * FROM page LIMIT 10 OFFSET 5000;  // Scans 5010 rows!
$dbr->select( 'page', '*', [], __METHOD__,
    [ 'LIMIT' => 10, 'OFFSET' => 5000 ]
);

// Fast (keyset pagination - only scans needed rows)
// SELECT * FROM page WHERE page_id > ? LIMIT 10;
$dbr->select( 'page', '*', [ 'page_id >' . $lastSeenId ], __METHOD__,
    [ 'LIMIT' => 10, 'ORDER BY' => 'page_id' ]
);

// Good (uses proper table naming)
$dbr->select( 'page', '*', [] );

// Also good (explicit table name)
$dbr->select( $dbr->tableName( 'page' ), '*', [] );

// Avoid (hardcoding prefix)
$dbr->select( 'wiki_page', '*', [] );  // What if prefix is different?

// Good (exclude deleted revisions)
$dbr->select( 'revision', '*', [ 'rev_deleted' => 0 ] );

// Good (include all, then check in PHP)
$result = $dbr->select( 'revision', [ 'rev_id', 'rev_deleted' ], [] );
foreach ( $result as $row ) {
    if ( $row->rev_deleted ) continue;  // Skip deleted
    // Process row
}

// Deleted flags exist on many tables:
// - revision: rev_deleted
// - archive: ar_deleted
// - comment: comment_data (for suppressed text)
// - file: img_deleted, oi_deleted

// Good (includes method name for logging)
$dbr->select( 'page', '*', [], __METHOD__ );

// Less helpful (no method context)
$dbr->select( 'page', '*', [] );

// Cache a query result
$cache = MediaWikiServices::getInstance()->getMainWANObjectCache();
$key = $cache->makeKey( 'page', 'title', $title );

$row = $cache->getWithSetCallback(
    $key,
    3600,  // Cache for 1 hour
    function() use ( $dbr, $title ) {
        return $dbr->selectRow( 'page', '*', [ 'page_title' => $title ] );
    }
);

Mediawiki Database Tables

What You'll Learn

When to Use This Skill

Mediawiki Database Tables

What You'll Learn

When to Use This Skill

Who This Skill Is For

Quick Start

Get a Database Connection

Basic Query Pattern

Join Example

Insert Example

Table Organization

Core Content

User & Authentication

Links & References

Files & Media

Logging & Changes

Metadata & Properties

Search & Performance

User Management

Blocks & Restrictions

Comments & Text

System & Configuration

Core Workflows

Workflow 1: Understanding Table Structure

Workflow 2: Writing Optimized SELECT Queries

Workflow 3: Choosing Replica vs Primary Database

Workflow 4: Joining Tables Correctly

Workflow 5: Analyzing Query Performance

Workflow 6: Common Query Patterns

Critical Best Practices

1. Use Replicas for Reads

2. Use Primary for Writes

3. SELECT Specific Columns

4. Use Indexed Columns in WHERE

5. Use LIMIT

6. Avoid Functions on Indexed Columns

7. Use Keyset Pagination, Not OFFSET

8. Never Assume Table Prefixes

9. Check for Deleted/Suppressed Content

10. Always Include METHOD in Queries

Understanding Table Relationships

Deprecated Tables

Caching

Reference Files

Common Pitfalls

Next Steps

Additional Resources

Questions?

Notion

Feishu Wiki

Gemini

Obsidian Vault Maintainer

Openclaw Pr Maintainer

Wiki Maintainer