Perform apartment type takeoffs from construction GA plan PDFs. Extracts unit IDs and type codes from the PDF text layer, builds a full apartment register spreadsheet (.xlsx), and produces a takeoff summary with counts by type, building, and level. Use this skill whenever the user asks for an apartment takeoff, unit schedule, apartment register, unit count, dwelling mix, accommodation schedule, or wants to know how many apartments are on each level. Also triggers on "how many units", "apartment types", "dwelling schedule", "unit mix", "bedroom mix", "do a takeoff", or any request to count/catalogue apartments from architectural floor plans. Works with any residential project where GA plan PDFs have unit IDs in the text layer (e.g., BX.LL.UU format). Chains into drawing-markup skill for annotated PDF output. Requires PyMuPDF (fitz) and openpyxl.
Extract apartment data from construction GA plan PDFs, build a full register, and produce a takeoff summary. This skill reads the PDF text layer — not the visual render — which is faster and more accurate.
- PyMuPDF (fitz): pip install PyMuPDF
- openpyxl: pip install openpyxl
- poppler (provides pdftotext): brew install poppler, used for quick text extraction

List all PDFs in the input folder. Read filenames to understand:
Use pdftotext for quick extraction of all unit IDs:
for f in /path/to/pdfs/*.pdf; do
  echo "=== $(basename "$f") ==="
  # [0-9] instead of \d: GNU grep -E does not support \d
  pdftotext -layout "$f" - | grep -oE 'B[1-4]\.[0-9]+\.[0-9]+' | sort -u
  echo
done
This gives exact unit IDs per level per building. Adjust the regex pattern if the project uses a different naming convention.
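The same extraction can be sketched in Python, which makes the pattern easier to adjust per project. This is a minimal sketch, not the skill's fixed implementation: the names `UNIT_ID_RE` and `extract_unit_ids` are hypothetical, and the function assumes you already have the page text as a string (for example from pdftotext output or the PyMuPDF text layer).

```python
import re

# Default pattern from this skill: building B1-B4, then level, then unit number.
# Swap this for the project's own convention if it differs.
UNIT_ID_RE = re.compile(r"B[1-4]\.\d+\.\d+")


def extract_unit_ids(page_text):
    """Return the unique unit IDs found in one page of extracted text, sorted."""
    return sorted(set(UNIT_ID_RE.findall(page_text)))
```

Keeping the pattern in one module-level constant means a naming-convention change touches a single line.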
Use pdftotext -raw to preserve spatial proximity between unit IDs and their type codes:
for f in /path/to/pdfs/*.pdf; do
  echo "=== $(basename "$f") ==="
  # [0-9] instead of \d for GNU grep -E; [1-4] so 4-bedroom codes also match
  pdftotext -raw "$f" - | grep -E '(B[1-4]\.[0-9]+\.[0-9]+|[A-Z]*\.?[1-4]B[12]?B)' | paste - -
  echo
done
The raw extraction pairs each unit ID with the type code that appears directly after it in the text stream (which is spatially below it on the drawing).
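A minimal Python sketch of that pairing logic follows. It assumes each unit ID and each type code sits on its own line of the raw text, and it assumes a stricter type-code shape than the grep pattern (both bedroom and bathroom digits present). `TYPE_RE` and `pair_units_with_types` are illustrative names, not part of any library.

```python
import re

UNIT_ID_RE = re.compile(r"B[1-4]\.\d+\.\d+")
# Assumed type-code shape, e.g. 2B2B, DDA.1B1B, TH.3B1B1P-02, 2B2B.M
TYPE_RE = re.compile(r"(?:[A-Z]+\.)*[1-4]B[12]B(?:1P)?(?:\.M)?(?:-\d+)?")


def pair_units_with_types(raw_text):
    """Map each unit ID to the first type code that follows it in the stream.

    A unit ID that is followed by another unit ID before any type code
    is recorded with None so it can be flagged later.
    """
    pairs = {}
    pending = None  # unit ID still waiting for its type code
    for line in raw_text.splitlines():
        token = line.strip()
        if UNIT_ID_RE.fullmatch(token):
            if pending is not None:
                pairs.setdefault(pending, None)
            pending = token
        elif pending is not None and TYPE_RE.fullmatch(token):
            pairs[pending] = token
            pending = None
    if pending is not None:
        pairs.setdefault(pending, None)
    return pairs
```

Units mapped to `None` are the ones to re-check by hand, since the stream order did not place a type code next to them.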
Common type code conventions in Australian residential:
| Code Pattern | Meaning |
|---|---|
| 1B1B | 1 Bedroom, 1 Bathroom |
| 2B1B | 2 Bedroom, 1 Bathroom |
| 2B2B | 2 Bedroom, 2 Bathroom |
| 3B1B1P | 3 Bedroom, 1 Bathroom, 1 Powder Room |
| 4B2B | 4 Bedroom, 2 Bathroom |
Prefixes and suffixes:
- DDA. prefix: Accessible / DDA compliant
- F. prefix: Facade variant
- TH. prefix: Townhouse
- .M suffix: Mirror image layout
- -NN suffix: Variant number

Classification function:
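def get_bedrooms(type_code):
    """Return the bedroom count for a type code, or 0 if unrecognised."""
    # Strip known prefixes, then drop the -NN variant suffix
    clean = type_code.replace('DDA.', '').replace('F.', '').replace('TH.', '')
    base = clean.split('-')[0]
    # Check the highest count first: '2B1B' also contains '1B'
    if '4B' in base: return 4
    if '3B' in base: return 3
    if '2B' in base: return 2
    if '1B' in base: return 1
    return 0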
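The register also records bathrooms, powder rooms, and the DDA and townhouse flags, so a fuller parser can be useful. The sketch below is one possible approach under the code conventions listed above; `parse_type_code` is a hypothetical helper, and the returned keys are assumed to mirror the register column names.

```python
import re


def parse_type_code(type_code):
    """Split a type code such as 'DDA.3B1B1P-02' into register fields."""
    base = type_code.split("-")[0]      # drop the -NN variant suffix
    if base.endswith(".M"):             # drop the mirror-image suffix
        base = base[:-2]
    parts = base.split(".")
    prefixes, core = parts[:-1], parts[-1]
    match = re.fullmatch(r"(\d)B(?:(\d)B)?(?:(\d)P)?", core)
    if match is None:
        raise ValueError(f"Unrecognised type code: {type_code!r}")
    beds = int(match.group(1))
    return {
        "Bedrooms": beds,
        "Bathrooms": int(match.group(2) or 0),
        "Powder Room": int(match.group(3) or 0),
        "DDA Accessible": "Yes" if "DDA" in prefixes else "",
        "Townhouse": "Yes" if "TH" in prefixes else "",
        "Apartment Class": f"{beds} Bed",
    }
```

Raising on an unrecognised code, rather than returning zeros, surfaces bad codes immediately instead of letting them reach the spreadsheet.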
Many drawings represent multiple identical levels (e.g., "LEVEL 07-11"). The text layer only contains one set of unit IDs (for the base level), but the same apartments repeat on each level in the range.
When building the register, expand typical floors:
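One way to sketch this expansion, assuming unit IDs follow the BX.LL.UU format and the level range is known from the drawing title. `expand_typical_floor` is an illustrative name; the level number keeps the same zero-padding width as the base level.

```python
def expand_typical_floor(base_units, first_level, last_level):
    """Copy the base level's units to every level in first_level..last_level.

    base_units maps unit ID -> type code for the base level only,
    e.g. {"B1.07.01": "2B2B"}.
    """
    expanded = {}
    for level in range(first_level, last_level + 1):
        for unit_id, type_code in base_units.items():
            building, base_level, unit_no = unit_id.split(".")
            # Preserve the padding used on the drawing (07 vs 7)
            level_str = str(level).zfill(len(base_level))
            expanded[f"{building}.{level_str}.{unit_no}"] = type_code
    return expanded
```

For a drawing titled "LEVEL 07-11", calling `expand_typical_floor(base_units, 7, 11)` yields five copies of the base level's units.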
Create an xlsx with openpyxl containing:
Sheet 1: Apartment Register (every unit, one row each)
| Column | Content |
|---|---|
| # | Sequential number |
| Unit ID | e.g., B1.04.01 |
| Building | B1, B2, B3, B4 |
| Level | 01, 02, GL, etc. |
| Bedrooms | 1, 2, 3, 4 |
| Bathrooms | 1, 2 |
| Powder Room | 0, 1 |
| Type Code | Full type code |
| DDA Accessible | Yes / blank |
| Townhouse | Yes / blank |
| Apartment Class | "1 Bed", "2 Bed", etc. |
Sheet 2: Type Summary
Sheet 3: Type Codes
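The counts that feed the summary sheets can be derived directly from the register rows. The sketch below assumes the register is a list of dicts keyed by the Sheet 1 column names; `summarise` is a hypothetical helper, and the exact layout of Sheets 2 and 3 is left to the project.

```python
from collections import Counter


def summarise(register):
    """Count units by type code, by building, and by (building, level)."""
    return {
        "by_type": Counter(row["Type Code"] for row in register),
        "by_building": Counter(row["Building"] for row in register),
        "by_level": Counter((row["Building"], row["Level"]) for row in register),
    }
```

Because every count comes from the same list of rows, the three totals always agree with the register row count.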
Output a text summary to the user showing:
After building the register, run a systematic QA audit. This is not optional — the initial extraction always has errors. The audit catches them before the user sees the output.
8a. Duplicate check
# Every unit ID must appear exactly once
duplicates = df[df.duplicated(subset='Unit ID', keep=False)]
Flag and remove any duplicates. Prefer the entry with a matched type code over one without.
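The preference rule can be sketched without pandas, assuming the register is a list of dicts with "Unit ID" and "Type Code" keys. `dedupe_units` is an illustrative name, not part of the skill.

```python
def dedupe_units(rows):
    """Keep one row per Unit ID, preferring a row that has a type code.

    Returns (deduplicated rows, sorted list of IDs that were duplicated).
    """
    best = {}
    duplicates = set()
    for row in rows:
        unit_id = row["Unit ID"]
        if unit_id not in best:
            best[unit_id] = row
            continue
        duplicates.add(unit_id)
        # Replace the kept row only if it lacks a type code and this one has it
        if not best[unit_id].get("Type Code") and row.get("Type Code"):
            best[unit_id] = row
    return list(best.values()), sorted(duplicates)
```

Returning the duplicated IDs alongside the cleaned rows lets the audit report how many were found.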
8b. Gap check
For each building/level combination, check that unit numbers are sequential (01, 02, 03...). Missing numbers indicate a unit the text extraction missed. Go back to the PDF and re-extract that specific page with PyMuPDF get_text('dict') to find the missing unit.
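A minimal sketch of the gap check, assuming BX.LL.UU unit IDs with numbering that starts at 01 on every level. `find_gaps` is a hypothetical helper.

```python
from collections import defaultdict


def find_gaps(unit_ids):
    """Return {(building, level): [missing unit numbers]} for BX.LL.UU IDs."""
    seen = defaultdict(set)
    for unit_id in unit_ids:
        building, level, unit_no = unit_id.split(".")
        seen[(building, level)].add(int(unit_no))
    gaps = {}
    for key, numbers in seen.items():
        # Everything from 1 up to the highest number seen should be present
        missing = sorted(set(range(1, max(numbers) + 1)) - numbers)
        if missing:
            gaps[key] = missing
    return gaps
```

This approach cannot detect a missing unit at the end of a sequence, because it only knows the highest number it actually saw; the cross-check against the drawing covers that case.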
8c. Typical floor consistency
Typical floors (e.g., L07-11) must have identical unit counts and type distributions. If L08 has a different count than L07, something went wrong in the expansion. Fix by using the base level (L07) as the template for all levels in the range.
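The consistency rule can be checked by comparing each level's type distribution with the base level. This is a sketch under the assumption that register rows are dicts with "Building", "Level", and "Type Code" keys; `check_typical_floors` is an illustrative name.

```python
from collections import Counter


def check_typical_floors(register, building, levels):
    """Return the levels whose type distribution differs from the base level.

    levels is the ordered list of level labels in the typical range;
    the first entry is treated as the base level.
    """
    def distribution(level):
        return Counter(
            row["Type Code"]
            for row in register
            if row["Building"] == building and row["Level"] == level
        )

    base = distribution(levels[0])
    return [level for level in levels[1:] if distribution(level) != base]
```

An empty result corresponds to a PASS; any returned level should be rebuilt from the base level.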
8d. Type code validation
Every unit must have a type code. If any unit has a missing or unrecognised type:
- Re-run pdftotext -raw for that specific page

8e. Cross-check totals against PDF
For each unique floor plan drawing, compare:
8f. Building termination check
Verify that each building's units stop at the correct level:
8g. Bedroom mix sanity
Flag unusual distributions for human review:
8h. Podium / non-typical level validation
CRITICAL: Podium levels (typically L01-L03) often have communal rooms, amenities, retail, or breakout spaces that replace apartment positions. NEVER assume podium levels have the same unit count as typical floors above.
For each non-typical level:
Why this matters: Flemington 4CD had phantom units B3.3.06 and B3.3.07 on the podium Level 03 because the takeoff assumed 7 B3 units (same as L04+), but communal rooms occupied those positions. This inflated the total by 2.
8i. Dev Sum cross-reference (if available)
If a Development Summary (Dev Sum) register exists, cross-reference per-level totals against it. The Dev Sum is the authoritative apartment count and takes precedence over text extraction when they disagree.
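A sketch of the cross-reference, assuming the Dev Sum totals have already been keyed in as a dict of `(building, level) -> count`. Both that input shape and the name `cross_check_dev_sum` are assumptions for illustration.

```python
from collections import Counter


def cross_check_dev_sum(register, dev_sum_counts):
    """List (building, level, extracted, dev_sum) wherever the counts differ."""
    extracted = Counter((row["Building"], row["Level"]) for row in register)
    mismatches = []
    # Walk the union of keys so levels missing from either side are reported
    for key in sorted(set(extracted) | set(dev_sum_counts)):
        got = extracted.get(key, 0)
        expected = dev_sum_counts.get(key, 0)
        if got != expected:
            mismatches.append((*key, got, expected))
    return mismatches
```

Each mismatch names the level to revisit; where the two disagree, the Dev Sum figure is the one to trust.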
8j. All-tabs verification
After building the xlsx, read back EVERY tab (not just the main data tab) and verify:
8k. Fix and re-run
After identifying errors:
--- QA AUDIT ---
Duplicates: 0 found
Gaps: 0 found
Typical floor consistency: PASS (all typical floors match)
Missing types: 0 units without type codes
Building termination: PASS
Total units verified: 415
Status: CLEAN
After the register passes QA, offer to run the drawing-markup skill to produce annotated PDFs with color-coded apartment badges. The type_map from this takeoff feeds directly into the markup config.
The default regex B[1-4]\.\d+\.\d+ covers the most common Australian convention. Adjust for:
- APT-LL.UU or U.LL.UU
- B[1-9] or BLK[A-D]

If the project uses different type naming:
- pdftotext -raw output to identify the format
- 1BR, 2BR, STUDIO, 1BED, 2BED

Filter out non-apartment spaces that might match the unit ID pattern:
- Corridors (COR, CORR, or C suffix)
- Stairs (FS, STAIR)
- Suffixes .C, .FS, .P (plant), .S (services)

The Step 8 QA audit is mandatory: never skip it, and never present results as final before it passes. The audit exists because PDF text extraction is imperfect: labels get missed, type codes get mismatched, and typical floor expansion can introduce errors. The audit catches these systematically rather than hoping the user spots them.
A clean audit means: