Package GIS layers into GeoParquet format with CRS metadata, the Cape Town bbox in the file metadata, and a POPIA annotation if personal data is present. Validates the output with a geopandas.read_parquet() smoke test. Intended for batch processing and external tool integration.
Enable efficient large-dataset exchange between the PostGIS database, analysis tools (Python/GeoPandas), and external consumers. GeoParquet provides columnar storage with embedded CRS metadata, making it ideal for the GV Roll, cadastral parcels, and analysis result exports that are too large for GeoJSON serving.
Accept inputs:
exports/<name>.parquet)
--include-popia-annotation if personal data present
import geopandas as gpd
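from sqlalchemy import text
# `table` must come from a trusted allowlist; tenant_id is bound, not interpolated.
gdf = gpd.read_postgis(
    text(f"SELECT * FROM {table} WHERE tenant_id = :tenant_id "
         "AND ST_Within(geom, ST_MakeEnvelope(18.0, -34.5, 19.5, -33.0, 4326))"),
    con=engine, geom_col='geom', crs='EPSG:4326', params={'tenant_id': tenant_id}
)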
Apply tenant_id filter (RLS enforcement at query level — Rule 4). Apply Cape Town bbox filter (Rule 9).
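One way to keep both filters in a single place is a small query builder. The sketch below is illustrative only: `build_export_query`, `ALLOWED_TABLES`, and the `cadastral_parcels` table name are hypothetical, and the named-parameter style assumes the query is later wrapped in SQLAlchemy's `text()`.

```python
# Hypothetical allowlist; the real set of exportable tables is not given here.
ALLOWED_TABLES = {"valuation_data", "cadastral_parcels"}

# Cape Town bbox as (west, south, east, north).
CAPE_TOWN_BBOX = (18.0, -34.5, 19.5, -33.0)


def build_export_query(table: str, tenant_id: str) -> tuple[str, dict]:
    """Return (sql, params) for a tenant- and bbox-filtered export query."""
    # Identifiers cannot be bound as parameters, so validate the table name instead.
    if table not in ALLOWED_TABLES:
        raise ValueError(f"Table not allowed for export: {table!r}")
    sql = (
        f"SELECT * FROM {table} "
        "WHERE tenant_id = :tenant_id "
        "AND ST_Within(geom, ST_MakeEnvelope(:west, :south, :east, :north, 4326))"
    )
    west, south, east, north = CAPE_TOWN_BBOX
    params = {
        "tenant_id": tenant_id,
        "west": west,
        "south": south,
        "east": east,
        "north": north,
    }
    return sql, params
```

The returned pair could then be passed as `gpd.read_postgis(text(sql), con=engine, geom_col='geom', crs='EPSG:4326', params=params)`, so that the tenant value never becomes part of the SQL string.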
Add file metadata:
gdf.attrs['crs'] = 'EPSG:4326'
gdf.attrs['bbox'] = { 'west': 18.0, 'south': -34.5, 'east': 19.5, 'north': -33.0 }
gdf.attrs['source'] = '<provenance_id>'
gdf.attrs['generated_at'] = datetime.now(timezone.utc).isoformat()  # needs: from datetime import datetime, timezone
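Whether values placed in `gdf.attrs` end up in the written file depends on the installed pandas, pyarrow, and GeoPandas versions, so it may be worth building the key/value metadata explicitly. The sketch below shows one possible shape for that; `build_file_metadata` and the choice of JSON-encoded byte values are assumptions for illustration.

```python
import json
from datetime import datetime, timezone


def build_file_metadata(provenance_id: str) -> dict[bytes, bytes]:
    """Build key/value file metadata; each value is JSON-encoded bytes."""
    meta = {
        "crs": "EPSG:4326",
        "bbox": {"west": 18.0, "south": -34.5, "east": 19.5, "north": -33.0},
        "source": provenance_id,
        # Timezone-aware UTC timestamp in ISO 8601 form.
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }
    return {key.encode(): json.dumps(value).encode() for key, value in meta.items()}
```

A mapping like this could be merged into the schema metadata with pyarrow's `Table.replace_schema_metadata` before writing. Merge it with the existing metadata rather than replacing it, so the `geo` key that GeoParquet readers rely on is preserved.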
If --include-popia-annotation:
Add to file metadata:
gdf.attrs['popia'] = {
'personal_data': '<list>',
'purpose': '<purpose>',
'lawful_basis': '<basis>',
'retention': '<period>'
}
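Because an incomplete POPIA annotation is arguably worse than none, it may help to reject empty fields before attaching it. This is a minimal sketch; `build_popia_annotation` and `REQUIRED_POPIA_KEYS` are hypothetical names, and the only rule enforced is that every field is non-empty.

```python
REQUIRED_POPIA_KEYS = ("personal_data", "purpose", "lawful_basis", "retention")


def build_popia_annotation(personal_data, purpose, lawful_basis, retention):
    """Return a POPIA annotation dict, rejecting empty fields."""
    annotation = {
        "personal_data": list(personal_data),
        "purpose": purpose,
        "lawful_basis": lawful_basis,
        "retention": retention,
    }
    # Empty lists and empty strings both count as missing.
    missing = [key for key in REQUIRED_POPIA_KEYS if not annotation[key]]
    if missing:
        raise ValueError(f"POPIA annotation incomplete: {', '.join(missing)}")
    return annotation
```

The result can be assigned to `gdf.attrs['popia']` in place of the literal dict.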
Write GeoParquet:
gdf.to_parquet(output_path, index=False, compression='snappy')
Smoke test — validate output:
gdf_check = gpd.read_parquet(output_path)
assert len(gdf_check) == len(gdf), "Row count mismatch"
assert gdf_check.crs.to_epsg() == 4326, "CRS mismatch"
minx, miny, maxx, maxy = gdf_check.total_bounds
assert minx >= 18.0 and miny >= -34.5 and maxx <= 19.5 and maxy <= -33.0, "Bbox violation"
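The bbox check needs all four edges, not only the western one. A small helper along these lines (the name `bounds_within` is an assumption) keeps the comparison readable and accepts the `(minx, miny, maxx, maxy)` order that `total_bounds` returns.

```python
CAPE_TOWN_BBOX = (18.0, -34.5, 19.5, -33.0)  # (west, south, east, north)


def bounds_within(bounds, bbox=CAPE_TOWN_BBOX) -> bool:
    """True if (minx, miny, maxx, maxy) lies entirely inside bbox."""
    minx, miny, maxx, maxy = bounds
    west, south, east, north = bbox
    # NaN bounds (for example from an empty frame) fail every comparison.
    return west <= minx and south <= miny and maxx <= east and maxy <= north
```

In the smoke test this would read `assert bounds_within(gdf_check.total_bounds), "Bbox violation"`.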
Report: output path, file size, row count, CRS, bbox, POPIA annotation status.
=== GEOPARQUET PACK ===
Layer: valuation_data (tenant: abc-corp)
Features: 45,832 | CRS: EPSG:4326 ✅ | Bbox: within Cape Town ✅
POPIA: annotated (personal_data: owner_name, property_value)
Output: exports/valuation-data-abc-corp-2026-03-14.parquet
File size: 8.2 MB (vs ~45 MB GeoJSON)
Smoke test: ✅ PASS (45,832 rows, EPSG:4326, bbox OK)
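A report in this shape could be assembled with a small formatter. The sketch below is only one possible layout: `format_report`, its parameters, and the "not annotated" wording are assumptions, and the GeoJSON size comparison and smoke-test line are left out because they need inputs not shown here.

```python
def format_report(layer, tenant, rows, crs, output_path, size_bytes, popia_fields=None):
    """Format the pack summary as a multi-line string."""
    size_mb = size_bytes / (1024 * 1024)
    if popia_fields:
        popia = f"annotated (personal_data: {', '.join(popia_fields)})"
    else:
        popia = "not annotated"
    return "\n".join([
        "=== GEOPARQUET PACK ===",
        f"Layer: {layer} (tenant: {tenant})",
        f"Features: {rows:,} | CRS: {crs}",
        f"POPIA: {popia}",
        f"Output: {output_path}",
        f"File size: {size_mb:.1f} MB",
    ])
```

The `size_bytes` argument would typically come from `os.path.getsize(output_path)` after the write succeeds.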