Name: Add Registry
Author: tcole333

# Unified query tool (searches all ingested registries)
python tools/query_registry.py search "Entity Name"
python tools/query_registry.py officers "Person Name"
python tools/query_registry.py address "ADDRESS"
python tools/query_registry.py stats

# Existing ingesters
python tools/ingest_florida.py download && python tools/ingest_florida.py ingest
python tools/ingest_newyork.py search "Entity Name"
python tools/ingest_newyork.py ingest-batch "{primary_subject}" --with-filings

| Table | Purpose |
| --- | --- |
| `registry_entities` | One row per corporate entity (name, type, status, addresses, EIN, dates) |
| `registry_officers` | Officers/directors/managers with titles and addresses |
| `registry_agents` | Registered agents with addresses |
| `registry_filings` | Filing/event history (amendments, annual reports, name changes, dissolutions) |
| `registry_name_history` | Track entity name changes over time |
| `registry_ingest_log` | What was ingested and when |

Questions to answer:
- Does it have an API? (REST, SOAP, Socrata/SODA)
- Does it have bulk downloads? (FTP, SFTP, CSV, fixed-width)
- Is it a web portal that needs scraping? (Selenium/Playwright)
- What anti-scraping measures exist? (CAPTCHA, rate limiting, session tokens)
- What data fields are available? (name, officers, agent, addresses, filing history)
- What entity types are covered? (corp, LLC, LP, nonprofit, trust, foreign)
- Is officer/director information public? (not in Delaware)
- Are filing histories available? (with name changes, officer changes)
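
One way to keep the answers to these questions comparable across jurisdictions is to write them down in a small structure before building anything. The sketch below is illustrative only: `RegistryProfile`, its field names, and the "bulk, then API, then scraping" preference order are assumptions, not part of the existing tools.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RegistryProfile:
    """Hypothetical record of the research answers for one registry."""
    jurisdiction: str                       # two-letter code, e.g. "fl"
    api: Optional[str] = None               # "REST", "SOAP", "SODA", or None
    bulk_download: Optional[str] = None     # "FTP", "SFTP", "CSV", ...
    needs_scraping: bool = False            # web portal only
    anti_scraping: List[str] = field(default_factory=list)
    data_fields: List[str] = field(default_factory=list)
    entity_types: List[str] = field(default_factory=list)
    officers_public: Optional[bool] = None  # None = not yet checked
    filing_history: Optional[bool] = None   # None = not yet checked

    def access_method(self) -> str:
        """Pick an access path, preferring bulk files, then an API, then scraping."""
        if self.bulk_download:
            return f"bulk ({self.bulk_download})"
        if self.api:
            return f"api ({self.api})"
        if self.needs_scraping:
            return "scrape"
        return "unknown"
```

For example, `RegistryProfile(jurisdiction="fl", bulk_download="SFTP").access_method()` returns `"bulk (SFTP)"`, matching the Florida access method listed in the priority table.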

# Test common endpoint patterns
curl -s "https://api.example.com/search?q=test" | python3 -m json.tool | head -40

# Check for Socrata/SODA
curl -s "https://data.state.gov/api/views" | head -20

# Check for FTP/SFTP
curl -s "ftp://ftp.state.gov/" --list-only

# Use WebFetch to examine the portal's search page
# Look for: JavaScript API calls, form action URLs, XHR endpoints, Angular/React service URLs

# Write a discovery script that tries each candidate endpoint
# Record: HTTP status, response content-type, response structure
# A 200 with JSON is a confirmed endpoint; a 404 means try another path
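
The discovery script described above could look roughly like the following. It is a minimal sketch using only the standard library; the function names `classify`, `probe`, and `discover` and the `User-Agent` string are hypothetical, and any candidate paths you pass in are guesses to be confirmed by the live responses.

```python
import json
import urllib.error
import urllib.request


def classify(status, content_type, body):
    """Record HTTP status, content type, and the top-level JSON structure."""
    result = {"status": status, "content_type": content_type,
              "confirmed": False, "structure": None}
    if status == 200 and "json" in (content_type or "").lower():
        try:
            data = json.loads(body)
        except ValueError:
            return result  # labeled as JSON but did not parse
        result["confirmed"] = True  # 200 with JSON: confirmed endpoint
        if isinstance(data, dict):
            result["structure"] = sorted(data.keys())
        elif isinstance(data, list):
            result["structure"] = f"list[{len(data)}]"
        else:
            result["structure"] = type(data).__name__
    return result


def probe(url, timeout=10):
    """Fetch one candidate URL and classify the response."""
    req = urllib.request.Request(url, headers={"User-Agent": "registry-discovery/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return classify(resp.status, resp.headers.get("Content-Type"), resp.read())
    except urllib.error.HTTPError as err:
        # 404 and similar: not this path, try another
        return classify(err.code, err.headers.get("Content-Type"), b"")
    except OSError as err:
        # DNS failure, refused connection, timeout
        return {"status": None, "content_type": None, "confirmed": False,
                "structure": None, "error": str(err)}


def discover(base, paths, fetch=probe):
    """Probe every candidate path under base and return {path: result}."""
    return {path: fetch(base + path) for path in paths}
```

Calling `discover("https://api.example.com", ["/search?q=test"])` would probe the placeholder endpoint from the curl example. The `fetch` parameter exists so the loop can be exercised without network access.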

# Connect and LIST the actual directory tree before writing parsers
# Record: directory names, file names, file sizes, sample first lines
# Only write field parsers after examining the actual file format
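
For a plain FTP server, the directory listing step might be sketched as below. This assumes the server supports the `MLSD` command, which not all do; `list_tree` is a hypothetical helper. `ftplib` does not speak SFTP, so an SFTP source would need a different client.

```python
import ftplib


def list_tree(ftp, path="/", max_depth=2):
    """Walk the server with MLSD and return (path, type, size) rows."""
    rows = []
    for name, facts in ftp.mlsd(path):
        kind = facts.get("type", "?")
        if kind in ("cdir", "pdir") or name in (".", ".."):
            continue  # skip the current/parent directory entries
        full = path.rstrip("/") + "/" + name
        size = int(facts["size"]) if "size" in facts else None
        rows.append((full, kind, size))
        if kind == "dir" and max_depth > 1:
            rows.extend(list_tree(ftp, full, max_depth - 1))
    return rows


def print_tree(host, user="anonymous", passwd=""):
    """Connect, list the tree, and print one line per entry."""
    with ftplib.FTP(host, timeout=30) as ftp:
        ftp.login(user, passwd)
        for full, kind, size in list_tree(ftp):
            print(f"{kind:5} {size if size is not None else '-':>12} {full}")
```

The printed names and sizes are what should be recorded before any parser is written. Sampling the first lines of a file is a separate step once the files of interest are known.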

#!/usr/bin/env python3
"""
<State/Country> corporate registry ingester.
Feeds into registry.db via the unified schema.

Usage:
    python tools/ingest_<jurisdiction>.py search "query"
    python tools/ingest_<jurisdiction>.py ingest-entity <id>
    python tools/ingest_<jurisdiction>.py ingest-batch "query"
"""

import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent))
from query_registry import get_db, _rebuild_fts

# ... implement search, ingest-entity, ingest-batch commands
# Map source fields to unified schema:
#   source_jurisdiction = "<two-letter-code>"
#   source_id = <registry's own entity ID>
#   entity_name, entity_type, status, formation_date, etc.
#   INSERT OR REPLACE INTO registry_entities (...)
#   INSERT OR IGNORE INTO registry_officers (...)
#   INSERT OR IGNORE INTO registry_agents (...)
#   INSERT OR IGNORE INTO registry_filings (...)
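
To illustrate the difference between the `INSERT OR REPLACE` and `INSERT OR IGNORE` statements named in the skeleton, here is a small sketch against an in-memory database. The schema is a simplified stand-in: the column list and the uniqueness constraints are assumptions, and the real `registry.db` tables may differ. `ingest_entity` is a hypothetical helper.

```python
import sqlite3

# Simplified stand-in for registry.db; columns and constraints are assumptions.
SCHEMA = """
CREATE TABLE registry_entities (
    source_jurisdiction TEXT NOT NULL,
    source_id           TEXT NOT NULL,
    entity_name         TEXT,
    entity_type         TEXT,
    status              TEXT,
    formation_date      TEXT,
    PRIMARY KEY (source_jurisdiction, source_id)
);
CREATE TABLE registry_officers (
    source_jurisdiction TEXT NOT NULL,
    source_id           TEXT NOT NULL,
    officer_name        TEXT NOT NULL,
    title               TEXT NOT NULL,
    UNIQUE (source_jurisdiction, source_id, officer_name, title)
);
"""


def ingest_entity(db, entity, officers):
    """Upsert one entity row and add any officers not already stored."""
    with db:  # commit on success, roll back on error
        # OR REPLACE: a re-ingest overwrites the entity row with fresh values.
        db.execute(
            "INSERT OR REPLACE INTO registry_entities VALUES "
            "(:source_jurisdiction, :source_id, :entity_name, "
            ":entity_type, :status, :formation_date)",
            entity,
        )
        # OR IGNORE: officers already present are skipped, new ones are added.
        db.executemany(
            "INSERT OR IGNORE INTO registry_officers VALUES (?, ?, ?, ?)",
            [(entity["source_jurisdiction"], entity["source_id"], name, title)
             for name, title in officers],
        )
```

Both statements rely on a uniqueness constraint to detect the conflict. Without one, every re-ingest would append duplicate rows.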

| Unified Field | Florida | New York | New Registry |
| --- | --- | --- | --- |
| entity_name | corp_name | current_entity_name | ? |
| entity_type | filing_type | entity_type | ? |
| status | status (A/I) | current_status | ? |
| formation_date | file_date | initial_dos_filing_date | ? |
| ein | fei_number | (not available) | ? |
| officer_name | officer block | chairman_name | ? |
| agent_name | ra_name | registered_agent_name | ? |
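
The mapping table can be expressed as data so that adding a registry means adding one dictionary entry. This is a sketch, not the existing ingesters' code: `FIELD_MAP`, `UNIFIED_FIELDS`, and `to_unified` are hypothetical names, and the source field names are copied from the table above. `officer_name` is left out because the Florida value is an "officer block" that needs its own parsing.

```python
# Source-field names copied from the mapping table; None marks "not available".
FIELD_MAP = {
    "fl": {
        "entity_name": "corp_name",
        "entity_type": "filing_type",
        "status": "status",
        "formation_date": "file_date",
        "ein": "fei_number",
        "agent_name": "ra_name",
    },
    "ny": {
        "entity_name": "current_entity_name",
        "entity_type": "entity_type",
        "status": "current_status",
        "formation_date": "initial_dos_filing_date",
        "ein": None,
        "agent_name": "registered_agent_name",
    },
}

UNIFIED_FIELDS = ("entity_name", "entity_type", "status",
                  "formation_date", "ein", "agent_name")


def to_unified(jurisdiction, record):
    """Translate one source record into unified-schema field names."""
    mapping = FIELD_MAP[jurisdiction]
    row = {"source_jurisdiction": jurisdiction}
    for unified in UNIFIED_FIELDS:
        source_key = mapping.get(unified)
        value = record.get(source_key) if source_key else None
        if isinstance(value, str):
            value = value.strip() or None  # blank strings become NULL
        row[unified] = value
    return row
```

Fields the source does not provide come back as `None`, so the same insert statement works for every jurisdiction.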

# Pull current investigation targets dynamically
python3 -c "
import sqlite3
db = sqlite3.connect('investigation.db')
rows = db.execute('''
    SELECT DISTINCT target_name FROM findings
    WHERE finding_type IN ('financial', 'corporate')
    UNION
    SELECT name FROM entities
    LIMIT 20
''').fetchall()
for r in rows:
    print(r[0])
"

# Verify the ingest
python tools/query_registry.py stats
python tools/query_registry.py jurisdictions
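
The target list pulled above can also be fed straight into the unified query tool. The sketch below reuses the same SQL and the `search` subcommand shown earlier; the helper names `load_targets`, `search_command`, and `search_all` are hypothetical, and the script path is assumed to be relative to the working directory.

```python
import sqlite3
import subprocess
import sys

# Same query as the shell one-liner.
TARGET_SQL = """
    SELECT DISTINCT target_name FROM findings
    WHERE finding_type IN ('financial', 'corporate')
    UNION
    SELECT name FROM entities
    LIMIT 20
"""


def load_targets(db):
    """Return the non-empty target names from investigation.db."""
    return [row[0] for row in db.execute(TARGET_SQL) if row[0]]


def search_command(target, tool="tools/query_registry.py"):
    """Build the argv list for one search; a list avoids shell quoting issues."""
    return [sys.executable, tool, "search", target]


def search_all(db, run=subprocess.run):
    """Run one registry search per target and return {target: exit code}."""
    return {t: run(search_command(t), check=False).returncode
            for t in load_targets(db)}
```

A typical call would be `search_all(sqlite3.connect("investigation.db"))`. A non-zero exit code for a target is a cue to inspect that search by hand.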

# Update CLAUDE.md data source inventory
# Update the search-all-sources skill to include registry queries

| Jurisdiction | Code | Why | Access |
| --- | --- | --- | --- |
| Florida | `fl` | High-volume entity registrations | SFTP bulk (DONE) |
| New York | `ny` | Corporate headquarters, financial entities | SODA API (DONE) |
| US Virgin Islands | `vi` | Offshore-adjacent entities, trusts | Web portal (Playwright) |
| New Mexico | `nm` | Property-linked entities | REST API (DONE) |
| Panama | `pa` | Offshore shells, leaked registry data | Hybrid ICIJ+Aleph (DONE) |
| Delaware | `de` | Shell companies, privacy-favoring registrations | CAPTCHA (hard) |
| British Virgin Islands | `vg` | Offshore shells, nominee directors | Paid per-search |
| Bermuda | `bm` | Insurance vehicles, reinsurance structures | Limited access |

Add Registry

Arguments

Context Loading

Context

Existing Registry Infrastructure

Unified Schema (registry.db)

Process

1. Research the Registry's Data Access

2. Profile the API/Portal (MANDATORY — live discovery, not guessing)

3. Build the Ingester

4. Map Fields to Unified Schema

5. Test with Investigation Targets

6. Log and Update

Priority Jurisdictions for Investigation
