Name: Sonic Phoenix
Author: drajb

# Clone into your music folder (recommended) or anywhere else
cd /path/to/your/music
git clone https://github.com/drajb/sonic-phoenix.git
cd sonic-phoenix

# Create virtual environment with Python 3.12
python3.12 -m venv .venv

# Activate
source .venv/bin/activate          # macOS / Linux
# .venv\Scripts\activate           # Windows PowerShell
# .venv\Scripts\activate.bat       # Windows cmd

# Install dependencies
pip install -r requirements.txt

# REQUIRED — absolute path to the root of your music collection
MUSIC_ROOT=/path/to/your/music

# OPTIONAL — override any of these defaults:
# SORTED_ROOT=/path/to/your/music/Sorted
# DATA_DIR=/path/to/your/music/.data
# SHAZAM_CONCURRENCY=20
# ITUNES_COUNTRIES=US,GB

python config.py

<MUSIC_ROOT>/
├── sonic-phoenix/           # This repo
│   ├── config.py            # Central configuration (single source of truth)
│   ├── .env                 # Your local overrides (gitignored)
│   ├── 01A_extract_metadata.py
│   ├── 01D_shazam_all_files.py
│   ├── ...
│   └── config/
│       └── language_hints/  # Optional language classification overrides
│           └── examples/    # Templates to copy and customise
├── Sorted/                  # Pipeline output (auto-created)
│   ├── English/
│   │   ├── Adele/
│   │   │   └── 25/
│   │   │       └── Adele - Hello.mp3
│   │   └── ...
│   ├── Hindi/
│   └── <any language langdetect finds>/
├── .data/                   # Machine-generated JSON artifacts (auto-created)
│   ├── metadata_catalog.json
│   ├── shazam_final_results.json
│   ├── catalog.json
│   ├── final_catalog.json
│   └── ...
└── <your unsorted audio files>
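The layout above implies that each sorted file lands at `<SORTED_ROOT>/<Language>/<Artist>/<Album>/<Artist> - <Title>.<ext>`. The following is a minimal sketch of how such a destination path could be built; `safe_component` and `destination_for` are hypothetical helper names, and the character replacement rule is an assumption, not the repository's actual sanitisation logic.

```python
from pathlib import Path

# Characters that are not allowed in Windows file names; replacing them
# is an assumption made here for portability.
_INVALID = '<>:"/\\|?*'


def safe_component(name: str, fallback: str = "Unknown") -> str:
    """Turn an arbitrary string into a single usable path component."""
    cleaned = "".join("_" if ch in _INVALID else ch for ch in name).strip(" .")
    return cleaned or fallback


def destination_for(sorted_root: Path, language: str, artist: str,
                    album: str, title: str, suffix: str) -> Path:
    """Build <sorted_root>/<Language>/<Artist>/<Album>/<Artist> - <Title><suffix>."""
    artist_dir = safe_component(artist)
    filename = f"{artist_dir} - {safe_component(title)}{suffix}"
    return (sorted_root / safe_component(language) / artist_dir
            / safe_component(album) / filename)


dest = destination_for(Path("/music/Sorted"), "English", "Adele", "25", "Hello", ".mp3")
print(dest.as_posix())
```

With the example values this reproduces the `Sorted/English/Adele/25/Adele - Hello.mp3` entry from the tree.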

python 01A_extract_metadata.py     # Extract ID3/FLAC tags
python 01D_shazam_all_files.py     # Acoustic fingerprint every file via Shazam
python 02A_catalog_music.py        # SHA-256 hash catalog + duplicate detection
python 02D_organize_music.py       # Sort into Language/Artist/Album hierarchy
python 03A_consolidate_by_artist.py --dry-run   # Preview artist folder merges
python 03A_consolidate_by_artist.py              # Execute merges
python 03D_titanium_resort.py      # Final structural enforcement pass
python 04I_polish_and_enrich_v6.py # Enrich tags via iTunes + embed artwork
python 05I_finalize_catalog.py     # Lock the master catalog

Script	Status	Purpose
`01A_extract_metadata.py`	CANONICAL	Reads existing ID3/FLAC/MP4 tags via Mutagen. Splits files into `tagged_files` and `untagged_files` in `metadata_catalog.json`.
`01B_shazam_identify.py`	UTILITY	Shazam-identify only the untagged files from 01A.
`01C_shazam_by_hash.py`	UTILITY	Propagate a single Shazam match to all byte-identical copies (via hash groups from Phase 2).
`01D_shazam_all_files.py`	CANONICAL	Force every audio file through Shazam regardless of existing tags. Use this when tags are unreliable. 20-way concurrent by default.
`01E_test_matching.py`	UTILITY	Spot-check a handful of files against Shazam to verify the pipeline is working.
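The "20-way concurrent" behaviour of 01D can be pictured as a bounded pool of in-flight lookups. The sketch below shows one common way to cap concurrency with `asyncio.Semaphore`. The `identify` coroutine is a hypothetical stand-in for a real Shazam call (it only sleeps), and nothing here is taken from the script itself.

```python
import asyncio


async def identify(path: str) -> dict:
    """Hypothetical stand-in for a Shazam lookup; swap in a real client call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"path": path, "match": None}


async def identify_all(paths: list[str], concurrency: int = 20) -> list[dict]:
    """Run identify() over every path with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(path: str) -> dict:
        async with semaphore:  # blocks while `concurrency` lookups are running
            return await identify(path)

    # gather() returns results in the same order as the input paths
    return await asyncio.gather(*(bounded(p) for p in paths))


results = asyncio.run(identify_all([f"track_{i}.mp3" for i in range(50)], concurrency=5))
print(len(results))
```

Lowering the `concurrency` argument mirrors what lowering `SHAZAM_CONCURRENCY` is meant to achieve when the service starts rate-limiting.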

Script	Status	Purpose
`02A_catalog_music.py`	CANONICAL	SHA-256 hash every audio file. Writes `catalog.json` with duplicate groups.
`02B_analyze_catalog.py`	UTILITY	Analyses the hash catalog and prints duplicate statistics. Read-only reporter.
`02C_organize_files.py`	UTILITY	Organises files using the hash catalog and final catalog. Alternative to 02D.
`02D_organize_music.py`	CANONICAL	The main sorter. Reads Shazam results and moves files into `<SORTED_ROOT>/<Language>/<Artist>/<Album>/`.
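Hash-based duplicate detection, as described for 02A, comes down to grouping files by their SHA-256 digest. The sketch below illustrates that idea only; it is not the script's code. `AUDIO_SUFFIXES` is an assumed subset of formats, and the real `catalog.json` schema is not reproduced.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed subset of audio extensions, for illustration only
AUDIO_SUFFIXES = {".mp3", ".flac", ".m4a"}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never loaded fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_groups(root: Path) -> dict[str, list[Path]]:
    """Return {digest: [paths]} for every digest shared by two or more audio files."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES:
            by_hash[sha256_of(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

Only byte-identical files share a digest, so re-encoded or re-tagged copies of the same song are not caught by this approach.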

Script	Status	Purpose
`03A_consolidate_by_artist.py`	UTILITY	Merge fragmented artist folders (e.g. "Akon feat Eminem" into "Akon"). Supports `--dry-run`.
`03D_titanium_resort.py`	CANONICAL	Final structural enforcer. Any file not matching `Language/Artist/Album/` gets re-sorted.
`03F_reorganize_binary.py`	UTILITY	Resolve near-duplicate files via binary comparison.
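To illustrate the kind of merge 03A performs ("Akon feat Eminem" into "Akon"), here is a small sketch that derives a primary artist from a folder name and builds a dry-run style merge plan. The separator list and both function names are assumptions; the real script's matching rules may differ.

```python
import re

# Assumed separators that introduce a featured artist
_FEATURE_SPLIT = re.compile(r"\s+(?:feat\.?|ft\.?|featuring)\s+", re.IGNORECASE)


def primary_artist(folder_name: str) -> str:
    """Keep only the part of the name before the first 'feat'-style separator."""
    return _FEATURE_SPLIT.split(folder_name, maxsplit=1)[0].strip()


def plan_merges(folder_names: list[str]) -> dict[str, str]:
    """Map each fragmented folder name to the folder it would be merged into."""
    plan = {}
    for name in folder_names:
        target = primary_artist(name)
        if target != name:
            plan[name] = target
    return plan


print(plan_merges(["Akon", "Akon feat Eminem", "Akon ft. Snoop Dogg", "Adele"]))
```

Printing the plan before touching any files plays the same role as `--dry-run`: the proposed merges can be reviewed first.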

Script	Status	Purpose
`04I_polish_and_enrich_v6.py`	CANONICAL	The single enrichment entry point. Cleans junk from tags (quality markers, description artifacts, miscellaneous noise), enriches via iTunes Search API (artist, title, album, release date, artwork), fetches synchronized lyrics from LrcLib, embeds cover art into ID3 tags.
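The tag-cleaning step can be sketched as a handful of regular expressions applied before enrichment. The patterns below are hypothetical examples of "quality markers" and "description artifacts"; the actual rules in 04I are not shown in this document.

```python
import re

# Hypothetical junk patterns, chosen for illustration only
_JUNK_PATTERNS = [
    re.compile(r"[\[(]\s*\d{2,4}\s*kbps\s*[\])]", re.IGNORECASE),   # e.g. [320kbps]
    re.compile(r"[\[(]\s*official\s+(?:music\s+)?(?:video|audio)\s*[\])]",
               re.IGNORECASE),                                       # e.g. (Official Video)
    re.compile(r"\bwww\.\S+", re.IGNORECASE),                        # e.g. www.example.com
]


def clean_tag(value: str) -> str:
    """Remove junk fragments, collapse whitespace, and trim stray separators."""
    for pattern in _JUNK_PATTERNS:
        value = pattern.sub(" ", value)
    return re.sub(r"\s+", " ", value).strip(" -_")


print(clean_tag("Hello (Official Video) [320kbps]"))
print(clean_tag("Hello - www.example.com"))
```

For inputs like these, cleaning an already-clean value returns it unchanged, which is the property that makes re-running the enrichment safe.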

Script	Status	Purpose
`06B_spotify_setup.py`	CANONICAL	OAuth handshake — run once to cache the token.
`06C_spotify_backup.py`	CANONICAL	Snapshot all existing playlists before making changes.
`06D_spotify_sync_engine.py`	CANONICAL	Creates per-language playlists mirroring your local library. Resumable.
`06E_spotify_discovery_sync.py`	CANONICAL	Cross-references local artists with Spotify listening history and generates genre-based "Essentials" playlists.

Create an app at https://developer.spotify.com/dashboard
Set Redirect URI to http://127.0.0.1:8888/callback

Add to .env:

SPOTIFY_CLIENT_ID=your_client_id
SPOTIFY_CLIENT_SECRET=your_client_secret

Run `python 06B_spotify_setup.py`. A browser window opens; approve the requested scopes, and the token is cached for later runs.

config/language_hints/
├── examples/           # Copy and customise these templates
│   ├── English.json
│   ├── Hindi.json
│   └── README.md       # Full schema reference
├── Hindi.json          # Your custom hints (create from examples)
└── Spanish.json

{
  "artists": ["Arijit Singh", "A.R. Rahman", "Shreya Ghoshal"],
  "dna": ["bollywood", "bhangra", "desi"],
  "keywords": ["ishq", "dil", "pyaar"],
  "lang_codes": ["hi", "pa"]
}

Variable	Required	Default	Purpose
`MUSIC_ROOT`	Yes	Parent of repo dir	Root of your music collection
`SORTED_ROOT`	No	`<MUSIC_ROOT>/Sorted`	Destination for sorted files
`DATA_DIR`	No	`<MUSIC_ROOT>/.data`	Machine-generated JSON artifacts
`DUPLICATES_STAGING`	No	`<repo>/Duplicates_Staging`	Quarantine for suspected duplicates
`UNIDENTIFIED_DIR`	No	`<SORTED_ROOT>/Unidentified`	Files Shazam couldn't identify
`FFMPEG_BIN`	No	`<MUSIC_ROOT>/ffmpeg/bin`	Portable FFmpeg path (Windows)
`SHAZAM_CONCURRENCY`	No	`20`	Parallel Shazam calls (lower if rate-limited)
`ITUNES_COUNTRIES`	No	`US,GB`	Country codes for iTunes API rotation
`SPOTIFY_CLIENT_ID`	Phase 6 only	—	Spotify app client ID
`SPOTIFY_CLIENT_SECRET`	Phase 6 only	—	Spotify app client secret
`SPOTIFY_REDIRECT_URI`	No	`http://127.0.0.1:8888/callback`	Spotify OAuth redirect
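The defaults in the table can be read as a simple fallback chain. The sketch below resolves a few of the settings from an environment mapping under that reading; `resolve_settings` is a hypothetical helper and does not reflect how `config.py` is actually written.

```python
from pathlib import Path
from typing import Mapping


def resolve_settings(env: Mapping[str, str], repo_dir: Path) -> dict:
    """Apply the documented defaults to whatever the environment provides."""
    music_root = Path(env.get("MUSIC_ROOT") or repo_dir.parent)
    sorted_root = Path(env.get("SORTED_ROOT") or music_root / "Sorted")
    countries = env.get("ITUNES_COUNTRIES") or "US,GB"
    return {
        "MUSIC_ROOT": music_root,
        "SORTED_ROOT": sorted_root,
        "DATA_DIR": Path(env.get("DATA_DIR") or music_root / ".data"),
        "UNIDENTIFIED_DIR": Path(env.get("UNIDENTIFIED_DIR") or sorted_root / "Unidentified"),
        "SHAZAM_CONCURRENCY": int(env.get("SHAZAM_CONCURRENCY") or 20),
        "ITUNES_COUNTRIES": [c.strip().upper() for c in countries.split(",") if c.strip()],
    }


settings = resolve_settings(
    {"MUSIC_ROOT": "/music", "SHAZAM_CONCURRENCY": "8"},
    repo_dir=Path("/music/sonic-phoenix"),
)
for key, value in settings.items():
    print(f"{key} = {value}")
```

Passing the environment in as a plain mapping keeps the function easy to test; in a real run the mapping would come from the process environment after `.env` has been loaded.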

Problem	Fix
`shazamio-core` fails to install	You're on Python 3.13+. Use Python 3.12: `python3.12 -m venv .venv`
Shazam returns HTTP 429	Lower `SHAZAM_CONCURRENCY` to 5-10 in `.env`
iTunes returns 403	Add more country codes: `ITUNES_COUNTRIES=US,GB,AU,CA,DE`
`MUSIC_ROOT does not exist`	Set `MUSIC_ROOT` in `.env` to the absolute path of your music folder
Tags still show junk after enrichment	Re-run `04I_polish_and_enrich_v6.py` — it's idempotent
Spotify 403 on playlist creation	Add your Spotify account under "Users and Access" in the Developer Dashboard (Development Mode restriction)
`langdetect` misclassifies a language	Create a hint file in `config/language_hints/` for that language
Empty artist/album folders after sorting	Run `05E_final_cleanup.py` or `05H_final_vacuum.py` to vacuum empties

Script	Purpose
`ultimate-music-manager/scripts/preflight.sh`	Validates Python 3.12, venv, dependencies, `.env`, `MUSIC_ROOT`, FFmpeg. Run before first pipeline execution.
`ultimate-music-manager/scripts/run-pipeline.sh`	Executes the happy-path sequence (Phases 1-5) with progress and timing. Supports `--skip-shazam`, `--spotify`, `--dry-run`.
`ultimate-music-manager/scripts/status.sh`	Dashboard showing file counts, language breakdown, data file status, and pipeline progress.

# Check environment is ready
bash ultimate-music-manager/scripts/preflight.sh

# Preview what the pipeline will do
bash ultimate-music-manager/scripts/run-pipeline.sh --dry-run

# Run the full pipeline
bash ultimate-music-manager/scripts/run-pipeline.sh

# Run including Spotify sync
bash ultimate-music-manager/scripts/run-pipeline.sh --spotify

# Check pipeline status at any time
bash ultimate-music-manager/scripts/status.sh

{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": "./ultimate-music-manager/hooks/safety-guard.sh"
      }]
    }]
  }
}

Document	Contents
`references/data-files.md`	Schema and lineage for every JSON artifact — which script writes it, which scripts read it, full data flow diagram.
`references/language-hints-guide.md`	How to create language hint files, full schema, examples for Hindi/Japanese/Spanish, tips for getting classification right.

Situation	Action
First-time setup	Environment Setup
Sort a messy music folder	Run Phases 1-5 in order
Identify unknown songs	Run Phase 1 (Shazam fingerprinting)
Find and remove duplicates	Run Phase 2 (SHA-256 catalog)
Fix bad/missing ID3 tags	Run Phase 4 (iTunes enrichment)
Sync library to Spotify	Run Phase 6 (requires Spotify app credentials)

Sonic Phoenix

Quick Reference

Environment Setup

Prerequisites

Installation

Configuration

Directory Layout After Setup

The Pipeline

Happy Path (Minimum Viable Run)

Phase 1: Extraction and Identification

Phase 2: Cataloguing and Deduplication

Phase 3: Consolidation and Structural Cleanup

Phase 4: Metadata Enrichment

Phase 5: Finalization

Phase 6: Spotify Sync (Optional)

Language Classification

Language Hints (Optional)

Configuration Reference

Supported Audio Formats

Safety Guarantees

Troubleshooting

Scripts

Usage

Hook Integration

Setup (Claude Code)

References

Design Principles
