Safely remove intermediate files from completed research sessions while preserving important data
Remove intermediate files created during the research workflow while preserving all important data.
Core principle: Conservative cleanup with user confirmation. Never delete anything important.
Use this skill when:
When NOT to use:
NEVER delete these (protected list):
Core outputs:
- SUMMARY.md - Enhanced findings with methodology
- relevant-papers.json - Filtered relevant papers
- papers-reviewed.json - Complete screening history
- papers/ directory - All PDFs and supplementary files
- citations/citation-graph.json - Citation relationships

Methodology documentation:
- screening-criteria.json - Rubric definition (if exists)
- test-set.json - Rubric validation papers (if exists)
- abstracts-cache.json - Cached abstracts for re-screening (if exists)
- rubric-changelog.md - Rubric version history (if exists)

Auxiliary documentation (if exists):
- README.md - Project overview
- TOP_PRIORITY_PAPERS.md - Curated priority list
- evaluated-papers.json - Rich structured data

Project configuration:
- .claude/ directory - Permissions and settings
- *.py helper scripts that were created - Keep for reproducibility

Candidates for removal (with confirmation):
Intermediate search results:
- initial-search-results.json - Raw PubMed results before screening
Temporary files:
- *.tmp files
- *.swp files (vim swap files)
- .DS_Store (macOS)
- __pycache__/ (Python cache)
- *.pyc (Python compiled)

Log files:
- *.log files
- debug-*.txt files

cd research-sessions/YYYY-MM-DD-description/

# List all files with sizes
find . -type f -exec ls -lh {} \; | awk '{print $5, $9}' | sort -rh
Identify files by category:
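A minimal sketch of this categorization in Python. The pattern lists here are illustrative subsets of the protected and removable lists described in this document, not the definitive sets:

```python
# Sketch: bucket files into protected / removable / unknown before showing the plan.
# Pattern lists are illustrative; unknown files are never deleted without asking.
from fnmatch import fnmatch
from pathlib import Path

PROTECTED = ['SUMMARY.md', 'relevant-papers.json', 'papers-reviewed.json',
             'papers/*', 'citations/*', '*.py', '.claude/*']
REMOVABLE = ['initial-search-results.json', '*.tmp', '*.swp', '.DS_Store',
             '__pycache__/*', '*.pyc', '*.log', 'debug-*.txt']

def categorize(session_dir):
    """Return (protected, removable, unknown) lists of paths relative to session_dir."""
    protected, removable, unknown = [], [], []
    for path in Path(session_dir).rglob('*'):
        if not path.is_file():
            continue
        rel = str(path.relative_to(session_dir))
        if any(fnmatch(rel, p) for p in PROTECTED):
            protected.append(rel)
        elif any(fnmatch(rel, p) for p in REMOVABLE):
            removable.append(rel)
        else:
            unknown.append(rel)
    return protected, removable, unknown
```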
Show what will be deleted:
🧹 Cleanup Analysis for: research-sessions/2025-10-11-btk-selectivity/
Files to KEEP (protected):
✅ SUMMARY.md (45 KB)
✅ relevant-papers.json (12 KB)
✅ papers-reviewed.json (28 KB)
✅ papers/ (14 PDFs, 32 MB)
✅ citations/citation-graph.json (5 KB)
✅ screening-criteria.json (2 KB)
✅ abstracts-cache.json (156 KB)
Files that CAN be removed (intermediate):
🗑️ initial-search-results.json (8 KB) - Raw PubMed results
🗑️ .DS_Store (6 KB) - macOS metadata
Total space to recover: 14 KB
Proceed with cleanup? (y/n/review)
Options:
- y - Delete intermediate files
- n - Cancel cleanup, keep everything
- review - Show contents of each file before deciding

Before deleting ANY file:
Example confirmation:
About to delete:
- initial-search-results.json (8 KB)
This file contains raw PubMed search results. The data is preserved in
papers-reviewed.json, so this is safe to delete.
Confirm deletion? (y/n)
Delete confirmed files:
# Move to trash instead of rm (safer)
# On macOS:
mv initial-search-results.json ~/.Trash/
# On Linux:
mv initial-search-results.json ~/.local/share/Trash/files/
# Or use rm if user confirms
rm initial-search-results.json
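If a scripted cleanup is preferred over shell commands, a soft delete can be done in Python as well. This is a sketch; the `.trash/` folder inside the session directory is an assumption for illustration, not part of the documented layout:

```python
# Sketch: move a file into a .trash/ folder inside the session directory instead
# of deleting it outright, so the action is reversible until the folder is emptied.
import shutil
from pathlib import Path

def soft_delete(filepath, session_dir):
    trash_dir = Path(session_dir) / '.trash'   # assumed location for illustration
    trash_dir.mkdir(exist_ok=True)
    target = trash_dir / Path(filepath).name   # note: a file of the same name would be overwritten
    shutil.move(str(filepath), str(target))
    return target
```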
Report results:
✅ Cleanup complete!
Removed:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)
Space recovered: 14 KB
Protected files preserved:
- All 8 core files kept
- All 14 PDFs kept
- All methodology documentation kept
After cleanup, verify critical files:
# Check core files exist
test -f SUMMARY.md && echo "✓ SUMMARY.md"
test -f relevant-papers.json && echo "✓ relevant-papers.json"
test -f papers-reviewed.json && echo "✓ papers-reviewed.json"
test -d papers && echo "✓ papers/ directory"
# Verify JSON files are valid
jq empty relevant-papers.json && echo "✓ relevant-papers.json valid JSON"
jq empty papers-reviewed.json && echo "✓ papers-reviewed.json valid JSON"
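The same check can be scripted. A sketch in Python, assuming the standard session layout from the protected list above:

```python
# Sketch: confirm core files still exist and JSON outputs still parse after cleanup.
import json
from pathlib import Path

def verify_session(session_dir):
    root = Path(session_dir)
    required = ['SUMMARY.md', 'relevant-papers.json', 'papers-reviewed.json', 'papers']
    missing = [name for name in required if not (root / name).exists()]
    invalid = []
    for name in ['relevant-papers.json', 'papers-reviewed.json']:
        path = root / name
        if path.exists():
            try:
                json.loads(path.read_text())
            except json.JSONDecodeError:
                invalid.append(name)
    return missing, invalid

# Usage: missing, invalid = verify_session('.')
```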
Report to user:
✅ Integrity check passed
- All core files present
- All JSON files valid
- All PDFs intact
If abstracts-cache.json is very large (>100 MB):
⚠️ abstracts-cache.json is 256 MB
This file enables re-screening if you update the rubric. Options:
1. Keep (recommended if you might refine rubric)
2. Compress (gzip to ~50 MB, can decompress later)
3. Delete (only if research is final and won't be updated)
Choice? (1/2/3)
If user chooses compress:
gzip abstracts-cache.json
# Creates abstracts-cache.json.gz
echo "Compressed abstracts-cache.json to $(du -h abstracts-cache.json.gz | cut -f1)"
If user created helper scripts during research:
📝 Found helper scripts:
- screen_papers.py (created for batch screening)
- deep_dive_papers.py (created for data extraction)
These scripts document your methodology. Recommendations:
- Keep for reproducibility
- Add comments if not already documented
- Reference in SUMMARY.md under "Reproducibility" section
Keep scripts? (y/n)
If cleaning up multiple sessions:
# Find all research sessions
find research-sessions/ -maxdepth 1 -type d
# For each session:
for session in research-sessions/*/; do
echo "Analyzing: $session"
# Run cleanup analysis
done
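A Python sketch of the same batch analysis, assuming the `categorize()` helper sketched earlier in this document; it only reports, nothing is deleted here:

```python
# Sketch: summarize how much intermediate data each session holds before asking
# which sessions to clean. Assumes categorize() from the earlier sketch.
from pathlib import Path

sessions = [p for p in Path('research-sessions').iterdir() if p.is_dir()]
print(f"Found {len(sessions)} research sessions.")
for session in sorted(sessions):
    _, removable, _ = categorize(session)
    total_kb = sum((session / f).stat().st_size for f in removable) / 1024
    print(f"{session.name}: {len(removable)} intermediate files ({total_kb:.0f} KB)")
```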
Ask user:
Found 5 completed research sessions.
Clean up all sessions? (y/n/select)
- y: Analyze and clean all sessions
- n: Cancel
- select: Choose which sessions to clean
Maintain hardcoded list of patterns to NEVER delete:
PROTECTED_PATTERNS = [
'SUMMARY.md',
'relevant-papers.json',
'papers-reviewed.json',
'papers/*.pdf',
'papers/*.zip',
'citations/citation-graph.json',
'screening-criteria.json',
'test-set.json',
'abstracts-cache.json',
'rubric-changelog.md',
'README.md',
'TOP_PRIORITY_PAPERS.md',
'evaluated-papers.json',
'*.py', # Helper scripts
'.claude/*', # Project settings
]
Before deleting any file:
from fnmatch import fnmatch

def is_protected(filepath):
    """Check if a file path matches any protected pattern."""
    for pattern in PROTECTED_PATTERNS:
        if fnmatch(filepath, pattern):
            return True
    return False

# Never delete protected files (this check runs inside the deletion routine)
if is_protected(file_to_delete):
    print(f"⚠️ ERROR: {file_to_delete} is protected and cannot be deleted")
    return
Always show what will be deleted before doing it:
# Dry run (show only, don't delete)
# Assumes $candidate_files and is_safe_to_delete were set up by the analysis step above
echo "DRY RUN - No files will be deleted"
for file in $candidate_files; do
    if is_safe_to_delete "$file"; then
        echo "Would delete: $file ($(du -h "$file" | cut -f1))"
    fi
done
echo ""
echo "Proceed with actual deletion? (y/n)"
After answering-research-questions workflow:
Add to answering-research-questions Phase 8:
### Optional: Cleanup
After reviewing outputs, optionally clean up intermediate files:
"Research session is complete. Would you like me to clean up intermediate files?
I'll show you what will be deleted before removing anything."
If yes: Use `cleaning-up-research-sessions` skill
- Deleting papers-reviewed.json: This is the deduplication database - NEVER delete → Always protect with hardcoded list
- Deleting abstracts-cache.json: Needed for re-screening → Ask user, default to keep
- Deleting helper scripts: Important for reproducibility → Keep by default, ask if user wants to remove
- Not showing user what will be deleted: User needs to see the plan → Always show dry run first
- Deleting without confirmation: Too risky → Always ask for final confirmation
- Not verifying after cleanup: Files could be corrupted → Run integrity check after deletion
| Action | Command |
|---|---|
| List all files with sizes | find . -type f -exec ls -lh {} \; \| sort -rh |
| Check if JSON valid | jq empty file.json |
| Move to trash (macOS) | mv file ~/.Trash/ |
| Move to trash (Linux) | mv file ~/.local/share/Trash/files/ |
| Compress large file | gzip file |
| Calculate folder size | du -sh . |
Cleanup successful when:
User: "Can you clean up the research session folder?"
Agent: "I'll analyze the research session and show you what can be safely cleaned up."
[Analyzes folder]
Agent: "🧹 Cleanup Analysis for: research-sessions/2025-10-11-btk-selectivity/
Files to KEEP (protected):
✅ SUMMARY.md (45 KB) - Final findings
✅ relevant-papers.json (12 KB) - Filtered results
✅ papers-reviewed.json (28 KB) - Complete history
✅ papers/ (14 PDFs, 32 MB) - Downloaded papers
✅ citations/citation-graph.json (5 KB) - Citation network
✅ screening-criteria.json (2 KB) - Rubric definition
✅ abstracts-cache.json (156 KB) - Cached abstracts
Files that CAN be removed:
🗑️ initial-search-results.json (8 KB) - Raw PubMed results
→ Data is preserved in papers-reviewed.json
🗑️ .DS_Store (6 KB) - macOS metadata
→ Not needed
Total space to recover: 14 KB
Proceed with cleanup? (y/n/review)"
User: "y"
Agent: "Final confirmation - About to delete:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)
All protected files will be preserved. Confirm? (y/n)"
User: "y"
Agent: [Deletes files]
"✅ Cleanup complete!
Removed:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)
Space recovered: 14 KB
Running integrity check...
✅ All core files intact
✅ All JSON files valid
✅ All 14 PDFs present
Research session is ready for archiving or sharing."
After cleanup: