Full de novo structure elucidation - skip dereplication and solve the structure from NMR correlations. Use when dereplication returned no matches, the compound is known to be novel, or you want to solve the structure from first principles.
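This skill performs full Computer-Assisted Structure Elucidation (CASE) without dereplication. Use this when:
- Dereplication returned no matches
- The compound is known to be novel
- You want to solve the structure from first principles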
Reference: For NMR background, peak picking strategy, symmetry detection, dereplication scoring, LSD reference, and ranking interpretation, see the main skill document:
skill/SKILL.md
This skill focuses on the CASE procedure (step-by-step execution). The main skill document contains all shared domain knowledge.
lucy --version || pip install lucy-ng
lucy lsd check # Must show LSD and outlsd available
| Data | Essential? | Purpose |
|---|---|---|
| Molecular formula | YES | From user (HRMS) |
| 13C spectrum | YES | All carbon positions |
| HSQC | YES | Direct C-H correlations |
| HMBC | YES | Long-range correlations |
| DEPT-135 | Recommended | Multiplicities (CH, CH2, CH3) |
| COSY | Optional | H-H correlations |
Supervisor integration: When running under supervisor control, write CASE-PROGRESS.md after each LSD iteration (see Step 7c). This enables the supervisor to detect loops and provide diagnostic guidance.
mkdir -p analysis
Document all steps in analysis/ as you proceed.
Always ask the user:
"Please provide the molecular formula for this unknown compound (typically from HRMS)."
Calculate key values from formula:
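One of the key values is the degree of unsaturation (DBE), which the results template also reports. The sketch below is a minimal illustration, not part of lucy: the helper names `parse_formula` and `degrees_of_unsaturation` are hypothetical, and it assumes a plain formula with no brackets, charges, or isotopes and only C, H, N, O, S, and halogens.

```python
import re


def parse_formula(formula: str) -> dict:
    """Parse a simple formula such as 'C16H10N2O2' into element counts.

    Assumption: no parentheses, charges, or isotope labels.
    """
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def degrees_of_unsaturation(counts: dict) -> float:
    """DBE = (2C + 2 + N - H - X) / 2; divalent O and S do not contribute."""
    c = counts.get("C", 0)
    h = counts.get("H", 0)
    n = counts.get("N", 0)
    x = sum(counts.get(hal, 0) for hal in ("F", "Cl", "Br", "I"))
    return (2 * c + 2 + n - h - x) / 2


counts = parse_formula("C16H10N2O2")
print(counts["C"], "carbons,", counts["H"], "hydrogens")
print("DBE:", degrees_of_unsaturation(counts))
```

The carbon count from `parse_formula` is the "expected carbons" number used later in the symmetry check, and the hydrogen count is the budget the LSD atom definitions must add up to.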
for dir in */; do
  if [ -f "$dir/acqus" ]; then
    nuc=$(grep "##\$NUC1=" "$dir/acqus" | head -1)
    pp=$(grep "##\$PULPROG=" "$dir/acqus" | head -1)
    echo "Exp $dir: $nuc | $pp"
  fi
done
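If you prefer to assign the experiment type automatically, the pulse program strings printed by the loop can be classified with a small lookup. This is a sketch only: `classify_pulprog` is a hypothetical helper, the patterns are the pulse-program prefixes used in this document's experiment mapping, and the experiment labels assume typical Bruker naming, so check anything reported as "unknown" by hand.

```python
from fnmatch import fnmatch

# Order matters: "zg*" would also match "zgpg30", so the 13C patterns come first.
# The labels are assumptions based on common Bruker pulse program names.
PULPROG_PATTERNS = [
    ("zgdc*", "13C"),
    ("zgpg*", "13C"),
    ("dept*", "DEPT"),
    ("hsqc*", "HSQC"),
    ("hmbc*", "HMBC"),
    ("cosy*", "COSY"),
    ("zg*", "1H"),
]


def classify_pulprog(pulprog: str) -> str:
    """Map a PULPROG value (e.g. '<zgpg30>') to an experiment label."""
    name = pulprog.strip().strip("<>").lower()
    for pattern, label in PULPROG_PATTERNS:
        if fnmatch(name, pattern):
            return label
    return "unknown"


for value in ("<zg30>", "<zgpg30>", "<hsqcedetgpsisp2.3>", "<hmbcgplpndqf>"):
    print(value, "->", classify_pulprog(value))
```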
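Map experiments by pulse program (PULPROG):
| PULPROG pattern | Experiment |
|---|---|
| zg* | 1H |
| zgdc*, zgpg* | 13C |
| dept* | DEPT |
| hsqc* | HSQC |
| hmbc* | HMBC |
| cosy* | COSY |
Compare expected vs observed signals: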
lucy analyze symmetry <data_dir> <formula>
Or manually:
Document:
## Symmetry Analysis
- Expected carbons (from formula): X
- Observed 13C signals: Y
- Interpretation: [No symmetry / C2 symmetry / etc.]
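The manual comparison amounts to counting carbons in the formula and counting picked 13C signals. The sketch below fills in the documentation template from those two numbers. The function name `symmetry_summary` is hypothetical and the interpretation strings are a rough heuristic of my own, not the symmetry detection described in the main skill document.

```python
def symmetry_summary(expected_carbons: int, observed_shifts: list) -> str:
    """Fill in the Symmetry Analysis template from a carbon count and a peak list."""
    observed = len(observed_shifts)
    if observed == expected_carbons:
        interpretation = "No symmetry indicated"
    elif observed < expected_carbons:
        missing = expected_carbons - observed
        interpretation = (
            f"{missing} carbon(s) unaccounted for: possible symmetry or overlapping signals"
        )
    else:
        interpretation = "More signals than carbons: check for impurities or solvent peaks"
    return "\n".join(
        [
            "## Symmetry Analysis",
            f"- Expected carbons (from formula): {expected_carbons}",
            f"- Observed 13C signals: {observed}",
            f"- Interpretation: {interpretation}",
        ]
    )


print(symmetry_summary(16, [187.8, 152.5, 135.7, 124.7, 123.9, 121.0, 119.8, 113.2]))
```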
lucy pick 1d <13c_experiment>
Or from peaklist.xml if binary data is poor:
- Read the <Peak1D F1="..."/> tags
Document all peaks with proposed assignments:
| # | Shift (ppm) | Type (if known) |
|---|---|---|
| 1 | 187.8 | Carbonyl? |
| 2 | 152.5 | C-N? |
| ... | ... | ... |
Get raw HSQC peaks:
lucy pick hsqc <hsqc_exp> --format json
Apply DEPT-guided filtering (see skill/SKILL.md Section 3):
lucy pick 1d <dept135_exp> --format json
Document:
Get raw HMBC peaks:
lucy pick hmbc <hmbc_exp> --format json
Apply cross-validation filtering (see skill/SKILL.md Section 3):
Document all HMBC correlations:
| Carbon (ppm) | Proton (ppm) | Notes |
|---|---|---|
| 187.8 | 7.5 | Carbonyl to aromatic H |
| ... | ... | ... |
Write the LSD file directly using skill knowledge:
Reference:
Build the LSD file manually:
; LSD input for <FORMULA>
; Atom definitions (MULT atom# element hybridization H-count)
MULT 1 C 2 0 ; Carbonyl carbon, sp2, 0H (quaternary)
MULT 2 C 2 1 ; Aromatic CH, sp2, 1H
MULT 3 N 3 1 ; Amine nitrogen, sp3, 1H (NH)
MULT 4 O 2 0 ; Carbonyl oxygen, sp2, 0H
...
; HSQC correlations (MUST come before HMBC)
HSQC 2 2 ; C2 has H2 attached
HSQC 5 5 ; C5 has H5 attached
...
; HMBC correlations
HMBC 1 2 ; C1 correlates to H2
HMBC 1 5 ; C1 correlates to H5
...
; Heteroatom constraints (optional but helpful)
BOND 1 4 ; C1 bonded to O4 (carbonyl)
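When the atom list gets long, generating the file from Python lists can reduce typos. The sketch below is a hypothetical helper (`build_lsd` is not a lucy command) that emits only the commands shown in the template above (MULT, HSQC, HMBC, BOND) and keeps the HSQC lines ahead of the HMBC lines.

```python
def build_lsd(formula, atoms, hsqc, hmbc, bonds=()):
    """Return LSD input text.

    atoms: iterable of (index, element, hybridization, h_count)
    hsqc, hmbc, bonds: iterables of (atom_index, atom_index) pairs
    """
    lines = [
        f"; LSD input for {formula}",
        "; Atom definitions (MULT atom# element hybridization H-count)",
    ]
    for index, element, hybridization, h_count in atoms:
        lines.append(f"MULT {index} {element} {hybridization} {h_count}")
    lines.append("; HSQC correlations (MUST come before HMBC)")
    lines.extend(f"HSQC {c} {h}" for c, h in hsqc)
    lines.append("; HMBC correlations")
    lines.extend(f"HMBC {c} {h}" for c, h in hmbc)
    if bonds:
        lines.append("; Heteroatom constraints (optional but helpful)")
        lines.extend(f"BOND {a} {b}" for a, b in bonds)
    return "\n".join(lines) + "\n"


text = build_lsd(
    "<FORMULA>",
    atoms=[(1, "C", 2, 0), (2, "C", 2, 1), (3, "N", 3, 1), (4, "O", 2, 0)],
    hsqc=[(2, 2)],
    hmbc=[(1, 2)],
    bonds=[(1, 4)],
)
print(text)
```

Keeping the atoms and correlations as Python lists also makes it easy to write out the base file and then test variants with one HMBC line added at a time.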
Critical checks before running:
- No ELIM command on first run
CRITICAL: Do NOT add all HMBC correlations at once!
Adding too many HMBC correlations often leads to 0 solutions (over-constrained) due to:
Strategy: Gradually add HMBC correlations until solutions reach a minimum > 0
Workflow example:
# Start with base correlations
cp compound_base.lsd compound_test.lsd
LSD compound_test.lsd 2>&1 | grep solution
# → "47 solutions found"
# Add HMBC 4 9
echo "HMBC 4 9" >> compound_test.lsd
LSD compound_test.lsd 2>&1 | grep solution
# → "12 solutions found"
# Add HMBC 5 9
echo "HMBC 5 9" >> compound_test.lsd
LSD compound_test.lsd 2>&1 | grep solution
# → "1 solution found" ✓ IDEAL!
# If we add one more and get 0 solutions, remove it!
Tracking table (recommended):
| HMBC Count | Correlations Added | Solutions | Action |
|---|---|---|---|
| 5 | Base set | 47 | Add more |
| 7 | + C1→H7, C2→H10 | 12 | Add more |
| 8 | + C8→H10 | 6 | Add more |
| 9 | + C6→H9 | 6 | Add more |
| 10 | + C4→H9 | 5 | Add more |
| 11 | + C5→H9 | 1 | STOP - Ideal! |
| 12 | + C3→H4 | 0 | Remove last |
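The add-one-at-a-time strategy in the table can be sketched as a small loop. This is an illustration under stated assumptions: `add_hmbc_incrementally` is a hypothetical helper, and `count_solutions` is a callable you supply (for example one that writes the LSD file, runs LSD, and reads the solution count). It is injected rather than implemented here because the exact LSD output format is not specified in this document.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Correlation = Tuple[int, int]  # (carbon index, proton index), as in "HMBC 4 9"
Row = Tuple[int, Optional[Correlation], int, str]  # one tracking-table row


def add_hmbc_incrementally(
    base: Sequence[Correlation],
    candidates: Sequence[Correlation],
    count_solutions: Callable[[List[Correlation]], int],
) -> Tuple[List[Correlation], List[Row]]:
    """Add candidates one by one; drop any that give 0 solutions; stop at 1."""
    accepted = list(base)
    count = count_solutions(accepted)
    history: List[Row] = [(len(accepted), None, count, "Base set")]
    if count == 0:
        return accepted, history  # base set is already over-constrained
    for corr in candidates:
        if count == 1:
            break  # ideal: a single solution
        trial = accepted + [corr]
        trial_count = count_solutions(trial)
        if trial_count == 0:
            history.append((len(trial), corr, 0, "Remove last"))
            continue  # keep the previous accepted set
        accepted, count = trial, trial_count
        action = "STOP" if count == 1 else "Add more"
        history.append((len(accepted), corr, count, action))
    return accepted, history
```

Pass the candidates already sorted by priority, so the most trustworthy correlations are tried first. The returned history rows map directly onto the tracking table columns.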
Key principles:
Prioritize correlations by:
After EVERY LSD iteration (including the baseline run), append an iteration entry to CASE-PROGRESS.md in the compound's working directory. This file is read by the supervisor agent to monitor progress, detect loops, and provide diagnostic guidance.
First iteration: Create the file with header section:
# CASE Progress Log
**Compound:** <compound_path>
**Formula:** <molecular_formula>
**Started:** <timestamp>
Each iteration: Append a new section:
---
## Iteration N: <brief description>
**Time:** <timestamp>
**LSD file:** <filename>.lsd
**Solution count:** <count>
**Constraints added:**
- <constraint and reasoning>
**Constraints removed:**
- <constraint and reasoning> (or "None")
**Why:** <natural language explanation of strategy for this iteration>
**Constraint effectiveness:** <% reduction from previous, or "baseline", or "over-constrained (0 solutions)">
**Confidence:** <qualitative assessment: too many solutions / converging / stuck / etc.>
**HMBC correlations used:** X/Y
**Notes:**
- sp2 count: <N> (<even/odd>) <check/warning>
- H budget: <matches/mismatch>
- <other observations>
Rules:
For the complete format specification with examples, see skill/supervisor/SKILL.md Section 7.
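Appending the entry by script keeps the format consistent between iterations. The sketch below follows the fields listed above. The helper names, the ISO timestamp format, and the wording used when the previous run had 0 solutions are my assumptions, so defer to skill/supervisor/SKILL.md where they differ.

```python
from datetime import datetime
from pathlib import Path


def constraint_effectiveness(previous, current):
    """Describe the change in solution count relative to the previous iteration."""
    if previous is None:
        return "baseline"
    if current == 0:
        return "over-constrained (0 solutions)"
    if previous == 0:
        return "recovered from 0 solutions"  # assumed wording
    return f"{100.0 * (previous - current) / previous:.0f}% reduction from previous"


def append_iteration(log_path, compound, formula, n, description, lsd_file,
                     count, previous_count, added, removed, why, confidence,
                     hmbc_used, hmbc_total, notes):
    """Append one iteration entry, creating the header on first use."""
    log = Path(log_path)
    now = datetime.now().isoformat(timespec="seconds")
    parts = []
    if not log.exists():
        parts += [
            "# CASE Progress Log",
            f"**Compound:** {compound}",
            f"**Formula:** {formula}",
            f"**Started:** {now}",
        ]
    parts += [
        "---",
        f"## Iteration {n}: {description}",
        f"**Time:** {now}",
        f"**LSD file:** {lsd_file}",
        f"**Solution count:** {count}",
        "**Constraints added:**",
    ]
    parts += [f"- {item}" for item in added] or ["- None"]
    parts.append("**Constraints removed:**")
    parts += [f"- {item}" for item in removed] or ["- None"]
    parts += [
        f"**Why:** {why}",
        f"**Constraint effectiveness:** {constraint_effectiveness(previous_count, count)}",
        f"**Confidence:** {confidence}",
        f"**HMBC correlations used:** {hmbc_used}/{hmbc_total}",
        "**Notes:**",
    ]
    parts += [f"- {note}" for note in notes] or ["- None"]
    with log.open("a", encoding="utf-8") as handle:
        handle.write("\n".join(parts) + "\n")
```

Call it once after the baseline run with `previous_count=None`, then once after every later run with the previous solution count.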
lucy lsd run compound.lsd
Or directly:
LSD compound.lsd
For solution count interpretation and troubleshooting, see skill/SKILL.md Section 5 (LSD Reference).
outlsd 5 < compound.sol > solutions.smi
lucy lsd rank solutions.smi --spectrum <13c_exp>
# Or with shift list:
lucy lsd rank solutions.smi --shifts "187.8,152.5,135.7,..."
For MAE score interpretation and ranking guidance, see skill/SKILL.md Section 6 (Ranking and Prediction).
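To sanity-check a reported score by hand, the mean absolute error between observed and predicted 13C shifts can be computed directly. The sketch below is not how `lucy lsd rank` is implemented. It assumes both lists have the same length and pairs shifts by rank order after sorting, which is a simplification. The 3 ppm tolerance mirrors the "Within 3ppm" column of the ranking table.

```python
def mae_and_hits(observed, predicted, tolerance=3.0):
    """Return (MAE in ppm, number of carbons within tolerance).

    Assumption: shifts are paired by rank order after sorting both lists.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    obs = sorted(observed, reverse=True)
    pred = sorted(predicted, reverse=True)
    errors = [abs(o - p) for o, p in zip(obs, pred)]
    mae = sum(errors) / len(errors)
    hits = sum(1 for e in errors if e <= tolerance)
    return mae, hits


mae, hits = mae_and_hits([187.8, 152.5, 135.7], [185.0, 150.0, 140.0])
print(f"MAE = {mae:.2f} ppm, {hits}/3 within 3 ppm")
```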
After solving, use lucy lsd analyze to compute the actual J-coupling path lengths for all HMBC correlations:
lucy lsd analyze compound.sol compound.lsd
This command:
Example output:
Solution 2: 9× ²J 11× ³J (all ²J/³J, no ELIM needed)
HMBC Correlations:
-------------------------------------------------------
C# H# C (ppm) Path J-coupling
-------------------------------------------------------
1 7 131.29 1 ²J_CH
1 10 131.29 1 ²J_CH
2 7 124.71 2 ³J_CH
...
Interpretation:
JSON output for PDF generation:
lucy lsd analyze compound.sol compound.lsd --format json > analysis/j_coupling.json
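The path-length idea behind the J-coupling table can be illustrated with a breadth-first search over the solved bond graph. This is a sketch, not lucy's implementation: the bond list is a made-up example, and it assumes that the H index in an HMBC line refers to the heavy atom carrying that proton, so the coupling order is the heavy-atom path length plus one (path 1 gives ²J, path 2 gives ³J, as in the example output).

```python
from collections import deque


def bond_distance(bonds, start, goal):
    """Shortest number of bonds between two heavy atoms, or None if disconnected."""
    graph = {}
    for a, b in bonds:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        atom, dist = queue.popleft()
        if atom == goal:
            return dist
        for neighbour in graph.get(atom, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def hmbc_coupling_order(bonds, carbon, h_bearing_atom):
    """n in nJ_CH: heavy-atom path plus the final bond to the attached H."""
    path = bond_distance(bonds, carbon, h_bearing_atom)
    return None if path is None else path + 1


# Hypothetical four-atom chain 1-2-3-4
chain = [(1, 2), (2, 3), (3, 4)]
for c, h in [(1, 2), (1, 3), (1, 4)]:
    print(f"C{c} -> H{h}: {hmbc_coupling_order(chain, c, h)}J")
```

A correlation that comes out as ⁴J or longer is a candidate for re-checking before relaxing constraints with ELIM.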
Generate structure images with LSD atom numbering:
lucy lsd analyze compound.sol compound.lsd --draw solution_{n}.png
This generates a 2D structure image where each atom is labeled with its LSD index (C1, C2, ..., O11), making the HMBC table directly readable against the structure.
Generate publication-quality correlation diagrams with arrows:
For visualizing HMBC correlations directly on the structure with curved arrows and J-coupling labels:
# Generate correlation diagram with atom numbers and J-coupling labels
lucy visualize correlations \
--sol compound.sol \
--lsd-file compound.lsd \
--show-atom-numbers \
--show-j-coupling \
-o analysis/hmbc_diagram.svg
This creates a publication-quality SVG diagram showing:
Include the correlation diagram next to the HMBC table in your PDF report - it provides an immediate visual representation of how the HMBC correlations connect the molecular fragments.
## CASE Results
**Molecular Formula:** [formula]
**Degree of Unsaturation:** [DBE]
### Data Used
- 13C: [X] signals
- HSQC: [Y] correlations (Z protonated carbons)
- HMBC: [N] correlations
- Symmetry: [description]
### LSD Results
- Solutions found: [count]
- ELIM used: [Yes/No]
### Top Candidates
**Rank 1:** MAE = X.XX ppm ([Quality])
[SMILES]
- Key features: [description]
**Rank 2:** MAE = X.XX ppm ([Quality])
[SMILES]
- Differs from #1 in: [description]
### Confidence Assessment
[High/Medium/Low] - [reasoning]
### Recommendation
[Final structure proposal or need for additional data]
Always generate a PDF report with rendered structures and formatted tables at the end of every CASE analysis.
# Generate PDF report with structures and tables
python3 << 'EOF'
from rdkit import Chem
from rdkit.Chem import Draw, AllChem
from reportlab.lib import colors
from reportlab.lib.pagesizes import A4
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.lib.units import inch
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Image, Table, TableStyle
from reportlab.lib.enums import TA_CENTER
import io
# Create the PDF document
doc = SimpleDocTemplate(
    "analysis/CASE_Report.pdf",
    pagesize=A4,
    rightMargin=0.75*inch,
    leftMargin=0.75*inch,
    topMargin=0.75*inch,
    bottomMargin=0.75*inch
)
# Styles
styles = getSampleStyleSheet()
title_style = ParagraphStyle('CustomTitle', parent=styles['Heading1'],
                             fontSize=20, spaceAfter=30, alignment=TA_CENTER)
heading_style = ParagraphStyle('CustomHeading', parent=styles['Heading2'],
                               fontSize=14, spaceBefore=20, spaceAfter=10)
normal_style = styles['Normal']
story = []
# Title
story.append(Paragraph("CASE Structure Elucidation Report", title_style))
story.append(Spacer(1, 0.25*inch))
# Summary table
story.append(Paragraph("Summary", heading_style))
summary_data = [
    ["Molecular Formula", "<FORMULA>"],
    ["Molecular Weight", "<MW> Da"],
    ["Degree of Unsaturation (DBE)", "<DBE>"],
    ["LSD Solutions Found", "<COUNT>"],
]
summary_table = Table(summary_data, colWidths=[2.5*inch, 3*inch])
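summary_table.setStyle(TableStyle([
    ('BACKGROUND', (0, 0), (0, -1), colors.lightgrey),
    ('FONTNAME', (0, 0), (0, -1), 'Helvetica-Bold'),
    ('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
    # ReportLab has no combined 'PADDING' command; set each side explicitly
    ('LEFTPADDING', (0, 0), (-1, -1), 8),
    ('RIGHTPADDING', (0, 0), (-1, -1), 8),
    ('TOPPADDING', (0, 0), (-1, -1), 8),
    ('BOTTOMPADDING', (0, 0), (-1, -1), 8),
]))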
story.append(summary_table)
story.append(Spacer(1, 0.3*inch))
# 13C NMR Data Table
story.append(Paragraph("13C NMR Data", heading_style))
c13_data = [
    ["#", "Shift (ppm)", "Multiplicity", "Assignment"],
    # Add rows for each carbon signal:
    # ["1", "131.29", "C (quat)", "=C< olefinic"],
]
c13_table = Table(c13_data, colWidths=[0.4*inch, 1.2*inch, 1.2*inch, 2.5*inch])
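c13_table.setStyle(TableStyle([
    ('BACKGROUND', (0, 0), (-1, 0), colors.HexColor('#4472C4')),
    ('TEXTCOLOR', (0, 0), (-1, 0), colors.white),
    ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
    ('ALIGN', (0, 0), (-1, -1), 'CENTER'),
    ('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
    # ReportLab has no combined 'PADDING' command; set each side explicitly
    ('LEFTPADDING', (0, 0), (-1, -1), 6),
    ('RIGHTPADDING', (0, 0), (-1, -1), 6),
    ('TOPPADDING', (0, 0), (-1, -1), 6),
    ('BOTTOMPADDING', (0, 0), (-1, -1), 6),
]))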
story.append(c13_table)
story.append(Spacer(1, 0.3*inch))
# Structure rendering function
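def smiles_to_image(smiles, size=(400, 300)):
    """Render a SMILES string to an in-memory PNG for embedding in the PDF."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for SMILES it cannot parse
        raise ValueError(f"Could not parse SMILES: {smiles}")
    AllChem.Compute2DCoords(mol)
    img = Draw.MolToImage(mol, size=size)
    img_buffer = io.BytesIO()
    img.save(img_buffer, format='PNG')
    img_buffer.seek(0)
    return img_buffer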
# For each candidate structure:
story.append(Paragraph("Structure Candidates", heading_style))
# candidate_smiles = ["SMILES1", "SMILES2", ...]
# for i, smi in enumerate(candidate_smiles, 1):
# story.append(Paragraph(f"<b>Rank {i}:</b> {name}", normal_style))
# story.append(Paragraph(f"MAE: {mae} ppm | SMILES: {smi}", normal_style))
# img = smiles_to_image(smi)
# story.append(Image(img, width=3*inch, height=2.25*inch))
# story.append(Spacer(1, 0.2*inch))
# Ranking comparison table
story.append(Paragraph("Ranking Comparison", heading_style))
rank_data = [
    ["Rank", "Structure", "MAE (ppm)", "Quality", "Within 3ppm"],
    # ["1", "Name", "2.69", "Good", "6/10"],
]
rank_table = Table(rank_data, colWidths=[0.5*inch, 2.5*inch, 1*inch, 0.8*inch, 1*inch])
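rank_table.setStyle(TableStyle([
    ('BACKGROUND', (0, 0), (-1, 0), colors.HexColor('#4472C4')),
    ('TEXTCOLOR', (0, 0), (-1, 0), colors.white),
    ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
    ('ALIGN', (0, 0), (-1, -1), 'CENTER'),
    ('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
    # ReportLab has no combined 'PADDING' command; set each side explicitly
    ('LEFTPADDING', (0, 0), (-1, -1), 6),
    ('RIGHTPADDING', (0, 0), (-1, -1), 6),
    ('TOPPADDING', (0, 0), (-1, -1), 6),
    ('BOTTOMPADDING', (0, 0), (-1, -1), 6),
]))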
story.append(rank_table)
# Build PDF
doc.build(story)
print("PDF report generated: analysis/CASE_Report.pdf")
EOF
CRITICAL: Use data from the successful analysis
Do NOT re-pick peaks for the PDF. Extract all data directly from the LSD file that produced successful solutions. The LSD file contains the exact peaks and correlations that were used.
The PDF report must include complete tables of ALL data used:
Summary table — formula, MW, DBE, solution count, recommended structure
Complete 13C NMR table — ALL carbons used in the LSD file:
Complete HSQC table — ALL direct C-H correlations from the LSD file:
HMBC Correlation Diagram (placed ABOVE the HMBC table):
lucy visualize correlations --sol compound.sol --lsd-file compound.lsd \
--show-atom-numbers -o analysis/hmbc_diagram.svg
import cairosvg
cairosvg.svg2png(url='analysis/hmbc_diagram.svg',
write_to='analysis/hmbc_diagram.png', scale=2.0)
Complete HMBC table (placed BELOW the diagram) — ALL long-range correlations from the LSD file:
Use lucy lsd analyze to calculate path lengths, do NOT guess!
lucy lsd analyze compound.sol compound.lsd --format json > analysis/j_coupling.json
This parses the OUTLSD section and uses BFS to compute actual bond distances.
Use Paragraph() objects for cells with super/subscript. Use <super> and <sub> tags.
Excluded signals section - Document WHY certain peaks were not used:
Structure candidates — Rendered 2D images (RDKit) with SMILES and MAE scores
Ranking comparison table — All candidates with MAE, quality rating, carbons within tolerance
Recommended structure — Larger image with SMILES and InChI, plus reasoning if not Rank #1
Required dependencies:
CRITICAL: Install missing dependencies - do NOT fall back to suboptimal solutions (like text placeholders instead of images).
# Core PDF generation (RDKit should already be installed)
pip install reportlab
# SVG to PNG conversion for embedding diagrams in PDF
pip install cairosvg
# cairosvg requires the Cairo system library - install if not present:
# macOS:
brew install cairo
# Then run Python with the library path if needed:
# DYLD_LIBRARY_PATH=/opt/homebrew/opt/cairo/lib:$DYLD_LIBRARY_PATH python3 script.py
# Linux (Debian/Ubuntu):
# sudo apt-get install libcairo2-dev
# Linux (RHEL/CentOS):
# sudo yum install cairo-devel
Before generating the PDF, verify all dependencies are working:
# Test imports - if any fail, install the missing package
from reportlab.platypus import SimpleDocTemplate
from rdkit import Chem
from rdkit.Chem import Draw
import cairosvg # For SVG→PNG conversion
If cairosvg import fails with "no library called cairo", install the system Cairo library as shown above.
For detailed troubleshooting guidance, see skill/SKILL.md Section 5 (LSD Reference) and Section 6 (Ranking and Prediction).
Quick checklist for 0 solutions: sp2 count is EVEN, hydrogen count matches formula, HMBC correlations correct, only then try ELIM 1 0.
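The first two checklist items can be checked mechanically from the MULT lines. The sketch below uses a hypothetical helper, `check_lsd`, and handles only the plain five-field form `MULT atom# element hybridization H-count` shown in this document. MULT lines written in any other form are skipped, so treat the result as a first pass.

```python
def check_lsd(lsd_text: str, formula_h: int) -> dict:
    """Count sp2 atoms and hydrogens declared in plain MULT lines."""
    sp2_count = 0
    h_total = 0
    for raw in lsd_text.splitlines():
        fields = raw.split(";", 1)[0].split()  # drop trailing comments
        if len(fields) == 5 and fields[0] == "MULT":
            hybridization, h_count = fields[3], fields[4]
            if hybridization == "2":
                sp2_count += 1
            h_total += int(h_count)
    return {
        "sp2_count": sp2_count,
        "sp2_even": sp2_count % 2 == 0,
        "h_total": h_total,
        "h_matches": h_total == formula_h,
    }


example = """\
MULT 1 C 2 0 ; Carbonyl carbon, sp2, 0H (quaternary)
MULT 2 C 2 1 ; Aromatic CH, sp2, 1H
MULT 3 N 3 1 ; Amine nitrogen, sp3, 1H (NH)
MULT 4 O 2 0 ; Carbonyl oxygen, sp2, 0H
"""
print(check_lsd(example, formula_h=2))
```

For this four-atom fragment the sp2 count is odd, which is the kind of mismatch to resolve before reaching for ELIM.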
# Full workflow
mkdir -p analysis
lucy pick 1d ./2 # 13C peaks
lucy pick hsqc ./5 ./3 --dept90 ./4 # HSQC + multiplicities
lucy pick hmbc ./6 ./2 ./5 --dept135 ./3 # HMBC correlations
lucy lsd generate . C16H10N2O2 -o analysis/compound.lsd # Generate LSD input
cd analysis && LSD compound.lsd # Solve
outlsd 5 < compound.sol > solutions.smi # Convert to SMILES
lucy lsd rank solutions.smi --spectrum ../2 # Rank by 13C prediction
lucy lsd analyze compound.sol compound.lsd --draw structure_{n}.png # Analyze with numbered structures
# Generate PDF report (see Step 13 for full template)
IMPORTANT: Always generate a PDF report at the end of every CASE analysis (Step 13).