Systematic literature review and synthesis for research papers. Use when building comprehensive related-work sections, synthesizing findings across papers, or identifying research gaps. For searching individual papers use arxiv-database; for writing manuscript sections use scientific-writing.
This skill provides structured methodology for conducting systematic literature reviews, synthesizing findings across papers, and building comprehensive related-work sections. It covers search strategies, inclusion/exclusion criteria, thematic synthesis, and gap analysis.
Use this skill when:
- Building a comprehensive related-work section for a manuscript
- Synthesizing findings across multiple papers
- Identifying and articulating research gaps
## Review Protocol
### Research Questions
1. Primary: What approaches exist for incorporating geometric structure into attention mechanisms?
2. Secondary: How has gauge theory been applied in deep learning?
3. Secondary: What variational methods have been used for attention computation?
### Inclusion Criteria
- Peer-reviewed or reputable preprint (arXiv with >N citations)
- Published within relevant timeframe
- Addresses geometric/algebraic structure in neural networks
- Relevant to at least one: gauge theory, information geometry, VFE, attention theory
### Exclusion Criteria
- Application-only papers without methodological contribution
- Non-English publications
- Duplicate or superseded versions
### Search Databases
- arXiv (cs.LG, cs.CL, stat.ML, hep-th, math-ph)
- Google Scholar
- Semantic Scholar API
- ACL Anthology (for NLP-specific work)
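For the arXiv portion of the search, queries can be scripted against arXiv's public export API (the `build_arxiv_query` helper below is an illustrative sketch, not part of this skill's tooling; the arxiv-database skill handles actual retrieval):

```python
from urllib.parse import urlencode

def build_arxiv_query(terms, categories, max_results=100):
    """Build an arXiv export-API query URL from phrase terms and category filters."""
    # Quoted phrases OR'd together, restricted to the listed categories
    term_clause = ' OR '.join(f'all:"{t}"' for t in terms)
    cat_clause = ' OR '.join(f'cat:{c}' for c in categories)
    params = {
        'search_query': f'({term_clause}) AND ({cat_clause})',
        'start': 0,
        'max_results': max_results,
    }
    return 'http://export.arxiv.org/api/query?' + urlencode(params)

url = build_arxiv_query(['gauge equivariant', 'attention'], ['cs.LG', 'stat.ML'])
```

Keeping the query-construction step separate from retrieval makes the search strings easy to log in the review protocol for reproducibility.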
```python
def systematic_search(databases, queries, criteria):
    """Execute systematic search across databases."""
    results = {
        'identified': [],        # All papers found
        'screened': [],          # After title/abstract screening
        'eligible': [],          # After full-text screening
        'included': [],          # Final included set
        'excluded_reasons': {},  # Track exclusion reasons
    }
    return results
```
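One way to populate those stages is to express each criterion as a predicate that returns a pass/fail flag plus an exclusion reason. The sketch below is a minimal illustration with toy two-field paper records and hypothetical criteria drawn from the exclusion list above (non-English, application-only):

```python
def screen_papers(papers, passes_title_screen, passes_fulltext_screen):
    """Run papers through PRISMA-style stages, recording exclusion reasons."""
    results = {'identified': list(papers), 'screened': [], 'eligible': [],
               'included': [], 'excluded_reasons': {}}
    for paper in papers:
        ok, reason = passes_title_screen(paper)
        if not ok:
            results['excluded_reasons'][paper['id']] = reason
            continue
        results['screened'].append(paper)
        ok, reason = passes_fulltext_screen(paper)
        if not ok:
            results['excluded_reasons'][paper['id']] = reason
            continue
        results['eligible'].append(paper)
        results['included'].append(paper)
    return results

# Toy records exercising two exclusion criteria
papers = [
    {'id': 'a1', 'lang': 'en', 'methodological': True},
    {'id': 'a2', 'lang': 'de', 'methodological': True},
    {'id': 'a3', 'lang': 'en', 'methodological': False},
]
res = screen_papers(
    papers,
    lambda p: (p['lang'] == 'en', 'non-English'),
    lambda p: (p['methodological'], 'application-only'),
)
```

Tracking the reason alongside each exclusion makes it straightforward to report counts per criterion later.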
Organize papers into themes relevant to the Gauge-Transformer:
## Thematic Structure for GL(K) Attention Paper
### Theme 1: Geometric Deep Learning
- Equivariant neural networks (Cohen & Welling, 2016; Weiler et al., 2018)
- Gauge equivariant CNNs (Cohen et al., 2019)
- E(n)-equivariant networks (Satorras et al., 2021)
- **Gap**: No work applies gauge theory to attention/transformers specifically
### Theme 2: Information-Geometric Approaches
- Natural gradient methods (Amari, 1998)
- Fisher information in neural networks (Martens, 2020)
- Information geometry of attention (limited existing work)
- **Gap**: KL divergence as attention score is unexplored
### Theme 3: Variational Methods for Transformers
- Variational attention (Deng et al., 2018)
- Bayesian transformers (Wang et al., 2020)
- Free energy principle in ML (Friston et al., 2006; Millidge et al., 2021)
- **Gap**: VFE minimization as the core transformer objective
### Theme 4: Attention Mechanism Theory
- Attention as kernel methods (Tsai et al., 2019)
- Theoretical analysis of self-attention (Dong et al., 2021)
- Attention and alignment (Bahdanau et al., 2015)
- **Gap**: Lack of principled geometric foundation for attention
### Theme 5: Renormalization Group in ML
- RG and deep learning (Mehta & Schwab, 2014)
- Hierarchical coarse-graining in neural networks
- **Gap**: RG analysis of attention patterns across layers
Write flowing prose that connects papers thematically (use scientific-writing skill for actual prose generation):
## Synthesis Template
[Theme introduction — why this line of work matters]
[Foundational work — seminal papers that established the area]
[Key developments — how the field evolved]
[Current state — what recent work has achieved]
[Limitations/gaps — what remains unaddressed]
[Connection to our work — how the Gauge-Transformer addresses the gap]
| Method | Geometric Structure | Attention Type | Invariance | VFE |
|--------|-------------------|----------------|------------|-----|
| Standard Transformer | None | Dot-product | None | No |
| Gauge Equiv. CNN | Gauge fields | N/A (conv) | Gauge | No |
| Bayesian Transformer | Probabilistic | Learned | None | Partial |
| **Gauge-Transformer** | **GL(K) gauge** | **KL-divergence** | **GL(K)** | **Yes** |
## Research Gap Identification
### Methodological Gaps
1. No existing work combines gauge theory with attention mechanisms
2. KL divergence between belief distributions unused as attention score
3. VFE minimization not formulated as transformer objective
4. GL(K) invariance in attention is novel
### Empirical Gaps
1. No comparison of geometric vs. learned attention on language modeling
2. RG flow analysis of attention patterns not explored
3. Information-theoretic properties of attention under-measured
### Theoretical Gaps
1. Gauge-theoretic foundation for attention lacking
2. Connection between VFE and standard cross-entropy loss unexplored
3. Relationship between attention gauge invariance and generalization unknown
```python
def organize_bibliography(papers, themes):
    """Organize papers into thematic groups for the bibliography."""
    organized = {theme: [] for theme in themes}
    for paper in papers:
        for theme, keywords in themes.items():
            if any(kw.lower() in paper['abstract'].lower() for kw in keywords):
                organized[theme].append(paper)
    return organized

# Example usage for Gauge-Transformer
themes = {
    'geometric_dl': ['equivariant', 'gauge', 'geometric deep learning', 'fiber bundle'],
    'information_geometry': ['Fisher information', 'natural gradient', 'information geometry'],
    'variational': ['variational', 'free energy', 'ELBO', 'variational inference'],
    'attention_theory': ['attention mechanism', 'self-attention', 'transformer theory'],
    'rg_ml': ['renormalization', 'coarse-graining', 'multiscale'],
}
```
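Note that a paper can match several themes at once, since the test is a simple case-insensitive substring match against the abstract. A self-contained toy example (with an invented abstract and a trimmed theme dict) shows the behavior:

```python
# Trimmed theme dict and toy record; the matching rule mirrors
# organize_bibliography's case-insensitive substring test
themes = {
    'geometric_dl': ['equivariant', 'gauge'],
    'variational': ['variational', 'free energy'],
}
paper = {'abstract': 'We propose gauge equivariant attention via variational inference.'}
matched = [t for t, kws in themes.items()
           if any(kw.lower() in paper['abstract'].lower() for kw in kws)]
```

Here `matched` contains both themes, so multi-theme papers may need a manual pass to decide where they are discussed in the related-work section.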
```python
def merge_bibtex_entries(entries, output_path='references.bib'):
    """Merge and deduplicate BibTeX entries by citation key."""
    seen_keys = set()
    unique_entries = []
    for entry in entries:
        # Citation key sits between the first '{' and the first ','
        key = entry.split('{', 1)[1].split(',', 1)[0].strip()
        if key not in seen_keys:
            seen_keys.add(key)
            unique_entries.append(entry)
    with open(output_path, 'w') as f:
        f.write('\n\n'.join(unique_entries))
    return len(unique_entries)
```
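Key-based deduplication misses the common case where two databases export the same paper under different citation keys. A complementary pass that fingerprints normalized titles catches these; the sketch below is one possible heuristic, with invented toy records:

```python
import re

def title_fingerprint(title):
    """Lowercase and strip punctuation so minor formatting differences collapse."""
    return re.sub(r'[^a-z0-9]+', ' ', title.lower()).strip()

def dedupe_by_title(papers):
    """Keep the first paper seen for each title fingerprint."""
    seen, unique = set(), []
    for p in papers:
        fp = title_fingerprint(p['title'])
        if fp not in seen:
            seen.add(fp)
            unique.append(p)
    return unique

# Toy records: first two differ only in hyphenation and punctuation
papers = [
    {'title': 'Gauge Equivariant CNNs'},
    {'title': 'Gauge-Equivariant CNNs.'},
    {'title': 'Attention Is All You Need'},
]
unique = dedupe_by_title(papers)
```

Fingerprinting is deliberately lossy, so borderline collisions (e.g. versioned titles) are best reviewed by hand before dropping an entry.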
If conducting a formal systematic review, follow PRISMA guidelines: report paper counts at each stage (identification, screening, eligibility, inclusion) and document exclusion reasons in a flow diagram.

The methodology itself requires no additional dependencies; for automated searching, see the arxiv-database skill.