Name: Paper Reading Assistant
Author: wentorai

Paper Reading Assistant | Skills Pool

# Paper Notes: [Short Title]

## Metadata
- **Title**: Full title
- **Authors**: First Author et al. (Year)
- **Venue**: Conference/Journal
- **DOI/URL**: link
- **Date read**: YYYY-MM-DD

## Summary (2-3 sentences)
What does this paper do, and what are the main findings?

## Problem
What problem does this paper address? Why is it important?

## Method
How do they approach the problem? Key technical details.

## Key Results
- Result 1: ...
- Result 2: ...
- Result 3: ...

## Strengths
- Strength 1: ...
- Strength 2: ...

## Weaknesses / Limitations
- Weakness 1: ...
- Weakness 2: ...

## Questions / Things I Don't Understand
- Question 1: ...

## Relevance to My Work
How does this connect to my research? What can I use?

## Key References to Follow Up
- [Author, Year] - Why it seems relevant
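
The title and Metadata block of this template can be generated rather than typed by hand. A minimal sketch, assuming a hypothetical `render_notes_header` helper (not part of the original skill) and placeholder values:

```python
from datetime import date


def render_notes_header(short_title, title, authors, year, venue, url, read_on=None):
    """Hypothetical helper: fill the title and Metadata block of the note template."""
    read_on = read_on or date.today()
    return "\n".join([
        f"# Paper Notes: {short_title}",
        "",
        "## Metadata",
        f"- **Title**: {title}",
        f"- **Authors**: {authors} ({year})",
        f"- **Venue**: {venue}",
        f"- **DOI/URL**: {url}",
        f"- **Date read**: {read_on.isoformat()}",
    ])


# Placeholder values only; substitute the paper's real metadata.
header = render_notes_header(
    "Short Title", "Full title", "First Author et al.", 2024,
    "Conference/Journal", "link", read_on=date(2024, 1, 15),
)
print(header)
```

The remaining sections (Summary, Problem, Method, and so on) are left for manual completion, since they record the reader's own judgment.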

# Prompt template for paper summarization
summarize_prompt = """Read the following academic paper and provide:

1. ONE-SENTENCE SUMMARY: The core contribution in a single sentence.

2. KEY FINDINGS (3-5 bullet points):
   - Finding 1 with specific numbers/results
   - Finding 2 ...

3. METHODOLOGY: Describe the approach in 2-3 sentences.

4. LIMITATIONS: List 2-3 limitations, whether or not the authors acknowledge them.

5. RELEVANCE: How does this relate to [your research topic]?

Paper text:
{paper_text}
"""

# Prompt for critical analysis
critique_prompt = """Analyze the following paper critically:

1. VALIDITY: Are the experimental design and statistical analyses sound?
   Identify any threats to internal/external validity.

2. NOVELTY: What is genuinely new? What is incremental?

3. REPRODUCIBILITY: Could you replicate this study from the description given?
   What information is missing?

4. ALTERNATIVE EXPLANATIONS: Are there alternative interpretations
   of the results that the authors do not consider?

5. FOLLOW-UP QUESTIONS: What would you want to investigate next?

Paper text:
{paper_text}
"""

import fitz  # PyMuPDF

def extract_paper_text(pdf_path):
    """Extract structured text from an academic paper PDF."""
    doc = fitz.open(pdf_path)
    sections = []
    current_section = {"heading": "Preamble", "text": ""}

    for page in doc:
        blocks = page.get_text("dict")["blocks"]
        for block in blocks:
            # Image blocks carry no "lines" key; skip them
            if "lines" not in block:
                continue
            for line in block["lines"]:
                spans = line["spans"]
                if not spans:
                    continue  # max() below would raise on an empty line
                text = "".join(span["text"] for span in spans)
                font_size = max(span["size"] for span in spans)
                is_bold = any("Bold" in span.get("font", "") for span in spans)

                # Heuristic: detect section headings
                if is_bold and font_size > 11 and len(text.strip()) < 80:
                    if current_section["text"].strip():
                        sections.append(current_section)
                    current_section = {"heading": text.strip(), "text": ""}
                else:
                    current_section["text"] += text + " "

    if current_section["text"].strip():
        sections.append(current_section)

    doc.close()
    return sections

# Extract and display
sections = extract_paper_text("paper.pdf")
for s in sections:
    print(f"\n## {s['heading']}")
    print(s['text'][:200] + "...")

import os
import json

def process_paper_batch(pdf_dir, output_file):
    """Process a batch of papers and save structured notes."""
    results = []

    # Sort for a deterministic processing order; match ".PDF" as well
    for filename in sorted(os.listdir(pdf_dir)):
        if not filename.lower().endswith(".pdf"):
            continue

        pdf_path = os.path.join(pdf_dir, filename)
        sections = extract_paper_text(pdf_path)

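        # Title heuristic: the first detected heading. "Preamble" means no
        # heading preceded the opening text, so fall back to the filename.
        title = sections[0]["heading"] if sections else filename
        if title == "Preamble":
            title = filename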

        # Find abstract
        abstract = ""
        for s in sections:
            if "abstract" in s["heading"].lower():
                abstract = s["text"].strip()
                break

        results.append({
            "filename": filename,
            "title": title,
            "abstract": abstract,
            "num_sections": len(sections),
            "total_chars": sum(len(s["text"]) for s in sections)
        })

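    # Write UTF-8 explicitly so non-ASCII titles survive on any platform
    with open(output_file, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2, ensure_ascii=False)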

    return results
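
The abstract lookup in the batch function only succeeds when the heading heuristic finds a section whose heading contains "abstract", so some records will likely come back with an empty abstract. A minimal sketch for listing those files for manual review, assuming a hypothetical `missing_abstracts` helper and hand-written sample records in the same shape as the batch output:

```python
def missing_abstracts(records):
    """Return filenames whose abstract was not detected."""
    return [r["filename"] for r in records if not r["abstract"].strip()]


# Hand-written sample records; real ones come from process_paper_batch
sample_records = [
    {"filename": "a.pdf", "title": "a.pdf", "abstract": "", "num_sections": 3, "total_chars": 900},
    {"filename": "b.pdf", "title": "Some Title", "abstract": "We study X.", "num_sections": 7, "total_chars": 4200},
]
print(missing_abstracts(sample_records))
```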

| Tool | Platform | Highlights | PDF Annotation | AI Features | Collaboration |
|------|----------|------------|----------------|-------------|---------------|
| Zotero + ZotFile | All | Reference management + PDF | Yes | No (plugins available) | Group libraries |
| Paperpile | Web/Chrome | Google Docs integration | Yes | No | Shared folders |
| ReadCube Papers | All | Smart citations | Yes | Recommendations | Shared libraries |
| Semantic Reader | Web | AI-augmented reading | Yes | Inline explanations, TLDRs | No |
| Elicit | Web | AI paper search | No | Automated extraction | Tables |
| Scholarcy | Web | Flashcard summaries | Yes | Auto-summarization | No |

| Paper Type | Focus On | Time Budget |
|------------|----------|-------------|
| Seminal paper | Full three-pass reading, understand every detail | 3-4 hours |
| Survey/review | Section headings, taxonomy, open questions | 1-2 hours |
| Methods paper | Algorithm/procedure sections, pseudocode, evaluation | 1-2 hours |
| Results paper | Figures, tables, statistical tests, effect sizes | 30-60 min |
| Position paper | Arguments, assumptions, counterarguments | 30-60 min |
| Related work (peripheral) | Abstract + conclusion only (Pass 1) | 5-10 min |
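
The time budgets in the table above can be summed to estimate how long a reading queue will take. A minimal sketch, assuming hypothetical short keys for each paper type and a hypothetical `plan_budget` helper (the minute ranges are transcribed from the table):

```python
# Time budgets in minutes (low, high), transcribed from the table above.
# The short keys are illustrative labels, not names from the original skill.
TIME_BUDGET_MIN = {
    "seminal": (180, 240),
    "survey": (60, 120),
    "methods": (60, 120),
    "results": (30, 60),
    "position": (30, 60),
    "peripheral": (5, 10),
}


def plan_budget(paper_types):
    """Return the (low, high) total reading time in minutes for a queue."""
    low = sum(TIME_BUDGET_MIN[t][0] for t in paper_types)
    high = sum(TIME_BUDGET_MIN[t][1] for t in paper_types)
    return low, high


low, high = plan_budget(["seminal", "results", "peripheral"])
print(f"{low}-{high} minutes")  # 215-310 minutes
```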

Paper Reading Assistant

The Three-Pass Reading Method

Pass 1: Survey (5-10 minutes)

Pass 2: Comprehension (30-60 minutes)

Pass 3: Recreation (1-4 hours)

Structured Note-Taking Template

AI-Assisted Paper Analysis

Summarization Prompts

PDF Processing Pipeline

Batch Paper Processing

Annotation Tools Comparison

Reading Strategies by Paper Type

Building a Paper Reading Habit
