Name: Openakita Skills Summarizer
Author: openakita


# For advanced HTML parsing (optional)
pip install beautifulsoup4 requests

# For PDF text extraction (optional)
pip install PyPDF2
# or
pip install pdfplumber

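# For YouTube transcript extraction (optional)
pip install youtube-transcript-api

# For DOCX text extraction (optional)
pip install python-docx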

Input Type	How to Provide	Notes
URL (webpage)	Paste the URL	HTML content extracted automatically
URL (YouTube)	Paste YouTube link	Transcript extracted via API
Local file (text)	File path	`.txt`, `.md`, `.rst`, `.csv`
Local file (PDF)	File path	Requires PyPDF2 or pdfplumber
Local file (HTML)	File path	Parsed with BeautifulSoup
Local file (DOCX)	File path	Requires python-docx
Raw text	Paste directly	Any length
Clipboard	"Summarize my clipboard"	If clipboard access available

Input Analysis:
1. Is it a YouTube link? → Extract transcript
2. Is it any other URL? → Fetch the content
3. Is it a file path? → Read the file
4. Is it raw text? → Use directly
5. Is it multiple sources? → Process each, then combine

import re

def classify_input(text: str) -> str:
    """Classify the input type."""
    text = text.strip()

    # YouTube URLs (Shorts links live on youtube.com, so two hosts cover them)
    youtube_pattern = r'(youtube\.com|youtu\.be)'
    if re.search(youtube_pattern, text):
        return 'youtube'

    # Bilibili URLs
    if 'bilibili.com' in text or 'b23.tv' in text:
        return 'bilibili'

    # General URLs
    if re.match(r'https?://', text):
        return 'url'

    # File paths (the extension match is case-insensitive, so report.PDF counts)
    if text.lower().endswith(('.pdf', '.txt', '.md', '.html', '.docx', '.rst', '.csv')):
        return 'file'

    # Raw text
    return 'text'
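
Step 5 of the input analysis (multiple sources) is not handled by `classify_input`, which looks at a single string. A minimal sketch of one way to split pasted input before classifying each piece, assuming sources arrive one per line; `split_sources` is a hypothetical helper, not part of the skill:

```python
def split_sources(raw: str) -> list[str]:
    """Split pasted input into individual sources, one per non-empty line.

    Assumption: a line counts as a source only if it is a URL or contains no
    spaces (a bare path). If any line looks like prose, the whole input is
    returned as a single raw-text source.
    """
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    looks_like_sources = all(
        line.startswith(('http://', 'https://')) or ' ' not in line
        for line in lines
    )
    if len(lines) > 1 and looks_like_sources:
        return lines
    return [raw.strip()]
```

Each returned item can then be classified on its own. File paths that contain spaces would be treated as prose under this assumption, so a stricter rule may be needed in practice.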

from bs4 import BeautifulSoup
import requests

def extract_url_content(url: str) -> dict:
    """Extract main content from a URL."""
    response = requests.get(url, headers={
        'User-Agent': 'Mozilla/5.0 (compatible; ContentSummarizer/1.0)'
    }, timeout=30)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, 'html.parser')

    # Remove script, style, nav, footer elements
    for tag in soup(['script', 'style', 'nav', 'footer', 'header', 'aside']):
        tag.decompose()

    # Try to find the main article content
    article = soup.find('article') or soup.find('main') or soup.find('body')

    title = soup.find('title')
    title_text = title.get_text().strip() if title else 'Untitled'

    return {
        'title': title_text,
        'text': article.get_text(separator='\n', strip=True) if article else '',
        'url': url
    }
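
Repeated fetches can run into rate limits, which the outline lists among the common pitfalls. A minimal sketch of retrying with exponential backoff, assuming any exception is worth retrying; `retry_with_backoff` and its parameters are hypothetical names, and the `sleep` argument exists only so the delay can be stubbed out:

```python
import time
from typing import Callable, TypeVar

T = TypeVar('T')


def retry_with_backoff(
    fetch: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on any exception with exponentially growing delays."""
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))  # base, 2x base, 4x base, ...
    raise AssertionError('unreachable')
```

A call such as `retry_with_backoff(lambda: extract_url_content(url))` would wrap the fetch above. A production version would probably retry only on specific HTTP status codes rather than on every exception.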

from pathlib import Path

def extract_file_content(filepath: str) -> dict:
    """Extract text from various file formats."""
    path = Path(filepath)
    suffix = path.suffix.lower()

    if suffix in ('.txt', '.md', '.rst', '.csv'):
        text = path.read_text(encoding='utf-8')
        return {'title': path.name, 'text': text, 'format': suffix}

    elif suffix == '.pdf':
        return extract_pdf(filepath)

    elif suffix == '.html':
        text = path.read_text(encoding='utf-8')
        soup = BeautifulSoup(text, 'html.parser')
        for tag in soup(['script', 'style']):
            tag.decompose()
        return {
            'title': path.name,
            'text': soup.get_text(separator='\n', strip=True),
            'format': 'html'
        }

    elif suffix == '.docx':
        return extract_docx(filepath)

    else:
        # Try reading as plain text
        try:
            text = path.read_text(encoding='utf-8')
            return {'title': path.name, 'text': text, 'format': 'unknown'}
        except UnicodeDecodeError:
            raise ValueError(f"Cannot read binary file: {filepath}")


def extract_pdf(filepath: str) -> dict:
    """Extract text from PDF using available libraries."""
    try:
        import pdfplumber
        with pdfplumber.open(filepath) as pdf:
            pages = [page.extract_text() or '' for page in pdf.pages]
            return {
                'title': Path(filepath).name,
                'text': '\n\n'.join(pages),
                'format': 'pdf',
                'pages': len(pdf.pages)
            }
    except ImportError:
        pass

    try:
        from PyPDF2 import PdfReader
        reader = PdfReader(filepath)
        pages = [page.extract_text() or '' for page in reader.pages]
        return {
            'title': Path(filepath).name,
            'text': '\n\n'.join(pages),
            'format': 'pdf',
            'pages': len(reader.pages)
        }
    except ImportError:
        raise RuntimeError("Install pdfplumber or PyPDF2 to read PDFs: pip install pdfplumber")


def extract_docx(filepath: str) -> dict:
    """Extract text from DOCX files."""
    try:
        from docx import Document
        doc = Document(filepath)
        paragraphs = [p.text for p in doc.paragraphs if p.text.strip()]
        return {
            'title': Path(filepath).name,
            'text': '\n\n'.join(paragraphs),
            'format': 'docx'
        }
    except ImportError:
        raise RuntimeError("Install python-docx to read DOCX files: pip install python-docx")

from youtube_transcript_api import YouTubeTranscriptApi

def extract_youtube_content(url: str) -> dict:
    """Extract transcript from YouTube video."""
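    video_id = extract_video_id(url)  # Helper defined in the youtube-summarizer skill
    # Note: get_transcript() is the older static API. The 1.x releases of
    # youtube-transcript-api deprecate it in favor of
    # YouTubeTranscriptApi().fetch(video_id, languages=[...]); check the installed version.
    transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=['en', 'zh-Hans', 'ja'])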
    text = ' '.join(entry['text'] for entry in transcript)
    return {
        'title': f'YouTube Video {video_id}',
        'text': text,
        'format': 'youtube',
        'segments': transcript
    }
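
`extract_video_id` is called above but lives in another skill. A minimal stand-in, assuming only the common URL shapes (`watch?v=`, `youtu.be/`, `/shorts/`, `/embed/`, `/live/`) need to be handled; the real helper may differ:

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url: str) -> str:
    """Return the video ID from common YouTube URL shapes (hypothetical stand-in)."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or '').lower()
    path_parts = [part for part in parsed.path.split('/') if part]

    if host == 'youtu.be' and path_parts:
        return path_parts[0]  # https://youtu.be/<id>

    if host == 'youtube.com' or host.endswith('.youtube.com'):
        query_id = parse_qs(parsed.query).get('v')
        if query_id:
            return query_id[0]  # https://www.youtube.com/watch?v=<id>
        if len(path_parts) >= 2 and path_parts[0] in ('shorts', 'embed', 'live'):
            return path_parts[1]  # /shorts/<id>, /embed/<id>, /live/<id>

    raise ValueError(f'Could not find a video ID in: {url}')
```

Parsing the URL instead of matching a regular expression keeps tracking parameters such as `&t=10` out of the returned ID.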

# Summary: [Title]

**Source**: [URL or filename]
**Length**: ~X words / X pages / X minutes

## Key Points
• [Most important finding/conclusion]
• [Second key point]
• [Third key point]
• [Fourth key point — include specific numbers/data if available]
• [Fifth key point]

## Notable Details
• [Interesting data point or quote]
• [Counter-argument or limitation mentioned]

Summarize the following content into 5-8 bullet points. Each bullet should:
- Be self-contained (understandable without reading the full text)
- Include specific numbers, names, or dates when relevant
- Be ordered by importance (most important first)
- Be concise (1-2 sentences max)

Content:
{content}

# Executive Summary: [Title]

**Source**: [URL/file] | **Date**: [if available] | **Read time**: ~X min

## Bottom Line
[1-2 sentences: the single most important takeaway]

## Context
[2-3 sentences: why this matters, background]

## Key Findings
1. [Finding with supporting data]
2. [Finding with supporting data]
3. [Finding with supporting data]

## Implications
[What this means for the reader/team/organization]

## Recommended Actions
1. [Action item]
2. [Action item]

Write an executive summary of the following content. Target audience: busy decision-makers
who need to understand the core message in under 2 minutes.

Structure:
1. Bottom Line (1-2 sentences — what's the one thing they need to know?)
2. Context (2-3 sentences — why does this matter?)
3. Key Findings (3-5 numbered points with data)
4. Implications (what this means going forward)
5. Recommended Actions (concrete next steps)

Content:
{content}

# Detailed Notes: [Title]

**Source**: [URL/file]
**Summary date**: [today]
**Original length**: ~X words

## Overview
[3-5 sentence comprehensive overview]

## Section 1: [Topic]
[Detailed notes preserving key information, quotes, data]
- Sub-point with specifics
- Sub-point with specifics

## Section 2: [Topic]
[Detailed notes]

## Section 3: [Topic]
[Detailed notes]

## Key Quotes
> "[Exact quote]" — [Source/Author]
> "[Exact quote]" — [Source/Author]

## Data & Statistics
| Metric | Value | Context |
|---|---|---|
| [metric] | [value] | [context] |

## References & Links
- [Reference mentioned in the content]

# Extracted Content: [Title]

**Source**: [URL/file]
**Extracted**: [timestamp]
**Word count**: X

---

[Full extracted text in clean markdown]
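
The extract-only template needs no model call, so it can be filled in directly from the extracted text. A minimal sketch, assuming a UTC timestamp and a whitespace-based word count; `render_extract` is a hypothetical name:

```python
from datetime import datetime, timezone


def render_extract(title: str, source: str, text: str) -> str:
    """Fill the extract-only template with a UTC timestamp and a word count."""
    # Whitespace-delimited count; this undercounts languages written without spaces
    word_count = len(text.split())
    timestamp = datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M UTC')
    return (
        f'# Extracted Content: {title}\n\n'
        f'**Source**: {source}\n'
        f'**Extracted**: {timestamp}\n'
        f'**Word count**: {word_count}\n\n'
        '---\n\n'
        f'{text}\n'
    )
```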

Parameter	Options	Default
Format	bullet, executive, detailed, extract-only	bullet
Length	brief, short, medium, detailed	medium
Language	Output language code	Same as source
Focus	Specific topic/aspect to emphasize	None (general)
Audience	technical, general, executive, academic	general
Include quotes	yes/no	yes for detailed
Include data	yes/no	yes
Max points	Number of bullet points	8
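
The options table can be mirrored in a small validated container. A minimal sketch, assuming "yes for detailed" means quotes default to on only when the format is `detailed`; `SummaryOptions` and its field names are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

FORMATS = ('bullet', 'executive', 'detailed', 'extract-only')
LENGTHS = ('brief', 'short', 'medium', 'detailed')
AUDIENCES = ('technical', 'general', 'executive', 'academic')


@dataclass
class SummaryOptions:
    """Hypothetical container for the options table; defaults mirror its last column."""
    format: str = 'bullet'
    length: str = 'medium'
    language: Optional[str] = None         # None means "same as source"
    focus: Optional[str] = None            # None means no particular emphasis
    audience: str = 'general'
    include_quotes: Optional[bool] = None  # None resolves to True only for 'detailed'
    include_data: bool = True
    max_points: int = 8

    def __post_init__(self) -> None:
        # Reject values outside the documented option lists
        for name, allowed in (('format', FORMATS), ('length', LENGTHS), ('audience', AUDIENCES)):
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f'{name} must be one of {allowed}, got {value!r}')
        if self.max_points < 1:
            raise ValueError('max_points must be at least 1')
        if self.include_quotes is None:
            self.include_quotes = self.format == 'detailed'
```

For example, `SummaryOptions(format='executive', audience='executive')` keeps every other default from the table.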

def chunk_text(text: str, max_chars: int = 30000) -> list[str]:
    """Split text into manageable chunks at paragraph boundaries."""
    paragraphs = text.split('\n\n')
    chunks = []
    current = []
    current_len = 0

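    for para in paragraphs:
        # +2 accounts for the '\n\n' separator added when the chunk is joined
        added = len(para) + (2 if current else 0)
        if current and current_len + added > max_chars:
            chunks.append('\n\n'.join(current))
            current = []
            current_len = 0
            added = len(para)
        # A single paragraph longer than max_chars still becomes its own oversized chunk
        current.append(para)
        current_len += added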

    if current:
        chunks.append('\n\n'.join(current))

    return chunks
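
Chunking only helps if the partial results are combined afterwards. A minimal sketch of a summarize-then-merge pass over the chunks, assuming the model call is supplied as a plain function; `summarize_chunks` and the `summarize` callable are hypothetical names:

```python
from typing import Callable


def summarize_chunks(chunks: list[str], summarize: Callable[[str], str]) -> str:
    """Summarize each chunk, then summarize the joined partial summaries."""
    if not chunks:
        return ''
    if len(chunks) == 1:
        return summarize(chunks[0])  # short input: one pass is enough
    partials = [summarize(chunk) for chunk in chunks]  # map step
    return summarize('\n\n'.join(partials))            # merge step
```

If the joined partial summaries are themselves too long for the model, the same function could be applied again to chunks of the partials.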

def read_with_fallback(filepath: str) -> str:
    """Read file trying multiple encodings."""
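    # Ordered from strictest to most permissive. latin-1 accepts any byte
    # sequence, so it is a last resort and may return mojibake instead of failing.
    encodings = ['utf-8', 'utf-8-sig', 'gb2312', 'gbk', 'gb18030', 'shift-jis', 'latin-1']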
    for enc in encodings:
        try:
            with open(filepath, 'r', encoding=enc) as f:
                return f.read()
        except (UnicodeDecodeError, UnicodeError):
            continue
    raise ValueError(f"Cannot decode {filepath} with any known encoding")

Model Capability	Recommended Use
Large context window (100K+)	Full document summarization in one pass
Standard context (8K-32K)	Chunked processing with merge step
Fast inference	Batch processing of multiple sources
Multi-language	Cross-language summary generation
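
The choice between a single pass and chunked processing can be made from a rough size estimate. A minimal sketch, assuming about 4 characters per token (a loose heuristic for English text) and 25% of the window reserved for the prompt and the output; `pick_strategy` and both constants are assumptions, not measured values:

```python
def pick_strategy(text: str, context_tokens: int, chars_per_token: float = 4.0) -> str:
    """Return 'single-pass' if the text likely fits the context window, else 'chunked'."""
    estimated_tokens = len(text) / chars_per_token
    budget = context_tokens * 0.75  # leave headroom for the prompt and the output
    return 'single-pass' if estimated_tokens <= budget else 'chunked'
```

Text in languages that tokenize less efficiently would need a smaller `chars_per_token`; a real tokenizer gives a more reliable count when one is available.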

Openakita Skills Summarizer

Universal Content Summarizer

When to Use This Skill

Prerequisites

Core Dependencies

For URL Content Extraction

For YouTube Videos

Supported Input Types

Instructions

Step 1: Identify the Content Source

Step 2: Extract Content

From URLs (Webpages)

From Local Files

From YouTube Videos

Step 3: Generate the Summary

Output Formats

Format 1: Bullet Points (Default)

Format 2: Executive Summary

Format 3: Detailed Notes

Format 4: Extract Only (No Summarization)

Workflows

Workflow 1: Quick URL Summary

Workflow 2: PDF Summary

Workflow 3: Custom Format Summary

Workflow 4: Multi-Source Synthesis

Workflow 5: Configurable Length

Workflow 6: Content Extraction Only

Workflow 7: YouTube/Video Summary

Configurable Options

Common Pitfalls

1. Paywalled or Login-Required Content

2. JavaScript-Rendered Content

3. Very Long Content

4. Non-Text Content

5. Encoding Issues

6. Summarization Quality

7. Rate Limits on URL Fetching

Multi-AI Model Support
