Interview management, transcription workflows, and source note-taking for journalists. Use when preparing for interviews, managing recordings, transcribing audio/video, organizing source notes, creating timestamped references, or building interview databases. Essential for reporters conducting interviews and managing source relationships.
Practical workflows for journalists managing interviews from preparation through publication.
Before recording starts, you should already know:
## Source prep for: [Name]
### Background
- Role/title:
- Organization:
- Why they're relevant to this story:
- Previous media appearances (note inconsistencies):
### Key questions (prioritized)
1. [Must-ask question]
2. [Must-ask question]
3. [If time permits]
### Documents to reference
- [ ] Bring/share [specific document]
- [ ] Ask about [specific claim/data point]
### Red lines
- Topics they'll likely avoid:
- Sensitive areas to approach carefully:
```python
# Standard recording configuration
RECORDING_SETTINGS = {
    'format': 'wav',        # Lossless for transcription
    'sample_rate': 44100,   # Standard quality
    'channels': 1,          # Mono is fine for speech
    'backup': True,         # Always run a backup recorder
}

# File naming convention:
#   YYYY-MM-DD_source-lastname_topic.wav
# Example: 2024-03-15_smith_budget-hearing.wav
```
**Two-device rule**: Always record on two devices; a phone as a backup is the minimum.
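The naming convention above can be enforced with a small helper; a sketch (the function name is illustrative, not part of any library):

```python
from datetime import date
from typing import Optional
import re

def recording_filename(source_lastname: str, topic: str,
                       on: Optional[date] = None, ext: str = 'wav') -> str:
    """Build a YYYY-MM-DD_source-lastname_topic name per the convention above."""
    on = on or date.today()

    def slug(s: str) -> str:
        # Lowercase, collapse non-alphanumerics to hyphens
        return re.sub(r'[^a-z0-9]+', '-', s.lower()).strip('-')

    return f"{on.isoformat()}_{slug(source_lastname)}_{slug(topic)}.{ext}"

print(recording_filename('Smith', 'Budget Hearing', date(2024, 3, 15)))
# 2024-03-15_smith_budget-hearing.wav
```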
```python
from pathlib import Path
import subprocess
import json

def transcribe_interview(audio_path: str, output_dir: str = "./transcripts") -> dict:
    """
    Transcribe using Whisper with word-level timestamps.
    Returns the parsed JSON transcript.
    """
    Path(output_dir).mkdir(parents=True, exist_ok=True)

    # Use the OpenAI Whisper CLI (whisper.cpp works similarly)
    result = subprocess.run([
        'whisper',
        audio_path,
        '--model', 'medium',
        '--output_format', 'json',
        '--output_dir', output_dir,
        '--language', 'en',
        '--word_timestamps', 'True'
    ], capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"whisper failed: {result.stderr}")

    # Load and return the structured transcript
    json_path = Path(output_dir) / f"{Path(audio_path).stem}.json"
    with open(json_path) as f:
        return json.load(f)

def format_for_editing(transcript: dict) -> str:
    """Convert a Whisper transcript to a journalist-friendly format with timestamps."""
    lines = []
    for segment in transcript.get('segments', []):
        timestamp = format_timestamp(segment['start'])
        text = segment['text'].strip()
        lines.append(f"[{timestamp}] {text}")
    return '\n\n'.join(lines)

def format_timestamp(seconds: float) -> str:
    """Convert seconds to HH:MM:SS format."""
    h = int(seconds // 3600)
    m = int((seconds % 3600) // 60)
    s = int(seconds % 60)
    return f"{h:02d}:{m:02d}:{s:02d}"
```
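Assuming the JSON structure Whisper returns (`segments` with `start` in seconds and `text`), here is a quick self-contained check of the formatting helpers; the sample data is invented, and `format_timestamp` is restated inline so the snippet runs on its own:

```python
# Sample data in the shape of Whisper's JSON output (values invented)
sample = {'segments': [
    {'start': 15.0,   'text': ' I never approved that transfer.'},
    {'start': 3725.5, 'text': ' That came from the board, not from me.'},
]}

def format_timestamp(seconds: float) -> str:
    """Seconds to HH:MM:SS."""
    h = int(seconds // 3600)
    m = int((seconds % 3600) // 60)
    s = int(seconds % 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

for seg in sample['segments']:
    print(f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}")
# [00:00:15] I never approved that transfer.
# [01:02:05] That came from the board, not from me.
```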
For sensitive interviews or when AI transcription fails:
## Transcript: [Source] - [Date]
**Recording file**: [filename]
**Duration**: [XX:XX]
**Transcribed by**: [name]
**Verified against recording**: [ ] Yes / [ ] No
---
[00:00:15] **Q**: [Your question]
[00:00:45] **A**: [Source response - verbatim, including ums, pauses noted as (...)]
[00:01:30] **Q**: [Follow-up]
[00:01:42] **A**: [Response]
---
## Notes
- [Anything not captured in audio: gestures, documents shown, etc.]
## Potential quotes
- [00:01:42] "Quote that stands out" - context: [why it matters]
```python
from dataclasses import dataclass
from typing import List, Optional
import re

@dataclass
class Quote:
    text: str
    timestamp: str
    speaker: str
    context: str
    verified: bool = False
    used_in: Optional[str] = None

class QuoteBank:
    """Manage quotes from interview transcripts."""

    def __init__(self):
        self.quotes: List[Quote] = []

    def extract_quote(self, transcript: str, start_time: str,
                      end_time: str, speaker: str, context: str) -> Optional[Quote]:
        """Extract and store a quote with metadata; returns None if not found."""
        # Pull text between this timestamp and the next one (or end of transcript)
        pattern = rf'\[{re.escape(start_time)}\](.+?)(?=\[\d|$)'
        match = re.search(pattern, transcript, re.DOTALL)
        if match:
            text = match.group(1).strip()
            quote = Quote(
                text=text,
                timestamp=start_time,
                speaker=speaker,
                context=context
            )
            self.quotes.append(quote)
            return quote
        return None

    def verify_quote(self, quote: Quote, audio_path: str) -> bool:
        """Mark quote as verified against the original recording."""
        # In practice: listen to the audio at the timestamp, confirm accuracy
        quote.verified = True
        return True

    def export_for_story(self) -> str:
        """Export verified quotes ready for publication."""
        output = []
        for q in self.quotes:
            if q.verified:
                output.append(f'"{q.text}"\n— {q.speaker}\n[Timestamp: {q.timestamp}]')
        return '\n\n'.join(output)
```
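The timestamp-to-next-timestamp regex inside `extract_quote` is the core of the extraction; here it is in isolation on a toy transcript:

```python
import re

# Toy transcript in the [HH:MM:SS] format produced by format_for_editing()
transcript = (
    "[00:01:42] We knew the numbers were wrong.\n\n"
    "[00:02:10] But nobody wanted to say it out loud."
)

start_time = "00:01:42"
# Capture everything after this timestamp up to the next "[<digit>" or the end
pattern = rf'\[{re.escape(start_time)}\](.+?)(?=\[\d|$)'
match = re.search(pattern, transcript, re.DOTALL)
print(match.group(1).strip())  # We knew the numbers were wrong.
```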
Before publishing any quote:
- [ ] Listened to original recording at timestamp
- [ ] Quote is verbatim (or clearly marked as paraphrased)
- [ ] Context preserved (not cherry-picked to change meaning)
- [ ] Speaker identified correctly
- [ ] Timestamp documented for fact-checker
- [ ] Source approved quote (if agreement made)
```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional
from enum import Enum

class SourceStatus(Enum):
    ACTIVE = "active"          # Currently engaged
    DORMANT = "dormant"        # Not recently contacted
    DECLINED = "declined"      # Refused to participate
    OFF_RECORD = "off_record"  # Background only

class InterviewType(Enum):
    ON_RECORD = "on_record"
    BACKGROUND = "background"
    DEEP_BACKGROUND = "deep_background"
    OFF_RECORD = "off_record"

@dataclass
class Source:
    name: str
    organization: str
    contact_info: dict  # email, phone, Signal, etc.
    beat: str
    status: SourceStatus = SourceStatus.ACTIVE
    interviews: List['Interview'] = field(default_factory=list)
    notes: str = ""
    # Relationship tracking
    first_contact: Optional[datetime] = None
    trust_level: int = 1  # 1-5 scale

@dataclass
class Interview:
    source: str
    date: datetime
    interview_type: InterviewType
    recording_path: Optional[str] = None
    transcript_path: Optional[str] = None
    story_slug: Optional[str] = None
    key_quotes: List[str] = field(default_factory=list)
    follow_up_needed: bool = False
    notes: str = ""

def find_sources_for_story(sources: List[Source], topic: str,
                           beat: Optional[str] = None) -> List[Source]:
    """Find relevant sources for a new story."""
    matches = []
    for source in sources:
        # Filter by beat if specified
        if beat and source.beat != beat:
            continue
        # Only suggest active sources
        if source.status != SourceStatus.ACTIVE:
            continue
        # Check if they've spoken on similar topics
        for interview in source.interviews:
            if topic.lower() in interview.notes.lower():
                matches.append(source)
                break
    # Sort by trust level, highest first
    return sorted(matches, key=lambda s: s.trust_level, reverse=True)
```
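The dataclasses above need to persist between stories; a minimal flat-file sketch (the field names and layout here are illustrative assumptions, not a fixed schema):

```python
import json
from datetime import datetime

# One JSON record per source; datetime fields stored as ISO-8601 strings,
# since datetime objects are not JSON-serializable.
source_record = {
    'name': 'Jane Smith',
    'organization': 'City Budget Office',
    'beat': 'local-government',
    'status': 'active',
    'trust_level': 3,
    'first_contact': datetime(2024, 3, 15).isoformat(),
}

serialized = json.dumps(source_record, indent=2)
restored = json.loads(serialized)
print(restored['first_contact'])  # 2024-03-15T00:00:00
```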
```python
from pathlib import Path
from concurrent.futures import ProcessPoolExecutor, as_completed

def batch_transcribe(recordings_dir: str, output_dir: str) -> dict:
    """Process all recordings in a directory."""
    recordings = [p for ext in ('*.wav', '*.mp3', '*.m4a')
                  for p in Path(recordings_dir).glob(ext)]
    results = {}
    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = {
            executor.submit(transcribe_interview, str(rec), output_dir): rec
            for rec in recordings
        }
        # Collect results as each transcription finishes
        for future in as_completed(futures):
            rec = futures[future]
            try:
                results[rec.name] = {
                    'status': 'success',
                    'transcript': future.result()
                }
            except Exception as e:
                results[rec.name] = {
                    'status': 'error',
                    'error': str(e)
                }
    return results
```
```python
import subprocess
from typing import Optional

def extract_audio_from_video(video_path: str, output_path: Optional[str] = None) -> str:
    """Extract the audio track from a video for transcription."""
    if output_path is None:
        output_path = video_path.rsplit('.', 1)[0] + '.wav'
    subprocess.run([
        'ffmpeg', '-y',              # Overwrite output without prompting
        '-i', video_path,
        '-vn',                       # No video
        '-acodec', 'pcm_s16le',      # 16-bit PCM (WAV)
        '-ar', '44100',              # Sample rate
        '-ac', '1',                  # Mono
        output_path
    ], check=True)
    return output_path
```
## Recording consent record
**Date**:
**Source name**:
**Recording type**: [ ] Audio [ ] Video
**Interview type**: [ ] On record [ ] Background [ ] Off record
### Consent obtained:
- [ ] Verbal consent recorded at start of interview
- [ ] Written consent form signed
- [ ] Email confirmation of consent
### Jurisdiction notes:
- Interview location state/country:
- One-party or two-party consent jurisdiction:
- Any specific restrictions agreed:
### Agreed terms:
- [ ] Full attribution allowed
- [ ] Organization attribution only
- [ ] Anonymous source
- [ ] Review quotes before publication
- [ ] Embargo until [date]:
As of this writing, California, Connecticut, Florida, Illinois, Maryland, Massachusetts, Michigan, Montana, Nevada, New Hampshire, Pennsylvania, and Washington are commonly listed as all-party ("two-party") consent states; some have call- or context-specific rules, so verify current law for your situation.
Always get explicit consent on the recording itself, regardless of jurisdiction.
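The jurisdiction check can be encoded as a simple lookup using two-letter codes for the states listed above; treat this as a workflow reminder, not legal advice:

```python
# All-party consent states as listed above -- verify current law before relying
# on this; statutes change and some states have call-specific rules.
ALL_PARTY_CONSENT_STATES = {
    'CA', 'CT', 'FL', 'IL', 'MD', 'MA', 'MI', 'MT', 'NV', 'NH', 'PA', 'WA',
}

def needs_all_party_consent(state_code: str) -> bool:
    """True if the state requires consent from every party to a recording."""
    return state_code.strip().upper() in ALL_PARTY_CONSENT_STATES

print(needs_all_party_consent('fl'))  # True
print(needs_all_party_consent('NY'))  # False
```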
| Tool | Purpose | Notes |
|---|---|---|
| Whisper | Local transcription | Free, accurate, private |
| Otter.ai | Cloud transcription | Real-time, speaker ID |
| Descript | Edit audio like text | Good for pulling clips |
| Rev | Human transcription | For sensitive/legal |
| Trint | Journalist-focused | Collaboration features |
| oTranscribe | Free web player | Manual transcription aid |
| Field | Value |
|---|---|
| Version | 1.0.0 |
| Created | 2025-12-26 |
| Author | Claude Skills for Journalism |
| Domain | Journalism, Research |
| Complexity | Intermediate |