Always read and use provided reference files for data before attempting external searches or fabricating information
This skill ensures agents correctly prioritize provided reference files over external data sources, preventing the use of fabricated or incorrect information when structured data is already available.
When reference files are provided in task context, you MUST read and prioritize them for data before attempting web searches or generating synthetic data.
Before taking any action, identify all files provided in the task context:
.xlsx, .csv, .json, .pdf, .docx, .txt
Read all relevant reference files before any web searches:
# Example: Read provided Excel file
file_content = read_file(file_path="Massabama active listings.xlsx", filetype="xlsx")
# Example: Read provided CSV
file_content = read_file(file_path="data.csv", filetype="csv")
# Example: Read provided PDF
file_content = read_file(file_path="report.pdf", filetype="pdf")
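The file-identification step can be sketched in plain Python. This is a minimal illustration, assuming the task context is available as a local directory; the function name `find_reference_files` and the `context_dir` parameter are hypothetical and not part of any tool named in this skill.

```python
from pathlib import Path

# Extensions this skill lists as reference-file types.
REFERENCE_EXTENSIONS = {".xlsx", ".csv", ".json", ".pdf", ".docx", ".txt"}


def find_reference_files(context_dir):
    """Return sorted paths under context_dir whose suffix is a listed reference type.

    Hypothetical helper: assumes the task context is a directory on disk.
    """
    root = Path(context_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in REFERENCE_EXTENSIONS
    )
```

Matching on the lowercased suffix means files such as `Report.PDF` are not missed. Each returned path can then be passed to `read_file` with the matching `filetype`.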
Important: Handling DOCX Read Failures
If read_file(filetype='docx') fails with an error (a common issue), use one of these fallback methods:
# Method 1: Unzip and parse XML directly (docx is a zip archive)
mkdir -p temp_docx && cd temp_docx
unzip -o ../document.docx
# Extract text from word/document.xml
cat word/document.xml | grep -oP '(?<=>)[^<>]+' > extracted_text.txt
# Method 2: Use python-docx via shell
run_shell(command="python -c \"from docx import Document; doc = Document('document.docx'); print('\\n'.join([p.text for p in doc.paragraphs]))\"")
# Method 3: Use shell_agent for robust extraction
extraction_result = shell_agent(task="Extract all text content from document.docx using any reliable method (unzip+XML, python-docx, or pandoc)")
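The unzip-and-parse approach of Method 1 can also be written with only the Python standard library, which avoids depending on `grep -P` or an installed `python-docx`. The following is a sketch under that assumption; the function name `extract_docx_text` is hypothetical, and it reads only the main body in `word/document.xml` (headers, footers, and footnotes live in other archive members and are not covered).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(path):
    """Return the body text of a .docx file, one paragraph per line.

    Hypothetical helper: `path` may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join every text run (w:t) inside this paragraph.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)
```

Unlike the `grep` pipeline, this keeps the runs of one paragraph on a single line, so sentences split across formatting runs are not broken apart.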
After extracting via fallback:
Web searches should only occur when:
# Decision flow example
if has_reference_file("listings.xlsx"):
    data = extract_from_excel("listings.xlsx")
    # Use this data for output
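A slightly fuller version of the decision flow might look like the sketch below. It assumes one plausible rule: a web search is justified only for fields that none of the provided reference files contain. The function `choose_data_source` and the `load_fields` callable are hypothetical names introduced for illustration, not part of this skill's tooling.

```python
def choose_data_source(reference_paths, needed_fields, load_fields):
    """Decide whether the reference files suffice or a web search is needed.

    load_fields is a hypothetical callable that maps a file path to the
    collection of field names that file provides.
    Returns (source, missing_fields).
    """
    available = set()
    for path in reference_paths:
        available |= set(load_fields(path))

    # Assumed rule: only fields absent from every reference file justify a search.
    missing = set(needed_fields) - available
    if not missing:
        return ("reference_files", [])
    return ("web_search", sorted(missing))
```

Returning the missing fields alongside the decision keeps any later web search scoped to the gaps, so data already present in the reference files is never replaced by external results.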