Recipe extraction from URLs and images for OurTable. Covers HTML preprocessing, JSON-LD parsing, prompt engineering for Claude API, ingredient parsing, Hebrew recipe sites, and Edge Function patterns. Use when working on: 'recipe import', 'scrape-recipe', 'recipe extraction', 'ingredient parsing', 'recipe URL', 'recipe photo', 'meal planning AI'.
AI-powered recipe import from URLs and photos for OurTable.
Two extraction paths, both in supabase/functions/scrape-recipe/index.ts:
Client-side service: src/services/recipeImport.ts (handles both paths + CORS proxy fallback)
Always follow this order — stop at the first success:
1. JSON-LD (@type: Recipe): ~70% of major recipe sites have this. Parse it directly, no AI needed. Free and 100% reliable.
2. Microdata (itemtype="http://schema.org/Recipe"): older format, same structured data.

Before sending HTML to Claude, ALWAYS preprocess:
1. Extract <main>, <article>, or [role="main"] content — skip nav/header/footer/sidebar
2. Remove: <script>, <style>, <nav>, <footer>, <header>, <aside>, <iframe>, ad divs
3. Remove: social share buttons, comment sections, "you might also like" blocks
4. Remove: inline styles, data-* attributes, class attributes (reduce noise)
5. Keep: headings, paragraphs, lists, images (src only), tables, time elements
6. Result should be ~3-8K chars for a typical recipe (vs 30K+ raw HTML)
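The steps above can be approximated without a DOM library. The following is a simplified, regex-only sketch, not the code in the Edge Function: the names `preprocessHtml` and `REMOVED_TAGS` are illustrative. It covers steps 1, 2, and 4 only. Class-based removal of share buttons and comment sections (step 3) and the `[role="main"]` lookup need a real DOM parser and are left out.

```typescript
// Elements removed wholesale (step 2 in the list above).
const REMOVED_TAGS = ["script", "style", "nav", "footer", "header", "aside", "iframe"];

function preprocessHtml(html: string): string {
  let out = html;

  // Step 2: drop each listed element together with its contents.
  for (const tag of REMOVED_TAGS) {
    const pattern = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?<\\/${tag}>`, "gi");
    out = out.replace(pattern, "");
  }

  // Step 1: keep only <main> or <article> when present, else the whole document.
  const region =
    /<main\b[^>]*>([\s\S]*?)<\/main>/i.exec(out) ||
    /<article\b[^>]*>([\s\S]*?)<\/article>/i.exec(out);
  if (region) out = region[1];

  // Step 4: strip inline styles, class attributes and data-* attributes.
  out = out.replace(/\s(?:style|class|data-[\w-]+)\s*=\s*(?:"[^"]*"|'[^']*')/gi, "");

  // Collapse runs of whitespace so the payload stays small.
  return out.replace(/\s+/g, " ").trim();
}
```

Regexes can misfire on nested or malformed markup, so this is meant to show the shape of the pipeline rather than serve as production code.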
Common noise selectors to remove:
- .ad, .advertisement, .social-share, .comments, .related-posts
- #sidebar, #footer, #header, #nav
- [class*="share"], [class*="social"], [class*="comment"], [class*="ad-"]

The schema.org Recipe type includes these key fields:
{
  "@type": "Recipe",
  "name": "Recipe Title",
  "description": "Brief description",
  "image": "https://...",
  "recipeIngredient": ["1 cup flour", "2 eggs", ...],
  "recipeInstructions": [
    { "@type": "HowToStep", "text": "Preheat oven..." },
    { "@type": "HowToStep", "text": "Mix ingredients..." }
  ],
  "prepTime": "PT15M",
  "cookTime": "PT30M",
  "totalTime": "PT45M",
  "recipeYield": "4 servings",
  "recipeCategory": "Dinner",
  "recipeCuisine": "Italian",
  "nutrition": { "@type": "NutritionInformation", "calories": "350 calories" }
}
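A minimal sketch of reading these fields from parsed JSON-LD, assuming the script tag's content has already been passed through `JSON.parse`. The helper names (`findRecipe`, `flattenInstructions`, `isoDurationToMinutes`) are hypothetical and not taken from the Edge Function. The search tolerates a Recipe node wrapped in `@graph`, typed with an array, or nested under `mainEntity`. The duration helper turns values such as `PT15M` into whole minutes.

```typescript
type JsonObject = Record<string, unknown>;

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// True when "@type" is "Recipe" or an array that contains it.
function isRecipe(node: JsonObject): boolean {
  const type = node["@type"];
  return Array.isArray(type) ? type.some((t) => t === "Recipe") : type === "Recipe";
}

// Depth-first search through arrays, "@graph" wrappers and "mainEntity" nesting.
function findRecipe(data: unknown): JsonObject | null {
  if (Array.isArray(data)) {
    for (const item of data) {
      const found = findRecipe(item);
      if (found) return found;
    }
    return null;
  }
  if (!isObject(data)) return null;
  if (isRecipe(data)) return data;
  return findRecipe(data["@graph"]) ?? findRecipe(data["mainEntity"]);
}

// Normalise recipeInstructions (string, HowToStep list or HowToSection list) to step texts.
function flattenInstructions(value: unknown): string[] {
  if (typeof value === "string") {
    return value.split(/\r?\n/).map((s) => s.trim()).filter((s) => s.length > 0);
  }
  if (Array.isArray(value)) {
    const steps: string[] = [];
    for (const item of value) steps.push(...flattenInstructions(item));
    return steps;
  }
  if (!isObject(value)) return [];
  // A HowToSection keeps its steps in itemListElement.
  if (Array.isArray(value["itemListElement"])) {
    return flattenInstructions(value["itemListElement"]);
  }
  const text = value["text"];
  return typeof text === "string" ? [text.trim()] : [];
}

// Convert an ISO 8601 duration such as "PT1H30M" to minutes; null when unparseable.
function isoDurationToMinutes(value: unknown): number | null {
  if (typeof value !== "string") return null;
  const m = /^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$/.exec(value.trim());
  if (!m || m.slice(1).every((part) => part === undefined)) return null;
  const [days, hours, minutes, seconds] = [m[1], m[2], m[3], m[4]].map((p) => Number(p ?? 0));
  return days * 1440 + hours * 60 + minutes + Math.round(seconds / 60);
}
```

Section names from HowToSection are dropped here for brevity. Keeping them as sub-headings in the instructions text is a reasonable alternative.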
JSON-LD variations to handle:
- @graph wrapper: { "@graph": [{ "@type": "Recipe", ... }] }
- Multiple types: "@type": ["Recipe", "Article"]
- Nested under mainEntity: { "@type": "WebPage", "mainEntity": { "@type": "Recipe", ... } }
- HowToSection grouping: instructions grouped by section (e.g., "For the dough", "For the filling")
- recipeInstructions as a single string instead of an array

System prompt:
You are a recipe extraction expert. Extract structured recipe data from the provided content.
Rules:
- Extract ALL ingredients with precise quantities and units
- Preserve original language (Hebrew or English)
- Parse fraction quantities (1/2, 3/4) into decimal numbers
- Normalize units to standard forms (tablespoon→tbsp, cup, tsp, g, kg, ml, l, oz, lb)
- For Hebrew: recognize כוס (cup), כף (tbsp), כפית (tsp), גרם (g), קילו (kg), ליטר (l)
- If ingredient has no quantity (e.g., "salt to taste"), set quantity to null
- Instructions should be numbered steps, each on a new line
- If content is unclear or not a recipe, return {"error": "Not a recipe"}
Instead of asking for JSON in a user message and regex-parsing, use Claude's tool_use:
{
  "tools": [{
    "name": "extract_recipe",
    "description": "Extract structured recipe data",
    "input_schema": {
      "type": "object",
      "properties": {
        "title": { "type": "string" },
        "description": { "type": "string" },
        "instructions": { "type": "string", "description": "Step-by-step, each step on new line" },
        "image_url": { "type": ["string", "null"] },
        "prep_time_min": { "type": ["integer", "null"] },
        "cook_time_min": { "type": ["integer", "null"] },
        "servings": { "type": ["integer", "null"] },
        "ingredients": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "quantity": { "type": ["number", "null"] },
              "unit": { "type": "string" }
            },
            "required": ["name"]
          }
        },
        "tags": {
          "type": "array",
          "items": { "type": "string" },
          "description": "Auto-detected tags: cuisine, diet, meal type"
        }
      },
      "required": ["title", "ingredients"]
    }
  }],
  "tool_choice": { "type": "tool", "name": "extract_recipe" }
}
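Forcing the tool call returns the recipe as a structured input object on the tool_use block, so no regex parsing is needed. Validate the object before saving it, since a field can still be missing or mistyped.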
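A hedged sketch of sending this request and reading the result, assuming the public Anthropic Messages endpoint and its `x-api-key` / `anthropic-version` headers. The function names `readToolInput` and `extractRecipe`, the `max_tokens` value, and the choice to pass `model` in from the caller are assumptions for illustration, not the Edge Function's actual code.

```typescript
interface ContentBlock {
  type: string;
  name?: string;
  input?: unknown;
}

interface MessagesResponse {
  content: ContentBlock[];
}

// Pull the forced tool call's input out of a Messages API response body
// and check the two fields the schema marks as required.
function readToolInput(response: MessagesResponse, toolName: string): Record<string, unknown> {
  for (const block of response.content) {
    if (block.type !== "tool_use" || block.name !== toolName) continue;
    const input = block.input;
    if (typeof input !== "object" || input === null) break;
    const recipe = input as Record<string, unknown>;
    if (typeof recipe["title"] !== "string" || !Array.isArray(recipe["ingredients"])) {
      throw new Error("Tool input is missing title or ingredients");
    }
    return recipe;
  }
  throw new Error(`No usable ${toolName} tool call in response`);
}

async function extractRecipe(
  apiKey: string,
  model: string, // supplied by the caller; no model name is assumed here
  system: string,
  content: string,
  tools: unknown[],
): Promise<Record<string, unknown>> {
  const res = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model,
      max_tokens: 4096, // assumed budget; tune for long recipes
      system,
      messages: [{ role: "user", content }],
      tools,
      tool_choice: { type: "tool", name: "extract_recipe" },
    }),
  });
  if (!res.ok) throw new Error(`Claude API error: ${res.status}`);
  return readToolInput(await res.json(), "extract_recipe");
}
```

Keeping `readToolInput` separate from the network call makes the parsing step testable with a canned response.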
"2 cups all-purpose flour" → { name: "all-purpose flour", quantity: 2, unit: "cup" }
"1/2 tsp salt" → { name: "salt", quantity: 0.5, unit: "tsp" }
"3 large eggs" → { name: "large eggs", quantity: 3, unit: "" }
"salt and pepper to taste" → { name: "salt and pepper", quantity: null, unit: "" }
"1 (14 oz) can diced tomatoes" → { name: "diced tomatoes", quantity: 1, unit: "can" }
"2 כוסות קמח" → { name: "קמח", quantity: 2, unit: "cup" }
"כף שמן זית" → { name: "שמן זית", quantity: 1, unit: "tbsp" }
"חצי כפית מלח" → { name: "מלח", quantity: 0.5, unit: "tsp" }
"200 גרם חזה עוף" → { name: "חזה עוף", quantity: 200, unit: "g" }
"מלח ופלפל לפי הטעם" → { name: "מלח ופלפל", quantity: null, unit: "" }
| Hebrew | Normalized | Notes |
|---|---|---|
| כוס/כוסות | cup | |
| כף/כפות | tbsp | |
| כפית/כפיות | tsp | |
| גרם | g | |
| קילו/ק"ג | kg | |
| ליטר | l | |
| מ"ל | ml | |
| יחידה/יחידות | piece | |
| חבילה/חבילות | pack | |
| פחית/פחיות | can | |
| חצי | 0.5 | Fraction word |
| שליש | 0.333 | Fraction word |
| רבע | 0.25 | Fraction word |
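The table above maps directly onto lookup objects. The sketch below is a deterministic post-processing or validation helper, assuming it runs alongside Claude's extraction, not in place of it. The names `takeQuantity`, `parseHebrewIngredient`, and `hasKey` are hypothetical. It handles numerals, simple and mixed fractions, Hebrew fraction words, and a bare unit that implies a quantity of 1.

```typescript
interface Ingredient {
  name: string;
  quantity: number | null;
  unit: string;
}

// Unit and fraction-word maps copied from the table above.
const HEBREW_UNITS: Record<string, string> = {
  "כוס": "cup",
  "כוסות": "cup",
  "כף": "tbsp",
  "כפות": "tbsp",
  "כפית": "tsp",
  "כפיות": "tsp",
  "גרם": "g",
  "קילו": "kg",
  'ק"ג': "kg",
  "ליטר": "l",
  'מ"ל': "ml",
  "יחידה": "piece",
  "יחידות": "piece",
  "חבילה": "pack",
  "חבילות": "pack",
  "פחית": "can",
  "פחיות": "can",
};

const HEBREW_FRACTIONS: Record<string, number> = {
  "חצי": 0.5,
  "שליש": 0.333,
  "רבע": 0.25,
};

// Own-property check so words like "constructor" never match by accident.
const hasKey = (map: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(map, key);

// Read "2", "0.5", "1/2", "1 1/2" or a Hebrew fraction word from the start of a line.
// Returns the value plus the remaining text, or null when no quantity is present.
function takeQuantity(line: string): { value: number; rest: string } | null {
  const text = line.trim();
  const fraction = /^(?:(\d+)\s+)?(\d+)\/(\d+)\s*/.exec(text);
  if (fraction && Number(fraction[3]) !== 0) {
    const value = Number(fraction[1] ?? 0) + Number(fraction[2]) / Number(fraction[3]);
    return { value, rest: text.slice(fraction[0].length) };
  }
  const decimal = /^(\d+(?:\.\d+)?)\s*/.exec(text);
  if (decimal) return { value: Number(decimal[1]), rest: text.slice(decimal[0].length) };
  const firstWord = text.split(/\s+/)[0];
  if (hasKey(HEBREW_FRACTIONS, firstWord)) {
    return { value: HEBREW_FRACTIONS[firstWord], rest: text.slice(firstWord.length).trim() };
  }
  return null;
}

function parseHebrewIngredient(line: string): Ingredient {
  const taken = takeQuantity(line);
  let quantity: number | null = taken ? taken.value : null;
  const words = (taken ? taken.rest : line.trim()).split(/\s+/);
  let unit = "";
  // A unit only counts when a name follows it, so a one-word name is never consumed.
  if (words.length > 1 && hasKey(HEBREW_UNITS, words[0])) {
    unit = HEBREW_UNITS[words[0]];
    words.shift();
    if (quantity === null) quantity = 1; // bare unit means one of that unit
  }
  // Drop a trailing "to taste" phrase from the name.
  const name = words.join(" ").replace(/\s*לפי הטעם$/, "");
  return { name, quantity, unit };
}
```

Run against the five Hebrew examples above, this reproduces the listed name, quantity, and unit for each line.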
Migration 012 seeds 129 common ingredients with Hebrew names. When importing recipes, match extracted ingredients against this database for autocomplete consistency. See supabase/migrations/012_fixes_and_ingredients.sql.
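One way to do that matching is sketched below, under a stated assumption: the seeded rows are loaded into memory with an English and a Hebrew name each. The `KnownIngredient` shape and the function names are hypothetical. The real column names live in the migration file and are not reproduced here.

```typescript
// Hypothetical in-memory shape for a seeded ingredient row.
interface KnownIngredient {
  id: number;
  nameEn: string;
  nameHe: string;
}

function normalizeName(value: string): string {
  return value.trim().toLowerCase().replace(/\s+/g, " ");
}

// Exact match on either language wins; otherwise fall back to the first
// known name contained in the extracted text (e.g. "all-purpose flour" -> "flour").
function matchIngredient(extracted: string, known: KnownIngredient[]): KnownIngredient | null {
  const target = normalizeName(extracted);
  if (target === "") return null;
  let partial: KnownIngredient | null = null;
  for (const item of known) {
    for (const candidate of [normalizeName(item.nameEn), normalizeName(item.nameHe)]) {
      if (candidate === "") continue;
      if (candidate === target) return item;
      if (partial === null && target.indexOf(candidate) !== -1) partial = item;
    }
  }
  return partial;
}
```

Substring fallback is deliberately loose and can pick the wrong row (a "butter" entry would match "peanut butter"), so it suits autocomplete suggestions better than silent substitution.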
For photo-based recipe extraction:
- Detect the image media type from the base64 prefix (/9j/ = JPEG, iVBOR = PNG)
- _ai_usage metadata in response → logged to ai_usage table

Reference files:
- references/site-patterns.md: Site-by-site JSON-LD presence, WordPress plugin selectors, Hebrew site patterns, bot protection levels
- references/html-preprocessing.md: HTML cleaning pipeline, recipe area extraction, WordPress plugin containers, token budget comparison
- references/prompt-templates.md: System prompts, tool definitions for structured output, few-shot examples for tricky ingredients (English + Hebrew)

Test against these categories: