Run inference using Hugging Face models via Inference API
Run text generation, embeddings, and other inference tasks using Hugging Face models.
Call the tool with a prompt containing the model ID and the input text:
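As a minimal sketch of what such a call translates to under the hood, the snippet below builds a text-generation request against the public Inference API. The helper name `build_request` and its signature are illustrative, not part of the tool:

```python
import json

# Public Hugging Face Inference API base URL.
API_BASE = "https://api-inference.huggingface.co/models"

def build_request(model_id, text, token=None):
    """Return (url, headers, payload) for a text-generation call.
    Illustrative helper -- not an API of the tool itself."""
    url = f"{API_BASE}/{model_id}"
    headers = {"Content-Type": "application/json"}
    if token:
        # Needed for gated models and higher rate limits.
        headers["Authorization"] = f"Bearer {token}"
    payload = {"inputs": text}
    return url, headers, payload

url, headers, payload = build_request(
    "mistralai/Mistral-7B-Instruct-v0.3", "Write a haiku about code"
)
print(url)
print(json.dumps(payload))
```

Sending `payload` as the JSON body of a POST to `url` (e.g. with `requests.post`) performs the actual inference call.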
Free models (rate-limited):
- mistralai/Mistral-7B-Instruct-v0.3 (7B, instruction-tuned)
- meta-llama/Llama-2-7b-chat-hf (7B, chat)
- codellama/CodeLlama-7b-hf (code generation)

For higher throughput or larger models, configure dedicated Inference Endpoints.
You can specify generation parameters directly in your prompt:
Example with parameters: "Using mistralai/Mistral-7B-Instruct-v0.3, with temperature=0.7 and max_tokens=500, generate: 'Write a haiku about code'"
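A sketch of how prompt-level parameters like `temperature` and `max_tokens` might map onto the Inference API's `parameters` field; the `build_payload` helper here is hypothetical:

```python
def build_payload(text, temperature=None, max_tokens=None):
    """Assemble an Inference API payload (illustrative helper)."""
    parameters = {}
    if temperature is not None:
        parameters["temperature"] = temperature
    if max_tokens is not None:
        # The HF text-generation API names this field "max_new_tokens".
        parameters["max_new_tokens"] = max_tokens
    payload = {"inputs": text}
    if parameters:
        payload["parameters"] = parameters
    return payload

print(build_payload("Write a haiku about code", temperature=0.7, max_tokens=500))
```

Parameters omitted from the prompt are simply left out of the payload, so the model's server-side defaults apply.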
Free Inference API: rate-limited to roughly a few requests per minute. For production use:
set the HF_INFERENCE_ENDPOINT environment variable to point to your dedicated endpoint.

Environment variables:
- HF_TOKEN: Hugging Face access token (required for gated models)
- HF_INFERENCE_ENDPOINT: optional custom endpoint URL (defaults to the public Inference API)
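The environment-variable resolution described above can be sketched as follows; `resolve_endpoint` is an illustrative helper, assuming a dedicated endpoint URL fully replaces the default per-model URL:

```python
import os

# Default public Inference API base (used when no custom endpoint is set).
DEFAULT_BASE = "https://api-inference.huggingface.co/models"

def resolve_endpoint(model_id):
    """Return (url, headers) based on HF_INFERENCE_ENDPOINT and HF_TOKEN.
    Illustrative sketch of the resolution order, not the tool's actual code."""
    custom = os.environ.get("HF_INFERENCE_ENDPOINT")
    url = custom if custom else f"{DEFAULT_BASE}/{model_id}"
    headers = {}
    token = os.environ.get("HF_TOKEN")
    if token:
        # Attach the token when present; gated models reject anonymous calls.
        headers["Authorization"] = f"Bearer {token}"
    return url, headers
```

With neither variable set, requests go unauthenticated to the public API; setting both routes authenticated traffic to the dedicated endpoint.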