Julien Heiduk

Building AI systems & production pipelines. Writing about what actually works.

Evaluating RAG Pipelines with RAGAS: A Comprehensive Tutorial

RAGAS provides objective, LLM-powered metrics to evaluate every component of your RAG pipeline. Learn how to measure faithfulness, context precision, context recall, and more with Qwen2.5 served locally via Ollama — fully offline, no API key required.

FastMCP Server with Hugging Face Hub Resources

Build a FastMCP server that exposes Hugging Face Hub models and datasets as queryable MCP resources for LLM agents.

Querying the Hugging Face Hub with a Tiny LLM and FastMCP

Load Qwen2.5-0.5B-Instruct, connect it to a FastMCP server via context injection, and answer live questions about the Hugging Face Hub catalogue.

Fine-Tune LLMs with QLoRA

Fine-tune a large language model with QLoRA on a single GPU, then serve it at high throughput with vLLM's PagedAttention engine.

GCN with PyTorch for Co-Purchase Recommendation

Graph Convolutional Networks turn co-purchase data into a graph and learn item embeddings for recommendation, with a full PyTorch Geometric implementation.
