Docling Chunkers: Overview and Comparison
Docling provides several chunkers for splitting documents into semantically meaningful pieces, each with different strategies and sophistication:
1. BaseChunker
2. HierarchicalChunker
3. HybridChunker
Summary Table:
Chunker | Structure-Aware | Sliding Window | Semantic Coherence | Use Case | Sophistication
BaseChunker | No | No | Low | Simple, unstructured docs | Basic
HierarchicalChunker | Yes | No...
Converting the Mexican Constitution PDF to Markdown with Docling
This tutorial demonstrates how to use Docling to convert PDF documents to Markdown, JSON, and other formats. We’ll use the Mexican Constitution (Constitución Política de los Estados Unidos Mexicanos – CPEUM) as our practical case study.
What is Docling?
Docling is a powerful Python library for document processing that: Key...
🚀 Installing and Configuring MCPO for Open WebUI: A Complete Guide
📝 Introduction
Today I successfully set up MCPO (MCP-to-OpenAPI) to work seamlessly with Open WebUI, providing access to powerful external APIs through the Model Context Protocol. This post documents the entire process, scripts, and configuration needed for a smooth installation.
🎯 What is MCPO?
MCPO is a bridge that converts...
Installing AnythingLLM on Oracle ARM Ubuntu Server
A comprehensive guide for self-hosting AnythingLLM with Docker, Caddy, and Ollama
🤖 What is AnythingLLM?
AnythingLLM is a powerful, self-hosted AI knowledge management and chat platform that transforms how you interact with your data and AI models. It’s designed to be your personal AI workspace where you can combine multiple AI...
Building Your First MCP Server: A Journey from API to AI Assistant
How I built a production-ready MCP server for Mexican economic data and what I learned along the way
As someone who works at the intersection of data science and finance, I’m always looking for ways to make economic data more accessible. When I discovered the Model Context Protocol (MCP), I...