In the era of AI and large language models, converting PDF documents to well-structured Markdown has become essential for creating embeddings and storing documents in vector databases like Qdrant or Pinecone. This comprehensive guide will walk you through using PyMuPDF, a powerful Python library for PDF manipulation, to convert...
Data Science
The Powerful COPY Command in DuckDB / MotherDuck: A Quick Reference Guide
The COPY command in DuckDB and MotherDuck is a versatile tool for importing and exporting data. This guide provides a concise overview of how to use COPY both from the DuckDB CLI (SQL only) and from Python, including workflows with Ibis and pandas. Use this as a quick reference for your data engineering tasks!...
Building a Complete DuckLake Solution: From Local Development to Cloud Production
DuckLake is revolutionizing the lakehouse architecture by combining the simplicity of DuckDB with the power of modern data lake formats. In this comprehensive guide, I’ll walk you through building a complete DuckLake solution in two parts: first creating a local development environment, then scaling it to a cloud-based production...
Unleash the Power of Symbolic Math in Python: A Data Scientist’s Quick Guide to SymPy
As a Data Science Master’s student, I’m constantly working with mathematical concepts. From the calculus behind gradient descent to the linear algebra that powers PCA, math is the bedrock of everything we do. Recently, while tackling some homework, I stumbled upon a Python library that completely changed how I approach...
Guide: Self-Hosting QuickChart on ARM with Docker (Local Build) & Caddy
This guide provides step-by-step instructions to install a self-hosted QuickChart instance on an ARM-based server (like Oracle Cloud Ampere A1) running Ubuntu 24.04, using Docker Compose and Caddy. We will build the Docker image locally from the official source code. This approach ensures compatibility with the ARM64 architecture and bypasses...