In the era of AI and large language models, converting PDF documents to well-structured Markdown has become essential for creating embeddings and storing documents in vector databases like Qdrant or Pinecone. This comprehensive guide will walk you through using PyMuPDF, a powerful Python library for PDF manipulation, to convert...
Continue reading...
Python
The Powerful COPY Command in DuckDB / MotherDuck: A Quick Reference Guide
The COPY command in DuckDB and MotherDuck is a versatile tool for importing and exporting data. This guide provides a concise overview of how to use COPY both from the DuckDB CLI (SQL only) and from Python, including workflows with Ibis and pandas. Use this as a quick reference for your data engineering tasks!...
Continue reading...
Python Practical Reference Guide
This is a Python reference guide that I wrote for myself, with code samples that remind me how to write things I tend to forget. I embedded a Jupyter notebook in this blog post to test the samples.
Continue reading...