Part 5 in the Hadoop and Hive Tutorial Series

Introduction

When I completed the installation of Hadoop 3.3.6 and Apache Hive 3.1.3 on my Ubuntu machine, I had everything running smoothly. But then came a practical question that every data engineer faces: How do I actually get data into this system...

Continue reading...

Restic + MinIO for OpenClaw: What It Is, What It Solves, and the Quick Reference I Wanted Yesterday
A bit of personal context

Yesterday I spent part of the day optimizing my OpenClaw setup and cleaning up the way I protect its operational state. At one point, I realized something important: the local workspace was no longer just “scratch space.” It already contained memory, credentials, agent configuration, scripts,...

Continue reading...

Building a Modern Frontier Data Stack: Hadoop 3.4.3, Hive 4.2.0, and MinIO S3 Integration in 2026
A bit of personal context

A few days ago, I published posts about how to install Hadoop 3.3.6 natively on Ubuntu. At that time, I thought it was the state of the art. But things in the Big Data world move fast. Fast forward a few days, and when I...

Continue reading...

Apache Hive 3.1.3 on Ubuntu: Native Installation on Top of Hadoop 3.3.6
In Part 1 of this series, I installed Hadoop 3.3.6 natively on Ubuntu 24.04 and configured HDFS in pseudo-distributed mode. In Part 2, I configured YARN and ran the canonical WordCount job on War and Peace. In Part 3, I improved the text processing pipeline by normalizing words before counting them. The natural next...
Continue reading...

Correcting Word Frequencies with Data Normalization: MapReduce Text Processing on War and Peace — Part 3

Introduction

In Part 1 of this series, we installed Hadoop 3.3.6 natively on Ubuntu and configured HDFS for distributed storage. In Part 2, we configured YARN, wrote our first MapReduce program (WordCount), and executed it against the full text of War and Peace. However, Part 2’s analysis revealed a subtle but significant problem: words were...
Continue reading...