Hadoop Archives - CFOCoder

Building a Modern Frontier Data Stack: Hadoop 3.4.3, Hive 4.2.0, and MinIO S3 Integration in 2026

A bit of personal context: A few days ago, I published posts about how to install Hadoop 3.3.6 natively on Ubuntu. At the time, I thought it was the state of the art. But things in the Big Data world move fast. Fast forward a few days, and when I...

March 12, 2026 by Hector Sanchez Hadoop

Apache Hive 3.1.3 on Ubuntu: Native Installation on Top of Hadoop 3.3.6

In Part 1 of this series, I installed Hadoop 3.3.6 natively on Ubuntu 24.04 and configured HDFS in pseudo-distributed mode. In Part 2, I configured YARN and ran the canonical WordCount job on War and Peace. In Part 3, I improved the text processing pipeline by normalizing words before counting them. The natural next...

March 10, 2026 by Hector Sanchez Hadoop

Correcting Word Frequencies with Data Normalization: MapReduce Text Processing on War and Peace — Part 3

Introduction: In Part 1 of this series, we installed Hadoop 3.3.6 natively on Ubuntu and configured HDFS for distributed storage. In Part 2, we configured YARN, wrote our first MapReduce program (WordCount), and executed it against the full text of War and Peace. However, Part 2’s analysis revealed a subtle but significant problem: words were...

March 8, 2026 by Hector Sanchez Hadoop

Running Your First MapReduce Job on Hadoop: WordCount on War and Peace

In Part 1 of this series we installed Hadoop 3.3.6 natively on Ubuntu 24.04 and got HDFS running in pseudo-distributed mode. That gave us a working distributed file system, but Hadoop is much more than storage — its true power lies in processing large datasets in parallel using the MapReduce programming model. In this...

March 1, 2026 by Hector Sanchez Hadoop

Hadoop 3.3.6 on Ubuntu: Native Installation without Virtual Machine

When I started working with Hadoop in a learning environment, the course guide called for running Linux Mint in a virtual machine. However, I already had Ubuntu 24.04 installed natively on my Dell Vostro with 32 GB of RAM, and it seemed smarter to use it directly. In this post I...

February 23, 2026 by Hector Sanchez Hadoop