Hector Sanchez

Hector Sanchez is a Public Accountant with over 30 years of experience as an accountant and financial analyst, and holds diplomas in Corporate Finance and Data Science.

Correcting Word Frequencies with Data Normalization: MapReduce Text Processing on War and Peace — Part 3

Introduction

In Part 1 of this series, we installed Hadoop 3.3.6 natively on Ubuntu and configured HDFS for distributed storage. In Part 2, we configured YARN, wrote our first MapReduce program (WordCount), and executed it against the full text of War and Peace. However, Part 2’s analysis revealed a subtle but significant problem: words were...
