Correcting Word Frequencies with Data Normalization: MapReduce Text Processing on War and Peace — Part 3

Introduction

In Part 1 of this series, we installed Hadoop 3.3.6 natively on Ubuntu and configured HDFS for distributed storage. In Part 2, we configured YARN, wrote our first MapReduce program (WordCount), and executed it against the full text of War and Peace. However, Part 2’s analysis revealed a subtle but significant problem: words were...


Maximizing Value: How I Optimize GitHub Copilot Pro and Anthropic Subscriptions for Coding and Research

Context: Why Model and Platform Matter

As a data scientist and developer, I rely on advanced large language models (LLMs) such as Claude Opus, Sonnet, GPT-4.1, and GPT-4o for both architectural planning and daily coding. But I quickly learned that the same model behaves differently depending on the platform, and that maximizing value...
