A Basic Understanding of MapReduce on Hadoop

• Updated May 20, 2019


If you read any article on big data, you will inevitably come across the terms MapReduce and Hadoop. If your technical focus is not big data-related technologies, those terms are not exactly intuitive (though, what technical terms are?), and it is not obvious what they are or what they do.

Back in June, Linux Journal published Introduction to MapReduce with Hadoop on Linux, which provides a clear explanation of what MapReduce is, includes example code, and shows how it is used in Hadoop.
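
If you just want a rough feel for the model before reading that article, here is a minimal, single-process word-count sketch in Python. It is not the Linux Journal example and does not touch Hadoop at all; the sample lines and function names are purely illustrative. It does, however, walk through the same three ideas: a map step that emits key-value pairs, a shuffle that groups values by key, and a reduce step that combines each group.

```python
# A toy, single-process illustration of the MapReduce model: the map step
# turns each input line into (key, value) pairs, a shuffle groups the values
# by key, and the reduce step combines each key's values into one result.
# The sample lines and the word-count task are purely illustrative.
from collections import defaultdict

def map_phase(line):
    # Emit (word, 1) for every word in the input line.
    for word in line.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    # Combine all of the counts emitted for a single word.
    return (word, sum(counts))

lines = [
    "the quick brown fox",
    "the lazy dog",
    "the quick dog",
]

# Shuffle: group every emitted value by its key, roughly what Hadoop does
# between the map and reduce phases.
grouped = defaultdict(list)
for line in lines:
    for word, count in map_phase(line):
        grouped[word].append(count)

for word, counts in sorted(grouped.items()):
    print(reduce_phase(word, counts))
# ('brown', 1)
# ('dog', 2)
# ... and so on for the remaining words
```

On a real Hadoop cluster, the map and reduce functions are the parts you write; the shuffle, the distribution of work across machines, and fault tolerance are what Hadoop provides.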

You might have also noticed what a buzzword “big data” has become, and, like anything in technology, there are frequently situations where someone tries to solve a problem with an over-engineered solution - in this case, Hadoop. Hadoop was created to process hundreds of terabytes to petabytes of data, but it is often used on smaller datasets that could be processed much more easily and quickly on a modern workstation. Below are two articles that talk about just that very directly.