Crunch : MapReduce Pipelines made easy

Hadoop and  MapReduce paradigm provides ease of writing parallel data processing. However, many application require a number of Map-Reduce jobs that join, clean, aggregate, and analyze large volume of data. Such a set of connected jobs form a pipeline. Programming/managing such pipelines can be tricky and can cause major impediments to developer productivity.

[…]