Comparing Hadoop Data Storage (HDFS, HBase, Hive and Pig)

The Hadoop ecosystem contains components such as HDFS, HBase, Hive, and Pig that are used for data storage and data access. Sometimes these components replace existing data storage, and sometimes they extend it. Each component is designed to address specific problems and has specific applications; however, they […]

How to Build a Recommendation Engine Using Apache Mahout

Recommendation engines, or recommenders, are widely used by applications to suggest items that users may like. For example, an online shopping site suggests products based on what users have bought or viewed earlier. This session covers the creation of a recommender for a consumer web application. After attending this session, you would […]

Crunch: MapReduce Pipelines Made Easy

Hadoop and the MapReduce paradigm make it easy to write parallel data-processing jobs. However, many applications require a number of MapReduce jobs that join, clean, aggregate, and analyze large volumes of data. Such a set of connected jobs forms a pipeline. Programming and managing such pipelines can be tricky and can seriously impede developer productivity.

[…]