Hadoop-related project or technology | Description | URL |
---|---|---|
Avro | • Avro is a framework for performing remote procedure calls and data serialization. | |
Flume | • Flume is a tool for harvesting, aggregating and moving large amounts of log data in and out of Hadoop. | |
HBase | • Based on Google’s Bigtable, HBase is an open-source, distributed, versioned, column-oriented store that runs on top of HDFS. HBase groups data by column family rather than storing whole rows together, so operations that touch only a few columns can read just the data they need, even across massive datasets. | |
HCatalog | • HCatalog is a metadata and table storage management service for HDFS. It began as an Apache Incubator project and was later merged into Apache Hive. | |
Hive | • Hive provides a data warehouse structure and SQL-like query access (HiveQL) for data in HDFS and other Hadoop input sources. | |
Mahout | • Mahout is a scalable machine-learning and data mining library. | |
Oozie | • Oozie is a workflow manager and job coordinator for Hadoop. Its workflows can include non-MapReduce jobs, such as Pig, Hive, and Sqoop jobs. | |
Pig | • Pig is a framework consisting of a high-level scripting language (Pig Latin) and a runtime environment that compiles Pig Latin scripts into MapReduce jobs and executes them on a Hadoop cluster. | |
Sqoop | • Sqoop (SQL-to-Hadoop) is a tool that transfers data in both directions between relational databases and HDFS or other Hadoop data stores, such as Hive or HBase. | |
ZooKeeper | • ZooKeeper is a centralized service for maintaining configuration information and naming, and for providing distributed synchronization and group services. | |
YARN | • YARN (Yet Another Resource Negotiator) is a resource-management platform that manages compute resources in a cluster and uses them to schedule users’ applications. | http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/YARN.html |
Cascading | • Cascading is an alternative API to Hadoop MapReduce. It also supports reading data from and writing data to an HBase cluster. | |
Twitter Storm | • Twitter Storm (now Apache Storm) is a free, open-source, distributed real-time computation system. | |
High-Performance Computing Cluster (HPCC) | • HPCC is an open-source, data-intensive computing system platform developed by LexisNexis Risk Solutions. | |
Dremel | • Dremel is Google’s scalable, interactive ad hoc query system for analyzing read-only nested data. | |
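The HBase row describes a distributed, versioned store organized by column. As a rough illustration of that logical data model only, the sketch below addresses each cell by (row key, `family:qualifier`), keeps several timestamped versions per cell, returns the newest version by default, and iterates rows in sorted key order as HBase does. `VersionedTable`, `max_versions`, `put`, `get`, and `scan_family` are hypothetical names for this toy in-memory class. It is not the HBase client API, and it ignores distribution, region splitting, and HDFS storage entirely.

```python
from collections import defaultdict


class VersionedTable:
    """Hypothetical toy model of an HBase-style table (not the HBase API).

    Each cell is addressed by (row key, "family:qualifier") and keeps up to
    max_versions timestamped values, newest first.
    """

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # row key -> column ("family:qualifier") -> [(timestamp, value), ...] newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, ts):
        versions = self._rows[row][column]
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # keep newest first
        del versions[self.max_versions:]  # evict versions beyond the retention limit

    def get(self, row, column, ts=None):
        """Return the newest value at or before ts (newest overall if ts is None)."""
        # .get() on the defaultdicts avoids creating empty rows/columns on read
        for version_ts, value in self._rows.get(row, {}).get(column, []):
            if ts is None or version_ts <= ts:
                return value
        return None

    def scan_family(self, family):
        """Yield (row, column, latest value) for one column family, rows in key order."""
        for row in sorted(self._rows):
            for column, versions in sorted(self._rows[row].items()):
                if column.split(":", 1)[0] == family and versions:
                    yield row, column, versions[0][1]


t = VersionedTable(max_versions=2)
t.put("user1", "info:email", "a@example.com", ts=1)
t.put("user1", "info:email", "b@example.com", ts=2)
t.put("user1", "info:email", "c@example.com", ts=3)  # the ts=1 version is evicted
t.put("user1", "stats:logins", 42, ts=3)
t.put("user2", "info:email", "d@example.com", ts=1)

print(t.get("user1", "info:email"))        # c@example.com
print(t.get("user1", "info:email", ts=2))  # b@example.com
print(t.get("user1", "info:email", ts=1))  # None (version evicted)
print(list(t.scan_family("info")))
# [('user1', 'info:email', 'c@example.com'), ('user2', 'info:email', 'd@example.com')]
```

Reading one family here skips the other columns in each row, which loosely mirrors why grouping data by column family can make narrow reads cheaper.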
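The Pig row notes that Pig Latin scripts run as MapReduce on a Hadoop cluster. A word count written in Pig Latin with `GROUP` and `COUNT`, for example, roughly becomes a map stage, a shuffle that groups records by key, and a reduce stage. The plain-Python sketch below simulates those three phases on a single machine to show the data flow. `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical helpers, not part of the Hadoop or Pig APIs.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for each whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle/sort: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


docs = ["the quick brown fox", "The lazy dog", "the end"]
print(word_count(docs))
# {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

On a real cluster, the framework partitions map output across reducers and runs each phase in parallel on many machines; this single-process version only mirrors the logical steps.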