Hadoop
Hadoop is an open-source, Java-based framework used for storing and processing big data. The data is stored on inexpensive commodity servers that run as clusters, and its distributed file system enables concurrent processing and fault tolerance. Developed by Doug Cutting and Michael J. Cafarella, Hadoop uses the MapReduce programming model for faster storage and retrieval of data from its nodes. The framework is managed by the Apache Software Foundation and is licensed under the Apache License 2.0.
For years, while the processing power of application servers has been increasing manifold, databases have lagged behind because of their limited capacity and speed. Today, however, as many applications generate big data that needs to be processed, Hadoop plays a significant role in giving the database world a much-needed makeover. There are direct and indirect business benefits as well: by using open-source technology on inexpensive servers, mostly in the cloud (and sometimes on-premises), organizations achieve significant cost savings.
Additionally, the ability to collect massive amounts of data, and the insights derived from crunching that data, leads to better business decisions in the real world: for example, the ability to focus on the right consumer segment, weed out or fix erroneous processes, optimize floor operations, provide relevant search results, perform predictive analytics, and so on.
Hadoop is not just one application; rather, it is a platform with various integral components that enable distributed data storage and processing. Together, these components form the Hadoop ecosystem. Some of them are core components that form the foundation of the framework, while others are supplementary components that bring add-on functionality to the Hadoop world.
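As a concrete illustration of the MapReduce programming model mentioned above, the following is a minimal sketch of the classic word-count job written against the standard Hadoop MapReduce Java API (org.apache.hadoop.mapreduce). The class name WordCount and the command-line input/output paths are illustrative assumptions, not something prescribed by this text.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: for each input line, emit (word, 1) for every word it contains.
  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sum the counts emitted for each word across all mappers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // combine locally before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory (e.g. in HDFS)
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory (must not exist yet)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a JAR, a job like this would typically be submitted with "hadoop jar wordcount.jar WordCount <input-dir> <output-dir>", with both directories residing in the distributed file system so that mappers run in parallel on the nodes holding the data blocks.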