Major Big Data Projects

The Hadoop ecosystem refers to a family of Apache projects, commercial tools and solutions. The Apache Hadoop project is surrounded by many related projects that make Hadoop easier to use and extend its capabilities. Some of these projects provide development tools, while others manage Hadoop data flow, processing and so on. Here we discuss five important projects in the Apache Hadoop ecosystem.

Apache Ambari

Ambari is a management platform that helps us provision, manage, secure and monitor Apache Hadoop clusters.
It makes Hadoop easier to manage by providing a web UI. Ambari has two major components, the Ambari server and the Ambari agent. The Ambari server communicates with the agents, and the Ambari agent running on each node reports that node's health status back to the server.

Apache Flume

Flume is a distributed, reliable system for collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store. When data arrives faster than it can be written to storage, Flume mediates between the data producers and the file system to keep the data flowing smoothly.

Apache Kafka

Kafka is a stream processing platform written in Java and Scala. It is a fault-tolerant messaging system that is scalable and fast. Kafka is used for real-time streaming of live incoming data.
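To make the idea of a messaging system for live data a little more concrete, here is a toy, in-memory sketch in Python. It is not Kafka's client API: the `ToyTopic` class, its method names and the consumer names are all hypothetical, and it leaves out everything that makes Kafka distributed and fault tolerant. It only illustrates the general pattern of producers appending messages to a log while each consumer reads from its own position.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a single topic: an append-only list of messages."""

    def __init__(self):
        self._log = []                    # messages in arrival order
        self._offsets = defaultdict(int)  # next position to read, per consumer

    def publish(self, message):
        """Producer side: append a message and return its position in the log."""
        self._log.append(message)
        return len(self._log) - 1

    def poll(self, consumer, max_messages=10):
        """Consumer side: return unread messages and advance that consumer's offset."""
        start = self._offsets[consumer]
        batch = self._log[start:start + max_messages]
        self._offsets[consumer] = start + len(batch)
        return batch


if __name__ == "__main__":
    tweets = ToyTopic()
    for text in ["first tweet", "second tweet", "third tweet"]:
        tweets.publish(text)

    # Two independent (hypothetical) consumers read the same stream at their own pace.
    print(tweets.poll("hadoop-loader", max_messages=2))
    print(tweets.poll("dashboard"))
    print(tweets.poll("hadoop-loader"))
```

Because every consumer keeps its own offset, a slow reader does not hold back a fast one, which is one reason this style of messaging suits live data.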
For example, Twitter uses Kafka to stream live tweets into Hadoop.

Apache Zeppelin

Zeppelin is a notebook in which we can write interactive documents using SQL, Python, Scala and other languages. Its main feature is that it is web based. We can also use Zeppelin for visualization and data exploration. This browser-based notebook also lets developers share code and visualize results within the document.

Apache Sqoop

Sqoop is a tool used for transferring data between Hadoop and relational databases.
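As a rough sketch of what requesting such a transfer can look like, the Python snippet below assembles the arguments for a `sqoop import` run and prints the resulting command without executing it. The flags used (`--connect`, `--username`, `-P`, `--table`, `--target-dir`, `--num-mappers`) are standard Sqoop import options, but the helper function `sqoop_import_command`, the database URL, the user, the table and the HDFS path are all made up for illustration.

```python
import shlex


def sqoop_import_command(jdbc_url, username, table, target_dir, mappers=1):
    """Build the argument list for a `sqoop import` run (nothing is executed here)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,          # JDBC connection string of the source database
        "--username", username,
        "-P",                           # ask for the password interactively
        "--table", table,               # relational table to copy
        "--target-dir", target_dir,     # HDFS directory that receives the data
        "--num-mappers", str(mappers),  # number of parallel map tasks
    ]


if __name__ == "__main__":
    cmd = sqoop_import_command(
        jdbc_url="jdbc:mysql://db.example.com/sales",  # hypothetical database
        username="etl_user",                           # hypothetical account
        table="orders",                                # hypothetical table
        target_dir="/data/sales/orders",               # hypothetical HDFS path
        mappers=4,
    )
    # Print a shell-ready command line for inspection.
    print(shlex.join(cmd))
```

Keeping the arguments in a list makes it easy to review the command before passing it to something like `subprocess.run` on a machine where Sqoop is installed.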
Sqoop is essentially a command-line application. When we first start Hadoop, there is no data in it. Since much traditional data resides in relational databases, Sqoop helps us move that data into Hadoop. It can also be used to extract data from Hadoop and export it into relational databases.

My Sources: Apache Software Foundation, “Apache Flume”, https://flume.apache.org