Differences

This shows you the differences between two versions of the page.

--- hadoop [2020/11/11 10:24] – [Overview] andonovj
+++ hadoop [2020/11/11 11:19] (current) – [Overview] andonovj
@@ Line 2: / Line 2: @@
 Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
-As such, Hadoop stores the data in instances called DataNodes, managed by a NameNode, as you can see below:
+Hadoop is a whole ecosystem and deserves a wiki of its own, but here we will address several components:
-{{ :hadooparch.png?600 |}}
+  * HDFS (Hadoop Distributed File System)
+  * HBase (Hadoop NoSQL Database)
+  * YARN (Resource manager)
+You can see the whole ecosystem below:
-In that example the replication factor is set to 3, so we have 3 copies of each block in the whole cluster.
+{{ :hadoopecoarch.png?600 |}}
-=====Management=====
+In a nutshell, HDFS on its own stores data on DataNodes, which allow many reads but only a single write (write once, read many), whereas HBase is suited to frequent read-write operations while still using HDFS for storage.
+=====Management=====
-====Start Hadoop====
+====Services====
 <Code:bash|Start DFS>
 [oracle@edvmr1p0 ~]$ start-dfs.sh
@@ Line 53: / Line 60: @@
-====Manage HDFS====
+====HDFS====
 <Code:Bash|Create / List / Delete HDFS Directories>
 [oracle@edvmr1p0 ~]$ hdfs dfs -mkdir /usr