Differences

This shows you the differences between two versions of the page.

hadoop [2020/11/11 10:15] andonovj
hadoop [2020/11/11 11:19] (current) – [Overview] andonovj
Line 2: Line 2:
 Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
  
-=====Management=====+Hadoop is a whole ecosystem and deserves a wiki of its own, but here we will address several components:
  
 +  * HDFS (Hadoop Distributed File System)
 +  * HBase (Hadoop NoSQL database)
 +  * YARN (resource manager)
 +
 +You can see the whole ecosystem below:
 +
 +{{ :hadoopecoarch.png?600 |}}
 +
 +In a nutshell, HDFS on its own stores data on DataNodes under a write-once, read-many model: a file is written once and can then be read many times. HBase, which also stores its data on HDFS, is better suited to frequent read-write operations.
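
The write-once, read-many behaviour of HDFS can be illustrated with a small sketch. It is only an illustration under assumptions: the HDFS path /tmp/demo, the file name notes.txt and the function name hdfs_write_once_demo are hypothetical and do not come from this page, and the demo is skipped when the hdfs client is not on the PATH.

```shell
#!/usr/bin/env bash
# Sketch only: /tmp/demo, notes.txt and the function name are hypothetical.

hdfs_write_once_demo() {
    local dir=/tmp/demo
    local src
    src=$(mktemp)                                  # local scratch file

    hdfs dfs -mkdir -p "$dir"                      # create the HDFS directory
    echo "first version" > "$src"
    hdfs dfs -put "$src" "$dir/notes.txt"          # the single write of the file
    hdfs dfs -cat "$dir/notes.txt"                 # reads can be repeated freely
    hdfs dfs -cat "$dir/notes.txt"

    # There is no in-place update: "changing" the file means replacing it whole.
    echo "second version" > "$src"
    hdfs dfs -put -f "$src" "$dir/notes.txt"       # -f overwrites the existing file
    hdfs dfs -cat "$dir/notes.txt"

    rm -f "$src"
}

# Run the demo only where an HDFS client is installed.
if command -v hdfs >/dev/null 2>&1; then
    hdfs_write_once_demo
else
    echo "hdfs CLI not found; skipping demo"
fi
```

Without the -f flag the second put is refused because the target already exists, which is the practical face of write-once storage. Row-level updates of the kind HBase offers are not shown here.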
 +
 +
 +
 +=====Management=====
  
-====Start Hadoop====+====Services====
 <Code:bash|Start DFS>
 [oracle@edvmr1p0 ~]$ start-dfs.sh
Line 47: Line 60:
  
  
-====Manage HDFS====+====HDFS====
 <Code:Bash|Create / List / Delete HDFS Directories>
 [oracle@edvmr1p0 ~]$ hdfs dfs -mkdir /usr
  • hadoop.1605089732.txt.gz
  • Last modified: 2020/11/11 10:15
  • by andonovj