Differences
This shows you the differences between two versions of the page.
hadoop [2020/11/11 10:24] – [Overview] andonovj | hadoop [2020/11/11 11:19] (current) – [Overview] andonovj | ||
---|---|---|---|
Line 2: | Line 2: | ||
Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. | Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. | ||
- | As such, Hadoop stores data on instances called DataNodes, which are managed by a NameNode, as you can see below: | + | Hadoop is a whole ecosystem and deserves a wiki of its own, but here we will address several components: |
- | {{ : | + | * HDFS (Hadoop Distributed File System) |
+ | * HBase (Hadoop NoSQL Database) | ||
+ | * YARN (Resource manager) | ||
+ | You can see the whole ecosystem below: | ||
- | In that example the replication factor is set to 3, so the cluster holds 3 copies of each block. | + | {{ : |
- | =====Management===== | + | |
+ | In a nutshell, HDFS on its own stores data on DataNodes, which support a write-once, read-many access pattern, whereas HBase, which also stores its data on HDFS, is suited to workloads with many read and write operations. | ||
+ | |||
+ | |||
+ | |||
+ | =====Management===== | ||
- | ====Start Hadoop==== | + | ====Services==== |
< | < | ||
[oracle@edvmr1p0 ~]$ start-dfs.sh | [oracle@edvmr1p0 ~]$ start-dfs.sh | ||
Line 53: | Line 60: | ||
- | ====Manage | + | ====HDFS==== |
< | < | ||
[oracle@edvmr1p0 ~]$ hdfs dfs -mkdir /usr | [oracle@edvmr1p0 ~]$ hdfs dfs -mkdir /usr |