Overview
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows.
Management
Startup
Startup Flume Server - Terminal won't return to bash
[oracle@edvmr1p0 ~]$ flume-ng agent --conf /opt/flume/conf -f /opt/flume/conf/flume.conf -Dflume.root.logger=DEBUG,console -n agent1 Info: Sourcing environment configuration script /opt/flume/conf/flume-env.sh Info: Including Hadoop libraries found via (/opt/hadoop/bin/hadoop) for HDFS access Info: Including HBASE libraries found via (/opt/hbase/bin/hbase) for HBASE access + exec /usr/java/latest/bin/java -Xms100m -Xmx2000m -Dcom.sun.management.jmxremote -Dflume.root.logger=DEBUG,console -cp '/opt/flume/conf:/opt/flume/lib/*:/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/hdfs:/opt/hadoop/
This terminal won't exit back to shell. Also you can use teh following command if you want more logging:
Startup Flume v2
[oracle@edvmr1p0 conf]$ flume-ng agent --conf /opt/flume/conf -f /opt/flume/conf/flume.conf -Dflume.root.logger=DEBUG,LOGFILE -n agent1 -Dorg.apache.flume.log.rawdata=true Info: Sourcing environment configuration script /opt/flume/conf/flume-env.sh Info: Including Hadoop libraries found via (/opt/hadoop/bin/hadoop) for HDFS access Info: Including HBASE libraries found via (/opt/hbase/bin/hbase) for HBASE access + exec /usr/java/latest/bin/java -Xms100m -Xmx2000m -Dcom.sun.management.jmxremote -Dflume.root.logger=DEBUG,LOGFILE -Dorg.apache.flume.log.rawdata=true -cp '/opt/flume/conf:/opt/flume/lib/*:/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/hdfs:/opt/hadoop/share/hadoop/hdfs/lib/*:/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/yarn/lib/*:/opt/hadoop/share/hadoop/yarn/*:/opt/hadoop/share/hadoop/mapreduce/lib/*:/opt/ha
Transmit Data
Transmit Data (e.g. /etc/passwd in that case) - Terminal won't return to bash
[oracle@edvmr1p0 ~]$ flume-ng avro-client --conf conf -H localhost -p 41414 -F /etc/passwd Info: Including Hadoop libraries found via (/opt/hadoop/bin/hadoop) for HDFS access Info: Including HBASE libraries found via (/opt/hbase/bin/hbase) for HBASE access + exec /usr/java/latest/bin/java -Xmx20m -cp 'conf:/opt/flume/lib/*:/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/hdfs:/opt/hadoop/share/hadoop/hdfs/lib/*:/
The transmitting of the data will be visible on the previous terminal as follows:
Review Transmitted Data
2020-11-11 11:26:40,702 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 70 75 6C 73 65 3A 78 3A 34 39 37 3A 34 39 33 3A pulse:x:497:493: } 2020-11-11 11:26:40,702 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 67 64 6D 3A 78 3A 34 32 3A 34 32 3A 3A 2F 76 61 gdm:x:42:42::/va } 2020-11-11 11:26:40,702 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 6F 72 61 63 6C 65 3A 78 3A 35 34 33 32 31 3A 35 oracle:x:54321:5 } 2020-11-11 11:26:40,704 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 61 70 61 63 68 65 3A 78 3A 34 38 3A 34 38 3A 41 apache:x:48:48:A } 2020-11-11 11:26:40,704 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 61 76 61 68 69 3A 78 3A 37 30 3A 37 30 3A 41 76 avahi:x:70:70:Av } 2020-11-11 11:26:40,704 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 61 76 61 68 69 2D 61 75 74 6F 69 70 64 3A 78 3A avahi-autoipd:x: }
Appendix
Flume Config
[oracle@edvmr1p0 conf]$ cat flume.conf # Licensed to the Apache Software Foundation (ASF) under one # or more contributor license agreements. See the NOTICE file # distributed with this work for additional information # regarding copyright ownership. The ASF licenses this file # to you under the Apache License, Version 2.0 (the # "License"); you may not use this file except in compliance # with the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, # software distributed under the License is distributed on an # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY # KIND, either express or implied. See the License for the # specific language governing permissions and limitations # under the License. # The configuration file needs to define the sources, # the channels and the sinks. # Sources, channels and sinks are defined per agent, # in this case called 'agent' agent.sources = seqGenSrc agent.channels = memoryChannel agent.sinks = loggerSink # For each one of the sources, the type is defined agent.sources.seqGenSrc.type = seq # The channel can be defined as follows. agent.sources.seqGenSrc.channels = memoryChannel # Each sink's type must be defined agent.sinks.loggerSink.type = logger #Specify the channel the sink should use agent.sinks.loggerSink.channel = memoryChannel # Each channel's type is defined. agent.channels.memoryChannel.type = memory # Other config values specific to each type of channel(sink or source) # can be defined as well # In this case, it specifies the capacity of the memory channel agent.channels.memoryChannel.capacity = 100 # Define a memory channel called ch1 on agent1 agent1.channels.ch1.type = memory # # # Define an Avro source called avro-source1 on agent1 and tell it # # to bind to 0.0.0.0:41414. Connect it to channel ch1. agent1.sources.avro-source1.channels = ch1 agent1.sources.avro-source1.type = avro agent1.sources.avro-source1.bind = 0.0.0.0 agent1.sources.avro-source1.port = 41414 # # # Define a logger sink that simply logs all events it receives # # and connect it to the other end of the same channel. agent1.sinks.log-sink1.channel = ch1 agent1.sinks.log-sink1.type = logger # # # Finally, now that we've defined all of our components, tell # # agent1 which ones we want to activate. agent1.channels = ch1 agent1.sources = avro-source1 agent1.sinks = log-sink1