Overview

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows.

Management

Startup

Startup Flume Server - Terminal won't return to bash

[oracle@edvmr1p0 ~]$ flume-ng agent --conf /opt/flume/conf -f /opt/flume/conf/flume.conf -Dflume.root.logger=DEBUG,console -n agent1
Info: Sourcing environment configuration script /opt/flume/conf/flume-env.sh
Info: Including Hadoop libraries found via (/opt/hadoop/bin/hadoop) for HDFS access
Info: Including HBASE libraries found via (/opt/hbase/bin/hbase) for HBASE access
+ exec /usr/java/latest/bin/java -Xms100m -Xmx2000m -Dcom.sun.management.jmxremote -Dflume.root.logger=DEBUG,console -cp '/opt/flume/conf:/opt/flume/lib/*:/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/hdfs:/opt/hadoop/

This terminal won't exit back to the shell. You can also use the following command if you want more verbose logging:

Startup Flume v2

[oracle@edvmr1p0 conf]$ flume-ng agent --conf /opt/flume/conf -f /opt/flume/conf/flume.conf -Dflume.root.logger=DEBUG,LOGFILE -n agent1 -Dorg.apache.flume.log.rawdata=true
Info: Sourcing environment configuration script /opt/flume/conf/flume-env.sh
Info: Including Hadoop libraries found via (/opt/hadoop/bin/hadoop) for HDFS access
Info: Including HBASE libraries found via (/opt/hbase/bin/hbase) for HBASE access
+ exec /usr/java/latest/bin/java -Xms100m -Xmx2000m -Dcom.sun.management.jmxremote -Dflume.root.logger=DEBUG,LOGFILE -Dorg.apache.flume.log.rawdata=true -cp '/opt/flume/conf:/opt/flume/lib/*:/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/hdfs:/opt/hadoop/share/hadoop/hdfs/lib/*:/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/yarn/lib/*:/opt/hadoop/share/hadoop/yarn/*:/opt/hadoop/share/hadoop/mapreduce/lib/*:/opt/ha
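Both startup commands above run in the foreground and hold the terminal. If you need the shell back, one option is to background the agent; a minimal sketch (the paths and agent name match the setup above; the output file path is just an example):

```shell
# Run the agent in the background so the terminal returns immediately.
# Console output is redirected to a file (path is illustrative).
nohup flume-ng agent --conf /opt/flume/conf -f /opt/flume/conf/flume.conf \
  -Dflume.root.logger=INFO,LOGFILE -n agent1 > /tmp/flume-agent1.out 2>&1 &
echo $!  # agent PID, handy for stopping it later with kill
```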

Transmit Data

Transmit Data (/etc/passwd in this case) - Terminal won't return to bash

[oracle@edvmr1p0 ~]$ flume-ng avro-client --conf conf -H localhost -p 41414 -F /etc/passwd
Info: Including Hadoop libraries found via (/opt/hadoop/bin/hadoop) for HDFS access
Info: Including HBASE libraries found via (/opt/hbase/bin/hbase) for HBASE access
+ exec /usr/java/latest/bin/java -Xmx20m -cp 'conf:/opt/flume/lib/*:/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/hdfs:/opt/hadoop/share/hadoop/hdfs/lib/*:/

The transmitted data will be visible in the previous terminal (the agent's console log) as follows:

Review Transmitted Data

2020-11-11 11:26:40,702 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 70 75 6C 73 65 3A 78 3A 34 39 37 3A 34 39 33 3A pulse:x:497:493: }
2020-11-11 11:26:40,702 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 67 64 6D 3A 78 3A 34 32 3A 34 32 3A 3A 2F 76 61 gdm:x:42:42::/va }
2020-11-11 11:26:40,702 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 6F 72 61 63 6C 65 3A 78 3A 35 34 33 32 31 3A 35 oracle:x:54321:5 }
2020-11-11 11:26:40,704 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 61 70 61 63 68 65 3A 78 3A 34 38 3A 34 38 3A 41 apache:x:48:48:A }
2020-11-11 11:26:40,704 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 61 76 61 68 69 3A 78 3A 37 30 3A 37 30 3A 41 76 avahi:x:70:70:Av }
2020-11-11 11:26:40,704 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 61 76 61 68 69 2D 61 75 74 6F 69 70 64 3A 78 3A avahi-autoipd:x: }
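The logger sink prints each event body as hex bytes followed by a short text preview. The hex can be decoded back to text with plain shell; for example, for the first event above (assumes a printf that supports \xHH escapes, as bash's builtin does):

```shell
# Decode a logger-sink event body from its hex representation.
hex="70 75 6C 73 65 3A 78 3A 34 39 37 3A 34 39 33 3A"
for b in $hex; do printf "\x$b"; done; echo
# → pulse:x:497:493:
```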

Appendix

Flume Config

[oracle@edvmr1p0 conf]$ cat flume.conf
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#  http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent, 
# in this case called 'agent'

agent.sources = seqGenSrc
agent.channels = memoryChannel
agent.sinks = loggerSink

# For each one of the sources, the type is defined
agent.sources.seqGenSrc.type = seq

# The channel can be defined as follows.
agent.sources.seqGenSrc.channels = memoryChannel

# Each sink's type must be defined
agent.sinks.loggerSink.type = logger

#Specify the channel the sink should use
agent.sinks.loggerSink.channel = memoryChannel

# Each channel's type is defined.
agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel (sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 100
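Besides capacity, the memory channel also takes a transactionCapacity property (the maximum number of events per transaction between a source/sink and the channel). A hedged example; the value here is illustrative, not a recommendation:

```properties
# Max events taken/put per transaction; must not exceed capacity
agent.channels.memoryChannel.transactionCapacity = 100
```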


# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory

# Define an Avro source called avro-source1 on agent1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
agent1.sources.avro-source1.channels = ch1
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = 0.0.0.0
agent1.sources.avro-source1.port = 41414

# Define a logger sink that simply logs all events it receives
# and connect it to the other end of the same channel.
agent1.sinks.log-sink1.channel = ch1
agent1.sinks.log-sink1.type = logger

# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1
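A quick consistency check on a properties file like the one above: every channel a source or sink references must appear in the agent's declared channel list. A sketch using a minimal copy of the agent1 wiring (the grep/cut pipeline is generic shell, nothing Flume-specific):

```shell
# Write a minimal copy of the agent1 wiring to check.
cat > /tmp/flume-check.conf <<'EOF'
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1
agent1.sources.avro-source1.channels = ch1
agent1.sinks.log-sink1.channel = ch1
EOF
# Channels the agent declares vs. channels sources/sinks reference.
declared=$(grep '^agent1\.channels *=' /tmp/flume-check.conf | cut -d= -f2 | tr -d ' ')
referenced=$(grep -E '\.(channels|channel) *=' /tmp/flume-check.conf \
  | grep -v '^agent1\.channels *=' | cut -d= -f2 | tr -d ' ' | sort -u)
[ "$declared" = "$referenced" ] && echo "channel wiring OK"
# → channel wiring OK
```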