oracle_goldengate_apache_flume [2020/11/12 11:49] – [Appendix] andonovj
oracle_goldengate_apache_flume [2020/11/12 12:24] (current) – [Test] andonovj
  
 Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows.
 +
 +We have to configure:
 +
 +  * Flume.properties file
 +  * Flume RPC client properties file
 +
 +====Configure Flume Properties====
 +In the target OGG for Big Data home, create the following file:
 +
 +<Code:bash|Create Flume.properties file (TRG_OGGHOME/dirprm/flume.properties)>
 +gg.handlerlist = flume
 +gg.handler.flume.type=flume
 +gg.handler.flume.RpcClientPropertiesFile=custom-flume-rpc.properties
 +gg.handler.flume.format=avro_op
 +gg.handler.flume.mode=op
 +gg.handler.flume.EventMapsTo=op
 +gg.handler.flume.PropagateSchema=true
 +gg.handler.flume.includeTokens=false
 +goldengate.userexit.timestamp=utc
 +goldengate.userexit.writers=javawriter
 +javawriter.stats.display=TRUE
 +javawriter.stats.full=TRUE
 +
 +gg.log=log4j
 +gg.log.level=INFO
 +
 +gg.report.time=30sec
 +
 +gg.classpath=dirprm/:/opt/flume/lib/*
 +jvm.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar
 +
 +</Code>
 +
 +====Configure Flume RPC Client====
 +To configure the client, we have to create custom-flume-rpc.properties in the same directory:
 +
 +<Code:bash|Create custom RPC file (e.g. TRG_OGGHOME/dirprm/custom-flume-rpc.properties)>
 +client.type=default
 +hosts=h1
 +hosts.h1=localhost:41414
 +batch-size=100
 +connect-timeout=20000
 +request-timeout=20000
 +</Code>
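Since flume.properties refers to the RPC client file by name only, a typo in either file name tends to surface late, when the Replicat starts. The following is a minimal sketch of a pre-flight check, assuming the RPC file is picked up from dirprm (which is on gg.classpath in the configuration above). The function name check_rpc_file is hypothetical and not part of GoldenGate.

```shell
#!/bin/sh
# Hypothetical helper (not part of GoldenGate): verify that the RPC client
# properties file named in flume.properties exists in the same directory.
check_rpc_file() {
    dirprm="$1"    # assumed location, e.g. "$TRG_OGGHOME/dirprm"
    props="$dirprm/flume.properties"
    [ -f "$props" ] || { echo "missing $props" >&2; return 1; }

    # Extract the value after '=' on the RpcClientPropertiesFile line.
    rpc=$(sed -n 's/^gg\.handler\.flume\.RpcClientPropertiesFile[[:space:]]*=[[:space:]]*//p' "$props" | head -n 1)
    [ -n "$rpc" ] || { echo "RpcClientPropertiesFile not set in $props" >&2; return 1; }

    if [ -f "$dirprm/$rpc" ]; then
        echo "found $dirprm/$rpc"
    else
        echo "missing $dirprm/$rpc" >&2
        return 1
    fi
}
```

It could be called as check_rpc_file "$TRG_OGGHOME/dirprm" before starting the Replicat; a non-zero exit status means one of the two files is missing or the property is not set.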
 +
 +====Start Flume====
 +<Code:bash|Start Flume>
 +[oracle@edvmr1p0 conf]$ hdfs dfs -mkdir /user/oracle/flume
 +[oracle@edvmr1p0 conf]$ flume-ng agent --conf /opt/flume/conf -f /opt/flume/conf/flume.conf -Dflume.root.logger=DEBUG,LOGFILE -n agent1 -Dorg.apache.flume.log.rawdata=true
 +Info: Sourcing environment configuration script /opt/flume/conf/flume-env.sh
 +Info: Including Hadoop libraries found via (/opt/hadoop/bin/hadoop) for HDFS access
 +Info: Including HBASE libraries found via (/opt/hbase/bin/hbase) for HBASE access
 ++ exec /usr/java/latest/bin/java -Xms100m -Xmx2000m -Dcom.sun.management.jmxremote -Dflume.root.logger=DEBUG,LOGFILE -Dorg.apache.flume.log.rawdata=true -cp '/opt/flume/conf:/opt/flume/lib/*:/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/hdfs:/opt/hadoop/share/hadoop/hdfs/lib/*:/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/yarn/lib/*:/opt/hadoop/share/hadoop/yarn/*:/opt/hadoop/share/hadoop/mapreduce/lib/*:/opt/hadoop/share/hadoop/mapreduce/*:/opt/hadoop/contrib/capacity-scheduler/*.jar:/opt/hbase/conf:/usr/java/latest/lib/tools.jar:/opt/hbase:/opt/hbase/lib/activation-1.1.jar:/opt/hbase/lib/aopalliance-1.0.jar:/opt/hbase/lib/apacheds-i18n-2.0.0-M15.jar:/opt/hbase/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/opt/hbase/lib/api-asn1-api-1.0.0-M2
 +</Code>
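The agent is started with /opt/flume/conf/flume.conf, but the contents of that file are not shown on this page. The sketch below writes one plausible version of it, inferred from the rest of the page: an Avro source on port 41414 (the port in custom-flume-rpc.properties), a logger sink, and an HDFS sink writing to /user/oracle/flume. The source name avro-source1 matches the agent log; the channel and sink names (ch-log, ch-hdfs, log-sink1, hdfs-sink1) and the helper write_flume_conf are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: writes an ASSUMED flume.conf; the real file is not shown here.
write_flume_conf() {
    out="$1"   # target path, e.g. /opt/flume/conf/flume.conf
    cat > "$out" <<'EOF'
# The agent name must match the -n argument of flume-ng (agent1).
agent1.sources  = avro-source1
agent1.channels = ch-log ch-hdfs
agent1.sinks    = log-sink1 hdfs-sink1

# Avro source listening on the port the GoldenGate RPC client connects to.
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = 0.0.0.0
agent1.sources.avro-source1.port = 41414
agent1.sources.avro-source1.channels = ch-log ch-hdfs

# One memory channel per sink, so that both sinks receive every event.
agent1.channels.ch-log.type = memory
agent1.channels.ch-hdfs.type = memory

# Logger sink: prints events to the agent log.
agent1.sinks.log-sink1.type = logger
agent1.sinks.log-sink1.channel = ch-log

# HDFS sink: writes events as plain data files.
agent1.sinks.hdfs-sink1.type = hdfs
agent1.sinks.hdfs-sink1.channel = ch-hdfs
agent1.sinks.hdfs-sink1.hdfs.path = /user/oracle/flume
agent1.sinks.hdfs-sink1.hdfs.fileType = DataStream
EOF
}
```

Two channels are used because a source copies each event to every channel it is attached to by default, while sinks sharing a single channel would split the events between them.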
 +
 +====Configure the GG Replicat====
 +Next, we configure the GoldenGate Replicat that delivers the trail data to Flume:
 +
 +<Code:bash|Configure GoldenGate Replicat>
 +[oracle@edvmr1p0 dirprm]$ trg
 +[oracle@edvmr1p0 oggtrg]$ ggsci
 +
 +Oracle GoldenGate Command Interpreter
 +Version 12.2.0.1.160823 OGGCORE_OGGADP.12.2.0.1.0_PLATFORMS_161019.1437
 +Linux, x64, 64bit (optimized), Generic on Oct 19 2016 16:01:40
 +Operating system character set identified as UTF-8.
 +
 +Copyright (C) 1995, 2016, Oracle and/or its affiliates. All rights reserved.
 +
 +
 +
 +GGSCI (edvmr1p0) 1> info all
 +
 +Program     Status      Group       Lag at Chkpt  Time Since Chkpt
 +
 +MANAGER     RUNNING                                           
 +
 +
 +GGSCI (edvmr1p0) 2> edit param rflume
 +
 +REPLICAT rflume
 +TARGETDB LIBFILE libggjava.so SET property=dirprm/flume.properties
 +REPORTCOUNT EVERY 1 MINUTES, RATE
 +GROUPTRANSOPS 10000
 +MAP OGGSRC.*, TARGET OGGTRG.*;
 +:wq
 +
 +GGSCI (edvmr1p0) 3> add replicat rflume, exttrail ./dirdat/fl
 +REPLICAT added.
 +
 +
 +GGSCI (edvmr1p0) 4> start rflume
 +
 +Sending START request to MANAGER ...
 +REPLICAT RFLUME starting
 +
 +
 +GGSCI (edvmr1p0) 5> info all
 +
 +Program     Status      Group       Lag at Chkpt  Time Since Chkpt
 +
 +MANAGER     RUNNING                                           
 +REPLICAT    RUNNING     RFLUME      00:00:00      00:00:07    
 +
 +</Code>
 +
 +=====Test=====
 +To test replication, we simply insert some records and check whether they appear in the Flume log:
 +
 +<Code:bash|Insert rows>
 +[oracle@edvmr1p0 oggsrc]$ sqlplus oggsrc/oracle@orcl
 +
 +SQL*Plus: Release 12.1.0.2.0 Production on Thu Nov 12 12:12:27 2020
 +
 +Copyright (c) 1982, 2014, Oracle.  All rights reserved.
 +
 +Last Successful login time: Thu Nov 12 2020 11:40:44 +00:00
 +
 +Connected to:
 +Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
 +With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
 +
 +SQL> insert into customer_prod select * from customer where customer_id < 21;
 +
 +20 rows created.
 +
 +SQL> commit;
 +
 +Commit complete.
 +
 +SQL> 
 +</Code>
 +
 +Then we can check the statistics in GoldenGate, on both the source (Extract) and the target (Replicat):
 +
 +<Code:bash|Check GoldenGate Stats>
 +--Source (Extract)
 +GGSCI (edvmr1p0) 5> send priex, stats
 +
 +Sending STATS request to EXTRACT PRIEX ...
 +
 +Start of Statistics at 2020-11-12 12:13:16.
 +
 +DDL replication statistics (for all trails):
 +
 +*** Total statistics since extract started     ***
 + Operations                           6.00
 +
 +Output to ./dirdat/in:
 +
 +Extracting from OGGSRC.CUSTOMER_PROD to OGGSRC.CUSTOMER_PROD:
 +
 +*** Total statistics since 2020-11-12 12:13:12 ***
 + Total inserts                              20.00
 + Total updates                               0.00
 + Total deletes                               0.00
 + Total discards                             0.00
 + Total operations                          20.00
 +
 +*** Daily statistics since 2020-11-12 12:13:12 ***
 + Total inserts                              20.00
 + Total updates                               0.00
 + Total deletes                               0.00
 + Total discards                             0.00
 + Total operations                          20.00
 +
 +*** Hourly statistics since 2020-11-12 12:13:12 ***
 + Total inserts                              20.00
 + Total updates                               0.00
 + Total deletes                               0.00
 + Total discards                             0.00
 + Total operations                          20.00
 +
 +*** Latest statistics since 2020-11-12 12:13:12 ***
 + Total inserts                              20.00
 + Total updates                               0.00
 + Total deletes                               0.00
 + Total discards                             0.00
 + Total operations                          20.00
 +
 +End of Statistics.
 +
 +
 +GGSCI (edvmr1p0) 6> 
 +
 +
 +--Target (Replicat)
 +GGSCI (edvmr1p0) 23> send rflume, stats
 +
 +Sending STATS request to REPLICAT RFLUME ...
 +
 +Start of Statistics at 2020-11-12 12:13:26.
 +
 +Replicating from OGGSRC.CUSTOMER_PROD to OGGTRG.CUSTOMER_PROD:
 +
 +*** Total statistics since 2020-11-12 12:13:14 ***
 + Total inserts                              20.00
 + Total updates                               0.00
 + Total deletes                               0.00
 + Total discards                             0.00
 + Total operations                          20.00
 +
 +*** Daily statistics since 2020-11-12 12:13:14 ***
 + Total inserts                              20.00
 + Total updates                               0.00
 + Total deletes                               0.00
 + Total discards                             0.00
 + Total operations                          20.00
 +
 +*** Hourly statistics since 2020-11-12 12:13:14 ***
 + Total inserts                              20.00
 + Total updates                               0.00
 + Total deletes                               0.00
 + Total discards                             0.00
 + Total operations                          20.00
 +
 +*** Latest statistics since 2020-11-12 12:13:14 ***
 + Total inserts                              20.00
 + Total updates                               0.00
 + Total deletes                               0.00
 + Total discards                             0.00
 + Total operations                          20.00
 +
 +End of Statistics.
 +
 +
 +GGSCI (edvmr1p0) 24> 
 +</Code>
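Comparing the two reports by eye works for 20 rows, but the same check can be scripted. The following is a minimal sketch, assuming the STATS output has the layout shown above; total_inserts is a hypothetical helper that prints the first "Total inserts" value it reads on standard input (with several mapped tables it would only report the first one).

```shell
#!/bin/sh
# Hypothetical helper: print the first "Total inserts" value found in
# GGSCI STATS output read from stdin, as an integer.
total_inserts() {
    awk '/Total inserts/ { printf "%d\n", $NF; exit }'
}

# Possible usage (assumes ggsci is run from the matching OGG home and
# accepts commands on stdin):
#   src=$(echo "send priex, stats"  | ggsci | total_inserts)
#   trg=$(echo "send rflume, stats" | ggsci | total_inserts)
#   [ "$src" = "$trg" ] && echo "insert counts match: $src"
```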
 +
 +We can also check the Flume agent log:
 +
 +<Code:bash|Check Apache Flume log>
 +[New I/O worker #1]
 +(org.apache.flume.source.AvroSource.appendBatch:378) - Avro source avro-source1: Received avro event batch of 1 events. [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.LoggerSink.process:95) - Event: {headers:{SCHEMA_NAME=OGGTRG, TABLE_NAME=CUSTOMER_PROD,SCHEMA_FINGERPRINT=1668461282719043198} body: 28 4F 47 47 54 52 47 2E 43 55 53 54 4F 4D 45 52 (OGGTRG.CUSTOMER 
 +</Code>
 +
 +We can, of course, check the files on HDFS as well:
 +
 +<Code:bash|Check HDFS>
 +[oracle@edvmr1p0 oggsrc]$ hdfs dfs -ls /user/oracle/flume
 +Found 1 items
 +-rw-r--r--   1 oracle supergroup       2460 2020-11-12 12:13 /user/oracle/flume/FlumeData.1605183195022
 +[oracle@edvmr1p0 oggsrc]$ 
 +[oracle@edvmr1p0 oggsrc]$ hdfs dfs -cat /user/oracle/flume/FlumeData.1605183195022 | head -50
 +{
 +  "type" : "record",
 +  "name" : "CUSTOMER_PROD",
 +  "namespace" : "OGGTRG",
 +  "fields" : [ {
 +    "name" : "table",
 +    "type" : "string"
 +  }, {
 +    "name" : "op_type",
 +    "type" : "string"
 +  }, {
 +    "name" : "op_ts",
 +    "type" : "string"
 +  }, {
 +    "name" : "current_ts",
 +    "type" : "string"
 +  }, {
 +    "name" : "pos",
 +    "type" : "string"
 +  }, {
 +    "name" : "primary_keys",
 +
 +</Code>
 +
 +Nice: we have replicated 20 rows along the path Oracle RDBMS -> GoldenGate (Extract) -> GoldenGate (Replicat) -> Apache Flume -> HDFS.
 +
  
 =====Appendix=====
  • oracle_goldengate_apache_flume.1605181776.txt.gz
  • Last modified: 2020/11/12 11:49
  • by andonovj