=====Overview=====
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. In this article we will configure Oracle GoldenGate for Big Data to deliver database changes to Flume, which in turn writes them to HDFS.

We have to configure:

  * Flume properties file
  * Flume RPC client properties file

====Configure Flume Properties====
In the target OGG for Big Data home, create the following properties file:

<code>
gg.handlerlist = flume
gg.handler.flume.type=flume
gg.handler.flume.RpcClientPropertiesFile=custom-flume-rpc.properties
gg.handler.flume.format=avro_op
gg.handler.flume.mode=op
gg.handler.flume.EventMapsTo=op
gg.handler.flume.PropagateSchema=true
gg.handler.flume.includeTokens=false
goldengate.userexit.timestamp=utc
goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE

gg.log=log4j
gg.log.level=INFO

gg.report.time=30sec

gg.classpath=dirprm/:/
jvm.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/
</code>

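The gg.classpath entry must include the Apache Flume client libraries so that the handler can create its RPC client. A minimal sketch, assuming Flume is installed under /opt/flume (the path is an assumption, adjust it to your environment):

<code>
# Hypothetical classpath: dirprm for custom-flume-rpc.properties, plus the Flume client jars
gg.classpath=dirprm/:/opt/flume/lib/*
</code>
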
====Configure Flume RPC Client====
To configure the client, we have to create custom-flume-rpc.properties in the same directory:

<code>
client.type=default
hosts=h1
hosts.h1=localhost:
batch-size=100
connect-timeout=20000
request-timeout=20000
</code>

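The host:port in hosts.h1 has to match the Avro source exposed by the Flume agent. A minimal sketch of that agent-side source, reusing the agent1 / avro-source1 / ch1 names from the appendix and assuming the default Avro port 41414:

<code>
# Hypothetical agent-side counterpart of the RPC client settings above
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = 0.0.0.0
agent1.sources.avro-source1.port = 41414
agent1.sources.avro-source1.channels = ch1
</code>
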
====Start Flume====
<code>
[oracle@edvmr1p0 conf]$ hdfs dfs -mkdir /
[oracle@edvmr1p0 conf]$ flume-ng agent --conf /
Info: Sourcing environment configuration script /
Info: Including Hadoop libraries found via (/
Info: Including HBASE libraries found via (/
+ exec /
</code>

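Before wiring GoldenGate in, we can check from another terminal that the Avro source actually accepts events, using the flume-ng avro-client tool (host and port are assumptions and must match custom-flume-rpc.properties):

<code>
# Send the contents of a test file as Flume events to the Avro source
echo "hello flume" > /tmp/flume-test.txt
flume-ng avro-client -H localhost -p 41414 -F /tmp/flume-test.txt
</code>

If the agent is healthy, the event should show up in the agent log and in the HDFS target directory.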
====Configure the GG Replicat====
Next, we have to configure the replicat on the GoldenGate side:

<code>
[oracle@edvmr1p0 dirprm]$ trg
[oracle@edvmr1p0 oggtrg]$ ggsci

Oracle GoldenGate Command Interpreter
Version 12.2.0.1.160823 OGGCORE_OGGADP.12.2.0.1.0_PLATFORMS_161019.1437
Linux, x64, 64bit (optimized),
Operating system character set identified as UTF-8.

Copyright (C) 1995, 2016, Oracle and/or its affiliates. All rights reserved.



GGSCI (edvmr1p0) 1> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING


GGSCI (edvmr1p0) 2> edit param rflume

REPLICAT rflume
TARGETDB LIBFILE libggjava.so SET property=dirprm/
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
MAP OGGSRC.*, TARGET OGGTRG.*;
:wq

GGSCI (edvmr1p0) 3> add replicat rflume, exttrail ./dirdat/fl
REPLICAT added.


GGSCI (edvmr1p0) 4> start rflume

Sending START request to MANAGER ...
REPLICAT RFLUME starting


GGSCI (edvmr1p0) 5> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
REPLICAT    RUNNING     RFLUME

</code>

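Once the replicat is running, it is worth confirming that the Java handler initialized cleanly; the standard GGSCI commands below should show the flume handler being loaded in the replicat report:

<code>
GGSCI (edvmr1p0) 6> info rflume, detail
GGSCI (edvmr1p0) 7> view report rflume
</code>
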
=====Test=====
To test the replication, we insert some rows into the source table:

<code>
[oracle@edvmr1p0 oggsrc]$ sqlplus oggsrc/

SQL*Plus: Release 12.1.0.2.0 Production on Thu Nov 12 12:12:27 2020

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Thu Nov 12 2020 11:40:44 +00:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> insert into customer_prod select * from customer where customer_id < 21;

20 rows created.

SQL> commit;

Commit complete.

SQL>
</code>

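As a quick sanity check on the source side (hypothetical query, just counting what we inserted):

<code>
SQL> select count(*) from customer_prod where customer_id < 21;

  COUNT(*)
----------
        20
</code>
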
Then we can check the statistics from GoldenGate on both sides:

<code>
--Source (Extract)
GGSCI (edvmr1p0) 5> send priex, stats

Sending STATS request to EXTRACT PRIEX ...

Start of Statistics at 2020-11-12 12:13:16.

DDL replication statistics (for all trails):

*** Total statistics since extract started ***
        Operations                                         0.00

Output to ./

Extracting from OGGSRC.CUSTOMER_PROD to OGGSRC.CUSTOMER_PROD:

*** Total statistics since 2020-11-12 12:13:12 ***
        Total inserts                                     20.00
        Total updates                                      0.00
        Total deletes                                      0.00
        Total discards                                     0.00
        Total operations                                  20.00

*** Daily statistics since 2020-11-12 12:13:12 ***
        Total inserts                                     20.00
        Total updates                                      0.00
        Total deletes                                      0.00
        Total discards                                     0.00
        Total operations                                  20.00

*** Hourly statistics since 2020-11-12 12:13:12 ***
        Total inserts                                     20.00
        Total updates                                      0.00
        Total deletes                                      0.00
        Total discards                                     0.00
        Total operations                                  20.00

*** Latest statistics since 2020-11-12 12:13:12 ***
        Total inserts                                     20.00
        Total updates                                      0.00
        Total deletes                                      0.00
        Total discards                                     0.00
        Total operations                                  20.00

End of Statistics.


GGSCI (edvmr1p0) 6>


--Target (Replicat)
GGSCI (edvmr1p0) 23> send rflume, stats

Sending STATS request to REPLICAT RFLUME ...

Start of Statistics at 2020-11-12 12:13:26.

Replicating from OGGSRC.CUSTOMER_PROD to OGGTRG.CUSTOMER_PROD:

*** Total statistics since 2020-11-12 12:13:14 ***
        Total inserts                                     20.00
        Total updates                                      0.00
        Total deletes                                      0.00
        Total discards                                     0.00
        Total operations                                  20.00

*** Daily statistics since 2020-11-12 12:13:14 ***
        Total inserts                                     20.00
        Total updates                                      0.00
        Total deletes                                      0.00
        Total discards                                     0.00
        Total operations                                  20.00

*** Hourly statistics since 2020-11-12 12:13:14 ***
        Total inserts                                     20.00
        Total updates                                      0.00
        Total deletes                                      0.00
        Total discards                                     0.00
        Total operations                                  20.00

*** Latest statistics since 2020-11-12 12:13:14 ***
        Total inserts                                     20.00
        Total updates                                      0.00
        Total deletes                                      0.00
        Total discards                                     0.00
        Total operations                                  20.00

End of Statistics.


GGSCI (edvmr1p0) 24>
</code>

Now we can also check on the Flume side, in the agent log:

<code>
[New I/O worker #1]
(org.apache.flume.source.AvroSource.appendBatch:
</code>

We can of course check the files on HDFS as well. Since PropagateSchema=true, the file starts with the Avro schema of the replicated table:

<code>
[oracle@edvmr1p0 oggsrc]$ hdfs dfs -ls /
Found 1 items
-rw-r--r--
[oracle@edvmr1p0 oggsrc]$
[oracle@edvmr1p0 oggsrc]$ hdfs dfs -cat /
{
  "type" : "record",
  "name" : "CUSTOMER_PROD",
  "namespace" : "OGGTRG",
  "fields" : [ {
    "name" : "table",
    "type" : "string"
  }, {
    "name" : "op_type",
    "type" : "string"
  }, {
    "name" : "op_ts",
    "type" : "string"
  }, {
    "name" : "current_ts",
    "type" : "string"
  }, {
    "name" : "pos",
    "type" : "string"
  }, {
    "name" : "primary_keys",

</code>

Niiice, so we have replicated 20 rows along the chain: Oracle RDBMS -> GoldenGate (Extract) -> GoldenGate (Replicat) -> Apache Flume -> HDFS.

=====Appendix=====
The Flume agent configuration:

<code>
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = hdfs-sink

agent1.sinks.hdfs-sink.channel = ch1
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = hdfs://
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
</code>

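The snippet above omits the channel and source definitions. A minimal complete sketch, assuming a memory channel and the default Avro port (capacity, bind address, port, and the HDFS path are all assumptions):

<code>
# Component names
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = hdfs-sink

# Memory channel (illustrative capacity)
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

# Avro source that the GoldenGate Flume handler connects to
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = 0.0.0.0
agent1.sources.avro-source1.port = 41414
agent1.sources.avro-source1.channels = ch1

# HDFS sink writing the events as a plain data stream
agent1.sinks.hdfs-sink.channel = ch1
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = hdfs://localhost:8020/user/oracle/flume
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
</code>
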
And the Flume log4j.properties file:

<code>
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#  http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#

# Define some default values that can be overridden by system properties.
#
# For testing, it may also be convenient to specify
# -Dflume.root.logger=DEBUG,console when launching flume.

#flume.root.logger=DEBUG,console
flume.root.logger=TRACE,LOGFILE
flume.log.dir=/
flume.log.file=flume.log

log4j.logger.org.apache.flume.lifecycle = INFO
log4j.logger.org.jboss = WARN
log4j.logger.org.mortbay = INFO
log4j.logger.org.apache.avro.ipc.NettyTransceiver = WARN
log4j.logger.org.apache.hadoop = INFO
log4j.logger.org.apache.hadoop.hive = ERROR

# Define the root logger to the system property "flume.root.logger".
log4j.rootLogger=${flume.root.logger}


# Stock log4j rolling file appender
# Default log rotation configuration
log4j.appender.LOGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.LOGFILE.MaxFileSize=100MB
log4j.appender.LOGFILE.MaxBackupIndex=10
log4j.appender.LOGFILE.File=${flume.log.dir}/${flume.log.file}
log4j.appender.LOGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.LOGFILE.layout.ConversionPattern=%d{dd MMM yyyy HH:mm:ss,SSS} %-5p [%t] (%C.%M:%L) %x - %m%n


# Warning: If you enable the following appender it will fill up your disk if you don't have a cleanup job!
# This uses the updated rolling file appender from log4j-extras that supports a reliable time-based rolling policy.
# See http://logging.apache.org/log4j/companions/extras/apidocs/org/apache/log4j/rolling/TimeBasedRollingPolicy.html
# Add "DAILY" to flume.root.logger above if you want to use this
log4j.appender.DAILY=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.DAILY.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.DAILY.rollingPolicy.ActiveFileName=${flume.log.dir}/${flume.log.file}
log4j.appender.DAILY.rollingPolicy.FileNamePattern=${flume.log.dir}/${flume.log.file}.%d{yyyy-MM-dd}
log4j.appender.DAILY.layout=org.apache.log4j.PatternLayout
log4j.appender.DAILY.layout.ConversionPattern=%d{dd MMM yyyy HH:mm:ss,SSS} %-5p [%t] (%C.%M:%L) %x - %m%n


# console
# Add "console" to flume.root.logger above if you want to use this
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d (%t) [%p - %l] %m%n
</code>