oracle_goldengate_hadoop_hdfs [2020/11/11 15:33] – andonovj
=====Configuration Destination=====
In this section we will replicate from Oracle -> GoldenGate -> HDFS.
<code>
...
</code>
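For reference, the handler settings this section relies on could look roughly like the sketch below. The file name (dirprm/hdfs.props) and the exact set of lines are assumptions based on what is discussed in this section, not a copy of the original file:

<code>
# dirprm/hdfs.props (sketch; file name and values are assumptions)
gg.handlerlist=hdfs
gg.handler.hdfs.type=hdfs
gg.handler.hdfs.format=json
gg.handler.hdfs.maxFileSize=1g
gg.handler.hdfs.rollOnMetadataChange=true
gg.handler.hdfs.format.pkUpdateHandling=delete-insert
</code>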

Now bear in mind the following settings:

  * rollOnMetadataChange is set to true
  * pkUpdateHandling is set to delete-insert

That means that if the table structure changes (a column is added or removed), the HDFS handler will roll over to a new JSON file. Furthermore, an update to a primary-key column will be written as a delete record followed by an insert record.
We will modify these settings at the end of the section to see the different results.
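To make the delete-insert behaviour concrete, here is a small sketch in plain Python (not the actual handler code, which is Java; the record shape is simplified) of how a primary-key update could be emitted under each pkUpdateHandling policy:

```python
# Sketch only: how a primary-key update could be emitted per policy.
# The real handler is Java; this just illustrates the record shapes.
def emit_pk_update(before, after, pk_update_handling):
    """Return the records written for an update that changes the PK."""
    if pk_update_handling == "delete-insert":
        # The update is split into a delete of the old row image
        # followed by an insert of the new one.
        return [{"op_type": "D", "before": before},
                {"op_type": "I", "after": after}]
    if pk_update_handling == "update":
        # A single update record is written instead.
        return [{"op_type": "U", "before": before, "after": after}]
    raise ValueError("unknown policy: " + pk_update_handling)

ops = emit_pk_update({"PRODUCT_ID": 1}, {"PRODUCT_ID": 2}, "delete-insert")
print([r["op_type"] for r in ops])  # ['D', 'I']
```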
=====Create Replicat=====

<code>
GGSCI (hostname) 7>
</code>
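For context, a minimal Big Data replicat parameter file for this kind of setup might look like the sketch below. The group name, schema, and file locations are assumptions, since the original parameter file is not shown here:

<code>
-- dirprm/hdfsrp.prm (sketch; group, schema and paths are assumptions)
REPLICAT hdfsrp
TARGETDB LIBFILE libggjava.so SET property=dirprm/hdfs.props
GROUPTRANSOPS 1000
MAP src_schema.*, TARGET src_schema.*;
</code>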

We can check in the HDFS as well:

<code>
[oracle@edvmr1p0 oggsrc]$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x
[oracle@edvmr1p0 oggsrc]$ hdfs dfs -ls /
Found 1 items
-rw-r--r--
[oracle@edvmr1p0 oggsrc]$ hdfs dfs -cat /
{
...
.............................................................
</code>
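Since the handler writes one JSON document per operation into the same file, the output is a stream of concatenated JSON documents rather than a single JSON array. Below is a small sketch of how such a stream can be read offline; the sample records are hypothetical, only shaped like the formatter output:

```python
import json

def iter_json_docs(text):
    """Yield consecutive JSON documents from a concatenated stream."""
    decoder = json.JSONDecoder()
    idx, text = 0, text.strip()
    while idx < len(text):
        obj, end = decoder.raw_decode(text, idx)
        yield obj
        while end < len(text) and text[end].isspace():
            end += 1  # skip whitespace between documents
        idx = end

# Hypothetical records, shaped like GoldenGate JSON formatter output.
sample = '{"op_type": "I", "table": "PRODUCT_PROD"}\n{"op_type": "U", "table": "PRODUCT_PROD"}'
print([doc["op_type"] for doc in iter_json_docs(sample)])  # ['I', 'U']
```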

With the current configuration, we can see what has been created in HDFS:

<code>
[oracle@edvmr1p0 oggsrc]$ hdfs dfs -ls /
Found 2 items
drwxr-xr-x
drwxr-xr-x
[oracle@edvmr1p0 oggsrc]$
</code>

=====Modify Configuration=====
Let's test a new configuration with the following settings:

  * gg.handler.hdfs.rollOnMetadataChange=false
  * gg.handler.hdfs.format.pkUpdateHandling=update

in the handler properties file under $TRG_GGHOME/

That will force the HDFS handler NOT to create a new JSON file on a metadata change or on a primary-key update. Once we make that change, we have to restart the replicat:

<code>
[oracle@edvmr1p0 oggtrg]$ ggsci

Oracle GoldenGate Command Interpreter
Version 12.2.0.1.160823 OGGCORE_OGGADP.12.2.0.1.0_PLATFORMS_161019.1437
Linux, x64, 64bit (optimized),
Operating system character set identified as UTF-8.

Copyright (C) 1995, 2016, Oracle and/or its affiliates. All rights reserved.



GGSCI (edvmr1p0) 1> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER
REPLICAT


GGSCI (edvmr1p0) 2> stop replicat hdfsrp

Sending STOP request to REPLICAT HDFSRP ...
Request processed.


GGSCI (edvmr1p0) 3> start hdfsrp

Sending START request to MANAGER ...
REPLICAT HDFSRP starting


GGSCI (edvmr1p0) 4> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER
REPLICAT


GGSCI (edvmr1p0) 5>
</code>

Now, if we add a new column or modify the data, the handler will not roll over to a new JSON file until we reach the 1 GB file size defined in:

  * gg.handler.hdfs.maxFileSize=1g

Note, however, that the restart closed the file that was open, so the following statements should still cause the creation of a new JSON file:

<code>
SQL> alter table product_prod add third_column varchar2(20);

Table altered.

SQL> update product_prod set third_column='

1 row updated.

SQL> commit;

Commit complete.

SQL>
</code>

<code>
[oracle@edvmr1p0 oggtrg]$ hdfs dfs -ls /
Found 3 items
-rw-r--r--
-rw-r--r--
-rw-r--r--
[oracle@edvmr1p0 oggtrg]$
</code>
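The rolling behaviour we now expect can be summarised in a small decision sketch. This is an assumption about the handler's logic based on the settings we configured, not its actual code:

```python
# Sketch (assumption): when the HDFS handler starts a new target file,
# given the settings used in this section.
ONE_GB = 1024 ** 3

def should_roll(file_size_bytes, metadata_changed,
                roll_on_metadata_change=False, max_file_size=ONE_GB):
    """Return True if the next write should go to a new file."""
    if metadata_changed and roll_on_metadata_change:
        return True                          # DDL forces a new file
    return file_size_bytes >= max_file_size  # size limit reached

print(should_roll(10_000, metadata_changed=True))      # False
print(should_roll(2 * ONE_GB, metadata_changed=False)) # True
```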

After this point, whatever we do to the table, the same file should keep growing as long as we don't exceed 1 GB in size. Let's test it.
We can examine the addition of the new column with a simple cat of the JSON file:

<code>
[oracle@edvmr1p0 oggtrg]$ hdfs dfs -cat /

{
...
}
{
...
}
</code>

So now, if I add a new column (yet again), it SHOULDN'T create a new JSON file, since rollOnMetadataChange is now set to false:

<code>
SQL> alter table product_prod add constraint product_prod_pk primary key(PRODUCT_ID);

Table altered.

SQL> alter table product_prod add fourth_column number;

Table altered.

SQL> update product_prod set fourth_column=1 where product_id=1;

1 row updated.

SQL> commit;

Commit complete.

SQL> exit
</code>

Now, we can check again for the JSONs:

<code>
[oracle@edvmr1p0 oggtrg]$ hdfs dfs -ls /
Found 3 items
-rw-r--r--
-rw-r--r--
-rw-r--r--
[oracle@edvmr1p0 oggtrg]$
</code>

No new JSON file; let's examine the last one again:

<code>
[oracle@edvmr1p0 oggtrg]$ hdfs dfs -cat /

{
...
}
{
...
}
{
...
}
</code>

Do you notice that the last record has far fewer values than the rest? That is because I created a PK, so the HDFS handler was no longer forced to include all columns (as a pseudo-key) and could use the actual PK (product_id) instead :)
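The effect of adding the PK can be illustrated with a tiny sketch (an assumption about the row-identification logic, not the handler's code):

```python
# Sketch (assumption): without a declared PK, every column acts as a
# pseudo-key in the emitted record; with a PK, only the key column does.
def key_columns(all_columns, primary_key=None):
    """Return the columns used to identify a row in the output record."""
    return list(primary_key) if primary_key else list(all_columns)

cols = ["PRODUCT_ID", "PRODUCT_NAME", "THIRD_COLUMN", "FOURTH_COLUMN"]
print(key_columns(cols))                  # all four columns
print(key_columns(cols, ["PRODUCT_ID"]))  # ['PRODUCT_ID']
```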