oracle_goldengate_hadoop_hdfs (last revised 2020/11/11 15:33 by andonovj)

=====Configuration Destination=====
In this section we will replicate from Oracle -> GoldenGate -> HDFS.
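For reference, a minimal HDFS handler properties file for the GoldenGate Big Data adapter looks roughly like this. It is a sketch only: the root path and classpath below are assumptions for illustration, not the values from this environment.

```properties
gg.handlerlist=hdfs
gg.handler.hdfs.type=hdfs
gg.handler.hdfs.format=json
# Target directory in HDFS (assumed path)
gg.handler.hdfs.rootFilePath=/user/oracle/gg
# Roll the current file once it reaches 1 GB
gg.handler.hdfs.maxFileSize=1g
# Hadoop client configuration and libraries (assumed locations)
gg.classpath=/u01/hadoop/etc/hadoop:/u01/hadoop/share/hadoop/common/*:/u01/hadoop/share/hadoop/common/lib/*:/u01/hadoop/share/hadoop/hdfs/*
```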
=====Modify Configuration=====
Let's test the new configuration by modifying the following properties:

  * gg.handler.hdfs.rollOnMetadataChange=false
  * gg.handler.hdfs.format.pkUpdateHandling=update

These properties are set in $TRG_GGHOME/

That will force the HDFS handler NOT to create a new JSON file on a metadata change or on an update. Once we make that change, we have to restart the replicat:
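The restart itself is the usual GGSCI stop/start sequence; the replicat name rhdfs below is an assumption for illustration:

```
GGSCI> stop replicat rhdfs
GGSCI> start replicat rhdfs
GGSCI> info replicat rhdfs
```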
Now, if we add a new column or modify the data, the handler will not roll over to a new JSON file until we reach the 1G file size defined in:

  * gg.handler.hdfs.maxFileSize=1g

These changes should still cause the creation of a new JSON file:
<code>
SQL> alter table product_prod add third_column varchar2(20);

Table altered.

SQL> update product_prod set third_column='

1 row updated.

SQL> commit;

Commit complete.

SQL>
</code>

<code>
[oracle@edvmr1p0 oggtrg]$ hdfs dfs -ls /
Found 3 items
-rw-r--r--
-rw-r--r--
-rw-r--r--
[oracle@edvmr1p0 oggtrg]$
</code>

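The size-based rollover rule can be sketched in a few lines. This is a toy local-filesystem model in Python, not the handler's actual code; the file naming scheme and the tiny 64-byte limit are made up for illustration.

```python
import os
import tempfile

class RollingWriter:
    """Toy model of size-based rollover (gg.handler.hdfs.maxFileSize).

    Writes to the local filesystem; the real handler writes to HDFS and
    also honors rollOnMetadataChange, which we disabled above.
    """
    def __init__(self, base, max_bytes):
        self.base = base
        self.max_bytes = max_bytes
        self.seq = 0  # suffix of the file currently being written

    def path(self):
        return f"{self.base}_{self.seq}.json"

    def write(self, record):
        # Roll to a new file only once the current one has reached max size;
        # schema changes alone never trigger a roll in this model.
        if os.path.exists(self.path()) and os.path.getsize(self.path()) >= self.max_bytes:
            self.seq += 1
        with open(self.path(), "a") as f:
            f.write(record + "\n")

base = os.path.join(tempfile.mkdtemp(), "product_prod")
w = RollingWriter(base, max_bytes=64)  # tiny limit instead of 1g, to force rolls
for i in range(10):
    w.write('{"op_type":"I","PRODUCT_ID":%d}' % i)
print(w.seq + 1)  # number of files created -> 4
```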
After this point, whatever we do to the table, as long as we don't exceed the 1G size limit, the current file should simply grow; let's test it. We can examine the addition of the new column with a simple cat of the JSON file:

<code>
[oracle@edvmr1p0 oggtrg]$ hdfs dfs -cat /

{
  ... (record truncated in this revision)
}
{
  ... (record truncated in this revision)
}
</code>

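The cat output above is truncated in this wiki revision, but the JSON formatter writes one JSON object per captured operation, concatenated in the file. A quick way to inspect such a file is to decode the objects one by one; the sample records below are made up, using the formatter's default field names (table, op_type, after).

```python
import json

# Made-up sample in the shape of the JSON formatter's output:
# one object per captured operation, concatenated together.
raw = '''
{"table":"PROD.PRODUCT_PROD","op_type":"I","after":{"PRODUCT_ID":1,"PRODUCT_NAME":"widget"}}
{"table":"PROD.PRODUCT_PROD","op_type":"U","after":{"PRODUCT_ID":1,"THIRD_COLUMN":"test"}}
'''

def parse_records(text):
    """Decode concatenated / whitespace-separated JSON objects."""
    dec = json.JSONDecoder()
    text = text.strip()
    idx, out = 0, []
    while idx < len(text):
        obj, end = dec.raw_decode(text, idx)
        out.append(obj)
        idx = end
        while idx < len(text) and text[idx].isspace():
            idx += 1
    return out

records = parse_records(raw)
for rec in records:
    print(rec["op_type"], sorted(rec["after"]))
# I ['PRODUCT_ID', 'PRODUCT_NAME']
# U ['PRODUCT_ID', 'THIRD_COLUMN']
```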
So now, if I add a new column (yet again), it SHOULDN'T create a new JSON file:

<code>
SQL> alter table product_prod add constraint product_prod_pk primary key(PRODUCT_ID);

Table altered.

SQL> alter table product_prod add fourth_column number;

Table altered.

SQL> update product_prod set fourth_column=1 where product_id=1;

1 row updated.

SQL> commit;

Commit complete.

SQL> exit
</code>

Now, we can check again for the JSON files:

<code>
[oracle@edvmr1p0 oggtrg]$ hdfs dfs -ls /
Found 3 items
-rw-r--r--
-rw-r--r--
-rw-r--r--
[oracle@edvmr1p0 oggtrg]$
</code>

No new JSON file; let's examine the last one again:

<code>
[oracle@edvmr1p0 oggtrg]$ hdfs dfs -cat /

{
  ... (record truncated in this revision)
}
{
  ... (record truncated in this revision)
}
{
  ... (noticeably shorter record, truncated in this revision)
}
</code>

Do you notice that the last record has far fewer values than the rest? That is because I created a PK, so the handler was no longer forced to include all columns as a pseudo-key in the update record; it used the real PK (PRODUCT_ID) instead. :)
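The effect of gg.handler.hdfs.format.pkUpdateHandling=update can be modeled in a few lines. This is an illustrative Python model of which columns end up in an update record, not the handler's actual logic; the column names are taken from the example above.

```python
def update_record_columns(row, changed, pk=None):
    """Columns written for an UPDATE in this toy model.

    Without a primary key, all columns act as a pseudo-key, so the
    whole row is carried; with a PK, the key columns plus the changed
    columns are enough to apply the update.
    """
    if pk is None:
        return sorted(row)
    return sorted(set(pk) | set(changed))

row = {"PRODUCT_ID": 1, "PRODUCT_NAME": "widget",
       "THIRD_COLUMN": "test", "FOURTH_COLUMN": 1}

# Before the PK existed: every column is written.
print(update_record_columns(row, ["FOURTH_COLUMN"]))
# ['FOURTH_COLUMN', 'PRODUCT_ID', 'PRODUCT_NAME', 'THIRD_COLUMN']

# With the PK on PRODUCT_ID: only key + changed columns.
print(update_record_columns(row, ["FOURTH_COLUMN"], pk=["PRODUCT_ID"]))
# ['FOURTH_COLUMN', 'PRODUCT_ID']
```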