Location: Business Intelligence

Discussion: Load of datawarehouseReported This is a featured thread

Showing 7 posts

patrickioset
Load of datawarehouse
Jan 20 2009, 9:30 AM EST | Post edited: Jan 20 2009, 9:30 AM EST
We have a datawarehouse that needs to be updated daily with the datachanges from our source system.
Our source system has about 4 tera of data.
Our datawarehouse is about 1 tera and we would need to renew its data every night within 2 to 3 hours.

Could you please enumerate the solutions that Oracle system (in house we have 10G Rel.2) offers for this kind of data load ?

Thank you very much.
Do you find this valuable?    
Keyword tags: None
Ferrarist
Ferrarist
1. RE: Load of datawarehouse
Jan 23 2009, 2:53 AM EST | Post edited: Jan 23 2009, 2:53 AM EST
Could you please be more specific. Do you really want to move all the (changed) data from source to DWH. You say you already have a datawarehouse running, which must mean there already is some ETL for populating the DWH. Can you tell more about your DWHs architecture. For instance is it layered, does it have a datastore, do you use datamarts? What kind of hardware is it running on? Do you use Oracle Warehouse Builder to maintain the ETL to populate the warehouse? You see, there are many ways to load your DWH, but without more information enumerating solutions would become a wild guess.

Regards,
Rob
Do you find this valuable?    

pat_ioset@hispeed.ch
2. RE: Load of datawarehouse
Jan 26 2009, 4:01 AM EST | Post edited: Jan 26 2009, 4:01 AM EST
Our databank is layered as follows:
- a datastore with the copies of the data from the source database (it has only the tabels and fields that we need);
- an etlbase- Lyaer, where the data are been cleansed;
- a datamart layer.
Each layer has an oracle shema. The database and the server is on a SOLARIS SUN machine.
We use Oracle Warehouse Builder to maintenain all the layers.

regards
Pat
Do you find this valuable?    
Ferrarist
Ferrarist
3. RE: Load of datawarehouse
Jan 26 2009, 5:44 AM EST | Post edited: Jan 26 2009, 5:44 AM EST
One thing that still isn't very clear to me is if you keep track of history in your DWH? Are all the layers truncated before a new complete load? I'm assuming you do have some history in your etlbase-layer in my following answer.

One of the options for speeding up the load would be to just load the 'changed' data into the datastore. Is there a way to keep track of the changes in your source database(s)? For instance through Change Data Capture (reading the redo log) or if this is not applicable by means of a last_update_date field in your sourcetables? One way or another you have to determine which records are changed or do you do that while loading data into the etlbase-layer? If so, wouldn't it be easier to load only the changed records into the datastore, this way you will not have to cleanse as much data as you do now while moving data from datastore to etlbase. After this you could do with an incremental load of your datamarts and you wouldn't have to totally renew the data.

Regards,
Rob
Do you find this valuable?    

pat_ioset@hispeed.ch
4. RE: Load of datawarehouse
Jan 26 2009, 8:56 AM EST | Post edited: Jan 26 2009, 8:56 AM EST
Thank you for your answer.
We have already experimented with the delta load process by using the Oracle Streams. It did not perform well on such big amount of data. Every day we have milions of records that are changed on the source system. Finally the streams were extrem slow to handel all of these transactions.
In our initial implementation of the streams we only selected the tables and fields that we need.
Unfortuntely we receive often demand for fields that are not in our datastore.

Actually we are looking at a solution with which we can "clone" the entire source Database at a minimal time . Our source database is about 4 teras.
Has Oracle highly performant mirror- process ? or other tools ?

Regards
Pat
Do you find this valuable?    
Ferrarist
Ferrarist
5. RE: Load of datawarehouse
Jan 26 2009, 10:16 AM EST | Post edited: Jan 26 2009, 10:16 AM EST
Did you try datapump (the new EXPort)? I'm sure you did.

Otherwise another option is to migrate to the Oracle 11g DBMS. I know this means you will have a lot of work migrating to a new version, but the 11g database has an option called Real Time Query Database?

Before 11g you had the Dataguard option, which basically was a shadow copy of your production database on another server which primary purpose was to be a reliable backup that could quickly be restored in case your production database wasn't usable anymore. This dataguard database could not be used to run query's against. Now in 11g there's the Real Time Query Database option. Which gives you the opportunity to query this shadow database. Maybe that'll be something you want to look at.

Regards,
Rob


Oracle 11g bied nu juist de mogelijkheid om deze schaduw database in te zetten als Real Time Query Database, je kunt er dus queries op los laten terwijl nieuwe gegevens uit de productie database direct doorgevoerd worden op de schaduw database, en je kunt de schaduw database gebruiken voor Real Application Testing zonder de verzekering buiten werking te zetten.
1  out of 1 found this valuable. Do you?    
ankitanshu
ankitanshu
6. RE: Load of datawarehouse
Jan 26 2009, 11:12 PM EST | Post edited: Jan 26 2009, 11:12 PM EST
For database of this size, you must be using RAW devices. The cloning can e done at disk level.
Which SAN are you using?
0  out of 1 found this valuable. Do you?