Location: Oracle Technology Network

Discussion: Find missing and duplicate records in over 500 million records.Reported This is a featured thread

Showing 3 posts

mansurshamsi
Find missing and duplicate records in over 500 million records.
Aug 7 2009, 3:35 PM EDT | Post edited: Aug 7 2009, 3:35 PM EDT
I have two databases on same server. In DB2 I have already 400 million records. I tried to load more 100 million records from DB1 to DB2 on top of existing 400 million records but after an hour I got the unique constraint error from DB2. The load has been failed after inserting 70% records.
Now I have three tasks. First I have to find the duplicate record between DB1 and DB2. Second I have to find out the remaining 30% records. Third I have to re-load the remaining 30% records.
I tried to find out the duplicates between DB1 and DB2 but my query is not ending. The unique constraint is based on composite key (charge_id, start_date). Can you give me the better and fast solution?
Do you find this valuable?    
Keyword tags: None

tolko
1. RE: Find missing and duplicate records in over 500 million records.
Aug 10 2009, 4:33 AM EDT | Post edited: Aug 10 2009, 4:33 AM EDT
to find duplicates:
SELECT COUNT(*), A, B FROM DBX
GROUP BY A, B
HAVING COUNT(*) > 2
re-loading
MERGE INTO DBX t
USING DBY a
ON (t.A= a.A)
WHEN NOT MATCHED THEN
INSERT (t.A, t.B)
VALUES (a.A, a.B)
Do you find this valuable?    

gokhan.kucuk
2. RE: Find missing and duplicate records in over 500 million records.
Aug 13 2009, 3:02 AM EDT | Post edited: Aug 13 2009, 3:02 AM EDT
Trying "Insert and if it fails Update" strategy could be a solution, but sure it will not be fast choice.. Maybe you should use an ETL tool for parallel processing...

Do you find this valuable?