Page 1 of 1

Removing duplicate records in a database

PostPosted: Mon May 23, 2005 5:41 pm
by beno
Hi,

Well my question is not about NB, is about DBs but I put it here, maybe someone can hekp me.

I have a NBDB with 11,000 records. Different persons use it and I'm affraid now it has duplicate records.

This database is a medical app wich stores patients records. Each record is for a particular patient. We have its name, sex, age, history, evolution, etc.

I know that if I sort the name field, duplicates will appear together... but if I have 11,000 it is going to be a bad idea to try to check the list manually.

There is no key for each record, just the fields described..

Mmmm, If I try to compare each one with the rest... this is going to be real slow...

Maybe I can compare each person name with the records that only starts with the same letter... as I type this some ideas are comming to my neurons...

Thanks for your time...

Saludos

beno

PostPosted: Mon May 23, 2005 6:43 pm
by Leos
hi Beno,

My suggestion:

in this case i use dbfquery command

....
dbfQuery "sample.dbf" "Name==Beno AND cidade==mexico"

if [dbfQueryResult]>0
... you have great chances that it is a duplicated regist

so, use

dbfDeleteRecord
dbfshowall
dbfNext


you can do this procedure with a 'loop', or 'while'

I think this is not to slow with 11.000 records (i do this with a database with over 70.000 records, without problems... (just about one hour)

THIS is one more useful command that i suggest to include in the next update from this GREAT plugin

...find duplicated regists

PostPosted: Mon May 23, 2005 6:47 pm
by Gaev
beno:
If I sort the name field, duplicates will appear together... If I try to compare each one with the rest... this is going to be real slow...
You only have to compare each current record with the previous (not all the rest) ... something like this ...

dbfSort "myDB" ...
SetVar "[Previous.Name]" ""
SetVar "[Previous.Age]" ""

Loop "1" "[dbfRecordCount]" "[counter]"

If "[myDB.Name]" "=" "[Previous.Name]"
If "[myDB.Age]" "=" "[Previous.Age]"
... same as previous record ... delete it ?
EndIf
EndIf

SetVar "[Previous.Name]" "[myDB.Name]"
SetVar "[Previous.Age]" "[myDB.Age]"
dbfNext "myDB"

EndLoop

... the key question is going to be which of the duplicate records should be deleted ? or are they 100% identical ?

If looping through 11,000 records takes too long ... the only thing I can suggest is ... a faster (make that super fast) machine.

PostPosted: Mon May 23, 2005 7:06 pm
by beno
Gaev,

You are right!

Once sorted the DB I have only to compare with previous and next.

Possibly do a loop if there are many duplicates.

Thanks a lot,

beno

PS thanks for your time too Leos!

PostPosted: Tue May 24, 2005 6:12 am
by Sam Cox
Hi beno,

Gaev's solution is a good one -- and it's now in my notebook!

Can you improve your medical record database program so that continued use will not result in more duplicate entries?

-- Sam

PostPosted: Tue May 24, 2005 6:56 am
by beno
Hi Sam,

Good idea, now I think I can add your idea: check the existence of records using the patient's name before adding it to the database...

Thanks for your help,

beno

PostPosted: Thu Aug 18, 2005 4:10 pm
by cp4w
Have not the patients a personal code, like fiscal code or medical public service code?

Bye
Licia