database - Efficiently checking for possible duplicate entities
A user database needs to prepare a list of potential duplicates before saving an entity, and possibly warn the user.
There are 7 criteria on which we should check for duplicates, and if at least 3 match we should flag this to the user. All the criteria match on ID, so no fuzzy string matching is necessary, but my problem comes from the fact that there are many possible ways (99 if I have done my sums correctly) for at least 3 items to match out of the list of 7 possibles.
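As a quick sanity check on that count, here is a minimal Python sketch that adds up the number of ways to choose 3, 4, 5, 6 or 7 matching criteria out of 7. The constant names are only illustrative.

```python
from math import comb  # available in Python 3.8+

CRITERIA = 7    # number of fields compared
THRESHOLD = 3   # minimum number of fields that must match

# Sum C(7, k) for k = 3..7: every distinct set of fields that could match.
ways = sum(comb(CRITERIA, k) for k in range(THRESHOLD, CRITERIA + 1))
print(ways)
```

The terms are 35 + 35 + 21 + 7 + 1, which is why one query per combination is not practical.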
I do not want to run 99 separate DB queries to find my search results, nor do I want to bring the whole lot back from the DB and filter on the client side. We are probably talking about a few thousand records at present, but this will grow into the millions as the system matures.
Has anyone found a good, efficient way to do this? I was considering a simple OR query to get the records where at least one field matches from the DB, and then doing some processing on the client to filter further, but some of the fields have very low cardinality and will not reduce the numbers by much.
OR and CASE will work, but they are quite inefficient because they do not use the indexes.
You need to use UNION for the indexes to be usable.
If a user enters a name, phone, email and address into the database, and you want to see all the records that match at least 3 of these fields, you issue:
    SELECT  i.*
    FROM    (
            SELECT  id, COUNT(*)
            FROM    (
                    SELECT  id
                    FROM    t_info t
                    WHERE   name = 'Eve Chianese'
                    UNION ALL
                    SELECT  id
                    FROM    t_info t
                    WHERE   phone = '+15558000042'
                    UNION ALL
                    SELECT  id
                    FROM    t_info t
                    WHERE   email = '42@example.com'
                    UNION ALL
                    SELECT  id
                    FROM    t_info t
                    WHERE   address = '42 North Lane'
                    ) q
            GROUP BY
                    id
            HAVING
                    COUNT(*) >= 3
            ) dq
    JOIN    t_info i
    ON      i.id = dq.id

This will use the indexes on these fields, and the query will be fast.
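To try the query out, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows, the index names and the use of SQLite are assumptions made for illustration; the `t_info` table and its columns follow the query above, and the literals are replaced by named parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_info (
        id INTEGER PRIMARY KEY, name TEXT, phone TEXT, email TEXT, address TEXT
    );
    -- One index per matched field (index names are hypothetical).
    CREATE INDEX ix_info_name    ON t_info (name);
    CREATE INDEX ix_info_phone   ON t_info (phone);
    CREATE INDEX ix_info_email   ON t_info (email);
    CREATE INDEX ix_info_address ON t_info (address);
""")

# Made-up rows: ids 1 and 2 share at least 3 fields with the candidate.
conn.executemany("INSERT INTO t_info VALUES (?, ?, ?, ?, ?)", [
    (1, "Eve Chianese", "+15558000042", "42@example.com", "42 North Lane"),
    (2, "Eve Chianese", "+15558000042", "other@example.com", "42 North Lane"),
    (3, "Eve Chianese", "+15558000099", "else@example.com", "1 South Road"),
    (4, "Bob Sample", "+15558000042", "42@example.com", "9 West Street"),
])

QUERY = """
    SELECT i.id
    FROM (
        SELECT id
        FROM (
            SELECT id FROM t_info WHERE name = :name
            UNION ALL
            SELECT id FROM t_info WHERE phone = :phone
            UNION ALL
            SELECT id FROM t_info WHERE email = :email
            UNION ALL
            SELECT id FROM t_info WHERE address = :address
        ) q
        GROUP BY id
        HAVING COUNT(*) >= 3      -- each row in q is one matched field
    ) dq
    JOIN t_info i ON i.id = dq.id
    ORDER BY i.id
"""

candidate = {
    "name": "Eve Chianese",
    "phone": "+15558000042",
    "email": "42@example.com",
    "address": "42 North Lane",
}
duplicate_ids = [row[0] for row in conn.execute(QUERY, candidate)]
print(duplicate_ids)  # ids of rows matching on at least 3 of the 4 fields
```

With these rows, id 1 matches on all four fields and id 2 on three, while ids 3 and 4 match on one and two fields and are filtered out by the HAVING clause.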
See this article in my blog for details:
- At least 3 of 4: how to get a matching record
Also see the question that the article is based on.
If you want to build a list of DISTINCT values in the existing data, you wrap this query in a subquery:

    SELECT  i1.*
    FROM    t_info i1
    WHERE   EXISTS
            (
            SELECT  1
            FROM    (
                    SELECT  id
                    FROM    t_info t
                    WHERE   name = i1.name
                    UNION ALL
                    SELECT  id
                    FROM    t_info t
                    WHERE   phone = i1.phone
                    UNION ALL
                    SELECT  id
                    FROM    t_info t
                    WHERE   email = i1.email
                    UNION ALL
                    SELECT  id
                    FROM    t_info t
                    WHERE   address = i1.address
                    ) q
            -- skip the row itself, which trivially matches on every field
            WHERE   q.id <> i1.id
            GROUP BY
                    id
            HAVING
                    COUNT(*) >= 3
            )

Note that this DISTINCT is not transitive: if a matches b and b matches c, this does not mean that a matches c.
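The non-transitivity is easy to see on a tiny data set. The sketch below is not the EXISTS query above: it swaps in a pairwise self-join (one UNION ALL branch per field) so that each matching pair is listed explicitly. The rows and the use of an in-memory SQLite database are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_info "
    "(id INTEGER PRIMARY KEY, name TEXT, phone TEXT, email TEXT, address TEXT)"
)
# Made-up rows: 1 and 2 share 3 fields, 2 and 3 share 3 fields,
# but 1 and 3 share only 2 (name and phone).
conn.executemany("INSERT INTO t_info VALUES (?, ?, ?, ?, ?)", [
    (1, "n1", "p1", "e1", "a1"),
    (2, "n1", "p1", "e1", "a2"),
    (3, "n1", "p1", "e3", "a2"),
])

FIELDS = ["name", "phone", "email", "address"]  # fixed list, not user input

# One branch per field; a.id < b.id keeps each pair once and skips self-matches.
branches = "\nUNION ALL\n".join(
    f"SELECT a.id AS a_id, b.id AS b_id "
    f"FROM t_info a JOIN t_info b ON a.{field} = b.{field} AND a.id < b.id"
    for field in FIELDS
)
PAIR_QUERY = f"""
    SELECT a_id, b_id
    FROM ({branches}) q
    GROUP BY a_id, b_id
    HAVING COUNT(*) >= 3
    ORDER BY a_id, b_id
"""

pairs = conn.execute(PAIR_QUERY).fetchall()
print(pairs)
```

Here rows 1 and 2 pair up, and so do rows 2 and 3, but (1, 3) is absent, so any grouping of duplicates into clusters has to decide how to treat such chains.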