performance - SQL: Counting and Numbering Duplicates - Optimising Correlated Subquery

March 15, 2014

I have a table in a SQLite database where I need to count the duplicates across certain columns (i.e. rows where 3 particular columns have the same values) and also number each occurrence within a set of duplicates (i.e. if there are 2 occurrences of a particular duplicate, they need to be numbered 1 and 2). I find it difficult to explain in words, so I will use a simplified example below.

I have data similar to the following (the first line is the header row; the table is referred to below as "idcountdata"):

  id  match1  match2  match3  data
  1   AbCde   BC      0       data01
  2   AbCde   BC      0       data02
  3   AbCde   BC      1       data03
  4   AbCde   AB      0       data04
  5   FGhiJ   BC      0       data05
  6   FGhiJ   AB      0       data06
  7   FGhiJ   BC      1       data07
  8   FGhiJ   BC      1       data08
  9   FGhiJ   BC      2       data09
  10  HkLMop  BC      1       data10
  11  HkLMop  BC      1       data11
  12  HkLMop  BC      1       data12
  13  HkLMop  DE      1       data13
  14  HkLMop  DE      2       data14
  15  HkLMop  DE      2       data15
  16  HkLMop  DE      2       data16
  17  HkLMop  DE      2       data17

And for this I would want the following output:

  id  match1  match2  match3  data    matchid  matchcount
  1   AbCde   BC      0       data01  1        2
  2   AbCde   BC      0       data02  2        2
  3   AbCde   BC      1       data03  1        1
  4   AbCde   AB      0       data04  1        1
  5   FGhiJ   BC      0       data05  1        1
  6   FGhiJ   AB      0       data06  1        1
  7   FGhiJ   BC      1       data07  1        2
  8   FGhiJ   BC      1       data08  2        2
  9   FGhiJ   BC      2       data09  1        1
  10  HkLMop  BC      1       data10  1        3
  11  HkLMop  BC      1       data11  2        3
  12  HkLMop  BC      1       data12  3        3
  13  HkLMop  DE      1       data13  1        1
  14  HkLMop  DE      2       data14  1        4
  15  HkLMop  DE      2       data15  2        4
  16  HkLMop  DE      2       data16  3        4
  17  HkLMop  DE      2       data17  4        4

Previously I was using a couple of correlated subqueries to achieve this:

  SELECT id, match1, match2, match3, data,
         (SELECT count(*) FROM idcountdata d2
           WHERE d1.match1 = d2.match1 AND d1.match2 = d2.match2 AND d1.match3 = d2.match3
             AND d2.id <= d1.id) AS matchid,
         (SELECT count(*) FROM idcountdata d2
           WHERE d1.match1 = d2.match1 AND d1.match2 = d2.match2 AND d1.match3 = d2.match3) AS matchcount
    FROM idcountdata d1;

But the table has more than 200,000 rows (and the data length/content can vary), so the query takes hours to run. (Strangely, when I first ran the same query on the same data in mid-to-late 2013, it took minutes rather than hours, but that is beside the point - even back then I thought it was inelegant and inefficient.)
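As a sanity check, the correlated-subquery version above can be run against the sample rows. This is a minimal sketch using Python's built-in sqlite3 module; the column types in CREATE TABLE and the helper name run_correlated are assumptions for illustration, since the post does not show the schema.

```python
import sqlite3

# Sample rows as read from the example table: (id, match1, match2, match3, data).
ROWS = [
    (1, "AbCde", "BC", 0, "data01"), (2, "AbCde", "BC", 0, "data02"),
    (3, "AbCde", "BC", 1, "data03"), (4, "AbCde", "AB", 0, "data04"),
    (5, "FGhiJ", "BC", 0, "data05"), (6, "FGhiJ", "AB", 0, "data06"),
    (7, "FGhiJ", "BC", 1, "data07"), (8, "FGhiJ", "BC", 1, "data08"),
    (9, "FGhiJ", "BC", 2, "data09"), (10, "HkLMop", "BC", 1, "data10"),
    (11, "HkLMop", "BC", 1, "data11"), (12, "HkLMop", "BC", 1, "data12"),
    (13, "HkLMop", "DE", 1, "data13"), (14, "HkLMop", "DE", 2, "data14"),
    (15, "HkLMop", "DE", 2, "data15"), (16, "HkLMop", "DE", 2, "data16"),
    (17, "HkLMop", "DE", 2, "data17"),
]

# The two correlated subqueries: a running count (matchid) and a total (matchcount).
CORRELATED_SQL = """
SELECT id,
       (SELECT count(*) FROM idcountdata d2
         WHERE d1.match1 = d2.match1 AND d1.match2 = d2.match2
           AND d1.match3 = d2.match3 AND d2.id <= d1.id) AS matchid,
       (SELECT count(*) FROM idcountdata d2
         WHERE d1.match1 = d2.match1 AND d1.match2 = d2.match2
           AND d1.match3 = d2.match3) AS matchcount
  FROM idcountdata d1
 ORDER BY id
"""


def run_correlated(rows):
    """Load rows into an in-memory table and return (id, matchid, matchcount) tuples."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: the column types are guessed from the sample data.
    conn.execute(
        "CREATE TABLE idcountdata ("
        "id INTEGER PRIMARY KEY, match1 TEXT, match2 TEXT, match3 INTEGER, data TEXT)"
    )
    conn.executemany("INSERT INTO idcountdata VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(CORRELATED_SQL).fetchall()
    conn.close()
    return result


for row in run_correlated(ROWS):
    print(row)
```

On 17 rows this is instant; the cost only shows up at scale, because each outer row triggers its own scan or index lookup for every subquery.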

I have already changed the correlated subquery for "matchcount" into an uncorrelated subquery combined with a JOIN:

  SELECT d1.id, d1.match1, d1.match2, d1.match3, d1.data, d2.matchcount
    FROM idcountdata d1
    JOIN (SELECT match1, match2, match3, count(*) AS matchcount
            FROM idcountdata
           GROUP BY match1, match2, match3) d2
      ON (d1.match1 = d2.match1 AND d1.match2 = d2.match2 AND d1.match3 = d2.match3);
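The grouped-subquery join for "matchcount" can be tried on a few rows as follows. This is a sketch under assumptions: it uses Python's built-in sqlite3 module, only the first four sample rows, and column types that are guessed because the post does not show the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: column types are guessed from the sample data.
conn.execute(
    "CREATE TABLE idcountdata ("
    "id INTEGER PRIMARY KEY, match1 TEXT, match2 TEXT, match3 INTEGER, data TEXT)"
)
# First four rows of the example table.
conn.executemany(
    "INSERT INTO idcountdata VALUES (?, ?, ?, ?, ?)",
    [
        (1, "AbCde", "BC", 0, "data01"),
        (2, "AbCde", "BC", 0, "data02"),
        (3, "AbCde", "BC", 1, "data03"),
        (4, "AbCde", "AB", 0, "data04"),
    ],
)

# The group sizes are computed once in the derived table d2, then joined
# back to every row that shares the same (match1, match2, match3) key.
JOIN_SQL = """
SELECT d1.id, d2.matchcount
  FROM idcountdata d1
  JOIN (SELECT match1, match2, match3, count(*) AS matchcount
          FROM idcountdata
         GROUP BY match1, match2, match3) d2
    ON d1.match1 = d2.match1 AND d1.match2 = d2.match2 AND d1.match3 = d2.match3
 ORDER BY d1.id
"""

matchcounts = conn.execute(JOIN_SQL).fetchall()
print(matchcounts)  # [(1, 2), (2, 2), (3, 1), (4, 1)]
```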

So it is only the subquery for "matchid" that I need some help optimising. In summary, the following query runs very slowly on large datasets:

  SELECT id, match1, match2, match3, data,
         (SELECT count(*) FROM idcountdata d2
           WHERE d1.match1 = d2.match1 AND d1.match2 = d2.match2 AND d1.match3 = d2.match3
             AND d2.id <= d1.id) AS matchid
    FROM idcountdata d1;

How can I improve the performance of the above query?
It does not have to run in seconds, but it needs to take minutes rather than hours (for around 200,000 rows).

A self join may be faster than a correlated subquery:

  SELECT d1.id, d1.match1, d1.match2, d1.match3, d1.data, count(*) AS matchid
    FROM idcountdata d1
    JOIN idcountdata d2
      ON d1.match1 = d2.match1 AND d1.match2 = d2.match2 AND d1.match3 = d2.match3
     AND d1.id >= d2.id
   GROUP BY d1.id, d1.match1, d1.match2, d1.match3, d1.data

This query can take advantage of a composite index on (match1, match2, match3, id).
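The self join and the composite index can be tried together with a sketch like the one below. It rests on assumptions: Python's built-in sqlite3 module, synthetic rows generated with a fixed seed (hypothetical data, not the poster's), guessed column types, and an arbitrary index name idx_match. Both queries return only id and matchid to keep the comparison short. It checks that the self join gives the same numbering as the correlated subquery and prints the query plan, without making any claim about timings.

```python
import random
import sqlite3


def build_db(n_rows=500, seed=0):
    """Create an in-memory idcountdata table filled with synthetic rows."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    # Assumed schema: column types are guessed from the sample data.
    conn.execute(
        "CREATE TABLE idcountdata ("
        "id INTEGER PRIMARY KEY, match1 TEXT, match2 TEXT, match3 INTEGER, data TEXT)"
    )
    rows = [
        (i, rng.choice(["AbCde", "FGhiJ", "HkLMop"]), rng.choice(["AB", "BC", "DE"]),
         rng.randint(0, 2), f"data{i:02d}")
        for i in range(1, n_rows + 1)
    ]
    conn.executemany("INSERT INTO idcountdata VALUES (?, ?, ?, ?, ?)", rows)
    # Composite index on the three match columns plus id; the name is arbitrary.
    conn.execute("CREATE INDEX idx_match ON idcountdata (match1, match2, match3, id)")
    return conn


# Self join: each row is paired with every earlier-or-equal row in its group.
SELF_JOIN_SQL = """
SELECT d1.id, count(*) AS matchid
  FROM idcountdata d1
  JOIN idcountdata d2
    ON d1.match1 = d2.match1 AND d1.match2 = d2.match2
   AND d1.match3 = d2.match3 AND d1.id >= d2.id
 GROUP BY d1.id
 ORDER BY d1.id
"""

# Correlated subquery, used here only as a reference result.
CORRELATED_SQL = """
SELECT d1.id,
       (SELECT count(*) FROM idcountdata d2
         WHERE d1.match1 = d2.match1 AND d1.match2 = d2.match2
           AND d1.match3 = d2.match3 AND d2.id <= d1.id) AS matchid
  FROM idcountdata d1
 ORDER BY d1.id
"""

conn = build_db()
self_join = conn.execute(SELF_JOIN_SQL).fetchall()
correlated = conn.execute(CORRELATED_SQL).fetchall()
print("results agree:", self_join == correlated)

# Print the plan to see whether this SQLite version chooses the index.
for step in conn.execute("EXPLAIN QUERY PLAN " + SELF_JOIN_SQL):
    print(step)
```

The plan text differs between SQLite versions, so it is worth reading the output on the actual database rather than assuming the index is used.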





















Raj T
