python - groupby to find row with max value is converting object to datetime


I want to group by two variables, ['CIN', 'Calendar'], and get back, for each group, the row where the column MCelig is largest. Several rows in a group may share the maximum value, but I only need one row.

For example:

      AidCode CIN  MCelig   Calendar
    0    None  1e       1 2014-03-08
    1      01  1e       2 2014-03-08
    2      01  1e       3 2014-05-08
    3    None  2e       4 2014-06-08
    4      01  2e       5 2014-06-08

The first two rows form one group, and from it I want the row where MCelig = 2.
I came up with this line:

    test = dfx.groupby(['CIN', 'Calendar']).apply(lambda x: x.loc[x.MCelig.idxmax()])

It seemed to work, except that when a column holds 'None' or np.nan for all values of a group, that column's type changes! Look at the example below: the column is converted from object to datetime.

    import datetime as DT
    import numpy as np
    import pandas as pd

    d = {'CIN': pd.Series(['1e', '1e', '1e', '2e', '2e']),
         'AidCode': pd.Series([np.nan, '01', '01', np.nan, '01']),
         'Calendar': pd.Series([DT.datetime(2014, 3, 8), DT.datetime(2014, 3, 8),
                                DT.datetime(2014, 5, 8), DT.datetime(2014, 6, 8),
                                DT.datetime(2014, 6, 8)]),
         'MCelig': pd.Series([1, 2, 3, 4, 5])}
    dfx = pd.DataFrame(d)

    # Checking whether np.nan alone is the problem: it is not
    # dfx = dfx.where(pd.notnull(dfx), None)

    test = dfx.groupby(['CIN', 'Calendar'], group_keys=False).apply(
        lambda x: x.loc[x.MCelig.idxmax()])

Output:

    Out[820]:
                       AidCode CIN  MCelig   Calendar
    CIN Calendar
    1e  2014-03-08  2015-01-01  1e       2 2014-03-08
        2014-05-08  2015-01-01  1e       3 2014-05-08
    2e  2014-06-08  2015-01-01  2e       5 2014-06-08
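For comparison, here is a minimal sketch (not part of the original post) that selects the same rows without apply, by passing the index labels returned by idxmax straight to .loc. It assumes a current pandas version and rebuilds the example frame from the question. Because the result is just a subset of the original rows, nothing has to be reassembled per group, so AidCode should keep its original, non-datetime type.

```python
import datetime as DT

import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
dfx = pd.DataFrame({
    'CIN': ['1e', '1e', '1e', '2e', '2e'],
    'AidCode': [np.nan, '01', '01', np.nan, '01'],
    'Calendar': [DT.datetime(2014, 3, 8), DT.datetime(2014, 3, 8),
                 DT.datetime(2014, 5, 8), DT.datetime(2014, 6, 8),
                 DT.datetime(2014, 6, 8)],
    'MCelig': [1, 2, 3, 4, 5],
})

# For each (CIN, Calendar) group, idxmax returns the index label of the
# first row holding the group's maximum MCelig (ties resolve to one row).
labels = dfx.groupby(['CIN', 'Calendar'])['MCelig'].idxmax()

# .loc with a Series of labels uses its values, so this picks rows 1, 2, 4
# directly from dfx and leaves every column's dtype untouched.
result = dfx.loc[labels]
print(result)
print(result.dtypes)
```

Since idxmax reports only the first occurrence of the maximum, this also satisfies the "one row per group even with ties" requirement.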

Update:

I resolved it with this simple fix:

    x = dfx.sort_values(['CIN', 'Calendar', 'MCelig']).groupby(['CIN', 'Calendar'], as_index= ...

Since it works, I chose it for simplicity's sake.
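The update's line is cut off after `as_index=`, so its final step is unknown. Assuming the idea is "sort so the largest MCelig is last in each group, then keep that last row", a minimal sketch of that idea uses `tail(1)`; this choice is an assumption, not the poster's code. `sort_values` is the current name for the older `DataFrame.sort`.

```python
import datetime as DT

import numpy as np
import pandas as pd

# Example frame from the question.
dfx = pd.DataFrame({
    'CIN': ['1e', '1e', '1e', '2e', '2e'],
    'AidCode': [np.nan, '01', '01', np.nan, '01'],
    'Calendar': [DT.datetime(2014, 3, 8), DT.datetime(2014, 3, 8),
                 DT.datetime(2014, 5, 8), DT.datetime(2014, 6, 8),
                 DT.datetime(2014, 6, 8)],
    'MCelig': [1, 2, 3, 4, 5],
})

# Sort so that, within each (CIN, Calendar) group, the largest MCelig comes last.
ordered = dfx.sort_values(['CIN', 'Calendar', 'MCelig'])

# tail(1) keeps the complete last row of each group (NaNs included) and
# returns a subset of dfx's rows with their original index labels.
top_rows = ordered.groupby(['CIN', 'Calendar']).tail(1)
print(top_rows)
```

`GroupBy.last()` would also run here, but it takes the last non-null value column by column, so if AidCode were NaN in a group's max row it would be filled from an earlier row of that group. `tail(1)` returns the max row as it is.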

Pandas tries to be helpful by inferring column types and converting date-like values to the datetime64 dtype. Here it is far too aggressive.

An alternative way to select the max rows is to use transform to build a boolean mask for each group:

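    import numpy as np

    def onlymax(x):
        # Boolean mask that is True only at the first position of the group's maximum
        mask = np.zeros(len(x), dtype='bool')
        idx = np.argmax(x.values)
        mask[idx] = 1
        return mask

    dfx.loc[dfx.groupby(['CIN', 'Calendar'])['MCelig'].transform(onlymax).astype(bool)]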

Output:

      AidCode CIN  MCelig   Calendar
    1      01  1e       2 2014-03-08
    2      01  1e       3 2014-05-08
    4      01  2e       5 2014-06-08
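The point of a custom mask is tie handling. The sketch below uses a small hypothetical frame (column names reused from the question, values invented) where one group has two rows tied at the maximum. Comparing against `transform('max')` keeps every tied row, while the `onlymax` mask from the answer keeps exactly one.

```python
import numpy as np
import pandas as pd

# Hypothetical data: both rows of group 'a' share the maximum MCelig of 7.
df = pd.DataFrame({
    'CIN': ['a', 'a', 'b'],
    'MCelig': [7, 7, 3],
})


def onlymax(x):
    # True only at the first position of the group's maximum.
    mask = np.zeros(len(x), dtype=bool)
    mask[np.argmax(x.values)] = True
    return mask


# transform('max') broadcasts each group's maximum back onto its rows,
# so the equality test keeps every tied row.
all_ties = df[df['MCelig'] == df.groupby('CIN')['MCelig'].transform('max')]

# The onlymax mask keeps a single row per group.
one_per_group = df[df.groupby('CIN')['MCelig'].transform(onlymax).astype(bool)]

print(all_ties)
print(one_per_group)
```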

Technical details: when groupby-apply is used and the DataFrames returned by the applied function are stitched back together into one DataFrame, pandas also tries to guess whether columns with object dtype hold dates, and if so converts them. If the values are strings, it tries to parse them as dates using dateutil.parser:

For better or worse, dateutil.parser interprets '01' as a date:

    In [37]: import dateutil.parser as DP

    In [38]: DP.parse('01')
    Out[38]: datetime.datetime(2015, 1, 1, 0, 0)

That is why pandas converts the entire AidCode column into dates. Since no error is raised, it assumes it has been helpful :)
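The 2015 in the output above is not in the data; it comes from the date the snippet was run. A small sketch with dateutil's `default` argument makes this deterministic: a lone two-digit number is read as a day of the month, and the missing year and month are copied from `default` (today's date when omitted). The default dates used here are arbitrary examples.

```python
import datetime

from dateutil import parser

# '01' becomes day 1; year and month come from the supplied default.
parsed = parser.parse('01', default=datetime.datetime(2015, 1, 15))
print(parsed)  # → 2015-01-01 00:00:00

# A different default yields a different "date" for the same string,
# which is why AidCode's values change with the current date.
other = parser.parse('01', default=datetime.datetime(2020, 6, 15))
print(other)  # → 2020-06-01 00:00:00
```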

