python - groupby to find row with max value is converting object to datetime
I want to group by two variables, ['CIN', 'calendar'], and return the row of each group where the column MCelig is the largest in that group. Several rows in a group may share the maximum value, but I only need one row.
For example:
      AidCode CIN  MCelig   calendar
    0    None  1e       1 2014-03-08
    1      01  1e       2 2014-03-08
    2      01  1e       3 2014-05-08
    3    None  2e       4 2014-06-08
    4      01  2e       5 2014-06-08

Since the first two rows are a group, I want the row where MCelig = 2.
I came up with this line:

    test = dfx.groupby(['CIN', 'calendar'], group_keys=False).apply(lambda x: x.ix[x.MCelig.idxmax()])

It seemed to work, except that when a column holds 'None' or 'np.nan' for all values of a group, that column is converted to a datetime! See the example below and watch the AidCode column go from object to datetime.
    import datetime as dt
    import pandas as pd
    import numpy as np

    d = {'CIN': pd.Series(['1e', '1e', '1e', '2e', '2e']),
         'AidCode': pd.Series([np.nan, '01', '01', np.nan, '01']),
         'calendar': pd.Series([dt.datetime(2014, 3, 8), dt.datetime(2014, 3, 8),
                                dt.datetime(2014, 5, 8), dt.datetime(2014, 6, 8),
                                dt.datetime(2014, 6, 8)]),
         'MCelig': pd.Series([1, 2, 3, 4, 5])}
    dfx = pd.DataFrame(d)
    # Testing whether it was just the np.nan that was the problem; it isn't
    # dfx = dfx.where((pd.notnull(dfx)), None)
    test = dfx.groupby(['CIN', 'calendar'], group_keys=False).apply(lambda x: x.ix[x.MCelig.idxmax()])

Output:

    Out[820]:
                      AidCode CIN  MCelig   calendar
    CIN calendar
    1e  2014-03-08 2015-01-01  1e       2 2014-03-08
        2014-05-08 2015-01-01  1e       3 2014-05-08
    2e  2014-06-08 2015-01-01  2e       5 2014-06-08

Update:
I just figured out this simple solution:

    x = dfx.sort(['CIN', 'calendar', 'MCelig']).groupby(['CIN', 'calendar'], as_index=False).last()

Since it works, I think I will choose it for simplicity's sake.
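The sort-then-take-last idea above can also be sketched for more recent pandas releases, where DataFrame.sort is no longer available and sort_values takes its place. This is only an illustration under that assumption, not the poster's code. It uses tail(1) rather than last(), because last() picks the last non-null value of each column separately while tail(1) returns whole rows untouched; which of the two is wanted depends on the data.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Same sample data as in the question.
dfx = pd.DataFrame({
    "CIN": ["1e", "1e", "1e", "2e", "2e"],
    "AidCode": [np.nan, "01", "01", np.nan, "01"],
    "calendar": [dt.datetime(2014, 3, 8), dt.datetime(2014, 3, 8),
                 dt.datetime(2014, 5, 8), dt.datetime(2014, 6, 8),
                 dt.datetime(2014, 6, 8)],
    "MCelig": [1, 2, 3, 4, 5],
})

# Sort so the largest MCelig ends up as the last row of each
# (CIN, calendar) group, then keep only that last row per group.
x = (dfx.sort_values(["CIN", "calendar", "MCelig"])
        .groupby(["CIN", "calendar"])
        .tail(1))

print(x)
```

Because no apply step reassembles the pieces, the AidCode values come back exactly as they went in.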
Pandas is trying to be extra helpful by recognizing date-like columns and converting them from object dtype to datetime64 dtype. It is being overly aggressive here.

A workaround is to use transform to generate a boolean mask for each group that selects the maximum rows:

    def onemax(x):
        # Boolean mask that is True only at the first maximum of the group
        mask = np.zeros(len(x), dtype='bool')
        idx = np.argmax(x.values)
        mask[idx] = True
        return mask

    dfx.loc[dfx.groupby(['CIN', 'calendar'])['MCelig'].transform(onemax).astype(bool)]
yields

      AidCode CIN  MCelig   calendar
    1      01  1e       2 2014-03-08
    2      01  1e       3 2014-05-08
    4      01  2e       5 2014-06-08
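A shorter variant of the same idea, offered only as a sketch and not as part of the original answer, is to ask each group for the index label of its maximum with idxmax and then select those labels with loc. Like the mask approach, it never passes through groupby-apply. The sample frame is rebuilt here so the snippet runs on its own.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Same sample data as in the question.
dfx = pd.DataFrame({
    "CIN": ["1e", "1e", "1e", "2e", "2e"],
    "AidCode": [np.nan, "01", "01", np.nan, "01"],
    "calendar": [dt.datetime(2014, 3, 8), dt.datetime(2014, 3, 8),
                 dt.datetime(2014, 5, 8), dt.datetime(2014, 6, 8),
                 dt.datetime(2014, 6, 8)],
    "MCelig": [1, 2, 3, 4, 5],
})

# For every (CIN, calendar) group, idxmax gives the index label of the
# first row holding the largest MCelig.
best = dfx.groupby(["CIN", "calendar"])["MCelig"].idxmax()

# Select exactly those rows from the original frame.
result = dfx.loc[best]

print(result)
```

When several rows tie for the maximum, idxmax keeps the first one, so each group still contributes a single row.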
Technical detail: When groupby-apply is used and the individual DataFrames (returned by the applied function) are glued back together into one DataFrame, Pandas tries to guess whether columns with object dtype hold date-like objects, and if so, converts the column to an actual date dtype. If the values are strings, it tries to parse them as dates using dateutil.parser.

For better or worse, dateutil.parser interprets '01' as a date:

    In [37]: import dateutil.parser as DP

    In [38]: DP.parse('01')
    Out[38]: datetime.datetime(2015, 1, 1, 0, 0)

This causes Pandas to try to convert the entire AidCode column into dates. Since no error occurs, it thinks it has helped you out :)
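To see where the year and month in that result come from, here is a small sketch using the default argument of dateutil.parser.parse. When default is not given, dateutil fills any missing fields from today's date, so the bare '01' is read as a day of the month and everything else is borrowed. The reference date below is a hypothetical value picked to fall in January 2015; it is an assumption for illustration, not something stated in the answer.

```python
import datetime as dt

from dateutil import parser

# Hypothetical "today", assumed here only to make the result reproducible.
reference = dt.datetime(2015, 1, 20)

# '01' supplies only a day of the month; year and month are taken
# from the reference date passed as default.
parsed = parser.parse("01", default=reference)

print(parsed)
```

Run without default on a different day, the same call would return the first of whatever month it happens to be.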