multiple columns - calculate diff() in python on subsets of data within a dataframe

July 15, 2010

I'm new to Python and coming from SAS. I want to calculate the difference between sequential rows of a time column (the time interval between them), but every time I reach a new person (PIT), I want the calculation to start over. In SAS I would do this with DIF() or an interval function. Is there a way to do the same in Python?
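For readers coming from SAS: DIF() in a DATA step compares each row with the previous one regardless of BY groups, so the usual idiom resets it with something like `if first.PIT then Lag = .;`. A rough pandas analogue is sketched below, assuming the rows are already sorted so that each PIT's rows are contiguous: take a plain `diff()` over the whole column, then blank out the first row of every PIT run. The column names follow the question; the six rows are a subset of the sample table.

```python
import numpy as np
import pandas as pd

# Subset of the question's sample rows (PIT, Receiver, tottime),
# assumed already sorted so that each PIT's rows are contiguous
Clean = pd.DataFrame({
    "PIT": [1, 1, 1, 1, 2, 2],
    "Receiver": [1, 1, 2, 2, 1, 1],
    "tottime": pd.to_datetime([
        "2015-01-21 12:00:30", "2015-01-21 12:00:35",
        "2015-01-22 12:00:35", "2015-01-22 12:01:10",
        "2015-01-12 12:01:10", "2015-01-12 12:01:15",
    ]),
})

# Like SAS DIF(): difference from the previous row, ignoring PIT boundaries,
# converted from a timedelta to seconds (float64)
raw_lag = Clean["tottime"].diff() / np.timedelta64(1, "s")

# Like "if first.PIT then Lag = .;": True on the first row of each PIT run
first_pit = Clean["PIT"].ne(Clean["PIT"].shift())
Clean["Lag"] = raw_lag.mask(first_pit)
print(Clean)
```

This only works if the data really is ordered by PIT; the `groupby()` approach does not depend on that.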

What I want looks like this (Lag in seconds):

  PIT  Receiver  tottime              Lag
  1    1         2015-01-21 12:00:00
  1    1         2015-01-21 12:00:05  5
  1    1         2015-01-21 12:00:20  15
  1    1         2015-01-21 12:00:30  10
  1    1         2015-01-21 12:00:35  5
  1    2         2015-01-22 12:00:35  86400
  1    2         2015-01-22 12:00:50  15
  1    2         2015-01-22 12:00:55  5
  1    2         2015-01-22 12:01:05  10
  1    2         2015-01-22 12:01:10  5
  2    1         2015-01-12 12:01:10
  2    1         2015-01-12 12:01:15  5
  2    2         2015-01-12 12:01:20  5
  2    2         2015-01-12 12:01:25  5
  2    2         2015-01-12 12:01:30  5

I tried to use this code:

  import numpy as np
  import pandas as pd

  # Clean is the DataFrame loaded earlier, with columns PIT, Receiver and tottime
  Clean['tottime'] = pd.to_datetime(Clean.tottime.values)  # Convert tottime to datetime values
  tindex = Clean.tottime.values  # Vector of time values that will become part of a multi-index
  arrays = [Clean.PIT.values, tindex]  # Arrays from which the multi-index is built
  index = pd.MultiIndex.from_arrays(arrays, names=['PIT', 'tottime'])  # Declare the multi-level index
  Clean.index = index
  Clean['lag'] = Clean['tottime'].diff()  # Time difference between consecutive rows
  Clean['lag'] = Clean['lag'] / np.timedelta64(1, 's')  # Convert lag to a numeric (float64) value in seconds

But it produces the following (it works for the first PIT, but does not restart when a new PIT value begins):

  PIT  Receiver  tottime              Lag
  1    1         2015-01-21 12:00:00
  1    1         2015-01-21 12:00:05  5
  1    1         2015-01-21 12:00:20  15
  1    1         2015-01-21 12:00:30  10
  1    1         2015-01-21 12:00:35  5
  1    2         2015-01-22 12:00:35  86400
  1    2         2015-01-22 12:00:50  15
  1    2         2015-01-22 12:00:55  5
  1    2         2015-01-22 12:01:05  10
  1    2         2015-01-22 12:01:10  5
  2    1         2015-01-12 12:01:10  -864000
  2    1         2015-01-12 12:01:15  5
  2    2         2015-01-12 12:01:20  5
  2    2         2015-01-12 12:01:25  5
  2    2         2015-01-12 12:01:30  5
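The large negative value comes from `diff()` running over the whole column: the first PIT 2 row (2015-01-12) is compared with the last PIT 1 row (2015-01-22), ten days later. A minimal reproduction, using the question's column names and four of the sample rows, might look like this:

```python
import numpy as np
import pandas as pd

# Subset of the sample rows: the last PIT 1 rows followed by the first PIT 2 rows
Clean = pd.DataFrame({
    "PIT": [1, 1, 2, 2],
    "tottime": pd.to_datetime([
        "2015-01-22 12:01:05", "2015-01-22 12:01:10",
        "2015-01-12 12:01:10", "2015-01-12 12:01:15",
    ]),
})

# Ungrouped diff: every row is compared with the row above it, whatever its PIT
Clean["lag"] = Clean["tottime"].diff() / np.timedelta64(1, "s")
print(Clean)
```

The third row gets -864000 seconds, which is exactly minus ten days.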

The lag does not reset when the new PIT starts, so I get a large negative number (minus 10 days, i.e. -864000 seconds). Eventually I want to be able to do this by PIT and receiver, but for now the challenge is to group by PIT and restart the calculation within each group. Any suggestions on how to do this?
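Grouping by PIT and receiver only changes the key passed to `groupby()`. A minimal sketch, assuming the question's column names and a subset of the sample rows; note that with both keys the lag also restarts whenever the receiver changes, so the 86400-second gap in the PIT-only table becomes NaN here:

```python
import numpy as np
import pandas as pd

# Subset of the question's sample rows (PIT, Receiver, tottime)
Clean = pd.DataFrame({
    "PIT": [1, 1, 1, 1, 2, 2],
    "Receiver": [1, 1, 2, 2, 1, 1],
    "tottime": pd.to_datetime([
        "2015-01-21 12:00:30", "2015-01-21 12:00:35",
        "2015-01-22 12:00:35", "2015-01-22 12:01:10",
        "2015-01-12 12:01:10", "2015-01-12 12:01:15",
    ]),
})

# Sort so that "previous row" means the previous time within each group
Clean = Clean.sort_values(["PIT", "Receiver", "tottime"])

# diff() runs separately inside each (PIT, Receiver) group, so the first row
# of every group gets NaN instead of a gap measured across groups
Clean["Lag"] = (
    Clean.groupby(["PIT", "Receiver"])["tottime"].diff() / np.timedelta64(1, "s")
)
print(Clean)
```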

I also suspect this is a special case of a common problem (processing subsets of the data separately), but I don't know how that question is phrased in Python terms, so I couldn't find it on Stack Overflow.
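In pandas this pattern is usually called split-apply-combine, and it is what `groupby()` implements. The loop below is a deliberately explicit sketch of that idea for this case (split by PIT, diff each piece, then concatenate); it is meant to show the mechanics rather than to be fast, and it uses the question's column names on a subset of the sample rows.

```python
import numpy as np
import pandas as pd

# Subset of the question's sample rows (PIT, Receiver, tottime)
Clean = pd.DataFrame({
    "PIT": [1, 1, 1, 1, 2, 2],
    "Receiver": [1, 1, 2, 2, 1, 1],
    "tottime": pd.to_datetime([
        "2015-01-21 12:00:30", "2015-01-21 12:00:35",
        "2015-01-22 12:00:35", "2015-01-22 12:01:10",
        "2015-01-12 12:01:10", "2015-01-12 12:01:15",
    ]),
})

pieces = []
for pit, group in Clean.groupby("PIT"):  # split: one sub-DataFrame per PIT value
    group = group.copy()
    # apply: diff() restarts because each group is processed on its own
    group["Lag"] = group["tottime"].diff() / np.timedelta64(1, "s")
    pieces.append(group)

# combine: stitch the pieces back together in the original row order
result = pd.concat(pieces).sort_index()
print(result)
```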

One way to do this is with groupby(), starting from the dataframe without the lag column, in the format you showed.

First, create a function, diff_func, that will be applied to each group:

  def diff_func(df):
      # Difference between each row and the previous row within one group
      return df.diff()

Then group Clean by the PIT column and tell pandas to apply diff_func to the tottime column; the result becomes the new column Lag.
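Put together, the approach might look like the sketch below. The exact call is an assumption, since the answer's code line is not shown: it uses `transform()` rather than `apply()` because, in recent pandas versions, `groupby(...).apply()` can prepend the group key to the result's index, which then no longer lines up with `Clean`'s index. Column names follow the question; the rows are a subset of the sample table.

```python
import numpy as np
import pandas as pd

def diff_func(df):
    # Difference between each row and the previous row within one group
    return df.diff()

# Subset of the question's sample rows (PIT, Receiver, tottime)
Clean = pd.DataFrame({
    "PIT": [1, 1, 1, 1, 2, 2],
    "Receiver": [1, 1, 2, 2, 1, 1],
    "tottime": pd.to_datetime([
        "2015-01-21 12:00:30", "2015-01-21 12:00:35",
        "2015-01-22 12:00:35", "2015-01-22 12:01:10",
        "2015-01-12 12:01:10", "2015-01-12 12:01:15",
    ]),
})

# Group by PIT, run diff_func on tottime within each group,
# and keep the result aligned with Clean's original index
Clean["Lag"] = Clean.groupby("PIT")["tottime"].transform(diff_func)

# Convert the timedelta to seconds (float64), as in the question's code
Clean["Lag"] = Clean["Lag"] / np.timedelta64(1, "s")
print(Clean)
```

If the helper function is not needed, `Clean.groupby("PIT")["tottime"].diff()` computes the same per-PIT differences directly.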


Raj T
