python - Pandas time series time between events -


how can calculate time (number of days) between "events" in pandas time series? example, if have below time series i'd know on each day in series how many days have passed since last true

            event 2010-01-01  false 2010-01-02   true 2010-01-03  false 2010-01-04  false 2010-01-05   true 2010-01-06  false 

the way i've done seems overcomplicated, i'm hoping more elegant. loop iterating on rows work, i'm looking vectorized (scalable) solution ideally. current attempt below:

date_range = pd.date_range('2010-01-01', '2010-01-06') df = pd.dataframe([false, true, false, false, true, false], index=date_range, columns=['event']) event_dates = df.index[df['event']] df2 = pd.dataframe(event_dates, index=event_dates, columns=['max_event_date']) df = df.join(df2) df['max_event_date'] = df['max_event_date'].cummax(axis=0, skipna=false) df['days_since_event'] = df.index - df['max_event_date']              event max_event_date  days_since_event 2010-01-01  false            nat               nat 2010-01-02   true     2010-01-02            0 days 2010-01-03  false     2010-01-02            1 days 2010-01-04  false     2010-01-02            2 days 2010-01-05   true     2010-01-05            0 days 2010-01-06  false     2010-01-05            1 days 

continuing improve on answer, , hoping comes in 'the' pythonic way. until then, think final update works best.

last = pd.to_datetime(np.nan) def elapsed(row):     if not row.event:         return row.name - last     else:         global last         last = row.name         return row.name-last  df['elapsed'] = df.apply(elapsed,axis=1)  df             event  elapsed 2010-01-01  false      nat 2010-01-02   true   0 days 2010-01-03  false   1 days 2010-01-04  false   2 days 2010-01-05   true   0 days 2010-01-06  false   1 days 

:::::::::::::

leaving previous answers below although sub-optimal

:::::::::

instead of making multiple passes through, seems easier to loop through indexes

df['elapsed'] = 0 in df.index[1:]:     if not df['event'][i]:         df['elapsed'][i] = df['elapsed'][i-1] + 1 

::::::::::::

let's 'trues' event of interest.

trues = df[df.event==true] trues.dates = trues.index #need because .diff() doesn't work on index trues.elapsed = trues.dates.diff() 

Comments

Popular posts from this blog

node.js - Mongoose: Cast to ObjectId failed for value on newly created object after setting the value -

[C++][SFML 2.2] Strange Performance Issues - Moving Mouse Lowers CPU Usage -

ios - Possible to get UIButton sizeThatFits to work? -