python - Pandas: How to resample a dataframe such that each combination is present?


Assume I have the following data frame:

import pandas as pd

# data
t = pd.to_datetime(pd.Series(['2015-01-01', '2015-02-01', '2015-03-01',
                              '2015-04-01', '2015-01-01', '2015-02-01']))
g = pd.Series(['a', 'a', 'a', 'a', 'b', 'b'])
v = pd.Series([12.1, 14.2, 15.3, 16.2, 12.2, 13.7])
df = pd.DataFrame({'time': t, 'group': g, 'value': v})

# show data
>>> df
        time group  value
0 2015-01-01     a   12.1
1 2015-02-01     a   14.2
2 2015-03-01     a   15.3
3 2015-04-01     a   16.2
4 2015-01-01     b   12.2
5 2015-02-01     b   13.7

What I would like to have in the end is the following data frame:

>>> df
        time group  value
0 2015-01-01     a   12.1
1 2015-02-01     a   14.2
2 2015-03-01     a   15.3
3 2015-04-01     a   16.2
4 2015-01-01     b   12.2
5 2015-02-01     b   13.7
6 2015-03-01     b   13.7
7 2015-04-01     b   13.7

The missing observations in group b should be added, and the missing values should default to the last observed value.

How can I achieve this? Thanks in advance!

You can use pivot for reshaping, fill the NaN values with ffill (fillna with method ffill), and then reshape back to the original layout with unstack and reset_index:

print(df.pivot(index='time', columns='group', values='value')
        .ffill()
        .unstack()
        .reset_index(name='value'))

  group       time  value
0     a 2015-01-01   12.1
1     a 2015-02-01   14.2
2     a 2015-03-01   15.3
3     a 2015-04-01   16.2
4     b 2015-01-01   12.2
5     b 2015-02-01   13.7
6     b 2015-03-01   13.7
7     b 2015-04-01   13.7
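As a side note, pivot raises an error if the same time/group pair occurs more than once. Here is a rough sketch of a different technique (not the one above): build every group/time combination explicitly with MultiIndex.from_product, reindex to it, and forward-fill within each group. The names full_index and filled are just illustrative, and the sketch assumes each time/group pair is unique in the input.

```python
import pandas as pd

df = pd.DataFrame({
    'time': pd.to_datetime(['2015-01-01', '2015-02-01', '2015-03-01',
                            '2015-04-01', '2015-01-01', '2015-02-01']),
    'group': ['a', 'a', 'a', 'a', 'b', 'b'],
    'value': [12.1, 14.2, 15.3, 16.2, 12.2, 13.7],
})

# Every (group, time) combination that should exist in the result.
full_index = pd.MultiIndex.from_product(
    [df['group'].unique(), df['time'].drop_duplicates().sort_values()],
    names=['group', 'time'],
)

filled = (
    df.set_index(['group', 'time'])
      .reindex(full_index)       # missing combinations become NaN rows
      .groupby(level='group')
      .ffill()                   # fill forward within each group only
      .reset_index()
)
print(filled)
```

Because the forward fill runs per group, a value from group a can never leak into group b, which is something to keep in mind when a group has no observation at the first date.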

Another solution is to first build a date_range from the min and max values of time. Then use groupby, reindex each group to that daily (D) range, and ffill:

Notice: I think you forgot the parameter format='%Y-%d-%m' in to_datetime, if the last number is the month:

t = pd.to_datetime(pd.Series(['2015-01-01', '2015-02-01', '2015-03-01',
                              '2015-04-01', '2015-01-01', '2015-02-01']),
                   format='%Y-%d-%m')
# rebuild df so that it uses the re-parsed dates
df = pd.DataFrame({'time': t, 'group': g, 'value': v})

idx = pd.date_range(df.time.min(), df.time.max())
print(idx)
DatetimeIndex(['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04'], dtype='datetime64[ns]', freq='D')

df1 = (df.groupby('group')
         .apply(lambda x: x.set_index('time').reindex(idx))
         .ffill()
         .reset_index(level=0, drop=True)
         .reset_index()
         .rename(columns={'index': 'time'}))

print(df1)
        time group  value
0 2015-01-01     a   12.1
1 2015-01-02     a   14.2
2 2015-01-03     a   15.3
3 2015-01-04     a   16.2
4 2015-01-01     b   12.2
5 2015-01-02     b   13.7
6 2015-01-03     b   13.7
7 2015-01-04     b   13.7
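To make the effect of the format parameter concrete, here is a small sketch that parses the same strings both ways. The variable names as_month and as_day are only for illustration.

```python
import pandas as pd

s = pd.Series(['2015-01-01', '2015-02-01', '2015-03-01', '2015-04-01'])

# Middle number read as the month: four different months, day 1 each.
as_month = pd.to_datetime(s, format='%Y-%m-%d')

# Middle number read as the day: four consecutive days in January.
as_day = pd.to_datetime(s, format='%Y-%d-%m')

print(as_month.dt.month.tolist())  # [1, 2, 3, 4]
print(as_day.dt.day.tolist())      # [1, 2, 3, 4]
print(as_day.dt.month.tolist())    # [1, 1, 1, 1]
```

Which reading is right depends on what the data actually means. With '%Y-%d-%m' the default daily date_range covers exactly four days, whereas with monthly dates a daily range from January to April would contain every day in between.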
