python - Getting a previous value in Pandas


I'm working on feature extraction for a machine learning model, and for every row I need to compare the current price with the previous price. I sort the DataFrame by the datetime column, then iterate over the rows and keep a dictionary with the product id as key and the last price as value. The dataset is big, around 5M 'sales' in the training set and in the test set. Even on a small sample (about 250k products) this takes a long time and a lot of memory. I've used vectorized functions throughout other portions of the code, but I don't know how I can make this part more efficient. Here's what I'm doing right now:

data = data.sort_values('date_time')
previous_price = {}  # last seen price per prop_id
data_list = []
for index, value in data.iterrows():
    if value['prop_id'] in previous_price.keys():
        data_list.append(value['price_usd'] - previous_price[value['prop_id']])
    else:
        data_list.append(0)  # first sale of this prop_id
    previous_price[value['prop_id']] = value['price_usd']
data['previous_price_diff'] = data_list

It looks like you want to subtract the previous value from the current one for each id, so you can use groupby:

data.groupby('prop_id')['price_usd'].diff() 

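This groups on 'prop_id' and returns the row-to-row difference within each group. The first row of each group has no previous value, so its difference is NaN rather than the 0 your loop produces.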
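To make the result line up with the loop in the question, here is a minimal sketch on a small made-up DataFrame. The sample values are hypothetical; only the column names come from the question. It assumes the frame is sorted by 'date_time' first and that filling the missing first difference with 0 is acceptable. It also uses shift() to pull out the previous price itself, in case that is needed as a feature.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2013-01-03", "2013-01-01", "2013-01-02", "2013-01-04", "2013-01-05",
    ]),
    "prop_id": [1, 1, 2, 1, 2],
    "price_usd": [120.0, 100.0, 50.0, 90.0, 65.0],
})

# Sort chronologically so "previous" means the previous sale in time.
data = data.sort_values("date_time")

prices = data.groupby("prop_id")["price_usd"]

# Previous price per prop_id (NaN for the first sale of each prop_id).
data["previous_price"] = prices.shift()

# Difference to the previous price; 0 for the first sale, like the loop.
data["previous_price_diff"] = prices.diff().fillna(0)

print(data)
```

Note that fillna(0) also replaces differences that are NaN because a price was missing, so if 'price_usd' can contain NaN you may prefer to leave the NaN values in place and handle them later.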

