python - Getting a previous value in Pandas
I'm working on feature extraction for a machine learning model, and for every row I need to compare the current price with the previous price. I sort the dataframe by the datetime column, then iterate over the rows and keep a dictionary with the product ID as the key and the last price as the value. The dataset is big, around 5M 'sales' in the training set, and in the test set. Even on a small sample (about 250k products) it is taking a long time and a lot of memory. I've used vectorized functions throughout other portions of the code, but I don't know how I can make this part more efficient. Here's what I'm doing right now:
data = data.sort_values('date_time')
previous_price = {}
data_list = []
for index, value in data.iterrows():
    if value['prop_id'] in previous_price.keys():
        data_list.append(value['price_usd'] - previous_price[value['prop_id']])
    else:
        data_list.append(0)
    previous_price[value['prop_id']] = value['price_usd']
data['previous_price_diff'] = data_list
It looks like you want the previous value to subtract against, based on the IDs, so you can use groupby:
data.groupby('prop_id')['price_usd'].diff()
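This groups on 'prop_id' and returns the inter-row difference within each group. The first row of each group has no previous row, so its difference is NaN rather than the 0 that the loop produces.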
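To make the grouped difference concrete, here is a minimal sketch on a tiny made-up frame. The sample values are hypothetical and only the column names come from the question. It assumes the rows are sorted by 'date_time' first, as in the original loop, and that filling the first row of each group with 0 is the desired behaviour. The extra 'previous_price' column is an illustrative addition that uses shift() to keep the previous value itself rather than the difference.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2013-01-03", "2013-01-01", "2013-01-02", "2013-01-04", "2013-01-05",
    ]),
    "prop_id": [1, 1, 2, 2, 1],
    "price_usd": [120.0, 100.0, 50.0, 45.0, 110.0],
})

# Sort chronologically so "previous" means the earlier sale of the same product.
data = data.sort_values("date_time")

grouped_price = data.groupby("prop_id")["price_usd"]

# Difference to the previous row within each prop_id group.
# The first row of each group is NaN; fillna(0) mirrors the loop's 0.
data["previous_price_diff"] = grouped_price.diff().fillna(0)

# Illustrative extra column: the previous price itself (NaN where there is none).
data["previous_price"] = grouped_price.shift()

print(data)
```

Note that fillna(0) would also replace a NaN caused by a missing price, not only the first row of each group, so it may be worth checking for missing prices before relying on it.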