How to properly apply a lambda function to a pandas data frame column
I have a pandas data frame, sample, with one of its columns called pr. I am applying a lambda function as follows:

    sample['pr'] = sample['pr'].apply(lambda x: nan if x < 90)

I get the following syntax error message:

    sample['pr'] = sample['pr'].apply(lambda x: nan if x < 90)
                                                             ^
    SyntaxError: invalid syntax

What am I doing wrong?
You need mask:

    sample['pr'] = sample['pr'].mask(sample['pr'] < 90, np.nan)

Another solution is loc with boolean indexing:

    sample.loc[sample['pr'] < 90, 'pr'] = np.nan

Sample:

    import pandas as pd
    import numpy as np

    sample = pd.DataFrame({'pr': [10, 100, 40]})
    print(sample)
        pr
    0   10
    1  100
    2   40

    sample['pr'] = sample['pr'].mask(sample['pr'] < 90, np.nan)
    print(sample)
          pr
    0    NaN
    1  100.0
    2    NaN

    sample.loc[sample['pr'] < 90, 'pr'] = np.nan
    print(sample)
          pr
    0    NaN
    1  100.0
    2    NaN

EDIT:
Solution with apply (a conditional expression needs an else branch, and leaving it out is what raises the SyntaxError):

    sample['pr'] = sample['pr'].apply(lambda x: np.nan if x < 90 else x)

Timings for len(df)=300k:

    sample = pd.concat([sample]*100000).reset_index(drop=True)

    In [853]: %timeit sample['pr'].apply(lambda x: np.nan if x < 90 else x)
    10 loops, best of 3: 102 ms per loop

    In [854]: %timeit sample['pr'].mask(sample['pr'] < 90, np.nan)
    The slowest run took 4.28 times longer than the fastest. This could mean that an intermediate result is being cached.
    100 loops, best of 3: 3.71 ms per loop
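To check that the three approaches agree, here is a minimal sketch built on the same three-row sample. It first confirms that a conditional expression without else does not compile, then compares the results of mask, loc and apply. The variable names (via_mask, via_loc, via_apply, missing_else_compiles) are only illustrative, and the cast to float before the loc assignment is an assumption added here because recent pandas versions may warn when NaN is written into an integer column.

```python
import numpy as np
import pandas as pd

# Same small sample as in the answer above.
sample = pd.DataFrame({'pr': [10, 100, 40]})

# A conditional expression needs both branches: `A if cond else B`.
# Without `else`, Python rejects the source before anything runs.
try:
    compile("lambda x: np.nan if x < 90", "<example>", "eval")
    missing_else_compiles = True
except SyntaxError:
    missing_else_compiles = False

# 1) mask: replace values where the condition is True.
via_mask = sample['pr'].mask(sample['pr'] < 90, np.nan)

# 2) loc with boolean indexing, done on a copy so `sample` stays intact.
#    Casting to float first is an assumption to avoid dtype upcasting
#    warnings when NaN is written into an integer column.
via_loc = sample.copy()
via_loc['pr'] = via_loc['pr'].astype(float)
via_loc.loc[via_loc['pr'] < 90, 'pr'] = np.nan

# 3) apply with a complete conditional expression (note the `else x`).
via_apply = sample['pr'].apply(lambda x: np.nan if x < 90 else x)

print(missing_else_compiles)
print(via_mask.equals(via_apply), via_mask.equals(via_loc['pr']))
```

The vectorized mask and loc versions operate on the whole column at once, while apply calls the lambda once per element, which is consistent with the timing difference reported above.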