How to properly apply a lambda function to a pandas DataFrame column
I have a pandas DataFrame, sample, and one of its columns is called pr. I am applying a lambda function to it as follows:
sample['pr'] = sample['pr'].apply(lambda x: nan if x < 90)
I get the following syntax error message:
    sample['pr'] = sample['pr'].apply(lambda x: nan if x < 90)
                                                             ^
SyntaxError: invalid syntax
What am I doing wrong?
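A conditional expression in Python must have an else branch, so lambda x: nan if x < 90 is invalid syntax. For this task you don't need apply at all; use mask: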
sample['pr'] = sample['pr'].mask(sample['pr'] < 90, np.nan)
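The SyntaxError in the question has nothing to do with pandas: Python rejects the lambda before any pandas code runs. A minimal check using the built-in compile() (the helper name compiles is just for illustration):

```python
# The question's lambda has no `else` branch, so it is not a valid expression.
broken = "lambda x: nan if x < 90"
fixed = "lambda x: nan if x < 90 else x"


def compiles(source: str) -> bool:
    """Return True if `source` compiles as a single Python expression."""
    try:
        # Only syntax is checked here; the undefined name `nan` does not matter yet.
        compile(source, "<lambda>", "eval")
        return True
    except SyntaxError:
        return False


print(compiles(broken))  # False
print(compiles(fixed))   # True
```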
Another solution is loc with boolean indexing:
sample.loc[sample['pr'] < 90, 'pr'] = np.nan
Sample:

import pandas as pd
import numpy as np

sample = pd.DataFrame({'pr':[10,100,40]})
print (sample)
    pr
0   10
1  100
2   40

sample['pr'] = sample['pr'].mask(sample['pr'] < 90, np.nan)
print (sample)
      pr
0    NaN
1  100.0
2    NaN
sample.loc[sample['pr'] < 90, 'pr'] = np.nan
print (sample)
      pr
0    NaN
1  100.0
2    NaN
EDIT:

Solution with apply (the else branch is required):
sample['pr'] = sample['pr'].apply(lambda x: np.nan if x < 90 else x)
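To check that mask, loc and apply (with else) give the same column, here is a small sketch on the same toy data using pandas.testing.assert_series_equal. In the loc variant the column is cast to float before assigning NaN; that cast is my own choice, so the example does not rely on implicit int-to-float upcasting during assignment.

```python
import numpy as np
import pandas as pd

# Same toy data as the sample above
sample = pd.DataFrame({'pr': [10, 100, 40]})

# 1) mask: replace values where the condition is True with NaN
via_mask = sample['pr'].mask(sample['pr'] < 90, np.nan)

# 2) loc + boolean indexing, on a float copy so NaN fits the column dtype
df_loc = sample.copy()
df_loc['pr'] = df_loc['pr'].astype(float)
df_loc.loc[df_loc['pr'] < 90, 'pr'] = np.nan
via_loc = df_loc['pr']

# 3) apply with a complete conditional expression (note the else branch)
via_apply = sample['pr'].apply(lambda x: np.nan if x < 90 else x)

# All three should be float64 Series named 'pr' with NaN below 90
pd.testing.assert_series_equal(via_mask, via_loc)
pd.testing.assert_series_equal(via_mask, via_apply)
print(via_mask.tolist())
```

In practice mask is the one to prefer: it states the intent directly and stays vectorized.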
Timings with len(df) = 300k:
sample = pd.concat([sample]*100000).reset_index(drop=True)

In [853]: %timeit sample['pr'].apply(lambda x: np.nan if x < 90 else x)
10 loops, best of 3: 102 ms per loop

In [854]: %timeit sample['pr'].mask(sample['pr'] < 90, np.nan)
The slowest run took 4.28 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 3.71 ms per loop

So on this data mask is roughly 27 times faster than apply.
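%timeit is an IPython magic. Outside IPython, a rough equivalent is the standard library timeit module. This is only a sketch: the function name time_methods and its parameters are hypothetical, and the numbers you get will depend on your hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def time_methods(n_copies: int = 100_000, number: int = 5) -> dict:
    """Time apply vs mask on a column built from n_copies of the toy sample.

    Returns the best average time per call, in seconds, for each method.
    """
    base = pd.DataFrame({'pr': [10, 100, 40]})
    # 3 rows * 100_000 copies = 300k rows, as in the timings above
    sample = pd.concat([base] * n_copies).reset_index(drop=True)
    col = sample['pr']

    candidates = {
        'apply': lambda: col.apply(lambda x: np.nan if x < 90 else x),
        'mask': lambda: col.mask(col < 90, np.nan),
    }
    results = {}
    for name, func in candidates.items():
        # Run `number` calls, 3 times; keep the fastest batch ("best of 3")
        best = min(timeit.repeat(func, number=number, repeat=3))
        results[name] = best / number
    return results


if __name__ == "__main__":
    for name, seconds in time_methods().items():
        print(f"{name}: {seconds * 1e3:.2f} ms per call")
```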