python - Dummy variables when not all categories are present
I have a set of dataframes in which one of the columns contains a categorical variable. I'd like to convert it into several dummy variables, in which case I'd normally use get_dummies.
What happens is that get_dummies looks at the data available in each dataframe to find out how many categories there are, and then creates the appropriate number of dummy variables. However, in the problem I'm working on right now, I know in advance what the possible categories are. When looking at each dataframe individually, not all categories necessarily appear.
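To make the behavior concrete, here is a minimal sketch with a hypothetical dataframe in which only 'a' and 'b' occur even though 'c' is also a possible category. The column name cat and the prefix are illustrative assumptions.

```python
import pandas as pd

# Hypothetical frame: only 'a' and 'b' occur, although 'c' is also possible.
df = pd.DataFrame({'cat': ['a', 'b', 'a']})

# get_dummies infers the categories from the values it actually sees.
dummies = pd.get_dummies(df['cat'], prefix='cat')

# Only the observed categories get a column; there is no 'cat_c'.
observed_columns = list(dummies.columns)
print(observed_columns)
```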
My question is: is there a way to pass get_dummies (or an equivalent function) the names of the categories, so that, for the categories that don't appear in a given dataframe, it'd just create a column of 0s?
Something that would make this:

    categories = ['a', 'b', 'c']

       cat
    1    a
    2    b
    3    a

become this:

       cat_a  cat_b  cat_c
    1      1      0      0
    2      0      1      0
    3      1      0      0
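Using transpose and reindex:

    import pandas as pd

    cats = ['a', 'b', 'c']
    df = pd.DataFrame({'cat': ['a', 'b', 'a']})

    # One dummy column per category that is present in the data
    dummies = pd.get_dummies(df, prefix='', prefix_sep='')
    # Transpose so the categories become the index, reindex against the
    # full list, transpose back, and fill the missing category with 0
    dummies = dummies.T.reindex(cats).T.fillna(0)

    print(dummies)

Output:

         a    b    c
    0  1.0  0.0  0.0
    1  0.0  1.0  0.0
    2  1.0  0.0  0.0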
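As a possible alternative that avoids the double transpose, the sketch below shows two other routes: reindexing the columns directly with a fill value, and declaring the full category list up front with a categorical dtype. It assumes the cat_ prefix from the question is wanted and that the installed pandas version supports the dtype argument of get_dummies. The names expected_columns, typed, and dummies_cat are illustrative.

```python
import pandas as pd

cats = ['a', 'b', 'c']
df = pd.DataFrame({'cat': ['a', 'b', 'a']})

# Option 1: build dummies for the observed values, then reindex the
# columns against the full list; absent categories are filled with 0.
expected_columns = ['cat_' + c for c in cats]
dummies = (
    pd.get_dummies(df['cat'], prefix='cat', dtype=int)
    .reindex(columns=expected_columns, fill_value=0)
)

# Option 2: tell pandas the full set of categories before encoding, so
# get_dummies emits a column for every category, observed or not.
typed = df['cat'].astype(pd.CategoricalDtype(categories=cats))
dummies_cat = pd.get_dummies(typed, prefix='cat', dtype=int)

print(dummies)
print(dummies_cat)
```

Either option can be applied to each dataframe in turn with the same cats list, so all results share the same columns in the same order.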