python - Dummy variables when not all categories are present


I have a set of dataframes where one of the columns contains a categorical variable. I'd like to convert it into several dummy variables, in which case I'd normally use get_dummies.

What happens is that get_dummies looks at the data available in each dataframe to find out how many categories there are, and creates the appropriate number of dummy variables. However, in the problem I'm working on right now, I know in advance what the possible categories are. When looking at each dataframe individually, not all categories necessarily appear.
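A minimal sketch of the behaviour described above, assuming a small made-up frame in which the category 'c' never occurs (the frame and the 'cat' prefix are illustrative choices, not part of the original data):

```python
import pandas as pd

# Hypothetical example frame: only 'a' and 'b' occur, 'c' never does.
df = pd.DataFrame({'cat': ['a', 'b', 'a']})

dummies = pd.get_dummies(df, prefix='cat')

# Only the categories actually found in the data get a column,
# so there is no cat_c here.
print(list(dummies.columns))
```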

My question is: is there a way to pass to get_dummies (or an equivalent function) the names of the categories, so that, for the categories that don't appear in a given dataframe, it'd just create a column of 0s?

Something that would make this:

    categories = ['a', 'b', 'c']

       cat
    1    a
    2    b
    3    a

become this:

       cat_a  cat_b  cat_c
    1      1      0      0
    2      0      1      0
    3      1      0      0

Using transpose and reindex:

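    import pandas as pd

    cats = ['a', 'b', 'c']
    df = pd.DataFrame({'cat': ['a', 'b', 'a']})

    # Empty prefix so the dummy columns are named after the categories themselves
    dummies = pd.get_dummies(df, prefix='', prefix_sep='')
    # Transpose, reindex the rows to the full category list, transpose back,
    # then fill the columns of the missing categories with 0
    dummies = dummies.T.reindex(cats).T.fillna(0)

    print(dummies)

         a    b    c
    0  1.0  0.0  0.0
    1  0.0  1.0  0.0
    2  1.0  0.0  0.0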
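As a possible alternative to the transpose approach in the answer above, here is a sketch that declares the full category set on the column before encoding it. It assumes a pandas version where get_dummies accepts the dtype argument; the 'cat' prefix and the variable names are illustrative. The second variant stays closer to the answer but reindexes the columns directly.

```python
import pandas as pd

cats = ['a', 'b', 'c']
df = pd.DataFrame({'cat': ['a', 'b', 'a']})

# Declare every possible category up front; 'c' is listed even though
# it never occurs in this particular frame.
typed = df['cat'].astype(pd.CategoricalDtype(categories=cats))

# get_dummies emits one column per declared category, in the declared order.
dummies = pd.get_dummies(typed, prefix='cat', dtype=int)
print(dummies)

# Variant: encode as usual, then reindex the columns to the full list,
# filling the columns of absent categories with 0.
alt = pd.get_dummies(df['cat'], dtype=int).reindex(columns=cats, fill_value=0)
print(alt)
```

Either way, every dataframe in the set ends up with the same columns in the same order, which matters if the frames are later concatenated or fed to the same model.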
