python - Pandas: Create new dataframe that averages duplicates from another dataframe

February 15, 2011

Say I have a dataframe my_df with duplicate column names, e.g.

    foo bar foo hello
    0   1   1   5
    1   1   2   5
    2   1   3   5

I want to create a new dataframe that averages the duplicates:

    foo bar hello
    0.5   1   5
    1.5   1   5
    2.5   1   5

How can I do this in pandas?

So far I have managed to identify the duplicates:

    import collections

    my_columns = my_df.columns
    # Column labels that occur more than once
    my_duplicates = [x for x, y in collections.Counter(my_columns).items() if y > 1]
    print(my_duplicates)

But I don't know how to ask pandas to average them.
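For reference, the same duplicate check can probably be done with pandas alone, without collections. This is a minimal sketch, assuming my_df has the layout shown in the example; the name is_repeat is only illustrative.

```python
import pandas as pd

# Same layout as the example above: two columns share the label "foo".
my_df = pd.DataFrame(
    [[0, 1, 1, 5], [1, 1, 2, 5], [2, 1, 3, 5]],
    columns=["foo", "bar", "foo", "hello"],
)

# Index.duplicated() flags every repeat of a label after its first occurrence.
is_repeat = my_df.columns.duplicated()

# Keep each duplicated label once.
my_duplicates = my_df.columns[is_repeat].unique().tolist()
print(my_duplicates)  # ['foo']
```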

You can groupby the column index and take the mean:

    In [11]: df.groupby(level=0, axis=1).mean()
    Out[11]:
       bar  foo  hello
    0    1  0.5      5
    1    1  1.5      5
    2    1  2.5      5
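Passing axis=1 to groupby is deprecated in recent pandas releases, so the call above may warn or fail depending on the installed version. Here is a minimal sketch of the same idea that avoids axis=1 by transposing first, assuming the sample data from the question (the name averaged is only illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 1, 5], [1, 1, 2, 5], [2, 1, 3, 5]],
    columns=["foo", "bar", "foo", "hello"],
)

# Transpose so the column labels become the row index, group equal labels,
# average each group, then transpose back to the original orientation.
averaged = df.T.groupby(level=0).mean().T
print(averaged)
```

As with the groupby above, the resulting columns come back sorted by label (bar, foo, hello), and every column is returned as float.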

A trickier example is if there is a non-numeric column:

    In [21]: df
    Out[21]:
       foo  bar  foo hello
    0    0    1    1     a
    1    1    1    2     a
    2    2    1    3     a

The groupby above will raise: DataError: No numeric types to aggregate. This is not going to win any prizes for efficiency, but here's a generic method for this case:

    In [22]: dupes = df.columns.get_duplicates()

    In [23]: dupes
    Out[23]: ['foo']

    In [24]: pd.DataFrame({d: df[d] for d in df.columns if d not in dupes})
    Out[24]:
       bar hello
    0    1     a
    1    1     a
    2    1     a

    In [25]: pd.concat(df.xs(d, axis=1) for d in dupes).groupby(level=0, axis=1).mean()
    Out[25]:
       foo
    0  0.5
    1  1.5
    2  2.5

    In [26]: pd.concat([Out[24], Out[25]], axis=1)
    Out[26]:
       foo  bar hello
    0  0.5    1     a
    1  1.5    1     a
    2  2.5    1     a
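Index.get_duplicates was deprecated and later removed from pandas, so the session above may not run on a current install. Below is a rough sketch of the same steps as a single function, under a few assumptions: the helper name average_duplicate_columns is hypothetical, the string values in hello are placeholders, and a duplicated label whose columns are not all numeric simply keeps its first column.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def average_duplicate_columns(df):
    """Average numeric columns that share a label; pass other columns through."""
    result = {}
    for name in df.columns.unique():
        # A boolean mask always returns a DataFrame, even for a single match.
        block = df.loc[:, df.columns == name]
        all_numeric = all(is_numeric_dtype(dtype) for dtype in block.dtypes)
        if block.shape[1] > 1 and all_numeric:
            # Row-wise mean across the columns sharing this label.
            result[name] = block.mean(axis=1)
        else:
            # Unique label, or non-numeric duplicates: keep the first column.
            result[name] = block.iloc[:, 0]
    return pd.DataFrame(result)


df = pd.DataFrame(
    [[0, 1, 1, "a"], [1, 1, 2, "a"], [2, 1, 3, "a"]],
    columns=["foo", "bar", "foo", "hello"],
)
combined = average_duplicate_columns(df)
print(combined)
```

Columns come back in order of first appearance (foo, bar, hello), and the non-numeric hello column is left untouched rather than triggering the aggregation error.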

I think the thing to take away is to avoid column duplicates... or perhaps that I don't know what I'm doing.
