python - Pandas group by operations on a data frame
I have the pandas data frame below.

    usrid  jobnos
    1      4
    1      56
    2      23
    2      55
    2      41
    2      5
    3      78
    1      25
    3      1

I group the data frame based on usrid. The grouped data frame conceptually looks like this:

    usrid  jobnos
    1      [4, 56, 25]
    2      [23, 55, 41, 5]
    3      [78, 1]

Now I'm looking for a built-in API that will give me the usrid with the maximum job count. In the above example, usrid 2 has the maximum count.

Update: Instead of the usrid with the maximum job count, I want the 'n' userids with the maximum job counts. For the above example, if n=2 the output is [2, 1]. Can this be done?
Something like df.groupby('usrid').jobnos.sum().idxmax() should do it:
    In [1]: import pandas as pd

    In [2]: from io import StringIO

    In [3]: data = """usrid jobnos
       ...: 1 4
       ...: 1 56
       ...: 2 23
       ...: 2 55
       ...: 2 41
       ...: 2 5
       ...: 3 78
       ...: 1 25
       ...: 3 1"""

    In [4]: df = pd.read_csv(StringIO(data), sep=r'\s+')

    In [5]: grouped = df.groupby('usrid')

    In [6]: grouped.jobnos.sum()
    Out[6]:
    usrid
    1     85
    2    124
    3     79
    Name: jobnos, dtype: int64

    In [7]: grouped.jobnos.sum().idxmax()
    Out[7]: 2

Note that this picks the usrid with the largest sum of jobnos values. If you want the results based on the number of items in each group:
    In [8]: grouped.size()
    Out[8]:
    usrid
    1    3
    2    4
    3    2
    dtype: int64

    In [9]: grouped.size().idxmax()
    Out[9]: 2

Update: To get ordered results you can use the sort_values method (called .order in older pandas versions):

    In [10]: grouped.jobnos.sum().sort_values(ascending=False)
    Out[10]:
    usrid
    2    124
    1     85
    3     79
    Name: jobnos, dtype: int64
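For the 'n' userids with the most jobs, one option is to rank the group sizes and keep the index of the top n. The snippet below is only a sketch of that idea: it rebuilds the sample data from the question, and the names n, counts and top_n are illustrative choices rather than anything from the original posts. It counts rows per usrid; swap size() for jobnos.sum() if the sum is what you are after.

```python
from io import StringIO

import pandas as pd

# Sample data from the question, rebuilt so the snippet runs on its own.
data = """usrid jobnos
1 4
1 56
2 23
2 55
2 41
2 5
3 78
1 25
3 1"""

df = pd.read_csv(StringIO(data), sep=r"\s+")

n = 2  # how many userids to return (illustrative value from the question)

# Number of rows (jobs) per usrid.
counts = df.groupby("usrid").size()

# Keep the n largest counts and return their usrid labels as a plain list.
top_n = counts.nlargest(n).index.tolist()

print(top_n)  # [2, 1]
```

With the sample data this gives [2, 1], matching the output asked for in the question. If several userids share the same count, it is worth checking the keep argument of nlargest to decide how ties should be handled.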