Python - Pandas group by operations on a data frame

February 15, 2010

I have the pandas data frame below.

    usrid   jobnos
    1       4
    1       56
    2       23
    2       55
    2       41
    2       5
    3       78
    1       25
    3       1

I group the data frame by usrid. Conceptually, the grouped data frame looks like this:

    usrid   jobnos
    1       [4,56,25]
    2       [23,55,41,5]
    3       [78,1]
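To reproduce that conceptual view in pandas itself, one option is to collect each group's values into a list. This is a minimal sketch, assuming a recent pandas and with the question's data typed in directly as a DataFrame rather than read from text:

```python
import pandas as pd

# The question's data, entered directly instead of parsed from text.
df = pd.DataFrame({
    "usrid":  [1, 1, 2, 2, 2, 2, 3, 1, 3],
    "jobnos": [4, 56, 23, 55, 41, 5, 78, 25, 1],
})

# Collect each user's job numbers into a Python list, one row per usrid.
# Within a group, rows keep their original order.
jobs_per_user = df.groupby("usrid")["jobnos"].apply(list)
print(jobs_per_user)
```

Each entry of jobs_per_user is the list of job numbers for that usrid, e.g. [23, 55, 41, 5] for usrid 2.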

Now I'm looking for a built-in API that gives me the usrid with the maximum job count. In the example above, usrid 2 has the maximum count.

Update: instead of the single usrid with the maximum job count, I want the 'n' usrids with the largest job counts. In the example above, if n=2, the output should be [2,1]. Can this be done?

Something like df.groupby('usrid').jobnos.sum().idxmax() should do it. Note that this adds up the values in jobnos rather than counting the jobs:

    In [1]: import pandas as pd

    In [2]: from io import StringIO

    In [3]: data = """usrid   jobnos
       ...: 1       4
       ...: 1       56
       ...: 2       23
       ...: 2       55
       ...: 2       41
       ...: 2       5
       ...: 3       78
       ...: 1       25
       ...: 3       1"""

    In [4]: df = pd.read_csv(StringIO(data), sep=r'\s+')

    In [5]: grouped = df.groupby('usrid')

    In [6]: grouped.jobnos.sum()
    Out[6]:
    usrid
    1     85
    2    124
    3     79
    Name: jobnos

    In [7]: grouped.jobnos.sum().idxmax()
    Out[7]: 2
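Because .sum() adds up the job numbers themselves (4 + 56 + 25 = 85 for usrid 1), it is not the same as counting jobs. A side-by-side sketch, assuming a recent pandas, that computes both per user with agg (the names summary and busiest are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "usrid":  [1, 1, 2, 2, 2, 2, 3, 1, 3],
    "jobnos": [4, 56, 23, 55, 41, 5, 78, 25, 1],
})

# "sum" adds up the job numbers; "size" counts the rows (jobs) in each group.
summary = df.groupby("usrid")["jobnos"].agg(["sum", "size"])
print(summary)

# The usrid with the most jobs is the index label of the largest "size".
busiest = summary["size"].idxmax()
print(busiest)
```

With the question's data both columns happen to pick usrid 2 (sum 124, 4 jobs), but on other data the two rankings can differ.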

If you want results based on the number of items in each group (the job count the question asks about), use size():

    In [8]: grouped.size()
    Out[8]:
    usrid
    1    3
    2    4
    3    2

    In [9]: grouped.size().idxmax()
    Out[9]: 2
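Another built-in route to the same count is Series.value_counts, which counts the rows for each usrid without an explicit groupby. A sketch under the same assumptions (recent pandas, the question's data typed in directly):

```python
import pandas as pd

df = pd.DataFrame({
    "usrid":  [1, 1, 2, 2, 2, 2, 3, 1, 3],
    "jobnos": [4, 56, 23, 55, 41, 5, 78, 25, 1],
})

# value_counts() counts how many rows each usrid has, sorted from most to least frequent.
counts = df["usrid"].value_counts()
# idxmax() returns the index label (the usrid) of the largest count.
top_user = counts.idxmax()
print(top_user)  # 2
```

Since value_counts already sorts in descending order, counts.head(n).index.tolist() is one way to get the top n users as well.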

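Update: to get ordered results, sort the Series with .sort_values (older pandas versions called this method .order, which has since been removed):

    In [10]: grouped.jobnos.sum().sort_values(ascending=False)
    Out[10]:
    usrid
    2    124
    1     85
    3     79
    Name: jobnos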
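For the update in the question (the n users with the most jobs), nlargest on the per-group sizes returns them in descending order. A minimal sketch, assuming a recent pandas; the helper name top_n_users is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "usrid":  [1, 1, 2, 2, 2, 2, 3, 1, 3],
    "jobnos": [4, 56, 23, 55, 41, 5, 78, 25, 1],
})


def top_n_users(frame: pd.DataFrame, n: int) -> list:
    """Hypothetical helper: the n usrids with the most jobs, largest count first."""
    # size() counts rows per usrid; nlargest(n) keeps the n biggest counts in descending order.
    counts = frame.groupby("usrid").size()
    return counts.nlargest(n).index.tolist()


print(top_n_users(df, 2))  # [2, 1]

# Ranking by the summed job numbers instead, as in the sorted session above:
by_sum = df.groupby("usrid")["jobnos"].sum().sort_values(ascending=False).head(2).index.tolist()
print(by_sum)  # [2, 1]
```

When counts tie, nlargest keeps the first occurrence by default (keep='first'); pass keep='all' if tied users should all be returned.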
