How to apply "first" and "last" functions to columns while using group by in pandas?
I have a data frame and I am grouping it by a particular column (or, in other words, by the values of a particular column). I can do it in the following way:
    grouped = df.groupby(['columnname'])
I imagine the result of this operation as a table in which some cells can contain sets of values instead of single values. To get a usual table (i.e. a table in which every cell contains one single value), I need to indicate which function I want to use to transform the sets of values in the cells into single values.
For example, I can replace the sets of values by their sum, or by their minimal or maximal value. I can do it in the following way: grouped.sum() or grouped.min(), and so on.
Now I want to use different functions for different columns. I figured out that I can do it in the following way:
    grouped.agg({'columnname1': sum, 'columnname2': min})
However, for some reason I cannot use first. In more detail, grouped.first() works, but grouped.agg({'columnname1': first, 'columnname2': first}) does not work. As a result I get a NameError:
    NameError: name 'first' is not defined
So, my question is: why does this happen and how can I resolve this problem?
ADDED
Here I found the following example:
    grouped['d'].agg({'result1' : np.sum, 'result2' : np.mean})
Maybe I also need to use np? But in my case Python does not recognize "np". Should I import it?
I think the issue is that there are two different first methods which share a name but act differently: one is for groupby objects and the other is for a Series/DataFrame (to do with timeseries).
To replicate the behaviour of the groupby first method over a DataFrame using agg, you could use iloc[0] (which gets the first row in each group (DataFrame/Series) by index):
    grouped.agg(lambda x: x.iloc[0])
For example:
    In [1]: df = pd.DataFrame([[1, 2], [3, 4]])

    In [2]: g = df.groupby(0)

    In [3]: g.first()
    Out[3]:
       1
    0
    1  2
    3  4

    In [4]: g.agg(lambda x: x.iloc[0])
    Out[4]:
       1
    0
    1  2
    3  4
Analogously you can replicate last using iloc[-1].
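As a small self-contained sketch of the two lambdas above, assuming a toy frame with hypothetical column names key and val (they are not from the question):

```python
import pandas as pd

# Toy data; "key" and "val" are made-up names used only for illustration.
df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [1, 2, 3, 4]})
g = df.groupby("key")

# First and last row of each group, taken by position.
first_rows = g.agg(lambda x: x.iloc[0])
last_rows = g.agg(lambda x: x.iloc[-1])

print(first_rows)
print(last_rows)
```

Unlike the built-in first and last, these lambdas take whatever sits in the first or last position, including NaN.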
Note: this will work column-wise, et al:
    g.agg({1: lambda x: x.iloc[0]})
In older versions of pandas you could use the irow method (e.g. x.irow(0)), see previous edits.
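The column-wise form also lets you mix the two, which is roughly what the question asks for. A minimal sketch, assuming hypothetical columns x and y:

```python
import pandas as pd

# Hypothetical columns "x" and "y"; only the pattern matters here.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b"],
    "x": [1, 2, 3, 4],
    "y": [10, 20, 30, 40],
})

# First value of x and last value of y within each group.
result = df.groupby("key").agg({
    "x": lambda s: s.iloc[0],
    "y": lambda s: s.iloc[-1],
})

print(result)
```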
A couple of updated notes:
This is better done using the nth groupby method, which is much faster in pandas >= 0.13:
    g.nth(0)   # first
    g.nth(-1)  # last
You have to take care a little, as the default behaviour of first and last ignores NaN rows... and IIRC for DataFrame groupbys it was broken pre-0.13... there's a dropna option for nth.
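To make the NaN caveat concrete, here is a rough sketch on made-up data (column names key and val are assumptions). The shape and index of what nth returns has changed between pandas versions, so the sketch only looks at the values:

```python
import numpy as np
import pandas as pd

# Made-up data: the first row of group "a" holds a NaN.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b"],
    "val": [np.nan, 2.0, 3.0, 4.0],
})
g = df.groupby("key")

# first() skips NaN and returns the first non-null value per column.
first_non_null = g.first()

# nth(0) takes the first row of each group as it is, NaN included.
first_row = g.nth(0)

# With dropna="any", rows containing NaN are dropped before picking the row.
first_complete_row = g.nth(0, dropna="any")

print(first_non_null["val"].tolist())
print(first_row["val"].tolist())
print(first_complete_row["val"].tolist())
```

So if you want "the first row, whatever it contains", nth(0) or the iloc[0] lambda is the closer match; if you want "the first non-null value", use first.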
You can use strings rather than built-ins (though IIRC pandas spots that it's the sum builtin and applies np.sum):
grouped['d'].agg({'result1' : "sum", 'result2' : "mean"})
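The same string trick should cover the original question, since "first" and "last" are accepted as function names by agg. A short sketch, assuming hypothetical columns x and d. As far as I know, recent pandas versions no longer accept the renaming dict on a single selected column as written above, and keyword arguments (named aggregation, pandas >= 0.25) are the replacement:

```python
import pandas as pd

# Hypothetical frame; "x" and "d" are illustrative column names.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b"],
    "x": [1, 2, 3, 4],
    "d": [1.0, 2.0, 3.0, 4.0],
})
grouped = df.groupby("key")

# String names avoid the NameError: there is no bare `first` in scope,
# but agg resolves "first" and "last" to the groupby methods.
first_last = grouped.agg({"x": "first", "d": "last"})

# Named aggregation on one column: output names are the keywords.
stats = grouped["d"].agg(result1="sum", result2="mean")

print(first_last)
print(stats)
```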