Pandas: aggregate a column to create a sequence without consecutive duplicates

I have the following DataFrame, my_df:

 name    timestamp     color
 ---------------------------
 John    2017-01-01    blue
 John    2017-01-02    blue
 John    2017-01-03    blue
 John    2017-01-04    yellow
 John    2017-01-05    red
 John    2017-01-06    red
 Ann     2017-01-04    green
 Ann     2017-01-05    orange
 Ann     2017-01-06    orange
 Ann     2017-01-07    red
 Ann     2017-01-08    black
 Dan     2017-01-11    blue
 Dan     2017-01-12    blue
 Dan     2017-01-13    green
 Dan     2017-01-14    yellow

I then use the following code to get each person's list of colors:

new_df = (my_df.groupby('name', as_index=False)
               .agg(color_list=('color', list)))

new_df then looks like this:

    name        color_list
    -----------------------------------------------
    John        blue, blue, blue, yellow, red, red
    Ann         green, orange, orange, red, black
    Dan         blue, blue, green, yellow

However, how should I modify the code above if I want a color_seq (the colors with consecutive duplicates removed) instead of color_list? Thanks!

    name        color_seq
    -----------------------------------------------
    John        blue, yellow, red
    Ann         green, orange, red, black
    Dan         blue, green, yellow

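If non-consecutive duplicates are allowed (that is, a color may reappear later in the same person's sequence), you have to filter carefully and drop only adjacent repeats. One approach: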

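def drop_consecutive(colors):
    # Pad with None so the last color always has a "next" value to compare against
    padded = colors + [None]
    # Keep a color only if it differs from the one right after it
    return ','.join(x for i, x in enumerate(colors) if x != padded[i + 1])

out = my_df.groupby('name')['color'].apply(list).apply(drop_consecutive)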

name
Ann     green,orange,red,black
Dan          blue,green,yellow
John           blue,yellow,red
Name: color, dtype: object
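
As an alternative sketch of the same idea, the standard library's itertools.groupby can collapse runs of adjacent equal values. The helper name collapse_runs and the trimmed-down sample frame (the timestamp column is omitted) are assumptions made for illustration, and the rows are assumed to already be in timestamp order within each name:

```python
from itertools import groupby

import pandas as pd

# Sample data mirroring my_df from the question (timestamp column left out;
# rows are assumed to be in chronological order for each person).
my_df = pd.DataFrame({
    "name": ["John"] * 6 + ["Ann"] * 5 + ["Dan"] * 4,
    "color": ["blue", "blue", "blue", "yellow", "red", "red",
              "green", "orange", "orange", "red", "black",
              "blue", "blue", "green", "yellow"],
})

def collapse_runs(colors):
    # itertools.groupby only groups *adjacent* equal items, so keeping one key
    # per group removes consecutive repeats but preserves later reappearances.
    return [key for key, _ in groupby(colors)]

# sort=False keeps the names in their order of first appearance.
color_seq = my_df.groupby("name", sort=False)["color"].apply(collapse_runs)
print(color_seq)
```

This version returns a list per person instead of a comma-joined string; wrapping the return value in ', '.join(...) would give the string form. Because only adjacent repeats are collapsed, an input such as ['blue', 'red', 'blue'] is left unchanged.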
