How to unpack a column of string lists, together with the other columns, back into rows



I first need to group by a column, remove the unwanted values, and then unpack (explode) the lists back into rows.

My dataset looks like this:

Text             tag
drink coke       mic
eat pizza        mic
eat fruits       yes
eat banana       yes
eat banana       mic
eat fruits       mic
eat pizza        no
eat pizza        mic
eat pizza        yes
drink coke       yes
drink coke       no
drink coke       no
drink coke       yes

I use this to do the grouping:

df = pd.DataFrame(df.groupby(['Text'])['tag'].apply(lambda x: list(x.values)))
Text           labels               
eat pizza      [mic,no,mic,yes]    
eat fruits     [yes,mic]           
eat banana     [yes,mic]           
drink coke     [mic,yes,no,no,yes] 

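If a group's labels list contains both a "no" and a "yes", I need to remove all "yes" and "no" values from that list and then unpack it back into rows.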

The output should look like this:

Text             tag
drink coke       mic
eat pizza        mic
eat fruits       yes
eat banana       yes
eat banana       mic
eat fruits       mic
eat pizza        mic

Implementation:

# For each row: does its group contain both 'yes' and 'no'?
contains_both = (df.groupby('Text')['tag']
                   .transform(lambda x: all(i in x.values for i in ('yes', 'no'))))
# Keep every row of a group that does not contain both values;
# in groups that do, drop only the 'yes' and 'no' rows.
df = df[~contains_both | ~df['tag'].isin(['yes', 'no'])]
print(df)
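The same mask can also be built without a Python-level lambda, by computing a "has yes" flag and a "has no" flag per group with `transform("any")` and combining them. This is a swapped-in variant, not the answer's original code; it is a minimal sketch that rebuilds the question's sample data so it runs on its own.

```python
import pandas as pd

# Sample data reproduced from the question
df = pd.DataFrame({
    "Text": ["drink coke", "eat pizza", "eat fruits", "eat banana", "eat banana",
             "eat fruits", "eat pizza", "eat pizza", "eat pizza", "drink coke",
             "drink coke", "drink coke", "drink coke"],
    "tag": ["mic", "mic", "yes", "yes", "mic", "mic", "no", "mic", "yes",
            "yes", "no", "no", "yes"],
})

# Row-level flags, broadcast to every row of the group by transform("any")
has_yes = df["tag"].eq("yes").groupby(df["Text"]).transform("any")
has_no = df["tag"].eq("no").groupby(df["Text"]).transform("any")
contains_both = has_yes & has_no

# Keep a row unless its group has both labels and the row itself is 'yes'/'no'
result = df[~contains_both | ~df["tag"].isin(["yes", "no"])]
print(result)
```

Like the answer's version, this keeps the original row order and index (0, 1, 2, 3, 4, 5, 7).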

Output:

         Text  tag
0  drink coke  mic
1   eat pizza  mic
2  eat fruits  yes
3  eat banana  yes
4  eat banana  mic
5  eat fruits  mic
7   eat pizza  mic

FYI, your grouping step can be shortened to:

df = df.groupby('Text', as_index=False)['tag'].agg(list)
# Output:
         Text                      tag
0  drink coke  [mic, yes, no, no, yes]
1  eat banana               [yes, mic]
2  eat fruits               [yes, mic]
3   eat pizza      [mic, no, mic, yes]
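Starting from that list column, the remove-then-unpack route described in the question can be finished with `DataFrame.explode`. This is a minimal sketch assuming pandas 0.25 or later (where `explode` exists); `drop_yes_no_if_both` is a hypothetical helper name. Unlike the boolean-mask answer, rows come out sorted by `Text` (the `groupby` default) rather than in the original order.

```python
import pandas as pd

# Sample data reproduced from the question
df = pd.DataFrame({
    "Text": ["drink coke", "eat pizza", "eat fruits", "eat banana", "eat banana",
             "eat fruits", "eat pizza", "eat pizza", "eat pizza", "drink coke",
             "drink coke", "drink coke", "drink coke"],
    "tag": ["mic", "mic", "yes", "yes", "mic", "mic", "no", "mic", "yes",
            "yes", "no", "no", "yes"],
})

# Step 1: collapse each group into a list of tags
grouped = df.groupby("Text", as_index=False)["tag"].agg(list)


def drop_yes_no_if_both(tags):
    """Hypothetical helper: drop all 'yes'/'no' only when both occur in the list."""
    if "yes" in tags and "no" in tags:
        return [t for t in tags if t not in ("yes", "no")]
    return tags


# Step 2: filter each group's list
grouped["tag"] = grouped["tag"].apply(drop_yes_no_if_both)

# Step 3: unpack each list into one row per element; an empty list would
# explode to a NaN row, so drop those, then renumber the index
result = (grouped.explode("tag")
                 .dropna(subset=["tag"])
                 .reset_index(drop=True))
print(result)
```

The `dropna` step only matters for a group whose tags are all "yes"/"no"; with the sample data no list ends up empty.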
