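In my DataFrame I have a column whose values are lists, such as [cell, protein, expression]. I want to convert each one into a plain string of words, such as cell, protein, expression, and this should apply to the entire column of the DataFrame. Please suggest possible approaches.
Try this: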
data['column_name'] = data['column_name'].apply(lambda x: ', '.join(x))
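As a minimal sketch of how the line above behaves, the example below builds a small hypothetical DataFrame whose cells are real Python lists. The sample values and the use of map(str, ...) are assumptions added for illustration; the mapping only guards against non-string items, which str.join would otherwise reject.

```python
import pandas as pd

# Hypothetical sample data: every cell holds an actual Python list.
data = pd.DataFrame(
    {"column_name": [["cell", "protein", "expression"], ["study", 1, "cell"]]}
)

# Join each list into one comma-separated string.
# map(str, x) converts non-string items (such as the 1 above) before joining.
data["column_name"] = data["column_name"].apply(lambda x: ", ".join(map(str, x)))

print(data["column_name"].tolist())
```

If the column actually holds strings that merely look like lists, this join will iterate over single characters instead, which is the case the next answer addresses.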
The problem is that df['Final_Text'] is not a list but a string. Try converting it with ast.literal_eval first:
import ast
from io import StringIO
import pandas as pd
# your sample df
s = """
,Final_Text
0,"['study', 'response', 'cell']"
1,"['cell', 'protein', 'effect']"
2,"['cell', 'patient', 'expression']"
3,"['patient', 'cell', 'study']"
4,"['study', 'cell', 'activity']"
"""
df = pd.read_csv(StringIO(s))
# convert the string representation of each list into an actual list
df['Final_Text'] = df['Final_Text'].apply(ast.literal_eval)
# join the items of each list into a single comma-separated string
df['Final_Text'] = df['Final_Text'].apply(lambda x: ', '.join(x))
print(df)
   Unnamed: 0                 Final_Text
0           0      study, response, cell
1           1      cell, protein, effect
2           2  cell, patient, expression
3           3       patient, cell, study
4           4      study, cell, activity
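If it is unclear whether the column holds real lists or their string form, the two answers can be combined. The sketch below is one possible way to do that, not a definitive implementation: to_word_string is a hypothetical helper name, and the sample rows are made up for illustration.

```python
import ast

import pandas as pd


def to_word_string(value):
    """Hypothetical helper: return 'a, b, c' for a list or for its string form."""
    parsed = value
    if isinstance(value, str):
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return value  # not a Python literal; leave the text unchanged
    if isinstance(parsed, (list, tuple)):
        return ", ".join(map(str, parsed))
    return value  # anything else (e.g. NaN) passes through untouched


# Hypothetical mixed column: a stringified list, a real list, and plain text.
df = pd.DataFrame(
    {"Final_Text": ["['study', 'response', 'cell']", ["cell", "protein"], "plain text"]}
)
df["Final_Text"] = df["Final_Text"].apply(to_word_string)

print(df["Final_Text"].tolist())
```

Values that are neither lists nor list-like strings are returned as they were, so the helper can be applied to the whole column without first checking its type.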