I have a DataFrame like this:
artid link ner_label
1 url1 "{('blanqui', 'Person'): 6, ('walter benjamin', 'Person'): 2}"
2 url2 "{('john', 'Person'): 8, ('steven', 'Person'): 3}"
Every value in the ner_label column is a string. I want this:
artid link ner label score
1 url1 'blanqui' 'Person' 6
1 url1 'walter benjamin' 'Person' 2
2 url2 'john' 'Person' 8
2 url2 'steven' 'Person' 3
How can I do this? I really have no idea where to start.
Not the most efficient approach, but it will get the job done:
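from ast import literal_eval
import pandas as pd

# Parse each string into a dict, then split it into keys (ner, label) and values (score)
df['ner'] = df['ner_label'].apply(lambda x: list(literal_eval(x).keys()))
df['score'] = df['ner_label'].apply(lambda x: list(literal_eval(x).values()))
# Explode both list columns in parallel so each entity gets its own row
df = df.set_index(['artid', 'link', 'ner_label']).apply(pd.Series.explode).reset_index()
# Unpack the (ner, label) tuples; label must be read before ner is overwritten
df['label'] = [i[1] for i in df['ner']]
df['ner'] = [i[0] for i in df['ner']]
df = df.drop(columns='ner_label')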
Output:
artid link ner score label
0 1 url1 blanqui 6 Person
1 1 url1 walter benjamin 2 Person
2 2 url2 john 8 Person
3 2 url2 steven 3 Person
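As a possible variation on the answer above, the dictionaries can be parsed once and flattened with a single comprehension instead of calling literal_eval twice and exploding. This is only a sketch: the sample frame is rebuilt from the question, and the helper name flatten_ner_labels is made up for illustration. It assumes every ner_label string is a valid Python dict literal whose keys are (ner, label) tuples.

```python
from ast import literal_eval

import pandas as pd

# Sample frame mirroring the one in the question.
df = pd.DataFrame({
    "artid": [1, 2],
    "link": ["url1", "url2"],
    "ner_label": [
        "{('blanqui', 'Person'): 6, ('walter benjamin', 'Person'): 2}",
        "{('john', 'Person'): 8, ('steven', 'Person'): 3}",
    ],
})


def flatten_ner_labels(frame):
    """Hypothetical helper: one output row per (ner, label) key in each dict string."""
    rows = [
        (artid, link, ner, label, score)
        for artid, link, raw in zip(frame["artid"], frame["link"], frame["ner_label"])
        # literal_eval turns the string into a real dict; each key is a (ner, label) tuple
        for (ner, label), score in literal_eval(raw).items()
    ]
    return pd.DataFrame(rows, columns=["artid", "link", "ner", "label", "score"])


result = flatten_ner_labels(df)
print(result)
```

Because the rows are built in plain Python, the result gets a fresh 0..n-1 index and the column order asked for in the question, with score kept as an integer.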
Here is a pandas-only solution:
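# Split before each "(" so every entity gets its own row; the paren is escaped because the pattern is a regex
df = df.assign(ner_label=df['ner_label'].str.split(r', \(')).explode('ner_label')
# Strip parentheses, braces and double quotes in one pass
df['ner_label'] = df['ner_label'].str.replace(r'[(){}"]', '', regex=True)
df[['ner', 'score']] = df['ner_label'].str.split(':', expand=True)
df[['ner', 'label']] = df['ner'].str.split(',', expand=True)
# The splits leave leading spaces; clean them up and make score numeric
df['label'] = df['label'].str.strip()
df['score'] = df['score'].astype(int)
df = df.drop(columns='ner_label')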
Output:
artid link ner score label
0 1 url1 'blanqui' 6 'Person'
0 1 url1 'walter benjamin' 2 'Person'
1 2 url2 'john' 8 'Person'
1 2 url2 'steven' 3 'Person'
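Another pandas-only option, sketched here as an alternative to the chained split/replace calls, is to pull all three fields out with one regular expression via Series.str.extractall. The pattern is an assumption about the data: it expects names and labels wrapped in single quotes with no embedded quotes, and non-negative integer scores. The sample frame is rebuilt from the question.

```python
import pandas as pd

# Sample frame mirroring the one in the question.
df = pd.DataFrame({
    "artid": [1, 2],
    "link": ["url1", "url2"],
    "ner_label": [
        "{('blanqui', 'Person'): 6, ('walter benjamin', 'Person'): 2}",
        "{('john', 'Person'): 8, ('steven', 'Person'): 3}",
    ],
})

# One named group per output column; each match is one ('name', 'label'): score entry.
pattern = r"\('(?P<ner>[^']*)',\s*'(?P<label>[^']*)'\):\s*(?P<score>\d+)"

# extractall returns one row per match, indexed by (original row, match number).
parts = df["ner_label"].str.extractall(pattern)
parts["score"] = parts["score"].astype(int)

result = (
    df.drop(columns="ner_label")
      .join(parts.droplevel("match"))  # align the matches back to their source rows
      .reset_index(drop=True)
)
print(result)
```

This version drops the surrounding quotes from ner and label, and any entry that does not match the pattern is silently skipped, so it is worth checking the row count against expectations on real data.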