Rename/edit a string if it is not the first in its index

I have a pandas DataFrame containing information in the following format:

    sentence_num sent_word     tag word_char  word_index
0              0       foo   B-foo         f           1
1              0       foo   B-foo         o           1
2              0       foo   B-foo         o           1
3              0       [ ]    B-ws       [ ]           2
4              0       bar   B-bar         b           3
5              0       bar   B-bar         a           3
6              0       bar   B-bar         r           3
7              1      john  B-name         j           1
8              1      john  B-name         o           1
9              1      john  B-name         h           1
10             1      john  B-name         n           1
11             1       [ ]    B-ws       [ ]           2
12             1       doe   B-sur         d           3
13             1       doe   B-sur         o           3
14             1       doe   B-sur         e           3

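Use boolean indexing to select every row whose word_char is not the first letter of its word, and change the leading B of its tag to I: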

# is word_char not the first letter of sent_word?
# and sent_word is not "[ ]"
m = (df['sent_word'].str[0].ne(df['word_char'])
     & df['sent_word'].ne('[ ]'))
# for those rows, change the B into I
df.loc[m, 'tag'] = 'I' + df.loc[m, 'tag'].str[1:]
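To try the snippet on its own, here is a self-contained sketch with the sample frame rebuilt inline. The rows are an assumption: they are reconstructed from the output table below with every tag reverted to its B- form. The mask and the assignment are the same as in the snippet; after running it, the tag column should match the output.

```python
import pandas as pd

# Assumed input: rebuilt from the output table, every tag still B-prefixed.
rows = [
    (0, "foo", "B-foo", "f", 1),
    (0, "foo", "B-foo", "o", 1),
    (0, "foo", "B-foo", "o", 1),
    (0, "[ ]", "B-ws", "[ ]", 2),
    (0, "bar", "B-bar", "b", 3),
    (0, "bar", "B-bar", "a", 3),
    (0, "bar", "B-bar", "r", 3),
    (1, "john", "B-name", "j", 1),
    (1, "john", "B-name", "o", 1),
    (1, "john", "B-name", "h", 1),
    (1, "john", "B-name", "n", 1),
    (1, "[ ]", "B-ws", "[ ]", 2),
    (1, "doe", "B-sur", "d", 3),
    (1, "doe", "B-sur", "o", 3),
    (1, "doe", "B-sur", "e", 3),
]
df = pd.DataFrame(
    rows,
    columns=["sentence_num", "sent_word", "tag", "word_char", "word_index"],
)

# Same mask as above: word_char differs from the word's first letter,
# and the row is not the "[ ]" separator.
m = (df["sent_word"].str[0].ne(df["word_char"])
     & df["sent_word"].ne("[ ]"))

# Replace the leading "B" with "I" on the selected rows only.
df.loc[m, "tag"] = "I" + df.loc[m, "tag"].str[1:]
print(df)
```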

Output:

    sentence_num sent_word     tag word_char  word_index
0              0       foo   B-foo         f           1
1              0       foo   I-foo         o           1
2              0       foo   I-foo         o           1
3              0       [ ]    B-ws       [ ]           2
4              0       bar   B-bar         b           3
5              0       bar   I-bar         a           3
6              0       bar   I-bar         r           3
7              1      john  B-name         j           1
8              1      john  I-name         o           1
9              1      john  I-name         h           1
10             1      john  I-name         n           1
11             1       [ ]    B-ws       [ ]           2
12             1       doe   B-sur         d           3
13             1       doe   I-sur         o           3
14             1       doe   I-sur         e           3
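One caveat with comparing letters: if a word's first letter appears again later in the word (the word "anna" below is a hypothetical example), that later letter keeps its B- tag. A possible alternative, not the approach above, is to go by position instead of by letter. This assumes that each word is identified by (sentence_num, word_index) and that its rows are in letter order. It then marks every row after the first one in its group using `groupby(...).cumcount()`. On the sample data this selects the same rows as the letter-based mask. It also needs no special case for "[ ]", because each separator forms a one-row group.

```python
import pandas as pd

# Hypothetical toy frame: "anna" starts and ends with the same letter.
df = pd.DataFrame({
    "sentence_num": [0, 0, 0, 0],
    "sent_word": ["anna"] * 4,
    "tag": ["B-name"] * 4,
    "word_char": list("anna"),
    "word_index": [1, 1, 1, 1],
})

# Letter-based mask: the final "a" equals the first letter,
# so that row is not selected and would stay B-name.
m_letter = (df["sent_word"].str[0].ne(df["word_char"])
            & df["sent_word"].ne("[ ]"))

# Position-based mask: cumcount() numbers the rows of each word 0, 1, 2, ...
# so every row except the first one in its (sentence_num, word_index) group is selected.
m_pos = df.groupby(["sentence_num", "word_index"]).cumcount().gt(0)

fixed = df.copy()
fixed.loc[m_pos, "tag"] = "I" + fixed.loc[m_pos, "tag"].str[1:]
print(fixed["tag"].tolist())
```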
