Edit a string if it is not the first in its index



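I have a pandas DataFrame with information in the following format, and I want to change the leading B in the tag column to I on every row that is not the first character of its word: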

    sentence_num sent_word     tag word_char  word_index
0              0       foo   B-foo         f           1
1              0       foo   B-foo         o           1
2              0       foo   B-foo         o           1
3              0       [ ]    B-ws       [ ]           2
4              0       bar   B-bar         b           3
5              0       bar   B-bar         a           3
6              0       bar   B-bar         r           3
7              1      john  B-name         j           1
8              1      john  B-name         o           1
9              1      john  B-name         h           1
10             1      john  B-name         n           1
11             1       [ ]    B-ws       [ ]           2
12             1       doe   B-sur         d           3
13             1       doe   B-sur         o           3
14             1       doe   B-sur         e           3

Use boolean indexing:

# Is word_char different from the first letter of sent_word?
# ...and is the row not a "[ ]" separator?
m = (df['sent_word'].str[0].ne(df['word_char'])
     & df['sent_word'].ne('[ ]'))
# For those rows, replace the leading B of the tag with I
df.loc[m, 'tag'] = 'I' + df.loc[m, 'tag'].str[1:]
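To try the answer end to end, here is a self-contained sketch that rebuilds the example DataFrame by hand and applies the same mask and replacement. The rows list is typed in from the question's example, assuming every input tag starts with B-.

```python
import pandas as pd

# Sample data typed in from the question (assumption: all input tags start with "B-").
rows = [
    (0, "foo", "B-foo", "f", 1), (0, "foo", "B-foo", "o", 1), (0, "foo", "B-foo", "o", 1),
    (0, "[ ]", "B-ws", "[ ]", 2),
    (0, "bar", "B-bar", "b", 3), (0, "bar", "B-bar", "a", 3), (0, "bar", "B-bar", "r", 3),
    (1, "john", "B-name", "j", 1), (1, "john", "B-name", "o", 1),
    (1, "john", "B-name", "h", 1), (1, "john", "B-name", "n", 1),
    (1, "[ ]", "B-ws", "[ ]", 2),
    (1, "doe", "B-sur", "d", 3), (1, "doe", "B-sur", "o", 3), (1, "doe", "B-sur", "e", 3),
]
df = pd.DataFrame(
    rows, columns=["sentence_num", "sent_word", "tag", "word_char", "word_index"]
)

# True where the character differs from the word's first letter,
# excluding the "[ ]" separator rows (whose first letter "[" never equals "[ ]").
m = df["sent_word"].str[0].ne(df["word_char"]) & df["sent_word"].ne("[ ]")

# Swap the leading "B" for "I" on the masked rows only; loc aligns on the index.
df.loc[m, "tag"] = "I" + df.loc[m, "tag"].str[1:]
print(df)
```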

Output:

    sentence_num sent_word     tag word_char  word_index
0              0       foo   B-foo         f           1
1              0       foo   I-foo         o           1
2              0       foo   I-foo         o           1
3              0       [ ]    B-ws       [ ]           2
4              0       bar   B-bar         b           3
5              0       bar   I-bar         a           3
6              0       bar   I-bar         r           3
7              1      john  B-name         j           1
8              1      john  I-name         o           1
9              1      john  I-name         h           1
10             1      john  I-name         n           1
11             1       [ ]    B-ws       [ ]           2
12             1       doe   B-sur         d           3
13             1       doe   I-sur         o           3
14             1       doe   I-sur         e           3
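The mask above compares each character with the first letter of its word, so a word that repeats its first letter later on would keep B on that later letter. A sketch of a position-based alternative, assuming that (sentence_num, word_index) identifies one word and that each word's rows appear in character order, is to mark every row after the first in each such group with DataFrame.duplicated. The "bob"/"ox" data and the tag_letter/tag_pos columns are hypothetical, chosen only to show the difference.

```python
import pandas as pd

# Hypothetical sample: "bob" repeats its first letter, which trips the letter comparison.
df = pd.DataFrame(
    {
        "sentence_num": [0, 0, 0, 0, 0, 0],
        "sent_word": ["bob", "bob", "bob", "[ ]", "ox", "ox"],
        "tag": ["B-name", "B-name", "B-name", "B-ws", "B-animal", "B-animal"],
        "word_char": ["b", "o", "b", "[ ]", "o", "x"],
        "word_index": [1, 1, 1, 2, 3, 3],
    }
)

# Letter-based mask from the answer: it misses the second "b" of "bob".
letter_mask = df["sent_word"].str[0].ne(df["word_char"]) & df["sent_word"].ne("[ ]")

# Position-based mask: duplicated(keep="first") is True for every row after the
# first one with the same (sentence_num, word_index) pair. A one-row "[ ]" group
# is never duplicated, so no special case is needed for separators.
pos_mask = df.duplicated(["sentence_num", "word_index"], keep="first")

# where() keeps the original tag where the condition holds, else uses the I- version.
i_tags = "I" + df["tag"].str[1:]
df["tag_letter"] = df["tag"].where(~letter_mask, i_tags)
df["tag_pos"] = df["tag"].where(~pos_mask, i_tags)
print(df[["sent_word", "word_char", "tag_letter", "tag_pos"]])
```

The position-based mask does not look at the characters at all, so it also handles words with repeated letters; it does, however, rely on the row order within each word.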
