I need to add specific rows at specific positions in a pandas DataFrame



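I'm currently working on a project where I need to add specific rows wherever a tagged sentence ends. Whenever column "N" equals 1, a new sentence begins. For each sentence I want to add two rows: one at the start of the sentence with "Pos"=START, and one at the end of the sentence with "Pos"=END. This is what the DataFrame looks like: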

import pandas as pd

POSTAG = {
'N': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,10,11,1,2,3,4,5,6,7,8,9],
'Name': ['ἐρᾷ','μὲν','ἁγνὸς','οὐρανὸς','τρῶσαι','χθόνα',',','ἔρως','δὲ','γαῖαν','λαμβάνει','γάμου','τυχεῖν','.','ὄμβρος','δ̓','ἀπ̓','εὐνάοντος','οὐρανοῦ','πεσὼν','ἔκυσε','γαῖαν','.','ἡ','δὲ','τίκτεται','βροτοῖς','μήλων','τε','βοσκὰς','καὶ','βίον','Δημήτριον','.','δενδρῶτις','ὥρα','δ̓','ἐκ','νοτίζοντος','γάμου','τέλειος','ἐστί','.'],
'Pos': ['VERB','ADV','ADJ','NOUN','VERB','NOUN','PUNCT','NOUN','CCONJ','NOUN','VERB','NOUN','VERB','PUNCT','NOUN','ADV','ADP','ADJ','NOUN','VERB','VERB','NOUN','PUNCT','DET','ADV','VERB','NOUN','NOUN','ADV','NOUN','CCONJ','NOUN','ADJ','PUNCT','NOUN','NOUN','ADV','ADP','VERB','NOUN','ADJ','VERB','PUNCT']
}
df = pd.DataFrame(POSTAG, columns=['N', 'Name', 'Pos'])
print(df)

In this case, I need a [NaN, NaN, START] row at index 0 and at index 15, and a [NaN, NaN, END] row at index 14. I need to do this for my whole df. How can I do that?

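Looking at your DataFrame, I'm assuming you want to insert START before each value of 1 in column N, and END after the last (largest) value of each run of consecutive values in column N. If so, you can proceed as follows.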
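Before the step-by-step version, note that both insertion points from that assumption can be located directly. START goes before every row where N == 1. END goes after the row just before the next N == 1, or after the last row. This is a minimal sketch that uses only the N column from the question:

```python
import pandas as pd

# Same values as the N column in the question's POSTAG dict
n = pd.Series(list(range(1, 15)) + list(range(1, 10)) + list(range(1, 12)) + list(range(1, 10)))

is_start = n.eq(1)                             # this row opens a sentence
is_end = is_start.shift(-1, fill_value=True)   # next row opens one, or this is the last row

start_rows = n[is_start].index.tolist()
end_rows = n[is_end].index.tolist()
print(start_rows)  # [0, 14, 23, 34]
print(end_rows)    # [13, 22, 33, 42]
```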

First, create two dummy DataFrames, start_df and end_df:

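import numpy as np
import pandas as pd

# One-row frames used as sentence markers; N and Name stay empty (NaN)
start_df = pd.DataFrame({'N': [np.nan], 'Name': [np.nan], 'Pos': ['->START']})
end_df = pd.DataFrame({'N': [np.nan], 'Name': [np.nan], 'Pos': ['END<-']})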
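The N values in start_df and end_df are NaN. Concatenating them with the integer N column therefore upcasts that column to float64, which is why the final result shows 1.0, 2.0, and so on. This is a minimal sketch, using only the first two word rows of the question's data, that checks the upcast and optionally converts the column to pandas' nullable Int64 dtype:

```python
import numpy as np
import pandas as pd

# Same marker row as start_df in the answer
start_df = pd.DataFrame({'N': [np.nan], 'Name': [np.nan], 'Pos': ['->START']})
# First two word rows from the question's data; N is int64 here
part = pd.DataFrame({'N': [1, 2], 'Name': ['ἐρᾷ', 'μὲν'], 'Pos': ['VERB', 'ADV']})

combined = pd.concat([start_df, part]).reset_index(drop=True)
concat_dtype = combined['N'].dtype
print(concat_dtype)  # float64, because int64 cannot hold NaN

# Optional: the nullable Int64 dtype keeps whole numbers and stores the gap as <NA>
combined['N'] = combined['N'].astype('Int64')
print(combined)
```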

Then split the DataFrame into groups of consecutive values in column N:

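# True wherever N does not continue the previous row's count (a new sentence starts)
mask = ~df['N'].diff().fillna(0).eq(1)
# cumsum turns those flags into one group number per sentence
gb = df.groupby(mask.cumsum())
groups = [gb.get_group(x) for x in gb.groups]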
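To see what the mask and its cumulative sum produce, here is a minimal sketch on the question's N values alone. It shows where each group begins and how many rows each group contains:

```python
import pandas as pd

# Same values as the N column in the question's POSTAG dict
n = pd.Series(list(range(1, 15)) + list(range(1, 10)) + list(range(1, 12)) + list(range(1, 10)))

# diff() is 1 inside a sentence and negative where N drops back to 1;
# the first row has no predecessor, so fillna(0) makes it a group start too
mask = ~n.diff().fillna(0).eq(1)
group_id = mask.cumsum()                 # 1, 1, ..., 2, 2, ...: one number per group

starts = n[mask].index.tolist()
sizes = n.groupby(group_id).size().tolist()
print(starts)  # [0, 14, 23, 34]
print(sizes)   # [14, 9, 11, 9]
```

This differs slightly from the rule stated in the question. The diff-based mask starts a new group at any break in the numbering (for example, 3 followed by 5), whereas testing N == 1 splits only where a sentence restarts. For well-formed data like the sample, the two agree.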

Next, insert the dummy DataFrames before and after each group:

res = []
for group in groups:
    res.append(start_df)  # marker row before the sentence
    res.append(group)
    res.append(end_df)    # marker row after the sentence

Finally, concatenate the list of DataFrames into a single DataFrame:

df_ = pd.concat(res).reset_index(drop=True)
# print(df_)
       N        Name      Pos
0    NaN         NaN  ->START
1    1.0         ἐρᾷ     VERB
2    2.0         μὲν      ADV
3    3.0       ἁγνὸς      ADJ
4    4.0     οὐρανὸς     NOUN
5    5.0      τρῶσαι     VERB
6    6.0       χθόνα     NOUN
7    7.0           ,    PUNCT
8    8.0        ἔρως     NOUN
9    9.0          δὲ    CCONJ
10  10.0       γαῖαν     NOUN
11  11.0    λαμβάνει     VERB
12  12.0       γάμου     NOUN
13  13.0      τυχεῖν     VERB
14  14.0           .    PUNCT
15   NaN         NaN    END<-
16   NaN         NaN  ->START
17   1.0      ὄμβρος     NOUN
18   2.0          δ̓      ADV
19   3.0         ἀπ̓      ADP
20   4.0   εὐνάοντος      ADJ
21   5.0     οὐρανοῦ     NOUN
22   6.0       πεσὼν     VERB
23   7.0       ἔκυσε     VERB
24   8.0       γαῖαν     NOUN
25   9.0           .    PUNCT
26   NaN         NaN    END<-
27   NaN         NaN  ->START
28   1.0           ἡ      DET
29   2.0          δὲ      ADV
30   3.0    τίκτεται     VERB
31   4.0     βροτοῖς     NOUN
32   5.0       μήλων     NOUN
33   6.0          τε      ADV
34   7.0      βοσκὰς     NOUN
35   8.0         καὶ    CCONJ
36   9.0        βίον     NOUN
37  10.0   Δημήτριον      ADJ
38  11.0           .    PUNCT
39   NaN         NaN    END<-
40   NaN         NaN  ->START
41   1.0   δενδρῶτις     NOUN
42   2.0         ὥρα     NOUN
43   3.0          δ̓      ADV
44   4.0          ἐκ      ADP
45   5.0  νοτίζοντος     VERB
46   6.0       γάμου     NOUN
47   7.0     τέλειος      ADJ
48   8.0        ἐστί     VERB
49   9.0           .    PUNCT
50   NaN         NaN    END<-
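If you would rather avoid the loop, a different technique also works: build all the marker rows at once and sort everything on helper keys. This is not the method used above. Below is a minimal sketch, assuming the question's rule that a row with N == 1 starts a sentence. The helper name `add_sentence_markers` and the `_sid`/`_order`/`_pos` columns are hypothetical, introduced only for illustration:

```python
import numpy as np
import pandas as pd

POSTAG = {
'N': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,10,11,1,2,3,4,5,6,7,8,9],
'Name': ['ἐρᾷ','μὲν','ἁγνὸς','οὐρανὸς','τρῶσαι','χθόνα',',','ἔρως','δὲ','γαῖαν','λαμβάνει','γάμου','τυχεῖν','.','ὄμβρος','δ̓','ἀπ̓','εὐνάοντος','οὐρανοῦ','πεσὼν','ἔκυσε','γαῖαν','.','ἡ','δὲ','τίκτεται','βροτοῖς','μήλων','τε','βοσκὰς','καὶ','βίον','Δημήτριον','.','δενδρῶτις','ὥρα','δ̓','ἐκ','νοτίζοντος','γάμου','τέλειος','ἐστί','.'],
'Pos': ['VERB','ADV','ADJ','NOUN','VERB','NOUN','PUNCT','NOUN','CCONJ','NOUN','VERB','NOUN','VERB','PUNCT','NOUN','ADV','ADP','ADJ','NOUN','VERB','VERB','NOUN','PUNCT','DET','ADV','VERB','NOUN','NOUN','ADV','NOUN','CCONJ','NOUN','ADJ','PUNCT','NOUN','NOUN','ADV','ADP','VERB','NOUN','ADJ','VERB','PUNCT']
}
df = pd.DataFrame(POSTAG, columns=['N', 'Name', 'Pos'])


def add_sentence_markers(df, start='->START', end='END<-'):
    """Hypothetical helper: wrap each sentence (a row with N == 1 starts one) in marker rows."""
    sid = df['N'].eq(1).cumsum()   # sentence number for every word row
    ids = sid.unique()             # one entry per sentence, in order of appearance
    # Word rows get _order=1 so they sort between START (0) and END (2);
    # _pos preserves the original word order inside a sentence
    body = df.assign(_sid=sid, _order=1, _pos=np.arange(len(df)))
    starts = pd.DataFrame({'N': np.nan, 'Name': np.nan, 'Pos': start,
                           '_sid': ids, '_order': 0, '_pos': 0})
    ends = pd.DataFrame({'N': np.nan, 'Name': np.nan, 'Pos': end,
                         '_sid': ids, '_order': 2, '_pos': 0})
    out = pd.concat([starts, body, ends])
    # Sentence first, then START / words / END, then original word order
    out = out.sort_values(['_sid', '_order', '_pos'])
    # Keep only the original columns, in their original order
    return out[list(df.columns)].reset_index(drop=True)


result = add_sentence_markers(df)
print(result)
```

The sort keys (_sid, _order, _pos) are unique for every row, so the result does not depend on the sort being stable. For this sample, the output should have the same layout as the loop-based result above.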
