Creating a calculated column with Pandas using window and hierarchy logic



Given the following example, how can I create the calculated column "parent_node"?

import pandas as pd

# create dataframe with just the node column
df = pd.DataFrame({
    "node": [
        "N07 S40 G S06 S29 G N13", "N07", "N07 S28", "N07 S28 G N06 S16",
        "N08 N05", "N07 S28 G N05", "N08 N05 G N27", "N07 S28 G N05 N03",
        "N07 S28 G N05 N03 G S31", "N07 S28 G N06 S16 G S32"
    ]
})

# create column called count_of_spaces_in_node
def countSpaces(cell):
    try:
        return cell.count(" ")
    except AttributeError:  # non-string cells (e.g. NaN) count as 0
        return 0

df["count_of_spaces_in_node"] = df["node"].apply(countSpaces)
# sort by count_of_spaces_in_node, then by node
df = df.sort_values(by=["count_of_spaces_in_node", "node"])
# reset index
df = df.reset_index(drop=True)
# create column called length_of_node
df["length_of_node"] = df["node"].str.len()
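As an aside, the per-cell countSpaces helper can probably be replaced by pandas' vectorized Series.str.count, which counts occurrences of a regex pattern and returns NaN for missing values. A minimal sketch on a few of the nodes plus one missing value (the None row is only there to mirror the try/except fallback):

```python
import pandas as pd

df = pd.DataFrame({"node": ["N07", "N07 S28", "N08 N05 G N27", None]})

# str.count counts regex matches of the pattern (a single space here);
# missing values come back as NaN, so fill them with 0 like countSpaces does
df["count_of_spaces_in_node"] = df["node"].str.count(" ").fillna(0).astype(int)
# str.len works the same way as in the question
df["length_of_node"] = df["node"].str.len()
print(df)
```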

The resulting df looks like this:

node                     count_of_spaces_in_node  length_of_node
N07                      0                        3
N07 S28                  1                        7
N08 N05                  1                        7
N07 S28 G N05            3                        13
N08 N05 G N27            3                        13
N07 S28 G N05 N03        4                        17
N07 S28 G N06 S16        4                        17
N07 S28 G N05 N03 G S31  6                        23
N07 S28 G N06 S16 G S32  6                        23
N07 S40 G S06 S29 G N13  6                        23

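Assuming that N08 N05 G N27 should have N08 N05 as its parent node (i.e. a node's parent is its longest token prefix that also exists in the node column), I put together the following snippet.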
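To make that assumption concrete: the candidate parents of a node are its proper prefixes cut at space boundaries, tried from longest to shortest, and the first one that is itself a node wins. A small illustration of the candidates for N08 N05 G N27 (plain Python, no pandas needed):

```python
node = "N08 N05 G N27"
tokens = node.split()

# Proper prefixes, longest first (drop one token, then two, ...);
# these are the parent candidates checked in order
candidates = [" ".join(tokens[:end]) for end in range(len(tokens) - 1, 0, -1)]
print(candidates)  # ['N08 N05 G', 'N08 N05', 'N08']
```

Here N08 N05 G is not a node, so N08 N05 is the first match.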

Try the snippet below:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "node": [
        "N07 S40 G S06 S29 G N13", "N07", "N07 S28", "N07 S28 G N06 S16",
        "N08 N05", "N07 S28 G N05", "N08 N05 G N27", "N07 S28 G N05 N03",
        "N07 S28 G N05 N03 G S31", "N07 S28 G N06 S16 G S32"
    ]
})

# every node, split into its space-separated tokens
node_list = [i.split() for i in df["node"]]

def find_par_node(x):
    lis = x.split(" ")
    # try prefixes from longest (drop the last token) down to the first token alone
    for i in range(-1, -len(lis), -1):
        if lis[:i] in node_list:
            return " ".join(lis[:i])
    return np.nan

df["parent_node"] = df["node"].apply(find_par_node)
print(df)
node                     parent_node
N07 S40 G S06 S29 G N13      N07
N07                          NaN
N07 S28                      N07
N07 S28 G N06 S16            N07 S28
N08 N05                      NaN
N07 S28 G N05                N07 S28
N08 N05 G N27                N08 N05
N07 S28 G N05 N03            N07 S28 G N05
N07 S28 G N05 N03 G S31      N07 S28 G N05 N03
N07 S28 G N06 S16 G S32      N07 S28 G N06 S16
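If the frame is large, the `lis[:i] in node_list` check scans the whole list for every prefix. A possible variant, sketched under the same assumption (parent = longest proper token prefix that is itself a node), stores the nodes as a set of tuples so each lookup is a hash lookup instead of a linear scan. The names `known` and `find_parent` are hypothetical, not from the answer above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "node": [
        "N07 S40 G S06 S29 G N13", "N07", "N07 S28", "N07 S28 G N06 S16",
        "N08 N05", "N07 S28 G N05", "N08 N05 G N27", "N07 S28 G N05 N03",
        "N07 S28 G N05 N03 G S31", "N07 S28 G N06 S16 G S32"
    ]
})

# Hypothetical helper set: every node as a tuple of tokens (tuples are hashable)
known = set(df["node"].str.split().map(tuple))

def find_parent(node):
    tokens = node.split()
    # Longest proper prefix first, down to the first token alone
    for end in range(len(tokens) - 1, 0, -1):
        prefix = tuple(tokens[:end])
        if prefix in known:
            return " ".join(prefix)
    return np.nan  # root nodes (no matching prefix) get NaN

df["parent_node"] = df["node"].apply(find_parent)
print(df)
```

It should give the same parent_node column as the output above; only the membership test changes.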
