I want to build triples of the form source --> target --> edge and store them in a new DataFrame.
I have two DataFrames:
       Accident_ID Location  CarID_1  CarID_2  DriverID_1  DriverID_2
    0            1    Tartu     1000     1001           1           3
    1            2   Tallin     1002     1003           2           5
    2            3    Tartu     1004     1005           4           6
    3            4   Tallin     1006     1007           7           8
       User_ID  First_Name  Last_Name  Age           Address  Accident_ID     ROLE
    0        1     Chester     Murphy   25  Narva 108, Tartu            1   Driver
    1        2      Walter     Turner   26   Tilgi 49, Tartu            2   Driver
    2        3       Daryl     Fowler   25    Piik 67, Tartu            1   Driver
    3        4         Ted     Nelson   45   Herne 20, Tartu            3   Driver
    4        5      Olivia   Crawford   38  Kalevi 25, Tartu            2   Driver
    5        1     Chester     Murphy   25  Narva 108, Tartu            2  Witness
    6        6         Amy     Miller   27   Riia 408, Tartu            3   Driver
    7        7         Tes      Smith   25  Narva 108, Tartu            4   Driver
    8        8        Josh      Blake   36  Parnu 37, Tallin            4   Driver
    9        3       Daryl     Fowler   25    Piik 67, Tartu            4  Witness
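The triples I have to form follow this pattern: [![enter image description here][2]][2]
What is the Python code for this? I have written the following, but I get an error saying that `Witness` is not defined: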
    df3 = df1.merge(df2, on='Accident_ID')
    df3["train"] = df3.Accident_ID < 5
    df3["train"].value_counts()
    triples = []
    for _, row in df3[df3["train"]].iterrows():
        if row["ROLE"] == "Driver":
            if row["User_ID"] == row["DriverID_1"]:
                Drives = (row["User_ID"], row["CarID_1"], "Drives")
            elif row["User_ID"] == row["DriverID_2"]:
                Drives = (row["User_ID"], row["CarID_2"], "Drives")
        else:
            Witness = (row["User_ID"], row["Accident_ID"], "Witness")
        Involved_in_first = (row["CarID_1"], row["Accident_ID"], "Involved in")
        Involved_in_second = (row["CarID_2"], row["Accident_ID"], "Involved in")
        Happened_in = (row["Accident_ID"], row["Location"], "Happened in")
        Lives_in = (row["User_ID"], row["Address"], "Lives in")
        triples.extend((Drives, Witness, Involved_in_first, Involved_in_second, Happened_in, Lives_in))
    triples_df = pd.DataFrame(triples, columns=["Source", "Target", "Edge"])
    triples_df.shape
You can do it like this, and follow the same process for the remaining edges:
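    import pandas as pd

    # Attach the accident columns (cars, drivers, location) to every person row.
    df = df2.merge(df1, on=['Accident_ID'], how='inner')
    print(df)

    columns = ['Source', 'Target', 'Edge']
    rows = []
    for i in range(0, df.shape[0]):
        # Person --> car edge (this sketch always uses CarID_1).
        row1 = [
            df.iloc[i]['First_Name'],
            df.iloc[i]['CarID_1'],
            'Drives'
        ]
        # Person --> accident edge (emitted for every row, whatever the ROLE is).
        row2 = [
            df.iloc[i]['First_Name'],
            df.iloc[i]['Accident_ID'],
            'Witness'
        ]
        rows.append(row1)
        rows.append(row2)

    df_g = pd.DataFrame(rows, columns=columns)
    print(df_g)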
Output:
         Source  Target     Edge
    0   Chester    1000   Drives
    1   Chester       1  Witness
    2     Daryl    1000   Drives
    3     Daryl       1  Witness
    4    Walter    1002   Drives
    5    Walter       2  Witness
    6    Olivia    1002   Drives
    7    Olivia       2  Witness
    8   Chester    1002   Drives
    9   Chester       2  Witness
    10      Ted    1004   Drives
    11      Ted       3  Witness
    12      Amy    1004   Drives
    13      Amy       3  Witness
    14      Tes    1006   Drives
    15      Tes       4  Witness
    16     Josh    1006   Drives
    17     Josh       4  Witness
    18    Daryl    1006   Drives
    19    Daryl       4  Witness
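
As for the error in the question: the loop assigns `Witness` only in the `else` branch, so on the first row with `ROLE == "Driver"` that name does not exist yet when `triples.extend(...)` runs. Even after the first witness row, `Drives` and `Witness` would silently carry over values from earlier rows. One possible way around both problems is to append each triple at the moment its condition holds. The sketch below is only an illustration under a few assumptions: the sample data is retyped from the question with the name and age columns left out, edges use `User_ID` as the source as in the question's code, the accident-level edges are generated from `df1` alone, and `drop_duplicates` is my own choice for collapsing the repeated "Lives in" rows of people who appear twice.

```python
import pandas as pd

# Sample data retyped from the question (name and age columns omitted for brevity).
df1 = pd.DataFrame({
    "Accident_ID": [1, 2, 3, 4],
    "Location": ["Tartu", "Tallin", "Tartu", "Tallin"],
    "CarID_1": [1000, 1002, 1004, 1006],
    "CarID_2": [1001, 1003, 1005, 1007],
    "DriverID_1": [1, 2, 4, 7],
    "DriverID_2": [3, 5, 6, 8],
})
df2 = pd.DataFrame({
    "User_ID": [1, 2, 3, 4, 5, 1, 6, 7, 8, 3],
    "Address": [
        "Narva 108, Tartu", "Tilgi 49, Tartu", "Piik 67, Tartu",
        "Herne 20, Tartu", "Kalevi 25, Tartu", "Narva 108, Tartu",
        "Riia 408, Tartu", "Narva 108, Tartu", "Parnu 37, Tallin",
        "Piik 67, Tartu",
    ],
    "Accident_ID": [1, 2, 1, 3, 2, 2, 3, 4, 4, 4],
    "ROLE": ["Driver", "Driver", "Driver", "Driver", "Driver",
             "Witness", "Driver", "Driver", "Driver", "Witness"],
})

# One row per (person, accident), with the accident's cars and drivers attached.
df = df2.merge(df1, on="Accident_ID", how="inner")

triples = []

# Person-level edges: append only the triples that apply to this row.
for _, row in df.iterrows():
    user = row["User_ID"]
    if row["ROLE"] == "Driver":
        if user == row["DriverID_1"]:
            triples.append((user, row["CarID_1"], "Drives"))
        elif user == row["DriverID_2"]:
            triples.append((user, row["CarID_2"], "Drives"))
    else:
        triples.append((user, row["Accident_ID"], "Witness"))
    triples.append((user, row["Address"], "Lives in"))

# Accident-level edges: taken from df1 so each accident contributes them once.
for _, acc in df1.iterrows():
    triples.append((acc["CarID_1"], acc["Accident_ID"], "Involved in"))
    triples.append((acc["CarID_2"], acc["Accident_ID"], "Involved in"))
    triples.append((acc["Accident_ID"], acc["Location"], "Happened in"))

# People who appear in two accidents produce the same "Lives in" triple twice.
triples_df = pd.DataFrame(
    triples, columns=["Source", "Target", "Edge"]
).drop_duplicates(ignore_index=True)

print(triples_df["Edge"].value_counts())
```

With this structure no variable is read before it is assigned, and a witness row never produces a "Drives" edge. If you prefer names as sources, as in the answer above, swapping `row["User_ID"]` for the name column should work the same way.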