Building triples from two different dataframes



I want to build triples of the form source --> target --> edge and store them in a new dataframe.

I have two dataframes:

       Accident_ID Location  CarID_1  CarID_2  DriverID_1  DriverID_2
    0            1    Tartu     1000     1001           1           3
    1            2   Tallin     1002     1003           2           5
    2            3    Tartu     1004     1005           4           6
    3            4   Tallin     1006     1007           7           8

       User_ID First Name Last Name  Age           Address  Accident_ID     ROLE
    0        1    Chester    Murphy   25  Narva 108, Tartu            1   Driver
    1        2     Walter    Turner   26   Tilgi 49, Tartu            2   Driver
    2        3      Daryl    Fowler   25    Piik 67, Tartu            1   Driver
    3        4        Ted    Nelson   45   Herne 20, Tartu            3   Driver
    4        5     Olivia  Crawford   38  Kalevi 25, Tartu            2   Driver
    5        1    Chester    Murphy   25  Narva 108, Tartu            2  Witness
    6        6        Amy    Miller   27   Riia 408, Tartu            3   Driver
    7        7        Tes     Smith   25  Narva 108, Tartu            4   Driver
    8        8       Josh     Blake   36  Parnu 37, Tallin            4   Driver
    9        3      Daryl    Fowler   25    Piik 67, Tartu            4  Witness

The triples I need to form follow the pattern shown in the original image (not available here).

What is the Python code for this? I have written the following, but I get an error saying that `Witness` is not defined.

    df3 = df1.merge(df2, on='Accident_ID')
    df3["train"] = df3.Accident_ID < 5
    df3["train"].value_counts()
    triples = []
    for _, row in df3[df3["train"]].iterrows():
        if row["ROLE"] == "Driver":
            if row["User_ID"] == row["DriverID_1"]:
                Drives = (row["User_ID"], row["CarID_1"], "Drives")
            elif row["User_ID"] == row["DriverID_2"]:
                Drives = (row["User_ID"], row["CarID_2"], "Drives")
        else:
            Witness = (row["User_ID"], row["Accident_ID"], "Witness")
        Involved_in_first = (row["CarID_1"], row["Accident_ID"], "Involved in")
        Involved_in_second = (row["CarID_2"], row["Accident_ID"], "Involved in")
        Happened_in = (row["Accident_ID"], row["Location"], "Happened in")
        Lives_in = (row["User_ID"], row["Address"], "Lives in")
        triples.extend((Drives, Witness, Involved_in_first, Involved_in_second, Happened_in, Lives_in))

    triples_df = pd.DataFrame(triples, columns=["Source", "Target", "Edge"])
    triples_df.shape
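The `NameError` comes from the `if`/`else`: `Witness` is assigned only in the `else` branch, so on the very first merged row (Chester, a driver in accident 1) `triples.extend(...)` runs before `Witness` exists. `Drives` has the mirror problem: on a witness row it is either undefined or still holds the previous row's value. A minimal sketch of one fix, assuming pandas and the sample data above (rebuilt here with only the columns the triples use), is to append only the triples that apply to each row. The `train` filter is left out because every `Accident_ID` in the sample is below 5, and exact duplicates are dropped because the merge repeats accident-level edges once per person.

```python
import pandas as pd

# Sample data rebuilt from the printed tables (only the columns used here).
df1 = pd.DataFrame({
    "Accident_ID": [1, 2, 3, 4],
    "Location": ["Tartu", "Tallin", "Tartu", "Tallin"],
    "CarID_1": [1000, 1002, 1004, 1006],
    "CarID_2": [1001, 1003, 1005, 1007],
    "DriverID_1": [1, 2, 4, 7],
    "DriverID_2": [3, 5, 6, 8],
})
df2 = pd.DataFrame({
    "User_ID": [1, 2, 3, 4, 5, 1, 6, 7, 8, 3],
    "Address": ["Narva 108, Tartu", "Tilgi 49, Tartu", "Piik 67, Tartu",
                "Herne 20, Tartu", "Kalevi 25, Tartu", "Narva 108, Tartu",
                "Riia 408, Tartu", "Narva 108, Tartu", "Parnu 37, Tallin",
                "Piik 67, Tartu"],
    "Accident_ID": [1, 2, 1, 3, 2, 2, 3, 4, 4, 4],
    "ROLE": ["Driver"] * 5 + ["Witness"] + ["Driver"] * 3 + ["Witness"],
})

df3 = df1.merge(df2, on="Accident_ID")
triples = []
for _, row in df3.iterrows():
    # Append only the triples that apply to this row, so no variable is
    # ever read before it has been assigned (the cause of the NameError).
    if row["ROLE"] == "Driver":
        if row["User_ID"] == row["DriverID_1"]:
            triples.append((row["User_ID"], row["CarID_1"], "Drives"))
        elif row["User_ID"] == row["DriverID_2"]:
            triples.append((row["User_ID"], row["CarID_2"], "Drives"))
    else:
        triples.append((row["User_ID"], row["Accident_ID"], "Witness"))
    triples.append((row["CarID_1"], row["Accident_ID"], "Involved in"))
    triples.append((row["CarID_2"], row["Accident_ID"], "Involved in"))
    triples.append((row["Accident_ID"], row["Location"], "Happened in"))
    triples.append((row["User_ID"], row["Address"], "Lives in"))

# The merge yields one row per person per accident, so accident-level and
# person-level edges repeat; keep each triple once.
triples_df = pd.DataFrame(triples, columns=["Source", "Target", "Edge"])
triples_df = triples_df.drop_duplicates(ignore_index=True)
print(triples_df)
```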

You should do it like this, and follow the same process for the rest of the edges:

    df = df2.merge(df1, on=['Accident_ID'], how='inner')
    print(df)
    columns = ['Source', 'Target', 'Edge']
    rows = []
    for i in range(0, df.shape[0]):
        row1 = [
            df.iloc[i]['First_Name'],
            df.iloc[i]['CarID_1'],
            'Drives'
        ]
        row2 = [
            df.iloc[i]['First_Name'],
            df.iloc[i]['Accident_ID'],
            'Witness'
        ]
        rows.append(row1)
        rows.append(row2)
    df_g = pd.DataFrame(rows, columns=columns)
    print(df_g)

Output:

         Source  Target     Edge
    0   Chester    1000   Drives
    1   Chester       1  Witness
    2     Daryl    1000   Drives
    3     Daryl       1  Witness
    4    Walter    1002   Drives
    5    Walter       2  Witness
    6    Olivia    1002   Drives
    7    Olivia       2  Witness
    8   Chester    1002   Drives
    9   Chester       2  Witness
    10      Ted    1004   Drives
    11      Ted       3  Witness
    12      Amy    1004   Drives
    13      Amy       3  Witness
    14      Tes    1006   Drives
    15      Tes       4  Witness
    16     Josh    1006   Drives
    17     Josh       4  Witness
    18    Daryl    1006   Drives
    19    Daryl       4  Witness
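The output above shows the limits of this template: every person is paired with `CarID_1` (Daryl is `DriverID_2` in accident 1, so his car should be 1001, not 1000), and every row also gets a `Witness` edge, even for drivers, while Chester is listed as driving 1002 in accident 2 where he was only a witness. A minimal vectorized sketch, assuming pandas and the sample data (rebuilt with only the needed columns), stacks the two `DriverID_k`/`CarID_k` slots with `pd.concat` so each driver is paired with their own car, and builds each edge type as its own small frame. The helper `edges` and the `slots` frame are hypothetical names introduced for illustration, and `User_ID` is used as the person node, as in the question.

```python
import pandas as pd

# Sample data rebuilt from the printed tables (only the columns used here).
df1 = pd.DataFrame({
    "Accident_ID": [1, 2, 3, 4],
    "Location": ["Tartu", "Tallin", "Tartu", "Tallin"],
    "CarID_1": [1000, 1002, 1004, 1006],
    "CarID_2": [1001, 1003, 1005, 1007],
    "DriverID_1": [1, 2, 4, 7],
    "DriverID_2": [3, 5, 6, 8],
})
df2 = pd.DataFrame({
    "User_ID": [1, 2, 3, 4, 5, 1, 6, 7, 8, 3],
    "Address": ["Narva 108, Tartu", "Tilgi 49, Tartu", "Piik 67, Tartu",
                "Herne 20, Tartu", "Kalevi 25, Tartu", "Narva 108, Tartu",
                "Riia 408, Tartu", "Narva 108, Tartu", "Parnu 37, Tallin",
                "Piik 67, Tartu"],
    "Accident_ID": [1, 2, 1, 3, 2, 2, 3, 4, 4, 4],
    "ROLE": ["Driver"] * 5 + ["Witness"] + ["Driver"] * 3 + ["Witness"],
})


def edges(source, target, label):
    """Hypothetical helper: build a Source/Target/Edge frame from two columns."""
    return pd.DataFrame({"Source": source.to_numpy(),
                         "Target": target.to_numpy(),
                         "Edge": label})


# Stack slot 1 and slot 2 so each row is one (accident, driver, car) triple.
slots = pd.concat(
    [df1[["Accident_ID", f"DriverID_{k}", f"CarID_{k}"]]
     .rename(columns={f"DriverID_{k}": "User_ID", f"CarID_{k}": "CarID"})
     for k in (1, 2)],
    ignore_index=True,
)
witnesses = df2[df2["ROLE"] == "Witness"]
people = df2.drop_duplicates("User_ID")  # one address per person

triples_df = pd.concat(
    [
        edges(slots["User_ID"], slots["CarID"], "Drives"),
        edges(witnesses["User_ID"], witnesses["Accident_ID"], "Witness"),
        edges(slots["CarID"], slots["Accident_ID"], "Involved in"),
        edges(df1["Accident_ID"], df1["Location"], "Happened in"),
        edges(people["User_ID"], people["Address"], "Lives in"),
    ],
    ignore_index=True,
)
print(triples_df)
```

In this sketch the `Drives` edges come from the `DriverID` columns of `df1` rather than from `ROLE` in `df2`; on this sample both give the same eight driver/car pairs, and building each edge type separately avoids the duplicate rows that a per-person merge produces.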
