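Given a dataset with 8 columns, I want to check whether a row matches specific values taken from another row of the same dataset and, if it does, remove it.
Here is an example: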
IP_Src_X       IP_Dst_X       Port_Src_X  Port_Dst_X  IP_Src_Y       IP_Dst_Y       Port_Src_Y  Port_Dst_Y
10.00.000.00   90.00.000.00   1000        3000        90.00.000.00   10.00.000.00   3000        1000
60.50.500.00   30.000.300.00  8000        2000        30.000.300.00  60.50.500.00   2000        8000
66.00.000.00   10.00.000.00   5000        7000        10.00.000.00   66.00.000.00   7000        5000
90.00.000.00   10.00.000.00   3000        1000        10.00.000.00   90.00.000.00   1000        3000
10.00.000.00   66.00.000.00   7000        5000        66.00.000.00   10.00.000.00   5000        7000
With the dataframe you provided:
import pandas as pd
df = pd.DataFrame(
{
"IP_Src_X": [
"100000000",
"605050000",
"660000000",
"900000000",
"100000000",
],
"IP_Dst_X": [
"900000000",
"3000030000",
"100000000",
"100000000",
"660000000",
],
"Port_Src_X": [1000, 8000, 5000, 3000, 7000],
"Port_Dst_X": [3000, 2000, 7000, 1000, 5000],
"IP_Src_Y": [
"900000000",
"3000030000",
"100000000",
"100000000",
"660000000",
],
"IP_Dst_Y": [
"100000000",
"605050000",
"660000000",
"900000000",
"100000000",
],
"Port_Src_Y": [3000, 2000, 7000, 1000, 5000],
"Port_Dst_Y": [1000, 8000, 5000, 3000, 7000],
}
)
Here is one way to do it using Pandas concat:
# Stack X values onto Y values and remove duplicates
new_df = pd.concat(
[
df[[f"IP_Src_{x}", f"IP_Dst_{x}", f"Port_Src_{x}", f"Port_Dst_{x}"]].rename(
columns={
f"IP_Src_{x}": "IP_Src",
f"IP_Dst_{x}": "IP_Dst",
f"Port_Src_{x}": "Port_Src",
f"Port_Dst_{x}": "Port_Dst",
}
)
for x in ["X", "Y"]
]
)
new_df = new_df.drop_duplicates(keep="first")
# Stack first half of new_df onto switched second half
first_half = new_df.iloc[: int(new_df.shape[0] / 2), :]
first_half.columns = [i for i in range(first_half.shape[1])]
second_half = new_df.iloc[int(new_df.shape[0] / 2) :, :].reindex(
["IP_Dst", "IP_Src", "Port_Dst", "Port_Src"], axis=1
)
second_half.columns = [i for i in range(second_half.shape[1])]
# Filter df with remaining non duplicated rows
rows_to_keep = pd.concat([first_half, second_half]).drop_duplicates(keep="first").index
df = df[df.index.isin(rows_to_keep)]
Then:
print(df)
# Output
    IP_Src_X    IP_Dst_X  Port_Src_X  Port_Dst_X    IP_Src_Y   IP_Dst_Y
0  100000000   900000000        1000        3000   900000000  100000000
1  605050000  3000030000        8000        2000  3000030000  605050000
2  660000000   100000000        5000        7000   100000000  660000000
   Port_Src_Y  Port_Dst_Y
0        3000        1000
1        2000        8000
2        7000        5000
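As a cross-check, here is a minimal alternative sketch. It is not the concat method above, and it does not depend on how the rows are ordered. It assumes a row counts as a match when its X side and Y side equal the two sides of an earlier row, in either order. The names X_COLS, Y_COLS, seen and keep_mask are illustrative, not part of the original question.

```python
import pandas as pd

# Same sample data as above, repeated so this sketch runs on its own
df = pd.DataFrame(
    {
        "IP_Src_X": ["100000000", "605050000", "660000000", "900000000", "100000000"],
        "IP_Dst_X": ["900000000", "3000030000", "100000000", "100000000", "660000000"],
        "Port_Src_X": [1000, 8000, 5000, 3000, 7000],
        "Port_Dst_X": [3000, 2000, 7000, 1000, 5000],
        "IP_Src_Y": ["900000000", "3000030000", "100000000", "100000000", "660000000"],
        "IP_Dst_Y": ["100000000", "605050000", "660000000", "900000000", "100000000"],
        "Port_Src_Y": [3000, 2000, 7000, 1000, 5000],
        "Port_Dst_Y": [1000, 8000, 5000, 3000, 7000],
    }
)

# Illustrative column groups (assumed names, not from the question)
X_COLS = ["IP_Src_X", "IP_Dst_X", "Port_Src_X", "Port_Dst_X"]
Y_COLS = ["IP_Src_Y", "IP_Dst_Y", "Port_Src_Y", "Port_Dst_Y"]

# One plain tuple per row for each side
x_side = df[X_COLS].itertuples(index=False, name=None)
y_side = df[Y_COLS].itertuples(index=False, name=None)

seen = set()      # order-independent keys already encountered
keep_mask = []    # True for the first occurrence of each key
for x, y in zip(x_side, y_side):
    # A frozenset ignores which side is X and which is Y,
    # so a row and its mirrored counterpart share the same key
    key = frozenset([x, y])
    keep_mask.append(key not in seen)
    seen.add(key)

deduped = df[keep_mask]
print(deduped.index.tolist())
```

On the sample data this keeps rows 0, 1 and 2, the same rows as the concat approach. Note that exact duplicate rows would also be dropped under this assumption.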