I have a protocol dump in a plain-text file, in the following format:
Frame 380: 19 bytes on wire (152 bits), 19 bytes captured (152 bits)
Bluetooth HCI H4
[Direction: Sent (0x00)]
HCI Packet Type: ACL Data (0x02)
0000 02 0b 00 0e 00 0a 00 01 00 05 0e 06 00 07 07 00 ................
0010 00 00 00 ...
Frame 381: 8 bytes on wire (64 bits), 8 bytes captured (64 bits)
Bluetooth HCI H4
[Direction: Rcvd (0x01)]
HCI Packet Type: HCI Event (0x04)
0000 04 13 05 01 0b 00 01 00 ........
Frame 382: 23 bytes on wire (184 bits), 23 bytes captured (184 bits)
Bluetooth HCI H4
[Direction: Rcvd (0x01)]
HCI Packet Type: ACL Data (0x02)
0000 02 0b 20 12 00 0e 00 01 00 05 12 0a 00 47 00 00 .. ..........G..
0010 00 00 00 01 02 00 04 .......
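In this simplified example, the frame numbers 380, 381, and so on are part of the first line of each frame in the text format. I want to convert this into a pandas DataFrame in the following format: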
FrameNumber Details
|---------------------------------------------------------------------------------------|
| | Frame 380: 19 bytes on wire (152 bits), 19 bytes captured (152 bits) |
| | Bluetooth HCI H4 |
| 380 | [Direction: Sent (0x00)] |
| | HCI Packet Type: ACL Data (0x02) |
| | 0000 02 0b 00 0e 00 0a 00 01 00 05 0e 06 00 07 07 00 ................ |
| | 0010 00 00 00 |
|---------------------------------------------------------------------------------------|
| | Frame 381: 8 bytes on wire (64 bits), 8 bytes captured (64 bits) |
| | Bluetooth HCI H4 |
| 381 | [Direction: Rcvd (0x01)] |
| | HCI Packet Type: HCI Event (0x04) |
| | 0000 04 13 05 01 0b 00 01 00 ........ |
|---------------------------------------------------------------------------------------|
| | Frame 382: 23 bytes on wire (184 bits), 23 bytes captured (184 bits) |
| | Bluetooth HCI H4 |
| 382 | [Direction: Rcvd (0x01)] |
| | HCI Packet Type: ACL Data (0x02) |
| | 0000 02 0b 20 12 00 0e 00 01 00 05 12 0a 00 47 00 00 .. ..........G.. |
| | 0010 00 00 00 01 02 00 04 ....... |
+---------------------------------------------------------------------------------------+
I tried using pandas read_csv(), but with my limited knowledge of multi-line regex matching I could not solve this. Can anyone help me come up with a simple way to do it?
Another solution, using the re module:
import re
import pandas as pd

all_data = []
with open("data.txt", "r") as f_in:
    # Each match runs from a "Frame N" line up to the next one (or end of file).
    for (g, n) in re.findall(
        r"^(Frame (\d+).*?)\s*(?=^Frame \d+|\Z)", f_in.read(), flags=re.M | re.S
    ):
        all_data.append({"FrameNumber": int(n), "Details": g})

df = pd.DataFrame(all_data)
print(df)
Prints:
| | FrameNumber | Details |
|---:|--------------:|:-------------------------------------------------------------------------|
| 0 | 380 | Frame 380: 19 bytes on wire (152 bits), 19 bytes captured (152 bits) |
| | | Bluetooth HCI H4 |
| | | [Direction: Sent (0x00)] |
| | | HCI Packet Type: ACL Data (0x02) |
| | | 0000 02 0b 00 0e 00 0a 00 01 00 05 0e 06 00 07 07 00 ................ |
| | | 0010 00 00 00 ... |
| 1 | 381 | Frame 381: 8 bytes on wire (64 bits), 8 bytes captured (64 bits) |
| | | Bluetooth HCI H4 |
| | | [Direction: Rcvd (0x01)] |
| | | HCI Packet Type: HCI Event (0x04) |
| | | 0000 04 13 05 01 0b 00 01 00 ........ |
| 2 | 382 | Frame 382: 23 bytes on wire (184 bits), 23 bytes captured (184 bits) |
| | | Bluetooth HCI H4 |
| | | [Direction: Rcvd (0x01)] |
| | | HCI Packet Type: ACL Data (0x02) |
| | | 0000 02 0b 20 12 00 0e 00 01 00 05 12 0a 00 47 00 00 .. ..........G.. |
| | | 0010 00 00 00 01 02 00 04 ....... |
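The regex approach above can be tried without a file on disk. The following is a minimal sketch, assuming every frame starts with a line of the form "Frame <number>: ..."; the SAMPLE string, the FRAME_RE name, and the parse_frames helper are hypothetical and introduced only for illustration.

```python
import re

import pandas as pd

# Shortened in-memory sample (hypothetical stand-in for the contents of data.txt).
SAMPLE = """\
Frame 380: 19 bytes on wire (152 bits), 19 bytes captured (152 bits)
Bluetooth HCI H4
[Direction: Sent (0x00)]
Frame 381: 8 bytes on wire (64 bits), 8 bytes captured (64 bits)
Bluetooth HCI H4
[Direction: Rcvd (0x01)]
"""

# re.M lets ^ match at every line start; re.S lets . span newlines.
# The lazy .*? stops at the next line that begins with "Frame <n>" or at end of input.
FRAME_RE = re.compile(r"^(Frame (\d+).*?)\s*(?=^Frame \d+|\Z)", flags=re.M | re.S)


def parse_frames(text):
    """Return one row per frame: the frame number and the full frame text."""
    rows = [{"FrameNumber": int(n), "Details": g} for g, n in FRAME_RE.findall(text)]
    return pd.DataFrame(rows, columns=["FrameNumber", "Details"])


frames = parse_frames(SAMPLE)
print(frames)
```

Because the trailing whitespace is matched outside the first capture group, each Details value ends on the last line of its frame with no trailing newline.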
Using extract and groupby:
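import pandas as pd

# Load the dump with one row per line in a "Details" column.
df = pd.read_fwf("input2.txt", header=None, names=["Details"])
# Label each frame's first line with "Frame N", then forward-fill the label down the frame.
df["FrameNumber"] = (df["Details"].str.extract(r"(Frame \d+)", expand=False)
                     .where(df["Details"].str.startswith("Frame")).ffill())
out = df.groupby("FrameNumber", as_index=False).agg("\n".join)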
Output:
+---------------+--------------------------------------------------------------------------+
| FrameNumber | Details |
|---------------+--------------------------------------------------------------------------|
| Frame 380 | Frame 380: 19 bytes on wire (152 bits), 19 bytes captured (152 bits) |
| | Bluetooth HCI H4 |
| | [Direction: Sent (0x00)] |
| | HCI Packet Type: ACL Data (0x02) |
| | 0000 02 0b 00 0e 00 0a 00 01 00 05 0e 06 00 07 07 00 ................ |
| | 0010 00 00 00 ... |
| Frame 381 | Frame 381: 8 bytes on wire (64 bits), 8 bytes captured (64 bits) |
| | Bluetooth HCI H4 |
| | [Direction: Rcvd (0x01)] |
| | HCI Packet Type: HCI Event (0x04) |
| | 0000 04 13 05 01 0b 00 01 00 ........ |
| Frame 382 | Frame 382: 23 bytes on wire (184 bits), 23 bytes captured (184 bits) |
| | Bluetooth HCI H4 |
| | [Direction: Rcvd (0x01)] |
| | HCI Packet Type: ACL Data (0x02) |
| | 0000 02 0b 20 12 00 0e 00 01 00 05 12 0a 00 47 00 00 .. ..........G.. |
| | 0010 00 00 00 01 02 00 04 ....... |