How to read an improperly delimited .txt file in pandas



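I have a .txt file that is very similar to a .csv, but not quite. As you can see, the first 4 columns can be separated on spaces, but the final string would then be split into a varying number of columns. I need the final string to stay in a single column.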

09 4 10/11/2021 22:21:17 The PLC reported that sorter SS02 has E-stopped.
08 4 10/11/2021 22:21:17 The PLC reported that sorter SS02 has stopped.
08 4 10/11/2021 22:21:18 The PLC reported that sorter SS01 has stopped.
20 5 10/11/2021 22:21:18 The PLC reported that purge mode was disabled for sorter SS02.
20 5 10/11/2021 22:21:18 The PLC reported that purge mode was disabled for sorter SS01.
23 5 10/11/2021 22:21:19 AUX Sortation has been enabled for sorter SS02.
23 5 10/11/2021 22:21:20 AUX Sortation has been enabled for sorter SS01.

How can I read this so that I end up with just 5 consistent columns? I will probably merge the date and time into one column later.

You can pre-parse each line yourself and then create the DataFrame, for example:

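import pandas as pd

with open('input.txt') as f_input:
    # split(' ', 4) splits at most 4 times, so everything after
    # the time field stays together as the fifth element
    data = [line.strip().split(' ', 4) for line in f_input]

df = pd.DataFrame(data, columns=['c1', 'c2', 'date', 'time', 'desc'])
print(df)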

This gives you:

   c1 c2        date      time                                                            desc
0  09  4  10/11/2021  22:21:17                The PLC reported that sorter SS02 has E-stopped.
1  08  4  10/11/2021  22:21:17                  The PLC reported that sorter SS02 has stopped.
2  08  4  10/11/2021  22:21:18                  The PLC reported that sorter SS01 has stopped.
3  20  5  10/11/2021  22:21:18  The PLC reported that purge mode was disabled for sorter SS02.
4  20  5  10/11/2021  22:21:18  The PLC reported that purge mode was disabled for sorter SS01.
5  23  5  10/11/2021  22:21:19                 AUX Sortation has been enabled for sorter SS02.
6  23  5  10/11/2021  22:21:20                 AUX Sortation has been enabled for sorter SS01.
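
If the real file has blank lines or fields separated by more than one space (an assumption, the sample shows neither), the one-line split above would produce short or misaligned rows. A minimal sketch of a more tolerant variant follows. The names `parse_lines`, `sample` and `df_sample` are made up for illustration, and `io.StringIO` stands in for the real file:

```python
import io
import pandas as pd

# Two lines taken from the question, plus a blank line and a doubled
# space added on purpose to exercise the tolerant parsing.
sample = io.StringIO(
    "09 4 10/11/2021 22:21:17 The PLC reported that sorter SS02 has E-stopped.\n"
    "\n"
    "23 5  10/11/2021 22:21:19 AUX Sortation has been enabled for sorter SS02.\n"
)

def parse_lines(lines):
    """Split each line into 5 fields on runs of whitespace, skipping bad lines."""
    rows = []
    for line in lines:
        # sep=None splits on any run of whitespace; maxsplit=4 keeps the
        # description intact as the fifth field.
        parts = line.strip().split(None, 4)
        if len(parts) == 5:  # ignore blank or truncated lines
            rows.append(parts)
    return rows

df_sample = pd.DataFrame(parse_lines(sample),
                         columns=['c1', 'c2', 'date', 'time', 'desc'])
print(df_sample)
```

With a real file you would pass the open file object to `parse_lines` instead of `sample`.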

You can add a datetime column by joining the date and time columns and converting the result to datetime:

df['datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'])
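
Note that a date like 10/11/2021 is ambiguous (11 October or 10 November), and the question does not say which is meant. A small sketch with an explicit format, assuming month-first dates, which also drops the two source columns afterwards. `df_demo` is a hypothetical one-row frame used only for illustration; swap `%m` and `%d` if the log is day-first:

```python
import pandas as pd

# One-row stand-in for the parsed DataFrame.
df_demo = pd.DataFrame({'date': ['10/11/2021'], 'time': ['22:21:17']})

# ASSUMPTION: dates are month/day/year. Use '%d/%m/%Y %H:%M:%S' for day-first.
df_demo['datetime'] = pd.to_datetime(
    df_demo['date'] + ' ' + df_demo['time'],
    format='%m/%d/%Y %H:%M:%S',
)

# The separate date and time columns are no longer needed.
df_demo = df_demo.drop(columns=['date', 'time'])
print(df_demo)
```

Passing `format` explicitly also makes the conversion fail loudly on a malformed line instead of guessing.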
