Reading log files with pandas (tab/newline-separated, each line holds a column name and a value)



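I'm using pandas to process log files with the structure below. All the log files share this structure, and each one contains data about a single machine, so each file should reduce to one row: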

Column1     Value1
Column2     Value2
Column3     Value3
Column4     Value4
Column5     Value5

I'm using the following code:

import pandas as pd
import glob

# inputdir is the directory that holds the .log files (defined elsewhere)
log_files = glob.glob(inputdir + '**/*.log', recursive=True)
appended_data = []
for logfile in log_files:
    # Passing the path lets pandas open and close the file itself
    df = pd.read_csv(logfile, sep='\t', lineterminator='\n', names=['Column', 'Value'])
    df = df.pivot(columns='Column', values='Value')
    appended_data.append(df)
logdf = pd.concat(appended_data)
logdf = logdf.reset_index(drop=True)
logdf = logdf.rename_axis(columns=None)

However, this creates one row per column instead of reducing all the rows to a single row:

Column  Column1  Column2  Column3  Column4  Column5
0             1      NaN      NaN      NaN      NaN
1           NaN        2      NaN      NaN      NaN
2           NaN      NaN        3      NaN      NaN
3           NaN      NaN      NaN        4      NaN
4           NaN      NaN      NaN      NaN        5
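The diagonal shape comes from `pivot` being called without `index=`: it then reuses the frame's existing row index (0 to 4), so every input line stays in its own output row. The sketch below reproduces this on a small stand-in frame and then gives every line the same index label so they collapse into one row. The `row` column is a hypothetical helper introduced only for this illustration.

```python
import pandas as pd

# A stand-in for one parsed log file: one (Column, Value) pair per line.
df = pd.DataFrame({
    "Column": ["Column1", "Column2", "Column3", "Column4", "Column5"],
    "Value": [1, 2, 3, 4, 5],
})

# Without index=..., pivot keeps the original RangeIndex 0..4,
# so each input line becomes its own output row (the NaN diagonal).
diagonal = df.pivot(columns="Column", values="Value")
print(diagonal.shape)  # (5, 5)

# Giving every line the same index label (hypothetical helper column "row")
# makes pivot put all of them into a single output row.
one_row = df.assign(row=0).pivot(index="row", columns="Column", values="Value")
one_row = one_row.rename_axis(index=None, columns=None)
print(one_row)
```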

The df should have this format instead:

  Column1  Column2  Column3  Column4  Column5
0       1        2        3        4        5

Is there an efficient way to fix this, either by changing the read_csv settings or by transforming the df afterwards?
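One possible read_csv-side approach (not taken from the answers in this thread) is to treat the first field of each line as the index and then transpose, so the names become columns of a single row. This is a minimal sketch that assumes each file is exactly two tab-separated fields per line with no header; `read_log_as_row` is a hypothetical helper name, and `io.StringIO` stands in for a real file path.

```python
import io
import pandas as pd

# Stand-in for one log file; in practice pass the file path instead.
sample = "Column1\tValue1\nColumn2\tValue2\nColumn3\tValue3\n"

def read_log_as_row(source):
    # header=None: the file has no header line.
    # index_col=0: the first field (the column name) becomes the index.
    s = pd.read_csv(source, sep="\t", header=None, index_col=0).iloc[:, 0]
    # Turn the Series (name -> value) into a single-row DataFrame.
    return s.to_frame().T.reset_index(drop=True).rename_axis(columns=None)

row = read_log_as_row(io.StringIO(sample))
print(row)
```

Calling such a helper once per file yields one row per machine, which can then be stacked with `pd.concat(..., ignore_index=True)`.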

The following workaround works, but I don't think it's particularly elegant:

df.sort_values(by='A', inplace=True)
df = df.ffill()  # fillna(method='ffill') is deprecated in recent pandas versions
df.drop_duplicates(['A'], keep='last', inplace=True)
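For the diagonal frame produced by a single file, a shorter way to collapse it is to back-fill each column and keep only the first row. This sketch assumes each column holds exactly one non-null value (one file, one machine); it does not reproduce the grouping that the column `A` performs in the workaround above when several machines are mixed together.

```python
import numpy as np
import pandas as pd

# Rebuild the diagonal frame from the question: row i holds only Column{i+1}.
cols = [f"Column{i}" for i in range(1, 6)]
diagonal = pd.DataFrame(
    np.where(np.eye(5, dtype=bool), np.arange(1, 6), np.nan), columns=cols
)

# bfill() pulls each column's single value up into row 0; keep only that row.
one_row = diagonal.bfill().iloc[[0]]
print(one_row)
```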

I think you are concatenating along the rows, as in this example:

import pandas as pd
appended_data = [
    pd.DataFrame({"Column1": ["1"]}),
    pd.DataFrame({"Column2": ["2"]}),
    pd.DataFrame({"Column3": ["3"]}),
    pd.DataFrame({"Column4": ["4"]}),
    pd.DataFrame({"Column5": ["5"]}),
]
logdf = pd.concat(appended_data)
logdf = logdf.reset_index(drop=True)
logdf = logdf.rename_axis(columns=None)
print(logdf)
# Output
  Column1 Column2 Column3 Column4 Column5
0       1     NaN     NaN     NaN     NaN
1     NaN       2     NaN     NaN     NaN
2     NaN     NaN       3     NaN     NaN
3     NaN     NaN     NaN       4     NaN
4     NaN     NaN     NaN     NaN       5

According to the pandas documentation, you can concatenate along the columns instead by passing axis=1:

logdf = (
    pd
    .concat(appended_data, axis=1)
    .reset_index(drop=True)
    .rename_axis(columns=None)
)
print(logdf)
# Output
  Column1 Column2 Column3 Column4 Column5
0       1       2       3       4       5
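With several log files, the two directions combine: within one file, the single-column frames go side by side (axis=1), and across files the resulting one-row frames are stacked (the default axis=0), giving one row per machine. This is a sketch with hypothetical file names and values standing in for parsed log contents.

```python
import pandas as pd

# Hypothetical parsed contents of two log files (one machine each).
files = {
    "machine_a.log": {"Column1": "1", "Column2": "2", "Column3": "3"},
    "machine_b.log": {"Column1": "10", "Column2": "20", "Column3": "30"},
}

rows = []
for name, pairs in files.items():
    # Within one file: one single-column frame per line, joined side by side.
    per_column = [pd.DataFrame({col: [val]}) for col, val in pairs.items()]
    rows.append(pd.concat(per_column, axis=1))

# Across files: stack the one-row frames, giving one row per machine.
logdf = pd.concat(rows, ignore_index=True)
print(logdf)
```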
