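I am using pandas to process log files with the structure below. All of the log files share this structure, and each one holds data about a single machine, so each file should reduce to one row: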
Column1 Value1
Column2 Value2
Column3 Value3
Column4 Value4
Column5 Value5
I am using the following code:
import pandas as pd
import glob
# inputdir is the root folder that holds the log files (defined elsewhere)
log_files = glob.glob(inputdir + '**/*.log', recursive=True)
appended_data = []
for logfile in log_files:
    with open(logfile) as fileobject:
        # Each line is a tab-separated key/value pair
        df = pd.read_csv(fileobject, sep='\t', lineterminator='\n', names=['Column', 'Value'])
    df = df.pivot(columns='Column', values='Value')
    appended_data.append(df)
logdf = pd.concat(appended_data)
logdf = logdf.reset_index(drop=True)
logdf = logdf.rename_axis(columns=None)
However, this creates one row per column instead of collapsing everything into a single row:
Column Column1 Column2 Column3 Column4 Column5
0 1 NaN NaN NaN NaN
1 NaN 2 NaN NaN NaN
2 NaN NaN 3 NaN NaN
3 NaN NaN NaN 4 NaN
4 NaN NaN NaN NaN 5
The df should look like this instead:
Column1 Column2 Column3 Column4 Column5
0 1 2 3 4 5
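Is there an efficient way to solve this, either by changing the read_csv settings or by transforming the df?
The workaround below works, but I don't think it is particularly good.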
df.sort_values(by='A',inplace=True)
df = df.fillna(method='ffill')
df.drop_duplicates(["A"],keep='last',inplace=True)
I think you are concatenating along the rows, like this:
import pandas as pd
appended_data = [
pd.DataFrame({"Column1": ["1"],}),
pd.DataFrame({"Column2": ["2"],}),
pd.DataFrame({"Column3": ["3"],}),
pd.DataFrame({"Column4": ["4"],}),
pd.DataFrame({"Column5": ["5"],}),
]
logdf = pd.concat(appended_data)
logdf = logdf.reset_index(drop=True)
logdf = logdf.rename_axis(columns=None)
print(logdf)
# Output
Column1 Column2 Column3 Column4 Column5
0 1 NaN NaN NaN NaN
1 NaN 2 NaN NaN NaN
2 NaN NaN 3 NaN NaN
3 NaN NaN NaN 4 NaN
4 NaN NaN NaN NaN 5
According to the pandas documentation, you can concatenate along the columns instead by specifying axis=1, like this:
logdf = (
pd
.concat(appended_data, axis=1)
.reset_index(drop=True)
.rename_axis(columns=None)
)
print(logdf)
# Output
Column1 Column2 Column3 Column4 Column5
0 1 2 3 4 5
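For the log files in the question, each file already holds all five key/value pairs, so another option is to turn each file into a single row before concatenating. When `pivot` is called without `index`, it keeps the existing row index, which is why every value stays on its own row. The sketch below sets the key column as the index and transposes instead. It assumes the files are tab-separated, and it uses in-memory strings (`sample_logs`) and a helper named `read_log_as_row`, both hypothetical, in place of real file paths:

```python
import io

import pandas as pd

# Hypothetical stand-ins for two log files; real code would pass file paths.
sample_logs = [
    "Column1\t1\nColumn2\t2\nColumn3\t3\nColumn4\t4\nColumn5\t5\n",
    "Column1\t6\nColumn2\t7\nColumn3\t8\nColumn4\t9\nColumn5\t10\n",
]


def read_log_as_row(source):
    """Read one two-column log and return it as a single-row DataFrame."""
    df = pd.read_csv(source, sep="\t", names=["Column", "Value"])
    # Use the keys as the index, then transpose so each key becomes a column.
    return df.set_index("Column")[["Value"]].T


rows = [read_log_as_row(io.StringIO(text)) for text in sample_logs]
# Stack the single-row frames; ignore_index renumbers the rows 0, 1, ...
logdf = pd.concat(rows, ignore_index=True).rename_axis(columns=None)
print(logdf)
```

With real files, `read_log_as_row(logfile)` could be called on each path returned by `glob.glob`, giving one row per machine.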