Reading and parsing CSV files [python3.6]



I am very new to Python. I am trying to combine 2 CSV files into one, selecting specific rows and columns.

CSV1:

Host, Time Up, Time Down, Time Unreachable, Time Undetermined
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000

CSV2:

Host,Service, Time OK, Time Warning, Time Unknown, Time Critical, Time Undetermined
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000

Here is the code that merges the two files:

import csv
import itertools as IT

filenames = ['csv1.csv', 'csv2.csv']
# In Python 3 the csv module expects text-mode files opened with newline=''
handles = [open(filename, newline='') for filename in filenames]
readers = [csv.reader(f, delimiter=',') for f in handles]
with open('combined.csv', 'w', newline='') as h:
    writer = csv.writer(h, delimiter=',', lineterminator='\n')
    # zip_longest pairs rows by position and pads the shorter file
    for rows in IT.zip_longest(*readers, fillvalue=['']*3):
        combined_row = []
        for row in rows:
            row = row[:3]  # select the columns you want
            if len(row) == 3:
                combined_row.extend(row)
            else:
                combined_row.extend(['']*3)
        writer.writerow(combined_row)
for f in handles:
    f.close()

This merge produced the following output:

Host, Time Up, Time Down,Host,Service, Time OK
server1.test.com:1717,100.000% (100.000%),0.000%      (0.000%),server1.test.com:1717,application_availability_check,100.000% (100.000%)
server2.test.com:1717,100.000% (100.000%),0.000%   (0.000%),,server_hit_rate,100.000% (100.000%)
Average,100.000% (100.000%),0.000% (0.000%),,max_hit_rate,100.000% (100.000%)
,,,,application_log_check,100.000% (100.000%)
,,,,application_sessions_check,100.000% (100.000%)
,,,server2.test.com:1717,application_availability_check,100.000%   (100.000%)
,,,,server_hit_rate,100.000% (100.000%)
,,,,max_hit_rate,100.000% (100.000%)
,,,,application_log_check,100.000% (100.000%)
,,,,application_sessions_check,100.000% (100.000%)
,,,Average,100.000% (100.000%),0.000% (0.000%)

But here I only want to extract the following, taken from csv1 and csv2:

Host, Time Up, Time Down,Service, Time OK
server1.test.com:1717,100.000% (100.000%),0.000%   (0.000%),application_availability_check,100.000% (100.000%)
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),application_availability_check,100.000% (100.000%)

Is there any way to do this?
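One possible standard-library approach, offered as a sketch rather than a definitive answer: instead of pairing rows by position, look up each csv2 row's Host in a dictionary built from csv1. The function name `combine` and the in-memory sample strings are hypothetical, the sample data is a trimmed copy of the files above, and the sketch assumes that only the first service row of each host carries a Host value.

```python
import csv
import io

def combine(host_file, service_file):
    """Join the two reports on Host, keeping only service rows that name a host."""
    host_rows = csv.reader(host_file, skipinitialspace=True)
    next(host_rows)  # skip the header line
    # Map host -> [Time Up, Time Down]; the trailing Average row is left out
    uptime = {row[0]: row[1:3] for row in host_rows if row and row[0] != 'Average'}

    out = [['Host', 'Time Up', 'Time Down', 'Service', 'Time OK']]
    service_rows = csv.reader(service_file, skipinitialspace=True)
    next(service_rows)  # skip the header line
    for row in service_rows:
        # Continuation rows have an empty Host and Average is not in `uptime`,
        # so neither kind of row passes this test
        if row and row[0] in uptime:
            out.append([row[0], *uptime[row[0]], row[1], row[2]])
    return out

# Trimmed in-memory copies of the two files (assumption: same layout as above)
HOSTS_CSV = """Host, Time Up, Time Down, Time Unreachable, Time Undetermined
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
"""
SERVICES_CSV = """Host,Service, Time OK, Time Warning, Time Unknown, Time Critical, Time Undetermined
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
"""

rows = combine(io.StringIO(HOSTS_CSV), io.StringIO(SERVICES_CSV))
```

With real files, the two `io.StringIO` objects would be replaced by `open('csv1.csv', newline='')` and `open('csv2.csv', newline='')`, and `rows` could be passed to `csv.writer(...).writerows`.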

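import pandas as pd

# skipfooter drops the trailing "Average" row and is only supported by the Python parser engine;
# skipinitialspace strips the space after each comma, so " Time Up" becomes "Time Up"
df = pd.read_csv('csv1.csv', skipfooter=1, engine='python', skipinitialspace=True)
df2 = pd.read_csv('csv2.csv', skipfooter=1, engine='python', skipinitialspace=True)

# csv1 holds Time Up / Time Down, csv2 holds Service / Time OK
combined = pd.merge(df[['Host', 'Time Up', 'Time Down']], df2[['Host', 'Service', 'Time OK']], on='Host')
# Optional: keep only the first percentage; skip these three lines to keep the full values
combined['Time OK'] = combined['Time OK'].apply(lambda x: x.split('(')[0].strip())
combined['Time Up'] = combined['Time Up'].apply(lambda x: x.split('(')[0].strip())
combined['Time Down'] = combined['Time Down'].apply(lambda x: x.split('(')[0].strip())

combined.to_csv('combined.csv', index=False)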

You should be able to solve this easily with pandas. Is that an option for you?

Output:

,Host, Time Up, Time Down,Service, Time OK
0,server1.test.com:1717,100.000% (100.000%),0.000%  (0.000%),application_availability_check,100.000% (100.000%)
1,server2.test.com:1717,100.000% (100.000%),0.000%  (0.000%),application_availability_check,100.000% (100.000%)
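To try the pandas join without touching files on disk, here is a minimal sketch that runs it against in-memory, trimmed copies of the two reports. Some assumptions are made here that the original answer does not state: the Average row is removed with an explicit filter instead of `skipfooter`, `skipinitialspace=True` is used to strip the space after each comma in the header, and the names `hosts`, `services`, and `load` are hypothetical.

```python
import io
import pandas as pd

# Trimmed in-memory copies of csv1 and csv2 (assumption: same layout as the real files)
CSV1 = """Host, Time Up, Time Down, Time Unreachable, Time Undetermined
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
"""
CSV2 = """Host,Service, Time OK, Time Warning, Time Unknown, Time Critical, Time Undetermined
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
"""

def load(text):
    """Read one report and drop its summary row."""
    frame = pd.read_csv(io.StringIO(text), skipinitialspace=True)
    return frame[frame['Host'] != 'Average']

hosts = load(CSV1)
services = load(CSV2)

# Rows with a blank Host are read as NaN and have no partner in `hosts`,
# so the default inner merge keeps only the first service row of each host
combined = pd.merge(
    hosts[['Host', 'Time Up', 'Time Down']],
    services[['Host', 'Service', 'Time OK']],
    on='Host',
)
print(combined.to_csv(index=False))
```

Because the join is keyed on Host rather than on row position, the continuation rows in csv2 drop out on their own, which is the row selection the question asks for.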
