I have multiple CSV files with the same column headers that look like this:
| Date & Time | Rain | Flow |
| --------------------- | ----- | ---------- |
| 3/19/2018 12:00 | 0 | 0.51 |
| 3/19/2018 13:00 | 2 | 0.51 |
...
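I want to take the "Flow" column from each CSV and place the columns side by side, aligned by date. The problem is that the Date & Time values differ between CSVs, so I need to line the columns up on date. Where a file has no value for a given date after merging, I'd like to leave a blank or NaN.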
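I created a new DataFrame whose date range covers all the dates found in the list of CSVs, but I can't get the columns to merge into it correctly.
The final DataFrame should look like this: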
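Rather than hard-coding `periods=45985`, the hourly range can be derived from the files themselves. A minimal sketch, assuming every file parses cleanly with `parse_dates` and that hourly spacing is what's wanted; the in-memory `io.StringIO` files and their contents are made-up stand-ins for the real paths returned by `glob.glob`:

```python
import io

import pandas as pd

# Hypothetical stand-ins for the real CSV files (illustrative values only).
sources = {
    "CSV 1": io.StringIO("Date & Time,Rain,Flow\n3/19/2018 12:00,0,0.51\n3/19/2018 13:00,2,0.51\n"),
    "CSV 2": io.StringIO("Date & Time,Rain,Flow\n3/19/2018 15:00,1,0.47\n"),
}

# Read only the timestamp and Flow columns from each file.
frames = {
    name: pd.read_csv(src, parse_dates=["Date & Time"], usecols=["Date & Time", "Flow"])
    for name, src in sources.items()
}

# Earliest and latest timestamps across all files.
start = min(df["Date & Time"].min() for df in frames.values())
end = max(df["Date & Time"].max() for df in frames.values())

# Hourly grid spanning every file, instead of a fixed number of periods.
full_range = pd.date_range(start, end, freq="h")
df_full = pd.DataFrame({"Date & Time": full_range})
print(df_full)
```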
| Date & Time | CSV 1 Flow | CSV 2 Flow | CSV 3 Flow |
| --------------------- | ---------------- | ---------------- | ---------------- |
| 3/19/2018 12:00 | 0.51 | NaN | 0.34 |
| 3/19/2018 13:00 | 0.51 | NaN | 0.47 |
...
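For two files, that shape comes from an outer merge on `Date & Time` after renaming each file's `Flow` column. A small sketch with hypothetical in-memory files; the `CSV 1 Flow`/`CSV 2 Flow` labels follow the table above, and the readings are illustrative:

```python
import io

import pandas as pd

# Two hypothetical files whose timestamps only partly overlap.
csv1 = io.StringIO("Date & Time,Rain,Flow\n3/19/2018 12:00,0,0.51\n3/19/2018 13:00,2,0.51\n")
csv2 = io.StringIO("Date & Time,Rain,Flow\n3/19/2018 13:00,1,0.47\n3/19/2018 14:00,0,0.40\n")

cols = ["Date & Time", "Flow"]
# Rename Flow after its source so the merged columns stay distinguishable.
df1 = pd.read_csv(csv1, parse_dates=["Date & Time"], usecols=cols).rename(columns={"Flow": "CSV 1 Flow"})
df2 = pd.read_csv(csv2, parse_dates=["Date & Time"], usecols=cols).rename(columns={"Flow": "CSV 2 Flow"})

# how="outer" keeps every timestamp found in either file; missing readings become NaN.
aligned = df1.merge(df2, on="Date & Time", how="outer")
print(aligned)
```

`how="outer"` takes the union of the keys (and sorts them); `how="left"` would keep only the first file's timestamps.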
Here is what I've tried so far:

```python
import glob
import os

import pandas as pd

csv_files = glob.glob(os.path.join(pwd, "*.csv"))
range = pd.date_range('2017-01-01', periods=45985, freq='H')
df_full = pd.DataFrame({'Date & Time': range})
for j in csv_files:
    df_full[j] = ''
    df_hourly = pd.read_csv(j, usecols=['Date & Time', 'Flow'])
    df_merged = pd.merge(df_full, df_hourly, on='Date & Time', how='left')
```
I've since got the code to look like this:

```python
range = pd.date_range('2017-01-01', periods=45985, freq='H')
df_full = pd.DataFrame({'Date & Time': range})
for filename in csv_files:
    df_full[filename] = ''
    df = pd.read_csv(filename, header=0, parse_dates=['Date & Time'],
                     usecols=['Date & Time', 'Flow'])
    df_combined = pd.merge(left=df_full, right=df, on='Date & Time', how='outer')
df_combined
```
The output DataFrame looks like this:
| Date & Time | CSV 1 Filepath | CSV 2 Filepath | ... | Flow |
| --------------------- | ---------------- | ---------------- | --- | ------- |
| 01/01/2017 00:00 | BLANK | BLANK | ... | 0.34 |
| 01/01/2017 01:00 | BLANK | BLANK | ... | 0.25 |
...
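The whole table is blank except for the last column, labeled "Flow". The script doesn't seem to put the values in the correct columns.
Try something like this: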
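Two things in the loop above account for that output: `df_full[filename] = ''` adds empty-string columns that nothing ever fills, and `df_combined` is rebuilt from the unmerged `df_full` on every pass, so only the last file's `Flow` survives. A sketch of the same full-range approach that renames `Flow` per file and assigns each merge back to the accumulating frame; the file names and contents are hypothetical in-memory stand-ins:

```python
import io

import pandas as pd

# Hypothetical file names and contents standing in for csv_files.
csv_files = {
    "site_a.csv": "Date & Time,Rain,Flow\n3/19/2018 12:00,0,0.51\n3/19/2018 13:00,2,0.51\n",
    "site_b.csv": "Date & Time,Rain,Flow\n3/19/2018 13:00,1,0.47\n",
}

# Full hourly grid the files are aligned against.
df_full = pd.DataFrame({"Date & Time": pd.date_range("2018-03-19 12:00", periods=3, freq="h")})

for filename, text in csv_files.items():
    df = pd.read_csv(io.StringIO(text), parse_dates=["Date & Time"], usecols=["Date & Time", "Flow"])
    # Name the Flow column after the file instead of pre-creating a blank column.
    df = df.rename(columns={"Flow": filename})
    # Assign the result back so earlier files are kept; unmatched hours become NaN.
    df_full = df_full.merge(df, on="Date & Time", how="left")

print(df_full)
```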
```python
import pandas as pd

df1 = pd.read_csv('example.csv', parse_dates=['Date & Time'])
df2 = pd.read_csv('example.csv', parse_dates=['Date & Time'])
df_all = df1.merge(df2, on='Date & Time', how='left')
print(df_all)
```
Output:

```
          Date & Time  Rain_x  Flow_x  Rain_y  Flow_y
0 2018-03-19 12:00:00       0    0.51       0    0.51
1 2018-03-19 13:00:00       2    0.51       2    0.51
```
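The `_x`/`_y` endings come from `merge`'s default `suffixes=('_x', '_y')`, which is applied to every non-key column present in both frames. A sketch passing custom suffixes instead; the inline CSV text reuses the two sample rows from the question, and the suffix strings are arbitrary choices:

```python
import io

import pandas as pd

text = "Date & Time,Rain,Flow\n3/19/2018 12:00,0,0.51\n3/19/2018 13:00,2,0.51\n"
df1 = pd.read_csv(io.StringIO(text), parse_dates=["Date & Time"])
df2 = pd.read_csv(io.StringIO(text), parse_dates=["Date & Time"])

# suffixes replaces the default ("_x", "_y") on overlapping column names.
df_all = df1.merge(df2, on="Date & Time", how="left", suffixes=(" CSV 1", " CSV 2"))
print(df_all.columns.tolist())
```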
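Your loop would probably look something like this:

```python
import glob
import os
import pandas as pd

csv_files = glob.glob(os.path.join(pwd, "*.csv"))
# Name each Flow column after its file so repeated merges don't clash (Flow_x, Flow_y, ...).
df_all = pd.read_csv(csv_files[0], parse_dates=['Date & Time'], usecols=['Date & Time', 'Flow']).rename(columns={'Flow': csv_files[0]})
for file in csv_files[1:]:
    df = pd.read_csv(file, parse_dates=['Date & Time'], usecols=['Date & Time', 'Flow']).rename(columns={'Flow': file})
    df_all = df_all.merge(df, on='Date & Time', how='outer')  # 'outer' keeps every date; gaps become NaN
```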
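An alternative that avoids chaining merges is to index each file's `Flow` by timestamp and let `pd.concat(..., axis=1)` align them, which performs an outer join on the index by default. A sketch with hypothetical in-memory files standing in for the `glob` results; it assumes no file repeats a timestamp, since `concat` generally can't align a non-unique index:

```python
import io

import pandas as pd

# Hypothetical stand-ins for the files found by glob.glob(...).
csv_files = {
    "CSV 1": "Date & Time,Rain,Flow\n3/19/2018 12:00,0,0.51\n3/19/2018 13:00,2,0.51\n",
    "CSV 2": "Date & Time,Rain,Flow\n3/19/2018 14:00,1,0.40\n",
    "CSV 3": "Date & Time,Rain,Flow\n3/19/2018 12:00,0,0.34\n3/19/2018 13:00,0,0.47\n",
}

flows = []
for name, text in csv_files.items():
    df = pd.read_csv(io.StringIO(text), parse_dates=["Date & Time"], usecols=["Date & Time", "Flow"])
    # One Series per file: indexed by timestamp, named after its source.
    flows.append(df.set_index("Date & Time")["Flow"].rename(f"{name} Flow"))

# axis=1 aligns the Series on their timestamps (outer join); gaps become NaN.
df_all = pd.concat(flows, axis=1).sort_index().rename_axis("Date & Time").reset_index()
print(df_all)
```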