I have many CSV files, named file1_OUT.CSV, file2_OUT.CSV […] file52_OUT.CSV, and so on. The contents of each CSV file look like this:
| header1 | header2 |
| ---------- | ------- |
| 0.0000E+00 | ax |
| 1.0000E+00 | ay |
| 2.0000E+02 | bx |
| 3.0000E+03 | by |
| 4.0000E+03 | cx |
| 4.0000E+01 | cy |
| 0.0000E+00 | dx |
| 0.0000E+00 | dy |
From these files, I want to create 8 dictionaries (ax, ay, bx, by, cx, cy, dx, dy) that look like this:
ax = {'file1': 0.0000E+00, 'file2': 5.0000E+00, 'file3': 2.0000E+00 ... }
ay = {'file1': 1.0000E+00, 'file2': 0.0000E+00, 'file3': 3.0000E+00 ... }
bx = {...}
by = {...}
...
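The numbers in the dictionaries come from the column named header1.
I'm new to Python, but I managed to extract the values of ax, ay, etc. with the following code: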
import os, re, csv, glob
import pandas as pd
import numpy as np
from pathlib import Path
from os import listdir
for file in Path(directory).glob('*_OUT.csv'):
    with open(file, mode='r') as inp:
        # read the current file into a DataFrame
        df = pd.read_csv(inp)
        ax = df['header1'][0]
        ay = df['header1'][1]
        bx = df['header1'][2]
        by = df['header1'][3]
        cx = df['header1'][4]
        cy = df['header1'][5]
        dx = df['header1'][6]
        dy = df['header1'][7]
        print(ax, ay, bx, by, cx, cy, dx, dy)
Unfortunately, the variables are called ax, ay, ... for every file, so I guess they are overwritten on each iteration.
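A minimal sketch of that overwriting, using made-up values rather than real file contents (the names `first_rows` and `ax_by_file` are hypothetical and only serve the illustration):

```python
# Hypothetical header1 values from row 0 of two files (made up for illustration)
first_rows = {'file1': 0.0, 'file2': 5.0}

for name, value in first_rows.items():
    ax = value  # rebinding ax drops the value from the previous file

print(ax)  # 5.0, only the last file survives

# Keying by file name keeps every value instead
ax_by_file = {}
for name, value in first_rows.items():
    ax_by_file[name] = value

print(ax_by_file)  # {'file1': 0.0, 'file2': 5.0}
```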
In addition, I can extract the file names into a list with the following code:
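files_dir = listdir(directory)
new_list = []
for names in files_dir:
    if names.endswith("_OUT.csv"):
        # splitext drops only the extension; str.strip('.csv') would remove
        # any leading/trailing '.', 'c', 's' or 'v' characters instead
        new_list.append(os.path.splitext(names)[0])
print(new_list)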
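I'm not sure how useful my attempts are, because I can't combine the ax, ay, bx, ... values, the list of file names, and dictionaries named after the labels (i.e. the second column of my input CSV files). Does anyone have a better idea?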
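Assuming your Python script is in the same directory as the CSV files, you should be able to do something like this. It returns a nested dictionary structure, with the header2 values as the keys in each case.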
import csv
import os
# gets the current working directory (where the script is run from) - if necessary, hard-code the directory with the CSVs and use it in place of cwd below
cwd = os.getcwd()
# list comprehension to build list of target CSV files in directory (matches .csv and .CSV)
target_files = [file for file in os.listdir(cwd) if file.lower().endswith('.csv')]
# create output dictionary
file_summaries = {}
# loop over files in target list
for filename in target_files:
    # open each file (joined with cwd so a hard-coded directory also works)
    with open(os.path.join(cwd, filename), 'r', newline='') as file_in:
        # pass file to built-in csv.DictReader - yields one dictionary per row
        file_data = csv.DictReader(file_in)
        # loop over each file's data - and set default info into file summary object
        for row in file_data:
            file_summaries.setdefault(row['header2'], {})
            file_summaries[row['header2']].setdefault(filename, row['header1'])
# display result
print(file_summaries)
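The values above are stored as strings and the keys are full file names. If you want floats keyed by 'file1', 'file2', ... as in the question, a variation along these lines might work. It is only a sketch: it assumes the files are comma-delimited with a header1,header2 header row, the function name `collect_by_label` is made up, and the demo writes two shortened sample files to a temporary directory instead of reading your real data.

```python
import csv
import re
import tempfile
from pathlib import Path


def collect_by_label(directory):
    """Return {label: {short_file_name: value}} for every *_OUT.csv file."""
    summaries = {}
    for path in sorted(Path(directory).iterdir()):
        # Match the suffix case-insensitively: file1_OUT.CSV, file2_OUT.csv, ...
        match = re.fullmatch(r"(.+)_OUT\.csv", path.name, flags=re.IGNORECASE)
        if match is None:
            continue
        short_name = match.group(1)  # 'file1_OUT.CSV' -> 'file1'
        with open(path, newline="") as file_in:
            for row in csv.DictReader(file_in):
                # float() understands scientific notation such as '2.0000E+02'
                value = float(row["header1"])
                summaries.setdefault(row["header2"], {})[short_name] = value
    return summaries


# Demo on two small sample files written to a temporary directory.
# The rows are shortened, illustrative data, not the real files.
with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "file1_OUT.CSV").write_text(
        "header1,header2\n0.0000E+00,ax\n1.0000E+00,ay\n"
    )
    Path(demo_dir, "file2_OUT.CSV").write_text(
        "header1,header2\n5.0000E+00,ax\n0.0000E+00,ay\n"
    )
    summaries = collect_by_label(demo_dir)

# Pull out individual dictionaries by label
ax = summaries["ax"]
ay = summaries["ay"]
print(ax)  # {'file1': 0.0, 'file2': 5.0}
print(ay)  # {'file1': 1.0, 'file2': 0.0}
```

Keeping everything in one nested dictionary and looking labels up by key avoids creating eight separate variables by hand; with your own data you would call `collect_by_label` on the folder that holds the CSV files.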