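I have files with dates in their names. I want to pick a range of those files from a directory and load them into a pandas DataFrame, without loading all of the files.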
2020-03-01.csv 2020-03-23.csv 2020-04-14.csv 2020-05-06.csv
2020-03-02.csv 2020-03-24.csv 2020-04-15.csv 2020-05-07.csv
2020-03-03.csv 2020-03-25.csv 2020-04-16.csv 2020-05-08.csv
2020-03-04.csv 2020-03-26.csv 2020-04-17.csv 2020-05-10.csv
2020-03-05.csv 2020-03-27.csv 2020-04-19.csv 2020-05-11.csv
2020-03-06.csv 2020-03-29.csv 2020-04-20.csv 2020-05-12.csv
2020-03-08.csv 2020-03-30.csv 2020-04-21.csv 2020-05-13.csv
2020-03-09.csv 2020-03-31.csv 2020-04-22.csv 2020-05-14.csv
2020-03-10.csv 2020-04-01.csv 2020-04-23.csv 2020-05-15.csv
2020-03-11.csv
I only want the files that fall between a start date and an end date, with the start date passed as sys.argv[1] and the end date as sys.argv[2],
i.e. scriptname 2020-03-01 2020-03-20
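Assuming your script is in the same directory as the CSV files, that all CSV files are named in the same YYYY-MM-DD.csv format with monotonically increasing dates, and that you have made the script executable (chmod +x script.py), the following script.py should work: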
#!/usr/bin/env python3
import os
import sys

import pandas as pd

filepath = "/path/to/csvfiles/"  # TODO: change to the directory that holds the CSV files

# Read the first and last file names of the range from the command line.
try:
    first_file = sys.argv[1]
    second_file = sys.argv[2]
except IndexError:
    print("An error occurred. Enter two file names separated by a space, e.g. YYYY-MM-D1.csv YYYY-MM-D2.csv")
    sys.exit(1)

# For YYYY-MM-DD.csv names, sorting lexicographically also sorts by date.
csv_files = [csv_file for csv_file in os.listdir(filepath) if csv_file.endswith('.csv')]
ordered_files = sorted(csv_files)

# Both endpoints must exist in the directory, otherwise index() raises ValueError.
try:
    index_first_file = ordered_files.index(first_file)
    index_second_file = ordered_files.index(second_file)
except ValueError:
    print("One of the csv files was not found in the directory.")
    sys.exit(1)

# Load only the files in the requested range, one DataFrame per file.
df_list = [pd.read_csv(os.path.join(filepath, name))
           for name in ordered_files[index_first_file:index_second_file + 1]]
print(df_list)
To use it, simply run, for example, ./script.py 2020-03-01.csv 2020-03-02.csv
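The script above needs both endpoint files to exist, and the listing in the question has gaps (there is no 2020-03-07.csv, for example). A possible variation, shown here only as a sketch, compares parsed dates instead of looking up file names, so the start and end dates can be passed as plain YYYY-MM-DD strings as in the question. The function name select_files and its parameters are hypothetical and not part of the original script.

```python
import os
from datetime import datetime


def select_files(directory, start, end):
    """Return paths of YYYY-MM-DD.csv files in `directory` dated within [start, end].

    `start` and `end` are YYYY-MM-DD strings; neither needs to match an existing file.
    (Hypothetical helper, written for illustration only.)
    """
    start_date = datetime.strptime(start, "%Y-%m-%d").date()
    end_date = datetime.strptime(end, "%Y-%m-%d").date()

    dated = []
    for name in os.listdir(directory):
        stem, ext = os.path.splitext(name)
        if ext != ".csv":
            continue
        try:
            file_date = datetime.strptime(stem, "%Y-%m-%d").date()
        except ValueError:
            continue  # skip CSV files whose name is not a date
        if start_date <= file_date <= end_date:
            dated.append((file_date, os.path.join(directory, name)))

    # Sort chronologically and return only the paths.
    return [path for _, path in sorted(dated)]


# Possible use from the command line (assumes pandas is installed):
#   import sys, pandas as pd
#   paths = select_files("/path/to/csvfiles/", sys.argv[1], sys.argv[2])
#   df = pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)
```

Combining the frames with pd.concat gives a single DataFrame rather than a list, which is closer to what the question asks for. Note that pd.concat raises an error when no file falls in the range, so an empty result should be checked first.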