Load only the files in a date range into a pandas DataFrame from the command line



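I have files with dates in their names. I want to pick a range of those files from a directory and load them into a pandas DataFrame, without loading all the files.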

2020-03-01.csv  2020-03-23.csv  2020-04-14.csv  2020-05-06.csv
2020-03-02.csv  2020-03-24.csv  2020-04-15.csv  2020-05-07.csv
2020-03-03.csv  2020-03-25.csv  2020-04-16.csv  2020-05-08.csv
2020-03-04.csv  2020-03-26.csv  2020-04-17.csv  2020-05-10.csv
2020-03-05.csv  2020-03-27.csv  2020-04-19.csv  2020-05-11.csv
2020-03-06.csv  2020-03-29.csv  2020-04-20.csv  2020-05-12.csv
2020-03-08.csv  2020-03-30.csv  2020-04-21.csv  2020-05-13.csv
2020-03-09.csv  2020-03-31.csv  2020-04-22.csv  2020-05-14.csv
2020-03-10.csv  2020-04-01.csv  2020-04-23.csv  2020-05-15.csv
2020-03-11.csv

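I only want the files whose dates fall between a start date and an end date, passing the start date as sys.argv[1] and the end date as sys.argv[2], i.e. scriptname 2020-03-01 2020-03-20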

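Assuming your script is in the same directory as the CSV files (and all CSV files are named in the same YYYY-MM-DD.csv format, so that sorting the names also sorts them by date), and that you have made the script executable (chmod +x script.py), the following script.py should work: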

#!/usr/bin/env python3
import os
import sys
import pandas as pd

filepath = "/path/to/csvfiles/"  # [TODO] change to the path where the CSV files are located.

# Read the first and last file names from the command line.
try:
    first_file = sys.argv[1]
    second_file = sys.argv[2]
except IndexError:
    print("An error occurred. Enter two file names separated by a space, in the format YYYY-MM-DD.csv YYYY-MM-DD.csv")
    sys.exit(1)

# Sorting the names lexicographically also sorts them by date.
csv_files = [csv_file for csv_file in os.listdir(filepath) if csv_file.endswith('.csv')]
ordered_files = sorted(csv_files)

# Locate both endpoints in the sorted list; both files must exist.
try:
    index_first_file = ordered_files.index(first_file)
    index_second_file = ordered_files.index(second_file)
except ValueError:
    print("One of the csv files was not found in the directory.")
    sys.exit(1)

# Load every file between the two endpoints, inclusive.
df_list = [pd.read_csv(os.path.join(filepath, ordered_files[i])) for i in range(index_first_file, index_second_file + 1)]
print(df_list)

To use it, just run, for example: ./script.py 2020-03-01.csv 2020-03-02.csv
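
The script above takes file names, and both must exist in the directory. The question asks for plain dates instead, and the listing has gaps (there is no 2020-03-07.csv, for example). A possible variant is to parse the date out of each file name and compare it with the two endpoints. The sketch below is only an illustration under the same YYYY-MM-DD.csv naming assumption; the helper names files_in_range and load_range are made up for this example, and concatenating everything into one DataFrame assumes the files share the same columns.

```python
import os
from datetime import date


def files_in_range(directory, start, end):
    """Return sorted paths of YYYY-MM-DD.csv files dated within [start, end]."""
    start_date = date.fromisoformat(start)
    end_date = date.fromisoformat(end)
    selected = []
    for name in sorted(os.listdir(directory)):
        stem, ext = os.path.splitext(name)
        if ext != ".csv":
            continue
        try:
            file_date = date.fromisoformat(stem)
        except ValueError:
            continue  # skip CSV files whose names are not valid dates
        if start_date <= file_date <= end_date:
            selected.append(os.path.join(directory, name))
    return selected


def load_range(directory, start, end):
    """Read the selected files and concatenate them into one DataFrame."""
    import pandas as pd  # imported here so files_in_range needs only the stdlib

    paths = files_in_range(directory, start, end)
    if not paths:
        raise FileNotFoundError(f"no CSV files between {start} and {end}")
    # Assumption: all files have the same columns, so they can be stacked.
    frames = [pd.read_csv(path) for path in paths]
    return pd.concat(frames, ignore_index=True)
```

In a script, calling load_range(".", sys.argv[1], sys.argv[2]) would then allow ./script.py 2020-03-01 2020-03-20, and dates with no matching file are simply skipped.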
