How to read and index dynamically generated files in Python

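How can we read and index dynamically generated files from a source folder in Python, and, when the code is refreshed, append the newly added or unread files in the folder to the index?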

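An automation tool continuously drops files (e.g. xlsx) into a source folder, and a Python program then reads all the files in that folder and plots graphs. To improve performance, we plan not to re-read every file after the code/application is refreshed, but to append only the unread files to the index.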

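The index can be a local variable or table holding information about the input files, such as which files have already been loaded or read, so that the system knows which files to read now and which have already been read. The idea is to read each file only once instead of re-reading all files after every refresh.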

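The following code gives you a list of the new file names and keeps an index of them.

The variables are used as follows:

bag_of_files: list of the file names that have already been processed
curr_files: list of the file names currently in the source folder
new_files: list of the file names you are interested in, i.e. those not yet processed

The first time you run this code, bag_of_files is empty.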

import os

curr_dir = "D:/2018/Address Matching/Data/Statewise Output/"
bag_of_files = []  # Comment out this line after the first run
curr_files = os.listdir(curr_dir)  # everything currently in the source folder
new_files = []
for file in curr_files:
    if file not in bag_of_files:
        # Not seen before: report it and record it in the index
        new_files.append(file)
        bag_of_files.append(file)
new_files

Output:

['AP Output.csv',
'Delhi Output.csv',
'Gujrat Output.csv',
'Haryana Output.csv',
'Jharkhand Output V1.csv',
'Jharkhand Output V1.xlsx',
'Jharkhand Output.csv',
'Karnataka Output.csv']

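On every later run, use the code below. The only difference is the bag_of_files line, which is now commented out so that the list built by the previous run is reused (this works only while the same Python session, for example a notebook kernel, is still alive). Before each run I added some new files to the same folder.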

curr_dir = "D:/2018/Address Matching/Data/Statewise Output/"
#bag_of_files = []  # Comment out this line after the first run
curr_files = os.listdir(curr_dir)
new_files = []
for file in curr_files:
    if file not in bag_of_files:
        # Only files missing from the existing index are reported
        new_files.append(file)
        bag_of_files.append(file)
new_files

Output:

['Maharashtra Output.csv',
'MP Output.csv',
'Punjab Output.csv',
'Rajsthan Output.csv']

Run it again :)

Output:

['Bihar Output.csv',
'Tamilnadu Output.csv',
'Telangana Output.csv',
'WB Output.csv']
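
Commenting out the bag_of_files line only works while the interpreter session stays alive. If the index has to survive a restart, one option is to keep it in a small file. The following is a minimal sketch under that assumption, not part of the original answer: the function names (load_index, save_index, find_new_files) and the JSON index file are hypothetical, and the index file is assumed to live outside the source folder so that it is not listed as an input file itself.

```python
import json
import os


def load_index(index_path):
    """Return the set of file names recorded so far (empty on the first run)."""
    if not os.path.exists(index_path):
        return set()
    with open(index_path, "r", encoding="utf-8") as fh:
        return set(json.load(fh))


def save_index(index_path, seen):
    """Write the set of seen file names back to disk as a JSON list."""
    with open(index_path, "w", encoding="utf-8") as fh:
        json.dump(sorted(seen), fh)


def find_new_files(source_dir, index_path):
    """Return names in source_dir not yet in the index, then record them."""
    seen = load_index(index_path)
    new_files = sorted(name for name in os.listdir(source_dir) if name not in seen)
    seen.update(new_files)
    save_index(index_path, seen)
    return new_files
```

Calling find_new_files(curr_dir, "some/path/outside/the/folder/index.json") on each refresh would then return only the files that were added since the previous call, with no line to comment out between runs.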

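To keep the answer simple: you can use os.listdir() to monitor the directory contents. To detect modified files that the program has already indexed, use os.stat() to check the modification times of those files.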
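
As a rough illustration of combining the two calls, the sketch below keeps a dictionary that maps each file name to the modification time last seen for it. The function name scan_changes and the shape of the index are assumptions made for this example; the only library calls used are os.listdir(), os.path.isfile() and os.stat() with its st_mtime field.

```python
import os


def scan_changes(source_dir, index):
    """Compare source_dir against index (file name -> last seen st_mtime).

    Returns (new_files, modified_files) and updates index in place.
    """
    new_files, modified_files = [], []
    for name in sorted(os.listdir(source_dir)):
        path = os.path.join(source_dir, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories
        mtime = os.stat(path).st_mtime
        if name not in index:
            new_files.append(name)  # never indexed before
        elif mtime != index[name]:
            modified_files.append(name)  # indexed, but changed since then
        index[name] = mtime
    return new_files, modified_files
```

Starting with index = {} and calling scan_changes(curr_dir, index) after every refresh should report each file once as new, and again later only if its modification time changes. A file that is still being written by the automation tool may show up before it is complete, so a real program might want to wait until the modification time stops changing.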
