Extracting multiple Excel files from different paths in Python while also unzipping them in different locations

I have 30 folders (from 2021-06-01 to 2022-06-30), and each folder contains 15 Excel files. Currently I am running this code 30 times, once per folder:

file1= glob.glob('C:/Users/Dell/Downloads/2021-06-01/*')

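This gives me file1, and I then run a data-processing operation for that folder (going through its 15 Excel files). That way I end up with file1 to file30 and can use concat to build a single file. Is there a way to automate this, since I don't want to run the operation 30 separate times? I don't know how to write a loop that picks up files from different paths.

My data is also zipped inside the folders (from 2021-06-01 to 2022-06-30), so going into each one, unzipping it, and then running the operation file by file is tedious. How can I achieve both goals more easily? I found solutions for the unzipping step by searching, but I can't get that to work while also achieving the other goal I mentioned (walking through the different folders, extracting one at a time, and producing file1 to file30). My directory looks like this: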

- download
  - month
    - 2021-01-01
      - AA
        - file.zip
          - a list of .xlsx files
      - BB
      - CC
    - 2021-01-02
      - AA
        - file.zip
          - a list of .xlsx files
      - BB
      - CC
    ...
    - 2021-01-30

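I don't want to simply concatenate these xlsx files. I want to run a certain operation on the Excel files one by one and then concatenate the results, but I haven't been able to do that.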

Here is a Python script that should work for you:

import os
import shutil
import time
import pandas as pd

def read_csv_or_excel(root, f):
    # Load one .csv (tab-separated) or .xlsx file from the folder `root`.
    path = os.path.join(root, f)
    if f.endswith(".csv"):
        return pd.read_csv(path, sep="\t")
    return pd.read_excel(path)

for root, dirs, files in os.walk("./questions/69878352/"):
    # print(root, dirs, files)
    # Only process folders whose name looks like a date, e.g. 2021-06-01.
    if os.path.basename(root).startswith("20"):
        print(root)
        appended = []  # base names already read, so no file is read twice
        dfs = []
        for f in files:
            stem = os.path.splitext(f)[0]
            if stem in appended:
                continue
            print(f)
            if f.endswith(".csv") or f.endswith(".xlsx"):
                dfs.append(read_csv_or_excel(root, f))
            elif f.endswith(".zip"):
                shutil.unpack_archive(os.path.join(root, f), root)
                time.sleep(0.5)
                f = f"{stem}.xlsx"  # ← this assumes any zipped files will be Excel files...
                dfs.append(read_csv_or_excel(root, f))
            else:
                continue
            appended.append(stem)
        if dfs:
            # Write one combined workbook per date folder, next to that folder.
            pd.concat(dfs).to_excel(f"{root}.xlsx")

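Lmk if it doesn't work! My test data wasn't the best, and I would have to spend more time building better test data to be sure it works 100%, so if you run into a problem it may just need a small tweak.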
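Since the question asks for an operation to be run on each workbook before concatenating, here is a minimal sketch of that loop. It assumes the date folders all start with "20" and sit directly under one base folder. The names process, combine_date_folders, and the source_file column are hypothetical placeholders, not part of the original code. Reading .xlsx files with pandas also needs an Excel engine such as openpyxl installed.

```python
from pathlib import Path

import pandas as pd


def process(df, source):
    """Hypothetical per-file operation; replace the body with the real one."""
    out = df.copy()
    out["source_file"] = source.name  # remember which file each row came from
    return out


def combine_date_folders(base, pattern="*.xlsx", reader=pd.read_excel):
    """Process every matching file under each date-named folder in `base`
    and return all results as one concatenated DataFrame."""
    frames = []
    for folder in sorted(Path(base).glob("20*")):
        if not folder.is_dir():
            continue
        # rglob also looks inside subfolders such as AA, BB, CC.
        for path in sorted(folder.rglob(pattern)):
            frames.append(process(reader(path), path))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Calling combine_date_folders("C:/Users/Dell/Downloads") would then replace the 30 separate glob calls, and the result can be saved with to_excel. The reader argument is only there so the same loop can be reused for other formats, for example reader=pd.read_csv with pattern="*.csv".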

You can also try using bash in the terminal:

$ find . -maxdepth 5 -name '*.zip' | parallel unzip # this will unzip everything in one command
$ find . -maxdepth 5 -name '*.xlsx' | parallel # perform whatever operation you want on all the excel files
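If GNU parallel isn't available (for example on Windows), the unzip step can be sketched in Python with the standard library alone. This assumes each archive should be extracted into the folder that contains it. The function name unzip_all is a hypothetical helper, not something from the original answer.

```python
import zipfile
from pathlib import Path


def unzip_all(base):
    """Extract every .zip found under `base` into the folder that holds it.

    Returns the paths of the extracted members.
    """
    extracted = []
    for archive in sorted(Path(base).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
            extracted.extend(archive.parent / name for name in zf.namelist())
    return extracted
```

Running unzip_all("C:/Users/Dell/Downloads") once before reading the Excel files means the processing loop no longer has to deal with archives at all.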
