Iterating over subfolders and converting files from txt to csv



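For my current project, I plan to run through multiple subfolders, each of which contains the files num.txt and sub.txt (with different contents in each folder).

I have tried setting up a loop with for subdir, dirs, files in os.walk(rootdir): around the conversion code that follows. The script runs, but it does not produce any results.

Is there a clever tweak that gets the txt-to-csv conversion working? The code I am currently using is as follows: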

import pandas as pd
import os
# Directory of root folder
rootdir = '/Users/name/SEC'
# Iteration over sub-folders
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        # Conversion from TXT to CSV
        read_file1 = pd.read_csv("num.txt", delimiter="\t", sep=',', error_bad_lines=False, index_col=False, dtype='unicode')
        read_file1.to_csv("df1.csv")
        read_file2 = pd.read_csv("sub.txt", delimiter="\t", sep=',', error_bad_lines=False, index_col=False, dtype='unicode')
        read_file2.to_csv("df2.csv")

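You need to add the subdir path to the file names. You don't need dirs or files for this (so I replaced both with "_"), because the for loop already visits every subdirectory exactly once.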

import pandas as pd
import os
# Directory of root folder
rootdir = '/Users/name/SEC'
# Iteration over sub-folders
for subdir, _, _ in os.walk(rootdir):
    # Conversion from TXT to CSV
    # delimiter is an alias of sep; recent pandas raises if both are passed.
    # on_bad_lines="skip" needs pandas >= 1.3 (older versions: error_bad_lines=False).
    try:
        read_file1 = pd.read_csv(os.path.join(subdir, "num.txt"), delimiter="\t", on_bad_lines="skip", index_col=False, dtype='unicode')
        read_file1.to_csv(os.path.join(subdir, "df1.csv"))
    except FileNotFoundError:
        # This folder has no num.txt; move on
        pass
    try:
        read_file2 = pd.read_csv(os.path.join(subdir, "sub.txt"), delimiter="\t", on_bad_lines="skip", index_col=False, dtype='unicode')
        read_file2.to_csv(os.path.join(subdir, "df2.csv"))
    except FileNotFoundError:
        # This folder has no sub.txt; move on
        pass

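Looping into hidden directories such as ".ipynb_checkpoints" is mostly harmless on Linux, but you can filter them out. When you do a top-down os.walk, you can stop it from descending into a directory by removing that directory from the "dirs" list in place. On Windows you can do something similar using win32api.GetFileAttributes.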

for subdir, dirs, _ in os.walk(rootdir):
    dirs[:] = [name for name in dirs if not name.startswith(".")]
    # ...do the rest...
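To see the pruning in action without touching real data, here is a minimal sketch that builds a throwaway tree and lists the folders os.walk actually visits. The helper name visible_subdirs and the folder names are made up for illustration.

```python
import os
import tempfile

def visible_subdirs(rootdir):
    """Walk rootdir top-down and return the visited folders, skipping dot-directories."""
    visited = []
    for subdir, dirs, _ in os.walk(rootdir):  # topdown=True is the default
        # Slice assignment mutates the list os.walk holds on to;
        # rebinding the name (dirs = [...]) would not prune anything.
        dirs[:] = [name for name in dirs if not name.startswith(".")]
        visited.append(os.path.relpath(subdir, rootdir))
    return sorted(visited)

# Demo on a temporary tree (hypothetical folder names).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "2020q1"))
    os.makedirs(os.path.join(root, ".ipynb_checkpoints", "nested"))
    print(visible_subdirs(root))  # ['.', '2020q1']
```

Because the hidden folder is removed from dirs before os.walk descends, nothing beneath it is visited either.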

You can use pathlib to join paths more compactly. Its Path objects override the division operator (/) to join path segments.

import pandas as pd
import os
from pathlib import Path
# Directory of root folder
rootdir = '/Users/name/SEC'
# Iteration over sub-folders
for subdir, dirs, _ in os.walk(rootdir):
    # filter out hidden
    dirs[:] = [name for name in dirs if not name.startswith(".")]
    subdir = Path(subdir)
    # Conversion from TXT to CSV
    # delimiter is an alias of sep; recent pandas raises if both are passed.
    # on_bad_lines="skip" needs pandas >= 1.3 (older versions: error_bad_lines=False).
    try:
        read_file1 = pd.read_csv(subdir/"num.txt", delimiter="\t", on_bad_lines="skip", index_col=False, dtype='unicode')
        read_file1.to_csv(subdir/"df1.csv")
    except FileNotFoundError:
        pass
    try:
        read_file2 = pd.read_csv(subdir/"sub.txt", delimiter="\t", on_bad_lines="skip", index_col=False, dtype='unicode')
        read_file2.to_csv(subdir/"df2.csv")
    except FileNotFoundError:
        pass
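If you go the pathlib route, Path.rglob can also stand in for os.walk when all you need is the list of input files. This is only a sketch of that alternative, not what the code above does: the helper find_inputs and the sample folder are hypothetical, and unlike the dirs[:] pruning, rglob still descends into hidden folders, so they are filtered out afterwards.

```python
import tempfile
from pathlib import Path

def find_inputs(rootdir, names=("num.txt", "sub.txt")):
    """Return sorted paths of the named files under rootdir, ignoring hidden folders."""
    root = Path(rootdir)
    found = []
    for path in root.rglob("*.txt"):
        # Folder components between rootdir and the file itself
        parent_parts = path.relative_to(root).parts[:-1]
        hidden = any(part.startswith(".") for part in parent_parts)
        if path.name in names and not hidden:
            found.append(path)
    return sorted(found)

# Demo on a temporary tree (hypothetical folder name).
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "2020q1"  # "/" joins path segments
    folder.mkdir()
    (folder / "num.txt").write_text("a\tb\n1\t2\n")
    for path in find_inputs(tmp):
        # The output name sits next to the input, as in the loop above
        print(path.name, "->", path.with_name("df1.csv").name)
```

Each returned Path can be passed straight to pd.read_csv, and path.parent / "df1.csv" gives the matching output location.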

You keep reading and writing the same two files. All you need to do is complete the path you hand to pd.read_csv.

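for subdir, dirs, files in os.walk(rootdir):
    # delimiter is an alias of sep, so only one is passed;
    # on_bad_lines="skip" needs pandas >= 1.3 (older versions: error_bad_lines=False).
    read_file1 = pd.read_csv(os.path.join(subdir, "num.txt"), delimiter="\t",
                             on_bad_lines="skip", index_col=False, dtype='unicode')
    read_file1.to_csv(os.path.join(subdir, "df1.csv"))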
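To make the "same two files" point concrete, here is a small sketch that compares how a bare file name and a joined path resolve on each iteration. The helper candidate_paths and the folder names are made up for illustration; no pandas is involved.

```python
import os
import tempfile

def candidate_paths(rootdir, filename="num.txt"):
    """For each folder os.walk visits, pair the bare name's resolution with the joined path."""
    pairs = []
    for subdir, _, _ in os.walk(rootdir):
        bare = os.path.abspath(filename)         # resolved against the current working directory
        joined = os.path.join(subdir, filename)  # located inside the folder being visited
        pairs.append((bare, joined))
    return pairs

# Demo on a temporary tree (hypothetical folder names).
with tempfile.TemporaryDirectory() as root:
    for name in ("2020q1", "2020q2"):
        os.mkdir(os.path.join(root, name))
    pairs = candidate_paths(root)
    print(len({bare for bare, _ in pairs}))      # 1: the bare name never changes
    print(len({joined for _, joined in pairs}))  # 3: rootdir plus the two subfolders
```

The bare name points at one location in the working directory no matter which folder the loop is in, which is why the original script never touched the files inside the subfolders.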
