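For my current project, I plan to run over a number of sub-folders, each of which contains the files num.txt and sub.txt (with different content in each folder).
I have tried setting up a loop with for subdir, dirs, files in os.walk(rootdir): around the conversion below. The script runs, but it does not produce any results.
Is there a smart tweak to get the file conversion from TXT to CSV working? The code I am currently using is: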
import pandas as pd
import os

# Directory of root folder
rootdir = '/Users/name/SEC'

# Iteration over sub-folders
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        # Conversion from TXT to CSV
        read_file1 = pd.read_csv("num.txt", delimiter="\t", sep=',', error_bad_lines=False, index_col=False, dtype='unicode')
        read_file1.to_csv("df1.csv")
        read_file2 = pd.read_csv("sub.txt", delimiter="\t", sep=',', error_bad_lines=False, index_col=False, dtype='unicode')
        read_file2.to_csv("df2.csv")
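You need to add the subdir path to the file names. There is no need to loop over dirs or files (so I replaced both with "_"), because the for loop already visits each sub-directory exactly once.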
import pandas as pd
import os

# Directory of root folder
rootdir = '/Users/name/SEC'

# Iteration over sub-folders
for subdir, _, _ in os.walk(rootdir):
    # Conversion from TXT to CSV
    # Note: recent pandas versions accept only one of sep/delimiter, and
    # pandas 2.0 replaced error_bad_lines=False with on_bad_lines="skip".
    try:
        read_file1 = pd.read_csv(os.path.join(subdir, "num.txt"), delimiter="\t", sep=',', error_bad_lines=False, index_col=False, dtype='unicode')
        read_file1.to_csv(os.path.join(subdir, "df1.csv"))
    except FileNotFoundError:
        # This folder has no num.txt; move on
        pass
    try:
        read_file2 = pd.read_csv(os.path.join(subdir, "sub.txt"), delimiter="\t", sep=',', error_bad_lines=False, index_col=False, dtype='unicode')
        read_file2.to_csv(os.path.join(subdir, "df2.csv"))
    except FileNotFoundError:
        # This folder has no sub.txt; move on
        pass
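The two try blocks above differ only in their file names, so the same idea can be sketched as one helper. This is a minimal sketch rather than the code from the answer: convert_tree and its names mapping are hypothetical, and it assumes pandas 1.3 or later, which is why it passes only sep="\t" and uses on_bad_lines="skip" in place of error_bad_lines=False.

```python
import os

import pandas as pd


def convert_tree(rootdir, names):
    """Convert the files listed in `names` to CSV in every folder under rootdir.

    `names` maps an input file name to the CSV name written next to it.
    Returns the list of CSV paths that were written.
    """
    written = []
    for subdir, _, files in os.walk(rootdir):
        for src_name, dst_name in names.items():
            if src_name not in files:
                continue  # this folder has no such file
            src = os.path.join(subdir, src_name)
            dst = os.path.join(subdir, dst_name)
            # Assumption: tab-separated input, malformed lines skipped (pandas >= 1.3)
            frame = pd.read_csv(src, sep="\t", dtype=str, index_col=False, on_bad_lines="skip")
            frame.to_csv(dst)
            written.append(dst)
    return written


# Example call, using the file names from the question:
# convert_tree('/Users/name/SEC', {"num.txt": "df1.csv", "sub.txt": "df2.csv"})
```

Checking the files list that os.walk already provides avoids the try/except, at the cost of one membership test per file name.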
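Looping into hidden directories like ".ipynb_checkpoint" is mostly harmless on Linux, but you can filter them out. When you do a top-down os.walk, you can stop it from descending into sub-directories by removing them from the "dirs" list. You can do something similar on Windows using win32api.GetFileAttributes.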
for subdir, dirs, _ in os.walk(rootdir):
    # Prune in place so os.walk skips hidden directories
    dirs[:] = [name for name in dirs if not name.startswith(".")]
    # ...do the rest...
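To show the pruning on its own, here is a small sketch using only the standard library. The helper names is_hidden and walk_visible are hypothetical. For the Windows case it swaps in os.stat().st_file_attributes from the standard library instead of win32api.GetFileAttributes, which should give the same attribute bits without needing pywin32; treat that substitution as an assumption to check on your system.

```python
import os
import stat


def is_hidden(path):
    """Return True for dot-prefixed names or, on Windows, the hidden attribute."""
    if os.path.basename(path).startswith("."):
        return True
    # st_file_attributes only exists on Windows; fall back to 0 elsewhere.
    attrs = getattr(os.stat(path), "st_file_attributes", 0)
    return bool(attrs & stat.FILE_ATTRIBUTE_HIDDEN)


def walk_visible(rootdir):
    """Yield every directory os.walk visits, never descending into hidden ones."""
    for subdir, dirs, _ in os.walk(rootdir):
        # Slice assignment changes the very list os.walk will iterate next.
        dirs[:] = [name for name in dirs if not is_hidden(os.path.join(subdir, name))]
        yield subdir
```

The slice assignment matters: writing dirs = [...] would only rebind the local name, and os.walk would still descend into every directory.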
You can use pathlib to join paths more compactly. Its Path objects override the division operator (/) to join path segments.
import pandas as pd
import os
from pathlib import Path

# Directory of root folder
rootdir = '/Users/name/SEC'

# Iteration over sub-folders
for subdir, dirs, _ in os.walk(rootdir):
    # filter out hidden
    dirs[:] = [name for name in dirs if not name.startswith(".")]
    subdir = Path(subdir)
    # Conversion from TXT to CSV
    try:
        read_file1 = pd.read_csv(subdir/"num.txt", delimiter="\t", sep=',', error_bad_lines=False, index_col=False, dtype='unicode')
        read_file1.to_csv(subdir/"df1.csv")
    except FileNotFoundError:
        pass
    try:
        read_file2 = pd.read_csv(subdir/"sub.txt", delimiter="\t", sep=',', error_bad_lines=False, index_col=False, dtype='unicode')
        read_file2.to_csv(subdir/"df2.csv")
    except FileNotFoundError:
        pass
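If you go with pathlib anyway, Path.rglob can replace os.walk for locating the input files. The sketch below is one possible variant, not part of the code above: find_inputs is a hypothetical helper, and note that rglob still descends into hidden directories, so they are filtered out afterwards rather than pruned.

```python
from pathlib import Path


def find_inputs(rootdir, names=("num.txt", "sub.txt")):
    """Return the sorted paths under rootdir whose file name is in `names`,
    ignoring anything inside a dot-prefixed directory."""
    root = Path(rootdir)
    found = []
    for path in root.rglob("*.txt"):
        if path.name not in names or not path.is_file():
            continue
        # Look only at the folders below root, so a dotted root is still allowed.
        folders = path.relative_to(root).parts[:-1]
        if any(part.startswith(".") for part in folders):
            continue
        found.append(path)
    return sorted(found)
```

Each returned path can be passed straight to pd.read_csv, and path.with_name("df1.csv") gives the output path in the same folder.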
You keep reading and writing the same two files. All you need to do is complete the path you hand to pd.read_csv.
for subdir, dirs, files in os.walk(rootdir):
    read_file1 = pd.read_csv(os.path.join(subdir, "num.txt"), delimiter="\t", sep=',',
                             error_bad_lines=False, index_col=False, dtype='unicode')
    read_file1.to_csv(os.path.join(subdir, "df1.csv"))
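To see why the original loop kept hitting the same files, it can help to print where each name actually resolves. The following is a small diagnostic sketch using only the standard library; resolved_targets is a hypothetical helper, not something from the question.

```python
import os


def resolved_targets(rootdir, name="num.txt"):
    """For each folder os.walk visits, report where a bare file name and a
    joined path would point."""
    rows = []
    for subdir, _, _ in os.walk(rootdir):
        # A bare name is resolved against the current working directory.
        bare = os.path.abspath(name)
        # A joined path points inside the folder being visited.
        joined = os.path.abspath(os.path.join(subdir, name))
        rows.append((subdir, bare, joined))
    return rows
```

The bare column is identical on every row, while the joined column changes with each sub-folder, which is the whole difference between the question's code and the fix.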