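I have 100 files in a folder, named 1.htm to 100.htm. I run this code to extract some information from a file and write the extracted information to another file, final.txt. Right now I have to run the program manually for each of the 100 files. I need to build a loop that runs the program 100 times, reading each file once. (Please explain in detail the exact edits I need to make to my code.)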
Below is the code for file 6.htm:
import glob
import BeautifulSoup
from BeautifulSoup import BeautifulSoup
fo = open("6.htm", "r")
bo = open("output.txt" ,"w")
f = open("final.txt","a+")
htmltext = fo.read()
soup = BeautifulSoup(htmltext)
#print len(urls)
table = soup.findAll('table')
rows = table[0].findAll('tr');
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = str(td.find(text=True)) + ';;;'
        if(text!=" ;;;"):
            bo.write(text);
    bo.write('\n');
fo.close()
bo.close()
b= open("output.txt", "r")
for j in range (1,5):
    str=b.readline();
for j in range(1, 15):
    str=b.readline();
    c=str.split(";;;")
    #print c[1]
    if(c[0]=="APD ID:"):
        f.write(c[1])
        f.write("#")
    if(c[0]=="Name/Class:"):
        f.write(c[1])
        f.write("#")
    if(c[0]=="Source:"):
        f.write(c[1])
        f.write("#")
    if(c[0]=="Sequence:"):
        f.write(c[1])
        f.write("#")
    if(c[0]=="Length:"):
        f.write(c[1])
        f.write("#")
    if(c[0]=="Net charge:"):
        f.write(c[1])
        f.write("#")
    if(c[0]=="Hydrophobic residue%:"):
        f.write(c[1])
        f.write("#")
    if(c[0]=="Boman Index:"):
        f.write(c[1])
        f.write("#")
f.write('\n');
b.close();
f.close();
f.close();
print "End"
import os
f = open("final.txt","a+")
for root, folders, files in os.walk('./path/to/html_files/'):
    for fileName in files:
        fo = open(os.path.abspath(root + '/' + fileName), "r")
        ...
Then put the rest of your code there.
Also consider (best practice):
with open(os.path.abspath(root + '/' + fileName), "r") as fo:
    ...
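That way you can't forget to close those file handles: the with block closes the file when it exits, even if an exception is raised. Your operating system limits how many file handles can be open at once, and this ensures you don't exhaust them by accident.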
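A minimal illustration of what the with statement buys you, assuming Python 3 and a throwaway temporary file standing in for 6.htm: once the block ends, the handle is closed without any explicit fo.close().

```python
import os
import tempfile

# Write a throwaway file so the example has something to open.
path = os.path.join(tempfile.mkdtemp(), "6.htm")
with open(path, "w") as out:
    out.write("<table><tr><td>APD ID:</td></tr></table>")

# Read it inside a with block; the handle is closed automatically on exit,
# including when the body raises an exception.
with open(path, "r") as fo:
    htmltext = fo.read()

print(fo.closed)  # True: no explicit fo.close() was needed
```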
Make your code look like this:
import os
with open("final.txt","a+") as f:
    for root, folders, files in os.walk('./path/to/html_files/'):
        for fileName in files:
            with open(os.path.abspath(root + '/' + fileName), "r") as fo:
                ...
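A sketch of the same os.walk loop under a few assumptions: Python 3, only files ending in .htm should be processed (so a stray notes.txt in the folder doesn't end up in final.txt), and os.path.join is used instead of concatenating root + '/' + fileName. The helper name iter_htm_paths is hypothetical.

```python
import os


def iter_htm_paths(top):
    """Yield the path of every .htm file below top (hypothetical helper)."""
    for root, folders, files in os.walk(top):
        for file_name in files:
            # skip anything that is not one of the .htm inputs
            if file_name.endswith(".htm"):
                # os.path.join picks the right separator for the platform
                yield os.path.join(root, file_name)


# Usage, mirroring the answer's layout ('./path/to/html_files/' is its placeholder):
# with open("final.txt", "a") as f:
#     for path in iter_htm_paths('./path/to/html_files/'):
#         with open(path, "r") as fo:
#             htmltext = fo.read()
#             ...  # the per-file extraction goes here
```

os.walk makes no promise about numeric order, so if final.txt must list 1.htm through 100.htm in sequence, sort the paths by their number first.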
Also, NEVER overwrite built-in names such as str, as this line does:
str=b.readline();
You also don't need a semicolon (;) at the end of each line. This is Python.. we write code the comfortable way!
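A small demonstration of why the str rule matters once the script runs inside a loop, assuming Python 3 and the question's module-level script layout: the first file works, the str=b.readline() line then rebinds str, and on the next file str(td.find(text=True)) raises TypeError. cell_texts is a hypothetical stand-in for the question's cell loop.

```python
def cell_texts(cells):
    """Mimic the question's str(td.find(text=True)) + ';;;' step."""
    return [str(cell) + ";;;" for cell in cells]


# Pass 1 ("1.htm"): str is still the built-in type, so this works.
first_pass = cell_texts(["APD ID:", "AP00001"])

# The question then runs  str=b.readline()  at module level,
# which rebinds the global name str to a plain string.
str = "APD ID:;;;AP00001;;;\n"

# Pass 2 ("2.htm"): str(...) now tries to call that string.
second_pass_failed = False
try:
    cell_texts(["Name/Class:"])
except TypeError as exc:
    second_pass_failed = True
    print(exc)  # 'str' object is not callable

# Undo the damage, then use a name that shadows nothing, e.g. line.
del str
line = "APD ID:;;;AP00001;;;\n"
c = line.split(";;;")
```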
Last but not least...
if(c[0]=="APD ID:"):
if(c[0]=="Name/Class:"):
if(c[0]=="Source:"):
if(c[0]=="Sequence:"):
if(c[0]=="Length:"):
if(c[0]=="Net charge:"):
if(c[0]=="Hydrophobic residue%:"):
if(c[0]=="Boman Index:"):
should be:
if(c[0]=="APD ID:"):
elif(c[0]=="Name/Class:"):
elif(c[0]=="Source:"):
elif(c[0]=="Sequence:"):
elif(c[0]=="Length:"):
elif(c[0]=="Net charge:"):
elif(c[0]=="Hydrophobic residue%:"):
elif(c[0]=="Boman Index:"):
unless you modify c along the way, which of course you don't. So make the switch!
Damn, I just keep finding more horrible things about this code (you have clearly copy-pasted it from examples from every galaxy out there...):
You can collapse the whole if/elif chain above into a single if block with a membership test:
if(c[0] in ("APD ID:", "Name/Class:", "Source:", "Sequence:", "Length:", "Net charge:", "Hydrophobic residue%:", "Boman Index:")):
    f.write(c[1])
    f.write("#")
Also, drop the parentheses (...) around your if conditions. Again... this is Python.. we program the comfortable way:
if c[0] in ("APD ID:", "Name/Class:", "Source:", "Sequence:", "Length:", "Net charge:", "Hydrophobic residue%:", "Boman Index:"):
    f.write(c[1])
    f.write("#")
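The membership test can also be wrapped in a small function. This is a sketch that assumes the input is the ;;;-separated lines the question writes to output.txt; the names extract_record and WANTED are hypothetical, and the len(c) > 1 guard is an addition that skips lines without ;;; instead of raising IndexError.

```python
# Labels copied from the question's if blocks.
WANTED = ("APD ID:", "Name/Class:", "Source:", "Sequence:", "Length:",
          "Net charge:", "Hydrophobic residue%:", "Boman Index:")


def extract_record(lines):
    """Join the value of every wanted label with '#', in file order."""
    parts = []
    for line in lines:
        c = line.split(";;;")
        # no parentheses around the condition, no trailing semicolons
        if len(c) > 1 and c[0] in WANTED:
            parts.append(c[1] + "#")
    return "".join(parts)


record = extract_record(["APD ID:;;;AP00001;;;\n",
                         "Other:;;;ignored;;;\n",
                         "Length:;;;25;;;\n"])
print(record)  # AP00001#25#
```

In the question's script this would be called on the 14 lines it reads after skipping the first four, followed by f.write(record + '\n').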
The structure could look something like this:
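# declare the file that collects every result
f = open("final.txt","a+")
# loop over ii = 1..100 (range() excludes its end value)
for ii in range(1,101):
    fo = open(str(ii) + ".htm", "r")
    # output.txt is scratch space for one file: the body closes bo and
    # reads it back, so it has to be reopened on every pass
    bo = open("output.txt" ,"w")
    # Run program like normal (it must not rebind str, or str(ii) fails on pass 2)
    ...
    ...
    ...
    fo.close()
f.close()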
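Putting the pieces together, here is a sketch of the whole 1.htm to 100.htm loop in Python 3. It swaps BeautifulSoup for the standard library's html.parser.HTMLParser so it runs without third-party packages, and it reads table cells directly instead of round-tripping through output.txt. Assumptions: each label and its value sit in adjacent td cells of the first table, a cell's full stripped text stands in for td.find(text=True), the files are UTF-8, and every row is scanned rather than only the rows the question reads after skipping four lines. FirstTableCells, record_from_html and main are hypothetical names.

```python
import os
from html.parser import HTMLParser

# Labels copied from the question's if blocks.
WANTED = ("APD ID:", "Name/Class:", "Source:", "Sequence:", "Length:",
          "Net charge:", "Hydrophobic residue%:", "Boman Index:")


class FirstTableCells(HTMLParser):
    """Collect the text of each <td> in the first <table>, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell strings per <tr>
        self._tables = 0    # number of <table> start tags seen so far
        self._cell = None   # text pieces of the open <td>, or None

    def _flush(self):
        # finish the open cell (also covers HTML that omits </td>)
        if self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._flush()
            self._tables += 1
        elif self._tables == 1 and tag == "tr":
            self._flush()
            self.rows.append([])
        elif self._tables == 1 and tag == "td" and self.rows:
            self._flush()
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self._flush()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def record_from_html(htmltext):
    """Return 'value#value#...' for the wanted labels, in table order."""
    parser = FirstTableCells()
    parser.feed(htmltext)
    parser.close()
    parts = []
    for cells in parser.rows:
        if len(cells) > 1 and cells[0] in WANTED:
            parts.append(cells[1] + "#")
    return "".join(parts)


def main(folder="."):
    """Append one line per numbered file 1.htm..100.htm to final.txt."""
    with open(os.path.join(folder, "final.txt"), "a") as f:
        for ii in range(1, 101):
            path = os.path.join(folder, str(ii) + ".htm")
            if not os.path.exists(path):
                continue  # tolerate gaps in the numbering
            with open(path, "r", encoding="utf-8", errors="replace") as fo:
                f.write(record_from_html(fo.read()) + "\n")
```

Call main('/path/to/html_files') with your own folder (that path is a placeholder). Missing numbers are skipped, and final.txt is opened in append mode, like the question's a+.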
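os.listdir lists all the files in a given directory.
As @Torxed pointed out, best practice is to use a with statement (so the file handles get closed).
You can find the .htm files like this: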
import os
# Creates a list of 1-100.htm file names
filenames = [str(x) + ".htm" for x in range(1, 101)]
for name in os.listdir("/mydir"):
    if name in filenames:
        # Do your logic here.
        pass
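os.listdir returns names in arbitrary order, and a plain string sort would put 10.htm before 2.htm. If final.txt should follow the file numbers, a sketch like the one below (Python 3; numbered_htm_files is a hypothetical helper) sorts on the integer part of each name and ignores any other file in the folder. It uses glob, which the question already imports, in place of os.listdir.

```python
import glob
import os


def numbered_htm_files(folder):
    """Return paths of N.htm files in folder, sorted by N (hypothetical)."""
    def number(path):
        # "17.htm" -> "17"
        return os.path.splitext(os.path.basename(path))[0]

    # glob.escape keeps special characters in the folder name literal
    paths = glob.glob(os.path.join(glob.escape(folder), "*.htm"))
    # keep only purely numeric names such as 1.htm ... 100.htm
    numbered = [p for p in paths if number(p).isdigit()]
    # compare as integers so 2.htm comes before 10.htm
    return sorted(numbered, key=lambda p: int(number(p)))
```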