I'm learning Python and trying to parse an HTML table. After that, I want to create a .csv file so I can import the data into MySQL.
>>> htmlread = handlestatbydate.read()
>>> soup = BeautifulSoup("".join(htmlread))
>>> souptable = soup('tbody', limit=2)[1].findAll('tr')
>>> souptablestr = ''.join(str(t) for t in souptable)
>>> reclearbyonce = re.compile('</tr><tr>n|^<tr>n|</tr>$')
>>> recleartd = re.compile(r'</td>|<td.*?>')
>>> retdtd = re.compile('""| ')
>>> soupclearbyonce = reclearbyonce.sub('', souptablestr)
>>> soupcleartd = recleartd.sub('"', soupclearbyonce)
>>> souptdtd = retdtd.sub('","', soupcleartd)
>>> print souptdtd
"59","00059413","00059413","70000000001","2011-08-22","18:01:48","0:07","0.45"
"60","00059413","00059413","70000000002","2011-08-22","18:49:48","0:43","1.95"
"61","00059413","00059413","70000000003","2011-08-22","18:52:50","5:07","11.70"
"62","00059413","00059413","70000000003","2011-08-22","19:02:47","4:10","9.75"
Then I create the csv file, and I get an error.
>>> tablecsv = file(r'/tmp/table.csv', 'w')
>>> tablecsv.write("".join(souptdtd))
>>> tablecsv = (r'/tmp/table.csv', 'r')
>>> print tablecsv.read()
print tablecsv.read()
AttributeError: 'tuple' object has no attribute 'read'
Unfortunately, I can't figure out when and how the tuple gets created. Can anyone tell me where I went wrong and how to fix it?
You left out the function name in this line: tablecsv = (r'/tmp/table.csv', 'r'). You probably meant to open the file, e.g. tablecsv = open(r'/tmp/table.csv', 'r').
After using the file, close it with tablecsv.close().
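As a minimal sketch of the fix: the code below is written for Python 3, where the file() builtin no longer exists and print is a function. The temporary path and the two sample rows are stand-ins for /tmp/table.csv and the souptdtd string from the question. A with block closes the file automatically, so the written data is flushed before the file is reopened for reading.

```python
import os
import tempfile

# Stand-in for the souptdtd string built in the question (two sample rows).
souptdtd = (
    '"59","00059413","00059413","70000000001","2011-08-22","18:01:48","0:07","0.45"\n'
    '"60","00059413","00059413","70000000002","2011-08-22","18:49:48","0:43","1.95"\n'
)

# Hypothetical path for illustration; the question uses /tmp/table.csv.
path = os.path.join(tempfile.gettempdir(), "table.csv")

# Write: the file is flushed and closed when the with block ends.
with open(path, "w") as tablecsv:
    tablecsv.write(souptdtd)

# Read: open() returns a file object, which does have a read() method.
with open(path, "r") as tablecsv:
    contents = tablecsv.read()

print(contents)
```

If you prefer to keep explicit open() and close() calls as in the original session, the same order applies: write, close, then open again for reading.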
If you just put a list of items in parentheses, e.g. (r'/tmp/table.csv', 'r'), that creates a tuple.
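To see the difference directly, here is a small Python 3 sketch. The variable name not_a_file is made up for illustration. It builds the same parenthesized pair as in the question and shows that the result is a plain tuple of two strings, which is why calling read() on it fails.

```python
# Parentheses around comma-separated values build a tuple; no function is called.
not_a_file = (r'/tmp/table.csv', 'r')

print(type(not_a_file).__name__)    # tuple
print(len(not_a_file))              # 2
print(hasattr(not_a_file, 'read'))  # False

# Calling read() on the tuple raises the same kind of error as in the question.
try:
    not_a_file.read()
except AttributeError as exc:
    message = str(exc)
    print(message)
```

Nothing is opened on disk here: the path and the mode are simply stored as the two elements of the tuple.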