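I have some HTML pages, each of which should contain one table with a number of rows and two columns, and I'm trying to convert them into CSV tables. I want to loop through the rows and get the columns, but I can't manage to get only the value inside the id attribute (e.g. id="(-) Additional deductions of AT1 Capital due to Article 3 CRR"). Is there a way to extract the id of each row? Here is what I have so far: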
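import csv
from bs4 import BeautifulSoup

definitions = []
file = '/Users/tom/Downloads/Capitalresourceitemlevel1.html'

# Parse the local HTML file; the with-block closes it afterwards
with open(file) as fh:
    soup = BeautifulSoup(fh, "html.parser")

# Walk every table, then every row, keeping only rows with exactly two cells
for table in soup.find_all('table'):
    for row in table.find_all('tr'):
        row_tds = row.find_all('td')
        if len(row_tds) == 2:
            definitions.append((row_tds[0].text, row_tds[1].text))

# csv.writer quotes cells that contain commas and ends each row with a line break;
# newline='' stops Python from inserting extra blank lines on Windows
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(definitions)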
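The question doesn't show the page's actual markup, so the following is only a minimal sketch under an assumption: that the label sits in an id attribute either on the tr itself or on one of its cells. SAMPLE_HTML, row_id and extract_rows are hypothetical names made up for illustration. The BeautifulSoup pieces it relies on are attribute access (tag['id']), Tag.has_attr, and find(id=True), which matches the first descendant that has any id at all. Choosing the second column as the value is also an assumption.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup: the real page is not shown, so this assumes the label
# is stored in an id attribute on the <tr> or on one of its <td> cells.
SAMPLE_HTML = """
<table>
  <tr id="(-) Additional deductions of AT1 Capital due to Article 3 CRR">
    <td>row label</td><td>0</td>
  </tr>
  <tr>
    <td id="AT1 capital">label</td><td>42</td>
  </tr>
</table>
"""


def row_id(row):
    """Return the row's own id, else the id of the first descendant that has one."""
    if row.has_attr('id'):
        return row['id']
    tagged = row.find(id=True)  # first descendant carrying any id attribute
    return tagged['id'] if tagged is not None else None


def extract_rows(html):
    """Collect (id, value) pairs from every two-cell row (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for row in soup.find_all('tr'):
        cells = row.find_all('td')
        if len(cells) == 2:
            # keep the id plus the stripped text of the second (value) column
            results.append((row_id(row), cells[1].get_text(strip=True)))
    return results


rows = extract_rows(SAMPLE_HTML)

# Write to an in-memory buffer here; swap in open('output.csv', 'w', newline='')
# to write a real file.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())
```

If the ids turn out to be on some other element (a th, or a span inside a cell), only row_id needs to change. The CSV step stays the same.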