Finding ticker symbols on a Wikipedia page with BeautifulSoup



I am trying to get the ticker symbols of the S&P 100 stocks from Wikipedia, using this link: https://en.wikipedia.org/wiki/S%26P_100

I think my code works up to row_soup_list = table_soup.find('tr'), but .find seems to select too small a part of my table_soup, and .find_all returns this error:

AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

How can I scrape all of the symbols?

import requests
from bs4 import BeautifulSoup 
url  = r'https://en.wikipedia.org/wiki/S%26P_100'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
tag = 'table'
attributes = {'class':'wikitable sortable'}
table_soup =soup.find(tag, attributes)
print(table_soup)
symbol= []
row_soup_list = table_soup.find('tr')
print(row_soup_list)
for row_soup in row_soup_list:
    td_soup_list = row_soup.find('td')
    item = {}
    item['Symbol'] = td_soup_list[0].text
    symbol.append(item)

print(item)

There are a few problems in the code:

  • .find() returns only the first match, while .find_all() returns a list of all matches. Since you need every row, you must use .find_all().

    row_soup_list = table_soup.find_all('tr')
    
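  • You create a new item dictionary on every pass through the loop, so once the loop ends item holds only the last symbol. Initialize the dictionary once, before the loop, with a list that collects the symbols.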

    item = {'Symbol': []}
    
  • Also, the first row of the table holds the <th> header cells, so it has no <td> and must be skipped. Iterate over row_soup_list[1:] instead.

    for row_soup in row_soup_list[1:]:
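
The three points above can be checked offline on a small inline table. This is only a sketch: the HTML string is a hypothetical stand-in for the Wikipedia table, not its real markup, and the chained find_all call is just one way to trigger the AttributeError quoted in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia table (a header row and two data rows).
html = """
<table class="wikitable sortable">
  <tr><th>Symbol</th><th>Name</th></tr>
  <tr><td>AAPL</td><td>Apple</td></tr>
  <tr><td>ABBV</td><td>AbbVie</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "wikitable sortable"})

first_row = table.find("tr")     # a single Tag: only the header row
all_rows = table.find_all("tr")  # a list-like ResultSet with all three rows

# Calling find_all on a ResultSet (instead of on a Tag) raises AttributeError.
try:
    soup.find_all("table").find_all("tr")
    raised = False
except AttributeError:
    raised = True

# Skip the header row, then read the first <td> of each remaining row.
symbols = [row.find("td").text.strip() for row in all_rows[1:]]
print(symbols)
```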
    

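Here is the corrected code. Note that row_soup.find('td') returns a single tag, so its text is read directly instead of indexing with [0].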

import requests
from bs4 import BeautifulSoup 
url  = r'https://en.wikipedia.org/wiki/S%26P_100'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
tag = 'table'
attributes = {'class':'wikitable sortable'}
table_soup =soup.find(tag, attributes)
symbol= []
row_soup_list = table_soup.find_all('tr')
item = {'Symbol': []}
for row_soup in row_soup_list[1:]:
    td_soup_list = row_soup.find('td')
    item['Symbol'].append(td_soup_list.text.strip())

print(item)
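
Slicing with [1:] assumes the header is exactly the first row. A slightly more defensive variant, sketched below, skips any row that has no <td> at all. The function name first_column and the sample HTML are hypothetical and exist only for illustration.

```python
from bs4 import BeautifulSoup


def first_column(table_html):
    """Return the stripped text of the first <td> in every row that has one."""
    soup = BeautifulSoup(table_html, "html.parser")
    values = []
    for row in soup.find_all("tr"):
        cell = row.find("td")
        if cell is None:
            # Header rows contain only <th> cells, so find('td') returns None.
            continue
        values.append(cell.get_text(strip=True))
    return values


# Hypothetical sample; the trailing newline in a cell mimics real-page whitespace.
sample = """
<table>
  <tr><th>Symbol</th><th>Name</th></tr>
  <tr><td>AAPL
  </td><td>Apple</td></tr>
  <tr><td>BRK.B</td><td>Berkshire Hathaway</td></tr>
</table>
"""

print(first_column(sample))
```

Checking for None also avoids an AttributeError on .text if a header-only row ever appears in the middle of the table.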

Alternatively, you can get the symbols with the CSS selector #constituents td:nth-of-type(1), which selects the first <td> in each row of the element whose ID is constituents:

import requests
from bs4 import BeautifulSoup

url = r"https://en.wikipedia.org/wiki/S%26P_100"
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
symbols = {
    "Symbols": [
        tag.get_text(strip=True)
        for tag in soup.select("#constituents td:nth-of-type(1)")
    ]
}
print(symbols)

Output:

{'Symbols': ['AAPL', 'ABBV', 'ABT', 'ACN', 'ADBE', 'AIG', 'AMGN', 'AMT', 'AMZN', 'AVGO', 'AXP', 'BA', 'BAC', 'BIIB', 'BK', 'BKNG', 'BLK', 'BMY', 'BRK.B', 'C', 'CAT', 'CHTR', 'CL', 'CMCSA', 'COF', 'COP', 'COST', 'CRM', 'CSCO', 'CVS', 'CVX', 'DD', 'DHR', 'DIS', 'DOW', 'DUK', 'EMR', 'EXC', 'F', 'FB', 'FDX', 'GD', 'GE', 'GILD', 'GM', 'GOOG', 'GOOGL', 'GS', 'HD', 'HON', 'IBM', 'INTC', 'JNJ', 'JPM', 'KHC', 'KO', 'LIN', 'LLY', 'LMT', 'LOW', 'MA', 'MCD', 'MDLZ', 'MDT', 'MET', 'MMM', 'MO', 'MRK', 'MS', 'MSFT', 'NEE', 'NFLX', 'NKE', 'NVDA', 'ORCL', 'PEP', 'PFE', 'PG', 'PM', 'PYPL', 'QCOM', 'RTX', 'SBUX', 'SO', 'SPG', 'T', 'TGT', 'TMO', 'TMUS', 'TSLA', 'TXN', 'UNH', 'UNP', 'UPS', 'USB', 'V', 'VZ', 'WBA', 'WFC', 'WMT', 'XOM']}
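
To see what the selector matches without fetching the page, it can be run against a small inline document. The HTML below is a hypothetical example, not the real Wikipedia markup; the second table is there only to show that the #constituents prefix limits the match to one element.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one table with id="constituents" and one unrelated table.
html = """
<table id="constituents">
  <tr><th>Symbol</th><th>Name</th></tr>
  <tr><td>AAPL</td><td>Apple</td></tr>
  <tr><td>ABBV</td><td>AbbVie</td></tr>
</table>
<table id="other">
  <tr><td>IGNORED</td><td>not inside #constituents</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# td:nth-of-type(1) matches a <td> that is the first <td> among its siblings.
matched = [
    td.get_text(strip=True)
    for td in soup.select("#constituents td:nth-of-type(1)")
]
print(matched)
```

The header row needs no special handling here because it contains no <td>. Note that nth-of-type counts only <td> siblings, so in a row whose first cell is a <th> the selector would still return that row's first <td>.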
