Need help retrieving the first occurrence of a match with Beautiful Soup and Python



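I'm trying to search the SEC website for the first occurrence of a "10-Q" or "10-K" filing and retrieve the link behind that filing's "Interactive Data" button.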

The URL I'm trying to retrieve the link from is:

https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=AAPL&type=&dateb=20200506&owner=exclude&count=40

The resulting link should be:

https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-20-000052&xbrl_type=v

The code I'm currently using:

import requests
from bs4 import BeautifulSoup

date1 = "20200506"
ticker = "AAPL"
URL = 'https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=' + ticker + '&type=&dateb=' + date1 + '&owner=exclude&count=40'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='seriesDiv')
rows = results.find_all('tr')
for row in rows:
    document = row.find('td', string='10-Q')
    link = row.find('a', id="interactiveDataBtn")
    if None in (document, link):
        continue
    print(document.text)
    print(link['href'])

This code returns the links for every 10-Q filing, but it should match both 10-Q and 10-K.

Can someone help me adjust this code so that it returns only the link for the first occurrence of a 10-Q or 10-K?

Thanks

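The quickest solution is to pass a lambda to the .find() method, so that it matches any <td> whose text contains either 10-Q or 10-K.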

For example:

import requests
from bs4 import BeautifulSoup

date1 = "20200506"
ticker = "AAPL"
URL = 'https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=' + ticker + '&type=&dateb=' + date1 + '&owner=exclude&count=40'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='seriesDiv')
rows = results.find_all('tr')
for row in rows:
    # Match the first <td> in this row whose text contains 10-Q or 10-K
    document = row.find(lambda t: t.name == 'td' and ('10-Q' in t.text or '10-K' in t.text))
    link = row.find('a', id="interactiveDataBtn")
    if None in (document, link):
        continue
    print(document.text)
    # The href is relative, so prepend the host to get a full URL
    print('https://www.sec.gov' + link['href'])

Prints the 10-Q and 10-K links:

10-Q
https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-20-000052&xbrl_type=v
10-Q
https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-20-000010&xbrl_type=v
10-K
https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-19-000119&xbrl_type=v
10-Q
https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-19-000076&xbrl_type=v
10-Q
https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-19-000066&xbrl_type=v
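Note that the substring test in the lambda also matches amended filings: `'10-Q' in '10-Q/A'` is `True`, so an amendment row would be printed as `10-Q/A`. If you only want exact matches, the `string=` argument of `find()` also accepts a list of strings. The sketch below is illustrative only: `filing_links` is a hypothetical helper, and `SAMPLE_HTML` is a simplified, made-up stand-in for the EDGAR table (not its real markup) so the example runs offline.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the EDGAR filing table (not the real markup).
SAMPLE_HTML = """
<div id="seriesDiv"><table>
  <tr><th>Filings</th><th>Format</th></tr>
  <tr><td>4</td><td><a href="/docs">Documents</a></td></tr>
  <tr><td>10-Q/A</td><td><a id="interactiveDataBtn" href="/cgi-bin/viewer?x=amend">Interactive Data</a></td></tr>
  <tr><td>10-Q</td><td><a id="interactiveDataBtn" href="/cgi-bin/viewer?x=q2">Interactive Data</a></td></tr>
  <tr><td>10-K</td><td><a id="interactiveDataBtn" href="/cgi-bin/viewer?x=k">Interactive Data</a></td></tr>
</table></div>
"""


def filing_links(html, form_types=("10-Q", "10-K")):
    """Yield (form type, absolute link) for rows whose type cell matches exactly."""
    soup = BeautifulSoup(html, "html.parser")
    for row in soup.find(id="seriesDiv").find_all("tr"):
        # A list passed to string= matches a <td> whose text equals any element,
        # so "10-Q/A" is skipped rather than treated as a 10-Q.
        document = row.find("td", string=list(form_types))
        link = row.find("a", id="interactiveDataBtn")
        if None in (document, link):
            continue
        yield document.text, "https://www.sec.gov" + link["href"]


for form_type, url in filing_links(SAMPLE_HTML):
    print(form_type, url)
```

With the sample table this prints only the plain `10-Q` and `10-K` rows; the `10-Q/A` row is ignored.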
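EDIT: To get only the first occurrence of each form type, you can use a dictionary. On each iteration, check whether the key (the string 10-Q or 10-K) is already in the dictionary and, if it isn't, add it: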
links = dict()
for row in rows:
    document = row.find(lambda t: t.name == 'td' and ('10-Q' in t.text or '10-K' in t.text))
    link = row.find('a', id="interactiveDataBtn")
    if None in (document, link):
        continue
    # Keep only the first link seen for each form type
    if document.text not in links:
        links[document.text] = 'https://www.sec.gov' + link['href']
print(links)

Prints:

{'10-Q': 'https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-20-000052&xbrl_type=v', 
'10-K': 'https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-19-000119&xbrl_type=v'}
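If you want only the single first row that is either a 10-Q or a 10-K (rather than the first of each type), you can return as soon as the loop finds a match. This is a minimal sketch, assuming the same table layout as above; `first_filing_link` and `SAMPLE_HTML` are hypothetical names, and the sample markup is made up so the example runs offline. The lambda compares the cell text for equality instead of using a substring test, so amended forms such as `10-Q/A` are not matched.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the EDGAR filing table (not the real markup).
SAMPLE_HTML = """
<div id="seriesDiv"><table>
  <tr><th>Filings</th><th>Format</th></tr>
  <tr><td>8-K</td><td><a id="interactiveDataBtn" href="/viewer?id=8k">Interactive Data</a></td></tr>
  <tr><td>10-Q</td><td><a id="interactiveDataBtn" href="/viewer?id=q">Interactive Data</a></td></tr>
  <tr><td>10-K</td><td><a id="interactiveDataBtn" href="/viewer?id=k">Interactive Data</a></td></tr>
</table></div>
"""


def first_filing_link(html, form_types=("10-Q", "10-K")):
    """Return (form type, absolute link) for the first matching row, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for row in soup.find(id="seriesDiv").find_all("tr"):
        # Exact text comparison, so "10-Q/A" does not count as a 10-Q
        document = row.find(
            lambda t: t.name == "td" and t.get_text(strip=True) in form_types
        )
        link = row.find("a", id="interactiveDataBtn")
        if document is not None and link is not None:
            # Stop at the first hit instead of collecting every row
            return document.get_text(strip=True), "https://www.sec.gov" + link["href"]
    return None


print(first_filing_link(SAMPLE_HTML))
```

On the live page you would pass `page.content` from the request above instead of `SAMPLE_HTML`. sec.gov's fair-access guidelines ask automated clients to identify themselves with a descriptive User-Agent, so you may need to call `requests.get(URL, headers={"User-Agent": ...})` for the request to be accepted.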
