Error when parsing the string returned by a URL request



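Context: I am trying to access the SEC EDGAR database to pull specific company filings, and my urllib.request call is not working. For now I need to get the page source from the site. After that, I will use re to parse the body paragraphs.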

import re
import urllib.request as request
import urllib.parse as parse
import pandas
import csv
'''
WE ARE finding & parsing information to find https://www.sec.gov/Archives/edgar/data/1018724/0001018724-20-000030.txt
'''
frm_type = input('Enter the file type (e.g. 10-k, 8-q): ')
year = input('Enter fiscal year(4 digit number): ')
quarter = input('Enter quarter (NOTE. Must be in format QTX, with x being 1-4): ')
CIK = input('Enter CIK (company identifier): ')
def find_sec_filings(cik, year, quarter, filetype):
    quarter = quarter.upper()
    """sources relevant file from EDGAR Database."""
    lookup = 'edgar/data/'
    web = 'https://www.sec.gov/Archives/edgar/full-index/'
    direction = web + str(year) + '/' + str(quarter) + '/' + 'master.idx'
    try:
        idx = request.urlopen(direction)
        for line in idx:
            if year in line and cik in line:
                for element in line.split('|'):
                    if lookup in element:
                        file_direction = str(element[lookup:])
                        return file_direction
    except:
        print("No file with the specifications were found")
#Path to 10-k
fd = find_sec_filings(CIK,year,quarter,frm_type)
print(fd)
url1 = 'https://www.sec.gov/Archives/'+ fd

ERROR MESSAGE:

No file with the specifications were found
None
  File "C:\Users\trisy\OneDrive\Desktop\classes\SP_22_courses\CS1110\pye_files\edgar.py", line 44, in <module>
    url1 = 'https://www.sec.gov/Archives/'+ fd
TypeError: can only concatenate str (not "NoneType") to str

The function

def find_sec_filings(cik, year, quarter, filetype):
    quarter = quarter.upper()
    """sources relevant file from EDGAR Database."""
    lookup = 'edgar/data/'
    web = 'https://www.sec.gov/Archives/edgar/full-index/'
    direction = web + str(year) + '/' + str(quarter) + '/' + 'master.idx'
    try:
        idx = request.urlopen(direction)
        for line in idx:
            if year in line and cik in line:
                for element in line.split('|'):
                    if lookup in element:
                        file_direction = str(element[lookup:])
                        return file_direction
    except:
        print("No file with the specifications were found")

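is not guaranteed to either return a string or fail. If none of the conditions inside the loop ever holds, no return statement is reached. The same happens when the except branch runs, because it only prints a message. When a Python function finishes without reaching a return, it returns None implicitly. So after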

fd = find_sec_filings(CIK,year,quarter,frm_type)

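fd may be either a string or None, so you cannot concatenate it with a string unconditionally. If you do, you can end up with a TypeError. As a side note, using except: with no exception type (known as a bare except) is considered bad practice in Python.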
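Both points can be reproduced without touching the network. The sketch below is only an illustration: first_path and swallow_errors are hypothetical helpers that are not part of the original code. It shows a function falling through to an implicit None, a guard before concatenation, and a bare except hiding a TypeError (a response from urlopen yields bytes lines, so a test such as year in line raises when year is a str).

```python
def first_path(lines, lookup='edgar/data/'):
    """Return the first '|'-separated field containing `lookup`."""
    for line in lines:
        for element in line.split('|'):
            if lookup in element:
                return element
    # Falling off the end: Python returns None implicitly.


def swallow_errors(line):
    """Mimic the bare except: any error at all produces the same message."""
    try:
        return '2020' in line  # raises TypeError when `line` is bytes
    except:  # noqa: E722 (bare except, shown only to illustrate the problem)
        return 'No file with the specifications were found'


fd = first_path(['no match on this line'])
print(fd)  # None

# Guard before concatenating instead of assuming a string came back.
url1 = ('https://www.sec.gov/Archives/' + fd) if fd is not None else None
print(url1)  # None

# A bytes line triggers a TypeError that the bare except hides.
print(swallow_errors(b'2020|some bytes line'))
```

Because the bare except reports every failure with the same message, a "no file found" printout does not necessarily mean the filing is missing. It can just as well mean the code raised an unrelated exception.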

A simple solution to the problem

def find_sec_filings(cik, year, quarter, filetype):
    quarter = quarter.upper()
    """sources relevant file from EDGAR Database."""
    lookup = 'edgar/data/'
    web = 'https://www.sec.gov/Archives/edgar/full-index/'
    direction = web + str(year) + '/' + str(quarter) + '/' + 'master.idx'
    try:
        idx = request.urlopen(direction)
        for line in idx:
            if year in line and cik in line:
                for element in line.split('|'):
                    if lookup in element:
                        file_direction = str(element[lookup:])
                        return file_direction
    except:
        print("No file with the specifications were found")
    return ""

This ensures that the function returns a string rather than None.
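Returning "" removes the TypeError at the call site, but the function may still never find anything: lines read from urlopen are bytes rather than str, and element[lookup:] slices with a string instead of an integer index, so the try block can raise on every run. A more defensive variant might look like the sketch below. It is not a definitive implementation and rests on several assumptions that do not come from the original post: that each row of master.idx has the form CIK|Company Name|Form Type|Date Filed|Filename, that the quarter is entered as QTR1 through QTR4, and that SEC expects a descriptive User-Agent header (worth checking against SEC's current access guidelines). The names fetch_index_lines, find_filing_path and filing_url are hypothetical, and the sample row is made up.

```python
import urllib.error
import urllib.request

ARCHIVES = 'https://www.sec.gov/Archives/'


def fetch_index_lines(year, quarter, user_agent):
    """Download master.idx and yield decoded text lines.

    Assumption: SEC expects a descriptive User-Agent; the caller supplies it.
    """
    url = f'{ARCHIVES}edgar/full-index/{year}/{quarter.upper()}/master.idx'
    req = urllib.request.Request(url, headers={'User-Agent': user_agent})
    with urllib.request.urlopen(req) as response:
        for raw in response:
            # The response yields bytes; decode before doing str comparisons.
            yield raw.decode('latin-1').rstrip('\r\n')


def find_filing_path(lines, cik, form_type):
    """Return the path of the first matching row, or None if there is none.

    Assumes rows look like: CIK|Company Name|Form Type|Date Filed|Filename
    """
    for line in lines:
        fields = line.split('|')
        if len(fields) != 5:
            continue  # preamble or separator line
        row_cik, _name, row_form, _date, filename = fields
        if row_cik == str(cik).strip() and row_form.upper() == form_type.upper():
            return filename.strip()
    return None


def filing_url(year, quarter, cik, form_type, user_agent):
    """Build the full filing URL, or return None if nothing was found."""
    try:
        lines = fetch_index_lines(year, quarter, user_agent)
        path = find_filing_path(lines, cik, form_type)
    except urllib.error.URLError as exc:  # HTTPError is a subclass
        print(f'Could not download the index: {exc}')
        return None
    return (ARCHIVES + path) if path is not None else None


# Offline check with a made-up row (not a real filing).
sample = [
    'CIK|Company Name|Form Type|Date Filed|Filename',
    '1234567|EXAMPLE CORP|10-K|2020-01-31|edgar/data/1234567/example.txt',
]
print(find_filing_path(sample, '1234567', '10-k'))  # edgar/data/1234567/example.txt
```

Keeping the parsing separate from the download makes the matching logic testable without network access, and catching only urllib.error.URLError lets genuine programming errors surface instead of being reported as a missing file. The caller still has to check for None before using the result.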