Mocking and testing a function that returns the text from a URL



I have a function that takes a URL and returns the text from that URL.

from urllib.request import Request, urlopen
import bs4 as bs

def extract_raw_text_from_url(url, set_parser='lxml'):
    try:
        req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})  # Set user agent as Mozilla. Otherwise: Error 403
        source = urlopen(req).read()  # Return source code
        parser = set_parser
        soup = bs.BeautifulSoup(source, parser)  # create BeautifulSoup object
        text = soup.get_text()  # get text of the website
    except ValueError:  # ToDo: Why is urllib.error.URLError unknown? I want to include it in the exception! It works in Colab!
        text = []
    return text

What is the right way to test this function? Since I think making a real request on every test run is bad practice, mocking the result seems like a good idea.

Any idea how to do that? I am using pytest, but I am still a beginner.

I think it depends on what you want to test. If you want to test the request itself, you should perform a real request every time (in fact, the web page may change from one day to the next, and a real request takes that into account).

If you want to test the parsing of a given HTML input, I think you can download the HTML page and put it in an assets (or similar) folder inside your tests; then you can try something like:

url = "assets/mywebpage1.html"
with open(url, 'r') as f:
    source = f.read()
# ...
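A self-contained sketch of such a test might look as follows. Note the assumptions: the stdlib `html.parser` stands in for BeautifulSoup, and a temporary file stands in for a real `assets/` folder, so the example runs with no third-party packages; in a real suite you would read the saved page from `assets/` instead.

```python
import tempfile
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Stdlib stand-in for soup.get_text(): collects all text nodes."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def get_text(source):
    extractor = _TextExtractor()
    extractor.feed(source)
    return "".join(extractor.chunks)

def test_parse_saved_page():
    # In a real suite the page would live in assets/mywebpage1.html;
    # here it is written on the fly so the sketch is self-contained.
    with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as f:
        f.write("<html><body><p>hello world</p></body></html>")
        path = f.name
    with open(path, "r") as f:
        source = f.read()
    assert "hello world" in get_text(source)

test_parse_saved_page()
```

Run under pytest, the `test_parse_saved_page` function is collected automatically; the explicit call at the end is only there so the sketch also runs as a plain script.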

Edit: I think two approaches are possible:

  1. Split the two operations into two different functions and test only parse_content_from_html(source), where source is obtained in the test routine as described above:
def extract_raw_text_from_url(url, set_parser='lxml'):
    try:
        req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
        source = urlopen(req).read()  # Return source code
        text = parse_content_from_html(source, set_parser)
    except ValueError:
        text = []
    return text

def parse_content_from_html(source, parser='lxml'):
    soup = bs.BeautifulSoup(source, parser)  # create BeautifulSoup object
    text = soup.get_text()  # get text of the website
    return text
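With the parsing split out, the test never touches the network. A minimal pytest-style sketch, assuming `beautifulsoup4` is installed (the built-in `html.parser` is used here so no lxml install is needed):

```python
import bs4

def parse_content_from_html(source, parser='html.parser'):
    soup = bs4.BeautifulSoup(source, parser)
    return soup.get_text()

def test_parse_content_from_html():
    source = "<html><body><h1>Title</h1><p>Some text.</p></body></html>"
    text = parse_content_from_html(source)
    assert "Title" in text
    assert "Some text." in text

test_parse_content_from_html()
```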
  2. Use a flag to distinguish loading local HTML from loading remote HTML. You can then call extract_raw_text_from_url("assets/mywebpage1.html", ..., local=True):
def extract_raw_text_from_url(url, set_parser='lxml', local=False):
    try:
        if local:
            with open(url, 'r') as f:
                source = f.read()
        else:
            req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})  # Set user agent as Mozilla. Otherwise: Error 403
            source = urlopen(req).read()  # Return source code
        parser = set_parser
        soup = bs.BeautifulSoup(source, parser)  # create BeautifulSoup object
        text = soup.get_text()  # get text of the website
    except ValueError:
        text = []
    return text
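As for the mocking the question actually asks about: with `unittest.mock.patch` you can replace `urlopen` so the test never makes a request. A sketch with a simplified stand-in function (it skips the BeautifulSoup step so the example is dependency-free; the key point is to patch `urlopen` where the code under test looks it up):

```python
import urllib.request
from urllib.request import Request
from unittest.mock import MagicMock, patch

def extract_raw_text(url):
    # Simplified stand-in for extract_raw_text_from_url (no parsing step).
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    return urllib.request.urlopen(req).read().decode()

def test_extract_raw_text_mocked():
    fake_response = MagicMock()
    fake_response.read.return_value = b"<html><p>canned page</p></html>"
    # Patch urlopen where the function looks it up, so no request is made.
    with patch("urllib.request.urlopen", return_value=fake_response):
        text = extract_raw_text("http://example.com")
    assert text == "<html><p>canned page</p></html>"
    fake_response.read.assert_called_once()

test_extract_raw_text_mocked()
```

If your module does `from urllib.request import urlopen` instead, patch the name in that module's namespace (e.g. `patch("yourmodule.urlopen", ...)`, where `yourmodule` is a placeholder for your actual module name), because `patch` replaces the name where it is looked up, not where it is defined.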
