Why does my web scraper write everything on one line?



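I'm a complete beginner, but I have managed to scrape EAN numbers with Python from a list of links built by an earlier part of the code. However, my output file contains all the scraped numbers as one continuous line instead of one EAN per line.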

Here is my code. What is wrong with it? (The scraped URL has been redacted.)

import requests
from bs4 import BeautifulSoup
import urllib.request
import os

subpage = 1
while subpage <= 2:
    URL = "https://..." + str(subpage)
    page = requests.get(URL)
    soup = BeautifulSoup(page.content, "html.parser")
    """writes all links under the h2 tag into a list"""
    links = []
    h2s = soup.find_all("h2")
    for h2 in h2s:
        links.append("http://www.xxxxxxxxxxx.com" + h2.a['href'])
    """opens links from list and extracts EAN number from underlying page"""
    with open("temp.txt", "a") as output:
        for link in links:
            urllib.request.urlopen(link)
            page_2 = requests.get(link)
            soup_2 = BeautifulSoup(page_2.content, "html.parser")
            if "EAN:" in soup_2.text:
                span = soup_2.find(class_="articleData_ean")
                EAN = span.a.text
                output.write(EAN)
    subpage += 1
os.replace('temp.txt', 'EANs.txt')

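output.write(EAN) writes each EAN with nothing between them. It does not add a separator or line break automatically. You can add a newline yourself with output.write('\n'), or a comma or similar, to separate them.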
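To illustrate the point without hitting the network, here is a minimal sketch of the write step on its own. The helper name write_eans, the sample EAN values, and the temporary output path are assumptions made up for this example; in the question's code the same change would go on the output.write(EAN) line.

```python
import os
import tempfile


def write_eans(eans, path):
    """Append each EAN to `path`, one per line (hypothetical helper)."""
    with open(path, "a", encoding="utf-8") as output:
        for ean in eans:
            # write() adds nothing on its own, so append the newline explicitly.
            # strip() drops stray whitespace that scraped text often carries.
            output.write(ean.strip() + "\n")


# Made-up sample values standing in for the scraped EANs.
sample_eans = ["4006381333931", " 5901234123457 "]

# Write to a throwaway directory so the sketch leaves no file behind in the cwd.
out_path = os.path.join(tempfile.mkdtemp(), "EANs.txt")
write_eans(sample_eans, out_path)

with open(out_path, encoding="utf-8") as f:
    lines = f.read().splitlines()

print(lines)
```

If you prefer not to concatenate the newline yourself, print(ean, file=output) should give the same result, since print ends each call with a newline by default.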
