Concatenating Python BeautifulSoup span results into a string



The snippet below works, but as an improvement I would like to join the item results into a single comma-separated string. I have been trying, but no luck.

from bs4 import BeautifulSoup
from urllib import request
from urllib.request import Request, urlopen
url = 'https://bscscan.com/tx/0xb9044e77ae66b6f128866e049d55f09b3501de6fc75478e406e4c32d1de4bd6a'
headers = {'User-Agent': 'Mozilla/5.0'}
req = Request(url, headers=headers)
html = urlopen(req).read()
soup = BeautifulSoup(html, 'html.parser')
main_data = soup.select("ul#wrapperContent div.media-body")
for item in main_data:
    all_span = item.find_all("span", class_='mr-1')
    last_span = all_span[-1]
    all_a = item.find_all("a")
    last_a = all_a[-1]
    print("{:>35} | {:18} | https://bscscan.com{}".format(last_span.get_text(strip=True), last_a.get_text(strip=True), last_a['href']))

Current output:

                    2 ($598.51) | Wrapped BNB (WBNB) | https://bscscan.com/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c
          13.684565595242991082 | MoMo KEY (KEY)     | https://bscscan.com/token/0x85c128ee1feeb39a59490c720a9c563554b51d33
                              4 | Chi Gastoken...(CHI) | https://bscscan.com/token/0x0000000000004946c0e9f43f4dee607b0ef1fa1c

Desired output:

                    2 ($598.51) | Wrapped BNB (WBNB) | https://bscscan.com/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c
          13.684565595242991082 | MoMo KEY (KEY)     | https://bscscan.com/token/0x85c128ee1feeb39a59490c720a9c563554b51d33
                              4 | Chi Gastoken...(CHI) | https://bscscan.com/token/0x0000000000004946c0e9f43f4dee607b0ef1fa1c
         -> Wrapped BNB (WBNB) , MoMo KEY (KEY) , Chi Gastoken...(CHI) #-- Concatenated String

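First, the strings you are trying to concatenate appear to be the link texts, not the spans.

Second: initialize a string before the loop (in your case it will not be empty, since you want it to start with '->'), append the text you need to it on each iteration, and print the result after the loop. Try the following: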

from bs4 import BeautifulSoup
from urllib import request
from urllib.request import Request, urlopen
url = 'https://bscscan.com/tx/0xb9044e77ae66b6f128866e049d55f09b3501de6fc75478e406e4c32d1de4bd6a'
headers = {'User-Agent': 'Mozilla/5.0'}
req = Request(url, headers=headers)
html = urlopen(req).read()
soup = BeautifulSoup(html, 'html.parser')
main_data = soup.select("ul#wrapperContent div.media-body")
link_texts = '->'    # initialize a new string
for item in main_data:
    all_span = item.find_all("span", class_='mr-1')
    last_span = all_span[-1]
    all_a = item.find_all("a")
    last_a = all_a[-1]
    print("{:>35} | {:18} | https://bscscan.com{}".format(last_span.get_text(strip=True), last_a.get_text(strip=True), last_a['href']))
    link_texts += last_a.get_text(strip=True) + ","    # add the link text to the string you initialized on each iteration
link_texts = link_texts.rstrip(",")    # remove the trailing comma; unlike [:-1], this leaves '->' intact when no items were found
print(link_texts)

Output:

                        2 ($597.04) | Wrapped BNB (WBNB) | https://bscscan.com/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c
              13.684565595242991082 | MoMo KEY (KEY)     | https://bscscan.com/token/0x85c128ee1feeb39a59490c720a9c563554b51d33
                                  4 | Chi Gastoken...(CHI) | https://bscscan.com/token/0x0000000000004946c0e9f43f4dee607b0ef1fa1c
->Wrapped BNB (WBNB),MoMo KEY (KEY),Chi Gastoken...(CHI)

You should store the values in a list (declared before the for loop) and join them with ', '.join(list_variable).

Something like:
temp_list = []
main_data = soup.select("ul#wrapperContent div.media-body")
for item in main_data:
    all_span = item.find_all("span", class_='mr-1')
    last_span = all_span[-1]
    all_a = item.find_all("a")
    last_a = all_a[-1]
    print("{:>35} | {:18} | https://bscscan.com{}".format(last_span.get_text(strip=True), last_a.get_text(strip=True), last_a['href']))
    temp_list.append(last_a.get_text(strip=True))
print(', '.join(temp_list))
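
For the exact layout asked for in the question (a leading '->' and ' , ' between the names), the join can be wrapped in a small helper. This is only a sketch: format_token_line is a hypothetical name, not part of BeautifulSoup, and the hard-coded list stands in for the link texts collected in the loop above, so it runs without a network request.

```python
def format_token_line(names, prefix="-> ", sep=" , "):
    """Join the collected link texts into one line (hypothetical helper)."""
    # Drop empty strings so a missing link text does not leave a dangling separator.
    cleaned = [name for name in names if name]
    return prefix + sep.join(cleaned)


# Stand-in for temp_list; the values are copied from the output shown in the question.
temp_list = ["Wrapped BNB (WBNB)", "MoMo KEY (KEY)", "Chi Gastoken...(CHI)"]
print(format_token_line(temp_list))
```

Because join only places the separator between items, there is no trailing comma to slice off afterwards, and an empty list simply yields the prefix on its own.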
