Unable to scrape all ul tags from the table



I'm trying to scrape all the proxy IPs from this site: https://proxy-list.org/english/index.php, but at most I only get one IP. Here is my code:

from helium import *
from bs4 import BeautifulSoup
url = 'https://proxy-list.org/english/index.php'
browser = start_chrome(url, headless=True)
soup = BeautifulSoup(browser.page_source, 'html.parser')
proxies = soup.find_all('div', {'class':'table'})
for ips in proxies:
    print(ips.find('li', {'class':'proxy'}).text)

I tried using ips.findall, but it didn't work.
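(For what it's worth, BeautifulSoup's method is spelled find_all, not findall, and find only ever returns the first matching tag, which would explain getting a single IP. A minimal sketch against a toy HTML snippet, not the real page:)

```python
from bs4 import BeautifulSoup

# Toy HTML standing in for the real table
html = "<ul><li class='proxy'>1.1.1.1:80</li><li class='proxy'>2.2.2.2:80</li></ul>"
soup = BeautifulSoup(html, 'html.parser')

# find() stops at the first matching tag
print(soup.find('li', class_='proxy').text)                     # 1.1.1.1:80

# find_all() returns a list of every matching tag
print([li.text for li in soup.find_all('li', class_='proxy')])  # ['1.1.1.1:80', '2.2.2.2:80']
```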

from bs4 import BeautifulSoup
import requests

url = 'https://proxy-list.org/english/index.php'
pagecontent = requests.get(url)
# parse the response body, not an undefined browser object
soup = BeautifulSoup(pagecontent.text, 'html.parser')
maintable = soup.find_all('div', {'class':'table'})
for div_element in maintable:
    rows = div_element.find_all('li', class_='proxy')
    for ip in rows:
        print(ip.text)

If I understood your question correctly, here is one way to get those proxies using the requests module and the BeautifulSoup library:

import re
import base64
import requests
from bs4 import BeautifulSoup

url = 'https://proxy-list.org/english/index.php'

def decode_proxy(target_str):
    # each proxy is stored base64-encoded inside an inline <script>
    converted_proxy = base64.b64decode(target_str)
    return converted_proxy.decode()

res = requests.get(url)
soup = BeautifulSoup(res.text, 'html.parser')
for tr in soup.select("#proxy-table li.proxy > script"):
    proxy_id = re.findall(r"Proxy[^']+(.*)'", tr.contents[0])[0]
    print(decode_proxy(proxy_id))
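To illustrate the decoding step: each li.proxy element contains an inline script of the form Proxy('...'), where the ip:port string is base64-encoded. The payload below is a made-up example ("1.2.3.4:8080" encoded by hand), not real page content:

```python
import re
import base64

# Illustrative script content; the base64 payload encodes "1.2.3.4:8080"
script_text = "Proxy('MS4yLjMuNDo4MDgw')"

# Pull out the base64 token between the quotes
proxy_id = re.findall(r"Proxy\('([^']+)'\)", script_text)[0]

# Decode it back to the plain ip:port string
print(base64.b64decode(proxy_id).decode())  # 1.2.3.4:8080
```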

The first few results:

62.80.180.111:8080
68.183.221.156:38159
189.201.134.13:8080
178.60.201.44:8080
128.199.79.15:8080
139.59.78.193:8080
103.148.216.5:80
