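I need help using BeautifulSoup in Python to scrape data from this website. Can anyone help? I've been going back and forth between the docs for hours.
I just need to store all the names from these two links in one array:
https://angelsname.com/Twin-Boy-Names
https://angelsname.com/Modern-Hindu-Baby-Names/Boy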
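from bs4 import BeautifulSoup
import requests

urls = [
    'https://angelsname.com/Twin-Boy-Names',
    'https://angelsname.com/Modern-Hindu-Baby-Names/Boy',
]
names = []
for url in urls:
    html = requests.get(url)
    soup = BeautifulSoup(html.text, 'lxml')  # the 'lxml' parser must be installed
    for result in soup.select('#MysteryCode'):  # <- the selector I can't figure out
        names.append(result)
print(names)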
I've been trying different permutations and combinations, but I'm having no luck with the mystery code. Please advise.
Select the elements and iterate over the ResultSet:
[e.td.get_text(strip=True) for e in soup.select('table tr:has(td)')]
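To see why this selector works without hitting the site, here is a small offline sketch. The HTML below is hypothetical sample markup, assuming the page puts each name in the first <td> of a table row and uses <th> cells for the header row; the real page may differ in detail.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (not copied from angelsname.com): a <th> header row,
# then one <tr> of <td> cells per name.
SAMPLE_HTML = """
<table>
  <tr><th>First Name</th><th>Gender</th></tr>
  <tr><td> Aarav </td><td>Boy</td></tr>
  <tr><td>Vivaan</td><td>Boy</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# 'tr:has(td)' keeps only rows containing at least one <td>,
# so the <th>-only header row is skipped.
rows = soup.select("table tr:has(td)")

# e.td is the first <td> in the row; strip=True trims surrounding whitespace.
names = [e.td.get_text(strip=True) for e in rows]
print(names)  # ['Aarav', 'Vivaan']
```

The `:has()` pseudo-class is handled by Soup Sieve, the CSS selector engine that `select()` uses in current BeautifulSoup releases.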
Example
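from bs4 import BeautifulSoup
import requests

url = 'https://angelsname.com/Twin-Boy-Names'
response = requests.get(url)
# name the parser explicitly; otherwise bs4 picks one and emits a warning
soup = BeautifulSoup(response.text, 'lxml')
# skip header rows (no <td>) and take the text of each row's first cell
names = [e.td.get_text(strip=True) for e in soup.select('table tr:has(td)')]
print(names)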
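The question asks for the names from both links in a single array. A minimal sketch, assuming both pages lay out their names in the same table structure the selector above targets. The helper names `names_from_html` and `scrape_names` are hypothetical, and removing duplicates is an optional choice in case a name appears on both pages.

```python
import requests
from bs4 import BeautifulSoup

URLS = [
    "https://angelsname.com/Twin-Boy-Names",
    "https://angelsname.com/Modern-Hindu-Baby-Names/Boy",
]


def names_from_html(html: str) -> list[str]:
    """Return the text of the first <td> in every table row that has <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    return [row.td.get_text(strip=True) for row in soup.select("table tr:has(td)")]


def scrape_names(urls: list[str]) -> list[str]:
    """Fetch each page and collect all names into one list."""
    names: list[str] = []
    for url in urls:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
        names.extend(names_from_html(response.text))
    # dict.fromkeys drops repeats while keeping first-seen order
    return list(dict.fromkeys(names))
```

Usage: `names = scrape_names(URLS)`. Keeping the parsing in `names_from_html` separate from the HTTP requests makes it easy to test against saved HTML.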
You may want to use pandas.read_html(), which is a convenient way to scrape data from HTML tables:
pd.read_html('https://angelsname.com/Twin-Boy-Names')[0]['First Name'].to_list()
Example
import pandas as pd
pd.read_html('https://angelsname.com/Twin-Boy-Names')[0]
Or just the list of names:
import pandas as pd
pd.read_html('https://angelsname.com/Twin-Boy-Names')[0]['First Name'].to_list()
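The same pandas approach can be extended to both links from the question. This is a sketch assuming the first table on each page has a "First Name" column, which the answer shows only for the Twin-Boy-Names page. The helper name `first_names` is hypothetical, and `pd.read_html` needs a parser such as lxml installed.

```python
import pandas as pd

URLS = [
    "https://angelsname.com/Twin-Boy-Names",
    "https://angelsname.com/Modern-Hindu-Baby-Names/Boy",
]


def first_names(sources, column: str = "First Name") -> list:
    """Concatenate one column from the first table of each source.

    sources can be URLs or file-like objects (e.g. io.StringIO of saved HTML).
    """
    names = []
    for src in sources:
        tables = pd.read_html(src)  # one DataFrame per <table> found
        names.extend(tables[0][column].to_list())
    return names
```

Usage: `names = first_names(URLS)`.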