Unable to identify the correct 'div' using BS4 in Python



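I am trying to scrape data but cannot identify the correct "div", because two of them share the same class. If I run find on the parent of the second "div" and then access its children, I only get None.

The data I need to collect is the admission status, school name, and GRE and GMAT scores.

I am doing this with Python and BeautifulSoup.

Here is my code: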

import requests
from bs4 import BeautifulSoup

url = 'https://www.clearadmit.com/livewire/'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

container = soup.find('div', attrs = {'class' : 'livewire-container'})
print(container)

The posts are loaded via Ajax from an external source. You can use the following example to load them:

import requests
from bs4 import BeautifulSoup

# Ajax endpoint that serves the posts
url = "https://www.clearadmit.com/wp-admin/admin-ajax.php"
params = {
    "action": "livewire_load_posts",
    "school": "",
    "round": "",
    "status": "",
    "orderby": "",
    "paged": "",
}
for page in range(1, 5):  # <--- increase number of pages here
    print("Getting page {}..".format(page))
    params["paged"] = page
    # The response is JSON; the HTML of the posts is under the "markup" key
    data = requests.post(url, data=params).json()
    soup = BeautifulSoup(data["markup"], "html.parser")
    for entry in soup.select(".livewire-entry"):
        status = entry.select_one(".status")
        # The school name is the first <strong> after the status element
        name = status.find_next("strong")
        details = entry.select_one(".lw-details")
        print(
            "{:<25} {:<30} {}".format(
                status.get_text(strip=True),
                name.get_text(strip=True),
                details.get_text(strip=True),
            )
        )
    print("-" * 80)

Prints:

Getting page 1..
News                      All Schools                    
Accepted from Waitlist    Michigan / Ross                Round: Round 2
Accepted from Waitlist    UT Austin / McCombs            GMAT: 640 Round: Round 2
Accepted                  Johns Hopkins / Carey          Round: Round 3
Accepted                  Michigan / Ross                GPA: 3.65 GRE: 322 Round: Round 3
Accepted from Waitlist    Michigan / Ross                GPA: 3.1 Round: Round 2 | Michigan
Note                      All Schools                    GMAT: 740 Round: Round 1 | Africa
Accepted                  INSEAD                         GPA: 3.5 GMAT: 770 Round: Round 4 | Taiwan
Accepted                  INSEAD                         GMAT: 750 Round: Round 4 | India
Interview Invite          Berkeley / Haas                GPA: 3.72 GMAT: 740 Round: Round 1 | IL
Enrolled                  Duke / Fuqua                   GPA: 3.59 GRE: 333 Round: Round 2 | Miami
Interview Invite          Georgetown / McDonough         GRE: 307 Round: Round 3 | Arlington
Accepted                  USC / Marshall                 GPA: 3.4 GMAT: 720 Round: Round 2 | NY
Note                      NYU Stern                      Round: Round 1
Enrolled                  Berkeley / Haas                GMAT: 760 Round: Round 2 | Canada
Waitlisted                Duke / Fuqua                   GRE: 314 Round: Round 2
Note                      All Schools                    Round: Rolling Admissions
Interview Invite          Columbia                       GPA: 3.5 Round: Round 3 | NY
Accepted                  UNC Kenan-Flagler              GMAT: 740 Round: Round 1
Interview Invite          MIT Sloan                      GPA: 3.6 GMAT: 740 Round: Round 3
--------------------------------------------------------------------------------
Getting page 2..
Rejected                  Columbia                       GPA: 3.6 GRE: 331 Round: Round 3 | IL
Interview Invite          Northwestern / Kellogg         GPA: 3.6 GRE: 331 Round: Round 3 | IL
Waitlisted                Duke / Fuqua                   GMAT: 760 Round: Round 2 | Canada
...
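
The question asks for the GRE and GMAT scores as separate values, while the example above prints the details as one string. A minimal sketch of splitting that string, assuming the details text keeps the "LABEL: value" shape seen in the output above; parse_details and FIELD_RE are hypothetical names introduced here for illustration, not part of the original answer:

```python
import re

# Hypothetical helper. Assumes the details text looks like the sample output,
# e.g. "GPA: 3.65 GRE: 322 Round: Round 3", with numeric values after each label.
FIELD_RE = re.compile(r"\b(GPA|GRE|GMAT):\s*(\d+(?:\.\d+)?)")


def parse_details(text):
    """Return a dict with GPA, GRE and GMAT as strings, or None when absent."""
    fields = {"GPA": None, "GRE": None, "GMAT": None}
    for label, value in FIELD_RE.findall(text):
        fields[label] = value
    return fields


print(parse_details("GPA: 3.65 GRE: 322 Round: Round 3"))
# {'GPA': '3.65', 'GRE': '322', 'GMAT': None}
```

Inside the loop, this could be called as parse_details(details.get_text(" ", strip=True)); passing a separator to get_text keeps neighbouring labels and values from running together if the markup has no whitespace between them.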

EDIT: Added pagination.
