Is there a way to make this code cleaner?


import requests
from bs4 import BeautifulSoup
url = 'https://www.basketball-reference.com/players/a/'
urlb = 'https://www.basketball-reference.com/players/b/'
urlc = 'https://www.basketball-reference.com/players/c/'
result = requests.get(url)
doc = BeautifulSoup(result.text, 'lxml')
college = doc.find_all(string="Kentucky")
result = requests.get(urlb)
doc = BeautifulSoup(result.text, 'lxml')
collegeb = doc.find_all(string='Kentucky')
result = requests.get(urlc)
doc = BeautifulSoup(result.text, 'lxml')
collegec = doc.find_all(string='Kentucky')
print(college)
print(collegeb)
print(collegec)

I need to do this for every letter of the alphabet, for at least 30 schools, and I'd really like to know how to do it more efficiently.

Deduplicate the nearly identical code by looping over the inputs and storing the results in a list or dict:

import requests
from bs4 import BeautifulSoup
url_template = 'https://www.basketball-reference.com/players/{}/'
folders = ['a', 'b', 'c']  # The only varying thing in your original tripled code
colleges = []              # Store the results for each varied thing here in same order
for folder in folders:     # Loop over varying component
    result = requests.get(url_template.format(folder))  # Substitute it in template
    doc = BeautifulSoup(result.text, 'lxml')
    colleges.append(doc.find_all(string="Kentucky"))    # Append result in same order
# Loop over results to print them
for college in colleges:
    print(college)

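If you make this work for many schools across every letter of the alphabet, you'd probably use a dict (better still, a defaultdict) instead of a list, so the results can be grouped by school, with an inner loop that parses the data for each school: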

import requests
from bs4 import BeautifulSoup
from collections import defaultdict
from string import ascii_lowercase
url_template = 'https://www.basketball-reference.com/players/{}/'
folders = ascii_lowercase  # Will run for every lowercase alphabet letter
schoolnames = ("Kentucky", "Gonzaga", ...)
colleges = defaultdict(list) # Store a list of results for each school
for folder in folders:     # Loop over varying component
    result = requests.get(url_template.format(folder))  # Substitute it in template
    doc = BeautifulSoup(result.text, 'lxml')
    for schoolname in schoolnames:
        # Search the already-parsed page once per school
        colleges[schoolname].append(doc.find_all(string=schoolname))
# Loop over results to print them
for collegename, results in colleges.items():
    print(collegename)
    for result in results:
        print(result)

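Here's slightly simpler code. All I do is pull in all the player tables, then call .value_counts() on the 'Colleges' column. That gives you the counts for every school. Then, if you only want to look at one school, just look it up by its index value: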

import pandas as pd
from string import ascii_lowercase
dfs_list = []
for letter in ascii_lowercase:
    url = f'https://www.basketball-reference.com/players/{letter}/'
    dfs_list.append(pd.read_html(url)[0])
    print(url)

results = pd.concat(dfs_list, axis=0)
colleges_count = results['Colleges'].value_counts()

You can even condense it into fewer lines of code with a list comprehension:

import pandas as pd
from string import ascii_lowercase
results = pd.concat([pd.read_html(f'https://www.basketball-reference.com/players/{letter}/')[0] for letter in ascii_lowercase], axis=0)
colleges_count = results['Colleges'].value_counts()

Output:

print(colleges_count)
Kentucky                                  112
UCLA                                       91
UNC                                        91
Duke                                       84
Kansas                                     72
Kansas, Houston                             1
California Western Uiversity                1
Florida, Louisiana                          1
NC State, Iona College                      1
Seattle Pacific University, Washington      1
Name: Colleges, Length: 806, dtype: int64

Or to look at just one school:

print(colleges_count['Kentucky'])
112
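
Note that the output above counts combined entries such as 'Kansas, Houston' separately from 'Kansas'. If per-school totals should include players listed with several colleges, one option is to split each entry before counting. This is only a sketch in plain Python: the helper name count_individual_schools is made up for illustration, the ', ' separator is an assumption based on the output shown, and the sample list is invented rather than taken from the site.

```python
from collections import Counter


def count_individual_schools(college_entries, separator=", "):
    """Count schools after splitting combined entries like 'Kansas, Houston'."""
    counts = Counter()
    for entry in college_entries:
        if not isinstance(entry, str):  # skip missing values such as None/NaN
            continue
        for school in entry.split(separator):
            counts[school.strip()] += 1
    return counts


# Made-up sample rows, not real data from the site
sample = ["Kentucky", "Kansas", "Kansas, Houston", None, "Kentucky"]
counts = count_individual_schools(sample)
print(counts["Kansas"])   # 2
print(counts["Houston"])  # 1
```

With the DataFrame from this answer, passing results['Colleges'] to the helper should work the same way, and results['Colleges'].str.split(', ').explode().value_counts() would be a comparable pandas-only tally.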

You can simply use a for loop.

import requests
from bs4 import BeautifulSoup
colleges = []
for char in "abcdefghijklmnopqrstuvwxyz":
    url = f"https://www.basketball-reference.com/players/{char}/"
    result = requests.get(url)
    doc = BeautifulSoup(result.text, 'lxml')
    college = doc.find_all(string="Kentucky")
    colleges.append(college)
print(*colleges, sep="\n")  # Print each letter's matches on its own line

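You can write a function that holds the instructions you need to "redo" for each school. Then have a main() function call it for each school, passing in that school's parameters or characteristics. Your code reads as one big block of lines; you should split it into separate routines and rely more on "clean coding".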
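
A rough sketch of what that function-plus-main() layout might look like. It swaps requests and BeautifulSoup for the standard library (urllib and html.parser) purely to stay dependency-free; the names TextCollector, fetch_page, find_school, and collect_results are invented for this example, and the two school names are placeholders. Unlike find_all(string=...), this version strips surrounding whitespace before comparing, and the site may throttle or reject automated requests.

```python
from html.parser import HTMLParser
from string import ascii_lowercase
from urllib.request import urlopen

URL_TEMPLATE = "https://www.basketball-reference.com/players/{}/"


class TextCollector(HTMLParser):
    """Collects every non-empty text node in an HTML document."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(text)


def fetch_page(letter):
    """Download the player index page for one letter (needs network access)."""
    with urlopen(URL_TEMPLATE.format(letter)) as response:
        return response.read().decode("utf-8", errors="replace")


def find_school(html, school):
    """Return the text nodes in `html` that are exactly `school`."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return [text for text in collector.texts if text == school]


def collect_results(schools, letters=ascii_lowercase, fetch=fetch_page):
    """Fetch each letter's page once and gather the matches for every school."""
    results = {school: [] for school in schools}
    for letter in letters:
        html = fetch(letter)
        for school in schools:
            results[school].extend(find_school(html, school))
    return results


def main():
    schools = ("Kentucky", "Gonzaga")  # placeholder names; extend as needed
    for school, matches in collect_results(schools).items():
        print(school, len(matches))
```

Calling main() runs the whole thing. Passing a different fetch function to collect_results makes it possible to try the logic on saved HTML without hitting the site.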
