BeautifulSoup: find all HTML elements by class, with the classes passed in as variable arguments



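I have the following code, which scrapes a website for elements (tr rows, in the code below) with the class "odd" or "even". I would like to make "odd" and "even" arguments that my function accepts, which would also let me add other classes later. Here is my code: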

#
# Imports
#
import urllib2
from bs4 import BeautifulSoup
import re
import os
from pprint import pprint
#
# library
#
def get_soup(url):
    page = urllib2.urlopen(url)
    contents = page.read()
    soup = BeautifulSoup(contents, "html.parser")
    body = soup.findAll("tr", ["even", "odd"])
    string_list = str([i for i in body])
    return string_list

def save_to_file(path, soup):
    with open(path, 'w') as fhandle:
        fhandle.write(soup)

#
# script
#
def main():
    url = r'URL GOES HERE'
    path = os.path.join('PATH GOES HERE')
    the_soup = get_soup(url)
    save_to_file(path, the_soup)

if __name__ == '__main__':
    main()

I would like to incorporate *args into the code so that the get_soup function looks like this:

def get_soup(url, *args):
    page = urllib2.urlopen(url)
    contents = page.read()
    soup = BeautifulSoup(contents, "html.parser")
    body = soup.findAll("tr", [args])
    string_list = str([i for i in body])
    return string_list

def main():
    url = r'URL GOES HERE'
    path = os.path.join('PATH GOES HERE')
    the_soup = get_soup(url, "odd", "even")
    save_to_file(path, the_soup)

Unfortunately, this doesn't work. Any ideas?

Don't put args in a list. args is already a tuple, so just pass it as is:

body = soup.findAll("tr", args)

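If you write [args], you end up with something like [("odd", "even")]: a list containing a single tuple, rather than a flat sequence of class names.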
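To make the difference concrete, here is a minimal sketch in plain Python with no BeautifulSoup involved. The helper names as_passed and wrapped are hypothetical and exist only for this illustration.

```python
def as_passed(*args):
    # *args collects the positional arguments into a tuple
    return args


def wrapped(*args):
    # Wrapping that tuple in brackets gives a one-element list
    # whose only item is the tuple itself
    return [args]


print(as_passed("odd", "even"))  # ('odd', 'even')
print(wrapped("odd", "even"))    # [('odd', 'even')]
```

The first form is a flat sequence of class names; the second nests the names one level too deep, which is why the lookup finds nothing useful.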

Also, str([i for i in body]) doesn't really make sense. It is the same as just doing str(body), and I can't see what use that format would be.
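Putting both points together, a possible shape for the function is sketched below. It is an illustration under stated assumptions, not the asker's final code: it targets Python 3, it parses an inline sample string (SAMPLE_HTML, made up here) instead of downloading a page, the names rows_with_classes and rows_to_text are hypothetical, and it spells the class filter as class_=list(classes) rather than passing the tuple positionally as suggested above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page
SAMPLE_HTML = """
<table>
  <tr class="odd"><td>1</td></tr>
  <tr class="even"><td>2</td></tr>
  <tr class="other"><td>3</td></tr>
</table>
"""


def rows_with_classes(html, *classes):
    """Return the <tr> elements whose class matches any of `classes`."""
    soup = BeautifulSoup(html, "html.parser")
    # list(classes) turns the *args tuple into a flat list of class names
    return soup.find_all("tr", class_=list(classes))


def rows_to_text(rows):
    # One row of markup per line, instead of the repr of a Python list
    return "\n".join(str(row) for row in rows)


rows = rows_with_classes(SAMPLE_HTML, "odd", "even")
print(rows_to_text(rows))
```

Joining the rows with newlines is only one option for the output; whichever format is chosen, building it explicitly tends to be easier to read back than the string form of a list.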
