Webdriver/BeautifulSoup: have my program check whether part of a string exists on a web page



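I'm writing a bot that buys items automatically. Right now I put the product information in a dictionary called INFO and refer to it whenever I need a specific product, color, and so on. Currently my code (specifically in findProduct()) checks whether the link text stored in temp_tuple is exactly equal to INFO['product'].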

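For example, I search for a product, but my code fails with an error because some product names on the site end with a space, and my code doesn't handle that.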

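Instead, I'd like to change it to check whether the string appears on the page at all, so that my code still works when there is an extra space.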
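A minimal sketch of the mismatch described above, assuming the link text on the site is "Big Duffle Bag " (with the trailing space mentioned in the code comment further down) and the value in INFO is "Big Duffle Bag". The helpers exact_match and stripped_match are hypothetical names used only to illustrate the two comparisons:

```python
def exact_match(link_text: str, wanted: str) -> bool:
    # What the original loop does: a strict equality test.
    return link_text == wanted


def stripped_match(link_text: str, wanted: str) -> bool:
    # Trim leading/trailing whitespace on both sides before comparing.
    return link_text.strip() == wanted.strip()


site_text = "Big Duffle Bag "  # hypothetical link text with a trailing space
wanted = "Big Duffle Bag"      # value stored in INFO['product']

print(exact_match(site_text, wanted))     # False
print(stripped_match(site_text, wanted))  # True
```

strip() only repairs surrounding whitespace; a substring test with in also tolerates extra words around the name, at the cost of possibly matching more than one product.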

Here is enough of my code to run it:

#!/usr/bin/env python3
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.select import Select
import time
import requests
import bs4 as bs
from splinter import Browser
import helpers
from selenium.common.exceptions import ElementNotInteractableException
from config import INFO


class supremeBot(object):
    def __init__(self, **info):
        self.base_url = 'http://www.supremenewyork.com/'
        self.shop = 'shop/all/'
        self.checkout = 'checkout/'
        self.info = info

    def initializeBrowser(self):
        driverz = self.info["driver"]
        path = helpers.get_driver_path(driverz)
        if driverz == "geckodriver":
            self.b = Browser()
        elif driverz == "chromedriver":
            executable_path = {"executable_path": path}
            self.b = Browser('chrome', **executable_path)

    # This looks for the product based on what the category is
    def findProduct(self):
        category = str(INFO['category'])
        source = requests.get("http://www.supremenewyork.com/shop/all/" + category).text
        soup = bs.BeautifulSoup(source, 'lxml')
        temp_link = []
        temp_tuple = []
        for link in soup.find_all('a', href=True):
            temp_tuple.append((link['href'], link.text))
        for i in temp_tuple:
            if i[1] == INFO['product'] or i[1] == INFO['color']:  # <------------ I want this to recognize a partial string
                temp_link.append(i[0])
        # print(temp_link)
        # Keep hrefs that were matched twice (presumably once by product name, once by color)
        matches = list(set([x for x in temp_link if temp_link.count(x) == 2]))
        if not matches:
            return False  # nothing matched; the retry loop below tries again
        # This creates the end of the final link
        self.final_link = matches[0]
        # Concatenates the previous link with the website
        link = 'http://www.supremenewyork.com' + str(self.final_link)
        driver.get(link)  # `driver` is the global created in the __main__ block
        return True


if __name__ == "__main__":
    driver = webdriver.Chrome('./chromedriver')

    '''
    BOT = supremeBot(**INFO)
    BOT.findProduct()
    order()
    '''
    BOT = supremeBot(**INFO)
    found_product = False
    counter = 1
    max_iter = 5
    while not found_product and counter < max_iter:
        found_product = BOT.findProduct()
        print("We tried", counter, "times.")
        counter += 1
        if not found_product:
            print("Couldn't find it")
            continue
        else:
            print('found it')
            order()  # order() belongs to the full script and is not shown here
# config.py (imported by the script above via `from config import INFO`)
INFO = {
    "driver": "chromedriver",
    "product": "Supreme®/MLB New Era®",  # "Big Duffle Bag " is an example of a product that has the space after it
    "color": "Navy",
    "category": "hats",
    "size": "Medium",
    "namefield": "Bucky McNuts",
    "emailfield": "email@email.com",
    "phonefield": "(555)555-5555",
    "addressfield": "321 St",
}
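Another option is to normalize the link text while building temp_tuple. With BeautifulSoup that would mean using link.get_text(strip=True) instead of link.text. The sketch below does the equivalent with the standard library's html.parser instead of BeautifulSoup, so it runs without extra packages; LinkCollector, collect_links and the HTML snippet are hypothetical:

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> tag, with the text stripped."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None  # href of the <a> we are currently inside
        self._text: List[str] = []        # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Strip here, so "Big Duffle Bag " is stored as "Big Duffle Bag".
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect_links(html: str) -> List[Tuple[str, str]]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<a href="/shop/bags/x">Big Duffle Bag </a><a href="/shop/bags/x">Black</a><a>no href</a>'
print(collect_links(sample))
# [('/shop/bags/x', 'Big Duffle Bag'), ('/shop/bags/x', 'Black')]
```

With the text stripped up front, the existing == comparison no longer trips over trailing spaces.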

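For example, if you replace "Supreme®/MLB New Era®" with "Big Duffle Bag" and leave out the space after the word "Bag", you'll see that the code doesn't run, because the product name on the site ends with that space.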

I'd really appreciate any help!

You can check for a partial string with Python's in operator:

if "part" in "partstring":
    print("the word 'part' is within 'partstring'")
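A quick check of how in behaves on link text like the one in the question (the sample strings are assumed): it ignores the trailing space, but it is case-sensitive, which is why lowercasing matters in the loop.

```python
link_text = "Big Duffle Bag "  # sample link text with a trailing space
wanted = "Big duffle bag"      # same name, different capitalization

# A substring test does not care about the extra space at the end.
print("Big Duffle Bag" in link_text)         # True
# But it is case-sensitive...
print(wanted in link_text)                   # False
# ...so lowercase both sides to ignore case.
print(wanted.lower() in link_text.lower())   # True
```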

Applied to your loop, it could look like this:

if INFO['product'].lower() in i[1].lower() or INFO['color'].lower() in i[1].lower():
    temp_link.append(i[0])

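.lower() converts the text from the website to lowercase so that letter case does not matter. The INFO value has to be lowercased as well, otherwise "Big Duffle Bag" will never be found inside "big duffle bag ".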
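Putting the pieces together, here is a sketch of the link-selection step of findProduct() with partial, case-insensitive matching. It assumes links is the list of (href, text) pairs the question builds as temp_tuple, and it swaps the temp_link.count(x) == 2 trick for a set intersection: an href is kept only if one of its links matched the product name and one matched the color. pick_product_link and the sample hrefs are hypothetical:

```python
from typing import List, Optional, Tuple


def pick_product_link(links: List[Tuple[str, str]], product: str, color: str) -> Optional[str]:
    """Return an href that has a link matching the product and a link matching the color."""
    product_key = product.strip().lower()
    color_key = color.strip().lower()
    # hrefs whose link text contains the product name (ignoring case and surrounding spaces)
    product_hrefs = {href for href, text in links if product_key in text.lower()}
    # hrefs whose link text contains the color
    color_hrefs = {href for href, text in links if color_key in text.lower()}
    # sorted() makes the pick deterministic if several hrefs qualify
    both = sorted(product_hrefs & color_hrefs)
    # None instead of an IndexError when nothing matches
    return both[0] if both else None


links = [
    ("/shop/bags/duffle/red", "Big Duffle Bag "),
    ("/shop/bags/duffle/red", "Red"),
    ("/shop/bags/duffle/black", "Big Duffle Bag "),
    ("/shop/bags/duffle/black", "Black"),
]
print(pick_product_link(links, "Big Duffle Bag", "Black"))  # /shop/bags/duffle/black
print(pick_product_link(links, "Big Duffle Bag", "Navy"))   # None
```

Inside findProduct() you could call it with temp_tuple, INFO['product'] and INFO['color'] and return False when it gives None. Keep in mind that substring matching can over-match: a short color such as "red" is also found inside longer words like "shredded", so a category with many similar names may need a stricter test.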
