Web scraper returning random values



At work I was tasked with doing a market analysis of bricks. I picked a few competitors and built a web scraper to collect their prices. It works for most brick types, but for some of them it changes the values or reports no match when there actually is one.

The problem only affects Prices_Building. The rest of the code works fine, and strangely, if I run the Prices_Building code to search for just a single name, it comes back correct.

Here is an image of the output spreadsheet.

The green cells are the correct values from the website; the red cells are wrong values, with the correct value shown in {} where one exists.

Here is my code:

import json
import math
import re
import time

import requests
from bs4 import BeautifulSoup

sheet = client.open("Bricks Compare Prices").get_worksheet(0)  # client is an authorised gspread client (setup omitted)

Prices_Amari = []
Prices_Wholesale = []
Prices_Building = []
Names = []
Namez = []

# List of bricks to compare
lis = [ (list of names boiled down to NAME pack of SIZE
]

for name in lis:  # for every name in the list
    target = name.rpartition("Pack")[0]  # get the essential name
    pack_size = re.search(pattern='[0-9]+', string=name).group()  # get the pack size
    # go to this site and search for the brick
    res = requests.get("https://eucs13.ksearchnet.com/cloud-search/n-search/search?ticket=klevu-15598202362809967&term={}&paginationStartsFrom=0&sortPrice=false&ipAddress=undefined&analyticsApiKey=klevu-15598202362809967&showOutOfStockProducts=true&klevuFetchPopularTerms=false&klevu_priceInterval=500&fetchMinMaxPrice=true&klevu_multiSelectFilters=true&noOfResults=1&klevuSort=rel&enableFilters=true&layoutVersion=1.0&autoComplete=false&autoCompleteFilters=&filterResults=&visibility=search&category=KLEVU_PRODUCT&klevu_filterLimit=50&sv=2316&lsqt=&responseType=json&klevu_loginCustomerGroup=".format(name))
    results = json.loads(res.text)['result']
    for i in results:  # for every result, check that the name and pack size are in the title, or say there's no match
        if target in i['name'] and pack_size in i['name']:
            Prices_Building.append(i['salePrice'])
            Namez.append(i['name'])
        else:
            Prices_Building.append("No match in Building Supplies Online" + name)
            Namez.append(i['name'])

# Repeated for the other sites, for each name in lis:

def get_url_Amaari(search_term):
    build = 'https://ammaaristones.co.uk/?s={}&post_type=product'
    url = build.format(search_term)
    return url

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
result_Ammaristones = requests.get(get_url_Amaari(Name), headers=headers)
try:
    soupAmm = BeautifulSoup(result_Ammaristones.text, 'lxml')
    Par = soupAmm.find('div', class_='box-text box-text-products')
    PriceAmm = re.findall(r"[-+]?[.]?[\d]+(?:,\d\d\d)*[.]?\d*(?:[eE][-+]?\d+)?", Par.find('bdi').text)[0]
    Prices_Amari.append(PriceAmm)
except:
    PriceAmm = "no match in Ammari Stones for:" + Name
    Prices_Amari.append(PriceAmm)
    pass

# Repeated for the other sites, for each name in lis:

try:
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

    def get_url_Wholesale(search_term):
        build = 'https://brickwholesale.co.uk/?s={}&post_type=product&dgwt_wcas=1'
        url = build.format(search_term)
        return url

    result_Wholesale = requests.get(get_url_Wholesale(Name), headers=headers)

    soupWhole = BeautifulSoup(result_Wholesale.text, 'html.parser')
    Pparent = soupWhole.find_all('span', class_='woocommerce-Price-currencySymbol')
    Whole = (float(re.findall(r"[-+]?[.]?[\d]+(?:,\d\d\d)*[.]?\d*(?:[eE][-+]?\d+)?", soupWhole.find('bdi').text.strip())[0])) * 1.2 + 96
    PriceWhole = math.floor(Whole)
    if PriceWhole == 96:
        PriceWhole = "No Match in Wholesale Bricks for: " + Name

    Prices_Wholesale.append(PriceWhole)

except:
    PriceWhole = "no match in wholesale Bricks Stones for:" + Name



# Print to the Google Sheet one row at a time, lining the prices up for comparison

for j in range(len(lis)):
    time.sleep(1)
    row = [lis[j], Prices_Amari[j], Prices_Building[j], Prices_Wholesale[j]]
    sheet.append_row(row)

Using a bare except: without printing or reporting the exception is very dangerous, because you are silently hiding every exception that could possibly occur. You should at least print the exception, or, to do it properly, catch only the specific exception you expect and are willing to suppress, and let everything else propagate and stop your code so you know something unusual has happened.
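
As a rough illustration of that advice (not part of the original answer), here is how the Ammari block from the question could look, reusing the asker's variables (Name, result_Ammaristones, Prices_Amari) and selectors: only the failures you expect from a missing element or an unparsable price are caught and reported, while any other error still stops the script.

# Sketch only, assuming the question's variables are already defined.
# AttributeError covers .find() returning None; IndexError covers findall() finding no number.
try:
    soupAmm = BeautifulSoup(result_Ammaristones.text, 'lxml')
    Par = soupAmm.find('div', class_='box-text box-text-products')
    PriceAmm = re.findall(r"[-+]?[.]?[\d]+(?:,\d\d\d)*[.]?\d*(?:[eE][-+]?\d+)?",
                          Par.find('bdi').text)[0]
except (AttributeError, IndexError) as exc:
    print("No match in Ammari Stones for {}: {!r}".format(Name, exc))
    PriceAmm = "no match in Ammari Stones for:" + Name
Prices_Amari.append(PriceAmm)

With this shape, a network error or an unexpected page layout surfaces immediately instead of being silently recorded as a "no match" row.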
