小贝子编程

在Python中抓取具有多个页面的ajax搜索引擎

本文关键字：ajax 搜索引擎 Python 抓取 python ajax screen-scraping
更新时间 : 2023-08-28
英文 : Scraping ajax search engine with multiple pages in Python

我试图废除韩国专利局。但是，搜索引擎使用 ajax。我需要什么才能获得我的第一个结果？然后我将如何报废后续页面？假设我正在寻找关键字电视的专利。

这是我的起始代码。任何提示都受到高度赞赏

import urllib
import re

url = 'http://engpat.kipris.or.kr/engpat/searchLogina.do?next=MainSearch'
acct = open("results.txt", "w")
regex= '<title>(.+?)</title>'
pattern = re.compile(regex)
htmlfile = urllib.urlopen(url)
htmltext = htmlfile.read()
title= re.findall(pattern,htmltext)
acct.write(title)

谢谢！

有很多方法可以做到这一点。我推荐的方法之一是使用Selenium来完成此任务，使用XPath抓取每个页面以选择下一页元素。查看硒文档以获取更多示例。正则表达式不是使用 html 抓取的方式......

在Python中抓取具有多个页面的ajax搜索引擎

相关内容

最新更新

热门标签：