I'm trying to parse a URL that I scrape with Scrapy:
def parse_info_has_id(self, css_path):
    profileID = ""
    for div in css_path.xpath('div'):
        url = "".join(div.css('div > a::attr(href)').extract())
        if "add_friend.php?id" in url:
            print(url)
            #parsed = urlparse.urlparse(url)
            #print urlparse.parse_qs(parsed.query)['id']
    return profileID
This prints:
/a/mobile/friends/add_friend.php?id=100003669247258&hf=search&sld=eyJzZWFyY2hfc2lkIjoiNGYxMmNhZGJhZDVkOGQ5ZGFkN2RkZTdhYjc3MTMwNTQiLCJxdWVyeSI6IjIwMjM2MDg3OTciLCJzZWFyY2hfdHlwZSI6IlNlYXJjaCIsInNlcXVlbmNlX2lkIjoxOTg2MTg0OTIzLCJwYWdlX251bWJlciI6MSwiZmlsdGVyX3R5cGUiOiJTZWFyY2giLCJlbnRfaWQiOjEwMDAwMzY2OTI0NzI1OCwicG9zaXRpb24iOjAsInJlc3VsdF90eXBlIjoyMDQ4fQ%3D%3D&gfid=AQB03j5V7CqqGQSD/graphsearch/100003669247258/photos-of?ent=100003669247258&refid=0&query=2023608797&sld=eyJzZWFyY2hfc2lkIjoiNGYxMmNhZGJhZDVkOGQ5ZGFkN2RkZTdhYjc3MTMwNTQiLCJxdWVyeSI6IjIwMjM2MDg3OTciLCJzZWFyY2hfdHlwZSI6IlNlYXJjaCIsInNlcXVlbmNlX2lkIjoxOTg2MTg0OTIzLCJwYWdlX251bWJlciI6MSwiZmlsdGVyX3R5cGUiOiJTZWFyY2giLCJlbnRfaWQiOjEwMDAwMzY2OTI0NzI1OCwicG9zaXRpb24iOjAsInJlc3VsdF90eXBlIjoyMDQ4fQ%3D%3D&source=pivot
I want to get the ID 100003669247258 from this string, but when I try
#parsed = urlparse.urlparse(url)
#print urlparse.parse_qs(parsed.query)['id']
I get the error 'function' object has no attribute 'urlparse'. How can I parse this URL string to get the ID from add_friend.php?id=100003669247258 or from /graphsearch/100003669247258/?
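On Python 3 you can use
import urllib.parse as urlparse
so that urlparse.urlparse(url) and urlparse.parse_qs(...) from the question work unchanged. On Python 2, the error means that the name urlparse refers to the urlparse function itself (for example after from urlparse import urlparse), not to the module, so it has no urlparse attribute. In that case import both functions:
from urlparse import urlparse, parse_qs
and call them directly (parse_qs returns a list per key, hence the [0]):
parse_qs(urlparse(url).query)['id'][0]
instead of:
urlparse.urlparse(url)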
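A minimal Python 3 sketch of the urlparse/parse_qs approach. It uses a shortened copy of the href from the question (the long sld value is left out, an assumption made only for readability); the variable names are illustrative.

```python
from urllib.parse import urlparse, parse_qs  # Python 3 location of both functions

# Shortened copy of the href printed in the question (sld parameter omitted)
url = "/a/mobile/friends/add_friend.php?id=100003669247258&hf=search&gfid=AQB03j5V7CqqGQSD"

query = urlparse(url).query    # the part after "?": "id=...&hf=search&gfid=..."
params = parse_qs(query)       # maps each key to a LIST of values
profile_id = params["id"][0]   # take the first (and only) id value
print(profile_id)
```

Because parse_qs always returns lists, params["id"] on its own would print ['100003669247258'] rather than the bare ID.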
If your only goal is to get the ID from the string, you can use re instead:
import re
url = "/a/mobile/friends/add_friend.php?id=100003669247258&hf=search&sld=eyJzZWFyY2hfc2lkIjoiNGYxMmNhZGJhZDVkOGQ5ZGFkN2RkZTdhYjc3MTMwNTQiLCJxdWVyeSI6IjIwMjM2MDg3OTciLCJzZWFyY2hfdHlwZSI6IlNlYXJjaCIsInNlcXVlbmNlX2lkIjoxOTg2MTg0OTIzLCJwYWdlX251bWJlciI6MSwiZmlsdGVyX3R5cGUiOiJTZWFyY2giLCJlbnRfaWQiOjEwMDAwMzY2OTI0NzI1OCwicG9zaXRpb24iOjAsInJlc3VsdF90eXBlIjoyMDQ4fQ%3D%3D&gfid=AQB03j5V7CqqGQSD/graphsearch/100003669247258/photos-of?ent=100003669247258&refid=0&query=2023608797&sld=eyJzZWFyY2hfc2lkIjoiNGYxMmNhZGJhZDVkOGQ5ZGFkN2RkZTdhYjc3MTMwNTQiLCJxdWVyeSI6IjIwMjM2MDg3OTciLCJzZWFyY2hfdHlwZSI6IlNlYXJjaCIsInNlcXVlbmNlX2lkIjoxOTg2MTg0OTIzLCJwYWdlX251bWJlciI6MSwiZmlsdGVyX3R5cGUiOiJTZWFyY2giLCJlbnRfaWQiOjEwMDAwMzY2OTI0NzI1OCwicG9zaXRpb24iOjAsInJlc3VsdF90eXBlIjoyMDQ4fQ%3D%3D&source=pivot"
# \d needs a raw string; [?&] makes sure gfid= or refid= are not matched
match_object = re.search(r"[?&]id=(\d+)", url)
profile_id = match_object.group(1)
print(profile_id)
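The question also mentions the /graphsearch/100003669247258/ form. A minimal re sketch for that case, assuming the ID is always the path segment directly after /graphsearch/ (the shortened sample string below is taken from the second half of the printed URL):

```python
import re

# Shortened sample: only the /graphsearch/ part of the printed string
url = "/graphsearch/100003669247258/photos-of?ent=100003669247258&refid=0"

# Capture the run of digits between "/graphsearch/" and the next "/"
match = re.search(r"/graphsearch/(\d+)/", url)
profile_id = match.group(1) if match else None  # None if the pattern is absent
print(profile_id)
```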
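On Python 2 you can import urlparse as:
from urlparse import urlparse
and then call urlparse(url) directly, or on Python 3 import it from urllib as:
from urllib import parse as urlparse
and then call urlparse.urlparse(url), as in the question.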
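The printed string looks like several hrefs glued together by "".join(...) in the question's code, which is why it contains two "?" characters and two sld parameters. A sketch that instead looks at each href on its own; profile_id_from_hrefs is a hypothetical helper, and the try/except import lets the same code run on Python 2 or 3:

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs      # Python 2


def profile_id_from_hrefs(hrefs):
    """Hypothetical helper: return the first add_friend.php id in hrefs, or ""."""
    for href in hrefs:
        parsed = urlparse(href)
        # Only the add_friend.php links carry the id query parameter
        if parsed.path.endswith("add_friend.php"):
            ids = parse_qs(parsed.query).get("id")
            if ids:
                return ids[0]
    return ""


# In the spider, this list would be div.css('div > a::attr(href)').extract()
hrefs = [
    "/a/mobile/friends/add_friend.php?id=100003669247258&hf=search",
    "/graphsearch/100003669247258/photos-of?ent=100003669247258&refid=0",
]
print(profile_id_from_hrefs(hrefs))
```

Keeping the list from .extract() instead of joining it avoids having to untangle where one URL ends and the next begins.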