Jsoup login cookie does not work for subpages, only for the main page



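Part 1: Get the cookie (works)

Part 2: Use the cookie with another subpage (does not work)

2.1 https://www.valueresearchonline.com/funds/26123/motilal-oswal-flexi-cap-fund-regular-plan/#fund-portfolio: shows the "Top Holdings" section when the page is in a logged-in state.

I don't understand why the subpage is not in a logged-in state even though the cookie is supplied.

Part 1 (get the cookie)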

// Log in with a form POST and keep the response so its cookies can be reused
Connection.Response login2 = Jsoup.connect("https://www.valueresearchonline.com/login/?")
        .timeout(15000)
        .userAgent("Mozilla")
        .data("username", "valid_email")
        .data("password", "valid_password")
        .method(Connection.Method.POST)
        .execute();
System.out.println(login2.statusCode());
System.out.println(login2.cookies());
doc = login2.parse();
// Both indexes are >= 0 when the returned page is in a logged-in state
System.out.println(doc.body().text().indexOf("My Favourite Stories"));
System.out.println(doc.body().text().indexOf("Logout"));
String sessionId2 = login2.cookie("PHPSESSID");

Chrome DevTools Network tab output

curl 'https://www.valueresearchonline.com/login/?' \
  -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
  -H 'Accept-Language: en' \
  -H 'Cache-Control: max-age=0' \
  -H 'Connection: keep-alive' \
  -H 'Cookie: PHPSESSID=pklb54pa5chma4hi69bgfu7vcc; currency=INR; magnitude=LC; ad=53c5b4ffbcbd345c755abe149e639d10aa8fdb70; ad=53c5b4ffbcbd345c755abe149e639d10aa8fdb70; wec=295393018; nobtlgn=368510251; ac=67889375%7C379669468%7C35351482; ac=67889375%7C379669468%7C35351482; _gcl_au=1.1.265956817.1663225155; _gid=GA1.2.417218768.1663225156; _gat_UA-240759-1=1; _clck=5maw1g|1|f4w|0; _fbp=fb.1.1663225156547.1723305957; __gads=ID=456fb6dced89a9d4-22af789890d6008a:T=1663225157:S=ALNI_MYgbR1A0Fh3kAwwVwrYjWKMDAYOuA; __gpi=UID=000009c87e7a6399:T=1663225157:RT=1663225157:S=ALNI_MZ4wVT9remS6N-MFKJvbFF1GCqtRg; __cf_bm=XMBuGU25ky6q.8Z_vTFVtdWF.EWPPsrG8Buy1QFgIl4-1663225158-0-AW29ht0HH+iYPhBRF4AU3bmUim5cGJvhNtZuM41NVaC0kWPvnRr4/+1v+n+0Q8iA6SxKD8m9lScYnM8T/HfGonbEoOz84uh83Y7d98O4qe/mVT8Ixv4yya4ZWhzazxOboQ==; _ga=GA1.1.448580854.1663225156; _ga_N9R425YFBJ=GS1.1.1663225155.1.1.1663225167.48.0.0; _clsk=15nbblb|1663225167506|3|1|l.clarity.ms/collect; pgv=4' \
  -H 'Referer: https://www.valueresearchonline.com/register' \
  -H 'Sec-Fetch-Dest: document' \
  -H 'Sec-Fetch-Mode: navigate' \
  -H 'Sec-Fetch-Site: same-origin' \
  -H 'Sec-Fetch-User: ?1' \
  -H 'Upgrade-Insecure-Requests: 1' \
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36' \
  -H 'sec-ch-ua: "Google Chrome";v="105", "Not)A;Brand";v="8", "Chromium";v="105"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  --compressed

Part 2 - does not work (the page is not in a logged-in state)

// POST request - does not work

login2 = Jsoup.connect("https://www.valueresearchonline.com/funds/26123/motilal-oswal-flexi-cap-fund-regular-plan/#fund-portfolio")
        .timeout(15000)
        .userAgent("Mozilla")
        .cookie("PHPSESSID", sessionId2)
        .cookies(login2.cookies())
        .method(Connection.Method.POST)
        .execute();
System.out.println(login2.statusCode());
doc = login2.parse();
System.out.println(doc);

// GET request - does not work

doc = Jsoup.connect("https://www.valueresearchonline.com/funds/26123/motilal-oswal-flexi-cap-fund-regular-plan/#fund-portfolio")
        .userAgent("Mozilla")
        .timeout(15000)
        .cookies(loginResponse.cookies())
        .get();

Chrome DevTools Network tab output

curl 'https://www.valueresearchonline.com/fund-details/26123/?tab=fund-portfolio' \
  -H 'Accept: application/json, text/javascript, */*; q=0.01' \
  -H 'Accept-Language: en-US,en;q=0.9' \
  -H 'Connection: keep-alive' \
  -H 'Cookie: currency=INR; magnitude=LC; ad=78991d1d28c094ebf1f39eb89bdeba08fa7442fb; ad=78991d1d28c094ebf1f39eb89bdeba08fa7442fb; wec=295383799; nobtlgn=789939331; ac=67886089%7C246429140%7C430534802; ac=67886089%7C246429140%7C430534802; _gcl_au=1.1.695297406.1663222951; _gid=GA1.2.274867828.1663222952; _clck=qcxo6j|1|f4w|0; __cf_bm=bzlNVWAtaJiSxUfJ75njw.Zjxxhm_6NdpHRAnt_yZME-1663222953-0-AZ+JMB1vgmANxPS0dbOP5fijqdwMV2dO8gcChGvTkmBsdjKzFC0dMTF8H7zJFtDVwy16hjeygZ224SUimQNMxPmNjen+nfhLNp9v9dHjxMy/ezpdYYa1rYd+7JGe4RS/lA==; alp=VROL; PERMA-ALERT=0; g_state={"i_t":1663309591263,"i_l":0}; PHPSESSID=adcn3ck7fuinlliqnmqco9d4ep; shop-beta=ee1e0e7e3a3617e78e0827d43a83398fd12221b2; aa=364476%7C372053540%7C953882152; aa=364476%7C372053540%7C953882152; arl=801870920; arl=801870920; _clsk=11pwsr9|1663223225068|10|1|l.clarity.ms/collect; _gat_UA-240759-1=1; pgv=17; _ga_N9R425YFBJ=GS1.1.1663222951.1.1.1663223343.60.0.0; _ga=GA1.1.1796800378.1663222952' \
  -H 'Referer: https://www.valueresearchonline.com/funds/26123/motilal-oswal-flexi-cap-fund-regular-plan/' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Site: same-origin' \
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36' \
  -H 'X-Requested-With: XMLHttpRequest' \
  -H 'sec-ch-ua: "Google Chrome";v="105", "Not)A;Brand";v="8", "Chromium";v="105"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  --compressed
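
One detail worth noting when comparing this capture with the URL used in Part 2: everything after `#` in a URL is a fragment, which an HTTP client does not send to the server, and the browser loaded the portfolio tab through a separate XHR request to `/fund-details/26123/?tab=fund-portfolio`. The sketch below only illustrates that relationship with the JDK's `java.net.URI`. The helper names `withoutFragment` and `xhrUrlFor` are hypothetical, and the mapping (second path segment is the fund id, fragment becomes the `tab` parameter) is an assumption inferred from this single capture, not a documented API of the site.

```java
import java.net.URI;

class FragmentDemo {
    // What the server actually receives for a page URL: scheme, host and path,
    // but never the "#..." fragment, which stays in the client.
    static String withoutFragment(String pageUrl) {
        URI page = URI.create(pageUrl);
        return page.getScheme() + "://" + page.getHost() + page.getRawPath();
    }

    // Hypothetical helper: rebuilds the XHR endpoint seen in DevTools from the
    // page URL. ASSUMPTION: path looks like /funds/<id>/<slug>/ and the
    // fragment maps directly onto the "tab" query parameter.
    static String xhrUrlFor(String pageUrl) {
        URI page = URI.create(pageUrl);
        String tab = page.getFragment();
        if (tab == null) {
            throw new IllegalArgumentException("URL has no fragment: " + pageUrl);
        }
        String[] segments = page.getPath().split("/"); // "", "funds", "<id>", "<slug>"
        String fundId = segments[2];
        return page.getScheme() + "://" + page.getHost()
                + "/fund-details/" + fundId + "/?tab=" + tab;
    }

    public static void main(String[] args) {
        String pageUrl = "https://www.valueresearchonline.com/funds/26123/"
                + "motilal-oswal-flexi-cap-fund-regular-plan/#fund-portfolio";
        System.out.println(withoutFragment(pageUrl)); // URL the Part 2 requests really hit
        System.out.println(xhrUrlFor(pageUrl));       // URL the browser fetched for the tab
    }
}
```

If this reading is right, the Part 2 requests fetch only the outer fund page, and the "Top Holdings" data would have to be requested from the second URL.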

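Solutions tried

  1. Jsoup POST and cookie

  2. Jsoup cookies for HTTPS scraping

  3. Log in to a website with Jsoup and stay logged in

After logging in I receive the cookies, but when I use them to fetch information from another URL on the same site, the page is always in a "not logged in" state.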
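
One way to check whether the session cookie is really the problem is to replay the captured XHR request with the same cookie store that handled the login. The following is a minimal sketch of that idea, not a verified solution: it uses the JDK's `java.net.http.HttpClient` with a `CookieManager` instead of Jsoup, purely so that it is self-contained. The class and method names are hypothetical, the form field names `username` and `password` are taken from Part 1, and it is an assumption that the site accepts such a login without extra tokens or JavaScript.

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

class SessionSketch {
    static final String BASE = "https://www.valueresearchonline.com";

    // One client with one cookie store, so every cookie set during login
    // (PHPSESSID and any others) is sent again on later requests.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .cookieHandler(new CookieManager(null, CookiePolicy.ACCEPT_ALL))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(15))
                .build();
    }

    // Form-encoded POST mirroring Part 1 (field names taken from the question).
    static HttpRequest loginRequest(String user, String password) {
        String form = "username=" + URLEncoder.encode(user, StandardCharsets.UTF_8)
                + "&password=" + URLEncoder.encode(password, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(BASE + "/login/?"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("User-Agent", "Mozilla/5.0")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
    }

    // GET mirroring the XHR from DevTools, including the headers that mark it as AJAX.
    static HttpRequest portfolioRequest(String fundId, String referer) {
        return HttpRequest.newBuilder(
                        URI.create(BASE + "/fund-details/" + fundId + "/?tab=fund-portfolio"))
                .header("Accept", "application/json, text/javascript, */*; q=0.01")
                .header("X-Requested-With", "XMLHttpRequest")
                .header("Referer", referer)
                .header("User-Agent", "Mozilla/5.0")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: SessionSketch <email> <password>");
            return;
        }
        HttpClient client = newClient();
        HttpResponse<String> login = client.send(
                loginRequest(args[0], args[1]), HttpResponse.BodyHandlers.ofString());
        System.out.println("login status: " + login.statusCode());

        String referer = BASE + "/funds/26123/motilal-oswal-flexi-cap-fund-regular-plan/";
        HttpResponse<String> data = client.send(
                portfolioRequest("26123", referer), HttpResponse.BodyHandlers.ofString());
        System.out.println("portfolio status: " + data.statusCode());
        System.out.println(data.body());
    }
}
```

If the response still lacks the logged-in content, that would suggest the session depends on something a plain HTTP client does not reproduce, such as cookies set by JavaScript.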

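To me it looks like you want to build a web scraper. Circumventing the site's protection against that may be against the law, depending on your country, or possibly on the country of your client.

If it is allowed, look at a WebDriver-based approach, which uses a real browser to solve the problem. With Jsoup you cannot execute the JavaScript that is required to make your call possible. Use WebDriver or HtmlUnit (the latter with less success) to solve your problem.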
