Can't correctly convert a cURL request into a Ruby request



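I'm scraping as much data as I can from Yahoo News. Yahoo uses infinite scrolling, so I need a way to simulate that behavior in Ruby. I figured the easiest approach was to watch which URL gets requested when more news loads as you scroll down. Here is what I found: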

URL: http://news.yahoo.com/hjsal?m_mode=multipart&site=news&region=US&lang=en-US&pagetype=minihome
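This URL carries its parameters in the query string. `Net::HTTP::Post.new` takes the request target, and the Ruby code further down passes it `uri.path`. The illustrative sketch below uses only Ruby's standard `uri` library. It compares `URI#path` with `URI::HTTP#request_uri` on this URL to show which part each one keeps.

```ruby
require 'uri'

# The endpoint URL from the question, query string included.
uri = URI('http://news.yahoo.com/hjsal?m_mode=multipart&site=news&region=US&lang=en-US&pagetype=minihome')

# uri.path holds only the path component; the query string is not included.
puts uri.path         # => /hjsal
# uri.request_uri keeps the path plus the query string.
puts uri.request_uri  # => /hjsal?m_mode=multipart&site=news&region=US&lang=en-US&pagetype=minihome

# Decode the query string into [key, value] pairs, then into a Hash.
query = URI.decode_www_form(uri.query).to_h
p query
```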

Headers: there are many, but only a few matter: Referer, Host, Origin, Cookie and Content-Type.

Data: there are 4 keys: _json, _mode, _txnid and _crumb.

params = {"_json" => [{"_action"=>"show","batch"=>2,"cat"=>"","catName"=>"","sb"=>0,"ccode"=>"grandSlam_news","woeid"=>28379204,"_subAction"=>"more","items"=>[{"u"=>"18fdc929-d35a-3b57-a253-435b0c9fa1e8","i"=>"4000010183"},{"u"=>"52a2961d-441f-34b7-bfa2-60ee2dae1001","i"=>"4000006954"},{"u"=>"0a6e7572-9eda-3c34-9052-ae58fbf009a2","i"=>"4000006504"},{"u"=>"ba9832a7-30a8-3e58-9a82-667c980db5ef","i"=>"4000009705"},{"u"=>"ff37c89f-4146-310c-ac28-68dea7626396","i"=>"4000007878"},{"u"=>"772cd139-7fec-3107-9bf6-e789c51e333c","i"=>"4000009780"},{"u"=>"20598ae3-7581-3c0d-94b3-55f15e0cace9","i"=>"4000007760"},{"u"=>"a98a5581-a0aa-3e4b-9f7b-b6a8ba16ff2b","i"=>"4000006306"},{"u"=>"00d23e7a-9b03-39ed-b669-f612e051d49f","i"=>"4000005248"},{"u"=>"f5c4f020-c608-32ed-aab2-dca6f20e62e1","i"=>"4000005770"},{"u"=>"ba6ac04c-a94c-3eec-a18a-63c98dabc926","i"=>"4000007324"},{"u"=>"f5a82053-132e-3c5d-abff-1026de72addd","i"=>"4000004708"},{"u"=>"86024916-08d3-3162-b158-9947a7eaeb6e","i"=>"4000009721"},{"u"=>"2486b22c-e8fc-3195-8ad9-26c2c4d1ac4b","i"=>"4000006895"},{"u"=>"9fd886b1-0708-3c75-a70c-5384c3bf2d6e","i"=>"4000008879"},{"u"=>"942dea9b-4686-30a9-b59a-95d4c7c69580","i"=>"4000004912"},{"u"=>"88d6a345-cc86-3c42-a975-729852551041","i"=>"4000004711"},{"u"=>"d2021342-2066-3bdc-8d67-2b07b5051888","i"=>"4000007372"},{"u"=>"b2c80259-6ef9-3622-9016-4625f3d7bf67","i"=>"4000005337"},{"u"=>"071008c4-4fe3-3b98-8695-b8e8ff15c599","i"=>"4000003388"}],"listid"=>"","blogtype"=>"","left"=>130,"key"=>"1","cpos"=>23,"prid"=>"evecfmlai6o1o","_container"=>0,"_id"=>"u_30345786","_txnid"=>1428381836200}],
"_mode" => "json",
"_crumb" => "NugJjVjNCWa",
"_txnid" => 1428381836200
}
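In the cURL command below, these four keys travel as an `application/x-www-form-urlencoded` body. That is a set of `key=value` pairs joined by `&`, and only the `_json` value is itself JSON. The sketch below uses a cut-down copy of the hash above, with `items` reduced to one entry purely for brevity. It builds a body of that shape with the standard `URI.encode_www_form`.

```ruby
require 'json'
require 'uri'

# Shortened copy of the "_json" value above ("items" trimmed to one entry).
json_payload = [{
  "_action" => "show", "batch" => 2, "_subAction" => "more",
  "items" => [{ "u" => "18fdc929-d35a-3b57-a253-435b0c9fa1e8", "i" => "4000010183" }],
  "_txnid" => 1428381836200
}]

params = {
  "_json"  => json_payload.to_json, # only this field is serialized to JSON
  "_mode"  => "json",
  "_crumb" => "NugJjVjNCWa",
  "_txnid" => 1428381836200
}

# Percent-encode each key and value and join the pairs with "&",
# the same shape as the string passed to curl's --data option.
body = URI.encode_www_form(params)
puts body
```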

I got all of the above from the Network tab in Chrome Developer Tools. The corresponding cURL command for this request (generated by Developer Tools) is:

curl 'http://news.yahoo.com/hjsal?m_mode=multipart&site=news&region=US&lang=en-US&pagetype=minihome' -H 'Cookie: ucs=sfcTs=1425628433&sfc=1; AO=u=1; B=b6dveila7ac17&b=4&d=c_ArSSppYEH6SFcrdazmtHDka6k-&s=b6&i=gQmslnXVPGkRceH.mrGk; F=a=RZpcadUMvSym7INq.f8fhle0_OQ0LB4p7I.8J.56OYwNqahr5jVi6DZsH5vjMWANCHUIoF0-&b=U8Fw; PH=fn=muyarBFoN154nhRrjzs-&i=us; DSS=cnt=1&ts=1425628533; ywandp=10001393120079%3A2763408991; fpc=10001393120079%3AZWeFxmga%7C%7C; ugccmt_comment_usortoptv1=highestRated; ywadp115488662=2349330862; fpms=; yvapF=%7B%22cc%22%3A1%2C%22rcc%22%3A1%2C%22vl%22%3A47.129999999999995%2C%22rvl%22%3A47.129999999999995%7D' -H 'Origin: http://news.yahoo.com' -H 'Content-Type: application/x-www-form-urlencoded; charset=UTF-8' -H 'Referer: http://news.yahoo.com/' --data '_crumb=NugJjVjNCWa&_mode=json&_txnid=1428540838640&_json=%5B%7B%22_action%22%3A%22show%22%2C%22batch%22%3A2%2C%22cat%22%3A%22%22%2C%22catName%22%3A%22%22%2C%22sb%22%3A0%2C%22ccode%22%3A%22grandSlam_news%22%2C%22woeid%22%3A28289488%2C%22_subAction%22%3A%22more%22%2C%22items%22%3A%5B%7B%22u%22%3A%220d9b1c7e-bfe6-3d71-b0fc-e31aa6a4ac78%22%2C%22i%22%3A%224000006886%22%7D%2C%7B%22u%22%3A%22d6291cfc-70e2-331d-b202-7d3e89d43275%22%2C%22i%22%3A%224000009863%22%7D%2C%7B%22u%22%3A%22d3c3af9f-cf8f-3d4e-a642-0070528819e0%22%2C%22i%22%3A%224000006408%22%7D%2C%7B%22u%22%3A%22104e5284-2998-3d7c-aaf6-9ceda966699d%22%2C%22i%22%3A%224000008669%22%7D%2C%7B%22u%22%3A%225f5e58cd-4e73-34f0-b1e3-59b8ed0d6dd2%22%2C%22i%22%3A%224000006338%22%7D%2C%7B%22u%22%3A%228f9ce219-79e9-33f8-baa1-1ddea51aed5a%22%2C%22i%22%3A%224000005997%22%7D%2C%7B%22u%22%3A%22683a22bb-e622-31b9-8c1d-d872db4dbf71%22%2C%22i%22%3A%224000007873%22%7D%2C%7B%22u%22%3A%222bd68c78-9645-34cf-8ba4-08ac328ffc60%22%2C%22i%22%3A%224000005723%22%7D%2C%7B%22u%22%3A%2261ed0f69-1e95-3264-baae-0d635606c9c4%22%2C%22i%22%3A%224000006596%22%7D%2C%7B%22u%22%3A%22e86ec903-edf8-3f46-a27a-4c10cdee8f19%22%2C%22i%22%3A%224000007547%22%7D%2C%7B%22u%22%3A%22d4f5eaea-3d28-36de-89a3-b9bb3fc79834%22%2C%22i%22%3A%224000009036%22%7D%2C%7B%22u%22%3A%228a05f4f4-e162-3586-96f1-0b7212a1ebd9%22%2C%22i%22%3A%224000006790%22%7D%2C%7B%22u%22%3A%22b49da9d0-a9ec-3c49-afbe-da1da9c3dfc5%22%2C%22i%22%3A%224000007305%22%7D%2C%7B%22u%22%3A%22f5fe6bcf-77fb-3e9e-bd56-a62027d604f1%22%2C%22i%22%3A%224000005595%22%7D%2C%7B%22u%22%3A%22b644c626-a6a4-3ab7-9d5d-737ee62fa492%22%2C%22i%22%3A%224000007357%22%7D%2C%7B%22u%22%3A%22bfa50721-be1b-38e2-94d1-e8728d112b0c%22%2C%22i%22%3A%224000008326%22%7D%2C%7B%22u%22%3A%22b9e49778-e5dd-3d84-9eb4-6a2681d81a93%22%2C%22i%22%3A%224000006213%22%7D%2C%7B%22u%22%3A%226073c345-05da-3fba-a6d0-f8699fe0e961%22%2C%22i%22%3A%224000005885%22%7D%2C%7B%22u%22%3A%22104a6bec-1244-39f8-ab7c-0586a1a5bd6d%22%2C%22i%22%3A%224000007645%22%7D%2C%7B%22u%22%3A%221280a6da-4b77-3543-8ded-ee62fcef6f9e%22%2C%22i%22%3A%224000007605%22%7D%5D%2C%22listid%22%3A%22%22%2C%22blogtype%22%3A%22%22%2C%22left%22%3A130%2C%22key%22%3A%221%22%2C%22cpos%22%3A24%2C%22prid%22%3A%223mmkn1daibjbe%22%2C%22_container%22%3A0%2C%22_id%22%3A%22u_30345786%22%2C%22_txnid%22%3A1428540838640%7D%5D' --compressed
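The `--data` value is easier to read once it is decoded. The sketch below uses only Ruby's standard `uri` and `json` libraries. It decodes a truncated copy of that string, with the `items` array cut to one entry for brevity. The result shows that the body is four form fields, only one of which (`_json`) holds JSON. That is a different shape from a body produced by calling `to_json` on the whole params hash.

```ruby
require 'json'
require 'uri'

# Truncated copy of the --data string from the cURL command above.
data = '_crumb=NugJjVjNCWa&_mode=json&_txnid=1428540838640&_json=' \
       '%5B%7B%22_action%22%3A%22show%22%2C%22batch%22%3A2%2C%22items%22%3A' \
       '%5B%7B%22u%22%3A%220d9b1c7e-bfe6-3d71-b0fc-e31aa6a4ac78%22%2C%22i%22%3A%224000006886%22%7D%5D%7D%5D'

# Split on "&" and percent-decode each key/value pair.
fields = URI.decode_www_form(data).to_h
puts fields.keys.inspect

# Only the _json field is itself JSON; parse it to inspect the payload.
payload = JSON.parse(fields['_json'])
puts payload.first['items'].length
```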

Based on this cURL request, I came up with the following Ruby code to scrape the data:

require 'net/http'
require 'json'
params = {"_json" => URI.encode([{"_action"=>"show","batch"=>2,"cat"=>"","catName"=>"","sb"=>0,"ccode"=>"grandSlam_news","woeid"=>28379204,"_subAction"=>"more","items"=>[{"u"=>"18fdc929-d35a-3b57-a253-435b0c9fa1e8","i"=>"4000010183"},{"u"=>"52a2961d-441f-34b7-bfa2-60ee2dae1001","i"=>"4000006954"},{"u"=>"0a6e7572-9eda-3c34-9052-ae58fbf009a2","i"=>"4000006504"},{"u"=>"ba9832a7-30a8-3e58-9a82-667c980db5ef","i"=>"4000009705"},{"u"=>"ff37c89f-4146-310c-ac28-68dea7626396","i"=>"4000007878"},{"u"=>"772cd139-7fec-3107-9bf6-e789c51e333c","i"=>"4000009780"},{"u"=>"20598ae3-7581-3c0d-94b3-55f15e0cace9","i"=>"4000007760"},{"u"=>"a98a5581-a0aa-3e4b-9f7b-b6a8ba16ff2b","i"=>"4000006306"},{"u"=>"00d23e7a-9b03-39ed-b669-f612e051d49f","i"=>"4000005248"},{"u"=>"f5c4f020-c608-32ed-aab2-dca6f20e62e1","i"=>"4000005770"},{"u"=>"ba6ac04c-a94c-3eec-a18a-63c98dabc926","i"=>"4000007324"},{"u"=>"f5a82053-132e-3c5d-abff-1026de72addd","i"=>"4000004708"},{"u"=>"86024916-08d3-3162-b158-9947a7eaeb6e","i"=>"4000009721"},{"u"=>"2486b22c-e8fc-3195-8ad9-26c2c4d1ac4b","i"=>"4000006895"},{"u"=>"9fd886b1-0708-3c75-a70c-5384c3bf2d6e","i"=>"4000008879"},{"u"=>"942dea9b-4686-30a9-b59a-95d4c7c69580","i"=>"4000004912"},{"u"=>"88d6a345-cc86-3c42-a975-729852551041","i"=>"4000004711"},{"u"=>"d2021342-2066-3bdc-8d67-2b07b5051888","i"=>"4000007372"},{"u"=>"b2c80259-6ef9-3622-9016-4625f3d7bf67","i"=>"4000005337"},{"u"=>"071008c4-4fe3-3b98-8695-b8e8ff15c599","i"=>"4000003388"}],"listid"=>"","blogtype"=>"","left"=>130,"key"=>"1","cpos"=>23,"prid"=>"evecfmlai6o1o","_container"=>0,"_id"=>"u_30345786","_txnid"=>1428381836200}].to_json),
"_mode" => "json",
"_crumb" => "NugJjVjNCWa",
"_txnid" => 1428381836200
}
cookie = 'ucs=sfcTs=1425628433&sfc=1; AO=u=1; B=b6dveila7ac17&b=4&d=c_ArSSppYEH6SFcrdazmtHDka6k-&s=b6&i=gQmslnXVPGkRceH.mrGk; F=a=RZpcadUMvSym7INq.f8fhle0_OQ0LB4p7I.8J.56OYwNqahr5jVi6DZsH5vjMWANCHUIoF0-&b=U8Fw; PH=fn=muyarBFoN154nhRrjzs-&i=us; DSS=cnt=1&ts=1425628533; ywandp=10001393120079%3A2763408991; fpc=10001393120079%3AZWeFxmga%7C%7C; ugccmt_comment_usortoptv1=highestRated; ywadp115488662=2349330862; fpms=; yvapF=%7B%22cc%22%3A1%2C%22rcc%22%3A1%2C%22vl%22%3A47.129999999999995%2C%22rvl%22%3A47.129999999999995%7D'
uri = URI('http://news.yahoo.com/hjsal?m_mode=multipart&site=news&region=US&lang=en-US&pagetype=minihome')
req = Net::HTTP::Post.new(uri.path)
req.add_field('Referer', 'http://news.yahoo.com/')
req.add_field('Origin', 'http://news.yahoo.com')
req.add_field('Host', 'news.yahoo.com')
req.add_field('Cookie', cookie)
req.add_field('Content-Type', 'application/x-www-form-urlencoded; charset=UTF-8')
req.body = URI.encode(params.to_json)
http = Net::HTTP.new(uri.host,uri.port)
res = http.request(req)
p res.body

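But when I run the Ruby code above, it always returns an error message as the response (the cURL request works perfectly). I'm not sure what is going wrong here, and I hope you can give me some hints on how to fix it.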

The response (the same error appears if you copy the URL and paste it into a browser):

<div class="oops-msg" role="alertdialog">\n<span class="icon icon-error y-glbl-universal"></span>\n<h3 class="oops">The module encountered a problem while trying to load</h3>\n                <p class="oops">App is currently not available. Please, try again.</p>\n                \n            </div><!-- hdf50.fp.gq1.yahoo.com compressed/chunked Wed Apr 15 08:15:39 UTC 2015 -->\n

Also, I can't even reproduce this request in the Postman client (the new version, with Interceptor enabled).

Thanks a lot!

Your Ruby code specifies a POST request, but your curl command doesn't. Try replacing this line:

req = Net::HTTP::Post.new(uri.path)

with this one:

req = Net::HTTP::Get.new(uri.path)
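For reference, curl's `--data` option makes curl send a POST request with a form-encoded body, so the command in the question is a POST even without `-X POST`. Below is a minimal sketch of a `Net::HTTP` request with the same shape as that command. It uses `request_uri` so the query string is kept, and `Net::HTTPHeader#set_form_data` to form-encode the fields. The crumb, `_txnid`, cookie and `_json` values are shortened placeholders. The real values are session-bound and would have to be copied fresh from the browser, so this does not guarantee Yahoo will accept the request. The sketch only builds and inspects the request; the lines that would send it are left as comments.

```ruby
require 'json'
require 'net/http'
require 'uri'

uri = URI('http://news.yahoo.com/hjsal?m_mode=multipart&site=news&region=US&lang=en-US&pagetype=minihome')

# Placeholder values (assumption): shortened examples, not a working session.
json_payload = [{ "_action" => "show", "batch" => 2, "_subAction" => "more" }]
cookie = 'B=b6dveila7ac17&b=4' # truncated cookie string, for illustration only

# request_uri keeps the query string that uri.path would drop.
req = Net::HTTP::Post.new(uri.request_uri)
req['Referer'] = 'http://news.yahoo.com/'
req['Origin']  = 'http://news.yahoo.com'
req['Cookie']  = cookie

# set_form_data URL-encodes the pairs into the body and sets
# Content-Type to application/x-www-form-urlencoded.
req.set_form_data(
  "_crumb" => "NugJjVjNCWa",
  "_mode"  => "json",
  "_txnid" => 1428381836200,
  "_json"  => json_payload.to_json
)

puts req.method        # => POST
puts req.path
puts req.content_type  # => application/x-www-form-urlencoded

# To actually send it (network access and valid session values required):
# res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
# puts res.body
```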
