R and curl: square brackets in URLs



I have a vector of URLs that I want to download from R using curl on Mac OS X:

## URLs
grab = c("http://declaratii.integritate.eu/UserFiles/PDFfiles/RP1079_33994-C81-I620_5-ANI-L056-00001[006154]ready//DA_2011-06-03_STINGA SIMONA_30381371.pdf", 
         "http://declaratii.integritate.eu/UserFiles/PDFfiles/RP1486_67011-C27-I620_6-ANI-L141-00001[045849]ready//DA_2012-05-28_SORIN VASILE_1308151.pdf", 
         "http://declaratii.integritate.eu/UserFiles/PDFfiles/RP1151_34934-C93-I620_6-ANI-L058-00001[005631]ready//DI_2011-05-25_CONSTANTIN CATALIN IONITA_50364334.pdf", 
         "http://declaratii.integritate.eu/UserFiles/PDFfiles/RP1486_66964-C65-I620_5-ANI-L141-00001[045952]ready//DA_2012-05-24_DORINA ORZAC_1312037.pdf", 
         "http://declaratii.integritate.eu/UserFiles/PDFfiles/RP1486_67290-C65-I620_5-ANI-L141-00001[045768]ready//DI_2012-06-01_JIPA CAMELIA_1304833.pdf", 
         "http://declaratii.integritate.eu/UserFiles/PDFfiles/RP1151_34936-C74-I620_7-ANI-L058-00001[005633]ready//DA_2011-06-09_NICOLE MOT_50364493.pdf", 
         "http://declaratii.integritate.eu/UserFiles/PDFfiles/RP1151_34937-C74-I620_7-ANI-L058-00001[005634]ready//DA_2011-06-14_PETRE ECATERINA_50364543.pdf", 
         "http://declaratii.integritate.eu/UserFiles/PDFfiles/RP1566_67978-C85-I780_2-ANI-L144-00001[046398]ready//DA_2012-05-25_RAMONA GHIERGA_1332323.pdf", 
         "http://declaratii.integritate.eu/UserFiles/PDFfiles/RP1151_34936-C74-I620_7-ANI-L058-00001[005633]ready//DA_2011-06-05_LOVIN G. ADINA_50364475.pdf", 
         "http://declaratii.integritate.eu/UserFiles/PDFfiles/RP2135_40131-C90-I780_3-ANI-L069-00001[009742]ready//DI_2011-05-25_VARTOLOMEI PAUL-CONSTANTIN_467652.pdf", 
         "http://declaratii.integritate.eu/UserFiles/PDFfiles/RP1086_34373-C11-I620_3-ANI-L057-00001[005657]ready//DI_2011-05-16_CAZACU LILIANA_40437536.pdf", 
         "http://declaratii.integritate.eu/UserFiles/PDFfiles/RP1151_34935-C93-I620_6-ANI-L058-00001[005632]ready//DI_2011-06-07_ROSCA EUGEN-CONSTANTIN_50364400.pdf", 
         "http://declaratii.integritate.eu/UserFiles/PDFfiles/RP181_27399-C11-I780_2-ANI-L051-00001[005421]ready//DI_2010-11-03_DIAMANDI SAVA-CONSTANTIN_40429564.pdf", 
         "http://declaratii.integritate.eu/UserFiles/PDFfiles/RP1151_34936-C74-I620_7-ANI-L058-00001[005633]ready//DI_2011-06-07_ZAMFIRESCU I. IULIA_50364498.pdf", 
         "http://declaratii.integritate.eu/UserFiles/PDFfiles/RP1563_67587-C71-I780_3-ANI-L143-00001[046079]ready//DI_2012-05-21_MAZURU C. EMILIA_1317509.pdf"
)

My first attempt returns HTTP error 400:

## fails on Mac OSX 10.9 (HTTP 400)
## for(x in grab) download.file(x, destfile = gsub("(.*)//D", "D", x))

I learned that this is caused by the square brackets in the URLs, so I applied the globoff fix like this:

## also fails despite fixing HTTP Err 400 (files are zero-sized)
for(x in grab) download.file(x, destfile = gsub("(.*)//D", "D", x), method = "curl", extra = "--globoff")

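...and the files now download, but they are all empty (zero-sized).

What am I doing wrong?

P.S. I am willing to switch to Python or shell to get the files, but would prefer to keep the code 100% R.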

Have you tried URL-encoding the brackets?

%5B=[

%5D=]
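
To make the suggestion concrete, here is a minimal sketch that percent-encodes the brackets by hand with `gsub`. The helper name `encode_brackets` is made up for illustration, and encoding the spaces as `%20` is an extra assumption on my part, since the URLs in the question contain spaces as well.

```r
# Hypothetical helper (not part of base R): percent-encode the characters
# that are suspected of breaking the request. fixed = TRUE treats the
# patterns as literal strings rather than regular expressions.
encode_brackets <- function(url) {
  url <- gsub("[", "%5B", url, fixed = TRUE)
  url <- gsub("]", "%5D", url, fixed = TRUE)
  gsub(" ", "%20", url, fixed = TRUE)  # assumption: spaces need encoding too
}

x <- "http://example.com/[xyz]//file with spaces.pdf"
encode_brackets(x)
# [1] "http://example.com/%5Bxyz%5D//file%20with%20spaces.pdf"

# Possible use with the question's `grab` vector (not run here); the
# destination file name is still derived from the original, unencoded URL:
# for (x in grab) download.file(encode_brackets(x),
#                               destfile = gsub("(.*)//D", "D", x))
```

Since `gsub` is vectorised, the helper can also be applied to the whole vector at once. Whether this cures the zero-sized files depends on the server, so treat it as something to try rather than a confirmed fix.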

A bit late, but URLencode exists to make sure you have a well-formed URL.

> x <- "http://example.com/[xyz]//file with spaces.pdf"
> URLencode(x)
[1] "http://example.com/%5bxyz%5d//file%20with%20spaces.pdf"
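
One caveat worth checking: as far as I know, `URLencode` has a `reserved` argument, and some R versions treat `[` and `]` as reserved characters that are left alone by default, so the output may differ on your installation. A hedged sketch that encodes with `URLencode` and then patches any brackets that are still present (the variable name `encoded` is only illustrative):

```r
x <- "http://example.com/[xyz]//file with spaces.pdf"

# Encode spaces and other unsafe characters, keeping the URL structure intact.
encoded <- utils::URLencode(x)

# Fallback, in case this R version leaves the brackets untouched.
# If they were already encoded, these two calls change nothing.
encoded <- gsub("[", "%5B", encoded, fixed = TRUE)
encoded <- gsub("]", "%5D", encoded, fixed = TRUE)

# Sanity check before downloading: no raw brackets or spaces should remain.
stopifnot(!grepl("[", encoded, fixed = TRUE),
          !grepl("]", encoded, fixed = TRUE),
          !grepl(" ", encoded, fixed = TRUE))
```

The fallback runs after `URLencode` on purpose: I believe `URLencode` returns its input unchanged by default when it already contains a percent-escape, so patching the brackets first could leave the spaces unencoded.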
