R: gsub does not work when trying to extract part of a hyperlink



I am trying to run the code below, and I would like to know why gsub has no effect on this input. Does anyone know why, and how to handle this case?

> text
[1] <a href="https://itunes.apple.com/us/app/ubersocial-for-twitter/id396050119 mt=8&uo=4" rel="nofollow">UberSocial for Twitter on iOS</a>
65 Levels: <a href="http://aktualpost.com" rel="nofollow">Aktualpost</a> ...
> start = as.numeric(regexpr(">",text)[[1]])+1
> start
[1] 103
> to_cut = substr(text,1,start-1)
> to_cut
[1] "<a href="https://itunes.apple.com/us/app/ubersocial-for-twitter/id396050119?mt=8&uo=4" rel="nofollow">"
> new_text = gsub(to_cut,"",as.character(text))
> new_text
[1] "<a href="https://itunes.apple.com/us/app/ubersocial-for-twitter/id396050119?mt=8&uo=4" rel="nofollow">UberSocial for Twitter on iOS</a>"

"to_cut" contains a ? that does not appear in "text". Once that mismatch is fixed, it should work: compare ?mt in "to_cut" with mt (no ?) in "text".

gsub('^<a href="https://itunes.apple.com/us/app/ubersocial-for-twitter/id396050119 mt=8&uo=4" rel="nofollow">(.*)', "\\1", text)
#[1] "UberSocial for Twitter on iOS</a>"

It is not clear how the OP ended up with the ? in "to_cut". Running the same steps on the text as shown gives:

start = as.numeric(regexpr(">",text)[[1]])+1
to_cut <- substr(text, 1, start - 1)
to_cut
#[1] "<a href="https://itunes.apple.com/us/app/ubersocial-for-twitter/id396050119 mt=8&uo=4" rel="nofollow">"
gsub(to_cut, "", text)
#[1] "UberSocial for Twitter on iOS</a>"
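
If the real string does contain the ? (as the question's "to_cut" output suggests), another possible explanation is that gsub treats its first argument as a regular expression, where ? is a quantifier rather than a literal character. The sketch below assumes a hypothetical "text" that includes the ?, and shows two ways around it: passing fixed = TRUE so the pattern is matched literally, or stripping the opening tag with a generic pattern instead of the sliced string.

```r
# Assumption: unlike the printed text above, this input contains the "?" seen in to_cut.
text <- '<a href="https://itunes.apple.com/us/app/ubersocial-for-twitter/id396050119?mt=8&uo=4" rel="nofollow">UberSocial for Twitter on iOS</a>'

# Same slicing as in the question: everything up to and including the first ">".
start  <- as.numeric(regexpr(">", text)[[1]]) + 1
to_cut <- substr(text, 1, start - 1)

# As a regex, "9?" means "an optional 9", so the literal "?" in text never matches
# and the string comes back unchanged.
unchanged <- gsub(to_cut, "", text)
identical(unchanged, text)
#[1] TRUE

# fixed = TRUE makes gsub treat to_cut as a literal string.
literal_cut <- gsub(to_cut, "", text, fixed = TRUE)
literal_cut
#[1] "UberSocial for Twitter on iOS</a>"

# Alternative: drop the opening <a ...> tag without building to_cut at all.
generic_cut <- sub("^<a[^>]*>", "", text)
generic_cut
#[1] "UberSocial for Twitter on iOS</a>"
```

With fixed = TRUE there is no need to escape ., ? or other metacharacters that commonly occur in URLs.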
