Remove the gap between the last two words of a string in R



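I am trying to remove the gap between the last two words of each string in a data frame that contains many strings. I tried gsub, but my attempt, gsub("(\\s){1}$", "", df1$V1), seems to be quite wrong! df1 is my dataset and df2 is the result I am after.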

df1 <- data.frame(V1=c("Apple Pear Orange, AAA 111", "Grapes Banana Pear . BBB 222", "Orange Kiwi Melon , CCC 333", "Apple DDD 444", "Kiwi Melon Orange CCC 333", "Apple Pear Orange, AAA 111", "Tomato Cucumber EEE 222", "Seagull Pigeon ZZZ 111" ), stringsAsFactors = F)
df2 <- data.frame(V1=c("Apple Pear Orange, AAA111", "Grapes Banana Pear . BBB222", "Orange Kiwi Melon , CCC333", "Apple DDD444", "Kiwi Melon Orange CCC333", "Apple Pear Orange, AAA111", "Tomato Cucumber EEE222", "Seagull Pigeon ZZZ111" ), stringsAsFactors = F)

Or even this:

gsub("(.*)\\s", "\\1", df1$V1)

You can use capture groups:

sub("(.*)\\s+(\\S+)$", "\\1\\2", df1$V1)
#[1] "Apple Pear Orange, AAA111"   "Grapes Banana Pear . BBB222" "Orange Kiwi Melon , CCC333"  "Apple DDD444"               
#[5] "Kiwi Melon Orange CCC333"    "Apple Pear Orange, AAA111"   "Tomato Cucumber EEE222"      "Seagull Pigeon ZZZ111" 

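This captures any number of characters in the first group, then matches one or more whitespace characters, then captures a second group of one or more non-whitespace characters running up to the end of the string. The replacement keeps only the two capture groups, with no space between them.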
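To make the two groups visible, here is a minimal sketch on a single string from df1. It relies on base R's regexec() and regmatches(), which the answer above does not use, and it writes the non-whitespace group as \\S+; the variable names x, pattern, parts and result are only illustrative.

```r
# One example string taken from df1 in the question
x <- "Apple Pear Orange, AAA 111"

# Group 1: everything before the last run of whitespace
# Group 2: the final run of non-whitespace characters
pattern <- "(.*)\\s+(\\S+)$"

# regexec() reports the full match followed by each capture group
parts <- regmatches(x, regexec(pattern, x))[[1]]
parts[2]  # contents of group 1
parts[3]  # contents of group 2

# Pasting the two groups back together drops the whitespace between them
result <- sub(pattern, "\\1\\2", x)
result
```

Because the pattern is anchored with $, only the last gap in the string is affected, so sub() is enough and gsub() is not needed here.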

Going off of Docendo's answer, you can use \\w+ to match a word of any length:

gsub("(\\w+)\\s+(\\w+$)", "\\1\\2", df1$V1)
#[1] "Apple Pear Orange, AAA111"   "Grapes Banana Pear . BBB222" "Orange Kiwi Melon , CCC333" 
#[4] "Apple DDD444"                "Kiwi Melon Orange CCC333"    "Apple Pear Orange, AAA111"  
#[7] "Tomato Cucumber EEE222"      "Seagull Pigeon ZZZ111"

You can then apply the same capture-group idea.
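As a rough check, the replacement can be written back into the column and compared with the desired result. The sketch below assumes only the first three rows of df1 and df2 from the question, to keep it short, and the name out is a hypothetical working copy.

```r
# First three rows of the question's input and desired output
df1 <- data.frame(V1 = c("Apple Pear Orange, AAA 111",
                         "Grapes Banana Pear . BBB 222",
                         "Apple DDD 444"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c("Apple Pear Orange, AAA111",
                         "Grapes Banana Pear . BBB222",
                         "Apple DDD444"),
                  stringsAsFactors = FALSE)

# Work on a copy so that df1 stays unchanged
out <- df1

# Join the last two words: word, whitespace, word at the end of the string
out$V1 <- sub("(\\w+)\\s+(\\w+)$", "\\1\\2", out$V1)

# Compare the cleaned copy with the desired result
matches_target <- identical(out, df2)
matches_target
```

Note that \\w+ only matches letters, digits and underscores, so this variant assumes the last two words contain no punctuation.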
