I have a data frame in which one column is full of cells that look like this:
"***ORDER LIST***\nCustomer: Lucille\nitem1: apples\nitem2: oranges"
"***ORDER LIST***\nCustomer: Frank and Sally\nitem1: wine\nitem2: milk"
"***ORDER LIST***\n\n\nitem1: wine\nitem2: milk"
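I am trying to sanitize each cell by removing the entire line that starts with the word Customer or, if there is no such line, the first blank line.
I would like to end up with cleaned text like this:
"***ORDER LIST***\nitem1: apples\nitem2: oranges"
"***ORDER LIST***\nitem1: wine\nitem2: milk"
"***ORDER LIST***\nitem1: wine\nitem2: milk"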
Using gsub, is there a way to get rid of both the blank lines and the entire line containing Customer?
Try something like this:
text <- c("***ORDER LIST***\nCustomer: Lucille\nitem1: apples\nitem2: oranges",
          "***ORDER LIST***\nCustomer: Frank and Sally\nitem1: wine\nitem2: milk",
          "***ORDER LIST***\n\n\nitem1: wine\nitem2: milk")
gsub("Customer: .*?\n|\n\n", " ", text)
[1] "***ORDER LIST***\n item1: apples\nitem2: oranges" "***ORDER LIST***\n item1: wine\nitem2: milk"
[3] "***ORDER LIST*** \nitem1: wine\nitem2: milk"
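The output above still carries a stray space where each match was replaced, so it does not quite match the text the question asks for. A minimal sketch of a variant, assuming the same sample strings: replace with an empty string instead of a space, and use `[^\n]*` in place of the lazy `.*?` so the match cannot run past the end of the Customer line. The name `cleaned` is only illustrative.

```r
text <- c("***ORDER LIST***\nCustomer: Lucille\nitem1: apples\nitem2: oranges",
          "***ORDER LIST***\nCustomer: Frank and Sally\nitem1: wine\nitem2: milk",
          "***ORDER LIST***\n\n\nitem1: wine\nitem2: milk")

# Drop either a whole "Customer: ..." line (including its trailing newline)
# or a pair of consecutive newlines; replace the match with nothing.
cleaned <- gsub("Customer: [^\n]*\n|\n\n", "", text, perl = TRUE)
cleaned
# [1] "***ORDER LIST***\nitem1: apples\nitem2: oranges"
# [2] "***ORDER LIST***\nitem1: wine\nitem2: milk"
# [3] "***ORDER LIST***\nitem1: wine\nitem2: milk"
```

Because gsub replaces every match, this would also collapse pairs of newlines later in a cell; for cells shaped like the three samples that makes no difference.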
Does this work for you?
gsub("(.*\\*).*?(\nitem.*)", "\\1\\2", text)
[1] "***ORDER LIST***\nitem1: apples\nitem2: oranges" "***ORDER LIST***\nitem1: wine\nitem2: milk"
[3] "***ORDER LIST***\nitem1: wine\nitem2: milk"
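Since the question is about a data frame column, here is a rough sketch of how the capture-group approach might be applied in place. The data frame `df` and the column name `orders` are assumptions made for illustration. The pattern is anchored and uses `(?s)` with `perl = TRUE` so that `.` matching newlines is explicit rather than dependent on the default regex engine.

```r
# Hypothetical data frame; the column name `orders` is an assumption.
df <- data.frame(
  orders = c(
    "***ORDER LIST***\nCustomer: Lucille\nitem1: apples\nitem2: oranges",
    "***ORDER LIST***\nCustomer: Frank and Sally\nitem1: wine\nitem2: milk",
    "***ORDER LIST***\n\n\nitem1: wine\nitem2: milk"
  ),
  stringsAsFactors = FALSE
)

# Group 1: everything up to the last asterisk (the header).
# Group 2: everything from the first "\nitem" to the end.
# Whatever sits between the two groups (Customer line or blank lines) is dropped.
df$orders <- sub("(?s)^(.*\\*).*?(\nitem.*)$", "\\1\\2", df$orders, perl = TRUE)
df$orders
```

This relies on the header being the only place asterisks appear and on every cell containing at least one line that starts with "item"; a cell that does not fit that shape is left unchanged because the pattern does not match.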