So I have a data frame as follows. I want to extract every phone listed in x from the "phone" column and put them into the "vowel" column.
x <- c("IH","EH","AE","AH","OH","UH","IY","EY","AY","OY","AW","OW","ER","AA","AO")
df
word phone vowel
THERE DH, EH, AH NA
MUSHROOM M, AH, SH, R, UW, M NA
YOU Y, UW NA
IT'S IH, T, S NA
The expected result is:
df
word phone vowel
THERE DH, EH, AH EH, AH
MUSHROOM M, AH, SH, R, UW, M AH
YOU Y, UW
IT'S IH, T, S IH
I tried the code below, but it only outputs "AH":
for (i in df$phone){
  if (i %in% x){
    df$vowel <- i
  }
}
Can anyone help me with this? Thanks in advance!
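df$vowel <- sapply(
  df$phone,
  function(a) {
    # split "DH, EH, AH" into c("DH", "EH", "AH"); as.character() also handles factor columns
    b <- strsplit(as.character(a), ", ")[[1]]
    # keep only the phones that appear in x and join them back into one string
    paste(b[b %in% x], collapse = ", ")
  },
  USE.NAMES = FALSE
)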
word phone vowel
1 THERE DH, EH, AH EH, AH
2 MUSHROOM M, AH, SH, R, UW, M AH
3 YOU Y, UW
4 IT'S IH, T, S IH
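The loop in the question goes wrong in two places, assuming phone holds strings such as "DH, EH, AH": each i is a whole string, so i %in% x compares "DH, EH, AH" against single phones, and df$vowel <- i overwrites the entire column instead of one row. Below is a minimal sketch of the same idea as an explicit loop over row indices, with the example data rebuilt by hand (stringsAsFactors = FALSE is an assumption about how the real data was read in).

```r
# Vowel phones to look for (same vector as in the question)
x <- c("IH","EH","AE","AH","OH","UH","IY","EY","AY","OY","AW","OW","ER","AA","AO")

# Example data, assuming phone stores comma-separated strings
df <- data.frame(
  word  = c("THERE", "MUSHROOM", "YOU", "IT'S"),
  phone = c("DH, EH, AH", "M, AH, SH, R, UW, M", "Y, UW", "IH, T, S"),
  stringsAsFactors = FALSE
)

df$vowel <- NA_character_
for (i in seq_len(nrow(df))) {
  # split this row's string into individual phones
  phones <- strsplit(df$phone[i], ", ", fixed = TRUE)[[1]]
  # keep only the vowels and write them into row i, not the whole column
  df$vowel[i] <- paste(phones[phones %in% x], collapse = ", ")
}
df
```

Rows with no vowels (YOU here) end up with an empty string, because paste() of a zero-length vector with collapse returns "".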
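If the phone variable contains some odd formatting, for example after
df=rbind(df,c('MOM','c("EH", "IH", "OW"), c("EH", "AH")...',NA))
add a <- gsub('([c\\()..."])*([A-Z]+)*', "\\2", a) as the first line inside function(a), before the string is split.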
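If that bracket-and-backreference pattern is hard to follow, a simpler alternative (a swapped-in technique, not the regex from the comment) is to delete every character that is not an upper-case letter, a comma, or a space. This sketch assumes phones are always upper-case and separated by ", "; perl = TRUE is used so that A-Z is a plain code-point range regardless of locale.

```r
x <- c("IH","EH","AE","AH","OH","UH","IY","EY","AY","OY","AW","OW","ER","AA","AO")

# A messy phone value like the one added with rbind() above
a <- 'c("EH", "IH", "OW"), c("EH", "AH")...'

# Drop c, parentheses, quotes and dots; keep capital letters, commas and spaces
cleaned <- gsub("[^A-Z, ]", "", a, perl = TRUE)
cleaned

# Same split-and-filter step as in the sapply() answer
b <- strsplit(cleaned, ", ", fixed = TRUE)[[1]]
vowels <- paste(b[b %in% x], collapse = ", ")
vowels
```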
You can try:
d1$vowel <- stringr::str_extract_all(d1$phone, paste(x, collapse = '|'))
> d1
word phone vowel
1 THERE DH,EH,AH EH, AH
2 MUSHROOM M,AH,SH,R,UW,M AH
3 YOU Y,UW
4 ITS IH,T,S IH
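str_extract_all() returns a list column, one character vector per row. If stringr isn't installed, or you want vowel as an ordinary character column, a base R sketch with regmatches() and gregexpr() does the same extraction. The \b word boundaries are an addition of mine so that a vowel code cannot match inside a longer phone code; the d1 data below is rebuilt by hand from the output shown.

```r
x <- c("IH","EH","AE","AH","OH","UH","IY","EY","AY","OY","AW","OW","ER","AA","AO")

d1 <- data.frame(
  word  = c("THERE", "MUSHROOM", "YOU", "ITS"),
  phone = c("DH,EH,AH", "M,AH,SH,R,UW,M", "Y,UW", "IH,T,S"),
  stringsAsFactors = FALSE
)

# Alternation of all vowels, anchored to whole phones with \b
pattern <- paste0("\\b(", paste(x, collapse = "|"), ")\\b")

# A list of character vectors, one per row (character(0) when nothing matches)
matches <- regmatches(d1$phone, gregexpr(pattern, d1$phone, perl = TRUE))

# Collapse each vector into a single "EH, AH"-style string
d1$vowel <- vapply(matches, paste, character(1), collapse = ", ")
d1
```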
Hi, since your question doesn't include R code to generate df, here is a small example. Hope it helps.
a="DH, EH, AH"
b=c('AH', 'EH', 'OO')
sapply(b, FUN=function(b) grepl(pattern=b, x=a))
   AH    EH    OO
 TRUE  TRUE FALSE
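The grepl() check above tests for a substring, so a short pattern can report a hit when it only appears inside a longer phone. A small sketch comparing it with exact, whole-phone matching; the extra "H" entry is made up purely to show the difference.

```r
a <- "DH, EH, AH"
b <- c("AH", "EH", "OO", "H")

# Substring test: "H" counts as present because it occurs inside "DH", "EH", "AH"
substring_hit <- sapply(b, function(p) grepl(p, a, fixed = TRUE))
substring_hit

# Split first, then %in% checks whole phones only
phones <- strsplit(a, ", ", fixed = TRUE)[[1]]
whole_hit <- setNames(b %in% phones, b)
whole_hit
```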