Hunspell in R: working around the out-of-bounds error when there are no suggestions

I am trying to automatically spell-check the string columns of a data.table/data.frame.

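Looking around, I found several approaches, but they all fail with an "out of bounds" error whenever hunspell_suggest returns no suggestions for a word (i.e. an empty list element, as happens for "pippasnjfjsfiadjg"). See the approach here (the accepted answer there produces NA, so it works in principle) and here.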

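It seems we need to unlist the result to identify these empty suggestions and then exclude them from the part of the code that picks the first suggestion, but I don't know how.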

library(dplyr)
library(stringi)
library(hunspell)

df1 <- data.frame("Index" = 1:7,
                  "Text" = c("pippasnjfjsfiadjg came to dinner with us tonigh.",
                             "Wuld you like to trave with me?",
                             "There is so muh to undestand.",
                             "Sentences cone in many shaes and sizes.",
                             "Learnin R is fun",
                             "yesterday was Friday",
                             "bing search engine"),
                  stringsAsFactors = FALSE)

# Get bad words.
badwords <- hunspell(df1$Text) %>% unlist

# Extract the first suggestion for each bad word.
suggestions <- sapply(hunspell_suggest(badwords), "[[", 1)

mutate(df1, Text = stri_replace_all_fixed(str = Text,
                                          pattern = badwords,
                                          replacement = suggestions,
                                          vectorize_all = FALSE)) -> out

You need to filter both the bad words and the suggestions, dropping the words that have no suggestion:

badwords <- hunspell(df1$Text) %>% unlist()
# Note the use of '[' rather than '[[': on an empty element, '[' returns NA instead of raising an error.
suggestions <- sapply(hunspell_suggest(badwords), '[', 1)
badwords <- badwords[!is.na(suggestions)]
suggestions <- suggestions[!is.na(suggestions)]
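
To see why switching from '[[' to '[' helps, here is a minimal base R sketch that does not call hunspell at all. The list `sugg_list` is a hand-written stand-in for what hunspell_suggest might return (the suggestions in it are made up for illustration), and the names `sugg_list`, `first`, `keep`, `badwords_kept` and `suggestions_kept` are hypothetical.

```r
# Hand-written stand-in for the list returned by hunspell_suggest();
# the suggestions are illustrative, not real hunspell output.
badwords  <- c("pippasnjfjsfiadjg", "tonigh", "Wuld")
sugg_list <- list(character(0), c("tonight", "tong"), c("Would", "Wild"))

# '[[' raises an error on the empty element; capture the message instead of stopping.
res <- tryCatch(sapply(sugg_list, "[[", 1),
                error = function(e) conditionMessage(e))

# '[' returns NA for the empty element, so sapply() yields a plain character vector.
first <- sapply(sugg_list, "[", 1)

# Keep only the words that actually have a suggestion.
# lengths(sugg_list) > 0 gives the same logical mask without relying on NA.
keep             <- !is.na(first)
badwords_kept    <- badwords[keep]
suggestions_kept <- first[keep]
```

The filtered vectors have the same length and order, so they can be passed as `pattern` and `replacement` to the stri_replace_all_fixed call from the question; a word without any suggestion is simply left unchanged in the text.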
