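Count the number of characters in a string (excluding spaces) in R

I want to count the number of characters in a string, excluding spaces, and I'd like to know whether my approach can be improved.

Suppose I have: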

x <- "hello to you"

I know that nchar() gives me the number of characters in the string, including spaces:

> nchar(x)
[1] 12

But I want it to return the following (spaces excluded):

[1] 10

To do that, I did the following:

> nchar(gsub(" ", "",x))
[1] 10

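My concern is that gsub() will take a long time when there are many strings to process. Is this the right way to approach the problem, or is there an nchar-esque function that returns the number of characters without counting spaces?

Thanks in advance.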

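Following Richard's comment, "stringi" is well worth considering here:

The approach is to compute the length of the whole string and subtract the number of spaces.

Compare the following.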

library(stringi)
library(microbenchmark)
x <- "hello to you"
x
# [1] "hello to you"
fun1 <- function(x) stri_length(x) - stri_count_fixed(x, " ")
fun2 <- function(x) nchar(gsub(" ", "",x))
y <- paste(as.vector(replicate(1000000, x, TRUE)), collapse = "     ")
microbenchmark(fun1(x), fun2(x))
# Unit: microseconds
#     expr   min    lq     mean median      uq    max neval
#  fun1(x) 5.560 5.988  8.65163  7.270  8.1255 44.047   100
#  fun2(x) 9.408 9.837 12.84670 10.691 12.4020 57.732   100
microbenchmark(fun1(y), fun2(y), times = 10)
# Unit: milliseconds
#     expr        min         lq      mean     median         uq        max neval
#  fun1(y)   68.22904   68.50273   69.6419   68.63914   70.47284   75.17682    10
#  fun2(y) 2009.14710 2011.05178 2042.8123 2030.10502 2079.87224 2090.09142    10
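
Since the question is about processing many strings, here is a minimal base R sketch of the same idea applied to a character vector. Both nchar() and gsub() are vectorized, and passing fixed = TRUE makes gsub() match the space as a literal string instead of a regular expression. The helper name count_non_space is hypothetical (it is not part of base R or stringi), and no timing is claimed for it; benchmark it on your own data as above.

```r
# Hypothetical helper: count characters in each string, ignoring spaces.
count_non_space <- function(x) {
  # fixed = TRUE treats " " as a literal string, so no regex matching is needed
  nchar(gsub(" ", "", x, fixed = TRUE))
}

strings <- c("hello to you", "a b c", "nospaces")
count_non_space(strings)
# [1] 10  3  8
```

This removes only the space character; tabs and newlines would still be counted.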

Indeed, stringi seems to be the best fit here. Try this:

library(stringi)
x <- "hello to you"
stri_stats_latex(x)

Result:

CharsWord CharsCmdEnvir    CharsWhite         Words          Cmds        Envirs 
       10             0             2             3             0             0

If you need the value in a variable, you can pick out the element with ordinary [i] indexing, for example: stri_stats_latex(x)[1]
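
As a small sketch of that last step, the element can also be picked out by name, which avoids relying on its position. The variable names stats and n_chars are just for illustration. Note that stri_stats_latex() interprets its input as LaTeX text and, as far as I understand it, returns one set of counts for the whole input, so per-string counts over a vector would need something like sapply().

```r
library(stringi)

x <- "hello to you"

# Named integer vector: CharsWord, CharsCmdEnvir, CharsWhite, Words, Cmds, Envirs
stats <- stri_stats_latex(x)

# [[ ]] with a name returns the bare value without the name attribute
n_chars <- stats[["CharsWord"]]
n_chars
# [1] 10
```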
