Concatenating StringVectors with Rcpp



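I can't figure out how to concatenate two string vectors element-wise with Rcpp. I suspect there is an obvious answer, but the documentation hasn't helped me.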

http://gallery.rcpp.org/articles/working-with-Rcpp-StringVector/

http://gallery.rcpp.org/articles/strings_with_rcpp/

StringVector concatenate(StringVector a, StringVector b)
{
 StringVector c;
 c= ??;
 return c;
}

I would like this output:

a=c("a","b"); b=c("c","d");
concatenate(a,b)
[1] "ac" "bd"

There are probably several ways to approach this, but here is one option using std::transform:

#include <Rcpp.h>
using namespace Rcpp;
struct Functor {
    std::string
    operator()(const std::string& lhs, const internal::string_proxy<STRSXP>& rhs) const
    {
        return lhs + rhs;
    }
};
// [[Rcpp::export]]
CharacterVector paste2(CharacterVector lhs, CharacterVector rhs)
{
    std::vector<std::string> res(lhs.begin(), lhs.end());
    std::transform(
        res.begin(), res.end(),
        rhs.begin(), res.begin(),
        Functor()
    );
    return wrap(res);
}
/*** R
lhs <- letters[1:2]; rhs <- letters[3:4]
paste(lhs, rhs, sep = "")
# [1] "ac" "bd"
paste2(lhs, rhs)
# [1] "ac" "bd"
*/ 

The reason for first copying the left-hand expression into a std::vector<std::string> is that the internal::string_proxy<> class provides operator+ with the signature

std::string operator+(const std::string& x, const internal::string_proxy<STRSXP>& y) 

rather than, for example,

operator+(const internal::string_proxy<STRSXP>& x, const internal::string_proxy<STRSXP>& y) 

If your compiler supports C++11, this can be written a little more cleanly:

// [[Rcpp::plugins(cpp11)]]
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
CharacterVector paste3(CharacterVector lhs, CharacterVector rhs)
{
    using proxy_t = internal::string_proxy<STRSXP>;
    std::vector<std::string> res(lhs.begin(), lhs.end());
    std::transform(res.begin(), res.end(), rhs.begin(), res.begin(),
        [&](const std::string& x, const proxy_t& y) {
            return x + y;
        }
    );
    return wrap(res);
}
/*** R
lhs <- letters[1:2]; rhs <- letters[3:4]
paste(lhs, rhs, sep = "")
# [1] "ac" "bd"
paste3(lhs, rhs)
# [1] "ac" "bd"
*/
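To see the binary std::transform pattern on its own, without Rcpp's proxy types getting in the way, here is a minimal sketch in plain standard C++. The name paste_std is hypothetical (it is not part of Rcpp), and the sketch assumes both inputs have the same length.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not part of Rcpp): element-wise concatenation of two
// equally sized vectors using the binary overload of std::transform.
std::vector<std::string> paste_std(const std::vector<std::string>& lhs,
                                   const std::vector<std::string>& rhs)
{
    // Assumption: lhs.size() == rhs.size(). std::transform reads lhs.size()
    // elements from rhs without checking its length.
    std::vector<std::string> res(lhs.size());
    std::transform(lhs.begin(), lhs.end(), rhs.begin(), res.begin(),
                   [](const std::string& x, const std::string& y) {
                       return x + y;
                   });
    return res;
}
```

Because both operands are already std::string here, the lambda needs no proxy type, and the result can be written to a separate output vector instead of overwriting a copy of lhs.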

One working solution is to use:

#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
CharacterVector concatenate(std::string x, std::string y)
{
    return wrap(x + y);
}

Then:

Vconcatenate=Vectorize(concatenate)
Vconcatenate(letters[1:2],letters[3:4])

Or:

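#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
CharacterVector concatenate(std::vector<std::string> x, std::vector<std::string> y)
{
  // y[i] is read for every index of x, so the lengths must match
  if (x.size() != y.size()) stop("x and y must have the same length");
  std::vector<std::string> res(x.size());
  // std::size_t avoids a signed/unsigned comparison with x.size()
  for (std::size_t i = 0; i < x.size(); i++)
  {
    res[i] = x[i] + y[i];
  }
  return wrap(res);
}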
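The loop version reads y[i] for every index of x, so it expects both inputs to have the same length. R's paste() instead recycles the shorter argument. Here is a rough sketch of that recycling in plain standard C++. The name paste_recycled is hypothetical, and returning an empty result for an empty input is a simplifying assumption that differs from paste(), which treats a zero-length argument as "".

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: concatenates element-wise, recycling the shorter
// input by wrapping its index around with the modulo operator.
std::vector<std::string> paste_recycled(const std::vector<std::string>& x,
                                        const std::vector<std::string>& y)
{
    // Simplifying assumption: an empty input yields an empty result.
    if (x.empty() || y.empty()) return {};
    const std::size_t n = std::max(x.size(), y.size());
    std::vector<std::string> res(n);  // preallocate, then fill by index
    for (std::size_t i = 0; i < n; ++i) {
        res[i] = x[i % x.size()] + y[i % y.size()];
    }
    return res;
}
```

Taking std::vector<std::string> parameters keeps the body free of Rcpp types, at the cost of copying the inputs when the function is called from R.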

I'm leaving this answer up, but please note the warning from @nrussell about using push_back()!


I'm still getting to grips with Rcpp myself, so I built the strings in a loop using string streams:

library(Rcpp)
cppFunction('StringVector concatenate(StringVector a, StringVector b)
{
  StringVector c;
  std::ostringstream x;
  std::ostringstream y;
 // concatenate inputs
  for (int i = 0; i < a.size(); i++)
    x << a[i];
  for (int i = 0; i < b.size(); i++)
    y << b[i];
  c.push_back(x.str());
  c.push_back(y.str());
  return c;
}')
a=c("a","b"); b=c("c","d");
concatenate(a,b)
# [1] "ab" "cd" 

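Comparing the performance of (i) repeated calls to push_back and (ii) a preallocate-and-fill strategy shows that the latter is clearly preferable, at roughly 60 times faster by median time in the benchmark below: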

#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
CharacterVector pbpaste(CharacterVector lhs, CharacterVector rhs)
{
    R_xlen_t i = 0, sz = lhs.size();
    CharacterVector res;
    for (std::ostringstream oss; i < sz; i++, oss.str("")) {
        oss << lhs[i] << rhs[i];
        res.push_back(oss.str());
    }
    return res;
}
// [[Rcpp::export]]
CharacterVector sspaste(CharacterVector lhs, CharacterVector rhs)
{
    R_xlen_t i = 0, sz = lhs.size();
    CharacterVector res(sz);
    for (std::ostringstream oss; i < sz; i++, oss.str("")) {
        oss << lhs[i] << rhs[i];
        res[i] = oss.str();
    }
    return res;
}
/*** R
lhs <- as.character(1:5000); rhs <- as.character(5001:10000)
all.equal(pbpaste(lhs, rhs), sspaste(lhs, rhs))
# [1] TRUE
microbenchmark::microbenchmark(
    "push_back" = pbpaste(lhs, rhs),
    "preallocate" = sspaste(lhs, rhs),
    times = 200L
)
# Unit: milliseconds
#         expr        min         lq       mean     median         uq        max neval cld
#    push_back 101.521579 105.334649 115.156544 107.275678 110.957420 256.722239   200   b
#  preallocate   1.364213   1.585818   1.789564   1.778153   1.934758   2.955352   200   a
*/
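One way to make sense of the gap is a back-of-the-envelope count of element writes. The sketch below is a model, not a measurement. It assumes that every push_back allocates a fresh vector and copies all existing elements into it, which is my understanding of how Rcpp vectors behave. Both function names are hypothetical.

```cpp
#include <cstddef>

// Model: growing one element at a time, where each append copies the `len`
// existing elements into a new buffer and then writes the new element.
std::size_t writes_grow_by_one(std::size_t n)
{
    std::size_t writes = 0;
    for (std::size_t len = 0; len < n; ++len) {
        writes += len + 1;  // len copies plus one new element
    }
    return writes;          // equals n * (n + 1) / 2
}

// Model: with preallocation, each of the n slots is written exactly once.
std::size_t writes_preallocated(std::size_t n)
{
    return n;
}
```

Under this model, n = 5000 gives 12,502,500 element writes for the grow-by-one strategy against 5,000 for preallocation, so the quadratic growth alone is enough to account for a large difference.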
