R - Simple Rcpp function with try/catch returns a 'memory not mapped' error



Background

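The function has a simple task: iterate over the elements of a factor and try to convert each element to a double, then to an integer, and finally leave it as a character. On each attempt, the corresponding counter is incremented. At the end, the string corresponding to the largest counter is returned.

Rationale

This is mostly a learning example. I came across a messy data.frame in which some of the data I wanted to use was stored as factors. The variables are in fact doubles, integers, or strings, and I want to bring them to those types. There are better ways to do this in base R, but the problem looked like a good opportunity to learn more about Rcpp.

Code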

#include <Rcpp.h>
// [[Rcpp::plugins(cpp11)]]
//' @title Guess Vector Type
//'
//' @description Function analyses content of a factor vector and attempts to
//'   guess the correct type.
//'
//' @param x A vector of factor class.
//'
//' @return A scalar string with class name.
//'
//' @export
//'
// [[Rcpp::export]]
Rcpp::String guess_vector_type(Rcpp::IntegerVector x) {
    // Define counters for all types
    int num_doubles = 0;
    int num_integers = 0;
    int num_strings = 0;
    // Converted strings
    double converted_double;
    int converted_integer;

    // Get character vector with levels
    Rcpp::StringVector levels = x.attr("levels");
    // Get integer vector with values
    // Rcpp::String type = x.sexp_type();
    // Returns integer vector type
    // Use iterator: https://teuder.github.io/rcpp4everyone_en/280_iterator.html
    for (Rcpp::IntegerVector::iterator it = x.begin(); it != x.end(); ++it) {
        // Get [] for vector element
        int index = std::distance(x.begin(), it);
        // Get value of a specific vector element
        int element = x[index];
        // Convert to normal string
        std::string temp = Rcpp::as<std::string>(levels[element]);
        // Try converting to an integer
        try
        {
            converted_integer = std::stoi(temp);
        }
        catch(...)
        {
            // Try converting to a double
            try
            {
                // Convert to a double
                converted_double = std::stod(temp);
            }
            catch(...)
            {
                ++num_integers;
            }
            ++num_doubles;
        }
        ++num_strings;
    }
    // Get max value of three variables
    // https://stackoverflow.com/a/2233412/1655567
    int max_val;
    max_val = num_doubles > num_integers ? (num_doubles > num_strings ? num_doubles : num_strings) : (num_integers > num_strings ? num_integers : num_strings);
    // Create results storage
    Rcpp::String res;

    // Check which value is matching max val
    if (max_val == num_doubles) {
        // Most converted to doubles
        res = "double";
    } else if (max_val == num_integers) {
        res = "integer";
    } else {
        res = "character";
    }
    // Return results vector
    return res;
}

Tests

test_factor <- as.factor(rep(letters, 3))

This should return the scalar string "character".

Error

guess_vector_type(test_factor)
*** caught segfault ***
address 0xe1000013, cause 'memory not mapped'

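I know this resembles a problem that has been discussed before, but it is not clear to me where the error is.


Update

Following the comments, I have updated the function: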

Rcpp::String guess_vector_type(Rcpp::IntegerVector x) {
    // Define counters for all types
    int num_doubles = 0;
    int num_integers = 0;
    int num_strings = 0;
    // Converted strings
    double converted_double;
    // Flag for running more tests
    bool is_number;
    // Get character vector with levels
    Rcpp::StringVector levels = x.attr("levels");
    // Get integer vector with values
    // Rcpp::String type = x.sexp_type();
    // Returns integer vector type
    // Use iterator: https://teuder.github.io/rcpp4everyone_en/280_iterator.html
    for (Rcpp::IntegerVector::iterator it = x.begin(); it != x.end(); ++it) {
        // Get [] for vector element
        int index = std::distance(x.begin(), it);
        // Get value of a specific vector element
        int element = x[index];
        // Convert to normal string (factor codes are 1-based, hence the - 1)
        std::string temp = Rcpp::as<std::string>(levels[element - 1]);
        // Reset number checking flag
        is_number = true;
        // Attempt conversion to double
        try {
            converted_double = std::stod(temp);
        } catch(...) {
            // Conversion failed, increase string count
            ++num_strings;
            // Do not run more tests
            is_number = false;
        }
        // If it is a number, run more tests
        if (is_number) {
            // Check if converted string is an integer
            if (floor(converted_double) == converted_double) {
                // Increase counter for integer
                ++num_integers;
            } else {
                // Increase count for doubles
                ++num_doubles;
            }
        }
    }
    // Get max value of three variables
    // https://stackoverflow.com/a/2233412/1655567
    int max_val;
    max_val = num_doubles > num_integers ? (num_doubles > num_strings ? num_doubles : num_strings) : (num_integers > num_strings ? num_integers : num_strings);
    // Create results storage
    Rcpp::String res;

    // Check which value is matching max val
    if (max_val == num_doubles) {
        // Most converted to doubles
        res = "double";
    } else if (max_val == num_integers) {
        res = "integer";
    } else {
        res = "character";
    }
    // Return results vector
    return res;
}

Tests

>> guess_vector_type(x = as.factor(letters))
[1] "character"
>> guess_vector_type(as.factor(1:10))
[1] "integer"
>> guess_vector_type(as.factor(runif(n = 1e3)))
[1] "double"

The problem that causes the segfault is in this line:

std::string temp = Rcpp::as<std::string>(levels[element]);

Because R is 1-indexed, factor codes start at 1, whereas Rcpp vectors are indexed from 0 in C++. You therefore need:

std::string temp = Rcpp::as<std::string>(levels[element - 1]);
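
To make the off-by-one concrete, here is a minimal sketch in plain C++ (no Rcpp), assuming the factor is represented by two stand-ins: an integer code that starts at 1 and a `std::vector<std::string>` of levels. The helper name `level_for_code` is invented for illustration and does not appear in the question's code. It uses `at()` so that a bad index raises an exception instead of reading unmapped memory.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: look up the level label for a 1-based factor code.
// R stores factor codes starting at 1; C++ containers start at 0.
std::string level_for_code(const std::vector<std::string>& levels, int code) {
    // Convert to an unsigned index before subtracting, so that a code of 0
    // (or a negative code) wraps to a huge value instead of overflowing int.
    const std::size_t index = static_cast<std::size_t>(code) - 1;
    // at() is bounds-checked and throws std::out_of_range on a bad index,
    // whereas operator[] with an out-of-range index is undefined behaviour.
    return levels.at(index);
}
```

With the levels "a", "b", "c", code 3 maps to "c". Using the code directly as the index, as the original line does, would ask for one element past the end when the code equals the number of levels, which is what happens for the last letter in the test factor.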

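However, I also noticed that you are incrementing the counters in the wrong places: you need to increment the string counter in the innermost catch and the integer counter outside the catches. You also need a continue statement after each increment inside the catch blocks, otherwise you end up performing increments that do not apply in addition to the one you intended. With these fixes, the code runs as expected on the test case (but see the update at the end regarding doubles versus integers).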

guess_vector_type(test_factor)
# [1] "character"

The full working code is:

#include <Rcpp.h>
// [[Rcpp::plugins(cpp11)]]
//' @title Guess Vector Type
//'
//' @description Function analyses content of a factor vector and attempts to
//'   guess the correct type.
//'
//' @param x A vector of factor class.
//'
//' @return A scalar string with class name.
//'
//' @export
//'
// [[Rcpp::export]]
Rcpp::String guess_vector_type(Rcpp::IntegerVector x) {
    // Define counters for all types
    int num_doubles = 0;
    int num_integers = 0;
    int num_strings = 0;
    // Converted strings
    double converted_double;
    int converted_integer;

    // Get character vector with levels
    Rcpp::StringVector levels = x.attr("levels");
    // Get integer vector with values
    // Rcpp::String type = x.sexp_type();
    // Returns integer vector type
    // Use iterator: https://teuder.github.io/rcpp4everyone_en/280_iterator.html
    for (Rcpp::IntegerVector::iterator it = x.begin(); it != x.end(); ++it) {
        // Get [] for vector element
        int index = std::distance(x.begin(), it);
        // Get value of a specific vector element
        int element = x[index];
        // Convert to normal string (factor codes are 1-based, hence the - 1)
        std::string temp = Rcpp::as<std::string>(levels[element - 1]);
        // Try converting to an integer
        try
        {
            converted_integer = std::stoi(temp);
        }
        catch(...)
        {
            // Try converting to a double
            try
            {
                // Convert to a double
                converted_double = std::stod(temp);
            }
            catch(...)
            {
                // Neither conversion worked: count as string
                ++num_strings;
                continue;
            }
            ++num_doubles;
            continue;
        }
        ++num_integers;
    }
    // Get max value of three variables
    // https://stackoverflow.com/a/2233412/1655567
    int max_val;
    max_val = num_doubles > num_integers ? (num_doubles > num_strings ? num_doubles : num_strings) : (num_integers > num_strings ? num_integers : num_strings);
    // Create results storage
    Rcpp::String res;

    // Check which value is matching max val
    if (max_val == num_doubles) {
        // Most converted to doubles
        res = "double";
    } else if (max_val == num_integers) {
        res = "integer";
    } else {
        res = "character";
    }
    // Return results vector
    return res;
}

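Update

I tried it on a few more examples and found that it does not work as expected for doubles, because the program is able to convert "42.18" to an integer, for example (std::stoi parses the leading "42" and stops at the decimal point without throwing). It does, however, clearly distinguish integers/doubles from characters: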

test_factor <- as.factor(rep(letters, 3))
guess_vector_type(test_factor)
# [1] "character"
test_factor <- as.factor(1:3)
guess_vector_type(test_factor)
# [1] "integer"
test_factor <- as.factor(c(letters, 1))
guess_vector_type(test_factor)
# [1] "character"
test_factor <- as.factor(c(1.234, 42.1138, "a"))
guess_vector_type(test_factor)
# [1] "integer"

In any case, that is an entirely different problem from the one raised in the question; you may want to consult other Stack Overflow posts on that topic.
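
One possible way to separate integers from doubles, sketched here in plain C++ outside of Rcpp, is to use the optional second argument of `std::stoi` and `std::stod`. It reports how many characters were parsed, so a conversion can be accepted only when the whole string was consumed. The function name `classify_token` is hypothetical and not part of the code above.

```cpp
#include <cstddef>
#include <exception>
#include <string>

// Hypothetical helper: classify one string as "integer", "double" or "character".
std::string classify_token(const std::string& s) {
    std::size_t pos = 0;
    try {
        (void)std::stoi(s, &pos);
        // Accept only if every character was parsed, so "42.18" is rejected here
        // (std::stoi stops at the '.' and leaves pos == 2).
        if (pos == s.size()) return "integer";
    } catch (const std::exception&) {
        // Not parseable as int, or outside the int range: try double next.
    }
    try {
        (void)std::stod(s, &pos);
        // Same rule for doubles: trailing junk such as "42abc" is rejected.
        if (pos == s.size()) return "double";
    } catch (const std::exception&) {
        // Not parseable as double either.
    }
    return "character";
}
```

Under this sketch, "42" is classified as integer, "42.18" as double, and "a" as character. In the Rcpp function, the same full-consumption check could replace the bare `std::stoi(temp)` and `std::stod(temp)` calls, with the counters incremented according to the returned label.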
