r - Why does coercing the example factor from the factor documentation return a bunch of NAs?

The factor documentation gives this code as its first example of building a factor variable:

(ff <- factor(substring("statistics", 1:10, 1:10), levels = letters))

The same documentation then recommends the following:

To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
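To see why the recommended idiom works, here is a minimal sketch on a hypothetical factor `f` built from numbers (the data are made up for illustration). Indexing a vector by a factor uses the factor's integer codes, so `as.numeric(levels(f))[f]` converts each distinct level only once and then looks every element up by its code. `as.numeric(as.character(f))` instead creates and parses one string per element, which is presumably where the small efficiency difference comes from.

```r
# Hypothetical example data: numbers stored as a factor
x <- c(10, 20, 10, 30, 20, 10)
f <- factor(x)

levels(f)        # "10" "20" "30"   (each distinct value stored once)
as.integer(f)    # 1 2 1 3 2 1      (the integer codes)

# Recommended idiom: convert the few levels once, then index by the codes
v1 <- as.numeric(levels(f))[f]

# Alternative: build one string per element, then parse each of them
v2 <- as.numeric(as.character(f))

print(v1)          # 10 20 10 30 20 10
identical(v1, v2)  # TRUE
```

Both routes give the same numbers; they differ only in how much string conversion happens along the way.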

But when I try these on their own example, I get nonsense:

> (ff <- factor(substring("statistics", 1:10, 1:10), levels = letters))
[1] s t a t i s t i c s
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
> ff
[1] s t a t i s t i c s
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
> as.numeric(levels(ff))[ff]
[1] NA NA NA NA NA NA NA NA NA NA
Warning message:
NAs introduced by coercion 
> as.numeric(as.character(ff))
[1] NA NA NA NA NA NA NA NA NA NA
Warning message:
NAs introduced by coercion

Where is my misunderstanding? I don't see anything unusual about the factor ff. It certainly has underlying numbers:

> as.integer(ff)
[1] 19 20  1 20  9 19 20  9  3 19

Its levels are characters, but I don't see anything strange about that: the levels of any factor are always character strings.

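Once you have created ff, run table(ff). It shows the frequency of every letter of the alphabet, including the letters that do not occur, whose frequency is accordingly 0.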

Now, levels(ff) returns all of these letters as character strings, so wrapping them in as.numeric(levels(ff)) will always return NA. The same is true of as.numeric(as.character(ff)).
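A minimal sketch of the point above, rebuilding `ff` exactly as in the question. The levels are letters, so `as.numeric()` has nothing to parse and every level becomes `NA` (the warning is suppressed here only to keep the output short). The same indexing idiom without `as.numeric()` gives back the original letters, and `as.integer(ff)` gives the codes, which are positions in `levels(ff)`, not values from the data. The variable names `lvl_num` and `orig` are just for illustration.

```r
ff <- factor(substring("statistics", 1:10, 1:10), levels = letters)

# The levels are letters, so there is no number to parse: all 26 become NA
lvl_num <- suppressWarnings(as.numeric(levels(ff)))
all(is.na(lvl_num))          # TRUE

# The same indexing idiom, without as.numeric(), recovers the original letters
orig <- levels(ff)[ff]
paste(orig, collapse = "")   # "statistics"

# The integer codes are positions in levels(ff) (s = 19, t = 20, a = 1, ...)
as.integer(ff)               # 19 20 1 20 9 19 20 9 3 19
```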

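My guess is that you may be confusing labels and levels. If you run labels(ff), you get the quoted numbers "1" to "10", which are simply the positions of the elements. If you convert these with as.numeric(), you get a result of 10 numbers. Run: as.numeric(labels(ff))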
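For reference, `labels()` is a generic, and for an unnamed vector such as `ff` its default method returns the element positions as character strings. A short check, rebuilding the `ff` from the question (`lab` is just an illustrative name):

```r
ff <- factor(substring("statistics", 1:10, 1:10), levels = letters)

# ff has no names, so labels() falls back to the positions 1..10 as strings
lab <- labels(ff)
print(lab)                                   # "1" "2" ... "10"
identical(lab, as.character(seq_along(ff)))  # TRUE

# Converting them gives the positions 1..10, unrelated to the letters
as.numeric(lab)
```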

I hope this clears up your confusion. If not, let me know.

Output:

R>table(ff)
ff
a b c d e f g h i j k l m n o p q r 
1 0 1 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 
s t u v w x y z 
3 3 0 0 0 0 0 0 
R>levels(ff)
[1] "a" "b" "c" "d" "e" "f" "g" "h"
[9] "i" "j" "k" "l" "m" "n" "o" "p"
[17] "q" "r" "s" "t" "u" "v" "w" "x"
[25] "y" "z"
R>labels(ff)
[1] "1"  "2"  "3"  "4"  "5"  "6" 
[7] "7"  "8"  "9"  "10"

Edit:

OK, it seems the OP is having trouble with this passage in the documentation:

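The interpretation of a factor depends on both the codes and the "levels" attribute. Be careful only to compare factors with the same set of levels (in the same order). In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).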
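A small sketch of the "codes plus levels" idea, using a made-up numeric factor `f` and a second factor `g` with the same values (both hypothetical). `as.integer()` and `as.numeric()` see only the codes, recovering the original values needs the levels as well, and the same values with the levels in a different order get different codes, which is why comparing factors with different level sets is risky.

```r
# Hypothetical factor built from numbers
f <- factor(c(5, 50, 5, 500))

as.integer(f)       # 1 2 1 3          (the codes)
attr(f, "levels")   # "5" "50" "500"   (the levels attribute)

# as.numeric() sees only the codes, so it returns 1 2 1 3, not the data
as.numeric(f)

# The original values need both pieces: the levels looked up by code
as.numeric(attr(f, "levels"))[as.integer(f)]   # 5 50 5 500

# Same values, levels in a different order: the codes change
g <- factor(c(5, 50, 5, 500), levels = c("500", "50", "5"))
as.integer(g)       # 3 2 3 1
```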

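What this says is that if you have a factor whose values were originally numbers, you should not convert it to numeric directly. For example: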

nums <- c(1, 2, 3, 10)
new_fact <- as.factor(nums)

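If we now try to get the numbers back from new_fact by running as.numeric(new_fact), we get 1 2 3 4 (wrong!), because that returns the internal codes rather than the original values. That is why the documentation says that to recover the original numbers you should use as.numeric(as.character(new_fact)) or as.numeric(levels(new_fact))[new_fact]; both return 1 2 3 10. I hope this helps.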
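The example above as a runnable snippet (the names `wrong`, `via_chars` and `via_levels` are just for illustration):

```r
nums <- c(1, 2, 3, 10)
new_fact <- as.factor(nums)   # levels "1" "2" "3" "10", codes 1 2 3 4

wrong      <- as.numeric(new_fact)                 # the codes: 1 2 3 4
via_chars  <- as.numeric(as.character(new_fact))   # 1 2 3 10
via_levels <- as.numeric(levels(new_fact))[new_fact]  # 1 2 3 10

print(wrong)
print(via_chars)
print(via_levels)
```

Only the last two give back `nums`; the first just reflects the position of each value among the sorted levels.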
