Using LETTERS[1:3] to create dummy data, recycled as a new feature on a df of a different length

I am finding this surprisingly challenging.

Example:

library(tidyverse)
diamonds %>% mutate(DummyCategory = LETTERS[1:3])
Error: Problem with `mutate()` input `DummyCategory`.
x Input `DummyCategory` can't be recycled to size 53940.
ℹ Input `DummyCategory` is `LETTERS[1:3]`.
ℹ Input `DummyCategory` must be size 53940 or 1, not 3.

I also tried base R:

my_diamonds <- diamonds
my_diamonds$DummyCategory <- LETTERS[1:3]
Error: Assigned data `LETTERS[1:3]` must be compatible with existing data.
x Existing data has 53940 rows.
x Assigned data has 3 rows.
ℹ Only vectors of size 1 are recycled.

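I want a new column, DummyCategory, with the values A, B, C recycled to the length of the data frame. It does not matter if the number of rows is not divisible by 3, so that A, B and C end up with different frequencies; I just want the values recycled so that every row in diamonds has a DummyCategory. How can I do this?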

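As stated in the documentation of mutate(), the value can be:

A vector of length 1, which will be recycled to the correct length.

A vector the same length as the current group (or the whole data frame if ungrouped).

NULL, to remove the column.

A data frame or tibble, to create multiple columns in the output.

So it will not recycle a length-3 vector to the length of the df. However, you can do the recycling yourself: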

diamonds %>%
  mutate(DummyCategory = rep(LETTERS[1:3], length.out = n()))
carat cut       color clarity depth table price     x     y     z DummyCategory
<dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>        
1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43 A            
2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31 B            
3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31 C            
4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63 A            
5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75 B            
6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48 C            
7 0.24  Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47 A            
8 0.26  Very Good H     SI1      61.9    55   337  4.07  4.11  2.53 B            
9 0.22  Fair      E     VS2      65.1    61   337  3.87  3.78  2.49 C            
10 0.23  Very Good H     VS1      59.4    61   338  4     4.05  2.39 A
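
The `length.out` argument is what makes this work even when the row count is not a multiple of 3. As a minimal sketch of that case, the example below uses base R only and the built-in `mtcars` data (32 rows) instead of `diamonds`; the names `my_cars` and `category_counts` are just illustrative.

```r
# mtcars ships with base R and has 32 rows, which is not a multiple of 3
my_cars <- mtcars

# rep() cycles through A, B, C and stops once nrow(my_cars) values exist
my_cars$DummyCategory <- rep(LETTERS[1:3], length.out = nrow(my_cars))

# Every row gets a value; the categories simply occur with unequal counts
category_counts <- table(my_cars$DummyCategory)
print(category_counts)
```

Here A and B each appear 11 times and C appears 10 times, since 32 = 3 * 10 + 2. Inside `mutate()`, `n()` plays the role that `nrow(my_cars)` plays here.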

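One way to achieve this, which works here because 53940 is an exact multiple of 3, is:

# Repeat A, B, C once per block of three rows (53940 / 3 = 17980 times)
diamonds$DummyCategory <- rep(LETTERS[1:3], dim(diamonds)[1] / length(LETTERS[1:3]))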

Output:

# A tibble: 53,940 x 11
carat cut       color clarity depth table price     x     y     z DummyCategory
<dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>        
1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43 A            
2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31 B            
3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31 C            
4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63 A            
5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75 B            
6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48 C            
7 0.24  Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47 A            
8 0.26  Very Good H     SI1      61.9    55   337  4.07  4.11  2.53 B            
9 0.22  Fair      E     VS2      65.1    61   337  3.87  3.78  2.49 C            
10 0.23  Very Good H     VS1      59.4    61   338  4     4.05  2.39 A            
# ... with 53,930 more rows
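
When the repeat count is computed by division, the result only fills every row if the division is exact, because `rep()` truncates a non-integer `times` towards zero. The following sketch shows the difference on a hypothetical row count of 10 (`n_rows`, `short` and `full` are illustrative names), with `rep_len()` as a base R alternative that targets the length directly.

```r
n_rows <- 10L  # hypothetical row count that is not a multiple of 3

# times = 10 / 3 = 3.33..., which rep() truncates to 3 repetitions
short <- rep(LETTERS[1:3], n_rows / length(LETTERS[1:3]))
print(length(short))

# rep_len() recycles until exactly n_rows values have been produced
full <- rep_len(LETTERS[1:3], n_rows)
print(length(full))
```

The first vector has only 9 elements and could not be assigned as a column of a 10-row data frame, while the second has the full 10.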

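Another option is to use data.frame() with the data and the vector (data.frame() recycles a shorter vector only when it fits a whole number of times, as it does here):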

diamonds <- data.frame(diamonds,DummyCategory=LETTERS[1:3])

Output:

carat       cut color clarity depth table price    x    y    z DummyCategory
1   0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43             A
2   0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31             B
3   0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31             C
4   0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63             A
5   0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75             B
6   0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48             C
7   0.24 Very Good     I    VVS1  62.3    57   336 3.95 3.98 2.47             A
8   0.26 Very Good     H     SI1  61.9    55   337 4.07 4.11 2.53             B
9   0.22      Fair     E     VS2  65.1    61   337 3.87 3.78 2.49             C
10  0.23 Very Good     H     VS1  59.4    61   338 4.00 4.05 2.39             A
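
To illustrate the limits of this approach, here is a small sketch on made-up data frames (the column `id` and the names `df_ok`, `res` and `df_safe` are hypothetical): `data.frame()` recycles the vector when the row count is a multiple of its length and raises an error otherwise.

```r
# 6 rows: a multiple of 3, so the vector is recycled twice
df_ok <- data.frame(id = 1:6, DummyCategory = LETTERS[1:3])

# 7 rows: not a multiple of 3, so data.frame() refuses to recycle
res <- tryCatch(
  data.frame(id = 1:7, DummyCategory = LETTERS[1:3]),
  error = function(e) conditionMessage(e)
)
print(res)

# Recycling explicitly to the target length works for any row count
df_safe <- data.frame(id = 1:7, DummyCategory = rep_len(LETTERS[1:3], 7))
```

Note also that `data.frame()` returns a plain data frame, so a tibble such as `diamonds` loses its tibble class, which is why the printed output above looks different from the earlier ones.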
