I have a table with two columns whose values lie between 0 and 1. For example:
library(data.table)
set.seed(123)
table <- data.table(value1 = runif(10),
                    value2 = runif(10))
table
value1 value2
0.2875775 0.95683335
0.7883051 0.45333416
0.4089769 0.67757064
0.8830174 0.57263340
0.9404673 0.10292468
0.0455565 0.89982497
0.5281055 0.24608773
0.8924190 0.04205953
0.5514350 0.32792072
0.4566147 0.95450365
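I would like to use data.table to create a new binary column that assigns 1 to the x rows with the largest difference between value2 and value1. I can create a "difference" column like this: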
table[,difference:=value1-value2]
I can use order and tail to find the x largest differences, for example, if x is 5:
x<-5
table[order(difference), tail(.SD, x)]
But I have not been able to figure out how to combine this with something like ifelse or case_when to assign 1 to the x largest differences and 0 to the rest.
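# Sort by difference, largest first (this reorders the table in place)
setorderv(table, "difference", order = -1)
# Default every row to 0, then flag the first x rows
table[, large := 0]
x <- 5
table[seq_len(x), large := 1]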
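If the original row order needs to be preserved, a variation on the same idea is to rank the differences instead of sorting the table. The following is only a minimal sketch: it assumes ties can be broken by first occurrence, and it uses the name dt instead of table purely for illustration.

```r
library(data.table)

set.seed(123)
dt <- data.table(value1 = runif(10),
                 value2 = runif(10))
x <- 5

# Same difference column as in the question
dt[, difference := value1 - value2]

# frank() ranks in ascending order, so negating the column gives rank 1
# to the largest difference; rows ranked 1..x are flagged with 1, the rest with 0
dt[, large := as.integer(frank(-difference, ties.method = "first") <= x)]
dt
```

Because nothing is sorted, the flag lands on the rows where they already are, which can be convenient when the table is keyed or joined on later.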
I hope this solves your problem:
library(data.table)
library(dtplyr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:data.table':
#>
#> between, first, last
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
set.seed(123)
table <- data.table(value1 = runif(10),
                    value2 = runif(10))
table
#> value1 value2
#> 1: 0.2875775 0.95683335
#> 2: 0.7883051 0.45333416
#> 3: 0.4089769 0.67757064
#> 4: 0.8830174 0.57263340
#> 5: 0.9404673 0.10292468
#> 6: 0.0455565 0.89982497
#> 7: 0.5281055 0.24608773
#> 8: 0.8924190 0.04205953
#> 9: 0.5514350 0.32792072
#> 10: 0.4566147 0.95450365
x <- 5
table <- table %>%
  lazy_dt() %>%
  mutate(difference = value1 - value2) %>%
  arrange(desc(difference)) %>%
  mutate(difference = ifelse(test = row_number() <= x, yes = 1, no = 0)) %>%
  as.data.table()
table
#> value1 value2 difference
#> 1: 0.8924190 0.04205953 1
#> 2: 0.9404673 0.10292468 1
#> 3: 0.7883051 0.45333416 1
#> 4: 0.8830174 0.57263340 1
#> 5: 0.5281055 0.24608773 1
#> 6: 0.5514350 0.32792072 0
#> 7: 0.4089769 0.67757064 0
#> 8: 0.4566147 0.95450365 0
#> 9: 0.2875775 0.95683335 0
#> 10: 0.0455565 0.89982497 0
Regards, M.
Created on 2021-10-11 by the reprex package (v2.0.1)