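I have a particular use case where I frequently need to set the value of a single row in a keyed data.table. I am currently using the := notation, but the help page says that in some cases set() can be faster.
Does that hold for keyed data.tables? Or is there a way to use set() with a keyed data.table? I don't think I fully understand what is going on here.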
library(data.table)
#> Warning: package 'data.table' was built under R version 4.0.2
mt <- as.data.table(mtcars, keep.rownames = TRUE)
setkey(mt, rn)
head(mt)
#> rn mpg cyl disp hp drat wt qsec vs am gear carb
#> 1: AMC Javelin 15.2 8 304 150 3.15 3.435 17.30 0 0 3 2
#> 2: Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
#> 3: Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
#> 4: Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
#> 5: Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> 6: Dodge Challenger 15.5 8 318 150 2.76 3.520 16.87 0 0 3 2
mt["AMC Javelin", mpg := -10] # want to do this, but faster?
head(mt)
#> rn mpg cyl disp hp drat wt qsec vs am gear carb
#> 1: AMC Javelin -10.0 8 304 150 3.15 3.435 17.30 0 0 3 2
#> 2: Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
#> 3: Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
#> 4: Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
#> 5: Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> 6: Dodge Challenger 15.5 8 318 150 2.76 3.520 16.87 0 0 3 2
set(mt, "AMC Javelin", 2L, -10) # this doesn't work
#> Error in set(mt, "AMC Javelin", 2L, -10): i is type 'character'. Must be integer, or numeric is coerced with warning. If i is a logical subset, simply wrap with which(), and take the which() outside the loop if possible for efficiency.
set(mt, 1L, 2L, -10) # this would work if I could get the row number of a given key...
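Created on 2021-08-06 by the reprex package (v0.3.0)
Update: The answers and comments from Ronak Shah and sindri_baldur were helpful for the question as I asked it (see the benchmarks below). Unfortunately, I think my simple example does not match my actual use case. In my case there are multiple key columns, so match and chmatch do not work. Is there a solution that works for a data.table with multiple key columns?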
library(data.table)
#> Warning: package 'data.table' was built under R version 4.0.2
library(microbenchmark)
# Original question
mt <- as.data.table(mtcars, keep.rownames = TRUE)
setkey(mt, rn)
key <- "AMC Javelin"
microbenchmark(
mt[key, mpg := -10],
set(mt, 1L, 2L, -10),
set(mt, match(key, mt$rn), 2L, -10),
set(mt, chmatch(key, mt$rn), 2L, -10)
)
#> Unit: microseconds
#> expr min lq mean median
#> mt[key, `:=`(mpg, -10)] 490.129 568.7480 746.67525 619.0085
#> set(mt, 1L, 2L, -10) 1.597 1.8980 4.17609 2.8475
#> set(mt, match(key, mt$rn), 2L, -10) 3.104 3.7130 6.60660 4.9275
#> set(mt, chmatch(key, mt$rn), 2L, -10) 2.740 3.3025 5.27118 4.3200
#> uq max neval cld
#> 701.094 8996.071 100 b
#> 4.298 87.451 100 a
#> 7.726 45.807 100 a
#> 7.002 11.811 100 a
My situation is closer to this one, where there are multiple key columns...
dt <- CJ(a = 1:10, b = 1:10, c = 1:60)
setkey(dt)
dt$d <- NA
key <- list(a = 2, b = 7, c = 35)
microbenchmark(
{ dt[key, d := 1] },
{ set(dt, 1L, 4L, 1)}
)
#> Unit: microseconds
#> expr min lq mean median uq
#> { dt[key, `:=`(d, 1)] } 634.125 666.5825 768.59937 756.9030 819.7585
#> { set(dt, 1L, 4L, 1) } 2.019 2.5355 3.95986 3.9325 4.6590
#> max neval cld
#> 1171.794 100 b
#> 22.945 100 a
match(key, dt[, .(a, b, c)]) # doesn't work
#> [1] NA NA NA
chmatch(key, dt[, .(a, b, c)]) # doesn't work
#> Error in chmatch(key, dt[, .(a, b, c)]): table is type 'list' (must be 'character' or NULL)
Created on 2021-08-06 by the reprex package (v0.3.0)
You can use match to get the row number for a given key value:
library(data.table)
set(mt, match("AMC Javelin", mt$rn), 2L, -10)
head(mt)
# rn mpg cyl disp hp drat wt qsec vs am gear carb
#1: AMC Javelin -10.0 8 304 150 3.15 3.435 17.30 0 0 3 2
#2: Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
#3: Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
#4: Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
#5: Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#6: Dodge Challenger 15.5 8 318 150 2.76 3.520 16.87 0 0 3 2
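For the multi-column key in the update, match and chmatch have no single vector to search, but a keyed join can return row numbers instead of rows via the which = TRUE argument of `[.data.table`. Below is a minimal sketch under some assumptions: the table mirrors the CJ() example from the question, the name lookup is made up for illustration, and d is created as a numeric column so that assigning 1 needs no type coercion. I have not benchmarked it. The lookup still goes through `[.data.table`, so it is likely to help only when the row numbers can be found once and then reused across many set() calls.

```r
library(data.table)

# Table keyed on a, b, c (CJ() sets the key by default because sorted = TRUE).
dt <- CJ(a = 1:10, b = 1:10, c = 1:60)
dt[, d := NA_real_]   # numeric column, added by reference; the key is preserved

# Hypothetical lookup values, one entry per key column, in key order.
lookup <- list(a = 2L, b = 7L, c = 35L)

# Keyed join that returns the matching row number(s) rather than the rows.
# A key combination with no match comes back as NA, so check before assigning.
row <- dt[lookup, which = TRUE]
stopifnot(!anyNA(row))

# Update by reference using the integer row index and the column name.
set(dt, i = row, j = "d", value = 1)

dt[row]
```

If the same rows are updated repeatedly, storing row outside the loop keeps each iteration down to a plain set() call.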