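Because my input is often in German but I want my code to be purely English, I would like a short custom dictionary, basically consisting of the abbreviations for weekdays and months. So I want to create a quick German-English (and vice versa) dictionary, ideally as an environment with parent environment = .GlobalEnv. However, when I put the code inside a function, the dict_g2e dictionary no longer exists afterwards.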
set_dict <- function() {  # Delete this line and ...
  dict_g2e <- new.env(hash = TRUE, size = 7)
  from <- c("So", "Mo", "Di", "Mi", "Do", "Fr", "Sa")
  to <- c("Sun", "Mon", "Tues", "Wed", "Thurs", "Fri", "Sat")
  for (i in seq_along(from)) {
    assign(x = from[i], value = to[i], envir = dict_g2e)
  }
}  # ... this line, and the code works as expected
Test:
> get("So", env = dict_g2e) # ran without the set_dict <- function() {...} part
[1] "Sun"
- Where is the error?
- I would do the same for dict_e2g. Is there a faster or shorter way to do this?
- Is there a better command than get("So", env = dict_g2e)? Are there any arguments against g2e <- function(wd) {get(wd, envir = dict_g2e)}?
Edit after the comments from @Roland and @Alexis_Laz:
df_dict <- function() {
  df <- data.frame(german = c("So", "Mo", "Di", "Mi", "Do", "Fr", "Sa"),
                   english = c("Sun", "Mon", "Tues", "Wed", "Thurs", "Fri", "Sat"),
                   stringsAsFactors = FALSE)
  return(df)
}
df <- df_dict()
# Look up the English abbreviation for a German one in the data frame
df_g2e <- function(wd) {
  df$english[which(df$german == wd)]
}
Microbenchmark:
print(summary(microbenchmark::microbenchmark(
  g2e("So"),
  df_g2e("So"),
  times = 1000L, unit = "us")))
And the results:
          expr    min     lq      mean median     uq    max neval
     g2e("So")  1.520  2.280  2.434178  2.281  2.661 17.106  1000
  df_g2e("So") 12.545 15.205 16.368450 15.966 16.726 55.500  1000
You can use a closure:
dict <- function() {
  # The environment survives because the returned function refers to it
  dict_g2e <- new.env(hash = TRUE, size = 7)
  from <- c("So", "Mo", "Di", "Mi", "Do", "Fr", "Sa")
  to <- c("Sun", "Mon", "Tues", "Wed", "Thurs", "Fri", "Sat")
  for (i in seq_along(from)) {
    assign(x = from[i], value = to[i], envir = dict_g2e)
  }
  function(from) {
    dict_g2e[[from]]
  }
}
wdays1 <- dict()
wdays1("So")
#[1] "Sun"
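As for where the error is: in the original set_dict(), dict_g2e is a local variable of the function, so nothing refers to it once the call returns. The closure above keeps it alive through the returned function. If you would rather have the environment itself, a minimal sketch is to return it and assign the result. The name make_dict and its two arguments are illustrative and not part of the question:

```r
# Build the dictionary inside a function and return the environment,
# so that the caller holds a reference after the function exits.
# make_dict is a hypothetical helper name used for illustration.
make_dict <- function(from, to) {
  stopifnot(length(from) == length(to))
  dict <- new.env(hash = TRUE, parent = globalenv(), size = length(from))
  for (i in seq_along(from)) {
    assign(x = from[i], value = to[i], envir = dict)
  }
  dict
}

dict_g2e <- make_dict(
  from = c("So", "Mo", "Di", "Mi", "Do", "Fr", "Sa"),
  to   = c("Sun", "Mon", "Tues", "Wed", "Thurs", "Fri", "Sat")
)

get("So", envir = dict_g2e)
```

Because the parent is the global environment, get() falls back to global variables for unknown keys. Passing inherits = FALSE restricts the lookup to the dictionary itself.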
However, vector subsetting is faster:
wdays2 <- setNames(c("Sun", "Mon", "Tues", "Wed", "Thurs", "Fri", "Sat"),
c("So", "Mo", "Di", "Mi", "Do", "Fr", "Sa"))
And an environment defined in the global environment is faster still:
wdays3 <- list2env(as.list(wdays2), hash = TRUE)
library(microbenchmark)
microbenchmark(for (i in seq_len(1e3)) wdays1("Mi"),
for (i in seq_len(1e3)) wdays2[["Mi"]],
for (i in seq_len(1e3)) wdays3[["Mi"]])
#Unit: microseconds
#                                    expr     min      lq     mean   median       uq      max neval cld
#   for (i in seq_len(1000)) wdays1("Mi") 434.045 488.205 520.6626 507.0265 516.2455 2397.108   100   c
# for (i in seq_len(1000)) wdays2[["Mi"]] 182.324 211.005 214.6720 215.9985 217.9190  239.173   100  b
# for (i in seq_len(1000)) wdays3[["Mi"]] 141.609 164.143 167.1088 168.2410 169.7770  190.007   100 a
However, the vector approach has a clear advantage: it is vectorized.
wdays2[c("So", "Do")]
# So Do
# "Sun" "Thurs"
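For comparison, [[ on an environment accepts only a single key. A sketch of a multi-key lookup against an environment, using base R's mget(), might look like this. The names wdays_vec, wdays_env and keys are defined here only to keep the example self-contained:

```r
wdays_vec <- setNames(c("Sun", "Mon", "Tues", "Wed", "Thurs", "Fri", "Sat"),
                      c("So", "Mo", "Di", "Mi", "Do", "Fr", "Sa"))
wdays_env <- list2env(as.list(wdays_vec), hash = TRUE)

keys <- c("So", "Do")

# Named vector: a single subsetting call handles all keys.
from_vec <- wdays_vec[keys]

# Environment: mget() returns a named list, which unlist() flattens
# into a named character vector.
from_env <- unlist(mget(keys, envir = wdays_env))

from_vec
from_env
```

Both give the same translations, but the environment route needs the extra mget()/unlist() step.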
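If you want to translate in both directions, a data.frame would be the natural approach, but data frame subsetting is rather slow. You can use two named vectors instead, one for each direction.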
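A minimal sketch of the two-vector idea, assuming the reverse dictionary can be derived by swapping names and values. The names g2e and e2g here are plain vectors chosen for illustration, not the function from the question:

```r
# German -> English: values are English, names are German.
g2e <- setNames(c("Sun", "Mon", "Tues", "Wed", "Thurs", "Fri", "Sat"),
                c("So", "Mo", "Di", "Mi", "Do", "Fr", "Sa"))

# English -> German: swap names and values of the first vector.
e2g <- setNames(names(g2e), unname(g2e))

g2e[c("So", "Do")]
e2g[c("Sun", "Thurs")]

# An unknown key yields NA rather than an error.
g2e["Xx"]
```

Deriving e2g from g2e keeps the two directions in sync, which only works if the translations are unique in both directions.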