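I'm trying to create a square matrix of preferences or counts (which of the two doesn't really matter).
Suppose I have the following data.table, which can be created with: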
library(data.table)
segment=c("track","track","track","round","round","sprint","sprint","sprint","sprint")
athlete=c("gunnar","brandon","raphael","gunnar","ben","brandon","raphael","ben","gunnar")
time=c(54,56,57,23,25,15,16,16,17)
df <- data.table(athlete,segment,time)
df[,time_diff:=min(time)-time,by=segment]
df[,winner:=athlete[1],by=segment]
    athlete segment time time_diff  winner
 1:  gunnar   track   54         0  gunnar
 2: brandon   track   56        -2  gunnar
 3: raphael   track   57        -3  gunnar
 4: raphael   round   23         0 raphael
 5:     ben   round   25        -2 raphael
 6: brandon   round   28        -5 raphael
 7: brandon  sprint   15         0 brandon
 8: raphael  sprint   16        -1 brandon
 9:     ben  sprint   19        -4 brandon
10:  gunnar  sprint   26       -11 brandon
names <- unique(df$athlete)
[1] "gunnar" "brandon" "raphael" "ben"
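The snippet above has nine rows, while the printed table has ten with different times. As a minimal sketch, the following builds input that matches the printed table, with the values transcribed from it. Picking the winner with `which.min(time)` instead of `athlete[1]` is a swapped-in choice, so the result does not depend on row order.

```r
library(data.table)

# Values transcribed from the printed table: one row per athlete per segment.
df <- data.table(
  athlete = c("gunnar", "brandon", "raphael",
              "raphael", "ben", "brandon",
              "brandon", "raphael", "ben", "gunnar"),
  segment = c(rep("track", 3), rep("round", 3), rep("sprint", 4)),
  time    = c(54, 56, 57, 23, 25, 28, 15, 16, 19, 26)
)

# Gap to the fastest time in each segment (0 for the winner, negative otherwise).
df[, time_diff := min(time) - time, by = segment]

# Winner chosen by fastest time rather than by row position (an assumption
# about intent; athlete[1] gives the same result only if rows are pre-sorted).
df[, winner := athlete[which.min(time)], by = segment]

print(df)
```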
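Now I'd like a square matrix over the athletes, with one row per athlete and one column per segment winner, showing each athlete's time difference to that winner. Something like this: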
        gunnar brandon raphael ben
gunnar       0     -11       0   0
brandon     -2       0      -5   0
raphael     -3      -1       0   0
ben          0      -4      -2   0
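I have a few ideas in my head for how to solve this, but none of my attempts have worked. I come from a Matlab background, where I would simply iterate, but I feel that's not the data.table way.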
I feel I should be able to do this with a foreach loop over the athletes. Something along the lines of:
library(foreach)
foreach(n=1:length(names)) %do% df[athlete==names[n],.(time_diff, winner),by=segment][,.(pref=sum(time_diff)),by=winner]
[[1]]
winner pref
1: gunnar 0
2: brandon -11
[[2]]
winner pref
1: gunnar -2
2: raphael -5
3: brandon 0
[[3]]
winner pref
1: gunnar -3
2: raphael 0
3: brandon -1
[[4]]
winner pref
1: raphael -2
2: brandon -4
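At this point, however, I'm not sure how to proceed. My initial idea was to create a vector of the appropriate length, vec <- vector(mode="double", length=length(names)), and then index into it with which(names %in% df[,winner,by=IREALLYDONTKNOW]), but as you can see, it's not clear to me how to do this properly.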
If anyone could give me some hints on the proper data.table approach, I'd be very grateful.
Running your code does not produce the printed table, but I think what you're looking for is dcast.data.table:
dt_compare <- dcast.data.table(df, athlete ~ winner, value.var = "time_diff")
# add zero columns for athletes that did not win
dt_compare[, setdiff(unique(athlete), names(dt_compare)) := 0]
# you can also reorder columns
setcolorder(dt_compare, c("athlete", dt_compare[["athlete"]]))
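A plain cast leaves `NA` for athlete/winner pairs that never occur, and it falls back to counting when a pair occurs more than once. The sketch below is one way to handle both, assuming the input matches the printed table in the question and that repeated pairs should be summed (as the question's `sum(time_diff)` suggests). It uses the `dcast` generic; the names `wide`, `athletes` and `pref_mat` are illustrative only.

```r
library(data.table)

# Input transcribed from the printed table in the question.
df <- data.table(
  athlete = c("gunnar", "brandon", "raphael",
              "raphael", "ben", "brandon",
              "brandon", "raphael", "ben", "gunnar"),
  segment = c(rep("track", 3), rep("round", 3), rep("sprint", 4)),
  time    = c(54, 56, 57, 23, 25, 28, 15, 16, 19, 26)
)
df[, time_diff := min(time) - time, by = segment]
df[, winner := athlete[which.min(time)], by = segment]

athletes <- unique(df$athlete)

# sum collapses repeated athlete/winner pairs; fill = 0 covers unseen pairs.
wide <- dcast(df, athlete ~ winner, value.var = "time_diff",
              fun.aggregate = sum, fill = 0)

# Add all-zero columns for athletes who never won a segment.
missing_cols <- setdiff(athletes, names(wide))
if (length(missing_cols) > 0) wide[, (missing_cols) := 0]

# Convert to a square numeric matrix with athletes in their original order.
pref_mat <- as.matrix(wide[, athletes, with = FALSE])
rownames(pref_mat) <- wide$athlete
pref_mat <- pref_mat[athletes, athletes]

print(pref_mat)
```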
The way I solved it turned out to be quite easy once it clicked:
names <- unique(df$athlete)
vec <- matrix(data = 0,nrow=length(names),ncol=length(names),dimnames=list(names,names))
pref <- foreach(n=1:length(names)) %do% df[athlete==names[n],.(time_diff, winner),by=segment][,.(pref=sum(time_diff)),by=winner]
foreach(n=1:length(names)) %do% (vec[names[n],pref[[n]]$winner] <- pref[[n]]$pref)
> vec
gunnar brandon raphael ben
gunnar 0 -11 0 0
brandon -2 0 -5 0
raphael -3 -1 0 0
ben 0 -4 -2 0
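The same matrix can probably be filled without a loop. As a rough sketch, again assuming input that matches the printed table in the question: aggregate once by athlete and winner, then assign into the zero matrix using a two-column character matrix of (row name, column name) pairs, which base R matches against the dimnames. The names `agg`, `athletes` and `pref_mat` are made up for this example.

```r
library(data.table)

# Input transcribed from the printed table in the question.
df <- data.table(
  athlete = c("gunnar", "brandon", "raphael",
              "raphael", "ben", "brandon",
              "brandon", "raphael", "ben", "gunnar"),
  segment = c(rep("track", 3), rep("round", 3), rep("sprint", 4)),
  time    = c(54, 56, 57, 23, 25, 28, 15, 16, 19, 26)
)
df[, time_diff := min(time) - time, by = segment]
df[, winner := athlete[which.min(time)], by = segment]

athletes <- unique(df$athlete)

# One row per (athlete, winner) pair with the summed time difference.
agg <- df[, .(pref = sum(time_diff)), by = .(athlete, winner)]

# Zero matrix with athletes as both row and column names.
pref_mat <- matrix(0, nrow = length(athletes), ncol = length(athletes),
                   dimnames = list(athletes, athletes))

# Each row of the index matrix names one cell: (athlete, winner).
pref_mat[cbind(agg$athlete, agg$winner)] <- agg$pref

print(pref_mat)
```

Cells for pairs that never occur keep their initial 0, so no separate fill step is needed.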