Suppose I have some (graph edge) data that can be generated like this:
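data.gen = function(g,n){
  m = (length(g)-1)*n  # rows per group: n edges to each of the other groups
  gm = rep(g,each=m)
  x = data.frame(g=gm,v1=paste0(gm,1,seq(m)),v2=paste0(gm,2,seq(m)))
  return(x)  # return visibly so that a bare call prints the data frame
}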
For example, data.gen(c('a','b','c'),1)
produces:
g v1 v2
a a11 a21
a a12 a22
b b11 b21
b b12 b22
c c11 c21
c c12 c22
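I want to reassign (reorder) the data in column v2 so that all values from each group g end up in rows belonging to a different group. For context, I am rewiring a random number of edges between disjoint subgraphs (independent "islands" of interconnected nodes). In the example above, one solution is: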
g v1 v2
a a11 b21
a a12 c21
b b11 a21
b b12 c22
c c11 a22
c c12 b22
Here n is the number of edges connecting each pair of subgraphs. How can I code this for arbitrary g and n?
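To make the requirement concrete, here is a minimal sketch of a check that any candidate rewiring should pass. The helper name check.rewire is hypothetical, and it assumes that the original group of each v2 value can be read from its first character, which holds for the generated data above only when group names are single characters.

```r
# Hypothetical helper: TRUE if no v2 value stays in its own group and every
# ordered pair of distinct groups exchanges exactly n values.
# Assumption: the first character of each v2 value names its original group.
check.rewire = function(x, n){
  g.row = as.character(x$g)                  # group of the row holding the value
  g.src = substr(as.character(x$v2), 1, 1)   # group the value came from
  lev = unique(g.row)
  # counts[i, j] = number of values moved from group i into group j
  counts = table(factor(g.src, levels = lev), factor(g.row, levels = lev))
  off.diag = counts[row(counts) != col(counts)]
  all(diag(counts) == 0) && all(off.diag == n)
}

# the example solution from above, typed in by hand
x.sol = data.frame(
  g  = c('a','a','b','b','c','c'),
  v1 = c('a11','a12','b11','b12','c11','c12'),
  v2 = c('b21','c21','a21','c22','a22','b22'))
print(check.rewire(x.sol, 1))
```

The unmodified output of data.gen fails this check, because every v2 value still sits in its own group.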
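Please confirm: two v2 values from the same original group should not be moved to the same new group? That is only possible if each group contains fewer rows than there are groups.
LOOP over groups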
    IF group contains more rows than (number of groups - 1)
        STOP no solution
ig = -1 // index of group being rewired
LOOP over groups
    ig++
    LOOP over rows in group
        ng = -1 // index of destination group
        LOOP over groups
            ng++
            IF ng == ig OR group ng already received a value from group ig
                CONTINUE
            Move v2 in row to group ng
            BREAK
Thanks for the pseudocode, @ravenpoint. I'm including an R solution here:
data.gen = function(g,n){ # data generating
  m = (length(g)-1)*n
  gm = rep(g,each=m)
  x = data.frame(g=gm,v1=paste0(gm,1,seq(m)),v2=paste0(gm,2,seq(m)))
  return(x)
}
data.rewire = function(x,n){ # solution
  # could technically compute n from nrow(x)
  # assumes the rows of x are sorted by g, as produced by data.gen
  v2.old = lapply(split(x,x$g),function(xg){ as.character(xg$v2) })
  v2.new = lapply(v2.old,function(.){ NULL })
  for (g.new in names(v2.new)){
    for (g.old in names(v2.old)){
      if (g.new != g.old){
        # move the next n unused values of g.old into g.new
        v2.new[[g.new]] = c(v2.new[[g.new]],v2.old[[g.old]][seq(n)])
        v2.old[[g.old]] = v2.old[[g.old]][-seq(n)]
      }
    }
  }
  x$v2 = do.call(c,v2.new)
  return(x)
}
# testing
g = c('a','b','c')
n = 1
x.old = data.gen(g,n)
x.new = data.rewire(x.old,n)
# print(x.old)
print(x.new)
which yields:
g v1 v2
1 a a11 b21
2 a a12 c21
3 b b11 a21
4 b b12 c22
5 c c11 a22
6 c c12 b22
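Note: this uses the edges in the order in which they appear, e.g. the first n elements of group b are moved to group a, and so on. To get a random order, you can simply wrap the as.character call in sample().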
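The randomized variant mentioned in the note could look like the sketch below. The function name data.rewire.random is hypothetical, and the code assumes that the rows are sorted by g and that every group has (number of groups - 1) * n rows. The set.seed call is there only to make the shuffle reproducible.

```r
# Sketch of the randomized variant; data.rewire.random is a hypothetical name.
# Assumes x is sorted by g and each group has (number of groups - 1) * n rows.
data.rewire.random = function(x, n){
  # shuffle each group's v2 values before they are handed out
  v2.old = lapply(split(x, x$g), function(xg){ sample(as.character(xg$v2)) })
  v2.new = lapply(v2.old, function(.){ NULL })
  for (g.new in names(v2.new)){
    for (g.old in names(v2.old)){
      if (g.new != g.old){
        # move the next n shuffled values of g.old into g.new
        v2.new[[g.new]] = c(v2.new[[g.new]], v2.old[[g.old]][seq(n)])
        v2.old[[g.old]] = v2.old[[g.old]][-seq(n)]
      }
    }
  }
  x$v2 = unlist(v2.new, use.names = FALSE)
  return(x)
}

set.seed(1)  # reproducibility only
g = rep(c('a','b','c'), each = 2)
x = data.frame(g = g, v1 = paste0(g, 1, 1:2), v2 = paste0(g, 2, 1:2))
x.rand = data.rewire.random(x, 1)
print(x.rand)
```

Only the choice of which edge from each group goes to which destination group is randomized; the order of the destination rows within a group is unchanged.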