How to conditionally compare data rows one by one and write different results to other columns?
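Please see the `dataset` below. Row 1 has `den` equal to 1, so start comparing the `Weight` of every row with row 1's `Weight`, and the `Volume` of every row with row 1's `Volume`. First check whether a row's `Weight` is higher than row 1's `Weight`; if so, the `higher` column of row 1 becomes 1. Otherwise check whether that row's `Volume` is lower than row 1's `Volume` by 1.0; if so, the `lower` column of row 1 becomes 1.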
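Keep comparing with the next row, and the next, until either of these conditions is met: if neither condition is met at row 2, go to row 3; if neither is met at row 3, go to row 4; and so on, row by row. Once one of the conditions is met (either the `higher` or the `lower` column of row 1 is 1), move on to the next row with `den == 1`, which is row 3 in this case, and then row 6.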
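The `howhigh` column records the difference between row 1's `Weight` and the `Weight` of the row that met the condition, when row 1's `higher == 1`. The `between` column records the row difference to the row where the condition was met (e.g. in `Expected Outcome`, row 1's `between` is 5 because the condition was met at row 6, so `6 - 1 = 5`; row 3's `between` is 3 because the condition was met at row 6, hence `6 - 3 = 3`).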
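The `dataset` will then look like `Expected Outcome`. For example, in row 14 of `Expected Outcome`, `higher == 1` because row 18's `Weight` is higher. `howhigh` is `0.0649` because row 18's `Weight` exceeds row 14's by `0.0649` (5.2240 - 5.1591), and `between` is `4` because `18 - 14 = 4`.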
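The row 14 example can be checked by hand with a few lines of R. This is only a sketch of the rule as described above, using rows 14 to 18 copied from the dataset; the variable names (`heavier`, `vol_drop`, `a`, `b`) are made up for illustration.

```r
# Rows 14 to 18 of the dataset; element 1 is the starting row (den == 1)
Weight <- c(5.1591, 5.1585, 5.1572, 5.1565, 5.2240)
Volume <- c(5.1554, 5.1563, 5.1557, 5.1520, 5.1518)

# Compare each later row (rows 15 to 18) with the starting row
heavier  <- Weight[-1] > Weight[1]            # Weight higher than row 14's
vol_drop <- (Volume[-1] - Volume[1]) <= -1.0  # Volume lower by 1.0 or more

a <- which(heavier)[1]   # offset of the first heavier row, NA if none
b <- which(vol_drop)[1]  # offset of the first Volume drop, NA if none

between <- a                          # 4 rows after row 14, i.e. row 18
howhigh <- Weight[1 + a] - Weight[1]  # difference in Weight, about 0.0649
```

Here `b` is `NA` because no Volume in rows 15 to 18 is lower by 1.0 or more, so only the `higher` condition applies.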
How can I do this in a vectorized way for faster computation? Thanks in advance.
Dataset
Weight Volume den higher lower between howhigh
1 5.1626 5.1594 1 0 0 0 0
2 5.1615 5.1559 0 0 0 0 0
3 5.1600 5.1574 1 0 0 0 0
4 5.1593 5.1582 0 0 0 0 0
5 5.1592 5.1572 0 0 0 0 0
6 5.1635 5.1580 1 0 0 0 0
7 5.1608 5.1580 0 0 0 0 0
8 5.1602 4.0565 0 0 0 0 0
9 5.1582 5.1554 0 0 0 0 0
10 5.1563 5.1547 0 0 0 0 0
11 5.1578 5.1550 1 0 0 0 0
12 5.1589 5.1560 0 0 0 0 0
13 5.1578 3.1553 0 0 0 0 0
14 5.1591 5.1554 1 0 0 0 0
15 5.1585 5.1563 0 0 0 0 0
16 5.1572 5.1557 0 0 0 0 0
17 5.1565 5.1520 0 0 0 0 0
18 5.2240 5.1518 0 0 0 0 0
19 5.1540 5.1505 1 0 0 0 0
20 5.1539 5.1488 0 0 0 0 0
21 5.1520 5.1408 0 0 0 0 0
22 5.1450 5.1420 0 0 0 0 0
23 5.1455 5.1420 0 0 0 0 0
24 5.1461 5.1435 0 0 0 0 0
25 5.1470 5.1437 0 0 0 0 0
26 5.1449 5.1378 0 0 0 0 0
27 5.1423 5.1385 0 0 0 0 0
28 6.1429 5.1401 0 0 0 0 0
29 5.1425 5.1399 0 0 0 0 0
30 5.1433 5.1403 1 0 0 0 0
Expected Outcome
Weight Volume den higher lower between howhigh
1 5.1626 5.1594 1 1 0 5 0.0009
2 5.1615 5.1559 0 0 0 0 0
3 5.1600 5.1574 1 1 0 3 0.0035
4 5.1593 5.1582 0 0 0 0 0
5 5.1592 5.1572 0 0 0 0 0
6 5.1635 5.1580 1 0 1 2 0
7 5.1608 5.1580 0 0 0 0 0
8 5.1602 4.0565 0 0 0 0 0
9 5.1582 5.1554 0 0 0 0 0
10 5.1563 5.1547 0 0 0 0 0
11 5.1578 5.1550 1 0 1 2 0
12 5.1589 5.1560 0 0 0 0 0
13 5.1578 3.1553 0 0 0 0 0
14 5.1591 5.1554 1 1 0 4 0.0649
15 5.1585 5.1563 0 0 0 0 0
16 5.1572 5.1557 0 0 0 0 0
17 5.1565 5.1520 0 0 0 0 0
18 5.2240 5.1518 0 0 0 0 0
19 5.1540 5.1505 1 1 0 9 0.9889
20 5.1539 5.1488 0 0 0 0 0
21 5.1520 5.1408 0 0 0 0 0
22 5.1450 5.1420 0 0 0 0 0
23 5.1455 5.1420 0 0 0 0 0
24 5.1461 5.1435 0 0 0 0 0
25 5.1470 5.1437 0 0 0 0 0
26 5.1449 5.1378 0 0 0 0 0
27 5.1423 5.1385 0 0 0 0 0
28 6.1429 5.1401 0 0 0 0 0
29 5.1425 5.1399 0 0 0 0 0
30 5.1433 5.1403 1 0 0 0 0
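I took a stab at this. Let me know how the speed is, since it is not a 100% vectorized solution. It took me a while to understand that you only want to look at the rows below the current one, and that by a lower Volume you did not mean lower than 1.0, but lower by 1.0 or more.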
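To make the two readings concrete, here is a small check using the Volume values of rows 6 and 8 from the data. The variable names are only for illustration.

```r
ref <- 5.1580  # Volume of row 6 (a den == 1 row)
vol <- 4.0565  # Volume of row 8

# Reading 1: "Volume is lower than 1.0" (not what was meant)
lower_than_one <- vol < 1.0

# Reading 2: "Volume is lower than the reference by 1.0 or more"
lower_by_one <- (vol - ref) <= -1.0
```

Only the second test is `TRUE` for row 8, which is why row 6 ends up with `lower == 1`.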
# Your data
dat <- structure(list(Weight = c(5.1626, 5.1615, 5.16, 5.1593, 5.1592, 5.1635, 5.1608, 5.1602, 5.1582, 5.1563, 5.1578, 5.1589, 5.1578, 5.1591, 5.1585, 5.1572, 5.1565, 5.224, 5.154, 5.1539, 5.152, 5.145, 5.1455, 5.1461, 5.147, 5.1449, 5.1423, 6.1429, 5.1425, 5.1433), Volume = c(5.1594, 5.1559, 5.1574, 5.1582, 5.1572, 5.158, 5.158, 4.0565, 5.1554, 5.1547, 5.155, 5.156, 3.1553, 5.1554, 5.1563, 5.1557, 5.152, 5.1518, 5.1505, 5.1488, 5.1408, 5.142, 5.142, 5.1435, 5.1437, 5.1378, 5.1385, 5.1401, 5.1399, 5.1403), den = c(1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), higher = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), lower = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), between = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), howhigh = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("Weight", "Volume", "den", "higher", "lower", "between", "howhigh"), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30"))
I added a row number to the data.frame for easier access, and then subset to just the rows with `den == 1` to create a new variable to loop over.
dat$rownum <- 1:nrow(dat)
newd <- dat[dat$den == 1,]
# Weight Volume den higher lower between howhigh rownum
#1 5.1626 5.1594 1 0 0 0 0 1
#3 5.1600 5.1574 1 0 0 0 0 3
#6 5.1635 5.1580 1 0 0 0 0 6
#11 5.1578 5.1550 1 0 0 0 0 11
#14 5.1591 5.1554 1 0 0 0 0 14
#19 5.1540 5.1505 1 0 0 0 0 19
#30 5.1433 5.1403 1 0 0 0 0 30
The function:
out <- t(apply(newd, 1, function(d){
    rownum <- d["rownum"]
    # first later row whose Weight is higher than this row's
    a <- which(dat$Weight > d["Weight"])
    a <- a[a > rownum][1]
    # first later row whose Volume is lower by 1.0 or more
    b <- which((dat$Volume - d["Volume"]) <= -1.0)
    b <- b[b > rownum][1]
    # a and b are NA when no later row qualifies; such rows are left unchanged
    if (!is.na(a) && (is.na(b) || a < b)) {
        d["higher"] <- 1
        d["howhigh"] <- dat$Weight[a] - d["Weight"]
        d["between"] <- a - rownum
    } else if (!is.na(b)) {
        d["lower"] <- 1
        d["between"] <- b - rownum
    }
    d
}))
out
# Weight Volume den higher lower between howhigh rownum
#1 5.1626 5.1594 1 1 0 5 0.0009 1
#3 5.1600 5.1574 1 1 0 3 0.0035 3
#6 5.1635 5.1580 1 0 1 2 0.0000 6
#11 5.1578 5.1550 1 1 0 1 0.0011 11
#14 5.1591 5.1554 1 1 0 4 0.0649 14
#19 5.1540 5.1505 1 1 0 9 0.9889 19
#30 5.1433 5.1403 1 0 0 0 0.0000 30
dat[dat$den == 1,] <- out # replace old rows with new ones
dat[,-8] # remove the rownum column
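If the `apply` loop turns out to be too slow, one possible way to vectorize further is to build the comparisons for all `den == 1` rows at once with `outer` and pick the first match per row with `max.col`. The sketch below is an assumption on my part, not something benchmarked: it uses only the first eight rows of the data to stay short, `first_true` is a hypothetical helper, ties are resolved the same way as in the loop above (`a < b`), and it builds matrices with one row per `den == 1` row and one column per data row, so memory can become a problem on large data.

```r
# First eight rows of the dataset
dat <- data.frame(
  Weight = c(5.1626, 5.1615, 5.1600, 5.1593, 5.1592, 5.1635, 5.1608, 5.1602),
  Volume = c(5.1594, 5.1559, 5.1574, 5.1582, 5.1572, 5.1580, 5.1580, 4.0565),
  den    = c(1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L)
)
n   <- nrow(dat)
idx <- which(dat$den == 1)  # starting rows

# later[i, j] is TRUE when row j comes after the i-th starting row
later <- outer(idx, seq_len(n), "<")
# w_hit[i, j]: row j is later and has a higher Weight than starting row i
w_hit <- later & outer(dat$Weight[idx], dat$Weight, "<")
# v_hit[i, j]: row j is later and its Volume is lower by 1.0 or more
v_hit <- later & outer(dat$Volume[idx], dat$Volume,
                       function(x, y) (y - x) <= -1.0)

# Hypothetical helper: column of the first TRUE in each row, NA if none
first_true <- function(m) {
  j <- max.col(m * 1, ties.method = "first")
  j[!m[cbind(seq_len(nrow(m)), j)]] <- NA
  j
}

a <- first_true(w_hit)  # first row with higher Weight
b <- first_true(v_hit)  # first row with the Volume drop
use_a <- !is.na(a) & (is.na(b) | a < b)
use_b <- !is.na(b) & !use_a

dat$higher  <- 0L
dat$lower   <- 0L
dat$between <- 0L
dat$howhigh <- 0
dat$higher[idx[use_a]]  <- 1L
dat$lower[idx[use_b]]   <- 1L
dat$between[idx[use_a]] <- a[use_a] - idx[use_a]
dat$between[idx[use_b]] <- b[use_b] - idx[use_b]
dat$howhigh[idx[use_a]] <- dat$Weight[a[use_a]] - dat$Weight[idx[use_a]]
dat
```

On these eight rows, rows 1 and 3 get `higher == 1` (matched at row 6) and row 6 gets `lower == 1` (matched at row 8), the same as the loop version gives for those rows.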