R - How to conditionally compare data row by row and output different results to other columns



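How can I conditionally compare data row by row and output the different results to other columns?

Please see the dataset below.

Row 1's den is 1, so start comparing each row's Weight with row 1's Weight, and each row's Volume with row 1's Volume.

First check whether some row's Weight is higher than row 1's Weight; if so, row 1's higher column becomes 1. Otherwise, check whether some row's Volume is lower than row 1's Volume by 1.0; if so, row 1's lower column becomes 1.

Keep comparing with the next row, and the next, until either of these conditions is met: if neither condition is met at row 2, move on to row 3; if neither is met at row 3, move on to row 4; and so on, row by row.

Once one of the conditions is met (that is, one of row 1's higher or lower columns == 1), move on to the next row with den == 1, which in this case is row 3, and then row 6.

The howhigh column records the difference between row 1's Weight and the matching row's Weight when row 1's higher == 1. The between column records the row difference to the row where the condition was met (e.g. in the Expected Outcome, row 1's between is 5 because the condition was met at row 6, so 6 - 1 = 5; row 3's between is 3 because the condition was met at row 6, so 6 - 3 = 3).

The dataset will then become something like the Expected Outcome.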

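For example, take row 14 of the Expected Outcome: higher == 1 because row 18's Weight is higher; howhigh is 0.0649 because row 18's Weight minus row 14's Weight is 0.0649; and between is 4 because 18 - 14 = 4.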
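The arithmetic for row 14 can be checked with a minimal sketch. The vectors `weight` and `volume` below simply retype rows 14 to 18 of the dataset, and all variable names are illustrative only:

```r
# Rows 14 to 18 of the dataset, retyped for illustration
weight <- c(`14` = 5.1591, `15` = 5.1585, `16` = 5.1572, `17` = 5.1565, `18` = 5.2240)
volume <- c(`14` = 5.1554, `15` = 5.1563, `16` = 5.1557, `17` = 5.1520, `18` = 5.1518)

later <- 2:length(weight)                             # positions of the rows after row 14

# first later row with a higher Weight (NA if there is none)
hit_w <- later[weight[later] > weight[1]][1]
# first later row whose Volume is lower by 1.0 or more (NA if there is none)
hit_v <- later[volume[later] <= volume[1] - 1.0][1]

between <- hit_w - 1                                  # offset in rows, i.e. 18 - 14
howhigh <- unname(weight[hit_w] - weight[1])          # Weight of row 18 minus Weight of row 14
```

Here only the Weight condition is met (at row 18), so `hit_v` stays `NA` and row 14 gets higher == 1.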

How can I do this with a vectorized approach to increase the computation speed? Thanks in advance.

Dataset

   Weight Volume den higher lower between howhigh
1  5.1626 5.1594   1      0     0       0       0
2  5.1615 5.1559   0      0     0       0       0
3  5.1600 5.1574   1      0     0       0       0
4  5.1593 5.1582   0      0     0       0       0
5  5.1592 5.1572   0      0     0       0       0
6  5.1635 5.1580   1      0     0       0       0
7  5.1608 5.1580   0      0     0       0       0
8  5.1602 4.0565   0      0     0       0       0
9  5.1582 5.1554   0      0     0       0       0
10 5.1563 5.1547   0      0     0       0       0
11 5.1578 5.1550   1      0     0       0       0
12 5.1589 5.1560   0      0     0       0       0
13 5.1578 3.1553   0      0     0       0       0
14 5.1591 5.1554   1      0     0       0       0
15 5.1585 5.1563   0      0     0       0       0
16 5.1572 5.1557   0      0     0       0       0
17 5.1565 5.1520   0      0     0       0       0
18 5.2240 5.1518   0      0     0       0       0
19 5.1540 5.1505   1      0     0       0       0
20 5.1539 5.1488   0      0     0       0       0
21 5.1520 5.1408   0      0     0       0       0
22 5.1450 5.1420   0      0     0       0       0
23 5.1455 5.1420   0      0     0       0       0
24 5.1461 5.1435   0      0     0       0       0
25 5.1470 5.1437   0      0     0       0       0
26 5.1449 5.1378   0      0     0       0       0
27 5.1423 5.1385   0      0     0       0       0
28 6.1429 5.1401   0      0     0       0       0
29 5.1425 5.1399   0      0     0       0       0
30 5.1433 5.1403   1      0     0       0       0

Expected Outcome

   Weight Volume den higher lower between howhigh
1  5.1626 5.1594   1      1     0       5  0.0009
2  5.1615 5.1559   0      0     0       0       0
3  5.1600 5.1574   1      1     0       3  0.0035    
4  5.1593 5.1582   0      0     0       0       0
5  5.1592 5.1572   0      0     0       0       0
6  5.1635 5.1580   1      0     1       2       0
7  5.1608 5.1580   0      0     0       0       0
8  5.1602 4.0565   0      0     0       0       0
9  5.1582 5.1554   0      0     0       0       0
10 5.1563 5.1547   0      0     0       0       0
11 5.1578 5.1550   1      0     1       2       0
12 5.1589 5.1560   0      0     0       0       0
13 5.1578 3.1553   0      0     0       0       0
14 5.1591 5.1554   1      1     0       4  0.0649
15 5.1585 5.1563   0      0     0       0       0
16 5.1572 5.1557   0      0     0       0       0
17 5.1565 5.1520   0      0     0       0       0
18 5.2240 5.1518   0      0     0       0       0
19 5.1540 5.1505   1      1     0       9  0.9889
20 5.1539 5.1488   0      0     0       0       0
21 5.1520 5.1408   0      0     0       0       0
22 5.1450 5.1420   0      0     0       0       0
23 5.1455 5.1420   0      0     0       0       0
24 5.1461 5.1435   0      0     0       0       0
25 5.1470 5.1437   0      0     0       0       0
26 5.1449 5.1378   0      0     0       0       0
27 5.1423 5.1385   0      0     0       0       0
28 6.1429 5.1401   0      0     0       0       0
29 5.1425 5.1399   0      0     0       0       0
30 5.1433 5.1403   1      0     0       0       0

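I took a stab at this. Let me know how the speed is, as it isn't a 100% vectorized solution. It took me a while to understand that you only want to look at the rows below, and that by a lower Volume you didn't mean lower than 1.0, but lower by 1.0 or more.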

# Your data
dat <- structure(list(Weight = c(5.1626, 5.1615, 5.16, 5.1593, 5.1592, 5.1635, 5.1608, 5.1602, 5.1582, 5.1563, 5.1578, 5.1589, 5.1578, 5.1591, 5.1585, 5.1572, 5.1565, 5.224, 5.154, 5.1539, 5.152, 5.145, 5.1455, 5.1461, 5.147, 5.1449, 5.1423, 6.1429, 5.1425, 5.1433), Volume = c(5.1594, 5.1559, 5.1574, 5.1582, 5.1572, 5.158, 5.158, 4.0565, 5.1554, 5.1547, 5.155, 5.156, 3.1553, 5.1554, 5.1563, 5.1557, 5.152, 5.1518, 5.1505, 5.1488, 5.1408, 5.142, 5.142, 5.1435, 5.1437, 5.1378, 5.1385, 5.1401, 5.1399, 5.1403), den = c(1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), higher = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), lower = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), between = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), howhigh = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("Weight", "Volume", "den", "higher", "lower", "between", "howhigh"), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30"))

I added a row number to the data.frame for easier access, then took just the rows with den == 1 to create a new variable to loop over.

dat$rownum <- 1:nrow(dat)
newd <- dat[dat$den == 1,]
#   Weight Volume den higher lower between howhigh rownum
#1  5.1626 5.1594   1      0     0       0       0      1
#3  5.1600 5.1574   1      0     0       0       0      3
#6  5.1635 5.1580   1      0     0       0       0      6
#11 5.1578 5.1550   1      0     0       0       0     11
#14 5.1591 5.1554   1      0     0       0       0     14
#19 5.1540 5.1505   1      0     0       0       0     19
#30 5.1433 5.1403   1      0     0       0       0     30

The function:

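out <- t(apply(newd, 1, function(d){
  rownum <- d["rownum"]
  # a: first later row whose Weight is higher than this row's (NA if none)
  a <- which(dat$Weight > d["Weight"])
  a <- a[a > rownum][1]
  # b: first later row whose Volume is lower by 1.0 or more (NA if none)
  b <- which((dat$Volume - d["Volume"]) <= -1.0)
  b <- b[b > rownum][1]
  # use whichever condition is met first; falls back to "a" when b is NA
  pick <- ifelse(!is.na(b), ifelse(a < b, "a", "b"), "a")
  if( pick == "a"){
    d["higher"] <- 1
    d["howhigh"] <- dat$Weight[a] - d["Weight"]
    d["between"] <- a - rownum
  } else {
    d["lower"] <- 1
    d["between"] <- b - rownum
  }
  d[is.na(d)] <- 0 # with no later match, between/howhigh are NA; reset them to 0
  d
}))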
out
#   Weight Volume den higher lower between howhigh rownum
#1  5.1626 5.1594   1      1     0       5  0.0009      1
#3  5.1600 5.1574   1      1     0       3  0.0035      3
#6  5.1635 5.1580   1      0     1       2  0.0000      6
#11 5.1578 5.1550   1      1     0       1  0.0011     11
#14 5.1591 5.1554   1      1     0       4  0.0649     14
#19 5.1540 5.1505   1      1     0       9  0.9889     19
#30 5.1433 5.1403   1      1     0       0  0.0000     30
dat[dat$den == 1,] <- out # replace old rows with new ones
dat[,-8] # remove the rownum column
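
Two edge cases are worth a look. In the output above, row 30 ends up with higher = 1 although no row follows it. And as far as I can tell, `if (pick == "a")` would stop with an error if `a` were `NA` while `b` is not (no later row with a higher Weight, but a later Volume drop), because `ifelse(a < b, "a", "b")` then returns `NA`. Below is a rough loop-based sketch that guards against both. The helper name `compare_rows`, its `drop` argument, and the `small` data frame (rows 1 to 8 retyped) are my own assumptions for illustration; it is not faster or more vectorized than the version above.

```r
# Hypothetical helper (not part of the code above): a plain loop over the
# den == 1 rows that leaves a row untouched when no later row matches.
compare_rows <- function(dat, drop = 1.0) {
  n <- nrow(dat)
  for (i in which(dat$den == 1)) {
    if (i == n) next                        # last row: nothing below to compare
    later <- (i + 1):n
    # first later row with a higher Weight (NA if none)
    a <- later[dat$Weight[later] > dat$Weight[i]][1]
    # first later row whose Volume is lower by `drop` or more (NA if none)
    b <- later[dat$Volume[later] <= dat$Volume[i] - drop][1]
    if (is.na(a) && is.na(b)) next          # neither condition is ever met
    if (is.na(b) || (!is.na(a) && a < b)) {
      dat$higher[i]  <- 1
      dat$howhigh[i] <- dat$Weight[a] - dat$Weight[i]
      dat$between[i] <- a - i
    } else {                                # a tie (a == b) goes to lower, as ifelse(a < b, ...) does
      dat$lower[i]   <- 1
      dat$between[i] <- b - i
    }
  }
  dat
}

# Rows 1 to 8 of the dataset, retyped so the sketch runs on its own
small <- data.frame(
  Weight = c(5.1626, 5.1615, 5.1600, 5.1593, 5.1592, 5.1635, 5.1608, 5.1602),
  Volume = c(5.1594, 5.1559, 5.1574, 5.1582, 5.1572, 5.1580, 5.1580, 4.0565),
  den    = c(1, 0, 1, 0, 0, 1, 0, 0),
  higher = 0, lower = 0, between = 0, howhigh = 0
)
res <- compare_rows(small)
res[res$den == 1, ]
```

In `small`, row 6 has no later row with a higher Weight, so `a` is `NA` and the row is marked lower via row 8. The same function should also accept the full `dat` from above.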
