Suppose I have data like this:
# Data frame
df <- data.frame(round = factor(c(rep(1,4),rep(2,3),rep(3,4),rep(4,2))),
                 value = c(100,150,200,250,200,160,900,500,600,900,1200,100,120),
                 SE = c(1.3,1.5,0.7,2,1,2,1,1,1,0.5,0.75,20,3))
df
   round value    SE
1      1   100  1.30
2      1   150  1.50
3      1   200  0.70
4      1   250  2.00
5      2   200  1.00
6      2   160  2.00
7      2   900  1.00
8      3   500  1.00
9      3   600  1.00
10     3   900  0.50
11     3  1200  0.75
12     4   100 20.00
13     4   120  3.00
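- Within each round, I want to keep the 2 or more rows whose values differ by less than 20% (for example, in round 1 all rows would be excluded; in round 2 the row with value = 900 would be excluded; in round 3 the rows with value = 900 and 1200 would be excluded).
What I have tried so far: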
library(dplyr)
df %>%
  group_by(round) %>%   # the column is named `round`, not `Round`
  mutate(medians = median(value),
         deviation = abs(value - medians) * 100 / medians) %>%
  mutate(rowcounts = n()) %>%
  mutate(passORfailed = ifelse(
    rowcounts == 2,
    # two rows: compare the spread against the larger value
    ifelse((max(value) - min(value)) * 100 / max(value) > 20, "failed", "pass"),
    # more than two rows: compare each value against the group median
    ifelse(deviation > 20, "failed", "pass"))) %>%
  filter(passORfailed != "failed") %>%
  filter(sum(rowcounts) != 1)
Result:
# A tibble: 8 x 7
# Groups: round [4]
round value SE medians deviation rowcounts passORfailed
<fct> <dbl> <dbl> <dbl> <dbl> <int> <chr>
1 1 150 1.5 175 14.3 4 pass # -> not right
2 1 200 0.7 175 14.3 4 pass # -> not right
3 2 200 1 200 0 3 pass # -> ok
4 2 160 2 200 20 3 pass # -> ok
5 3 600 1 750 20 4 pass # -> not right (500 was excluded)
6 3 900 0.5 750 20 4 pass # -> not right
7 4 100 20 110 9.09 2 pass # -> ok
8 4 120 3 110 9.09 2 pass # -> ok
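As you can see, things go wrong when the row count is even and > 3. The problem is that when I use the median, the reference value used for the criterion is the halfway point (the mean of the two central values), not one of the actual values. Is there any way to adjust the code so that it works in all cases?
- If possible, how can I adjust the code so that this comparison uses the range value +- SE?
Apologies if the question is unclear; I have tried my best to explain it. Regards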
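Here is one approach: we generate every possible pair within a round, then keep only the pairs whose values are within 20% of each other. Its logic differs slightly from yours, so it produces fewer matches, but it may be useful as an alternative if you use a different threshold, such as +/-35% instead of +/-20%.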
df <- df %>% mutate(row = row_number())
df %>%
  left_join(df, by = "round") %>%
  mutate(ratio = value.x / value.y) %>%
  filter(row.x != row.y,
         ratio %>% between(1/1.2, 1.2))
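If the goal is the original rows rather than the list of pairs, the pairs can be collapsed back afterwards. The following is only a sketch under a few assumptions: it measures the difference as a percentage of the larger value (borrowed from the two-row rule in the question, not the ratio test above), it swaps in base `merge()` to build the pairs, and the names `df_rows`, `pairs`, `diff_pct` and `kept` are made up for illustration.

```r
library(dplyr)

df <- data.frame(
  round = factor(c(rep(1, 4), rep(2, 3), rep(3, 4), rep(4, 2))),
  value = c(100, 150, 200, 250, 200, 160, 900, 500, 600, 900, 1200, 100, 120),
  SE    = c(1.3, 1.5, 0.7, 2, 1, 2, 1, 1, 1, 0.5, 0.75, 20, 3)
)
df_rows <- df %>% mutate(row = row_number())

# Build every within-round pair; merge() adds the .x/.y suffixes.
pairs <- merge(df_rows, df_rows, by = "round") %>%
  # Assumption: percent difference relative to the larger of the two values.
  mutate(diff_pct = abs(value.x - value.y) * 100 / pmax(value.x, value.y)) %>%
  filter(row.x != row.y, diff_pct <= 20)

# Keep each original row that has at least one partner within the threshold.
kept <- df_rows %>% filter(row %in% pairs$row.x)
kept
```

On the sample data this keeps 200 and 160 in round 2, 500 and 600 in round 3, and both rows of round 4. It also keeps 200 and 250 in round 1, because that pair sits exactly on 20%; switching to `diff_pct < 20` would drop it, but would drop the 200/160 pair in round 2 as well, so the boundary choice matters here.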
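Here is a variation that addresses the second part of the question. I compute value +/- SE for each row and find the pairs of rows within each round whose ranges overlap.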
df <- df %>%
  mutate(row = row_number()) %>%
  mutate(low = value - SE,
         high = value + SE)
df %>%
  left_join(df, by = "round") %>%
  filter(row.x != row.y,
         (high.x >= low.y & high.x <= high.y) | (low.x >= low.y & low.x <= high.y))
  round value.x SE.x row.x low.x high.x value.y SE.y row.y low.y high.y
1     4     100   20    12    80    120     120    3    13   117    123
2     4     120    3    13   117    123     100   20    12    80    120
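One thing to watch with the endpoint test above: as far as I can tell, when one range fully contains the other, only one direction of the pair passes the filter, because neither endpoint of the wider range lies inside the narrower one. A possible alternative, sketched here rather than offered as the definitive fix, is the usual symmetric check that two closed intervals overlap when each one starts before the other ends. The names `df_se` and `overlaps` are illustrative, and base `merge()` is used to build the pairs.

```r
library(dplyr)

df <- data.frame(
  round = factor(c(rep(1, 4), rep(2, 3), rep(3, 4), rep(4, 2))),
  value = c(100, 150, 200, 250, 200, 160, 900, 500, 600, 900, 1200, 100, 120),
  SE    = c(1.3, 1.5, 0.7, 2, 1, 2, 1, 1, 1, 0.5, 0.75, 20, 3)
)
df_se <- df %>%
  mutate(row  = row_number(),
         low  = value - SE,
         high = value + SE)

# Symmetric overlap test: [low.x, high.x] and [low.y, high.y] intersect
# exactly when each interval begins no later than the other one ends.
overlaps <- merge(df_se, df_se, by = "round") %>%
  filter(row.x != row.y,
         low.x <= high.y,
         low.y <= high.x)

overlaps %>% select(round, row.x, row.y, low.x, high.x, low.y, high.y)
```

For the sample data this returns the same two rows as before (rows 12 and 13 in round 4, in both directions), since no range in the data fully contains another.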