I have the following dataset in R:
data <- structure(list(BatcBatchNo = structure(c(9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L), .Label = c("Batch18200616", "Batch18200702",
"Batch18200703", "Batch18200704", "Batch18200705", "Batch18200708", "Batch18200709",
"Batch18200710", "Batch18200711", "Batch20200712", "Batch20200715", "Batch21200701",
"Batch21200703", "Batch21200704", "Batch21200705", "Batch21200706", "Batch21200708",
"Batch21200709", "Batch22200630", "Batch22200701", "Batch22200702", "Batch22200707",
"Batch23200620", "Batch23200701", "Batch23200702", "Batch23200703", "Batch23200704",
"Batch23200706", "Batch24200717", "Batch25200707", "Batch54200711", "Batch55200705",
"Batch55200706", "Batch55200707", "Batch56200701", "Batch56200702", "Batch56200704",
"Batch56200705", "Batch56200709", "Batch56200710", "Batch57200701", "Batch57200702",
"Batch57200703", "Batch57200704", "Batch57200706", "Batch57200708", "Batch57200709",
"Batch57200710", "Batch57200711", "Batch57200712", "Batch57200714", "Batch57200717",
"Batch58200701", "Batch58200702", "Batch58200703", "Batch58200704", "Batch58200705",
"Batch58200708", "Batch58200710", "Batch58200712", "Batch58200713", "Batch59200622",
"Batch59200701", "Batch59200702", "Batch59200704", "Batch59200705", "Batch59200706",
"Batch59200707", "Batch59200708", "Batch59200709", "Batch60200618", "Batch60200702",
"Batch60200705", "Batch60200708"), class = "factor"), SetValue = c(690,
690, 690, 690, 690, 690, 690, 690, 690, 690), ActualValue = c(705,
706, 706, 705, 705, 704, 704, 704, 705, 705), ONCondition = c(TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)), row.names = c(NA,
10L), class = "data.frame")
> data
     BatcBatchNo SetValue ActualValue ONCondition
1  Batch18200711      690         705        TRUE
2  Batch18200711      690         706        TRUE
3  Batch18200711      690         706        TRUE
4  Batch18200711      690         705        TRUE
5  Batch18200711      690         705        TRUE
6  Batch18200711      690         704        TRUE
7  Batch18200711      690         704        TRUE
8  Batch18200711      690         704        TRUE
9  Batch18200711      690         705        TRUE
10 Batch18200711      690         705        TRUE
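I need to compute the standard deviation for each combination of batch and set value. Before computing the standard deviation, however, I need to remove the outliers within that batch.
That means I need to perform the following steps:
- Remove the outliers in ActualValue within each batch. Outliers must be determined batch by batch, not over the whole dataset.
- Compute the standard deviation for each combination of batch and SetValue.
I tried using dplyr functions to compute the standard deviation, but that does not account for outliers.
This code does not handle outliers: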
Output <- Data %>%
  group_by(BatchNo) %>%
  group_by(SetValue) %>%
  summarize(Mean = mean(ActualValue), SD = sd(ActualValue))
How should I handle this case?
You can use filter() to remove the "outliers", following the logic mentioned in the comments:
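library(dplyr)
# Names follow the question's code; the posted sample uses `data` and `BatcBatchNo`.
Data %>%
  group_by(BatchNo) %>%
  # Keep rows within each batch's 1st to 99th percentile range.
  filter(ActualValue >= quantile(ActualValue, 0.01),
         ActualValue <= quantile(ActualValue, 0.99)) %>%
  group_by(BatchNo, SetValue) %>%
  summarize(Mean = mean(ActualValue), SD = sd(ActualValue))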
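
To see what the per-batch percentile filter actually removes, here is a minimal sketch on made-up data. The data frame `toy`, its values, and the extra `N` column are assumptions for illustration only and are not part of the original dataset. Note that by default a second group_by() call replaces the earlier grouping rather than adding to it, which is why both grouping columns are passed together before summarizing.

```r
library(dplyr)

# Hypothetical toy data: two batches, each with one obviously extreme reading.
toy <- data.frame(
  BatchNo     = rep(c("A", "B"), each = 6),
  SetValue    = rep(c(690, 700), each = 6),
  ActualValue = c(704, 705, 705, 706, 705, 900,   # 900 is extreme for batch A
                  715, 716, 715, 714, 716, 100)   # 100 is extreme for batch B
)

result <- toy %>%
  group_by(BatchNo) %>%
  # Percentiles are computed within each batch because of the grouping above.
  filter(ActualValue >= quantile(ActualValue, 0.01),
         ActualValue <= quantile(ActualValue, 0.99)) %>%
  # Regroup by both columns so the summary is per batch and set value.
  group_by(BatchNo, SetValue) %>%
  summarize(N = n(), Mean = mean(ActualValue), SD = sd(ActualValue),
            .groups = "drop")

print(result)
```

One thing to keep in mind with this approach: with the default quantile() interpolation, a unique minimum or maximum in a batch always falls outside the 1% to 99% band, so it is dropped even when it is not really an outlier (in batch A above, 704 is removed along with 900). Tied extremes, such as the two 716 readings in batch B, are kept. If that trimming is too aggressive for small batches, the cutoffs may need adjusting.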