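I have been struggling with a textbook exercise in which I need to count different events between the consecutive stages of an industrial process.
Information about the process: test subjects go through a process with three stages, A, B and C, in that order. A subject may abandon the process at stage A or B and later restart from stage A. Each time a stage occurs, a record is created containing the subject's identifier, the TIMESTAMP at which the stage occurred, and a unique VISIT_CODE. At any stage the subject may trigger an "ALERT", which is recorded together with its TIMESTAMP, an ALERT_CODE and the subject's identifier.
What to compute: I have to write R code that counts the alerts a subject generates between stages A and B, between stages B and C, and finally after stage C. Note that a subject may abandon the process at some point and later restart from stage A.
The textbook gives a hint: "Look carefully at the stage the test subject is currently in, then determine whether the ALERT was generated from stage A onward and before stage B, but keep in mind the case where a test subject abandons at stage A and triggers an ALERT whose TIMESTAMP is earlier than their next attempt at stage A."
As another hint, the textbook reveals that there is only one ALERT after stage C, and that it was triggered by test subject W-6 with ALERT_CODE AYUJ-3915716168. The datasets are: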
Stage process dataset:
library(tibble)
TableA <- tribble(~STAGE, ~TEST_SUBJECT, ~TIMESTAMP, ~VISIT_CODE,
"A", "XYU-1", "10", "BKO",
"A", "XYU-1", "15", "JUJD",
"B", "XYU-1", "20", "DUDH",
"A", "FF-09", "25", "KSIWJD",
"B", "FF-09", "30", "AJAKAM",
"C", "FF-09", "35", "ZISKS",
"A", "UU-89", "40", "NNXJD",
"B", "UU-89", "45", "DDUWO",
"A", "I-44", "50", "JIWIW",
"A", "W-6", "55", "SHDN",
"B", "W-6", "60", "IWOLS",
"C", "W-6", "65", "JDDD",
"A", "U-90", "70", "DJDKSMS",
"B", "U-90", "75", "NDJSM",
"A", "T-87", "80", "DNDJDK")
Alerts dataset:
TableB <- tribble(~TEST_SUBJECT, ~TIMESTAMP, ~ALERT_CODE,
"XYU-1", "11", "AYUJ-151571406",
"XYU-1", "12", "AYUJ-487008829",
"XYU-1", "28", "AYUJ-211990388",
"FF-09", "32", "AYUJ-4177221842",
"W-6", "56", "AYUJ-1300211351",
"W-6", "63", "AYUJ-3014305494",
"I-44", "67", "AYUJ-4454800551",
"U-90", "73", "AYUJ-1079921935",
"U-90", "76", "AYUJ-3348911727",
"U-90", "79", "AYUJ-2381219626",
"T-87", "82", "AYUJ-4778326278",
"W-6", "89", "AYUJ-3915716168")
Solution:
The textbook states that the correct solution to this problem is:
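Stages A & B, including alerts from subjects who abandoned the process at their n-th attempt at stage A | Stages B & C, including alerts from subjects who abandoned the process at their n-th attempt at stage B | Alerts generated after stage C |
---|---|---|
AYUJ-151571406 | AYUJ-211990388 | AYUJ-3915716168 |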
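The hint about restarts can be made concrete by numbering each subject's attempts, so that an alert can later be tied to the attempt it belongs to. The following is only a minimal base R sketch on a small excerpt of TableA. The data frame name `stages` and the column `ATTEMPT` are hypothetical helpers, not part of the exercise, and the sketch assumes that every new stage A record starts a new attempt.

```r
# Small excerpt of TableA; subject XYU-1 restarts stage A once.
stages <- data.frame(
  STAGE        = c("A", "A", "B", "A", "B", "C"),
  TEST_SUBJECT = c("XYU-1", "XYU-1", "XYU-1", "FF-09", "FF-09", "FF-09"),
  TIMESTAMP    = c(10, 15, 20, 25, 30, 35),
  stringsAsFactors = FALSE
)

# Sort by subject and time so the running count follows each subject
# chronologically.
stages <- stages[order(stages$TEST_SUBJECT, stages$TIMESTAMP), ]

# ATTEMPT (hypothetical column) goes up by one each time a subject
# records a new stage A; later B/C records inherit that attempt number.
stages$ATTEMPT <- ave(as.integer(stages$STAGE == "A"),
                      stages$TEST_SUBJECT,
                      FUN = cumsum)

print(stages)
```

With this numbering, the first stage A record of XYU-1 is attempt 1 and the second one opens attempt 2, which is the distinction the hint seems to be pointing at.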
Here is a data.table approach.
It results in a list of the alerts that follow stages A, B and C.
library(data.table)
# Make tables data.table format
setDT(TableA)
setDT(TableB)
# Set TIMESTAMP to numeric
TableA[, TIMESTAMP := as.numeric(TIMESTAMP)]
TableB[, TIMESTAMP := as.numeric(TIMESTAMP)]
# Create data.table with Stage intervals by test subject
DT.interval <- TableA[, .(start = min(TIMESTAMP)), by = .(TEST_SUBJECT, STAGE)]
# Perform rolling join
TableB[, Stage := DT.interval[TableB,
                              STAGE,
                              on = .(TEST_SUBJECT, start = TIMESTAMP),
                              roll = Inf]][]
# Split alerts by stage
split(TableB[,3:4], by = "Stage")
# $A
# ALERT_CODE Stage
# 1: AYUJ-151571406 A
# 2: AYUJ-487008829 A
# 3: AYUJ-1300211351 A
# 4: AYUJ-4454800551 A
# 5: AYUJ-1079921935 A
# 6: AYUJ-4778326278 A
#
# $B
# ALERT_CODE Stage
# 1: AYUJ-211990388 B
# 2: AYUJ-4177221842 B
# 3: AYUJ-3014305494 B
# 4: AYUJ-3348911727 B
# 5: AYUJ-2381219626 B
#
# $C
# ALERT_CODE Stage
# 1: AYUJ-3915716168 C
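
Since the exercise asks for counts, it may help to see what the rolling join does in plain terms: with `roll = Inf`, each alert picks up the most recent stage start of the same subject at or before the alert's TIMESTAMP. Below is a rough base R sketch of that lookup on an excerpt of the data, followed by a count per stage. The names `stages`, `alerts`, `last_stage` and `stage_counts` are hypothetical. Note one difference, stated as an assumption of this sketch: it looks at every stage record, whereas `DT.interval` above keeps only the first TIMESTAMP per subject and stage, so the two can disagree for a subject who reaches stage B and then restarts at stage A.

```r
# Excerpt of the stage records (TableA) and alerts (TableB).
stages <- data.frame(
  STAGE        = c("A", "B", "C", "A"),
  TEST_SUBJECT = c("W-6", "W-6", "W-6", "I-44"),
  TIMESTAMP    = c(55, 60, 65, 50),
  stringsAsFactors = FALSE
)
alerts <- data.frame(
  TEST_SUBJECT = c("W-6", "W-6", "I-44", "W-6"),
  TIMESTAMP    = c(56, 63, 67, 89),
  ALERT_CODE   = c("AYUJ-1300211351", "AYUJ-3014305494",
                   "AYUJ-4454800551", "AYUJ-3915716168"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the stage of the latest record for `subject`
# whose TIMESTAMP is at or before `time`; NA if there is none.
last_stage <- function(subject, time) {
  s <- stages[stages$TEST_SUBJECT == subject & stages$TIMESTAMP <= time, ]
  if (nrow(s) == 0) return(NA_character_)
  s$STAGE[which.max(s$TIMESTAMP)]
}

alerts$Stage <- mapply(last_stage, alerts$TEST_SUBJECT, alerts$TIMESTAMP,
                       USE.NAMES = FALSE)

# Number of alerts attributed to each stage.
stage_counts <- table(alerts$Stage)
print(stage_counts)
```

On the data.table result itself, the equivalent count would be `TableB[, .N, by = Stage]`.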