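I have a CSV file containing values recorded every half hour of every day. I want to slice it into half-hour blocks
(the blocks are separated by rows containing the text "system sleep.") and save each block as a separate .csv file for further analysis. My current code: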
df <- read.csv("datafile",h=T,sep=",")
M <- which(startsWith(df$ID, "system sleep."))
M2 <- M[1]
df2 <- slice(df,c(1:M2-1))
write.csv(write_csv(df2, file = paste0("test", df2$Time[1], "-", ".csv")))
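I set M2 <- M[1] so that I can target the first "system sleep." row. I have tried using M2 <- M[i], but so far it has not worked. I can slice and save the first part, but I would like to loop over it so that it continues through the remaining parts. There may be a better approach, but this is the best one I have found so far.
As for what the file looks like, a modified and simplified example is: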
ID | Day | Time | Rec | value |
---|---|---|---|---|
A1 | 2018/1/30 | 00:00 | 1 | 251 |
A1 | 2018/1/30 | 00:01 | 2 | 368 |
A1 | 2018/1/30 | 00:02 | 3 | 430 |
system sleep. | | | | |
A1 | 2018/1/30 | 00:30 | 1 | 195 |
A1 | 2018/1/30 | 00:31 | 2 | 876 |
A1 | 2018/1/30 | 00:32 | 3 | 864 |
system sleep. | | | | |
A1 | 2018/1/30 | 01:00 | 1 | 872 |
A1 | 2018/1/30 | 01:01 | 2 | 120 |
A1 | 2018/1/30 | 01:02 | 3 | 208 |
system sleep. | | | | |
(...) | | | | |
A1 | 2018/1/30 | 23:39 | 10 | 002 |
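Instead of slicing, you could add a block identifier column to your dataset. After that, you can split the data into blocks and use e.g. lapply
to export each block to a separate csv.
Note: I was not sure whether you want to keep the "system sleep." rows; in the code below I decided to drop them.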
# Add block identifier
dat$block <- cumsum(grepl("^sys", dat$ID))
# Get rid of "sys sleep"
dat <- dat[!grepl("^sys", dat$ID),]
# Split into blocks
dat_split <- split(dat, dat$block)
# Export
path <- tempdir()
foo <- lapply(dat_split, function(x) write.csv(x, file = file.path(path, paste0("test", x$Time[[1]], "-", ".csv")), row.names = FALSE))
# Check
fns <- list.files(path = path, pattern = "\\.csv$", full.names = TRUE)
lapply(fns, read.csv)
#> [[1]]
#> ID Day Time Rec value block
#> 1 A1 2018/1/30 00:00 1 251 0
#> 2 A1 2018/1/30 00:01 2 368 0
#> 3 A1 2018/1/30 00:02 3 430 0
#>
#> [[2]]
#> ID Day Time Rec value block
#> 1 A1 2018/1/30 00:30 1 195 1
#> 2 A1 2018/1/30 00:31 2 876 1
#> 3 A1 2018/1/30 00:32 3 864 1
#>
#> [[3]]
#> ID Day Time Rec value block
#> 1 A1 2018/1/30 01:00 1 872 2
#> 2 A1 2018/1/30 01:01 2 120 2
#> 3 A1 2018/1/30 01:02 3 208 2
#>
#> [[4]]
#> ID Day Time Rec value block
#> 1 A1 2018/1/30 23:39 10 2 3
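To see why the cumsum(grepl(...)) step yields one block number per segment, here is a minimal sketch on a toy vector. The names id, is_marker and block_rows are made up for illustration and are not part of the code above.

```r
# Toy ID column; each "system sleep." row marks the start of a new block
id <- c("A1", "A1", "system sleep.", "A1", "system sleep.", "A1", "A1")

# TRUE wherever a marker row occurs
is_marker <- grepl("^sys", id)

# Running count of markers seen so far = block number of each row
block <- cumsum(is_marker)

# A marker row carries the number of the block it opens, so dropping the
# marker rows afterwards leaves only data rows, each with its block number
block_rows <- data.frame(id, block)[!is_marker, ]
print(block_rows)
```

Rows before the first marker get block 0, which is why the first exported file above has block 0.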
Data
dat <- data.frame(
ID = c(
"A1", "A1", "A1",
"system sleep.", "A1", "A1", "A1",
"system sleep.", "A1", "A1", "A1", "system sleep.",
"A1"
),
Day = c(
"2018/1/30",
"2018/1/30", "2018/1/30", NA, "2018/1/30", "2018/1/30",
"2018/1/30", NA, "2018/1/30", "2018/1/30",
"2018/1/30", NA, "2018/1/30"
),
Time = c(
"00:00", "00:01",
"00:02", NA, "00:30", "00:31", "00:32", NA,
"01:00", "01:01", "01:02", NA, "23:39"
),
Rec = c(
"1", "2", "3", NA,
"1", "2", "3", NA, "1", "2", "3", NA,
"10"
),
value = c(
"251", "368",
"430", NA, "195", "876", "864", NA, "872", "120",
"208", NA, "002"
)
)
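If you would rather have an explicit loop, closer to the M[i] idea in the question, the same split can be written with a for loop. This is only a sketch: write_blocks, out_dir and the toy data frame are hypothetical names introduced here. Removing ":" from the file name is my own assumption, since colons are not allowed in Windows file names.

```r
# Hypothetical helper (not from the original post): write each block of `dat`
# to its own CSV file in `out_dir` and return the file paths.
write_blocks <- function(dat, out_dir, marker = "^sys") {
  is_marker <- grepl(marker, dat$ID)
  # Block number per data row; marker rows are dropped
  block <- cumsum(is_marker)[!is_marker]
  pieces <- split(dat[!is_marker, , drop = FALSE], block)

  paths <- character(length(pieces))
  for (i in seq_along(pieces)) {
    piece <- pieces[[i]]
    # Assumption: strip ":" so the name is also valid on Windows
    stamp <- gsub(":", "", piece$Time[1], fixed = TRUE)
    paths[i] <- file.path(out_dir, paste0("test", stamp, ".csv"))
    write.csv(piece, file = paths[i], row.names = FALSE)
  }
  paths
}

# Small stand-in data set with the same kind of columns as the example
toy <- data.frame(
  ID = c("A1", "A1", "system sleep.", "A1", "system sleep.", "A1"),
  Time = c("00:00", "00:01", NA, "00:30", NA, "01:00"),
  value = c(251, 368, NA, 195, NA, 872)
)

out_dir <- file.path(tempdir(), "blocks")
dir.create(out_dir, showWarnings = FALSE)
paths <- write_blocks(toy, out_dir)
basename(paths)
```

The loop does the same thing as the lapply version. It may just be easier to extend, for example if each block needs extra processing before it is written.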