Loop (or lapply) a slice function over a data file and save each new part as a separate csv in R



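I have a csv. It contains values recorded for every half hour of every day. I want to slice it into half-hour blocks (separated by the text "system sleep.") and save each separated block as its own .csv file for further analysis. My current code: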

df <- read.csv("datafile",h=T,sep=",")

M <- which(startsWith(df$ID, "system sleep."))

M2 <- M[1]

df2 <- slice(df,c(1:M2-1))

write.csv(write_csv(df2, file = paste0("test", df2$Time[1], "-", ".csv")))

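I set M2 <- M[1] so that I can target the first "system sleep." row. I tried using M2 <- M[i], but that has not worked so far. I can slice and save the first part, but I want to loop it so that it carries on through the remaining parts. There may be a different approach, but this is the best I have found so far.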

As for what the file looks like, a modified and simplified example is:

ID             Day        Time   Rec  value
A1             2018/1/30  00:00  1    251
A1             2018/1/30  00:01  2    368
A1             2018/1/30  00:02  3    430
system sleep.
A1             2018/1/30  00:30  1    195
A1             2018/1/30  00:31  2    876
A1             2018/1/30  00:32  3    864
system sleep.
A1             2018/1/30  01:00  1    872
A1             2018/1/30  01:01  2    120
A1             2018/1/30  01:02  3    208
system sleep.
A1             2018/1/30  23:39  10   002

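Instead of slicing, you could add a block identifier column to your dataset. You can then split the data into blocks and export each one to a separate csv, using e.g. lapply.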
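To see why a cumulative sum can serve as a block identifier, here is a minimal sketch on a made-up ID vector (the vector `id` and the helper name `is_marker` are assumptions for illustration only). Each marker row bumps the counter by one, so every row after it carries the same number until the next marker.

```r
# Hypothetical ID column: "system sleep." marks the start of a new block
id <- c("A1", "A1", "system sleep.", "A1", "system sleep.", "A1")

is_marker <- grepl("^sys", id)   # TRUE on the marker rows only
block <- cumsum(is_marker)       # counter increases by 1 at each marker

block
#> [1] 0 0 1 1 2 2
```

The marker rows themselves are counted as the first row of the block they open, which is why they can be dropped afterwards without changing the grouping of the data rows.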

Note: I was not sure whether you want to keep the "system sleep." rows. In the code below I decided to drop them.

# Add a block identifier: the counter increases at every "system sleep." row
dat$block <- cumsum(grepl("^sys", dat$ID))
# Get rid of the "system sleep." rows themselves
dat <- dat[!grepl("^sys", dat$ID), ]
# Split into a list of data frames, one per block
dat_split <- split(dat, dat$block)
# Export each block to its own csv, named after the block's first time stamp
path <- tempdir()
foo <- lapply(dat_split, function(x) write.csv(x, file = file.path(path, paste0("test", x$Time[[1]], "-", ".csv")), row.names = FALSE))
# Check: read the exported files back in
fns <- list.files(path = path, pattern = "\\.csv$", full.names = TRUE)
lapply(fns, read.csv)
#> [[1]]
#>   ID       Day  Time Rec value block
#> 1 A1 2018/1/30 00:00   1   251     0
#> 2 A1 2018/1/30 00:01   2   368     0
#> 3 A1 2018/1/30 00:02   3   430     0
#> 
#> [[2]]
#>   ID       Day  Time Rec value block
#> 1 A1 2018/1/30 00:30   1   195     1
#> 2 A1 2018/1/30 00:31   2   876     1
#> 3 A1 2018/1/30 00:32   3   864     1
#> 
#> [[3]]
#>   ID       Day  Time Rec value block
#> 1 A1 2018/1/30 01:00   1   872     2
#> 2 A1 2018/1/30 01:01   2   120     2
#> 3 A1 2018/1/30 01:02   3   208     2
#> 
#> [[4]]
#>   ID       Day  Time Rec value block
#> 1 A1 2018/1/30 23:39  10     2     3
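
If an explicit loop over the marker positions is preferred, closer to the M[i] idea in the question, the following is a rough sketch rather than a definitive implementation. The small data frame, the names `starts`, `ends`, `out_dir` and `files`, and the choice to strip ":" from the file name are all assumptions made for illustration.

```r
# Small hypothetical data set in the same shape as the question's file
df <- data.frame(
  ID    = c("A1", "A1", "system sleep.", "A1", "A1", "system sleep.", "A1"),
  Time  = c("00:00", "00:01", NA, "00:30", "00:31", NA, "01:00"),
  value = c(251, 368, NA, 195, 876, NA, 872)
)

# Row numbers of the marker rows
M <- which(startsWith(as.character(df$ID), "system sleep."))
starts <- c(1, M + 1)          # first row of each block
ends   <- c(M - 1, nrow(df))   # last row of each block

out_dir <- tempdir()           # assumption: write to a temporary folder
files <- character(0)
for (i in seq_along(starts)) {
  if (starts[i] > ends[i]) next   # skip empty blocks (e.g. a marker on the last row)
  part <- df[starts[i]:ends[i], ]
  # ":" is not allowed in Windows file names, so remove it from the time stamp
  stamp <- gsub(":", "", part$Time[1], fixed = TRUE)
  files[i] <- file.path(out_dir, paste0("test", stamp, ".csv"))
  write.csv(part, files[i], row.names = FALSE)
}
```

Note that `1:M2-1` is parsed as `(1:M2) - 1`, not `1:(M2 - 1)`, so the sketch computes the start and end rows separately before indexing.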

Data

dat <- data.frame(
ID = c(
"A1", "A1", "A1",
"system sleep.", "A1", "A1", "A1",
"system sleep.", "A1", "A1", "A1", "system sleep.",
"A1"
),
Day = c(
"2018/1/30",
"2018/1/30", "2018/1/30", NA, "2018/1/30", "2018/1/30",
"2018/1/30", NA, "2018/1/30", "2018/1/30",
"2018/1/30", NA, "2018/1/30"
),
Time = c(
"00:00", "00:01",
"00:02", NA, "00:30", "00:31", "00:32", NA,
"01:00", "01:01", "01:02", NA, "23:39"
),
Rec = c(
"1", "2", "3", NA,
"1", "2", "3", NA, "1", "2", "3", NA,
"10"
),
value = c(
"251", "368",
"430", NA, "195", "876", "864", NA, "872", "120",
"208", NA, "002"
)
)
