r - Extract columns from a large data table into small data tables and save them in a list



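I get a data table from an external server (time series of different products, indexed by date) that can have at most the following columns (the date is always the first column; any of the other columns may or may not be present, so there might be only two extra columns, or any other combination):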

library(data.table)

set.seed(123)
dt.data <- data.table(date = seq(as.Date('2020-01-01'), by = '1 day', length.out = 365),
'DEB Cal-2019' = rnorm(365, 2, 1), 'DEB Cal-2021' = rnorm(365, 2, 1),
'DEB Cal-2022' = rnorm(365, 2, 1), 'DEB Cal-2023' = rnorm(365, 2, 1),
'ATB Cal-2019' = rnorm(365, 2, 1), 'ATB Cal-2021' = rnorm(365, 2, 1),
'ATB Cal-2022' = rnorm(365, 2, 1), 'ATB Cal-2023' = rnorm(365, 2, 1),
'TTF Cal-2019' = rnorm(365, 2, 1), 'TTF Cal-2021' = rnorm(365, 2, 1),
'TTF Cal-2022' = rnorm(365, 2, 1), 'TTF Cal-2023' = rnorm(365, 2, 1),
'NCG Cal-2019' = rnorm(365, 2, 1), 'NCG Cal-2021' = rnorm(365, 2, 1),
'NCG Cal-2022' = rnorm(365, 2, 1), 'NCG Cal-2023' = rnorm(365, 2, 1),
'AUTVTP Cal-2019' = rnorm(365, 2, 1), 'AUTVTP Cal-2021' = rnorm(365, 2, 1),
'AUTVTP Cal-2022' = rnorm(365, 2, 1), 'AUTVTP Cal-2023' = rnorm(365, 2, 1),
'ATW Cal-2019' = rnorm(365, 2, 1), 'ATW Cal-2021' = rnorm(365, 2, 1),
'ATW Cal-2022' = rnorm(365, 2, 1), 'ATW Cal-2023' = rnorm(365, 2, 1),
'BRN Cal-2019' = rnorm(365, 2, 1), 'BRN Cal-2021' = rnorm(365, 2, 1),
'BRN Cal-2022' = rnorm(365, 2, 1), 'BRN Cal-2023' = rnorm(365, 2, 1),
'FEUA MDEC1' = rnorm(365, 2, 1),
check.names = FALSE)

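Now I want to extract each column that is present, together with the date column, into its own data table. Ideally, all extracted data tables would be collected in a list. I know I should do this with a for loop, but I can't get it to work.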
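For reference, here is a minimal sketch of the for loop the question describes. It assumes a cut-down dt.data with only two product columns; the names products and dt.list are illustrative, not from the original post.

```r
library(data.table)

set.seed(123)
# Cut-down stand-in for dt.data (assumption: same layout, date first)
dt.data <- data.table(date = seq(as.Date('2020-01-01'), by = '1 day', length.out = 365),
                      'DEB Cal-2019' = rnorm(365, 2, 1),
                      'ATB Cal-2021' = rnorm(365, 2, 1),
                      check.names = FALSE)

# Every column except 'date' is a product, whichever ones the server returned
products <- setdiff(names(dt.data), "date")

# Pre-allocate a named list with one slot per product
dt.list <- vector("list", length(products))
names(dt.list) <- products

for (p in products) {
  # Keep the date column plus the current product column
  dt.list[[p]] <- dt.data[, c("date", p), with = FALSE]
}
```

Because products is computed from the column names at run time, the loop works no matter which subset of columns the server sends.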

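Once I have a separate data table for each product, I have to perform the following operations on each of them (shown here with an example data table for AUTVTP Cal-2022):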

library(dplyr)  # for %>%, mutate(), relocate() and mutate_if()

DT <- data.table(date = seq(as.Date('2020-01-01'), by = '1 day', length.out = 365),
'AUTVTP Cal-2022' = rnorm(365, 2, 1), check.names = FALSE)

DT <- DT %>%
mutate(month = format(date, '%b'), 
date = format(date, '%d')) %>%
tidyr::pivot_wider(names_from = date, values_from = 'AUTVTP Cal-2022') %>%
relocate(`01`, .after = month)
## Calculate monthly, quarterly and yearly mean values: ##
DT <- setDT(DT)[, monthAvg := rowMeans(.SD, na.rm = TRUE), .SDcols = -1]
DT <- DT[, quartAvg := mean(monthAvg), ceiling(seq_len(nrow(DT))/3)]
DT <- DT[, yearAvg := mean(monthAvg), ceiling(seq_len(nrow(DT))/12)]
## Round all values of the data table to 2 digits: ##
DT <- DT %>% mutate_if(is.numeric, round, 2)
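The quarterly and yearly averages depend on the rows of the wide table being the 12 months in calendar order: ceiling(seq_len(nrow(DT))/3) labels rows 1 to 3 as group 1, rows 4 to 6 as group 2, and so on. A toy sketch with made-up monthly averages (the table name wide is hypothetical):

```r
library(data.table)

# Hypothetical wide table: one row per month, in calendar order
wide <- data.table(month = month.abb, monthAvg = 1:12)

# Rows 1-3 -> group 1, rows 4-6 -> group 2, ...
print(ceiling(seq_len(nrow(wide)) / 3))  # 1 1 1 2 2 2 3 3 3 4 4 4

# Same grouping trick as in the question
wide[, quartAvg := mean(monthAvg), by = ceiling(seq_len(nrow(wide)) / 3)]
wide[, yearAvg := mean(monthAvg), by = ceiling(seq_len(nrow(wide)) / 12)]
```

If a month were missing or the rows were out of order, these groups would silently mix months from different quarters.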

How can I do this?

Reshape to long format, then split:

split(
melt(dt.data, id.vars = "date"),
by = "variable", keep.by = FALSE)

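You can then iterate over the list with lapply and do whatever your tidyverse code does. Note that after melt() the numbers are in a column named value rather than the product name.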
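A minimal sketch of that lapply step, assuming a two-product dt.data and applying only the reshaping part of the question's pipeline; dt.list and dt.wide are illustrative names:

```r
library(data.table)
library(dplyr)
library(tidyr)

set.seed(123)
dt.data <- data.table(date = seq(as.Date('2020-01-01'), by = '1 day', length.out = 365),
                      'DEB Cal-2019' = rnorm(365, 2, 1),
                      'ATB Cal-2021' = rnorm(365, 2, 1),
                      check.names = FALSE)

# Long format (date, variable, value), then one data.table per product
dt.list <- split(melt(dt.data, id.vars = "date"), by = "variable", keep.by = FALSE)

# The values now live in the 'value' column, so pivot on that
dt.wide <- lapply(dt.list, function(d) {
  d %>%
    mutate(month = format(date, '%b'), date = format(date, '%d')) %>%
    pivot_wider(names_from = date, values_from = value) %>%
    as.data.table()
})
```

Each element of dt.wide has one row per month and one column per day of the month; the averaging and rounding steps from the question can be added inside the same function.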

In general, however, you should not split a data.table: it is inefficient and usually unnecessary.
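To illustrate working without splitting, a sketch that computes every product's monthly mean in one grouped call on the long table; the small dt.data and the names long and monthly are assumptions for the example:

```r
library(data.table)

set.seed(123)
dt.data <- data.table(date = seq(as.Date('2020-01-01'), by = '1 day', length.out = 365),
                      'DEB Cal-2019' = rnorm(365, 2, 1),
                      'ATB Cal-2021' = rnorm(365, 2, 1),
                      check.names = FALSE)

# One row per (date, product) pair
long <- melt(dt.data, id.vars = "date", variable.name = "product")

# Monthly mean per product; the grouping replaces the list of tables
monthly <- long[, .(monthAvg = mean(value)), by = .(product, month = month(date))]
```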

Edit:

I suggest you forget about splitting. Wrap your code in a function like this:

foo <- function(DT, colname) {
DT <- DT[, c("date", colname), with = FALSE]
DT <- DT %>%
mutate(month = format(date, '%b'), 
date = format(date, '%d')) %>%
tidyr::pivot_wider(names_from = date, values_from = all_of(colname)) %>%  # all_of(): colname is an external character variable
relocate(`01`, .after = month)

## Calculate monthly, quarterly and yearly mean values: ##
DT <- setDT(DT)[, monthAvg := rowMeans(.SD, na.rm = TRUE), .SDcols = -1]
DT <- DT[, quartAvg := mean(monthAvg), ceiling(seq_len(nrow(DT))/3)]
DT <- DT[, yearAvg := mean(monthAvg), ceiling(seq_len(nrow(DT))/12)]

## Round all values of the data table to 2 digits: ##
DT %>% mutate_if(is.numeric, round, 2)
}

Then, whenever you need the table for a specific column in your Shiny app, you can simply call the function:

foo(dt.data, 'DEB Cal-2019')

If you insist on precomputing the list:

lapply(names(dt.data)[names(dt.data) != "date"], 
foo, DT = dt.data)
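The lapply call above returns an unnamed list. A sketch that names each element after its product, so a table can be looked up by column name; it repeats foo() (with all_of() for the column name) and uses a two-product dt.data, and products and tables are illustrative names:

```r
library(data.table)
library(dplyr)
library(tidyr)

set.seed(123)
dt.data <- data.table(date = seq(as.Date('2020-01-01'), by = '1 day', length.out = 365),
                      'DEB Cal-2019' = rnorm(365, 2, 1),
                      'ATB Cal-2021' = rnorm(365, 2, 1),
                      check.names = FALSE)

foo <- function(DT, colname) {
  # Keep only the date column and the requested product column
  DT <- DT[, c("date", colname), with = FALSE]
  DT <- DT %>%
    mutate(month = format(date, '%b'), date = format(date, '%d')) %>%
    pivot_wider(names_from = date, values_from = all_of(colname)) %>%
    relocate(`01`, .after = month)
  DT <- setDT(DT)[, monthAvg := rowMeans(.SD, na.rm = TRUE), .SDcols = -1]
  DT <- DT[, quartAvg := mean(monthAvg), ceiling(seq_len(nrow(DT)) / 3)]
  DT <- DT[, yearAvg := mean(monthAvg), ceiling(seq_len(nrow(DT)) / 12)]
  DT %>% mutate_if(is.numeric, round, 2)
}

# Name each list element after its product column
products <- setdiff(names(dt.data), "date")
tables <- setNames(lapply(products, foo, DT = dt.data), products)
```

A single table is then available as, for example, tables[["DEB Cal-2019"]].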

Use split.default to create a list of data frames, one per product column, and cbind the first (date) column to each element:

lapply(split.default(dt.data[, -1], names(dt.data[, -1])), cbind, dt.data[, 1])
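With this one-liner the list elements come out in alphabetical order (split() sorts the grouping factor), and the date column is the last column in each element. A sketch on a two-product dt.data, plus a variant that puts date first; res and res2 are illustrative names:

```r
library(data.table)

set.seed(123)
dt.data <- data.table(date = seq(as.Date('2020-01-01'), by = '1 day', length.out = 365),
                      'DEB Cal-2019' = rnorm(365, 2, 1),
                      'ATB Cal-2021' = rnorm(365, 2, 1),
                      check.names = FALSE)

# As in the answer: product column first, then date
res <- lapply(split.default(dt.data[, -1], names(dt.data[, -1])), cbind, dt.data[, 1])
print(names(res))                    # groups are sorted alphabetically
print(names(res[["DEB Cal-2019"]]))  # product column, then date

# Variant that puts the date column first
res2 <- lapply(split.default(dt.data[, -1], names(dt.data[, -1])),
               function(x) cbind(dt.data[, 1], x))
```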
