R - How to avoid a 'for' loop when operating on a list



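I have run into a problem in R: I need to perform some operations on a list and build a new data frame from the values in that list. With a for loop this takes a very long time. I don't know how to avoid the for loop, or how to use "if + case_when" without one.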

The code below has comments explaining what I did and what happens.

Thank you very much!

# Go through every row of the list "total"
for (i in 1:nrow(total)) {
  # Use total$Cad[[i]] to pick a value from another list
  val1 <- posdi[posdi$cad == str_to_upper(total$Cad[[i]]), ]
  # Keep the entry of val1 whose "font" value is equal to "Taake"
  val2 <- val1[val1$font == "Taake", ]
  # Format the date value
  thedate <- as.numeric(format(as.Date(total$TheDate[[i]], format = "%Y-%m-%d"), '%Y%m%d'))
  # This is where I can't continue easily. I want to do an IF and apply a different
  # case_when depending on whether the value is between 1 and 5 or between 6 and 7
  if (total$dia[[i]] >= 1 & total$dia[[i]] <= 5) {
    fran = case_when(
      total$secs[[i]] >= 0 & total$secs[[i]] < 1.5 ~ 1,
      total$secs[[i]] >= 1.5 & total$secs[[i]] < 4 ~ 2,
      total$secs[[i]] >= 4 & total$secs[[i]] < 8 ~ 3,
      total$secs[[i]] >= 8 & total$secs[[i]] < 10 ~ 4)
  } else {
    fran = case_when(
      total$secs[[i]] >= 0 & total$secs[[i]] < 1.5 ~ 5,
      total$secs[[i]] >= 1.5 & total$secs[[i]] < 4 ~ 6,
      total$secs[[i]] >= 4 & total$secs[[i]] < 8 ~ 7,
      total$secs[[i]] >= 8 & total$secs[[i]] < 10 ~ 8)
  }

  # Finally, add that "fran" value, the three values from the beginning and some from the total list to a new data frame
  datosTel[nrow(datosTel) + 1, ] = c(val2$cad, str_to_upper(total$Camp[[i]]), total$numsem[[i]], thedate, total$diasem[[i]], fran, 0)
}
# It works with the "for" loop, but it takes a very long time (it goes one by one and the list has more than 200K rows).
# How can I do it without that for loop and get the "if + case_when" right?

Thanks again, and have a nice day.

As mentioned, my problem is the for loop and the if and case_when inside it, because I don't know how to do this without a loop.

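The code inside your loop only ever touches the current element ([[i]]), and all the operations you perform are vectorised by default (except if, but we can replace that directly with if_else).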
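As a rough illustration of what "vectorised by default" means here, the sketch below applies the date formatting and the upper-casing to whole vectors in one call. The sample values are made up, and toupper is used as a base R stand-in for str_to_upper so that the snippet needs no packages.

```r
# Hypothetical sample values standing in for two columns of `total`
dates <- c("2023-01-05", "2023-12-31")
camp  <- c("alpha", "Beta")

# Each call processes the whole vector at once; no loop and no [[i]] needed
thedate <- as.numeric(format(as.Date(dates, format = "%Y-%m-%d"), "%Y%m%d"))

# Base R stand-in for stringr::str_to_upper (an assumption for this sketch)
camp_upper <- toupper(camp)

print(thedate)
print(camp_upper)
```

The same expressions work unchanged on a column with 200K entries, which is why the loop body can be lifted out as a whole.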

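So you can replace the entire loop with a mutate or transmute statement (they do the same thing, except that transmute does not keep the existing columns, which seems more appropriate in your case).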

Furthermore, you can simplify the if by merging its two branches and adding an offset that depends on total$dia.
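To make the offset idea concrete, here is a minimal sketch on made-up values (dia and secs are hypothetical sample vectors, not the real data). The bin is computed once, and 4 is added when dia falls outside 1 to 5, which reproduces the 5 to 8 values of the else branch.

```r
library(dplyr)

# Hypothetical sample data: dia is the day number, secs the duration
dia  <- c(1, 5, 6, 7)
secs <- c(0.5, 9, 0.5, 9)

# Bin computed once, shared by both branches of the original if
base_bin <- case_when(
  secs >= 0   & secs < 1.5 ~ 1,
  secs >= 1.5 & secs < 4   ~ 2,
  secs >= 4   & secs < 8   ~ 3,
  secs >= 8   & secs < 10  ~ 4
)

# Days 1 to 5 keep bins 1..4; any other day is shifted to 5..8
fran <- base_bin + if_else(dia >= 1 & dia <= 5, 0, 4)

print(fran)
```

Because if_else works element by element, each row gets its own offset without any explicit branching.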

Finally, your case_when expression happens to be expressible as a findInterval expression.

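In the code below I assume that datosTel is an empty table before the loop, and I have also made some assumptions about column names that you may need to adjust.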

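library(dplyr)
library(stringr)

# cad values in posdi whose font is "Taake"
taake_cads = posdi$cad[posdi$font == "Taake"]

datosTel = total %>%
  # Assumption: keep only rows whose upper-cased Cad has a "Taake" entry in posdi;
  # for those rows the looked-up cad is simply the upper-cased Cad itself
  filter(str_to_upper(Cad) %in% taake_cads) %>%
  transmute(
    cad = str_to_upper(Cad),
    Camp = str_to_upper(Camp),
    numsem = numsem,
    thedate = as.numeric(format(as.Date(TheDate, format = "%Y-%m-%d"), '%Y%m%d')),
    diasem = diasem,
    offset = if_else(dia >= 1 & dia <= 5, 0, 4),
    fran = offset + findInterval(secs, c(0, 1.5, 4, 8, 10, Inf)),
    LAST_COLUMN = 0
  ) %>%
  select(-offset)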

(Replace LAST_COLUMN with the actual column name.)

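For secs in the range [0, 10), the findInterval call is equivalent to the following (outside that range case_when yields NA, while findInterval yields 0 for negative values and 5 or more from 10 upward):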

case_when(
  secs >= 0 & secs < 1.5 ~ 1,
  secs >= 1.5 & secs < 4 ~ 2,
  secs >= 4 & secs < 8 ~ 3,
  secs >= 8 & secs < 10 ~ 4
)
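A quick way to check the boundary behaviour is to run findInterval on a few hand-picked values, as in the base R sketch below. The sample vectors and the NA clean-up step are illustrative assumptions; the clean-up is only needed if secs can fall outside [0, 10) and you want the NA that case_when would produce.

```r
breaks <- c(0, 1.5, 4, 8, 10, Inf)

# Values on and around the interval boundaries (hypothetical test data)
secs <- c(0, 1.49, 1.5, 3.9, 4, 8, 9.99)
bins <- findInterval(secs, breaks)  # intervals are closed on the left: [a, b)
print(bins)

# Values outside [0, 10): findInterval still returns a number here
outside <- findInterval(c(-1, 10, 12), breaks)

# Optional: map anything outside bins 1..4 to NA to mirror case_when
outside[outside < 1 | outside > 4] <- NA
print(outside)
```

If the data is known to stay within [0, 10), the clean-up step can be dropped.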
