Data manipulation in R: start a new row if i > i-1



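I have a long (one-row) data file containing many values. It needs to be broken up into multiple rows. The specific reason I need this is not important, but the logic is that column i should always be greater than column i+1, i.e. the values should decrease along each row.

The best approach I can think of is to break the data frame into multiple rows with an 'if then' style function: if i > i-1, start a new row; if i < i-1, keep the value in the current row.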

#Example data but with similar format to my real data
df <- data.frame(matrix(ncol = 9, nrow = 1))
df[1,] <- c(3, 2, 1, 2, 1, 1, 3, 2, 1) 

I would like it to end up looking like this:

3 2 1
2 1 
1
3 2 1

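I am not very familiar with functions that refer to the position of i within a data frame, or with the kind of data manipulation this requires. Any suggestions would be appreciated.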

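Splitting the vector into groups is simple, but how you ultimately store the data depends on what you want to do with the result. Here is a simple way to split the data: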

vect <- unname(unlist(df))    # Convert the data to a simple vector
cut <- which(diff(vect) >= 0) # Find the points for splitting the vector
grps <- rep(seq_len(length(cut) + 1), diff(c(0, cut, length(vect))))  # Define the groups created (one more group than cut points)
groups <- split(vect, grps)   # Create a list containing the groups
groups
# $`1`
# [1] 3 2 1
# 
# $`2`
# [1] 2 1
# 
# $`3`
# [1] 1
# 
# $`4`
# [1] 3 2 1

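Data frames and matrices require all columns to have the same length, so they cannot hold the result as it stands. To build a matrix, we need to pad the shorter groups with missing values: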

maxno <- max(sapply(groups, length))  # How long is the longest run?
t(sapply(groups, function(x) c(x, rep(NA, maxno - length(x)))))
#   [,1] [,2] [,3]
# 1    3    2    1
# 2    2    1   NA
# 3    1   NA   NA
# 4    3    2    1
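
The two steps above can be wrapped into one function so that the number of groups is not hard-coded. This is only a sketch: the name split_decreasing is hypothetical (it does not appear in the answers), and it assumes a non-empty numeric input in which a new row starts whenever a value is not smaller than the previous one.

```r
# Hypothetical helper (not part of the original answers): split a numeric
# vector into strictly decreasing runs and pad the runs with NA.
split_decreasing <- function(v) {
  v <- unname(unlist(v))                    # accept a one-row data frame or a plain vector
  breaks <- which(diff(v) >= 0)             # index of the last element of every run but the final one
  run_len <- diff(c(0, breaks, length(v)))  # length of each run
  runs <- split(v, rep(seq_along(run_len), run_len))
  width <- max(lengths(runs))               # the longest run sets the number of columns
  # rbind keeps the result a matrix even when every run has length 1
  do.call(rbind, lapply(runs, function(r) c(r, rep(NA, width - length(r)))))
}

m <- split_decreasing(c(3, 2, 1, 2, 1, 1, 3, 2, 1))
m
```

If a data frame is needed rather than a matrix, as.data.frame(m) converts the result.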

Here is a concise solution. Please let me know if it solves your problem:

library(tidyverse)
df <- data.frame(matrix(ncol = 9, nrow = 1))
df[1,] <- c(3, 2, 1, 2, 1, 1, 3, 2, 1) 
df %>%
  pivot_longer(cols = everything(), names_to = "vars") %>%
  mutate(smaller_than_prev = value < lag(value) | is.na(lag(value)),
         num_falses = cumsum(smaller_than_prev == FALSE)) %>%
  group_by(num_falses) %>%
  mutate(row_num = row_number()) %>%
  pivot_wider(names_from = row_num, values_from = value, values_fill = NA, names_prefix = "var") %>%
  fill(c(`var1`, `var2`, `var3`), .direction = "downup") %>%
  slice_head(n = 1) %>%
  ungroup() %>%
  select(`var1`, `var2`, `var3`)

We can split() on the cumsum() of the positions where diff() is non-negative, i.e. where i >= i - 1.

x <- df[1, ] |> unlist()  # named numeric vector; diff() does not work on a data frame row
r <- split(x, cumsum(c(1, diff(x)) >= 0))
r
# $`1`
# X1 X2 X3 
#  3  2  1 
# 
# $`2`
# X4 X5 
#  2  1 
# 
# $`3`
# X6 
#  1 
# 
# $`4`
# X7 X8 X9 
#  3  2  1 

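To get a rectangular structure, we equalize the lengths and rbind (the result is a matrix; wrap it in as.data.frame() if a data frame is needed):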

do.call(rbind, lapply(r, `length<-`, max(lengths(r))))
#   X1 X2 X3
# 1  3  2  1
# 2  2  1 NA
# 3  1 NA NA
# 4  3  2  1

This also works out of the box for the "small increases" the OP talks about, i.e. i > i - 1 ± tol:

set.seed(424643)
(x2 <- x + rnorm(length(x), 0, .02))
#        X1        X2        X3        X4        X5        X6        X7        X8        X9 
# 2.9989375 1.9675093 0.9695195 2.0286091 0.9860200 0.9867120 3.0126058 2.0082577 1.0027076 
split(x2, cumsum(c(1, diff(x2)) >= 0))
# $`1`
#        X1        X2        X3 
# 2.9989375 1.9675093 0.9695195 
# 
# $`2`
#       X4       X5 
# 2.028609 0.986020 
# 
# $`3`
#       X6 
# 0.986712 
# 
# $`4`
#       X7       X8       X9 
# 3.012606 2.008258 1.002708

If needed, we can adjust the comparison against zero by a small tolerance value, -.02 in this example:

set.seed(219291)
(x2 <- x + rnorm(length(x), 0, .02))
#        X1        X2        X3        X4        X5        X6        X7        X8        X9 
# 2.9866361 2.0236431 1.0053049 2.0061573 1.0348428 1.0008761 3.0145685 2.0016665 0.9719804 
split(x2, cumsum(c(1, diff(x2)) >= 0 + -.02))
# $`1`
#        X1        X2        X3 
# 3.0109922 2.0061321 0.9900378 
# 
# $`2`
#        X4        X5 
# 1.9728080 0.9973932 
# 
# $`3`
#        X6 
# 0.9829894 
# 
# $`4`
#       X7       X8       X9 
# 3.003697 1.997184 0.984649 
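
The tolerance idea can be made explicit with a small parameterised function. This is a sketch under assumptions: the name split_runs and its tol argument are hypothetical, and tol is taken to be a non-negative number such that a new group starts whenever the difference to the previous value is at least -tol. The input below is deterministic rather than random, so the behaviour is easy to check by hand.

```r
# Hypothetical helper (not part of the original answers): start a new group
# whenever a value drops by no more than `tol` relative to the previous one
# (or rises, or stays the same).
split_runs <- function(x, tol = 0) {
  starts <- c(TRUE, diff(x) >= -tol)  # TRUE marks the first element of a group
  split(x, cumsum(starts))
}

y <- c(3, 2, 1, 2, 1, 0.99, 3, 2, 1)  # 0.99 is a "near tie" with the preceding 1

lengths(split_runs(y))              # strict comparison: 0.99 stays in the second group
lengths(split_runs(y, tol = 0.02))  # with tolerance: 0.99 starts a group of its own
```

With tol = 0 the function reduces to the split(x, cumsum(c(1, diff(x)) >= 0)) call shown earlier.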

Data:

df <- structure(list(X1 = 3, X2 = 2, X3 = 1, X4 = 2, X5 = 1, X6 = 1, 
X7 = 3, X8 = 2, X9 = 1), row.names = c(NA, -1L), class = "data.frame")
