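I have a long (single-row) data file containing many values. It needs to be broken up into multiple rows. The specific reason I need this is not important, but the logic is that column I should always be greater than column I+1, i.e. the values should decrease along each row.
The best approach I can think of is to break the data frame into multiple rows with an "if/then" style rule: if value I is greater than value I-1, start a new row; if it is less than value I-1, keep the value in the current row.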
#Example data but with similar format to my real data
df <- data.frame(matrix(ncol = 9, nrow = 1))
df[1,] <- c(3, 2, 1, 2, 1, 1, 3, 2, 1)
I would like it to end up looking like this.
3 2 1
2 1
1
3 2 1
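I am not well versed in functions that reference the position of I in a data frame, or in the kind of data manipulation this requires. Any suggestions are appreciated.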
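Splitting the vector into groups is simple, but how you ultimately store the data depends on what you want to do with the result. Here is a simple way to split the data: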
vect <- unname(unlist(df)) # Convert the data to a simple vector
cut <- which(diff(vect) >= 0) # Find the points for splitting the vector
grps <- rep(seq_len(length(cut) + 1), diff(c(0, cut, length(vect)))) # Define the groups (one more than the number of split points)
groups <- split(vect, grps) # Create a list containing the groups
groups
# $`1`
# [1] 3 2 1
#
# $`2`
# [1] 2 1
#
# $`3`
# [1] 1
#
# $`4`
# [1] 3 2 1
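For readers who prefer to see the "if/then" rule from the question spelled out, the same grouping can be sketched as an explicit loop. This is only an illustration: the helper name split_runs_loop is hypothetical and not part of the original answer, and it assumes a plain numeric vector with no missing values.

```r
# Hypothetical helper (name chosen for illustration only):
# explicit-loop version of the "start a new row when the value
# does not decrease" rule. Assumes a numeric vector without NAs.
split_runs_loop <- function(v) {
  if (length(v) == 0) return(list())
  grp <- integer(length(v))
  grp[1] <- 1L
  for (i in seq_along(v)[-1]) {
    # Same group while the value decreases; otherwise open a new group
    grp[i] <- grp[i - 1] + (v[i] >= v[i - 1])
  }
  split(v, grp)
}

runs <- split_runs_loop(c(3, 2, 1, 2, 1, 1, 3, 2, 1))
runs
```

The vectorised diff()/which() version above is likely the better choice for long inputs; the loop is mainly useful for checking that the rule does what you expect.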
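Data frames and matrices require all columns to have the same length, so they cannot hold this result as-is. To build a matrix, we need to pad the shorter groups with missing values: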
maxno <- max(sapply(groups, length)) # How long is the longest run?
t(sapply(groups, function(x) c(x, rep(NA, maxno - length(x)))))
# [,1] [,2] [,3]
# 1 3 2 1
# 2 2 1 NA
# 3 1 NA NA
# 4 3 2 1
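If padding with NA is not wanted, one alternative is a long format that stores each value next to its group label. The sketch below uses base R's stack() on a list like the one produced above; the object names groups_demo and long are illustrative, and whether this layout suits you depends on what you plan to do with the result.

```r
# Illustrative list with the same shape as the split result above
groups_demo <- list(`1` = c(3, 2, 1), `2` = c(2, 1), `3` = 1, `4` = c(3, 2, 1))

# stack() returns a data frame with two columns:
#   values = the numbers, ind = a factor holding the group label
long <- stack(groups_demo)
long
```

Each original value appears once, so nothing has to be padded, and the groups can be recovered with split(long$values, long$ind).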
Here is a concise solution. Let me know if this solves your problem:
library(tidyverse)
df <- data.frame(matrix(ncol = 9, nrow = 1))
df[1,] <- c(3, 2, 1, 2, 1, 1, 3, 2, 1)
df %>%
  pivot_longer(cols = everything(), names_to = "vars") %>%
  mutate(smaller_than_prev = value < lag(value) | is.na(lag(value)),
         num_falses = cumsum(smaller_than_prev == FALSE)) %>%
  group_by(num_falses) %>%
  mutate(row_num = row_number()) %>%
  pivot_wider(names_from = row_num, values_from = value, values_fill = NA, names_prefix = "var") %>%
  fill(c(`var1`, `var2`, `var3`), .direction = "downup") %>%
  slice_head(n = 1) %>%
  ungroup() %>%
  select(`var1`, `var2`, `var3`)
We can split() on the cumsum() of the positions where diff() is non-negative, i.e. where value i >= value i - 1.
x <- df[1, ] |> unlist() # flatten the one-row data frame to a named numeric vector
r <- split(x, cumsum(c(1, diff(x)) >= 0))
r
# $`1`
# X1 X2 X3
# 3 2 1
#
# $`2`
# X4 X5
# 2 1
#
# $`3`
# X6
# 1
#
# $`4`
# X7 X8 X9
# 3 2 1
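To get a rectangular result, we equalize the lengths and rbind(). This returns a matrix; as.data.frame() converts it to a data frame.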
do.call(rbind, lapply(r, `length<-`, max(lengths(r))))
# X1 X2 X3
# 1 3 2 1
# 2 2 1 NA
# 3 1 NA NA
# 4 3 2 1
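Since rbind() on vectors yields a matrix, a final conversion step is needed if an actual data frame is required. The following is a minimal sketch of that step, rebuilding the example input inline so it runs on its own; the object names x_demo, r_demo, m and res are illustrative.

```r
# Rebuild the example input as a named numeric vector
x_demo <- c(X1 = 3, X2 = 2, X3 = 1, X4 = 2, X5 = 1, X6 = 1, X7 = 3, X8 = 2, X9 = 1)

# Split at non-negative differences, as in the answer above
r_demo <- split(x_demo, cumsum(c(1, diff(x_demo)) >= 0))

# Pad every group to the longest length, then bind the rows into a matrix
m <- do.call(rbind, lapply(r_demo, `length<-`, max(lengths(r_demo))))

# Convert the matrix to a data frame; column names come from the first group
res <- as.data.frame(m)
res
```

The column names (X1, X2, X3) are inherited from the first group, so they describe positions within a row rather than the original columns; renaming them with names(res) <- ... may be clearer.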
For the "small increases" the OP is talking about (value i within value i - 1 ± tol), this also works out of the box:
set.seed(424643)
(x2 <- x + rnorm(length(x), 0, .02))
# X1 X2 X3 X4 X5 X6 X7 X8 X9
# 2.9989375 1.9675093 0.9695195 2.0286091 0.9860200 0.9867120 3.0126058 2.0082577 1.0027076
split(x2, cumsum(c(1, diff(x2)) >= 0))
# $`1`
# X1 X2 X3
# 2.9989375 1.9675093 0.9695195
#
# $`2`
# X4 X5
# 2.028609 0.986020
#
# $`3`
# X6
# 0.986712
#
# $`4`
# X7 X8 X9
# 3.012606 2.008258 1.002708
To also treat near-equal values as a break, we can shift the zero comparison by a small tolerance, in this case -.02.
set.seed(219291)
(x2 <- x + rnorm(length(x), 0, .02))
# X1 X2 X3 X4 X5 X6 X7 X8 X9
# 2.9866361 2.0236431 1.0053049 2.0061573 1.0348428 1.0008761 3.0145685 2.0016665 0.9719804
split(x2, cumsum(c(1, diff(x2)) >= 0 + -.02))
# $`1`
# X1 X2 X3
# 3.0109922 2.0061321 0.9900378
#
# $`2`
# X4 X5
# 1.9728080 0.9973932
#
# $`3`
# X6
# 0.9829894
#
# $`4`
# X7 X8 X9
# 3.003697 1.997184 0.984649
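The effect of the tolerance is easier to see on fixed numbers than on random noise. Below is a small sketch that wraps the same split() call in a function; the helper name split_with_tol and its tol argument are assumptions made for illustration, and tol is expected to be a small non-negative number.

```r
# Hypothetical helper: `tol` is an assumed parameter name.
# A new group starts wherever the step to the next value is >= -tol,
# so tol = 0 reproduces the plain "non-negative diff" rule.
split_with_tol <- function(x, tol = 0) {
  split(x, cumsum(c(1, diff(x)) >= -tol))
}

x_demo <- c(3, 2, 1, 0.99, 3, 2)

strict <- split_with_tol(x_demo)             # 0.99 stays with 3, 2, 1
loose  <- split_with_tol(x_demo, tol = 0.02) # 0.99 starts its own group

lengths(strict)
lengths(loose)
```

With tol = 0 the tiny drop from 1 to 0.99 counts as a decrease, giving two groups; with tol = 0.02 it is treated as "no real decrease" and 0.99 is split off, giving three.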
Data:
df <- structure(list(X1 = 3, X2 = 2, X3 = 1, X4 = 2, X5 = 1, X6 = 1,
X7 = 3, X8 = 2, X9 = 1), row.names = c(NA, -1L), class = "data.frame")