r语言 - 按行排序日期



我有以下数据集:

my_data <- structure(list(id = 1:5, d1 = structure(c(16764, 11375, 13873, 
10665, 14395), class = "Date"), d2 = structure(c(14487, 14709, 
11417, 13531, 13457), class = "Date"), d3 = structure(c(15706, 
13542, 16722, 17001, 12621), class = "Date"), d4 = structure(c(14628, 
11210, 14134, 14597, 15119), class = "Date"), d5 = structure(c(10664, 
18150, 15536, 16109, 12236), class = "Date")), row.names = c(NA, 
5L), class = "data.frame")
id         d1         d2         d3         d4         d5
1  1 2015-11-25 2009-08-31 2013-01-01 2010-01-19 1999-03-14
2  2 2001-02-22 2010-04-10 2007-01-29 2000-09-10 2019-09-11
3  3 2007-12-26 2001-04-05 2015-10-14 2008-09-12 2012-07-15
4  4 1999-03-15 2007-01-18 2016-07-19 2009-12-19 2014-02-08
5  5 2009-05-31 2006-11-05 2004-07-22 2011-05-25 2003-07-03
  • 对于每一行,我希望分配给d1的5个日期中最小的一个,分配给d2的5个日期中第二小的一个,分配给d3的5个日期中第三小的一个,分配给d4的5个日期中第四小的一个,分配给d5的5个日期中最小的一个。

这看起来像这样(例如第一行-重复其余99行):

id         d1         d2         d3         d4         d5
1  1 1999-03-14 2009-08-31 2010-01-19  2013-01-01 2015-11-25

我试着在Stackoverflow上寻找类似的例子,但到目前为止,我找不到任何与我想要完成的内容相匹配的内容(例如对数据帧的每一行进行排序)。

谁能告诉我怎么做这个?

谢谢!

注意:最近的例子我得到的工作(不涉及日期):

a = rnorm(100,100,100)
b = rnorm(100,100, 100)
c = rnorm(100,100,100)
d = rnorm(100,100,100)
e = rnorm(100,100,100)
df = data.frame(a,b,c,d,e)
head(df)
a          b         c         d         e
1  56.48320 -55.274406  83.32993 -96.55970 181.65859
2  94.17800  88.084876  31.58830 -44.06487 156.85318
3 257.56950  28.500542  34.76845  57.00857  51.76036
4 100.22489   6.946803  60.88848 116.18413  37.34444
5  21.77935  -7.538119  89.29565 -67.28311  43.98728
6 200.18950  -1.555829 123.91148 106.45983 107.50339
# now sort them (check if for each row, numbers increase from left to right)
n = data.frame(t(apply(df, 1, sort)))
X1         X2        X3        X4        X5
1 -96.559698 -55.274406  56.48320  83.32993 181.65859
2 -44.064873  31.588295  88.08488  94.17800 156.85318
3  28.500542  34.768449  51.76036  57.00857 257.56950
4   6.946803  37.344439  60.88848 100.22489 116.18413
5 -67.283113  -7.538119  21.77935  43.98728  89.29565
6  -1.555829 106.459828 107.50339 123.91148 200.18950

apply方法也适用于日期。它们只是被强迫到一个字符矩阵,但是我们可以强迫as.data.framelapplyas.Date在它上面。

my_data[-1] <- as.data.frame(t(apply(my_data[-1], 1, sort))) |> lapply(as.Date)

my_data
#   id         d1         d2         d3         d4         d5
# 1  1 1999-03-14 2009-08-31 2010-01-19 2013-01-01 2015-11-25
# 2  2 2000-09-10 2001-02-22 2007-01-29 2010-04-10 2019-09-11
# 3  3 2001-04-05 2007-12-26 2008-09-12 2012-07-15 2015-10-14
# 4  4 1999-03-15 2007-01-18 2009-12-19 2014-02-08 2016-07-19
# 5  5 2003-07-03 2004-07-22 2006-11-05 2009-05-31 2011-05-25

,

str(my_data)
# 'data.frame': 5 obs. of  6 variables:
# $ id: int  1 2 3 4 5
# $ d1: Date, format: "1999-03-14" "2000-09-10" "2001-04-05" "1999-03-15" ...
# $ d2: Date, format: "2009-08-31" "2001-02-22" "2007-12-26" "2007-01-18" ...
# $ d3: Date, format: "2010-01-19" "2007-01-29" "2008-09-12" "2009-12-19" ...
# $ d4: Date, format: "2013-01-01" "2010-04-10" "2012-07-15" "2014-02-08" ...
# $ d5: Date, format: "2015-11-25" "2019-09-11" "2015-10-14" "2016-07-19" ...

Base R:

在这里,我们对my_data的所有列进行排序后,将my_data转置,但使用apply函数对第一列进行排序。最后我们用cbind(my_data[1], ...

添加第一列
my_data_changed <- cbind(my_data[1], t(apply(my_data[-1], 1, sort)))
my_data_changed <- setnames(my_data_changed, colnames(my_data))
id         d1         d2         d3         d4         d5
1  1 1999-03-14 2009-08-31 2010-01-19 2013-01-01 2015-11-25
2  2 2000-09-10 2001-02-22 2007-01-29 2010-04-10 2019-09-11
3  3 2001-04-05 2007-12-26 2008-09-12 2012-07-15 2015-10-14
4  4 1999-03-15 2007-01-18 2009-12-19 2014-02-08 2016-07-19
5  5 2003-07-03 2004-07-22 2006-11-05 2009-05-31 2011-05-25

我们可以先使用pivot_longer,然后使用sort,再使用pivot_wider

技巧:mutate(value = sort(value)只改变value

library(dplyr)
library(tidyr)
my_data %>% 
pivot_longer(
-id
) %>% 
group_by(id) %>% 
mutate(value = sort(value)) %>% 
pivot_wider()
id d1         d2         d3         d4         d5        
<int> <date>     <date>     <date>     <date>     <date>    
1     1 1999-03-14 2009-08-31 2010-01-19 2013-01-01 2015-11-25
2     2 2000-09-10 2001-02-22 2007-01-29 2010-04-10 2019-09-11
3     3 2001-04-05 2007-12-26 2008-09-12 2012-07-15 2015-10-14
4     4 1999-03-15 2007-01-18 2009-12-19 2014-02-08 2016-07-19
5     5 2003-07-03 2004-07-22 2006-11-05 2009-05-31 2011-05-25

最新更新