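I have a dataframe with multiple rows and columns, and I want to check whether each value in a row is higher than the value in that row's last column. My data looks like this: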
date ABI.BR NOVN.S ROG.S NESN.S NOVOb.CO HRMS.PA TTEF.PA OREP.PA SASY.PA LVMH.PA MAX
1 31.12.2001 7342987743 21072681660 19737660401 57324018814 3199830778 1226900000 105895000000 13740400000 6.4880e+09 1.2229e+10 163033294916
2 31.12.2002 6280730128 17748921146 20316428627 61501808862 3350848339 1242300000 92108991318 14288000000 7.4480e+09 1.2693e+10 144604417185
3 31.12.2003 6324404232 19755193920 20002601887 56367998444 3510991763 1230000000 93961037997 14029100000 8.0480e+09 1.1962e+10 122995279989
4 31.12.2004 7850834622 20947725570 19134043533 54889985325 3905732023 1331400000 87346032957 13641300000 1.5733e+10 1.2481e+10 123515812436
5 31.12.2005 12233279545 26451714210 22833829836 58587603997 4527870421 1427400000 122854000000 14532500000 2.8513e+10 1.3910e+10 171566971202
6 31.12.2006 12660668341 26602920050 26148301223 61238063838 5198638861 1514900000 126235000000 15790100000 2.9489e+10 1.5306e+10 173975809100
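My goal is to check, for every row, whether the values in columns 2:10 are higher than the corresponding value in column 11 (i.e. MAX). If a value in some column is higher than the value in the MAX column for that row, I want to overwrite it with the MAX value.
For example, if I have the following data: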
a b c MAX
2 3 4 3
4 5 6 6
6 7 9 8
I would like the corrected data to look like this:
a b c MAX
2 3 **3** 3
4 5 6 6
6 7 **8** 8
Thank you very much for your help!
Using your short example and the dplyr library, you can do:
Reprex
- Data
df <- structure(list(a = c(2L, 4L, 6L), b = c(3L, 5L, 7L), c = c(4L,
6L, 9L), MAX = c(3L, 6L, 8L)), class = "data.frame", row.names = c(NA,
-3L))
- Code
library(dplyr)
df %>% transmute(across(.cols = 1:3, ~ ifelse(.x > MAX, MAX, .x)))
- Output
#> a b c
#> 1 2 3 3
#> 2 4 5 6
#> 3 6 7 8
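Created on 2022-03-31 by the reprex package (v2.0.1)
Edit: Using your real data, and following @langtang's comment about keeping the first and last columns, you can do the following: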
Reprex
- Data
df <- structure(list(date = c("31.12.2001", "31.12.2002", "31.12.2003",
"31.12.2004", "31.12.2005", "31.12.2006"), ABI.BR = c(7342987743,
6280730128, 6324404232, 7850834622, 12233279545, 12660668341),
NOVN.S = c(21072681660, 17748921146, 19755193920, 20947725570,
26451714210, 26602920050), ROG.S = c(19737660401, 20316428627,
20002601887, 19134043533, 22833829836, 26148301223), NESN.S = c(57324018814,
61501808862, 56367998444, 54889985325, 58587603997, 61238063838
), NOVOb.CO = c(3199830778, 3350848339, 3510991763, 3905732023,
4527870421, 5198638861), HRMS.PA = c(1226900000L, 1242300000L,
1230000000L, 1331400000L, 1427400000L, 1514900000L), TTEF.PA = c(1.05895e+11,
92108991318, 93961037997, 87346032957, 1.22854e+11, 1.26235e+11
), OREP.PA = c(13740400000, 1.4288e+10, 14029100000, 13641300000,
14532500000, 15790100000), SASY.PA = c(6.488e+09, 7.448e+09,
8.048e+09, 1.5733e+10, 2.8513e+10, 2.9489e+10), LVMH.PA = c(1.2229e+10,
1.2693e+10, 1.1962e+10, 1.2481e+10, 1.391e+10, 1.5306e+10
), MAX = c(163033294916, 144604417185, 122995279989, 123515812436,
171566971202, 173975809100)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
- Code
library(dplyr)
df %>% mutate(across(.cols = 2:10, ~ ifelse(.x > MAX, MAX, .x)))
- Output
#> date ABI.BR NOVN.S ROG.S NESN.S NOVOb.CO
#> 1 31.12.2001 7342987743 21072681660 19737660401 57324018814 3199830778
#> 2 31.12.2002 6280730128 17748921146 20316428627 61501808862 3350848339
#> 3 31.12.2003 6324404232 19755193920 20002601887 56367998444 3510991763
#> 4 31.12.2004 7850834622 20947725570 19134043533 54889985325 3905732023
#> 5 31.12.2005 12233279545 26451714210 22833829836 58587603997 4527870421
#> 6 31.12.2006 12660668341 26602920050 26148301223 61238063838 5198638861
#> HRMS.PA TTEF.PA OREP.PA SASY.PA LVMH.PA MAX
#> 1 1226900000 105895000000 13740400000 6.4880e+09 1.2229e+10 163033294916
#> 2 1242300000 92108991318 14288000000 7.4480e+09 1.2693e+10 144604417185
#> 3 1230000000 93961037997 14029100000 8.0480e+09 1.1962e+10 122995279989
#> 4 1331400000 87346032957 13641300000 1.5733e+10 1.2481e+10 123515812436
#> 5 1427400000 122854000000 14532500000 2.8513e+10 1.3910e+10 171566971202
#> 6 1514900000 126235000000 15790100000 2.9489e+10 1.5306e+10 173975809100
Created on 2022-03-31 by the reprex package (v2.0.1)
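Selecting columns by position (2:10) breaks if the column order ever changes, so it may be safer to exclude the first and last columns by name instead. The following is a minimal sketch of that idea, assuming a small hypothetical data frame `toy` shaped like the real data (a date column, some value columns, then MAX); the name `result` is also only illustrative.

```r
library(dplyr)

# Hypothetical toy data shaped like the question's:
# a date column, value columns, then MAX
toy <- data.frame(
  date = c("31.12.2001", "31.12.2002", "31.12.2003"),
  a = c(2, 4, 6),
  b = c(3, 5, 7),
  c = c(4, 6, 9),
  MAX = c(3, 6, 8)
)

# Select every column except date and MAX by name,
# then replace any value above MAX with MAX
result <- toy %>%
  mutate(across(.cols = -c(date, MAX), ~ ifelse(.x > MAX, MAX, .x)))
result
```

Because `mutate()` is used rather than `transmute()`, the date and MAX columns are kept unchanged in the result.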
cols = c('a', 'b', 'c')
# \(x) is the anonymous-function shorthand (R >= 4.1); function(x) works in older versions
df[cols] = lapply(df[cols], \(x) ifelse(x > df$MAX, df$MAX, x))
# a b c MAX
# 1 2 3 3 3
# 2 4 5 6 6
# 3 6 7 8 8
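As a possible alternative to the `ifelse()` comparison in the `lapply()` approach above, base R's `pmin()` computes the element-wise minimum of two vectors, which is the same as capping each value at MAX. This is only a sketch on the short example from the question, not something the answers themselves propose.

```r
# Short example from the question
df <- data.frame(a = c(2, 4, 6), b = c(3, 5, 7), c = c(4, 6, 9), MAX = c(3, 6, 8))

cols <- c("a", "b", "c")

# pmin(x, df$MAX) takes the element-wise minimum of each column and MAX,
# so any value above MAX is replaced by MAX
df[cols] <- lapply(df[cols], pmin, df$MAX)
df
```

With the default `na.rm = FALSE`, `pmin()` returns NA wherever either input is NA, which matches how `ifelse()` behaves on a missing comparison.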