How to check whether multiple columns contain valid percentages (no negative numbers or numbers greater than 1)



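I am trying to check whether multiple columns of a data frame contain valid percentages, that is, no negative numbers and no numbers greater than 1.

Below is a sample of the data, produced with the dput() function.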

structure(list(fightName = c("UFC Fight Night: Makhachev vs. Moises", 
"UFC Fight Night: Makhachev vs. Moises", "UFC Fight Night: Makhachev vs. Moises", 
"UFC Fight Night: Makhachev vs. Moises", "UFC Fight Night: Makhachev vs. Moises", 
"UFC Fight Night: Makhachev vs. Moises"), redFighterName = c("Alan Baudot", 
"Francisco Figueiredo", "Amanda Lemos", "Daniel Rodriguez", "Khalid Taha", 
"Gabriel Benitez"), redFighterHead = c(0.75, 0.57, 0.85, 0.8, 
0.27, 0.66), redFighterBody = c(0.16, 0.25, 0.14, 0.04, 0.36, 
0.22), redFighterLeg = c(0.08, 0.17, 0, 0.15, 0.36, 0.1), redFighterDistance = c(0.6, 
0.64, 0.85, 0.84, 0.9, 0.77), redFighterClinch = c(0.31, 0.14, 
0, 0.02, 0.09, 0.1), redFighterGround = c(0.08, 0.21, 0.14, 0.13, 
0, 0.12), redFighterResult = c("W", "W", "W", "W", "W", "W"), 
blueFighterName = c("Rodrigo Nascimento", "Malcolm Gordon", 
"Montserrat Conejo", "Preston Parsons", "Sergey Morozov", 
"Billy Quarantillo"), blueFighterHead = c(0.83, 0.86, 0.66, 
0.6, 0.9, 0.73), blueFighterBody = c(0.12, 0.04, 0.33, 0.17, 
0.04, 0.2), blueFighterLeg = c(0.04, 0.08, 0, 0.21, 0.06, 
0.07), blueFighterDistance = c(0.91, 0.47, 1, 1, 0.66, 0.61
), blueFighterClinch = c(0.08, 0.1, 0, 0, 0.12, 0.11), blueFighterGround = c(0, 
0.41, 0, 0, 0.22, 0.28), blueFighterResult = c("L", "L", 
"L", "L", "L", "L")), row.names = c(NA, 6L), class = "data.frame")

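I want to check whether redFighterHead, redFighterBody, and so on (all of which contain percentage data) hold valid percentages, that is, no negative numbers and no numbers greater than 1.

Can anyone think of a way to do this?

Update after the OP provided a reprex:

I would do it like this: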

library(tidyverse)
df %>%
  # keep only the numeric columns
  select(where(is.numeric)) %>%
  # one TRUE/FALSE per column: TRUE if every value lies in [0, 1]
  summarize(across(everything(), ~all(. >= 0 & . <= 1)))

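This tells you which columns satisfy your condition and which do not. Also note that I used the condition >= 0 and <= 1 rather than > 0 and < 1, because 0 and 1 are valid percentages!

Another note: I only checked the numeric columns against your condition and left out the character columns.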

redFighterHead redFighterBody redFighterLeg redFighterDistance redFighterClinch redFighterGround blueFighterHead blueFighterBody blueFighterLeg blueFighterDistance
1           TRUE           TRUE          TRUE               TRUE             TRUE             TRUE            TRUE            TRUE           TRUE                TRUE
blueFighterClinch blueFighterGround
1              TRUE              TRUE
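When some columns fail the check, it can help to list them by name. The following is a minimal base R sketch of the same idea, not the tidyverse code above; the data frame `toy` and its values are made up for illustration and are not the asker's data.

```r
# Hypothetical toy data with two deliberately invalid values
toy <- data.frame(
  name = c("a", "b", "c"),
  head = c(0.75, 0.57, 1.20),   # 1.20 is greater than 1
  body = c(0.16, -0.05, 0.14),  # -0.05 is negative
  leg  = c(0.08, 0.17, 0)
)

# Keep only the numeric columns
num_cols <- toy[vapply(toy, is.numeric, logical(1))]

# One TRUE/FALSE per column: TRUE if every value lies in [0, 1]
is_valid <- vapply(num_cols, function(x) all(x >= 0 & x <= 1), logical(1))

# Names of the columns that contain at least one invalid value
invalid_cols <- names(is_valid)[!is_valid]
print(invalid_cols)
```

Here `invalid_cols` contains "head" and "body", while "leg" passes because 0 counts as a valid percentage.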

If you only want to print a general message about your data, you can wrap this code in your if condition and use all:

if (all(df %>%
select(where(is.numeric)) %>%
summarize(across(everything(), ~all(. >= 0 & . <= 1))) == TRUE)) {
print("df has no values less than 0 or greater than 1")
} else {
print ("df only has values between 0 and 1")
}

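Additional note: your current print statements say essentially the same thing. The first states that all values are between 0 and 1, and the second states exactly the same.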
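One way to make the two branches say different things is sketched below in base R. The helper name `check_percent_columns` and the message wording are assumptions made for illustration, not part of the original code.

```r
# Hypothetical helper: returns a message describing whether every numeric
# column of `data` contains only values in [0, 1]
check_percent_columns <- function(data) {
  # Keep only the numeric columns
  num_cols <- data[vapply(data, is.numeric, logical(1))]
  # TRUE only if every numeric column passes the range check
  ok <- all(vapply(num_cols, function(x) all(x >= 0 & x <= 1), logical(1)))
  if (ok) {
    "all numeric columns contain only values between 0 and 1"
  } else {
    "at least one numeric column contains a value below 0 or above 1"
  }
}

# Made-up example data
good <- data.frame(id = c("x", "y"), p = c(0, 1))
bad  <- data.frame(id = c("x", "y"), p = c(0.5, 1.5))

msg_good <- check_percent_columns(good)
msg_bad  <- check_percent_columns(bad)
print(msg_good)
print(msg_bad)
```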


Old version:

Assuming your data is called df and your column is called Percentages, you could do:

if (all(df$Percentages > 0 & df$Percentages < 1)) {
print("df has no values less than 0 or greater than 1")
} else {
print ("df only has values between 0 and 1")
}
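One caveat applies to checks of this kind: if a column contains missing values, all() can return NA instead of TRUE or FALSE, and if (NA) raises an error. The sketch below uses made-up vectors to show the behaviour; whether to ignore missing values with na.rm = TRUE is an assumption about what you want, not something the question specifies.

```r
# Made-up vectors for illustration
x <- c(0.2, NA, 0.9)   # in range, but with a missing value
y <- c(0.2, NA, 1.5)   # contains an out-of-range value

# The missing value makes the result undetermined
res_x <- all(x >= 0 & x <= 1)

# Ignoring missing values gives a definite answer
res_x_narm <- all(x >= 0 & x <= 1, na.rm = TRUE)

# A single FALSE is enough to decide the result, even with NA present
res_y <- all(y >= 0 & y <= 1)

# isTRUE() treats NA as "not valid", which keeps if () from failing
safe_x <- isTRUE(res_x)
```

Here `res_x` is NA, `res_x_narm` is TRUE, `res_y` is FALSE, and `safe_x` is FALSE.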
