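Finding the maximum number in a string column of an R data frame

For each cell in a particular column of a data frame (which we will simply call df here), I want to find the maximum and minimum of the numbers that are embedded in the cell's string and are themselves represented as strings. Commas in a cell have no special meaning. A number must not be a percentage, so if, for example, 50% appears, then 50 is excluded from consideration. The relevant column of the data frame looks like this: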

| particular_col_name | 
| ------------------- | 
| First Row String10. This is also a string_5, and so is this 20, exclude70% |
| Second_Row_50%, number40. Number 4. number_15|

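So two new columns titled "maximum_number" and "minimum_number" should be created; for the first row they should contain 20 and 5, respectively. Note that 70 is excluded because it has a % sign next to it. Likewise, the second row should put 40 and 4 into the new columns.

I have tried several approaches inside the dplyr mutate operator (e.g. str_extract_all, regmatches, strsplit), but they either give error messages (particularly about the input column particular_col_name) or do not output the data in a format from which the maximum and minimum can readily be identified.

Any help would be appreciated.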

library(tidyverse)
tibble(
  particular_col_name = c(
    "First Row String10. This is also a string_5, and so is this 20, exclude70%",
    "Second_Row_50%, number40. Number 4. number_15",
    "20% 30%"
  )
) %>%
  mutate(
    numbers = particular_col_name %>% map(~ {
      .x %>% str_remove_all("[0-9]+%") %>% str_extract_all("[0-9]+") %>% simplify() %>% as.numeric()
    }),
    min = numbers %>% map_dbl(~ .x %>% min() %>% na_if(Inf) %>% na_if(-Inf)),
    max = numbers %>% map_dbl(~ .x %>% max() %>% na_if(Inf) %>% na_if(-Inf))
  ) %>%
  select(-numbers)
#> Warning in min(.): no non-missing arguments to min; returning Inf
#> Warning in max(.): no non-missing arguments to max; returning -Inf
#> # A tibble: 3 x 3
#>   particular_col_name                                                  min   max
#>   <chr>                                                              <dbl> <dbl>
#> 1 First Row String10. This is also a string_5, and so is this 20, e…     5    20
#> 2 Second_Row_50%, number40. Number 4. number_15                          4    40
#> 3 20% 30%                                                               NA    NA

Created on 2022-02-22 by the reprex package (v2.0.0)
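The two warnings in the output above come from calling min() and max() on an empty vector, which happens for the "20% 30%" row once its percentages are removed. A minimal sketch of one way to avoid them is to check the length before taking the minimum or maximum. The helper names extract_non_percent and safe_range are hypothetical and not part of the original answer.

```r
library(stringr)

# Hypothetical helper: all numbers in each string that are not percentages.
extract_non_percent <- function(x) {
  cleaned <- str_remove_all(x, "[0-9]+%")   # drop tokens such as "70%"
  lapply(str_extract_all(cleaned, "[0-9]+"), as.numeric)
}

# Hypothetical helper: apply f (min or max), but return NA for an empty
# vector instead of Inf / -Inf with a warning.
safe_range <- function(v, f) if (length(v) == 0) NA_real_ else f(v)

x <- c(
  "First Row String10. This is also a string_5, and so is this 20, exclude70%",
  "Second_Row_50%, number40. Number 4. number_15",
  "20% 30%"
)

nums <- extract_non_percent(x)
mins <- vapply(nums, safe_range, numeric(1), f = min)
maxs <- vapply(nums, safe_range, numeric(1), f = max)
```

Because the empty case is handled before min() or max() is called, the na_if(Inf) and na_if(-Inf) steps are not needed in this variant.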

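We can combine str_extract_all with sapply. Note that the pattern used here matches every run of digits, including those followed by %, so the maxima it reports are 70 and 50 rather than 20 and 40: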

library(stringr)
df$min <- sapply(str_extract_all(df$particular_col_name, "[0-9]+"), function(x) min(as.integer(x)))
df$max <- sapply(str_extract_all(df$particular_col_name, "[0-9]+"), function(x) max(as.integer(x)))

particular_col_name                                                          min   max
<chr>                                                                      <int> <int>
1 First Row String10. This is also a string_5, and so is this 20, exclude70%     5    70
2 Second_Row_50%, number40. Number 4. number_15                                  4    50
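The output above reports 70 and 50 as maxima because "[0-9]+" also matches digits that sit directly before a % sign. One possible adjustment, sketched here as an assumption and not as the answerer's own code, is a negative lookahead. The variable names x, nums, min_number and max_number are illustrative.

```r
library(stringr)

x <- c(
  "First Row String10. This is also a string_5, and so is this 20, exclude70%",
  "Second_Row_50%, number40. Number 4. number_15"
)

# A run of digits that is NOT followed by another digit or by "%".
# Including [0-9] in the lookahead prevents the regex from backtracking
# and matching only the "7" of "70%".
nums <- str_extract_all(x, "[0-9]+(?![0-9%])")

min_number <- sapply(nums, function(v) min(as.integer(v)))
max_number <- sapply(nums, function(v) max(as.integer(v)))
```

With the two example rows this yields 5 and 4 for the minima and 20 and 40 for the maxima. A row with no qualifying numbers would still trigger the empty-vector warning from min() and max(), so a length check may be worth adding for real data.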

Data:

df <- structure(list(particular_col_name = c("First Row String10. This is also a string_5, and so is this 20, exclude70%", 
"Second_Row_50%, number40. Number 4. number_15"), min = 5:4, 
max = c(70L, 50L)), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"))
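Since the question also mentions regmatches, here is a base R sketch of the task that needs no extra packages. It rebuilds df from the two example strings as a plain data.frame (an assumption made to keep the example self-contained) and writes the column names requested in the question.

```r
df <- data.frame(
  particular_col_name = c(
    "First Row String10. This is also a string_5, and so is this 20, exclude70%",
    "Second_Row_50%, number40. Number 4. number_15"
  ),
  stringsAsFactors = FALSE
)

# perl = TRUE enables the negative lookahead: digits not followed by
# another digit or by "%".
hits <- regmatches(
  df$particular_col_name,
  gregexpr("[0-9]+(?![0-9%])", df$particular_col_name, perl = TRUE)
)
nums <- lapply(hits, as.integer)

# Return NA for rows without any qualifying number.
df$maximum_number <- vapply(
  nums, function(v) if (length(v)) max(v) else NA_integer_, integer(1)
)
df$minimum_number <- vapply(
  nums, function(v) if (length(v)) min(v) else NA_integer_, integer(1)
)
```

The list returned by regmatches has one element per row, which is why it has to be reduced with vapply before it can be stored as an ordinary column.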
