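My goal is to filter my data if the string "Code" appears in any column. The filter should keep everything from the first occurrence of this word onward.
I am working with a list, and the position of the column that contains "Code" varies. So I need a general way to do this, something like my attempt below.
Some sample data: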
# A tibble: 11 × 10
`Title:` `A&E weekly activity statistics, NHS…` ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Contact: Paul Steele - Unify2@dh.gsi.gov.uk NA NA NA NA NA NA NA NA
2 NA NA NA NA NA NA NA NA NA NA
3 Provider Level Data NA NA NA NA NA NA NA NA NA
4 NA NA NA A&E … NA NA NA A&E … NA NA
5 SHA Code Name Type… Type… Type… Tota… Type… Type… Type…
6 - - Engl… 2831… 12906 1369… 4330… 15347 34 172
7 NA NA NA NA NA NA NA NA NA NA
8 Q30 RLN City… 1423 669 297 2389 202 0 18
9 Q30 RXP Coun… 2473 0 2088 4561 89 0 0
10 Q30 5J9 Darl… 0 0 0 0 0 0 0
11 Q30 RR7 Gate… 1251 0 0 1251 24 0 0
My attempt:
vf %>% filter(row_number() >= across(everything(), ~ .x %>% which(. == 'Code')))
I get these errors:
Error in `filter()`:
! Problem while computing `..1 = row_number() >= ...`.
Caused by error in `across()`:
! Problem while computing column `Title:`.
Caused by error in `which()`:
! argument to 'which' is not logical
Run `rlang::last_error()` to see where the error occurred.
Expected output:
# A tibble: 7 × 10
`Title:` `A&E weekly activity statistics, NHS and indepen…` ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 SHA Code Name Type… Type… Type… Tota… Type… Type… Type…
2 - - Engl… 2831… 12906 1369… 4330… 15347 34 172
3 NA NA NA NA NA NA NA NA NA NA
4 Q30 RLN City… 1423 669 297 2389 202 0 18
5 Q30 RXP Coun… 2473 0 2088 4561 89 0 0
6 Q30 5J9 Darl… 0 0 0 0 0 0 0
7 Q30 RR7 Gate… 1251 0 0 1251 24 0 0
Data to copy:
structure(list(`Title:` = c("Contact:", NA, "Provider Level Data",
NA, "SHA", "-", NA, "Q30", "Q30", "Q30", "Q30"), `A&E weekly activity statistics, NHS and independent sector organisations in England` = c("Paul Steele - Unify2@dh.gsi.gov.uk",
NA, NA, NA, "Code", "-", NA, "RLN", "RXP", "5J9", "RR7"), ...3 = c(NA,
NA, NA, NA, "Name", "England", NA, "City Hospitals Sunderland NHS Foundation Trust",
"County Durham And Darlington NHS Foundation Trust", "Darlington PCT",
"Gateshead Health NHS Foundation Trust"), ...4 = c(NA, NA, NA,
"A&E attendances", "Type 1 Departments - Major A&E", "283175",
NA, "1423", "2473", "0", "1251"), ...5 = c(NA, NA, NA, NA, "Type 2 Departments - Single Specialty",
"12906", NA, "669", "0", "0", "0"), ...6 = c(NA, NA, NA, NA,
"Type 3 Departments - Other A&E/Minor Injury Unit", "136985",
NA, "297", "2088", "0", "0"), ...7 = c(NA, NA, NA, NA, "Total attendances",
"433066", NA, "2389", "4561", "0", "1251"), ...8 = c(NA, NA,
NA, "A&E attendances > 4 hours from arrival to admission, transfer or discharge",
"Type 1 Departments - Major A&E", "15347", NA, "202", "89", "0",
"24"), ...9 = c(NA, NA, NA, NA, "Type 2 Departments - Single Specialty",
"34", NA, "0", "0", "0", "0"), ...10 = c(NA, NA, NA, NA, "Type 3 Departments - Other A&E/Minor Injury Unit",
"172", NA, "18", "0", "0", "0")), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame"))
The following attempt seems to get what I want:
data %>% filter(row_number() >= which(keep(.,~any(which(.=='Code'))) == 'Code'))
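However, I am sure there is a cleaner approach than this!
What I did:
- Filter for row numbers greater than or equal to the position at which "Code" appears in any column. keep() retains only the column that contains "Code", so that filter() can compare against a single index.
Output: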
# A tibble: 7 × 10
`Title:` `A&E weekly activity statistics, NHS and indepen…` ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 SHA Code Name Type… Type… Type… Tota… Type… Type… Type…
2 - - Engl… 2831… 12906 1369… 4330… 15347 34 172
3 NA NA NA NA NA NA NA NA NA NA
4 Q30 RLN City… 1423 669 297 2389 202 0 18
5 Q30 RXP Coun… 2473 0 2088 4561 89 0 0
6 Q30 5J9 Darl… 0 0 0 0 0 0 0
7 Q30 RR7 Gate… 1251 0 0 1251 24 0 0
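To make the role of keep() more concrete, here is a minimal sketch on a hypothetical two-column data frame (toy, code_col and first_row are illustrative names, not part of the original data). It assumes purrr is installed and uses %in% instead of == so that NA values are simply treated as non-matches.

```r
library(purrr)

# Hypothetical toy data: "Code" sits in the second column, second row.
toy <- data.frame(
  a = c("x", "SHA", "Q30"),
  b = c(NA, "Code", "RLN"),
  stringsAsFactors = FALSE
)

# keep() retains only the columns for which the predicate is TRUE.
code_col <- keep(toy, ~ any(.x %in% "Code"))
names(code_col)
#> [1] "b"

# Position of the first "Code" within that column.
first_row <- which(code_col[[1]] %in% "Code")[1]
first_row
#> [1] 2
```

This sketch only covers the case where exactly one column contains "Code"; with several matching columns, code_col[[1]] would silently pick the first of them.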
The function cumany() can do this for you.
It returns TRUE from the first occurrence of "Code" onward, including the row where it occurs:
df |>
filter(if_any(everything(), ~ cumany(.x %in% "Code")))
`Title:` `A&E weekly activity statistics, NHS and independent sector organisations in England` ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 SHA Code Name Type… Type… Type… Tota… Type… Type… Type…
2 - - England 2831… 12906 1369… 4330… 15347 34 172
3 NA NA NA NA NA NA NA NA NA NA
4 Q30 RLN City Hospita… 1423 669 297 2389 202 0 18
5 Q30 RXP County Durha… 2473 0 2088 4561 89 0 0
6 Q30 5J9 Darlington P… 0 0 0 0 0 0 0
7 Q30 RR7 Gateshead He… 1251 0 0 1251 24 0 0
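As a small illustration of what cumany() does inside that filter, here is a sketch on a single hypothetical vector (x and hit are illustrative names, not taken from the data above):

```r
library(dplyr)

# Hypothetical toy column, including missing values.
x <- c(NA, "SHA", "Code", "-", NA, "RLN")

# %in% never returns NA, so missing values count as "no match".
hit <- x %in% "Code"
hit
#> [1] FALSE FALSE  TRUE FALSE FALSE FALSE

# cumany() switches to TRUE at the first TRUE and stays TRUE afterwards.
cumany(hit)
#> [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE
```

Because if_any() then combines these per-column results row by row, a row is kept as soon as any column has seen "Code" at or before that row.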
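You can use str_detect() to detect "Code" across all columns, combined with if_any() to check whether any column contains the string, and then drop every row before the first matching row, like this (thanks to @Darren Tsai):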
library(stringr)
library(dplyr)
vf %>%
filter(row_number() >= which.max(if_any(everything(), str_detect, "Code")))
#> # A tibble: 7 × 10
#> `Title:` `A&E weekly activit…` ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 SHA Code Name Type… Type… Type… Tota… Type… Type… Type…
#> 2 - - Engl… 2831… 12906 1369… 4330… 15347 34 172
#> 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 4 Q30 RLN City… 1423 669 297 2389 202 0 18
#> 5 Q30 RXP Coun… 2473 0 2088 4561 89 0 0
#> 6 Q30 5J9 Darl… 0 0 0 0 0 0 0
#> 7 Q30 RR7 Gate… 1251 0 0 1251 24 0 0
Created on 2022-07-01 by the reprex package (v2.0.1)
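The which.max() step may be worth spelling out. On a logical vector it returns the index of the first TRUE, ignoring NA values. The following base R sketch uses a hypothetical flag vector in place of the if_any() result:

```r
# Hypothetical per-row result, standing in for if_any(everything(), ...).
flag <- c(FALSE, NA, FALSE, TRUE, FALSE, TRUE)

# Index of the first TRUE; NA entries are discarded.
first_hit <- which.max(flag)
first_hit
#> [1] 4

# Rows at or after that index are the ones filter() would keep.
keep_rows <- seq_along(flag) >= first_hit
keep_rows
#> [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE

# Caveat: with no TRUE at all, which.max() returns 1, so nothing is dropped.
which.max(c(FALSE, FALSE))
#> [1] 1
```

If a data frame in the list might not contain "Code" at all, it is probably safer to check any(flag, na.rm = TRUE) before filtering.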
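Old answer:
You can use str_detect() to detect "Code" across all columns and drop all rows before that row, as follows. Note that the output below has 6 rows rather than the expected 7, because the all-NA row is dropped as well: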
library(stringr)
library(dplyr)
vf %>%
filter(row_number() > if_all(everything(), ~!str_detect(., "Code")))
#> # A tibble: 6 × 10
#> `Title:` `A&E weekly activit…` ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 SHA Code Name Type… Type… Type… Tota… Type… Type… Type…
#> 2 - - Engl… 2831… 12906 1369… 4330… 15347 34 172
#> 3 Q30 RLN City… 1423 669 297 2389 202 0 18
#> 4 Q30 RXP Coun… 2473 0 2088 4561 89 0 0
#> 5 Q30 5J9 Darl… 0 0 0 0 0 0 0
#> 6 Q30 RR7 Gate… 1251 0 0 1251 24 0 0
Created on 2022-07-01 by the reprex package (v2.0.1)