r - Extracting numeric values from the column names of a data frame

I have the following data:

library(magrittr)
dat_I <- structure(list(`[0,25)` = c(0L, 2L, 252L, 3L, 34L, 0L, 2L, 65L, 
23L, 9L, 84L, 24L, 52L, 5L, 1L, 91L, 5L, 4L, 7L, 5L, 40L, 116L, 
12L), `[1000,1500)` = c(0L, 12L, 16L, 0L, 34L, 1L, 0L, 7L, 0L, 
0L, 2L, 0L, 4L, 11L, 1L, 0L, 0L, 6L, 8L, 0L, 2L, 8L, 0L), `[1500,1000000)` = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0), `[1500,3000)` = c(8L, 5L, 8L, 0L, 16L, 2L, 10L, 4L, 5L, 0L, 
4L, 3L, 0L, 6L, 4L, 0L, 49L, 7L, 6L, 0L, 1L, 2L, 0L), `[25,1000)` = c(0L, 
22L, 48L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 25L, 27L, 0L, 0L, 28L, 0L), `[25,1500)` = c(15L, 0L, 0L, 
0L, 0L, 0L, 23L, 0L, 23L, 0L, 0L, 25L, 0L, 0L, 0L, 0L, 5L, 0L, 
0L, 0L, 0L, 0L, 0L), `[25,250)` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 42L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L), `[25,3000)` = c(0L, 0L, 0L, 33L, 0L, 0L, 0L, 0L, 0L, 63L, 
0L, 0L, 0L, 0L, 0L, 29L, 0L, 0L, 0L, 34L, 0L, 0L, 83L), `[25,500)` = c(0L, 
0L, 0L, 0L, 213L, 24L, 0L, 23L, 0L, 0L, 25L, 0L, 21L, 107L, 0L, 
0L, 0L, 0L, 0L, 0L, 23L, 0L, 0L), `[250,500)` = c(0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L), `[3000,1000000)` = c(2L, 1L, 1L, 7L, 1L, 0L, 
2L, 1L, 5L, 25L, 5L, 1L, 0L, 3L, 0L, 4L, 7L, 2L, 5L, 17L, 0L, 
5L, 19L), `[500,1000)` = c(0L, 0L, 0L, 0L, 122L, 9L, 0L, 11L, 
0L, 0L, 7L, 0L, 6L, 44L, 3L, 0L, 0L, 0L, 0L, 0L, 7L, 0L, 0L)), class = "data.frame", row.names = c("A", 
"B", "C", "D", 
"E", "F", "G", 
"H", "I", "J", "K", 
"L", "M", "N", 
"O", "P", "Q", 
"R", "S", "T", "U", 
"V", "W"))
dat_II <- structure(list(`[0,25)` = 5L, `[100,250)` = 43L, `[100,500)` = 0L, 
`[1000,1000000]` = 20L, `[1000,1500)` = 0L, `[1500,3000)` = 0L, 
`[25,100)` = 38L, `[25,50)` = 0L, `[250,500)` = 27L, `[3000,1000000]` = 0L, 
`[50,100)` = 0L, `[500,1000)` = 44L, `[500,1000000]` = 0L), row.names = "Type_A", class = "data.frame")

I would like to apply the following code:

s_ordered_II <- stringi::stri_extract_all_regex(colnames(dat_II), "[[:alpha:]]+") %>%
unlist() %>% 
unique() %>% 
sort()
s_ordered_I <- stringi::stri_extract_all_regex(colnames(dat_I), "[[:alpha:]]+") %>%
unlist() %>% 
unique() %>% 
sort()

For some reason it does not work, even though similar code has worked for me before. I do not understand why.

Could someone comment on this?

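You are using "[[:alpha:]]+", which matches runs of alphabetic characters (the union of [:lower:] and [:upper:]). Your column names contain no letters, so there is nothing for it to match. If you want the numbers, use "[[:digit:]]+" instead (or "[[:alnum:]]+", which matches both letters and digits). The two relevant entries from ?regex are quoted below; see that help page for the rest: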

'[:alpha:]' Alphabetic characters: '[:lower:]' and '[:upper:]'.
'[:digit:]' Digits: '0 1 2 3 4 5 6 7 8 9'.
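
As a quick check of the difference between the two classes, here is a minimal sketch in base R. It uses grepl() rather than stringi so that it runs without extra packages, and the vector name cols is just an illustrative choice holding a few of the names from dat_I.

```r
# A few column names copied from dat_I in the question
cols <- c("[0,25)", "[1000,1500)", "[25,250)")

# TRUE where a name contains at least one letter / at least one digit
has_alpha <- grepl("[[:alpha:]]", cols)
has_digit <- grepl("[[:digit:]]", cols)

has_alpha
has_digit
```

None of the names contains a letter, so has_alpha is FALSE throughout, while has_digit is TRUE for every name.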

Accordingly,

stringi::stri_extract_all_regex(colnames(dat_II), "[[:digit:]]+") %>%
unlist() %>% 
unique() %>% 
sort()
#  [1] "0"       "100"     "1000"    "1000000" "1500"    "25"      "250"     "3000"    "50"      "500"    
stringi::stri_extract_all_regex(colnames(dat_I), "[[:digit:]]+") %>%
unlist() %>% 
unique() %>% 
sort()
# [1] "0"       "1000"    "1000000" "1500"    "25"      "250"     "3000"    "500"
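
The results above are character strings, so sort() orders them as text ("100" comes before "25"). If a numeric ordering is what you are after, one possible approach is to convert before sorting. The sketch below assumes that is the goal; it uses base R's regmatches() and gregexpr() in place of stringi so it is self-contained, and the names cols_II and breaks_II are illustrative only.

```r
# Column names copied from dat_II in the question
cols_II <- c("[0,25)", "[100,250)", "[100,500)", "[1000,1000000]",
             "[1000,1500)", "[1500,3000)", "[25,100)", "[25,50)",
             "[250,500)", "[3000,1000000]", "[50,100)", "[500,1000)",
             "[500,1000000]")

# Pull out every run of digits, as character strings
digits <- unlist(regmatches(cols_II, gregexpr("[[:digit:]]+", cols_II)))

# Convert to integer first, then de-duplicate and sort numerically
breaks_II <- sort(unique(as.integer(digits)))
breaks_II
```

Converting with as.integer() before sort() is what changes the ordering from textual to numeric.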

Note, though, that extracting the digits this way loses the pairing of the two bounds within each name, such as [0,25).
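
If the pairing matters, one option is to capture the lower and upper bound of each name separately. This is only a sketch under the assumption that every name has the form "[lower,upper" followed by ")" or "]"; it uses base R's regexec() with two capture groups, and the names cols_I, parts and bounds_I are hypothetical.

```r
# Column names copied from dat_I in the question
cols_I <- c("[0,25)", "[1000,1500)", "[1500,1000000)", "[1500,3000)",
            "[25,1000)", "[25,1500)", "[25,250)", "[25,3000)",
            "[25,500)", "[250,500)", "[3000,1000000)", "[500,1000)")

# One match per name: full match, then the two captured bounds
m <- regmatches(cols_I, regexec("\\[([0-9]+),([0-9]+)", cols_I))
parts <- do.call(rbind, m)

# Keep each name next to its own bounds, stored as integers
bounds_I <- data.frame(
  column = cols_I,
  lower  = as.integer(parts[, 2]),
  upper  = as.integer(parts[, 3]),
  stringsAsFactors = FALSE
)

# Order the intervals numerically by lower bound, then upper bound
bounds_I <- bounds_I[order(bounds_I$lower, bounds_I$upper), ]
bounds_I
```

The column field can then be used to reorder the data frame itself, for example dat_I[, bounds_I$column].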
