使用列值中的地址范围在r中创建具有单个地址的新行



给定以下数据:

data <- data.frame("Street" = c("Example1", "Example2", "Example3", "Example4"),
"Number" = c("10-20a", "4b-8", NA, "14-16"))
Street Number
1 Example1 10-20a
2 Example2   4b-8
3 Example3   <NA>
4 Example4  14-16

我如何创建一个数据帧,每个单独的地址看起来像这样:

data_long <- data.frame("Street" = c(rep("Example1", 11), rep("Example2", 5),
"Example3", rep("Example4", 3)),
"Number" = c(10:19,"20a", "4b", 5:8, NA, c(14:16)))
Street Number
1  Example1     10
2  Example1     11
3  Example1     12
4  Example1     13
5  Example1     14
6  Example1     15
7  Example1     16
8  Example1     17
9  Example1     18
10 Example1     19
11 Example1    20a
12 Example2     4b
13 Example2      5
14 Example2      6
15 Example2      7
16 Example2      8
17 Example3   <NA>
18 Example4     14
19 Example4     15
20 Example4     16

我尝试将数字和字母分开,然后使用rowwisemutate,但它不起作用,因为然后字母在每个地址旁边。

我们将非na元素filter,提取一个或多个数字(d+)与字母([a-z])作为mutate中的'v1'列,循环使用map2之间提取的数字,获得序列(:)作为列表列,unnest (from tidyr)列表列扩展数据并根据case_when中的'v1'列替换'Number'中的值并与NA行绑定

library(dplyr)
library(stringr)
library(tidyr)
library(purrr)
data %>% 
filter(complete.cases(Number)) %>% 
mutate(v1 = str_extract(Number, "\d+[a-z]"),
Number = map2(readr::parse_number(Number), 
as.numeric(str_extract(Number, "-(\d+)", group = 1)), `:`)) %>% 
unnest(Number) %>% 
mutate(Number = case_when(str_detect(v1, as.character(Number)) 
~ v1, 
TRUE ~ as.character(Number)), v1 = NULL) %>% 
bind_rows(data %>%
filter(is.na(Number))) %>% 
arrange(Street)

与产出

# A tibble: 20 × 2
Street   Number
<chr>    <chr> 
1 Example1 10    
2 Example1 11    
3 Example1 12    
4 Example1 13    
5 Example1 14    
6 Example1 15    
7 Example1 16    
8 Example1 17    
9 Example1 18    
10 Example1 19    
11 Example1 20a   
12 Example2 4b    
13 Example2 5     
14 Example2 6     
15 Example2 7     
16 Example2 8     
17 Example3 <NA>  
18 Example4 14    
19 Example4 15    
20 Example4 16    

这是在@akrun的帮助下生成的另一种方法(这里:如何展开行并在给定的开始和结束之间填充数字):

library(dplyr)
library(tidyr)
library(readr)
data %>% 
separate_rows(Number, sep = "-") %>% 
group_by(Street) %>% 
mutate(Number1 = ifelse(is.na(Number), 0, parse_number(Number))) %>% 
tidyr::complete(Number1 = full_seq(Number1, period = 1)) %>% 
mutate(Number = coalesce(Number, as.character(Number1)), .keep="unused")
Street   Number
<chr>    <chr> 
1 Example1 10    
2 Example1 11    
3 Example1 12    
4 Example1 13    
5 Example1 14    
6 Example1 15    
7 Example1 16    
8 Example1 17    
9 Example1 18    
10 Example1 19    
11 Example1 20a   
12 Example2 4b    
13 Example2 5     
14 Example2 6     
15 Example2 7     
16 Example2 8     
17 Example3 0     
18 Example4 14    
19 Example4 15    
20 Example4 16 

最新更新