Here is my data frame:
df2 <- structure(list(Code = c("ICB-9_label_1", "1", "2", "3",
"4", "5", "1", "ICB-10_label_2", "3", "4", "5",
"1", "2", "3", "3", "5", "1", "2",
"3", "4", "5", "1", "2", "3", "4",
"5", "1", "2", "3", "4", "5", "1",
"2", "3", "4", "5", "1", "2", "3",
"4", "5", "1", "2", "3", "4", "5",
"1"), Description = c("description here", "description here",
"description here", "description here", "description here", "description here",
"description here", "description here", "description here", "description here",
"description here", "description here", "description here", "description here",
"description here", "description here", "description here", "description here",
"description here", "description here", "description here", "description here",
"description here", "description here", "description here", "description here",
"description here", "description here", "description here", "description here",
"description here", "description here", "description here", "description here",
"description here", "description here", "description here", "description here",
"description here", "description here", "description here", "description here",
"description here", "description here", "description here", "description here",
"description here")), row.names = c(NA, -47L), class = c("tbl_df",
"tbl", "data.frame"))
Here is what the table looks like:
Code Description
ICB-9_label_1 description here
1 description here
2 description here
3 description here
4 description here
5 description here
1 description here
ICB-10_label_2 description here
3 description here
4 description here
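I want to create a third column called "Labels". It should show "ICB-9_label_1" all the way down until it reaches the row containing "ICB-10_label_2", and from that row on it should show "ICB-10_label_2" all the way down. I don't want to overwrite the numbers in the first column, because the values 1, 2, 3, 4, 5 are important.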
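There are multiple ways to do this. One option is to extract the value on the rows that contain 'label' (all other rows return NA), and then use fill to replace each NA element with the previous non-NA value.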
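library(dplyr)
library(tidyr)
library(stringr)
df2 <- df2 %>%
  # keep the Code value on label rows, NA on every other row
  mutate(Labels = str_extract(Code, '.*label.*')) %>%
  # fill each NA with the last non-NA value above it; 'downup' then fills any leading NAs upward
  fill(Labels, .direction = 'downup')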
with output
df2
# A tibble: 47 × 3
Code Description Labels
<chr> <chr> <chr>
1 ICB-9_label_1 description here ICB-9_label_1
2 1 description here ICB-9_label_1
3 2 description here ICB-9_label_1
4 3 description here ICB-9_label_1
5 4 description here ICB-9_label_1
6 5 description here ICB-9_label_1
7 1 description here ICB-9_label_1
8 ICB-10_label_2 description here ICB-10_label_2
9 3 description here ICB-10_label_2
10 4 description here ICB-10_label_2
# … with 37 more rows
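To make the two steps of the pipeline above more concrete, here is a minimal sketch on a small hypothetical tibble `toy` (not part of the question's data) that keeps the intermediate result of `str_extract()` before `fill()` is applied. It uses `.direction = 'down'` because the toy data starts with a label row.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical toy data shaped like df2's Code column
toy <- tibble(Code = c("ICB-9_label_1", "1", "2", "ICB-10_label_2", "3"))

# Step 1: str_extract() returns the whole Code on label rows and NA elsewhere
step1 <- toy %>% mutate(Labels = str_extract(Code, '.*label.*'))

# Step 2: fill() replaces each NA with the closest non-NA value above it
step2 <- step1 %>% fill(Labels, .direction = 'down')

step1$Labels
step2$Labels
```

The Code column itself is never modified; only the new Labels column is built and filled.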
Or in base R, using grep together with cumsum:
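# grepl() flags the label rows; cumsum() turns the flags into a group index (1, 1, ..., 2, 2, ...)
# that picks the matching label out of grep(..., value = TRUE)
transform(df2, Labels = grep('label', Code,
                             value = TRUE)[cumsum(grepl('label', Code))])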
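A small sketch of how the base R indexing works, on a hypothetical vector `code` shaped like the `Code` column: `cumsum()` over the logical matches gives, for every row, the number of label rows seen so far, which is exactly the position of that row's label in the `grep(..., value = TRUE)` result.

```r
# Hypothetical vector shaped like df2$Code
code <- c("ICB-9_label_1", "1", "2", "ICB-10_label_2", "3")

is_label <- grepl('label', code)              # TRUE on label rows
group    <- cumsum(is_label)                  # label rows seen so far: 1 1 1 2 2
labels   <- grep('label', code, value = TRUE) # the label strings, in order
result   <- labels[group]                     # one label per row

result
```

This relies on the first row being a label row. Rows before the first label get index 0, which `[` silently drops, so the Labels vector would be shorter than the data frame (and `transform()` would either error or recycle it).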