I am working with the R programming language.
I have a dataset that looks like this:
id = 1:5
col1 = c("john", "henry", "adam", "jenna", "peter")
col2 = c("river B8C 9L4", "Field U9H 5E2 PP", "NA", "ocean A1B 5H1 dd", "dave")
col3 = c("matt", "steve", "forest K0Y 1U9 hu2", "NA", "NA")
col4 = c("Phone: 111 1111 111", "Phone: 222 2222", "Phone: 333 333 1113", "Phone: 444 111 1153", "Phone: 111 111 1121")
my_data = data.frame(id, col1, col2, col3, col4)
  id  col1             col2               col3                col4
1  1  john    river B8C 9L4               matt Phone: 111 1111 111
2  2 henry Field U9H 5E2 PP              steve     Phone: 222 2222
3  3  adam               NA forest K0Y 1U9 hu2 Phone: 333 333 1113
4  4 jenna ocean A1B 5H1 dd                 NA Phone: 444 111 1153
5  5 peter             dave                 NA Phone: 111 111 1121
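For this dataset, I want to:
- always keep the id column and the first column (col1)
- keep the first column whose value matches the following pattern: letter digit letter digit letter digit
- always keep the phone number column
The result would look like this: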
  id  col1            new_col                col4
1  1  john      river B8C 9L4 Phone: 111 1111 111
2  2 henry   Field U9H 5E2 PP     Phone: 222 2222
3  3  adam forest K0Y 1U9 hu2 Phone: 333 333 1113
4  4 jenna      ocean A1B 5H1 Phone: 444 111 1153
I found this regex online that identifies the desired pattern:
> apply(my_data, 1, function(x) gsub('(([A-Z] ?[0-9]){3})|.', '\\1', toString(x)))
[1] "B8C 9L4" "U9H 5E2" "K0Y 1U9" "A1B 5H1" ""
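As a quick check of what the pattern itself does, here is a minimal sketch that applies it to a few of the values above. The names `pattern`, `x`, `has_code`, and `codes` are made up for this illustration.

```r
# An uppercase letter, an optional space, then a digit, repeated three times
pattern <- "([A-Z] ?[0-9]){3}"

# A few values taken from col2, including the literal string "NA"
x <- c("river B8C 9L4", "Field U9H 5E2 PP", "dave", "NA")

# grepl() reports which strings contain the pattern anywhere
has_code <- grepl(pattern, x)

# regexpr() locates the first match in each string;
# regmatches() returns the matched text, only for strings that matched
codes <- regmatches(x, regexpr(pattern, x))
```

Here `has_code` flags the first two strings only, and `codes` holds just the matched pieces, which is why the `gsub()` call above returns an empty string for the row with no match.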
But can someone please show me how to use this regex in R to get the result I want?
Thanks!
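library(tidyverse)

my_data %>%
  # reshape col2, col3, col4 into name/value pairs, one row per cell
  pivot_longer(-c(id, col1)) %>%
  # keep only cells containing the letter-digit pattern or a phone number
  filter(str_detect(value, "([A-Z] ?[0-9]){3}|Phone:[0-9 ]+")) %>%
  # phone cells keep their column name; pattern matches are relabelled new_col
  mutate(name = ifelse(str_detect(value, "Phone"), name, "new_col")) %>%
  # back to wide form; if a row has several matches, take the first one
  pivot_wider(values_fn = 'first')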
# A tibble: 5 × 4
     id col1  new_col            col4
  <int> <chr> <chr>              <chr>
1     1 john  river B8C 9L4      Phone: 111 1111 111
2     2 henry Field U9H 5E2 PP   Phone: 222 2222
3     3 adam  forest K0Y 1U9 hu2 Phone: 333 333 1113
4     4 jenna ocean A1B 5H1 dd   Phone: 444 111 1153
5     5 peter NA                 Phone: 111 111 1121
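For comparison, roughly the same result can be sketched in base R without reshaping. This is only one possible approach, and it assumes that col2 and col3 are the only columns that can hold the code and that the leftmost matching column should win. The names `pattern`, `candidates`, `pick_first`, and `result` are introduced here for illustration.

```r
my_data <- data.frame(
  id   = 1:5,
  col1 = c("john", "henry", "adam", "jenna", "peter"),
  col2 = c("river B8C 9L4", "Field U9H 5E2 PP", "NA", "ocean A1B 5H1 dd", "dave"),
  col3 = c("matt", "steve", "forest K0Y 1U9 hu2", "NA", "NA"),
  col4 = c("Phone: 111 1111 111", "Phone: 222 2222", "Phone: 333 333 1113",
           "Phone: 444 111 1153", "Phone: 111 111 1121")
)

pattern    <- "([A-Z] ?[0-9]){3}"
candidates <- c("col2", "col3")  # assumption: only these columns can hold the code

# For one row of candidate values, return the first value that contains
# the pattern, or NA if none of them does
pick_first <- function(values) {
  hit <- values[grepl(pattern, values)]
  if (length(hit) > 0) hit[[1]] else NA_character_
}

result <- data.frame(
  id      = my_data$id,
  col1    = my_data$col1,
  new_col = apply(my_data[candidates], 1, pick_first),
  col4    = my_data$col4
)
```

Like the tibble above, this keeps row 5 with a missing new_col. If rows without a match should be dropped, as in the desired output in the question, `result[!is.na(result$new_col), ]` would do that.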