I have a data frame with a column of years (data type chr):
Years:
5 yrs
10 yrs
20 yrs
4 yrs
I only want to keep the integers, so that I get a data frame like this (data type num):
Years:
5
10
20
4
How can I do this in R?
You need to extract the number and convert it to a numeric type:
df$Years <- as.numeric(sub(" yrs", "", df$Years))
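To make the one-liner above concrete, here is a minimal, self-contained sketch run on the sample values from the question. It assumes the column is named Years, as in the question, and that every value ends in the literal suffix " yrs"; fixed = TRUE is an optional addition that makes sub() treat the pattern as plain text rather than a regular expression.

```r
# Sample data mirroring the question (column name Years is assumed from the question)
df <- data.frame(Years = c("5 yrs", "10 yrs", "20 yrs", "4 yrs"),
                 stringsAsFactors = FALSE)

# Remove the literal " yrs" suffix, then convert what is left to numeric
df$Years <- as.numeric(sub(" yrs", "", df$Years, fixed = TRUE))

# Inspect the column type; it should now be num instead of chr
str(df)
```

Any value that does not follow the "N yrs" pattern would come back as NA with a coercion warning, which is a useful signal that the data needs a more general approach.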
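Depending on your other requirements, here is a more general solution, although it has limitations too. The benefit of the more complex Years3 solution is that it handles unexpected but quite plausible answers more gracefully.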
library(dplyr)
library(stringr)
library(purrr)
Years <- c("5 yrs",
"10 yrs",
"20 yrs",
"4 yrs",
"4-5 yrs",
"75 to 100 YEARS old",
">1 yearsmispelled or whatever")
df <- data.frame(Years)
# just the numbers but loses the -5 in 4-5
df$Years1 <- as.numeric(sub("(\\d{1,4}).*", "\\1", df$Years))
#> Warning: NAs introduced by coercion
# just the numbers but loses the -5 in 4-5 using str_extract
df$Years2 <- str_extract(df$Years, "[0-9]+")
# a lot more needed to account for averaging
df$Years3 <- str_extract_all(df$Years, "[0-9]+") %>%
  purrr::map( ~ ifelse(length(.x) == 1,
                       as.numeric(.x),
                       mean(unlist(as.numeric(.x)))))
df
#>                           Years Years1 Years2 Years3
#> 1                         5 yrs      5      5      5
#> 2                        10 yrs     10     10     10
#> 3                        20 yrs     20     20     20
#> 4                         4 yrs      4      4      4
#> 5                       4-5 yrs      4      4    4.5
#> 6           75 to 100 YEARS old     75     75   87.5
#> 7 >1 yearsmispelled or whatever     NA      1      1
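Note that purrr::map() returns a list, so Years3 above ends up as a list column, and Years2 stays character. If a plain numeric vector is preferred, the same averaging idea can be sketched in base R. This is an alternative under stated assumptions, not the answer's own method: the name Years_num and the "no digits here" test value are hypothetical, and strings without any digits are mapped to NA.

```r
# Inputs taken from the example above, plus one hypothetical value without digits
Years <- c("5 yrs", "4-5 yrs", "75 to 100 YEARS old", "no digits here")

# gregexpr() locates every run of digits; regmatches() returns them as a list
# with one character vector per input string
matches <- regmatches(Years, gregexpr("[0-9]+", Years))

# vapply() guarantees a double vector: average the numbers found in each string,
# and return NA when a string contains no digits at all
Years_num <- vapply(matches, function(m) {
  if (length(m) == 0) NA_real_ else mean(as.numeric(m))
}, numeric(1))

Years_num
```

Because the mean of a single number is that number, the separate length-one branch is not needed here.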
Base R solution:
clean_years <- as.numeric(gsub("\\D", "", Years))
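One caveat worth checking before using this on messier input: removing every non-digit character glues separate numbers together. The sketch below illustrates this on values borrowed from the other answer, and shows one possible way to keep only the first number instead. The names all_digits and first_number are hypothetical, and the second pattern assumes each string contains at least one digit.

```r
# Values with ranges, borrowed from the more general answer in this thread
Years <- c("5 yrs", "4-5 yrs", "75 to 100 YEARS old")

# Stripping all non-digits concatenates what remains: "4-5 yrs" becomes "45"
all_digits <- as.numeric(gsub("\\D", "", Years))
all_digits
# 5 45 75100

# Alternative: skip leading non-digits, capture the first run of digits, drop the rest
first_number <- as.numeric(sub("\\D*(\\d+).*", "\\1", Years))
first_number
# 5 4 75
```

For clean input such as "5 yrs" both variants give the same result, so the choice only matters when ranges or several numbers can appear in one value.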
Data:
Years <- c("5 yrs",
"10 yrs",
"20 yrs",
"4 yrs",
"5 yrs")