R/dplyr: How do I keep only the integers in a data frame?



I have a data frame containing years (data type chr):

Years:
5 yrs
10 yrs
20 yrs
4 yrs

I want to keep only the integers, to get a data frame like this (data type num):

Years:
5
10
20
4

How can I do this in R?

You need to extract the numbers and convert them to a numeric type:

df$Years <- as.numeric(sub(" yrs", "", df$Years))
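
Since the question mentions dplyr, the same conversion can be sketched inside a `mutate()` call. This is only a minimal illustration: the sample data frame is rebuilt here from the values in the question, and it assumes every entry ends in the literal suffix " yrs".

```r
library(dplyr)

# Sample data rebuilt from the question; the column name `Years` is taken from there.
df <- data.frame(
  Years = c("5 yrs", "10 yrs", "20 yrs", "4 yrs"),
  stringsAsFactors = FALSE
)

# Strip the literal " yrs" suffix, then convert the remaining text to numeric.
df <- df %>%
  mutate(Years = as.numeric(sub(" yrs", "", Years, fixed = TRUE)))

str(df)
```

Any entry that does not follow the "<number> yrs" pattern would come out as NA with a coercion warning, so this form is only suitable when the input is known to be uniform.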

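Depending on your other requirements, here is a more general solution, though it has limitations too. The benefit of the more complicated Years3 solution is that it handles unexpected but quite likely responses more gracefully.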

library(dplyr)
library(stringr)
library(purrr)
Years <- c("5 yrs",
           "10 yrs",
           "20 yrs",
           "4 yrs",
           "4-5 yrs",
           "75 to 100 YEARS old",
           ">1 yearsmispelled or whatever")
df <- data.frame(Years)
# just the numbers but loses the -5 in 4-5
df$Years1 <- as.numeric(sub("(\\d{1,4}).*", "\\1", df$Years))
#> Warning: NAs introduced by coercion
# just the numbers but loses the -5 in 4-5 using str_extract
df$Years2 <- str_extract(df$Years, "[0-9]+")
# a lot more needed to account for averaging
df$Years3 <- str_extract_all(df$Years, "[0-9]+") %>%
  purrr::map( ~ ifelse(length(.x) == 1,
                       as.numeric(.x),
                       mean(unlist(as.numeric(.x)))))
df
#>                           Years Years1 Years2 Years3
#> 1                         5 yrs      5      5      5
#> 2                        10 yrs     10     10     10
#> 3                        20 yrs     20     20     20
#> 4                         4 yrs      4      4      4
#> 5                       4-5 yrs      4      4    4.5
#> 6           75 to 100 YEARS old     75     75   87.5
#> 7 >1 yearsmispelled or whatever     NA      1      1

Base R solution:

clean_years <- as.numeric(gsub("\\D", "", Years))

Data:

Years <- c("5 yrs",
           "10 yrs",
           "20 yrs",
           "4 yrs",
           "5 yrs")
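
One caveat worth keeping in mind: deleting every non-digit character works for uniform input like the data above, but it joins the digits together when an entry contains more than one number. The sketch below illustrates this with a few hypothetical inputs (the `messy` vector is made up for the example) and shows one possible alternative that keeps only the first run of digits.

```r
# Hypothetical inputs, including ranges, to illustrate the caveat.
messy <- c("5 yrs", "4-5 yrs", "75 to 100 YEARS old")

# Removing every non-digit concatenates whatever digits remain,
# so "4-5 yrs" is read as 45 rather than 4 or 4.5.
all_digits <- as.numeric(gsub("\\D", "", messy))

# Keeping only the first run of digits avoids the concatenation.
first_number <- as.numeric(sub("^\\D*(\\d+).*$", "\\1", messy))

print(data.frame(messy, all_digits, first_number))
```

Which behaviour is appropriate depends on what a range is supposed to mean in the data; taking the first number is just one option.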
