Convert lowercase to uppercase in a mixed data frame (character, factor, integer) while preserving data types in R



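I have a mixed data frame made up of characters, integers and factors, and I want to convert it to uppercase. This is a common question (e.g. here), but I can't change both characters AND factors to uppercase without converting the data type. Working example below: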


# load dplyr (mutate_*, glimpse) and stringr (str_to_upper)
library(dplyr)
library(stringr)

# create a three column dataframe with characters, integers and factors:
df <- data.frame(v1 = letters[1:5], v2 = 1:5, v3 = as.factor(letters[10:14]), stringsAsFactors = FALSE)
df
#   v1 v2 v3
# 1  a  1  j
# 2  b  2  k
# 3  c  3  l
# 4  d  4  m
# 5  e  5  n
glimpse(df)
# v1 <chr> "a", "b", "c", "d", "e"
# v2 <int> 1, 2, 3, 4, 5
# v3 <fct> j, k, l, m, n

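mutate_all with toupper converts to uppercase, but also turns the integer and factor columns into character:

# funs() is deprecated, so pass the function directly; df itself is not overwritten
mutate_all(df, toupper) %>% glimpse()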
# v1 <chr> "A", "B", "C", "D", "E"
# v2 <chr> "1", "2", "3", "4", "5"
# v3 <chr> "J", "K", "L", "M", "N"

mutate_if with str_to_upper works for is.character, but not for factors:

df <- df %>% mutate_if(is.character, str_to_upper)
glimpse(df)
# v1 <chr> "A", "B", "C", "D", "E"
# v2 <int> 1, 2, 3, 4, 5
# v3 <fct> j, k, l, m, n

Adding mutate_if with is.factor and str_to_upper upper-cases the factor too, but converts it to character:

df <- df %>% mutate_if(is.character, str_to_upper)
df <- df %>% mutate_if(is.factor, str_to_upper)
glimpse(df)
# v1 <chr> "A", "B", "C", "D", "E"
# v2 <int> 1, 2, 3, 4, 5
# v3 <chr> "J", "K", "L", "M", "N"

Ideally, I'd like a blanket solution that preserves the data types and can be applied to any data frame.

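To build on 静心's answer and address Thomas Moore's follow-up question, you can also convert the column names to uppercase by adding rename_with(str_to_upper) (across(), where() and rename_with() need dplyr 1.0.0 or later):

df %>% 
  mutate(across(where(is.character), str_to_upper),
         across(where(is.factor), ~ factor(str_to_upper(.x)))) %>%
  rename_with(str_to_upper)

Without the column-name step, the pipeline is simply:

df %>% 
  mutate(across(where(is.character), str_to_upper),
         across(where(is.factor), ~ factor(str_to_upper(.x))))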
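If you would rather not depend on dplyr, the same idea can be sketched in base R. This is an alternative, not part of the answers above: the helper name upcase_df() is hypothetical, and it assumes only character and factor columns should be touched, with factor levels relabelled in place (so their order is kept) instead of rebuilding the factor.

```r
# Hypothetical helper: upper-case character and factor columns,
# leave every other column (integers, numerics, ...) as it is.
upcase_df <- function(df) {
  # df[] <- ... replaces the columns but keeps the data.frame structure
  df[] <- lapply(df, function(col) {
    if (is.factor(col)) {
      # relabel the levels; codes and level order stay the same
      levels(col) <- toupper(levels(col))
      col
    } else if (is.character(col)) {
      toupper(col)
    } else {
      col
    }
  })
  df
}

df <- data.frame(v1 = letters[1:5], v2 = 1:5,
                 v3 = as.factor(letters[10:14]), stringsAsFactors = FALSE)
out <- upcase_df(df)

# each column keeps its original class
sapply(out, class)
```

Because lapply() visits every column, this works on any data frame regardless of column names or order.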

toupper and str_to_upper change the class to character. You have two options:
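A quick way to see this, using a small throwaway factor f made up for illustration: both functions accept a factor but hand back a plain character vector.

```r
library(stringr)

f <- factor(c("j", "k"))

class(toupper(f))       # base R: factor is coerced, result is character
class(str_to_upper(f))  # stringr: same outcome
```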

  1. Convert back to a factor
df <- df %>% mutate_if(is.character, str_to_upper)
df <- df %>% mutate_if(is.factor, ~factor(str_to_upper(.)))
str(df)
#'data.frame':  5 obs. of  3 variables:
# $ v1: chr  "A" "B" "C" "D" ...
# $ v2: int  1 2 3 4 5
# $ v3: Factor w/ 5 levels "J","K","L","M",..: 1 2 3 4 5
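  2. Change the levels of the factor variables instead. This also merges the two mutate_if() calls from option 1 into one:
df <- df %>% mutate_if(~ is.character(.) || is.factor(.),
                       ~ if (is.factor(.)) {
                           # relabel the levels; the factor codes are untouched
                           levels(.) <- toupper(levels(.))
                           .
                         } else {
                           toupper(.)  # plain character column
                         })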
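The two options are not quite equivalent when a factor's levels are not in alphabetical order. Rebuilding with factor() re-derives the levels and sorts them, while assigning to levels() keeps the original order. A minimal illustration with a made-up factor, using base toupper() in place of str_to_upper():

```r
# A factor whose level order is deliberately not alphabetical
f <- factor(c("b", "a", "c"), levels = c("c", "b", "a"))

# Option 1: rebuild the factor; levels are re-derived and sorted
opt1 <- factor(toupper(f))
levels(opt1)

# Option 2: relabel the existing levels; their order is kept
opt2 <- f
levels(opt2) <- toupper(levels(opt2))
levels(opt2)
```

The values are the same either way; only the level order differs, which matters for things like plot axes or model contrasts.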

Note that the _if, _at and _all variants have been superseded by across() as of dplyr 1.0.0.
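For reference, option 2 can also be written with across(). This is a sketch rather than a tested drop-in: it calls the replacement function `levels<-` directly so the factor's level order is kept (forcats::fct_relabel(.x, toupper) would be a tidyverse alternative, if forcats is installed).

```r
library(dplyr)

df <- data.frame(v1 = letters[1:5], v2 = 1:5,
                 v3 = as.factor(letters[10:14]), stringsAsFactors = FALSE)

res <- df %>%
  mutate(
    # plain character columns: upper-case the values
    across(where(is.character), toupper),
    # factor columns: relabel the levels, keeping codes and order
    across(where(is.factor), ~ `levels<-`(.x, toupper(levels(.x))))
  )

glimpse(res)
```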
