R - Generating a new column using conditions from multiple existing columns



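I am trying to create a column in a data frame whose values depend on values already in the data frame. This is the head() of the data frame I started with.

Edit: this is the data frame I started with, with the columns that are unnecessary for this exercise excluded. Besides these two columns, it has many others: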

> head(df)
# A tibble: 6 x 2
Responded `Response Rate` 
<chr>     <chr>                  
1 0%        0%                         
2 0%        0%                           
3 0%        0%                           
4 100%      100%                      
5 0%        0%                           
6 100%      0%          

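I want a new column named "Completion Rate", populated using the following conditions:

If Responded is 0%, the value should be NA (or NULL, whichever R uses for missing data);

else, take the value of Response Rate.

The output should be: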

> head(df)
# A tibble: 6 x 3
Responded `Response Rate` `Completion Rate`
<chr>     <chr>           <chr>            
1 0%        0%              NA               
2 0%        0%              NA               
3 0%        0%              NA               
4 100%      100%            100%             
5 0%        0%              NA               
6 100%      0%              0% 

I tried using mutate and replace to create the new column without any interim step, with no joy. If anyone could demonstrate how to do that, it would be great.

I then tried to build Completion Rate by first creating the column as a copy of Response Rate:

df$"Completion Rate" <- df$`Response Rate`

and then replacing the values in that column that should be NA, using the following code:

df <- mutate(df, replace("Completion Rate", Responded == 0, NA, response_df$`Response Rate`))

This produced the following error:

> response_df <- mutate(response_df, replace("Completion Rate", Responded == 0, NA, response_df$`Response Rate`))
Error: Problem with `mutate()` input `..1`.
i `..1 = replace("Completion Rate", Responded == 0, NA, response_df$`Response Rate`)`.
x unused argument (response_df$`Response Rate`)
Run `rlang::last_error()` to see where the error occurred.

Running the additional error-checking code that the message suggests:

> rlang::last_error()
<error/dplyr:::mutate_error>
Problem with `mutate()` input `..1`.
i `..1 = replace("Completion Rate", Responded == 0, NA, response_df$`Response Rate`)`.
x unused argument (response_df$`Response Rate`)
Backtrace:
1. dplyr::mutate(...)
6. base::.handleSimpleError(...)
7. dplyr:::h(simpleError(msg, call))
> rlang::last_trace()
<error/dplyr:::mutate_error>
Problem with `mutate()` input `..1`.
i `..1 = replace("Completion Rate", Responded == 0, NA, response_df$`Response Rate`)`.
x unused argument (response_df$`Response Rate`)
Backtrace:
x
1. +-dplyr::mutate(...)
2. +-dplyr:::mutate.data.frame(...)
3. | -dplyr:::mutate_cols(.data, ..., caller_env = caller_env())
4. |   +-base::withCallingHandlers(...)
5. |   -mask$eval_all_mutate(quo)
6. -base::.handleSimpleError(...)
7.   -dplyr:::h(simpleError(msg, call))
<error/simpleError>
unused argument (response_df$`Response Rate`)

I tried using both 0% and "0%". I tried referencing Completion Rate instead of Response Rate as the "else" argument of replace. I tried = 0 instead of == 0. These gave different errors.

Use ifelse:

library(dplyr)
df %>%
mutate(Completion_Rate = ifelse(Responded == '0%', NA, Response_Rate))
#  Responded Response_Rate Completion_Rate
#1        0%            0%            <NA>
#2        0%            0%            <NA>
#3        0%            0%            <NA>
#4      100%          100%            100%
#5        0%            0%            <NA>
#6      100%            0%              0%

It is easier to help if you provide the data in a reproducible format:

df <- structure(list(Responded = c("0%", "0%", "0%", "100%", "0%", 
"100%"), Response_Rate = c("0%", "0%", "0%", "100%", "0%", "0%"
)), row.names = c(NA, -6L), class = "data.frame")
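
The one-step mutate/replace form the question asked for can be sketched as follows. Base R's replace() takes exactly three arguments, replace(x, list, values), so passing a fourth is a plausible cause of the "unused argument" error shown in the question. This is a minimal sketch on the sample data above, not necessarily what the asker's full data frame needs. The name result is only illustrative, and stringsAsFactors = FALSE is set so the columns are character on older R versions too.

```r
library(dplyr)

# Sample data with the same values as above (character columns)
df <- data.frame(
  Responded = c("0%", "0%", "0%", "100%", "0%", "100%"),
  Response_Rate = c("0%", "0%", "0%", "100%", "0%", "0%"),
  stringsAsFactors = FALSE
)

# replace(x, list, values): start from Response_Rate and overwrite
# the rows where Responded is "0%" with NA, all inside one mutate()
result <- df %>%
  mutate(Completion_Rate = replace(Response_Rate, Responded == "0%", NA))

print(result)
```

Note that the first argument is the column itself (unquoted), not the string "Completion Rate", and that the comparison uses "0%" because the column holds character values.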

You can use dplyr from the tidyverse:

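library(dplyr)
# check.names = FALSE keeps the space in "Response Rate"
# (by default data.frame() would rename it to Response.Rate)
df <- data.frame(Responded = c(0,0,0,100,0,100),
                 `Response Rate` = c(0,0,0,100,0,0),
                 check.names = FALSE)
print(df)
  Responded Response Rate
1         0             0
2         0             0
3         0             0
4       100           100
5         0             0
6       100             0

# use = (not <-) inside mutate() so the new column gets the intended name
df <- df %>%
  mutate(`Completion Rate` = ifelse(Responded==0, NA, `Response Rate`))
print(df)
  Responded Response Rate Completion Rate
1         0             0              NA
2         0             0              NA
3         0             0              NA
4       100           100             100
5         0             0              NA
6       100             0               0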

Or, if the values are percentage strings:

library(dplyr)
# check.names = FALSE keeps the space in "Response Rate"
df <- data.frame(Responded = c('0%','0%','0%','100%','0%','100%'),
                 `Response Rate` = c('0%','0%','0%','100%','0%','0%'),
                 check.names = FALSE)
print(df)
  Responded Response Rate
1        0%            0%
2        0%            0%
3        0%            0%
4      100%          100%
5        0%            0%
6      100%            0%
# compare against the string '0%' because the column is character
df <- df %>%
  mutate(`Completion Rate` = ifelse(Responded=='0%', NA, `Response Rate`))
print(df)
  Responded Response Rate Completion Rate
1        0%            0%            <NA>
2        0%            0%            <NA>
3        0%            0%            <NA>
4      100%          100%            100%
5        0%            0%            <NA>
6      100%            0%              0%
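
As a possible variation on the ifelse() answers above, dplyr's own if_else() is stricter about types, so the missing value is spelled NA_character_ to match the character column. This is only a sketch, assuming the data is a tibble with the column names from the question. The name result is illustrative.

```r
library(dplyr)

# Tibble with the same two columns as in the question;
# tibble() keeps the space in `Response Rate` without extra options
df <- tibble(
  Responded = c("0%", "0%", "0%", "100%", "0%", "100%"),
  `Response Rate` = c("0%", "0%", "0%", "100%", "0%", "0%")
)

# if_else() requires both branches to have the same type,
# hence NA_character_ rather than a bare logical NA
result <- df %>%
  mutate(`Completion Rate` = if_else(Responded == "0%", NA_character_, `Response Rate`))

print(result)
```

The stricter type check can surface mistakes early, for example comparing a character column against the number 0 and silently getting no matches.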
