claim_number member_name claim_status injury_date injury_time claim_type claim_cost injury_cause gender injured_worker_~
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 ClaimNumber~ MemberName~ Closed A very lon~ NULL Medical O~ 343.32 Strain F 47
2 ClaimNumber~ MemberName~ Closed 56980 Late Medical O~ 1253.04 Strain M 50
3 ClaimNumber~ MemberName~ Closed 44195 Late Indemnity 584.14 Strain M 62
4 ClaimNumber~ MemberName~ Open 44194 1015 Indemnity 2573.66 Fall/Slip F 49
5 ClaimNumber~ MemberName~ Closed 44194 9 Indemnity 547.39 Strain F 51
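I have a dataset with character text in fields that should be numeric, for example injury_date and injury_time. I want to subset/filter those records into another data frame, removing them from the current frame without losing the records. claim_cost and injured_worker_experience have similar records.
What is the most efficient way to split out these subsets?
Separate frame: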
claim_number member_name claim_status injury_date injury_time claim_type claim_cost injury_cause gender injured_worker_~
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 ClaimNumber~ MemberName~ Closed A very lon~ NULL Medical O~ 343.32 Strain F 47
2 ClaimNumber~ MemberName~ Closed 56980 Late Medical O~ 1253.04 Strain M 50
3 ClaimNumber~ MemberName~ Closed 44195 Late Indemnity 584.14 Strain M 62
Try this:
read_csv("file path", col_types = cols(injury_date = col_double(), injury_time = col_double()))
This way you parse the columns one by one. Give it a try.
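As a rough illustration of what this does, here is a minimal sketch assuming readr 2.0 or later, with made-up inline data standing in for the real file (the variable names csv_text and parsed are hypothetical). Values that cannot be parsed as numbers come back as NA with a parsing warning, and problems() lists where that happened, so this on its own keeps the rows but does not preserve the original text.

```r
library(readr)

# Hypothetical inline data standing in for the real file; I() marks it as literal text
csv_text <- I("injury_date,injury_time\n44194,1015\nA very long text,Late\n")

# Ask readr to parse both columns as doubles; unparseable values become NA
parsed <- suppressWarnings(
  read_csv(csv_text,
           col_types = cols(injury_date = col_double(),
                            injury_time = col_double()))
)

# List the cells that failed to parse
problems(parsed)
```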
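The trick here is to convert the columns of interest from character to numeric, and then use base R's complete.cases()
function to filter both the converted and the original data frames.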
df<-read.table(header=TRUE, text="claim_number member_name claim_status injury_date injury_time claim_type claim_cost injury_cause gender injured_worker_~
ClaimNumber~ MemberName~ Closed 'A very lon~' NULL 'Medical O~' 343.32 Strain F 47
ClaimNumber~ MemberName~ Closed 56980 Late 'Medical O~' 1253.04 Strain M 50
ClaimNumber~ MemberName~ Closed 44195 Late Indemnity 584.14 Strain M 62
ClaimNumber~ MemberName~ Open 44194 1015 Indemnity 2573.66 Fall/Slip F 49
ClaimNumber~ MemberName~ Closed 44194 9 Indemnity 547.39 Strain F 51")
str(df)
library(dplyr)
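# Columns that should be numeric; extend this vector to cover all the
# necessary column names (e.g. claim_cost)
num_cols <- c("injury_date", "injury_time")
# Create df with converted columns; non-numeric text becomes NA (with a coercion warning)
df2 <- df %>% mutate(across(all_of(num_cols), as.numeric))
# TRUE for rows where every converted column holds a number
is_num <- complete.cases(df2[num_cols])
# Retrieve rows with numeric values (from the converted frame)
df_onlyNums <- df2[is_num, ]
# Retrieve rows with non-numeric values (from the original frame, so the text is kept)
df_char <- df[!is_num, ]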
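
If the same split is needed for several sets of columns, the idea above could be wrapped in a small base R function. This is only a sketch: split_by_numeric and the toy data frame are hypothetical names and values made up for illustration, not part of the original data.

```r
# Hypothetical helper: split a data frame by whether the given columns parse as numbers
split_by_numeric <- function(data, cols) {
  converted <- data
  # Convert the chosen columns; text that is not a number becomes NA
  converted[cols] <- lapply(data[cols], function(x) {
    suppressWarnings(as.numeric(as.character(x)))
  })
  # TRUE for rows where every chosen column converted cleanly
  ok <- complete.cases(converted[cols])
  list(
    numeric     = converted[ok, , drop = FALSE],  # converted values
    non_numeric = data[!ok, , drop = FALSE]       # original text preserved
  )
}

# Toy data loosely modelled on the question's columns
toy <- data.frame(
  id          = c("a", "b", "c", "d"),
  injury_date = c("A very long text", "56980", "44194", "44194"),
  injury_time = c("NULL", "Late", "1015", "9"),
  stringsAsFactors = FALSE
)

parts <- split_by_numeric(toy, c("injury_date", "injury_time"))
nrow(parts$numeric)      # 2
nrow(parts$non_numeric)  # 2
```

Note that values that were already missing in the original columns also end up in the non_numeric part, since they cannot be told apart from failed conversions after the fact.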