R - Cleaning a dataset in which character fields should contain only numbers. How do I subset those rows? I don't want to lose them


claim_number member_name claim_status injury_date injury_time claim_type claim_cost injury_cause gender injured_worker_~
<chr>        <chr>       <chr>        <chr>       <chr>       <chr>      <chr>      <chr>        <chr>  <chr>           
1 ClaimNumber~ MemberName~ Closed       A very lon~ NULL        Medical O~ 343.32     Strain       F      47              
2 ClaimNumber~ MemberName~ Closed       56980       Late        Medical O~ 1253.04    Strain       M      50              
3 ClaimNumber~ MemberName~ Closed       44195       Late         Indemnity  584.14     Strain       M      62              
4 ClaimNumber~ MemberName~ Open         44194       1015        Indemnity  2573.66    Fall/Slip    F      49              
5 ClaimNumber~ MemberName~ Closed       44194       9           Indemnity  547.39     Strain       F      51

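I have a dataset with character text in fields that should be numeric, for example injury_date and injury_time. I want to subset/filter those rows into another data frame, removing them from the current frame without losing the records. claim_cost and injured_worker_experience have similar records.

What is the most efficient way to split out these subsets?

The separate frame: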

claim_number member_name claim_status injury_date injury_time claim_type claim_cost injury_cause gender injured_worker_~
<chr>        <chr>       <chr>        <chr>       <chr>       <chr>      <chr>      <chr>        <chr>  <chr>           
1 ClaimNumber~ MemberName~ Closed       A very lon~ NULL        Medical O~ 343.32     Strain       F      47              
2 ClaimNumber~ MemberName~ Closed       56980       Late        Medical O~ 1253.04    Strain       M      50              
3 ClaimNumber~ MemberName~ Closed       44195       Late         Indemnity  584.14     Strain       M      62          

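Try this:

library(readr)
read_csv("file path", col_types = cols(injury_date = col_double(), injury_time = col_double()))

This parses the columns one at a time, each with its own type. Give it a try.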
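As a rough sketch of what that per-column parsing does, the example below reads a small made-up CSV (the claim numbers and the text "not a date" are invented for illustration, and `I()` for literal text assumes readr 2.0 or later). Values that cannot be parsed as doubles come back as `NA`, and `problems()` lists the original text that failed, so nothing is silently lost.

```r
library(readr)

# Made-up sample in the shape of the question's data (values are hypothetical)
csv_text <- "claim_number,injury_date,injury_time
C1,not a date,NULL
C2,56980,Late
C3,44194,1015
"

# Declare the type of each column explicitly; readr warns about parse failures
claims <- read_csv(
  I(csv_text),
  col_types = cols(
    claim_number = col_character(),
    injury_date  = col_double(),
    injury_time  = col_double()
  )
)

# Entries that failed to parse are NA in the result
claims

# One row per value that could not be parsed, including the original text
problems(claims)
```

Note that this approach replaces the bad values with `NA` in the data itself; the original text is only available through `problems()`, so it is less convenient if the offending rows need to be kept in a frame of their own.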

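The trick here is to convert the columns of interest from character to numeric, which turns every value that is not a number into NA, and then use base R's complete.cases() function to filter both the converted and the original data frames.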

df<-read.table(header=TRUE, text="claim_number member_name claim_status injury_date injury_time claim_type claim_cost injury_cause gender injured_worker_~
ClaimNumber~ MemberName~ Closed       'A very lon~' NULL        'Medical O~' 343.32     Strain       F      47              
ClaimNumber~ MemberName~ Closed       56980       Late        'Medical O~' 1253.04    Strain       M      50              
ClaimNumber~ MemberName~ Closed       44195       Late         Indemnity  584.14     Strain       M      62              
ClaimNumber~ MemberName~ Open         44194       1015        Indemnity  2573.66    Fall/Slip    F      49              
ClaimNumber~ MemberName~ Closed       44194       9           Indemnity  547.39     Strain       F      51")
str(df)
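library(dplyr)
# Columns that should be numeric; extend this vector to cover all the necessary column names
num_cols <- c("injury_date", "injury_time")
# Create a data frame with those columns converted; non-numeric text becomes NA
df2 <- df %>% mutate(across(all_of(num_cols), ~ suppressWarnings(as.numeric(.x))))
# TRUE for rows in which every converted column holds a number
is_num <- complete.cases(df2[num_cols])
# Retrieve rows with numeric values
df_onlyNums <- df2[is_num, ]
# Retrieve rows with non-numeric values, kept in their original character form
df_char <- df[!is_num, ]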
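
The same idea can be wrapped in a small base R function so that it is easy to apply to more columns, such as claim_cost. This is only a sketch: `split_by_numeric` is a hypothetical helper name, and the sample values (claim numbers, "not a date") are made up to mirror the shape of the question's data.

```r
# Hypothetical helper: split a data frame into rows whose `cols` are all
# numeric and rows in which at least one of them is not.
split_by_numeric <- function(data, cols) {
  # Convert each column of interest; text that is not a number becomes NA
  converted <- lapply(data[cols], function(x) {
    suppressWarnings(as.numeric(as.character(x)))
  })
  # A row is kept as numeric only if every converted column is non-missing
  ok <- complete.cases(as.data.frame(converted))
  numeric_rows <- data[ok, , drop = FALSE]
  # Store the converted (numeric) values in the clean frame
  numeric_rows[cols] <- lapply(converted, function(x) x[ok])
  # The remaining rows keep their original character values
  list(numeric_rows = numeric_rows, other_rows = data[!ok, , drop = FALSE])
}

# Made-up sample data (values are hypothetical)
claims <- data.frame(
  claim_number = c("C1", "C2", "C3", "C4", "C5"),
  injury_date  = c("not a date", "56980", "44195", "44194", "44194"),
  injury_time  = c("NULL", "Late", "Late", "1015", "9"),
  claim_cost   = c("343.32", "1253.04", "584.14", "2573.66", "547.39"),
  stringsAsFactors = FALSE
)

parts <- split_by_numeric(claims, c("injury_date", "injury_time", "claim_cost"))
parts$numeric_rows  # rows C4 and C5, with the three columns now numeric
parts$other_rows    # rows C1 to C3, untouched
```

Every input row ends up in exactly one of the two frames, so no records are lost. One caveat: a value that was already missing (NA) in the input is treated the same as non-numeric text and lands in `other_rows`.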
