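R - Unable to extract day, month and year into three separate columns from the "Date" column of a data frame, where dates are in the format "dd/mm/yyyy" or "dd/m/yyyy"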



I am trying to use:

library(dplyr)
library(tidyr)
library(stringr)
# Dataframe has "Date" column and date in the format "dd/mm/yyyy" or "dd/m/yyyy"
df <- data.frame(Date = c("10/1/2001", "15/01/2010", "15/2/2010", "20/02/2010", "25/3/2010", "31/03/2010"))
# extract into three columns
df %>% extract(Date, c("Day", "Month", "Year"), "([^/]+), ([^/]+), ([^)]+)")

But the code above returns:

Day Month Year
1 <NA>  <NA> <NA>
2 <NA>  <NA> <NA>
3 <NA>  <NA> <NA>
4 <NA>  <NA> <NA>
5 <NA>  <NA> <NA>
6 <NA>  <NA> <NA>

How can I correctly extract the date parts to get the expected result:

Day Month Year
1 10  1 2001
2 15  1 2010
3 15  2 2010
4 20  2 2010
5 25  3 2010
6 31  3 2010

In this case it is probably easier to use separate:

df %>% 
  separate("Date", into=c("Day","Month","Year"), sep="/") %>% 
  mutate(Month=str_replace(Month, "^0",""))

This keeps everything as character values. If you want the values to be numeric, use:

df %>% 
  separate("Date", into=c("Day","Month","Year"), sep="/", convert=TRUE)

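Your regex pattern is off: it expects a comma and a space between the groups, whereas the dates are separated by slashes. Use this version: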

df %>% extract(Date, c("Day", "Month", "Year"), "(\\d+)/(\\d+)/(\\d+)")
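
As a minimal sketch of how that pattern behaves on the sample data from the question: each `(\\d+)` captures a run of digits and the literal `/` separates the groups. The name `res` is only illustrative, and `convert = TRUE` is an optional addition here (not part of the answer above) that turns the captured strings into integers, so a month such as "01" comes out as 1.

```r
library(tidyr)

# Sample data from the question
df <- data.frame(Date = c("10/1/2001", "15/01/2010", "15/2/2010",
                          "20/02/2010", "25/3/2010", "31/03/2010"))

# Three capture groups of digits separated by literal slashes.
# convert = TRUE runs type conversion on the new columns,
# so Day, Month and Year become integers instead of strings.
res <- extract(df, Date, c("Day", "Month", "Year"),
               "(\\d+)/(\\d+)/(\\d+)", convert = TRUE)
res
```

Without `convert = TRUE` the three columns stay character, and months written with a leading zero keep it ("01", "02", "03").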

We can use lubridate:

library(lubridate)
library(dplyr)
df %>% 
  mutate(Date = dmy(Date), # if your Date column is character type
         # a named list gives the columns Date_year, Date_month, Date_day
         across(Date, list(year = year, month = month, day = day)))
Date Date_year Date_month Date_day
1 2001-01-10      2001          1       10
2 2010-01-15      2010          1       15
3 2010-02-15      2010          2       15
4 2010-02-20      2010          2       20
5 2010-03-25      2010          3       25
6 2010-03-31      2010          3       31

We can use read.table from base R:

read.table(text = df$Date, sep="/", header = FALSE, 
           col.names = c("Day", "Month", "Year"))
Day Month Year
1  10     1 2001
2  15     1 2010
3  15     2 2010
4  20     2 2010
5  25     3 2010
6  31     3 2010
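
Another base R possibility, sketched here as an assumption rather than as part of the answer above, is to parse the strings with `as.Date` first and then pull out the parts with `format`. This also validates the input, since strings that are not real dates become NA. The names `d` and `parts` are illustrative.

```r
# Sample data from the question
df <- data.frame(Date = c("10/1/2001", "15/01/2010", "15/2/2010",
                          "20/02/2010", "25/3/2010", "31/03/2010"))

# Parse day/month/year; one- and two-digit months are both accepted on input
d <- as.Date(df$Date, format = "%d/%m/%Y")

# format() returns zero-padded strings, so convert them to integers
parts <- data.frame(Day   = as.integer(format(d, "%d")),
                    Month = as.integer(format(d, "%m")),
                    Year  = as.integer(format(d, "%Y")))
parts
```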
