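Has anyone successfully imported TX RRC (Texas Railroad Commission) data into R?
If so, how did you do it? The dataset I downloaded is one of the ASCII-format sets.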
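I pulled a sample of 1000 records from the UIC dataset into R. The first two bytes of each record are the record type. Each record is reportedly 622 characters long, but I found that this is not the case in the ASCII file. Each record type (KEY) has its own unique fields.
It looks like a series of records making up one well file starts with a record beginning with "01", followed by several lines of information in fixed-width format, and the well file ends where the next record beginning with "01" starts.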
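The grouping described above can be sketched in base R. This is only an illustration under stated assumptions: the sample lines and the names `lines`, `rec_type`, `well_id`, and `wells` are made up for the example, and the only rules taken from the description are that the first two characters are the record type and that a "01" record starts a new well file.

```r
# Hypothetical sample lines; real UIC records are much longer.
lines <- c(
  "01WELL-A header",
  "02WELL-A detail 1",
  "03WELL-A detail 2",
  "01WELL-B header",
  "02WELL-B detail 1"
)

# The first two characters of each line give the record type.
rec_type <- substr(lines, 1, 2)

# Every "01" record starts a new well file, so a running count of
# "01" records labels each line with the well it belongs to.
well_id <- cumsum(rec_type == "01")

# One list element per well file, each holding that well's lines.
wells <- split(lines, well_id)

# Tabulate line lengths, since the file may not match the
# documented record length.
table(nchar(lines))
```

With a real file, `lines` would come from `readLines()`, and each record type could then be parsed with its own set of fixed-width column positions.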
FYI, I found the LaF package to be the most efficient for this.
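These ASCII files have no column headers, and the fields are separated by the "}" character. At least that is true of the two I checked.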
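A quick way to check the delimiter before importing is to count the fields on each line. The sketch below assumes a small made-up file written to a temporary path (`tmp` and its contents are hypothetical, not RRC data) and uses base R's `count.fields()` with quoting and comment handling switched off.

```r
# Hypothetical stand-in for an RRC file: "}"-separated, no header row.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "2022}8A}WILDCAT}5",
  "2022}8A}O'BRIEN FIELD}1",
  "2022}8A}BALE}0"
), tmp)

# Count fields per line with quote and comment characters disabled,
# so an apostrophe or "#" inside a field is treated as plain text.
n_fields <- count.fields(tmp, sep = "}", quote = "", comment.char = "")

# If the delimiter guess is right, every line has the same count.
table(n_fields)

# Look at the first raw lines to confirm there is no header row.
head(readLines(tmp), 2)
```

If the counts differ between lines, the file probably needs a closer look before it is read as a delimited table.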
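The example below uses a file from the Oil & Gas Field Data section, row Gas Annual Report Gas Fields Table. I chose the file "gsf384b" because it is the smallest, at 8.03 KB.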
Note that base `read.table` throws an error, while `readr::read_delim` reads the data in.
# folder holding the downloaded file
d1 <- "~/so_temp"
d2 <- "documents_20230401"
path <- file.path(d1, d2)
fl <- list.files(path, full.names = TRUE)
# base R reader: fails on this file
df1 <- read.table(fl, sep = "}")
#> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 71 did not have 31 elements
df1 <- readr::read_delim(fl, delim = "}", col_names = FALSE)
#> Rows: 91 Columns: 31
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "}"
#> chr (9): X2, X3, X4, X5, X6, X7, X8, X16, X17
#> dbl (3): X1, X9, X10
#> num (3): X12, X13, X15
#> lgl (16): X11, X14, X18, X19, X20, X21, X22, X23, X24, X25, X26, X27, X28, X...
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
readr::spec(df1)
#> cols(
#> X1 = col_double(),
#> X2 = col_character(),
#> X3 = col_character(),
#> X4 = col_character(),
#> X5 = col_character(),
#> X6 = col_character(),
#> X7 = col_character(),
#> X8 = col_character(),
#> X9 = col_double(),
#> X10 = col_double(),
#> X11 = col_logical(),
#> X12 = col_number(),
#> X13 = col_number(),
#> X14 = col_logical(),
#> X15 = col_number(),
#> X16 = col_character(),
#> X17 = col_character(),
#> X18 = col_logical(),
#> X19 = col_logical(),
#> X20 = col_logical(),
#> X21 = col_logical(),
#> X22 = col_logical(),
#> X23 = col_logical(),
#> X24 = col_logical(),
#> X25 = col_logical(),
#> X26 = col_logical(),
#> X27 = col_logical(),
#> X28 = col_logical(),
#> X29 = col_logical(),
#> X30 = col_logical(),
#> X31 = col_logical()
#> )
# columns with all NA
i_na <- which(sapply(df1, \(x) all(is.na(x))))
# remove those columns and print the data
df1[-i_na]
#> # A tibble: 91 × 15
#> X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X12 X13
#> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 2022 8A <NA> WILDCAT N/A M N/A N/A 5 0 0 0
#> 2 2022 8A <NA> ACKERLY (D… MART… <NA> 11-2… 8419 1 0 0 0
#> 3 2022 8A <NA> ALAN JOHNS… COTT… <NA> 2-13… 3684 1 0 0 0
#> 4 2022 8A <NA> ANNE TANDY… KING <NA> 10-0… 5106 0 0 0 0
#> 5 2022 8A <NA> ARICK (YAT… FLOYD <NA> 9-04… 1345 0 0 0 0
#> 6 2022 8A <NA> ARMSTRONG … COTT… <NA> 6-25… 6248 0 0 0 0
#> 7 2022 8A <NA> BALE (YATE… GAIN… <NA> 2-13… 3422 0 0 0 0
#> 8 2022 8A <NA> BECKER (YA… TERRY <NA> 2-03… 3128 3 0 0 0
#> 9 2022 8A <NA> BIRNIE (CO… MOTL… <NA> 8-26… 8457 2 0 0 0
#> 10 2022 8A <NA> BIRNIE (ST… MOTL… <NA> 11-0… 8527 0 0 0 0
#> # … with 81 more rows, and 3 more variables: X15 <dbl>, X16 <chr>, X17 <chr>
Created on 2023-04-01 with reprex v2.0.2
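One possible reason base `read.table` fails where `readr::read_delim` succeeds is its default handling of quote characters (`"` and `'`) and of the comment character `#`. That is an assumption and has not been verified against "gsf384b". The sketch below uses a made-up temporary file (`tmp`, `df2`, and the sample rows are hypothetical) to show the base R settings that turn both behaviours off.

```r
# Hypothetical "}"-separated file with an apostrophe and a "#" in a field.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "2022}8A}WILDCAT}5",
  "2022}8A}O'BRIEN #2}1",
  "2022}8A}BALE}0"
), tmp)

df2 <- read.table(
  tmp,
  sep = "}",
  header = FALSE,            # the files have no column headers
  quote = "",                # treat quote characters as ordinary text
  comment.char = "",         # do not cut lines at "#"
  fill = TRUE,               # pad short lines with blanks instead of failing
  colClasses = "character"   # read everything as text; convert types later
)

dim(df2)
```

Reading every column as character avoids type guessing on the first pass; columns can be converted afterwards, for example with `type.convert(df2, as.is = TRUE)`.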