IBM 1600/6250大型机EBCDIC和ASCII格式数据导入R.



是否有人成功将TX RRC的佣金数据导入R?

如果有,是怎么做的?我下载的数据集是ASCII格式集之一。

我从UIC数据集中获得了1000条记录的样本到r中。前两个字节是记录类型。据报道,每条记录有622个字符,但我发现在ASCII文件中情况并非如此。每个记录类型(KEY)都有唯一的字段。

它看起来像是一系列记录的开始,包括一个井文件,以"&;01&;"开头,后面是几行固定宽度格式的信息,井文件以下一个以"&;01&;"开头的开始结束。

仅供参考,我发现LaF包在这个上是最有效的。

这些ASCII文件没有列标头,字段由"}"字符分隔。至少我查过这两个。

下面的例子是在Oil &气田数据,排天然气年度报告气田表。我选择了文件"gsf384b",因为它是最小的8.03 KB。

注意baseread.table抛出错误,数据被readr::read_delim读入。

d1 <- "~/so_temp"
d2 <- "documents_20230401"
path <- file.path(d1, d2)
fl <- list.files(path, full.names = TRUE)
df1 <- read.table(fl, sep = "}")
#> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 71 did not have 31 elements
df1 <- readr::read_delim(fl, delim = "}", col_names = FALSE)
#> Rows: 91 Columns: 31
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "}"
#> chr  (9): X2, X3, X4, X5, X6, X7, X8, X16, X17
#> dbl  (3): X1, X9, X10
#> num  (3): X12, X13, X15
#> lgl (16): X11, X14, X18, X19, X20, X21, X22, X23, X24, X25, X26, X27, X28, X...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
readr::spec(df1)
#> cols(
#>   X1 = col_double(),
#>   X2 = col_character(),
#>   X3 = col_character(),
#>   X4 = col_character(),
#>   X5 = col_character(),
#>   X6 = col_character(),
#>   X7 = col_character(),
#>   X8 = col_character(),
#>   X9 = col_double(),
#>   X10 = col_double(),
#>   X11 = col_logical(),
#>   X12 = col_number(),
#>   X13 = col_number(),
#>   X14 = col_logical(),
#>   X15 = col_number(),
#>   X16 = col_character(),
#>   X17 = col_character(),
#>   X18 = col_logical(),
#>   X19 = col_logical(),
#>   X20 = col_logical(),
#>   X21 = col_logical(),
#>   X22 = col_logical(),
#>   X23 = col_logical(),
#>   X24 = col_logical(),
#>   X25 = col_logical(),
#>   X26 = col_logical(),
#>   X27 = col_logical(),
#>   X28 = col_logical(),
#>   X29 = col_logical(),
#>   X30 = col_logical(),
#>   X31 = col_logical()
#> )
# columns with all NA
i_na <- which(sapply(df1, (x) all(is.na(x))))
# remove those columns and print the data
df1[-i_na]
#> # A tibble: 91 × 15
#>       X1 X2    X3    X4          X5    X6    X7    X8       X9   X10   X12   X13
#>    <dbl> <chr> <chr> <chr>       <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#>  1  2022 8A    <NA>  WILDCAT     N/A   M     N/A   N/A       5     0     0     0
#>  2  2022 8A    <NA>  ACKERLY (D… MART… <NA>  11-2… 8419      1     0     0     0
#>  3  2022 8A    <NA>  ALAN JOHNS… COTT… <NA>  2-13… 3684      1     0     0     0
#>  4  2022 8A    <NA>  ANNE TANDY… KING  <NA>  10-0… 5106      0     0     0     0
#>  5  2022 8A    <NA>  ARICK (YAT… FLOYD <NA>  9-04… 1345      0     0     0     0
#>  6  2022 8A    <NA>  ARMSTRONG … COTT… <NA>  6-25… 6248      0     0     0     0
#>  7  2022 8A    <NA>  BALE (YATE… GAIN… <NA>  2-13… 3422      0     0     0     0
#>  8  2022 8A    <NA>  BECKER (YA… TERRY <NA>  2-03… 3128      3     0     0     0
#>  9  2022 8A    <NA>  BIRNIE (CO… MOTL… <NA>  8-26… 8457      2     0     0     0
#> 10  2022 8A    <NA>  BIRNIE (ST… MOTL… <NA>  11-0… 8527      0     0     0     0
#> # … with 81 more rows, and 3 more variables: X15 <dbl>, X16 <chr>, X17 <chr>

创建于2023-04-01 with reprex v2.0.2

最新更新