Error: replacement has 0 rows, data has 25



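I am trying to make a list of 25 different passwords to check against another list of 50 passwords and return the matches. This is for a university project about passwords. The idea is that the 25 are the most commonly used passwords, and I want R to tell me which of my 50 passwords match the 25 most common ones. However, I keep getting the following error: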

Error in `$<-.data.frame`(`*tmp*`, "Percent", value = character(0)) :
replacement has 0 rows, data has 25

I am using the following code:

makeCounts <- function(x) {
return(x=list("count"=sum(grepl(x, Final_DF$pswd, ignore.case=TRUE))))  
}
#creates a local variable named tmp which is removed afterwards
printCounts <- function(ct) {
tmp <- data.frame(Term=names(ct), Count=as.numeric(unlist(ct)))
tmp$Percent <- sprintf("%3.2f%%", ((tmp$Count / nrow(Final_DF$Pswd) * 100)))
print(tmp[order(-tmp$Count),], row.names=FALSE)
}
# create the top 25 most commonly used pswds
worst.pass <- c("password", "123456", "12345678", "qwerty", "abc123", 
"monkey", "1234567", "Qwertyuiop", "123", "dragon", 
"000000", "1111111", "iloveyou", "1234", "12345", 
"1234567890", "1q2w3e4r5t", "ashely", "shadow", "123123", 
"654321", "superman", "sunshine", "tinkle", "football")
worst.ct <- sapply(worst.pass, makeCounts, simplify=FALSE)
printCounts(worst.ct)

The data with my 50 passwords is held in my data frame as Final_DF$Pswd, shown below:

> Final_DF$Pswd
[1] "monkey"       "iloveyou"     "dragon"       "jbI2pnK$xi"   "password"     "computer"     "!qessw"      
[8] "tUNh&SSm6!"   "sunshine"     "wYrUeWV"      "superman"     "samsung"      "utoXGe6$"     "master"      
[15] "wjZC&OvXX"    "0R1cNTm9sGir" "Fbuu2bs89?"   "pokemon"      "secret"       "x&W1TjO59"    "buster"      
[22] "purple"       "shine"        "flower"       "marina"       "Tg%OQT$0"     "SbDUV&nOX"    "peanut"      
[29] "angel"        "?1LOEc4Zfk"   "computer"     "spiderman"    "nothing"      "$M6LgmQgv$"   "orange"      
[36] "knight"       "american"     "outback"      "TfuRpt3PiZ"   "air"          "surf"         "lEi2a$$eyz"  
[43] "date"         "V$683rx$p"    "newcastle"    "estate"       "foxy"         "ginger"       "coffee"      
[50] "legs" 

The traceback shown when I run printCounts(worst.ct) reads:

Error in `$<-.data.frame`(`*tmp*`, "Percent", value = character(0)) : 
replacement has 0 rows, data has 25 
4.
stop(sprintf(ngettext(N, "replacement has %d row, data has %d", 
"replacement has %d rows, data has %d"), N, nrows), domain = NA) 
3.
`$<-.data.frame`(`*tmp*`, "Percent", value = character(0)) 
2.
`$<-`(`*tmp*`, "Percent", value = character(0)) 
1.
printCounts(worst.ct) 

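I have read a few forum posts, and I'm not sure whether this has something to do with NA values? I am new to R and have been looking at this for a while.

Can anyone tell me where I'm going wrong?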

> dput(Final_DF)
structure(list(gender = c("female", "male", "male", "female", 
"female", "male", "male", "male", "male", "female", "male", "male", 
"female", "female", "female", "female", "male", "female", "male", 
"male", "female", "female", "female", "female", "female", "female", 
"male", "female", "female", "female", "female", "female", "female", 
"female", "male", "male", "female", "female", "male", "female", 
"female", "male", "female", "female", "male", "male", "male", 
"male", "male", "male"), age = structure(c(47L, 43L, 65L, 24L, 
44L, 60L, 26L, 25L, 62L, 23L, 44L, 61L, 27L, 47L, 18L, 23L, 34L, 
77L, 71L, 19L, 64L, 61L, 22L, 55L, 45L, 29L, 21L, 64L, 43L, 20L, 
32L, 55L, 68L, 21L, 81L, 43L, 63L, 72L, 38L, 20L, 66L, 39L, 64L, 
20L, 73L, 21L, 53L, 75L, 69L, 82L), class = c("variable", "integer"
), varname = "Age"), web_browser = structure(c(1L, 1L, 4L, 1L, 
3L, 3L, 2L, 1L, 4L, 1L, 1L, 1L, 3L, 4L, 1L, 2L, 1L, 3L, 3L, 2L, 
1L, 1L, 1L, 3L, 4L, 3L, 4L, 4L, 1L, 2L, 1L, 1L, 3L, 1L, 1L, 2L, 
1L, 2L, 3L, 4L, 2L, 3L, 1L, 1L, 1L, 1L, 3L, 3L, 4L, 1L), .Label = c("Chrome", 
"Internet Explorer", "Firefox", "Netscape"), class = c("variable", 
"factor"), varname = "Browser"), Pswd = c("monkey", "iloveyou", 
"dragon", "jbI2pnK$xi", "password", "computer", "!qessw", "tUNh&SSm6!", 
"sunshine", "wYrUeWV", "superman", "samsung", "utoXGe6$", "master", 
"wjZC&OvXX", "0R1cNTm9sGir", "Fbuu2bs89?", "pokemon", "secret", 
"x&W1TjO59", "buster", "purple", "shine", "flower", "marina", 
"Tg%OQT$0", "SbDUV&nOX", "peanut", "angel", "?1LOEc4Zfk", "computer", 
"spiderman", "nothing", "$M6LgmQgv$", "orange", "knight", "american", 
"outback", "TfuRpt3PiZ", "air", "surf", "lEi2a$$eyz", "date", 
"V$683rx$p", "newcastle", "estate", "foxy", "ginger", "coffee", 
"legs"), pswd_length = c(6L, 8L, 6L, 10L, 8L, 8L, 6L, 10L, 8L, 
7L, 8L, 7L, 8L, 6L, 9L, 12L, 10L, 7L, 6L, 9L, 6L, 6L, 5L, 6L, 
6L, 8L, 9L, 6L, 5L, 10L, 8L, 9L, 7L, 10L, 6L, 6L, 8L, 7L, 10L, 
3L, 4L, 10L, 4L, 9L, 9L, 6L, 4L, 6L, 6L, 4L), last.num = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, 9, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA)), row.names = c(NA, -50L), class = "data.frame")

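There are a few errors in your functions.

  1. makeCounts references pswd, but Final_DF has Pswd and pswd_length. R is doing partial matching for $, which I'm guessing is not what you intended. Let's first prove what it is using by setting an option[1]: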

    options(warnPartialMatchDollar = TRUE) # see ?options
    worst.ct <- sapply(worst.pass, makeCounts, simplify=FALSE)
    # Warning in Final_DF$pswd : partial match of 'pswd' to 'pswd_length'
    # Warning: partial match of 'pswd' to 'pswd_length'
    # Warning: partial match of 'pswd' to 'pswd_length'
    # Warning: partial match of 'pswd' to 'pswd_length'
    # Warning: partial match of 'pswd' to 'pswd_length'
    ### ...repeated...
    

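    Even worse, if you look at this variable (part of troubleshooting is checking the variables you are making and using), you'll see that it is effectively empty/useless, with every value being 0: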

    str(worst.ct)
    # List of 25
    #  $ password  :List of 1
    #   ..$ count: int 0
    #  $ 123456    :List of 1
    #   ..$ count: int 0
    #  $ 12345678  :List of 1
    #   ..$ count: int 0
    #  $ qwerty    :List of 1
    #   ..$ count: int 0
    ### ...truncated...
    

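    If you change the function to use the correct column name, it no longer gives those warnings, and the result does contain some non-zero elements: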

    makeCounts <- function(x) {
    return(x=list("count"=sum(grepl(x, Final_DF$Pswd, ignore.case=TRUE))))  
    }
    worst.ct <- sapply(worst.pass, makeCounts, simplify=FALSE) # recompute with the fixed function
    table(unlist(worst.ct))
    #  0  1 
    # 19  6 
    str(worst.ct)
    # List of 25
    #  $ password  :List of 1
    #   ..$ count: int 1
    #  $ 123456    :List of 1
    #   ..$ count: int 0
    #  $ 12345678  :List of 1
    #   ..$ count: int 0
    #  $ qwerty    :List of 1
    #   ..$ count: int 0
    ### ...truncated...
    
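  2. In printCounts, you are referencing nrow(Final_DF$Pswd), which will always produce NULL, because Final_DF$Pswd is a plain vector and has no rows. Did you try this?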

    nrow(Final_DF$Pswd)
    # NULL
    nrow(Final_DF)
    # [1] 50
    

    Instead, rewrite that line as

    tmp$Percent <- sprintf("%3.2f%%", ((tmp$Count / nrow(Final_DF) * 100)))
    
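  3. This one is not a syntax error, but it is bad practice for your functions to depend on a variable that is neither defined inside them nor passed to them: it means a function can behave differently when given the same arguments, which breaks reproducibility (and can make troubleshooting rather difficult).

    I suggest making Final_DF an argument of the function and passing it in every time.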

    printCounts <- function(ct, Final_DF) {
    tmp <- data.frame(Term=names(ct), Count=as.numeric(unlist(ct)))
    tmp$Percent <- sprintf("%3.2f%%", ((tmp$Count / nrow(Final_DF) * 100)))
    print(tmp[order(-tmp$Count),], row.names=FALSE)
    }
    printCounts(worst.ct)
    # Error in nrow(Final_DF) : argument "Final_DF" is missing, with no default
    printCounts(worst.ct, Final_DF) # no error here
    

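    For this case, I recommend you not give it a default value. This also lets you use the same function with different "Final" password frames, in case you are testing (unit tests) or testing (train/test sampling) or testing.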
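
The same idea can be applied to makeCounts, which also reads Final_DF from the global environment. The following is only a sketch under assumptions: the argument name `passwords` and the small sample vectors are made up for illustration and are not part of the original code.

```r
# Hypothetical variant of makeCounts: the password vector is passed in
# explicitly instead of being looked up as a global (`passwords` is an
# assumed argument name).
makeCounts <- function(x, passwords) {
  list(count = sum(grepl(x, passwords, ignore.case = TRUE)))
}

# Small made-up sample, standing in for Final_DF$Pswd and worst.pass
pswds <- c("monkey", "iloveyou", "dragon", "jbI2pnK$xi", "password")
worst <- c("password", "123456", "monkey")

# Extra arguments to sapply() are forwarded to makeCounts()
worst.ct <- sapply(worst, makeCounts, passwords = pswds, simplify = FALSE)
str(worst.ct)
```

With the real data, the call would presumably become sapply(worst.pass, makeCounts, passwords = Final_DF$Pswd, simplify=FALSE).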

After those changes, I get this:

printCounts(worst.ct, Final_DF)
#        Term Count Percent
#    password     1   2.00%
#      monkey     1   2.00%
#      dragon     1   2.00%
#    iloveyou     1   2.00%
#    superman     1   2.00%
#    sunshine     1   2.00%
#      123456     0   0.00%
#    12345678     0   0.00%
#      qwerty     0   0.00%
#      abc123     0   0.00%
#     1234567     0   0.00%
#  Qwertyuiop     0   0.00%
#         123     0   0.00%
#      000000     0   0.00%
#     1111111     0   0.00%
#        1234     0   0.00%
#       12345     0   0.00%
#  1234567890     0   0.00%
#  1q2w3e4r5t     0   0.00%
#      ashely     0   0.00%
#      shadow     0   0.00%
#      123123     0   0.00%
#      654321     0   0.00%
#      tinkle     0   0.00%
#    football     0   0.00%
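
For reference, here is a small sketch of how a NULL from nrow() can turn into the "replacement has 0 rows" error. It uses a made-up three-row data frame rather than the original data, so the row count in the message differs from the 25 in the question.

```r
x <- c("a", "b")            # a plain vector, like Final_DF$Pswd

n <- nrow(x)                # NULL: vectors have no rows
pct <- 10 / n * 100         # arithmetic with NULL gives numeric(0)
txt <- sprintf("%3.2f%%", pct)  # zero-length input gives character(0)

# Assigning a zero-length column to a data frame with rows fails
tmp <- data.frame(Count = c(1, 0, 1))
msg <- tryCatch({
  tmp$Percent <- txt
  "no error"
}, error = function(e) conditionMessage(e))
msg

# length() or NROW() would give the number of elements instead
length(x)
NROW(x)
```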

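Notes:

  1. I have options(warnPartialMatchDollar=TRUE, warnPartialMatchAttr=TRUE) set in my ~/.Rprofile (and in any project-specific .Rprofile init files) for exactly this reason: $ silently does partial matching, which can be very problematic. With the warnings, at least you can see what R is inferring behind the scenes. There is a third option, warnPartialMatchArgs, with the same intent... but too many package authors inadvertently rely on that behavior, so for lack of time/ability to fix them all, I have chosen to suppress that noise.

    Especially if this partial-matching behavior surprised you, I strongly recommend setting the first two options yourself. In the best case, it produces no warnings and you can rest assured that you are taking steps toward more resilient code; in the worst case, it is noisy, you eventually tire of the noise, and you fix the lazy code.

    See ?options for these three options and the many others available. (Packages can also set their own options; for better or worse, an option is conceptually similar to the Windows registry, in that it is global to R and can have arbitrary keys and values.)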
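
As a minimal illustration of the partial matching described above, here is a sketch with a made-up two-row data frame that has the same two column names as Final_DF:

```r
df <- data.frame(Pswd = c("monkey", "air"),
                 pswd_length = c(6L, 3L))

# `$` partially matches "pswd" to "pswd_length" (names are case-sensitive,
# so "Pswd" is not a candidate)
partial <- df$pswd
partial

# `[[` matches exactly by default, so the misspelled name returns NULL
exact <- df[["pswd"]]
exact
```

Using [[ with the full column name is one way to make a misspelled name show up right away instead of silently resolving to another column.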

If you just want to check whether a password (or a set of passwords) is in a set of bad passwords, you can use

Final_DF$Pswd %in% worst.pass

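This gives you a vector of TRUE and FALSE values. You can run sum(Final_DF$Pswd %in% worst.pass) to get the total number of bad-password matches, or table(Final_DF$Pswd[Final_DF$Pswd %in% worst.pass]) to quickly see which ones matched.

However, if your intent is to check against a collection that passwords keep being added to (which I'm guessing is your intent, since you made these functions), the following may be useful: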
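
Note that %in% tests for exact, case-sensitive equality, while grepl() (as used in makeCounts) tests whether the pattern occurs anywhere inside the string. A small sketch with made-up passwords shows the difference:

```r
pswds <- c("monkey", "monkey123", "x1234y")
worst <- c("monkey", "123")

# Exact matching: only the first password is literally in `worst`
exact <- pswds %in% worst
exact

# Substring matching: count passwords that contain each term
# (fixed = TRUE treats the term as literal text, not a regular expression)
counts <- sapply(worst, function(p) sum(grepl(p, pswds, fixed = TRUE)))
counts
```

Which of the two is appropriate depends on whether "monkey123" should count as a use of "monkey".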

result <- rep(NA_integer_, length(Final_DF$Pswd))
for (i in seq_along(Final_DF$Pswd)) {
  if (Final_DF$Pswd[i] %in% worst.pass) {
    # position of the matching entry in worst.pass
    result[i] <- which(worst.pass == Final_DF$Pswd[i])
  }
}
table(worst.pass[result[!is.na(result)]])

The result is a table of match counts. In your case,

dragon iloveyou   monkey password sunshine superman 
1        1        1        1        1        1 

Note that for a large number of passwords, a loop is not advisable. In that case, a tidyverse approach would be worth a look.
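
As one loop-free possibility, here is a sketch in base R (not the tidyverse approach mentioned above) using match(). It uses made-up sample vectors and assumes the bad-password list has no duplicate entries.

```r
pswds <- c("monkey", "iloveyou", "jbI2pnK$xi", "monkey", "password")
worst <- c("password", "123456", "monkey", "iloveyou")

# match() returns, for each password, its position in `worst` (NA if absent)
idx <- match(pswds, worst)
idx

# Tabulate the matched terms, dropping the non-matches
tab <- table(worst[idx[!is.na(idx)]])
tab
```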
