How to improve the performance of a masking method that uses credit card regular expressions in Java



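I have this function that finds credit card numbers in an input string with regular expressions and masks everything except the last 4 digits: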

public CharSequence obfuscate(CharSequence data) {
    String[] result = data.toString().replaceAll("[^a-zA-Z0-9-_*]", " ").trim().replaceAll(" +", " ").split(" ");
    for (String str : result) {
        String originalString = str;
        String cleanString = str.replaceAll("[-_]", "");
        CardType cardType = CardType.detect(cleanString);
        if (!CardType.UNKNOWN.equals(cardType)) {
            // 'replacement' is not declared in this snippet
            String maskedReplacement = maskWithoutLast4Digits(cleanString, replacement);
            data = data.toString().replace(originalString, maskedReplacement);
        }
    }
    return data;
}

static String maskWithoutLast4Digits(String input, String replacement) {
    if (input.length() < 4) {
        return input;
    }
    return input.replaceAll(".(?=.{4})", replacement);
}

The pattern enum:

public enum CardType {
    UNKNOWN,
    VISA("^4[0-9]{12}(?:[0-9]{3}){0,2}$"),
    MASTERCARD("^(?:5[1-5]|2(?!2([01]|20)|7(2[1-9]|3))[2-7])\\d{14}$"),
    AMERICAN_EXPRESS("^3[47][0-9]{13}$"),
    DINERS_CLUB("^3(?:0[0-5]|[68][0-9])[0-9]{11}$"),
    DISCOVER("^6(?:011|[45][0-9]{2})[0-9]{12}$");

    private Pattern pattern;

    CardType() {
        this.pattern = null;
    }

    CardType(String pattern) {
        this.pattern = Pattern.compile(pattern);
    }

    public static CardType detect(String cardNumber) {
        for (CardType cardType : CardType.values()) {
            if (null == cardType.pattern) continue;
            if (cardType.pattern.matcher(cardNumber).matches()) return cardType;
        }
        return UNKNOWN;
    }

    public Pattern getPattern() {
        return pattern;
    }
}

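Input 1: "Valid American Express card: 371449635398431"

Output 1: "Valid American Express card: **8431"

Input 2: "Invalid credit card: 1234222222222" // does not match any credit card pattern

Output 2: "Invalid credit card: 1234222222222"

Input 3: "Valid American Express card with garbage characters: <3714-4963-5398-431>"

Output 3: "Valid American Express card with garbage characters: <****8431>"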

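This is not the best way to do the masking, because the method will be called for every tag in large HTML documents and for every line in large text files. How can I improve the performance of this method?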

This post is based entirely on the comments under the earlier answer, in particular this comment from the OP:

And also the input string can be "my phone number 12345678 and credit card 1234567890"

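If you are inclined towards regular expressions and want to retrieve phone numbers and/or credit card numbers from a given string, you can use the following Java regex:

String regex = "(\\+?\\d+.{0,1}\\d+.{0,1}\\d+.{0,1}\\d+)|"       // Phone Numbers
        + "(\\+{0,1}\\d+{0,3}\\s{0,1}\\-{0,1}\\({0,1}\\d+"  // Credit Cards
        + "\\){0,1}\\s{0,1}\\-{0,1}\\d+\\s{0,1}\\-{0,1}\\d+)";

To use this regex string you need to run it through the Pattern/Matcher mechanism, for example: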

String strg = "Valid Phone #: <+1 (212) 555-3456> - "
        + "Valid American Express card 24 with garbage 33.6 characters: <3714-4963-5398-431>";
final java.util.List<String> numbers = new java.util.ArrayList<>();
final String regex = "(\\+?\\d+.{0,1}\\d+.{0,1}\\d+.{0,1}\\d+)|"       // Phone Numbers
        + "(\\+{0,1}\\d+{0,3}\\s{0,1}\\-{0,1}\\({0,1}\\d+"  // Credit Cards
        + "\\){0,1}\\s{0,1}\\-{0,1}\\d+\\s{0,1}\\-{0,1}\\d+)";
final java.util.regex.Pattern pattern = java.util.regex.Pattern.compile(regex); // the regex
final java.util.regex.Matcher matcher = pattern.matcher(strg); // your string
while (matcher.find()) {
    numbers.add(matcher.group());
}

for (String str : numbers) {
    System.out.println(str);
}

With the string provided above, the console window will display:

+1 (212) 555-3456
3714-4963-5398-431

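Consider these the original phone number and credit card number substrings. Keep copies of them in variables such as origPhoneNum and origcreditCardNum. Now validate the numbers. You were already given the means to validate credit card numbers in the previous answer. Here is a method to validate phone numbers:

public static boolean isValidPhoneNumber(String phoneNumber) {
    return phoneNumber.matches("^(?!\\b(0)\\1+\\b)(\\+?\\d{1,3}[. -]?)?"
            + "\\(?\\d{3}\\)?([. -]?)\\d{3}\\3\\d{4}$");
}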

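I have successfully tested the regex string provided above against phone numbers from many different countries in many different formats. It was also tested against many different credit card numbers in many different formats, again successfully. Of course, there will always be some format that causes a particular problem, since there are apparently no rules for number entry at the source where the data is generated.

Take, for example, the comment line I showed at the top of this post: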

And also the input string can be "my phone number 12345678 and credit card 1234567890"

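There is no way to tell which number is supposed to be a phone number and which a credit card number unless the string states it explicitly in text, as the string above does. Tomorrow or next week it might not, because there don't appear to be any data-entry rules in play here.

The string indicates a phone number of 12345678, which is 8 digits. It also indicates a credit card number of 1234567890. Internationally, phone numbers can range from 9 to 13 digits depending on the country. Locally the range of digits is smaller, again depending on the country. Because (international) phone numbers span such a wide range of digit counts, there is no way to know that a number treated as a credit card number really is one unless the string tells you so before or after the number. And will it do so in the next input string?

For that reason I'll leave it to you to decide how to handle this situation, but whatever you do, don't expect it to be fast. As I wrote at the start of my previous answer: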

Wouldn't it be nice if all validations were done before the card numbers
went into the database (or data files).
EDIT: Based on your latest comment under the earlier answer:

I whipped up a small demo:

// Place this code into a method or event somewhere...
String inputString = "my phone number is +54 123 344-4567 and CC 2222 4053 4324 8877 bla bla bla";
System.out.println("Input:  " + inputString);
System.out.println();
final java.util.List<String> numbers = new java.util.ArrayList<>();

final String regex = "(\\+?\\d+.{0,1}\\d+.{0,1}\\d+.{0,1}\\d+)|"       // Phone Numbers
        + "(\\+{0,1}\\d+{0,3}\\s{0,1}\\-{0,1}\\({0,1}\\d+"  // Credit Cards
        + "\\){0,1}\\s{0,1}\\-{0,1}\\d+\\s{0,1}\\-{0,1}\\d+)";
final java.util.regex.Pattern pattern = java.util.regex.Pattern.compile(regex);
final java.util.regex.Matcher matcher = pattern.matcher(inputString);
while (matcher.find()) {
    numbers.add(matcher.group());
}

String outputString = inputString;

for (String str : numbers) {
    //System.out.println(str);  // Uncomment for testing.
    // Is the substring a valid phone number?
    if (isValidPhoneNumber(str)) {
        outputString = outputString.replace(str, maskAllExceptLast(str, 3, "x"));
    }
    // Otherwise, is it a valid credit card number?
    else if (isValidCreditCardNumber(str)) {
        outputString = outputString.replace(str,
                maskAllExceptLast(str.replaceAll("\\D", ""), 4, "*"));
    }
}
System.out.println("Output: " + outputString);

Supporting methods:

public static String maskAllExceptLast(String inputString, int exceptLast_N, String... maskCharacter) {
    if (inputString.length() < exceptLast_N) {
        return inputString;
    }
    String mask = "*";  // Default mask character.
    if (maskCharacter.length > 0) {
        mask = maskCharacter[0];
    }
    return inputString.replaceAll(".(?=.{" + exceptLast_N + "})", mask);
}
/**
* Method to validate a supplied phone number. Currently validates phone
* numbers supplied in the following fashion:
* <pre>
*
*      Phone number 1234567890 validation result: true
*      Phone number 123-456-7890 validation result: true
*      Phone number 123-456-7890 x1234 validation result: true
*      Phone number 123-456-7890 ext1234 validation result: true
*      Phone number (123)-456-7890 validation result: true
*      Phone number 123.456.7890 validation result: true
*      Phone number 123 456 7890 validation result: true
*      Phone number 01 123 456 7890 validation result: true
*      Phone number 1 123-456-7890 validation result: true
*      Phone number 1-123-456-7890 validation result: true</pre>
*
* @param phoneNumber (String) The phone number to check.<br>
*
* @return (boolean) True is returned if the supplied phone number is valid.
*         False if it isn't.
*/
public static boolean isValidPhoneNumber(String phoneNumber) {
    boolean isValid = false;
    long len = phoneNumber.replaceAll("\\D", "").length(); // Crush the phone number into only digits
    // Check the phone number's length range. Must be from 8 to 12 digits long
    if (len < 8 || len > 12) {
        return false;
    }
    // Validate phone numbers of format "xxxxxxxx to xxxxxxxxxxxx"
    else if (phoneNumber.matches("\\d+")) {
        isValid = true;
    }
    // Validating phone number with -, . or spaces
    else if (phoneNumber.matches("^(\\+\\d{1,3}( )?)?((\\(\\d{1,3}\\))|\\d{1,3})[- .]?\\d{3,4}[- .]?\\d{4}$")) {
        isValid = true;
    }
    /* Validating phone number with -, . or spaces and long distance prefix.
       This regex also ensures:
       - The actual number (without LD prefix) should be 10 digits only.
       - For North American, numbers with area code may be surrounded
         with parentheses ().
       - The country code can be 1 to 3 digits long. Optionally may be
         preceded by a + sign.
       - There may be dashes, spaces, dots or no spaces between country
         code, area code and the rest of the number.
       - A valid phone number cannot be all zeros.                 */
    else if (phoneNumber.matches("^(?!\\b(0)\\1+\\b)(\\+?\\d{1,3}[. -]?)?"
            + "\\(?\\d{3}\\)?([. -]?)\\d{3}\\3\\d{4}$")) {
        isValid = true;
    }
    // Validating phone number with extension length from 3 to 5.
    // Note: such input has 13 to 15 digits, so the 8 to 12 digit length
    // check above rejects it before this branch is reached.
    else if (phoneNumber.matches("\\d{3}-\\d{3}-\\d{4}\\s(x|(ext))\\d{3,5}")) {
        isValid = true;
    }
    // Validating phone number where area code is in parentheses ()
    else if (phoneNumber.matches("^(\\(\\d{1,3}\\)|\\d{1,3})[- .]?\\d{2,4}[- .]?\\d{4}$")) {
        isValid = true;
    }
    // Return false if nothing matches the input
    else {
        isValid = false;
    }
    return isValid;
}
/**
* Returns true if card (ie: MasterCard, Visa, etc) number is valid using
* the 'Luhn Algorithm'. First this method validates for a correct Card 
* Network Number. The supported networks are:<pre>
* 
*    Number            Card Network
*    ====================================
*      2               Mastercard (BIN 2-Series) This is NEW!!
*      30, 36, 38, 39  Diners-Club
*      34, 37          American Express
*      35              JCB
*      4               Visa
*      5               Mastercard
*      6               Discover</pre><br>
* 
* Next, the overall Credit Card number is checked with the 'Luhn Algorithm' 
* for validity.<br>
*
* @param cardNumber (String)
*
* @return (Boolean) True if valid, false if not.
*/
public static boolean isValidCreditCardNumber(String cardNumber) {
    if (cardNumber == null || cardNumber.trim().isEmpty()) {
        return false;
    }
    // Strip card number of all non-digit characters.
    cardNumber = cardNumber.replaceAll("\\D", "");

    long len = cardNumber.length();
    if (len < 14 || len > 16) {   // Only going to 16 digits here
        return false;
    }

    // Validate Card Network
    String[] cardNetworks = {"2", "30", "34", "35", "36", "37", "38", "39", "4", "5", "6"};
    String cardNetNum = cardNumber.substring(0, (cardNumber.startsWith("3") ? 2 : 1));
    boolean pass = false;
    for (String netNum : cardNetworks) {
        if (netNum.equals(cardNetNum)) {
            pass = true;
            break;
        }
    }
    if (!pass) {
        return false;  // Invalid Card Network
    }
    // Validate card number with the 'Luhn algorithm'.
    int nDigits = cardNumber.length();
    int nSum = 0;
    boolean isSecond = false;
    for (int i = nDigits - 1; i >= 0; i--) {
        int d = cardNumber.charAt(i) - '0';
        if (isSecond) {
            d = d * 2;
        }
        // Add both digits of the (possibly doubled) value
        nSum += d / 10;
        nSum += d % 10;
        isSecond = !isSecond;
    }
    return (nSum % 10 == 0);
}

The code above is by no means fast!

Tweak the regex or the code to suit your specific needs.
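One thing the demo above does repeatedly is call String.replace on the whole text for every number found. As a rough sketch of an alternative, the snippet below masks candidates in a single pass with Matcher.appendReplacement. The class name SinglePassMasker, the CANDIDATE pattern (13 to 19 digits with optional single spaces or hyphens) and the helper names are assumptions made for illustration only; they are not the regex used above, and phone numbers are not handled here.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; class name and CANDIDATE pattern are illustrative assumptions.
final class SinglePassMasker {

    // Assumed candidate shape: 13 to 19 digits, each optionally preceded by one space or hyphen.
    // Compiled once and reused for every call.
    private static final Pattern CANDIDATE = Pattern.compile("\\d(?:[ -]?\\d){12,18}");

    static String maskCards(String text) {
        Matcher m = CANDIDATE.matcher(text);
        StringBuffer out = new StringBuffer(text.length());
        while (m.find()) {
            String digits = digitsOnly(m.group());
            // Mask only candidates that pass the Luhn check; keep everything else verbatim.
            String replacement = passesLuhn(digits) ? maskAllButLast4(digits) : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String digitsOnly(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    private static boolean passesLuhn(String digits) {
        int sum = 0;
        boolean doubleIt = false;
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
            }
            sum += d / 10 + d % 10;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    private static String maskAllButLast4(String digits) {
        char[] buf = digits.toCharArray();
        Arrays.fill(buf, 0, buf.length - 4, '*');
        return new String(buf);
    }

    public static void main(String[] args) {
        System.out.println(maskCards("card <3714-4963-5398-431> and 1234222222222"));
    }
}
```

Because the text is walked once and the pattern is compiled once, the cost no longer grows with the number of matches times the length of the text. Whether that matters in practice would need measuring on real input.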

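Wouldn't it be nice if all validations were done before the card numbers went into the database (or data files)?

If speed is what you want, I don't believe using RegEx for any part of the code is necessarily the best choice, since processing regular expressions can consume a lot of time. Take, for example, the line that does the string masking in the maskWithoutLast4Digits() method: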

static String maskWithoutLast4Digits(String input, String replacement) {
    if (input.length() <= 4) {
        return input;    // There is nothing to mask!
    }
    return input.replaceAll(".(?=.{4})", replacement);
}

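and replace it with this code:

// Requires: import java.util.Arrays;
// Note: 'replacement' is a char here, because Arrays.fill() on a char[] needs a char.
static String maskWithoutLast4Digits(String input, char replacement) {
    if (input.length() <= 4) {
        return input; // There is nothing to mask!
    }
    char[] chars = input.toCharArray();
    Arrays.fill(chars, 0, chars.length - 4, replacement);
    return new String(chars);
}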

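You will probably find that the whole code performs the task on a single credit card number string almost twice as fast as the regex version, which is a considerable difference. In fact, if you run the code through a profiler you may find that the regex-based method gets progressively slower with each string processed, whereas the second method keeps things flowing at a more constant speed.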
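If you want to check this on your own machine, a crude comparison along the following lines may help. It is only a sketch: the class name MaskComparison and the round count are made up for illustration, and a plain System.nanoTime() loop is not a reliable benchmark (JIT warm-up and GC skew the numbers), so treat the timings as a hint and use a proper harness such as JMH for anything that matters.

```java
import java.util.Arrays;

// Hypothetical comparison class; names and round count are illustrative assumptions.
final class MaskComparison {

    // Regex-based masking, as in the original method.
    static String maskWithRegex(String input, String replacement) {
        if (input.length() <= 4) {
            return input;
        }
        return input.replaceAll(".(?=.{4})", replacement);
    }

    // Array-based masking without any regex.
    static String maskWithFill(String input, char replacement) {
        if (input.length() <= 4) {
            return input;
        }
        char[] chars = input.toCharArray();
        Arrays.fill(chars, 0, chars.length - 4, replacement);
        return new String(chars);
    }

    public static void main(String[] args) {
        String card = "371449635398431";
        int rounds = 200_000;
        long sink = 0; // Consume the results so the loops are not optimized away.

        long t0 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            sink += maskWithRegex(card, "*").length();
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            sink += maskWithFill(card, '*').length();
        }
        long t2 = System.nanoTime();

        System.out.println("regex: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("fill:  " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("(checksum " + sink + ")");
    }
}
```

Both methods return the same masked string for the same input, so swapping one for the other should not change the output.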

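Most credit cards begin with a specific single digit, with a few exceptions. For example, if a card number starts with 3, it belongs to the American Express, Diners Club, Carte Blanche or JCB payment network. If the card starts with 4, it is a Visa. Card numbers starting with 5 belong to Mastercard, and those starting with 6 belong to the Discover network.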

Card                   Starts With                   No. of Digits
==================================================================
American Express       can be 34 or usually 37       15
JCB                    35                            16
Diners Club            usually 36 or can be 38       14
VISA                   4                             16
Mastercard             5                             16
Discover               6                             16

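You don't need a regex to determine whether a credit card number starts with any of these values. Note, too, that cards do not necessarily all contain the same number of digits. This can depend on the card issuer, as I'm sure you already know. Nevertheless, credit cards that are part of the Visa, Mastercard and Discover payment networks have 16 digits, whereas those in the American Express payment network have only 15. While 16 digits is the most common length, card numbers can be as short as 13 digits and as long as 19. I haven't gone through your regular expressions, but I trust they already cover this, right?

To eliminate the regex you could use a switch/case mechanism instead, for example: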

// Demo card number...
String cardNumber = "371449635398431";

/* Remove all characters other than digits.
   Don't want them for validation.      */
cardNumber = cardNumber.replaceAll("\\D", "");
String cardName;  // Used to store the card's name
switch (cardNumber.substring(0, 1)) {
    case "3":
        String typeNum = cardNumber.substring(0, 2);
        switch (typeNum) {
            case "34": case "37":
                cardName = "American-Express";
                break;
            case "35":
                cardName = "JCB";
                break;
            case "30": case "36": case "38": case "39":
                cardName = "Diners-Club";
                break;
            default:
                cardName = "UNKNOWN";
        }
        break;
    case "4":
        cardName = "Visa";
        break;
    case "5":
        cardName = "Mastercard";
        break;
    case "6":
        cardName = "Discover";
        break;
    default:
        cardName = "UNKNOWN";
}

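If you were to run a speed test on this code versus iterating through a bunch of regular expressions, I believe you would see a considerable speed improvement, even if you also check the length of each card number inside each case.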
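As an illustration of adding that length check, here is a minimal sketch that looks only at the first one or two characters and the digit count, with no regex and no substring allocation. The class name CardNetworks and the method detect are hypothetical, and the expected lengths are taken from the table above as an assumption; real-world card numbers can be anywhere from 13 to 19 digits, so adjust the lengths to the cards you actually need to support.

```java
// Hypothetical helper; name and expected lengths are assumptions based on the table above.
final class CardNetworks {

    // Expects a string that already contains digits only.
    static String detect(String digits) {
        int len = digits.length();
        if (len < 2) {
            return "UNKNOWN";
        }
        char first = digits.charAt(0);
        char second = digits.charAt(1);
        switch (first) {
            case '3':
                if ((second == '4' || second == '7') && len == 15) {
                    return "American-Express";
                }
                if (second == '5' && len == 16) {
                    return "JCB";
                }
                if ((second == '0' || second == '6' || second == '8' || second == '9') && len == 14) {
                    return "Diners-Club";
                }
                return "UNKNOWN";
            case '4':
                return len == 16 ? "Visa" : "UNKNOWN";
            case '5':
                return len == 16 ? "Mastercard" : "UNKNOWN";
            case '6':
                return len == 16 ? "Discover" : "UNKNOWN";
            default:
                return "UNKNOWN";
        }
    }

    public static void main(String[] args) {
        System.out.println(detect("371449635398431"));
        System.out.println(detect("1234222222222"));
    }
}
```

This only classifies by prefix and length; it says nothing about whether the number itself is valid, which is what the Luhn check is for.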

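The best way to validate a credit card number is with the Luhn formula (also known as the Luhn algorithm), which basically works as follows:

  1. Starting from the right, double the value of every second digit of the card number you are validating (the digit immediately to the left of the last digit, then every other digit after that). If any doubling gives a result greater than 9 (for example, 7 x 2 = 14 or 9 x 2 = 18), add the digits of that result together (for example, 14: 1 + 4 = 5 or 18: 1 + 8 = 9).
  2. Now add up all the resulting digits, including the digits you did not multiply by two.
  3. If the total ends in 0, the card number is valid according to the Luhn algorithm; otherwise it is not.

Of course, the whole process can be placed into a convenient method, for example: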

/**
* Returns true if card (ie: MasterCard, Visa, etc) number is valid using
* the 'Luhn Algorithm'.
*
* @param cardNumber (String)
*
* @return (Boolean)
*/
public static boolean isValidCardNumber(String cardNumber) {
    if (cardNumber == null || cardNumber.trim().isEmpty()) {
        return false;
    }
    cardNumber = cardNumber.replaceAll("\\D", "");

    // Luhn algorithm
    int nDigits = cardNumber.length();
    int nSum = 0;
    boolean isSecond = false;
    for (int i = nDigits - 1; i >= 0; i--) {
        int d = cardNumber.charAt(i) - '0';
        if (isSecond) {
            d = d * 2;
        }
        // We add two digits to handle
        // cases that make two digits
        // after doubling
        nSum += d / 10;
        nSum += d % 10;
        isSecond = !isSecond;
    }
    return (nSum % 10 == 0);
}
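To see the Luhn steps at work on the demo number 371449635398431, the sketch below returns the running total instead of a boolean. The class name LuhnTrace and the method luhnSum are made up for this illustration. Reading the digits from right to left, the contributions are 1, 6, 4, 7, 9, 6, 5, 6, 6, 9, 4, 8, 1, 5, 3, which add up to 80, and since 80 ends in 0 the number passes.

```java
// Hypothetical illustration class; not part of the original answer's code.
final class LuhnTrace {

    // Returns the Luhn total for a string that contains digits only.
    static int luhnSum(String digits) {
        int sum = 0;
        boolean doubled = false; // The rightmost digit is not doubled.
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubled) {
                d *= 2;
            }
            // For a two-digit result such as 14 this adds 1 + 4.
            sum += d / 10 + d % 10;
            doubled = !doubled;
        }
        return sum;
    }

    public static void main(String[] args) {
        int sum = luhnSum("371449635398431");
        System.out.println(sum + " -> " + (sum % 10 == 0 ? "valid" : "not valid")); // prints "80 -> valid"
    }
}
```

Changing the last digit from 1 to 2 raises the total to 81, which no longer ends in 0, so that number would be rejected.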

To put it all together, your code might look something like this:

public static String validateCreditCardNumber(String cardNumber) {
    // Remove all characters other than digits
    cardNumber = cardNumber.replaceAll("\\D", "");
    String cardName;  // Used to store the card's name
    switch (cardNumber.substring(0, 1)) {
        case "3":
            String typeNum = cardNumber.substring(0, 2);
            switch (typeNum) {
                case "34": case "37":
                    cardName = "American-Express";
                    break;
                case "35":
                    cardName = "JCB";
                    break;
                case "30": case "36": case "38": case "39":
                    cardName = "Diners-Club";
                    break;
                default:
                    cardName = "UNKNOWN";
            }
            break;
        case "4":
            cardName = "Visa";
            break;
        case "5":
            cardName = "Mastercard";
            break;
        case "6":
            cardName = "Discover";
            break;
        default:
            cardName = "UNKNOWN";
    }

    if (!cardName.equals("UNKNOWN") && isValidCardNumber(cardNumber)) {
        return ("The " + cardName + " card number (" + maskWithoutLast4Digits(cardNumber, '*') + ") is VALID!");
    }
    else {
        return ("The " + cardName + " card number (" + maskWithoutLast4Digits(cardNumber, '*') + ") is NOT VALID!");
    }
}

// Requires: import java.util.Arrays;
public static String maskWithoutLast4Digits(String input, char replacement) {
    if (input.length() <= 4) {
        return input; // Nothing to mask
    }
    char[] buf = input.toCharArray();
    Arrays.fill(buf, 0, buf.length - 4, replacement);
    return new String(buf);
}
/**
* Returns true if card (ie: MasterCard, Visa, etc) number is valid using
* the 'Luhn Algorithm'.
*
* @param cardNumber (String)
*
* @return (Boolean)
*/
public static boolean isValidCardNumber(String cardNumber) {
    if (cardNumber == null || cardNumber.trim().isEmpty()) {
        return false;
    }
    cardNumber = cardNumber.replaceAll("\\D", "");

    // Luhn algorithm
    int nDigits = cardNumber.length();
    int nSum = 0;
    boolean isSecond = false;
    for (int i = nDigits - 1; i >= 0; i--) {
        int d = cardNumber.charAt(i) - '0';
        if (isSecond) {
            d = d * 2;
        }
        // We add two digits to handle
        // cases that make two digits
        // after doubling
        nSum += d / 10;
        nSum += d % 10;
        isSecond = !isSecond;
    }
    return (nSum % 10 == 0);
}

And basically, to use the above:

// Demo card number...
String cardNumber = "371449635398431";

String isItValid = validateCreditCardNumber(cardNumber);
System.out.println(isItValid);

The console output will be:

The American-Express card number (***********8431) is VALID!

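I'm not exactly sure where your output will be going, but it might be best to write it to a file somewhere before displaying it, since you will always be limited by the speed of that process. Also, breaking the data down into manageable chunks and using multiple executor-service threads to process it should improve speed considerably, as should using one of the newer JDKs (above Java 8) and taking advantage of some of the newer methods.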
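For what it's worth, chunked processing with an executor service could look roughly like the sketch below. Everything here is an assumption for illustration: the class name ChunkedMasker, the chunkSize and threads parameters, and especially processLine, which is a stand-in that simply masks all but the last 4 characters of each line. In real code it would call your own detection and masking logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; names, parameters and processLine() are illustrative assumptions.
final class ChunkedMasker {

    // Placeholder for the real per-line work (detect the card number, then mask it).
    static String processLine(String line) {
        if (line.length() <= 4) {
            return line;
        }
        char[] buf = line.toCharArray();
        Arrays.fill(buf, 0, buf.length - 4, '*');
        return new String(buf);
    }

    // Splits the lines into chunks of chunkSize (must be > 0) and processes the chunks in parallel.
    static List<String> processAll(List<String> lines, int chunkSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += chunkSize) {
                final List<String> chunk =
                        lines.subList(start, Math.min(start + chunkSize, lines.size()));
                tasks.add(() -> {
                    List<String> masked = new ArrayList<>(chunk.size());
                    for (String line : chunk) {
                        masked.add(processLine(line));
                    }
                    return masked;
                });
            }
            // invokeAll returns the futures in task order, so the line order is preserved.
            List<String> result = new ArrayList<>(lines.size());
            for (Future<List<String>> future : pool.invokeAll(tasks)) {
                result.addAll(future.get());
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while masking", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Masking task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("371449635398431", "1234", "4111111111111111");
        System.out.println(processAll(lines, 2, 2));
    }
}
```

Whether this pays off depends on how expensive the per-line work is; for very cheap work the thread overhead can outweigh the gain, so the chunk size is worth experimenting with.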
