Need help converting C# "byte math" to Python



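We have an old custom C# hashing algorithm that we use to mask email addresses for PII purposes. I'm trying to build a Python version of it, but I'm struggling with the differences in how C# and Python handle bytes and byte arrays, and I end up with the wrong hash values. For reference, this is Python 2.7, but a Python 3+ solution would work just as well.

C# code: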

using System;
using System.Text;
using System.Security;
using System.Security.Cryptography;

public class Program
{
    public static void Main()
    {
        string emailAddressStr = "my@email.com";
        emailAddressStr = emailAddressStr.Trim().ToLower();
        SHA256 objCrypt = new SHA256Managed();
        byte[] b = (new ASCIIEncoding()).GetBytes(emailAddressStr);
        byte[] bRet = objCrypt.ComputeHash(b);
        string retStr = "";
        byte c;
        for (int i = 0; i < bRet.Length; i++)
        {
            c = (byte)bRet[i];
            // Integer division and remainder: each byte becomes two letters
            retStr += ((char)(c / 10 + 97)).ToString().ToLower();
            retStr += ((char)(c % 10 + 97)).ToString().ToLower();
        }
        Console.WriteLine(retStr);
    }
}

The returned (correct) value is uhgbnaijlgchcfqcrgpicdvczapepbtifiwagitbecjfqalhufudieofyfdhzera

Python translation:

import hashlib
emltst = "my@email.com"
emltst = emltst.strip().lower()
b = bytearray(bytes(emltst).encode("ascii"))
bRet = bytearray(bytes(hashlib.sha256(b)))
emailhash=""
for i in bRet:
    c = bytes(i)
    emailhash = emailhash + str(chr((i / 10) + 97)).lower()
    emailhash = emailhash + str(chr((i % 10) + 97)).lower()
print(emailhash)

The (incorrect) value I get here is galfkejhfafdfedchcgfidhcdclbjikgkbjjlgdcgedceimaejeifakajhfekceifggc

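The "business end" of the code is in the loop, where c does not translate well between the two languages. C# produces a numeric value for the calculation, but in Python c is a string (so I use i instead). I have stepped through both sets of code, and I believe I am generating the same hash before the loop. I hope someone here can help. TIA!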
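One way to test the assumption that both versions hold the same hash before the loop is to reverse the letter encoding: each pair of letters maps back to one byte as `(first - 97) * 10 + (second - 97)`. The sketch below (Python 3; `decode_pairs` is a hypothetical helper, not part of the original code) applies this to the incorrect value quoted above. The decoded bytes appear to spell out the text representation of a hash object rather than a 32-byte digest, which would suggest that `bytes(hashlib.sha256(b))` in Python 2.7 stringified the object instead of returning its digest.

```python
def decode_pairs(encoded):
    """Reverse the two-letters-per-byte encoding (hypothetical helper)."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        tens = ord(encoded[k]) - 97      # quotient: byte // 10
        ones = ord(encoded[k + 1]) - 97  # remainder: byte % 10
        out.append(tens * 10 + ones)
    return bytes(out)

wrong = "galfkejhfafdfedchcgfidhcdclbjikgkbjjlgdcgedceimaejeifakajhfekceifggc"
decoded = decode_pairs(wrong)
print(len(decoded))             # 34 bytes, not the 32 of a SHA-256 digest
print(decoded.decode("ascii"))  # readable text, not raw hash bytes
```

If that reading is right, calling `.digest()` on the hash object is what yields the raw bytes the loop expects.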

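Edit (2020-04-09)

Oguz-Ozgul has a nice solution. I also found a savvy programmer at work who came up with this working Python 3 solution (it includes code from the broader solution, which takes in a list of emails and writes a table with PySpark):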

import hashlib
import sys

from pyspark.sql.types import StringType

# Assumes `spark` is an existing SparkSession (it is not created in this snippet).

# First argument: path to a text file with one email address per line
myfile = sys.argv[1]
with open(myfile) as fql:
    insql = fql.read()
emails = insql.splitlines()
# Second argument: name of the table to write
mytable = sys.argv[2]

def getSha256Hash(email):
    b = bytearray(bytes(email, 'ascii'))
    res = hashlib.sha256(b)
    bRet = bytearray.fromhex(res.hexdigest())
    emailhash = ""
    for i in bRet:
        # "/" is float division in Python 3, so truncate back to int
        c1 = int(i / 10 + 97)
        c2 = int(i % 10 + 97)
        emailhash = emailhash + str(chr(c1)).lower()
        emailhash = emailhash + str(chr(c2)).lower()
    return emailhash

###################################
emailhashes = []
# True only if every character encodes to a single UTF-8 byte, i.e. is ASCII
isascii = lambda s: len(s) == len(s.encode())
for e in emails:
    e = e.strip().lower()
    if isascii(e):
        emailhashret = getSha256Hash(e)
        emailhashes.append(emailhashret)
findf = spark.createDataFrame(emailhashes, StringType())
spark.sql("SET spark.sql.hive.convertMetastoreParquet=false")
findf.repartition(1).write.format("parquet").mode("overwrite").saveAsTable(mytable)

Here you go (Python 3.0):

Notes:

  1. hashAlgorithm.update needs an encoded (bytes) string, hence b"my@email.com".
  2. chr((i / 10) + 97) fails with an "expected int, found float" type of error, hence //.
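Both notes can be checked in isolation. The following is a minimal sketch for Python 3 (the byte value 207 is just an arbitrary example). It shows that `hashlib` rejects a `str`, that iterating over a `bytes` object yields integers, and that `/` produces a float that `chr` rejects while `//` keeps the result an integer, which matches C#'s integer division on `byte`.

```python
import hashlib

# hashlib only accepts bytes-like input; passing a str raises TypeError.
try:
    hashlib.sha256("my@email.com")
    str_accepted = True
except TypeError:
    str_accepted = False

# Iterating over bytes yields ints in Python 3.
values = list(b"my")

sample = 207               # arbitrary example byte value
true_div = sample / 10     # float division
floor_div = sample // 10   # integer (floor) division

# chr() requires an int, so the float result is rejected.
try:
    chr(true_div + 97)
    float_accepted = True
except TypeError:
    float_accepted = False

print(str_accepted, values, true_div, floor_div, float_accepted)
print(chr(floor_div + 97) + chr(sample % 10 + 97))
```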

import hashlib
emltst = b"my@email.com"
emltst = emltst.strip().lower()
hashAlgorithm = hashlib.sha256()
hashAlgorithm.update(emltst)
# Thanks to Mark Meyer for pointing out
# that bytearray(bytes(...)) is redundant.
bRet = hashAlgorithm.digest()
emailhash = ""
for i in bRet:  # iterating over bytes yields ints in Python 3
    emailhash = emailhash + str(chr((i // 10) + 97)).lower()
    emailhash = emailhash + str(chr((i % 10) + 97)).lower()
print(emailhash)

Output:

uhgbnaijlgchcfqcrgpicdvczapepbtifiwagitbecjfqalhufudieofyfdhzera
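For reuse, the same steps can be wrapped in a function. This is only a sketch, assuming Python 3. The name `mask_email` is hypothetical, and `strip().lower()` stands in for the C# `Trim().ToLower()` normalization. `divmod` replaces the separate `//` and `%` operations.

```python
import hashlib

def mask_email(email):
    """Hash an email and encode each digest byte as two letters (hypothetical name)."""
    normalized = email.strip().lower()
    # encode("ascii") raises UnicodeEncodeError for non-ASCII input
    digest = hashlib.sha256(normalized.encode("ascii")).digest()
    letters = []
    for byte in digest:                # each item is an int in 0..255
        tens, ones = divmod(byte, 10)  # same as byte // 10 and byte % 10
        letters.append(chr(tens + 97))
        letters.append(chr(ones + 97))
    return "".join(letters)

print(mask_email("  My@Email.com "))
```

For `my@email.com` this should reproduce the value shown in the output above. The result is always 64 lowercase letters, since SHA-256 yields 32 bytes and each byte maps to two letters.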
