Scala testing: how do I assert a lengthy exception message safely and cleanly, without hard-coding it?



I have the following code, which hashes (SHA-2) columns in a Spark DataFrame:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit, sha2}

object hashing {
  // Replaces each listed column with its SHA-256 hash.
  def process(hashFieldNames: List[String])(df: DataFrame): DataFrame =
    hashFieldNames.foldLeft(df) { case (acc, hashField) =>
      acc.withColumn(hashField, sha2(col(hashField), 256))
    }
}

Now, in a separate file, I am testing hashing.process with AnyWordSpec, as follows:

"The hashing.process" should {
  // some cases here that complete successfully
  "fail to hash a spark dataframe due to type mismatch" in {
    val goodColumns = Seq("language", "usersCount", "ID", "personalData")
    val badDataSample =
      Seq(
        ("Java", "20000", 2, "happy"),
        ("Python", "100000", 3, "happy"),
        ("Scala", "3000", 1, "jolly")
      )

    val badDf =
      spark.sparkContext.parallelize(badDataSample).toDF(goodColumns: _*)
    val thrown = intercept[org.apache.spark.sql.AnalysisException] {
      hashing.process(hashFieldNames)(badDf)
    }
    // ??? stands for some lengthy error message that I do not want to copy-paste in its entirety.
    assert(thrown.getMessage === ???)
  }
}

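As far as I know, people usually hard-code the entire error message to make sure it is exactly the one they expect. This message is very long, though, and I wonder whether there is a better way.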

Basically, I have two questions:

a.) Is it considered good practice to match only the beginning of the error message and follow it with a regex? My idea is something like: thrown.getMessage === "[cannot resolve sha2(ID, 256) due to data type mismatch: argument 1 requires binary type, however, ID is of int type.;" + regexpattern (.*)

b.) If a.) is considered a hacky approach, do you have any working suggestions for how to do this properly?
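One detail about idea a.) is worth spelling out: `===` compares strings literally, so a regex pattern appended to the expected prefix is never interpreted as a pattern. The sketch below uses only the Scala standard library to contrast the options. The `MessageChecks` object and `sampleMessage` are hypothetical: the prefix is the fragment quoted in a.), and the trailing text is a made-up placeholder, not real Spark output.

```scala
import java.util.regex.Pattern

object MessageChecks {
  // Prefix as quoted in the question (assumed, not verified against Spark).
  val expectedPrefix: String =
    "cannot resolve sha2(ID, 256) due to data type mismatch: " +
      "argument 1 requires binary type, however, ID is of int type.;"

  // Hypothetical full message: the prefix plus invented trailing details.
  val sampleMessage: String = expectedPrefix + "\n<logical plan details>"

  // Literal equality: the appended "(.*)" is just four more characters to compare.
  val equalsPrefixPlusPattern: Boolean = sampleMessage == expectedPrefix + "(.*)"

  // Prefix check: no regex involved, so the parentheses in the message need no escaping.
  val startsWithPrefix: Boolean = sampleMessage.startsWith(expectedPrefix)

  // Regex check: quote the literal part, then allow anything (newlines included) after it.
  val matchesRegex: Boolean =
    sampleMessage.matches(Pattern.quote(expectedPrefix) + "(?s).*")
}
```

In ScalaTest, the same checks are written with the string matchers `should startWith`, `should include`, and `should include regex`, which avoids building the pattern by hand.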

Note: the code above may contain minor errors, since I adapted it for this SO post. But you should get the idea.

OK, answering my own question. This is how I have solved it for now:

"fail to hash a spark dataframe due to type mismatch" in {
  val goodColumns = Seq("language", "usersCount", "ID", "personalData")
  val badDataSample =
    Seq(
      ("Java", "20000", 2, "happy"),
      ("Python", "100000", 3, "happy"),
      ("Scala", "3000", 1, "jolly")
    )

  val badDf =
    spark.sparkContext.parallelize(badDataSample).toDF(goodColumns: _*)
  //val expectedErrorMessageSubstring = "sha2(`ID`, 256)' due to data type mismatch: argument 1 requires binary type".r
  val thrownException = intercept[org.apache.spark.sql.AnalysisException] {
    hashing.process(hashFieldNames)(badDf)
  }
  // Assert only on the stable part of the message instead of the full text.
  thrownException.getMessage should include regex "type mismatch: argument 1 requires binary type"
}

I am leaving this post open for potential suggestions and improvements. According to https://github.com/databricks/scala-style-guide#intercepting-exceptions, this solution is still not ideal.

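You shouldn't assert on exception messages (unless they are presented to users, or something downstream depends on them). If throwing an exception is part of the contract, you should throw a specific type of exception with a given error code, and the test should assert on that. If it isn't, who cares what the message says?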
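A minimal sketch of that advice, assuming you control the code that throws (which is not the case for the AnalysisException raised by Spark itself). The `HashingException` class, its `errorCode` field, the `"UNSUPPORTED_COLUMN_TYPE"` code, and the `requireStringColumn` helper are all hypothetical names for illustration. They are not part of Spark or of the code in the question.

```scala
object TypedErrors {
  // Hypothetical exception type carrying a stable, machine-readable error code.
  final class HashingException(val errorCode: String, message: String)
      extends RuntimeException(message)

  // Hypothetical pre-check: reject columns whose declared type is not "string".
  def requireStringColumn(name: String, dataType: String): Unit =
    if (dataType != "string")
      throw new HashingException(
        "UNSUPPORTED_COLUMN_TYPE",
        s"Column '$name' has type '$dataType'; expected 'string'"
      )

  // Returns the error code of the failure, or None when the check passes.
  def errorCodeFor(name: String, dataType: String): Option[String] =
    try { requireStringColumn(name, dataType); None }
    catch { case e: HashingException => Some(e.errorCode) }
}
```

In a ScalaTest spec, `intercept[TypedErrors.HashingException] { ... }` returns the caught exception, so the assertion becomes a check on its `errorCode`. The message wording can then change without breaking the test.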