DataFrameSuiteBase
DataFrameSuiteBase enables you to check whether two DataFrames are equal. It also provides an easy way to get the SparkContext and sqlContext. Both are initialized before all test cases, so you can access them inside any test case.
For Java users, the same functionality is provided by JavaDataFrameSuiteBase.
You can assert DataFrame equality using the method assertDataFrameEquals.
Additional Requirements
In early versions of spark-testing-base, the spark-hive dependency was marked as provided, so you may need to add the spark-hive package to your build if you are writing DataFrame tests (note that it can be included in just the test scope).
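As a rough sketch of the test-scoped dependency described above, an sbt build might contain something like the following. The `sparkVersion` value is an assumption for illustration only; substitute the Spark version your project actually targets.

```scala
// build.sbt (sketch, not taken from the spark-testing-base docs)
// sparkVersion is a placeholder chosen for illustration.
val sparkVersion = "3.5.0"

// Add spark-hive only to the test classpath.
libraryDependencies += "org.apache.spark" %% "spark-hive" % sparkVersion % Test
```

Scoping the dependency to `Test` keeps spark-hive out of the packaged application while still making it available when the test suites run.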
Example:
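    import com.holdenkarau.spark.testing.DataFrameSuiteBase
    import org.scalatest.FunSuite

    class test extends FunSuite with DataFrameSuiteBase {
      test("simple test") {
        // Bring the implicits into scope so that .toDF is available on RDDs.
        val sqlCtx = sqlContext
        import sqlCtx.implicits._

        val input1 = sc.parallelize(List(1, 2, 3)).toDF
        assertDataFrameEquals(input1, input1) // equal

        val input2 = sc.parallelize(List(4, 5, 6)).toDF
        // Comparing different DataFrames should fail the assertion.
        intercept[org.scalatest.exceptions.TestFailedException] {
          assertDataFrameEquals(input1, input2) // not equal
        }
      }
    }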
When DataFrames contain doubles, you can compare them within an acceptable tolerance (for example, treating 5 and 4.999 as equal). To assert that two DataFrames are approximately equal, use the method assertDataFrameApproximateEquals.
Example:
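    import com.holdenkarau.spark.testing.DataFrameSuiteBase
    import org.scalatest.FunSuite

    class test extends FunSuite with DataFrameSuiteBase {
      test("simple test") {
        // Bring the implicits into scope so that .toDF is available on RDDs.
        val sqlCtx = sqlContext
        import sqlCtx.implicits._

        val input1 = sc.parallelize(List[(Int, Double)]((1, 1.1), (2, 2.2), (3, 3.3))).toDF
        val input2 = sc.parallelize(List[(Int, Double)]((1, 1.2), (2, 2.3), (3, 3.4))).toDF

        // The doubles differ by about 0.1, which is within a tolerance of 0.11.
        assertDataFrameApproximateEquals(input1, input2, 0.11) // equal

        // A tolerance of 0.05 is too tight, so the assertion should fail.
        intercept[org.scalatest.exceptions.TestFailedException] {
          assertDataFrameApproximateEquals(input1, input2, 0.05) // not equal
        }
      }
    }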
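To make the tolerance idea concrete, here is a minimal sketch in plain Scala with no Spark dependency. It assumes the comparison is an absolute difference check (two values match when they differ by at most the tolerance). `ToleranceSketch`, `approxEqual`, and `rowsApproxEqual` are hypothetical names used only for illustration; they are not part of spark-testing-base and may not match how the library implements the check internally.

```scala
// Hypothetical helper, not part of spark-testing-base: illustrates an
// absolute-tolerance comparison on plain Scala collections.
object ToleranceSketch {
  // Two doubles are treated as approximately equal when they differ by at most tol.
  def approxEqual(a: Double, b: Double, tol: Double): Boolean =
    math.abs(a - b) <= tol

  // Two rows match when they have the same length and every pair of values is within tol.
  def rowsApproxEqual(expected: Seq[Double], actual: Seq[Double], tol: Double): Boolean =
    expected.length == actual.length &&
      expected.zip(actual).forall { case (e, a) => approxEqual(e, a, tol) }

  def main(args: Array[String]): Unit = {
    // Same double values as the columns in the example above.
    val expected = Seq(1.1, 2.2, 3.3)
    val actual   = Seq(1.2, 2.3, 3.4)
    println(rowsApproxEqual(expected, actual, 0.11)) // prints "true"
    println(rowsApproxEqual(expected, actual, 0.05)) // prints "false"
  }
}
```

Each pair of values differs by roughly 0.1, so a tolerance of 0.11 accepts them and 0.05 rejects them, mirroring the two assertions in the example above.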