Holden Karau edited this page Aug 22, 2017 · 10 revisions

DataFrameSuiteBase lets you check whether two DataFrames are equal. It also gives you easy access to a SparkContext (sc) and an SQLContext (sqlContext). Both are initialized before any test case runs, so you can use them inside every test case.

For Java users, the same functionality is provided by JavaDataFrameSuiteBase.

You can assert that two DataFrames are equal with the assertDataFrameEquals method.
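A common use is checking the output of a transformation against an expected DataFrame. The sketch below is illustrative: the class name, column names, and the filter step (standing in for your own code) are hypothetical. It passes the expected DataFrame first and the actual result second, builds both with the same column names and types so their schemas match, and keeps the expected rows in the same order as the result, since the comparison may be sensitive to row order.

```scala
import com.holdenkarau.spark.testing.DataFrameSuiteBase
import org.scalatest.FunSuite

// Hypothetical suite: tests a simple filter transformation.
class FilterTest extends FunSuite with DataFrameSuiteBase {
  test("filter keeps only positive values") {
    val sqlCtx = sqlContext
    import sqlCtx.implicits._

    // Input and expected output share the same column names and types.
    val input = sc.parallelize(List(("a", 1), ("b", -2), ("c", 3))).toDF("name", "value")
    val expected = sc.parallelize(List(("a", 1), ("c", 3))).toDF("name", "value")

    // The transformation under test (a stand-in for your own code).
    val result = input.filter($"value" > 0)

    // Expected first, actual result second.
    assertDataFrameEquals(expected, result)
  }
}
```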

Additional Requirements

In early versions of spark-testing-base, the spark-hive dependency was marked as provided, so you may need to add the spark-hive package to your build if you are writing DataFrame tests (it can be added in the test scope only).
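For an sbt build, adding spark-hive in the test scope might look like the fragment below. This is a sketch that assumes sbt; the sparkVersion value is a placeholder you should set to the Spark version your project already builds against.

```scala
// build.sbt (sketch, assuming an sbt build)
val sparkVersion = "2.2.0" // placeholder: use the Spark version of your project

// Only needed on the test classpath.
libraryDependencies += "org.apache.spark" %% "spark-hive" % sparkVersion % "test"
```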

Example:

import com.holdenkarau.spark.testing.DataFrameSuiteBase
import org.scalatest.FunSuite

class DataFrameEqualityTest extends FunSuite with DataFrameSuiteBase {
  test("simple test") {
    val sqlCtx = sqlContext
    import sqlCtx.implicits._

    val input1 = sc.parallelize(List(1, 2, 3)).toDF
    assertDataFrameEquals(input1, input1) // equal

    val input2 = sc.parallelize(List(4, 5, 6)).toDF
    intercept[org.scalatest.exceptions.TestFailedException] {
      assertDataFrameEquals(input1, input2) // not equal
    }
  }
}

When DataFrames contain doubles, you can compare them within an acceptable tolerance (for example, treating 5 and 4.999 as equal). Use the assertDataFrameApproximateEquals method to assert that two DataFrames are approximately equal; its third argument is the tolerance.
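To make the tolerance concrete, the plain-Scala sketch below applies the same idea to the double values from the example that follows: each pair of values passes if their absolute difference is at most the tolerance. It is an illustration only, not spark-testing-base's implementation; the approxEqual helper is hypothetical, and the inclusive (<=) comparison is an assumption.

```scala
object ToleranceSketch {
  // Hypothetical helper: true when both sequences have the same length and
  // every pair of values differs by at most `tol` (inclusive, by assumption).
  def approxEqual(xs: Seq[Double], ys: Seq[Double], tol: Double): Boolean =
    xs.length == ys.length &&
      xs.zip(ys).forall { case (a, b) => math.abs(a - b) <= tol }

  def main(args: Array[String]): Unit = {
    val input1 = Seq(1.1, 2.2, 3.3)
    val input2 = Seq(1.2, 2.3, 3.4)
    // Each difference is roughly 0.1, so 0.11 accepts it and 0.05 rejects it.
    println(approxEqual(input1, input2, 0.11)) // true
    println(approxEqual(input1, input2, 0.05)) // false
  }
}
```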

Example:

import com.holdenkarau.spark.testing.DataFrameSuiteBase
import org.scalatest.FunSuite

class DataFrameApproximateEqualityTest extends FunSuite with DataFrameSuiteBase {
  test("approximate equality test") {
    val sqlCtx = sqlContext
    import sqlCtx.implicits._

    val input1 = sc.parallelize(List[(Int, Double)]((1, 1.1), (2, 2.2), (3, 3.3))).toDF
    val input2 = sc.parallelize(List[(Int, Double)]((1, 1.2), (2, 2.3), (3, 3.4))).toDF
    assertDataFrameApproximateEquals(input1, input2, 0.11) // equal within tolerance 0.11

    intercept[org.scalatest.exceptions.TestFailedException] {
      assertDataFrameApproximateEquals(input1, input2, 0.05) // not equal within tolerance 0.05
    }
  }
}