Marcellus Tavares edited this page Jan 3, 2018 · 6 revisions

DatasetSuiteBase lets you check whether two Datasets are equal. It also gives you easy access to the SparkContext and SQLContext (as sc and sqlContext). Because DatasetSuiteBase extends DataFrameSuiteBase, you can also use it to check DataFrame equality. The SparkContext and SQLContext are initialized before all test cases, so you can access them inside any test case.

For Java users, the same functionality is provided by JavaDatasetSuiteBase.

You can assert that two Datasets are equal with the assertDatasetEquals method. The comparison can be customized by overriding the equals method of the Dataset's element class.

Example:

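import com.holdenkarau.spark.testing.DatasetSuiteBase
import org.scalatest.FunSuite

class DatasetEqualityTest extends FunSuite with DatasetSuiteBase {
  test("simple test") {
    // Bring the implicit encoders and the toDS conversion into scope.
    val sqlCtx = sqlContext
    import sqlCtx.implicits._

    val input1 = sc.parallelize(List(1, 2, 3)).toDS
    assertDatasetEquals(input1, input1) // equal

    val input2 = sc.parallelize(List(4, 5, 6)).toDS
    // The assertion fails for unequal Datasets, so intercept the failure.
    intercept[org.scalatest.exceptions.TestFailedException] {
      assertDatasetEquals(input1, input2) // not equal
    }
  }
}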
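The customization point mentioned above (overriding equals on the element class) can be sketched in plain Scala, without Spark. This is only an illustration under assumptions: the Person class and its case-insensitive name rule are hypothetical and do not come from the library. A Dataset[Person] compared with assertDatasetEquals would then be expected to treat such elements as equal.

```scala
// Hypothetical element type (not part of spark-testing-base):
// two Person values are equal if their names match ignoring case
// and their ages match exactly.
case class Person(name: String, age: Int) {
  override def equals(other: Any): Boolean = other match {
    case that: Person => name.equalsIgnoreCase(that.name) && age == that.age
    case _            => false
  }

  // Keep hashCode consistent with the customized equals.
  override def hashCode(): Int = (name.toLowerCase, age).hashCode()
}

object CustomEqualsSketch {
  def main(args: Array[String]): Unit = {
    val a = Person("Alice", 30)
    val b = Person("ALICE", 30)
    println(a == b) // == delegates to the overridden equals
  }
}
```

Overriding hashCode alongside equals keeps the two consistent, which matters whenever the elements are also used as keys or in sets.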

When Datasets contain doubles, you can compare them within an acceptable tolerance (for example, treating 5 and 4.999 as equal). To assert that two Datasets are approximately equal, use the assertDatasetApproximateEquals method, which takes the tolerance as its third argument.

Example:

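import com.holdenkarau.spark.testing.DatasetSuiteBase
import org.scalatest.FunSuite

class DatasetApproximateEqualityTest extends FunSuite with DatasetSuiteBase {
  test("simple test") {
    // Bring the implicit encoders and the toDS conversion into scope.
    val sqlCtx = sqlContext
    import sqlCtx.implicits._

    // Corresponding doubles in the two Datasets differ by about 0.1.
    val input1 = sc.parallelize(List[(Int, Double)]((1, 1.1), (2, 2.2), (3, 3.3))).toDS
    val input2 = sc.parallelize(List[(Int, Double)]((1, 1.2), (2, 2.3), (3, 3.4))).toDS
    assertDatasetApproximateEquals(input1, input2, 0.11) // equal within tolerance

    intercept[org.scalatest.exceptions.TestFailedException] {
      assertDatasetApproximateEquals(input1, input2, 0.05) // not equal: tolerance too small
    }
  }
}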
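To make the tolerance idea concrete, here is a minimal plain-Scala sketch of an approximate comparison, using the same values as the example above. The names approxEqual and allApproxEqual are hypothetical helpers, not library functions, and the sketch assumes the tolerance is an inclusive bound on the absolute difference; it is not the library's actual implementation.

```scala
object ToleranceSketch {
  // Assumption: two doubles count as approximately equal when their
  // absolute difference does not exceed the tolerance.
  def approxEqual(a: Double, b: Double, tol: Double): Boolean =
    math.abs(a - b) <= tol

  // Compare two sequences element by element with the same tolerance.
  def allApproxEqual(xs: Seq[Double], ys: Seq[Double], tol: Double): Boolean =
    xs.length == ys.length &&
      xs.zip(ys).forall { case (x, y) => approxEqual(x, y, tol) }

  def main(args: Array[String]): Unit = {
    val left  = Seq(1.1, 2.2, 3.3)
    val right = Seq(1.2, 2.3, 3.4)
    println(allApproxEqual(left, right, 0.11)) // differences of about 0.1 fit within 0.11
    println(allApproxEqual(left, right, 0.05)) // differences of about 0.1 exceed 0.05
  }
}
```

Choosing a tolerance slightly larger than the expected difference (0.11 rather than 0.1 here) leaves room for floating-point rounding in the subtraction.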