The missing SparklyR EDA toolkit (for use in R). Quick, efficient, and easy to use.
Wrap and graph the outputs from the SparkEDA package for Spark.
You will need a few things to get started.
- Grab the SparkEDA Jar from my website
- Download this repository to your R home or other location. Note: You will need devtools installed in R (as this package is not yet up on CRAN).
install.packages("devtools")
library("devtools")
Then
devtools::install_github('GabeChurch/sparkedatools')
- Edit your SparklyR Configuration (in R)
You need to add the SparkEDA jar for the package to work in R.
library(sparklyr)
conf = spark_config()
# This is the important line: the configuration option that points sparklyr at the SparkEDA jar
conf$'sparklyr.jars.default' = "/system/path/to/sparkeda_2.11-2.07.jar"
sc = spark_connect(master = "yarn-client", config = conf, version = '2.3.2')
The ORDER IS IMPORTANT. Set the jar option AFTER you have instantiated your spark_config in R and BEFORE you have connected with spark_connect.
- Enjoy being able to visualize and understand your giant data-sets like never before!
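
The configuration order described above (instantiate the config, set the jar, connect last) is easy to get wrong, so it may help to wrap the first two steps in a small function. The sketch below is only an illustration: `sparkeda_config` is a hypothetical helper name, not part of sparkedatools or sparklyr, and the early `file.exists` check is an added assumption about how you might want a bad jar path to fail.

```r
library(sparklyr)

# Hypothetical helper (not part of sparkedatools): returns a sparklyr config
# with the SparkEDA jar already set, and stops early if the jar is missing.
sparkeda_config <- function(jar_path) {
  if (!file.exists(jar_path)) {
    stop("SparkEDA jar not found at: ", jar_path)
  }
  conf <- spark_config()                   # 1. instantiate the config first
  conf$sparklyr.jars.default <- jar_path   # 2. set the jar before connecting
  conf
}
```

The returned object can then be passed as `config` to `spark_connect`, exactly as in the configuration step, so the connection is always made after the jar option has been set.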