prashan_pul #1 (Open)

wants to merge 3,444 commits into master

Conversation


@prashanC prashanC commented Oct 3, 2016

No description provided.

ankurdave and others added 30 commits January 13, 2014 14:54
Improving documentation and identifying a potential bug in the CC calculation.
`sbt/sbt doc` used to fail. This fixed it.
Updated JavaStreamingContext to make scaladoc compile.

`sbt/sbt doc` used to fail. This fixed it.
The bug was due to a misunderstanding of the activeSetOpt parameter to
Graph.mapReduceTriplets. Passing EdgeDirection.Both causes
mapReduceTriplets to run only on edges with *both* vertices in the
active set. This commit adds EdgeDirection.Either, which causes
mapReduceTriplets to run on edges with *either* vertex in the active
set. This is what connected components needed.
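For illustration, a minimal sketch of how a connected-components-style computation might pass the new direction (GraphX API of that era; `graph` is an assumed `Graph[Long, _]` holding component ids, `activeVertices` the vertices changed in the last round, and the message logic is illustrative):

```scala
import org.apache.spark.graphx._

val messages: VertexRDD[Long] = graph.mapReduceTriplets[Long](
  // Send the source's component id to the destination (illustrative only).
  triplet => Iterator((triplet.dstId, triplet.srcAttr)),
  (a, b) => math.min(a, b),
  // Either: run on edges with at least one endpoint in the active set.
  activeSetOpt = Some((activeVertices, EdgeDirection.Either))
)
```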
…ing serialization support for GraphImpl to address issues with failed closure capture.
…aphx

Conflicts:
	graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala
Improved logic of finding new files in FileInputDStream

Earlier, if HDFS had a hiccup and reported the existence of a new file (mod time T sec) at time T + 1 sec, then fileStream could have missed that file. With this change, it should be able to find files that are delayed by up to <batch time> seconds. That is, even if a file is reported at T + <batch time> sec, the file stream should be able to catch it.

The new logic, at a high level, is as follows. It keeps track of the new files it found in the previous interval and the mod time of the oldest of those files (let's call it X). Then in the current interval, it ignores files that were seen in the previous interval as well as those with a mod time older than X. So if a new file gets reported by HDFS in the current interval but has a mod time in the previous interval, it will still be considered. However, files whose mod time is earlier than the previous interval (that is, earlier than X) will be ignored. This is the current limitation, and a future version should improve this behavior further.
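A rough sketch of that filtering predicate (illustrative names only, not the actual FileInputDStream internals):

```scala
// prevFiles: files found in the previous interval; minModTime: mod time of
// the oldest of them (the X above).
def isNewFile(path: String, modTime: Long,
              prevFiles: Set[String], minModTime: Long): Boolean = {
  if (prevFiles.contains(path)) false   // already picked up last interval
  else modTime >= minModTime            // older than X is ignored (the stated limitation)
}
```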

Also reduced line lengths in DStream to <=100 chars.
JoshRosen and others added 30 commits January 28, 2014 20:20
Switch from MUTF8 to UTF8 in PySpark serializers.

This fixes SPARK-1043, a bug introduced in 0.9.0 where PySpark couldn't serialize strings > 64kB.

This fix was written by @tyro89 and @bouk in #512. This commit squashes and rebases their pull request in order to fix some merge conflicts.
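For context, `DataOutputStream.writeUTF` emits modified UTF-8 with a 2-byte length prefix, which is where the 64 kB cap comes from. A hypothetical helper showing the alternative encoding (not the actual serializer code):

```scala
import java.io.DataOutputStream
import java.nio.charset.StandardCharsets

// Standard UTF-8 with a 4-byte length prefix: no 64 kB limit, unlike
// writeUTF's modified UTF-8 with its 2-byte (unsigned short) length.
def writeUTF8(s: String, out: DataOutputStream): Unit = {
  val bytes = s.getBytes(StandardCharsets.UTF_8)
  out.writeInt(bytes.length)
  out.write(bytes)
}
```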
Updated Spark Streaming Programming Guide

Here is the updated version of the Spark Streaming Programming Guide. This is still a work in progress, but the major changes are in place, so feedback is most welcome.

In general, I have tried to make the guide easier to understand even for a reader who does not know much about Spark. The updated website is hosted here:

http://www.eecs.berkeley.edu/~tdas/spark_docs/streaming-programming-guide.html

The major changes are:
- Overview illustrates the use cases of Spark Streaming - various input sources and various output sources
- An example right after the overview to quickly give an idea of what a Spark Streaming program looks like
- Made the Java API and examples first-class citizens like Scala by using tabs to show both Scala and Java examples (similar to the AMPCamp tutorial's code tabs)
- Highlighted the DStream operations updateStateByKey and transform because of their powerful nature
- Updated driver node failure recovery text to highlight automatic recovery in Spark standalone mode
- Added information about linking to and using external input sources like Kafka and Flume
- In general, reorganized the sections to better separate the Basic section from the more advanced sections like Tuning and Recovery

Todos:
- Links to the docs of external Kafka, Flume, etc.
- Illustrate the window operation with a figure as well as an example.

Author: Tathagata Das <[email protected]>

== Merge branch commits ==

commit 18ff105
Author: Tathagata Das <[email protected]>
Date:   Tue Jan 28 21:49:30 2014 -0800

    Fixed a lot of broken links.

commit 34a5a60
Author: Tathagata Das <[email protected]>
Date:   Tue Jan 28 18:02:28 2014 -0800

    Updated github url to use SPARK_GITHUB_URL variable.

commit f338a60
Author: Tathagata Das <[email protected]>
Date:   Mon Jan 27 22:42:42 2014 -0800

    More updates based on Patrick and Harvey's comments.

commit 89a81ff
Author: Tathagata Das <[email protected]>
Date:   Mon Jan 27 13:08:34 2014 -0800

    Updated docs based on Patrick's PR comments.

commit d5b6196
Author: Tathagata Das <[email protected]>
Date:   Sun Jan 26 20:15:58 2014 -0800

    Added spark.streaming.unpersist config and info on StreamingListener interface.

commit e3dcb46
Author: Tathagata Das <[email protected]>
Date:   Sun Jan 26 18:41:12 2014 -0800

    Fixed docs on StreamingContext.getOrCreate.

commit 6c29524
Author: Tathagata Das <[email protected]>
Date:   Thu Jan 23 18:49:39 2014 -0800

    Added example and figure for window operations, and links to Kafka and Flume API docs.

commit f06b964
Author: Tathagata Das <[email protected]>
Date:   Wed Jan 22 22:49:12 2014 -0800

    Fixed missing endhighlight tag in the MLlib guide.

commit 036a7d4
Merge: eab351d a1cd185
Author: Tathagata Das <[email protected]>
Date:   Wed Jan 22 22:17:42 2014 -0800

    Merge remote-tracking branch 'apache/master' into docs-update

commit eab351d
Author: Tathagata Das <[email protected]>
Date:   Wed Jan 22 22:17:15 2014 -0800

    Update Spark Streaming Programming Guide.
Issue with failed worker registrations

I've been going through the Spark source after having some odd issues with workers dying and not coming back. After some digging (I'm very new to Scala and Spark) I believe I've found a worker registration issue. It looks to me like a failed registration follows the same code path as a successful registration, which ends up with workers believing they are connected (since they received a `RegisteredWorker` event) even though they are not registered on the Master.

This is a quick fix that I hope addresses this issue (assuming I didn't completely misread the code and I'm about to look like a silly person :P)

I'm opening this pr now to start a chat with you guys while I do some more testing on my side :)
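From the commit messages below, the shape of the fix is roughly the following (a sketch with stand-in types, not the actual Master code):

```scala
// Stand-ins for the real messages; only the control flow matters here.
sealed trait RegistrationResponse
case class RegisteredWorker(masterUrl: String) extends RegistrationResponse
case class RegisterWorkerFailed(message: String) extends RegistrationResponse

def handleRegistration(workerAddress: String,
                       registered: scala.collection.mutable.Set[String],
                       masterUrl: String): RegistrationResponse = {
  if (registered.contains(workerAddress)) {
    // Fail loudly rather than acknowledging a registration that never happened.
    RegisterWorkerFailed("Attempted to re-register worker at same address: " + workerAddress)
  } else {
    registered += workerAddress   // persist only on success
    RegisteredWorker(masterUrl)   // acknowledge (the real code also runs schedule())
  }
}
```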

Author: Erik Selin <[email protected]>

== Merge branch commits ==

commit 973012f
Author: Erik Selin <[email protected]>
Date:   Tue Jan 28 23:36:12 2014 -0500

    break logwarning into two lines to respect line character limit.

commit e3754dc
Author: Erik Selin <[email protected]>
Date:   Tue Jan 28 21:16:21 2014 -0500

    add log warning when worker registration fails due to attempt to re-register on same address.

commit 14baca2
Author: Erik Selin <[email protected]>
Date:   Wed Jan 22 21:23:26 2014 -0500

    address code style comment

commit 71c0d7e
Author: Erik Selin <[email protected]>
Date:   Wed Jan 22 16:01:42 2014 -0500

    Make a failed registration not persist, not send a `RegisteredWorker` event and not run `schedule`, but rather send a `RegisterWorkerFailed` message to the worker attempting to register.
Added spark.shuffle.file.buffer.kb to configuration doc.

Author: Reynold Xin <[email protected]>

== Merge branch commits ==

commit 0eea1d7
Author: Reynold Xin <[email protected]>
Date:   Wed Jan 29 14:40:48 2014 -0800

    Added spark.shuffle.file.buffer.kb to configuration doc.
Add GraphX to assembly/pom.xml

Author: Ankur Dave <[email protected]>

== Merge branch commits ==

commit bb0b33e
Author: Ankur Dave <[email protected]>
Date:   Fri Jan 31 15:24:52 2014 -0800

    Add GraphX to assembly/pom.xml
Change the ⇒ character (maybe from scalariform) to => in Scala code for style consistency

Looks like there are some ⇒ Unicode characters (maybe from scalariform) in the Scala code.
This PR changes them to => to get some consistency in the Scala code.

If we wanted ⇒ as the default, we could use the sbt plugin scalariform to make sure all Scala code has ⇒ instead of =>.

Also removed unused imports found in TwitterInputDStream.scala while I was there =)

Author: Henry Saputra <[email protected]>

== Merge branch commits ==

commit 29c1771
Author: Henry Saputra <[email protected]>
Date:   Sat Feb 1 22:05:16 2014 -0800

    Change the ⇒ character (maybe from scalariform) to => in Scala code for style consistency.
Remove explicit conversion to PairRDDFunctions in cogroup()

As SparkContext._ is already imported, using the implicit conversion appears to make the code much cleaner. Perhaps there was some sinister reason for doing the conversion explicitly, however.
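For illustration, a sketch of the difference (Spark of that era; local mode just to make it runnable):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._  // the RDD-to-PairRDDFunctions implicit

val sc = new SparkContext("local", "cogroup-example")
val a = sc.parallelize(Seq(("k1", 1), ("k2", 2)))
val b = sc.parallelize(Seq(("k1", 10)))

// With the implicit in scope, cogroup is available directly on the pair RDD,
// with no explicit `new PairRDDFunctions(...)` wrapping.
val grouped = a.cogroup(b)
```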

Author: Aaron Davidson <[email protected]>

== Merge branch commits ==

commit aa4a63f
Author: Aaron Davidson <[email protected]>
Date:   Sun Feb 2 23:48:04 2014 -0800

    Remove explicit conversion to PairRDDFunctions in cogroup()

    As SparkContext._ is already imported, using the implicit conversion
    appears to make the code much cleaner. Perhaps there was some sinister
    reason for doing the conversion explicitly, however.
Refactor RDD sampling and add randomSplit to RDD (update)

Replace SampledRDD with PartitionwiseSampledRDD, which accepts a RandomSampler instance as input. The current sampling with/without replacement can easily be integrated via BernoulliSampler and PoissonSampler. The benefits are:

1) RDD.randomSplit is implemented in the same way, related to https://github.com/apache/incubator-spark/pull/513
2) Stratified sampling and importance sampling can be implemented in the same manner as well.

Unit tests are included for samplers and RDD.randomSplit.

This should perform better than my previous pull request, where the BernoulliSampler creates many Iterator instances:
https://github.com/apache/incubator-spark/pull/513
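A short usage sketch of the resulting API (`sc` is an assumed SparkContext):

```scala
val data = sc.parallelize(1 to 1000000)

// randomSplit: weights are normalized, and each split is sampled
// partition-wise from the same parent, so the splits are disjoint.
val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42)

// sample: without replacement is Bernoulli per partition; with
// replacement is Poisson, as described above.
val sampled = data.sample(withReplacement = false, fraction = 0.1, seed = 42)
```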

Author: Xiangrui Meng <[email protected]>

== Merge branch commits ==

commit e8ce957
Author: Xiangrui Meng <[email protected]>
Date:   Mon Feb 3 12:21:08 2014 -0800

    more docs to PartitionwiseSampledRDD

commit fbb4586
Author: Xiangrui Meng <[email protected]>
Date:   Mon Feb 3 00:44:23 2014 -0800

    move XORShiftRandom to util.random and use it in BernoulliSampler

commit 987456b
Author: Xiangrui Meng <[email protected]>
Date:   Sat Feb 1 11:06:59 2014 -0800

    relax assertions in SortingSuite because the RangePartitioner has large variance in this case

commit 3690aae
Author: Xiangrui Meng <[email protected]>
Date:   Sat Feb 1 09:56:28 2014 -0800

    test split ratio of RDD.randomSplit

commit 8a410bc
Author: Xiangrui Meng <[email protected]>
Date:   Sat Feb 1 09:25:22 2014 -0800

    add a test to ensure seed distribution and minor style update

commit ce7e866
Author: Xiangrui Meng <[email protected]>
Date:   Fri Jan 31 18:06:22 2014 -0800

    minor style change

commit 750912b
Author: Xiangrui Meng <[email protected]>
Date:   Fri Jan 31 18:04:54 2014 -0800

    fix some long lines

commit c446a25
Author: Xiangrui Meng <[email protected]>
Date:   Fri Jan 31 17:59:59 2014 -0800

    add complement to BernoulliSampler and minor style changes

commit dbe2bc2
Author: Xiangrui Meng <[email protected]>
Date:   Fri Jan 31 17:45:08 2014 -0800

    switch to partition-wise sampling for better performance

commit a1fca52
Merge: ac712e4 cf6128f
Author: Xiangrui Meng <[email protected]>
Date:   Fri Jan 31 16:33:09 2014 -0800

    Merge branch 'sample' of github.com:mengxr/incubator-spark into sample

commit cf6128f
Author: Xiangrui Meng <[email protected]>
Date:   Sun Jan 26 14:40:07 2014 -0800

    set SampledRDD deprecated in 1.0

commit f430f84
Author: Xiangrui Meng <[email protected]>
Date:   Sun Jan 26 14:38:59 2014 -0800

    update code style

commit a8b5e20
Author: Xiangrui Meng <[email protected]>
Date:   Sun Jan 26 12:56:27 2014 -0800

    move package random to util.random

commit ab0fa2c
Author: Xiangrui Meng <[email protected]>
Date:   Sun Jan 26 12:50:35 2014 -0800

    add Apache headers and update code style

commit 985609f
Author: Xiangrui Meng <[email protected]>
Date:   Sun Jan 26 11:49:25 2014 -0800

    add new lines

commit b21bddf
Author: Xiangrui Meng <[email protected]>
Date:   Sun Jan 26 11:46:35 2014 -0800

    move samplers to random.IndependentRandomSampler and add tests

commit c02dacb
Author: Xiangrui Meng <[email protected]>
Date:   Sat Jan 25 15:20:24 2014 -0800

    add RandomSampler

commit 8ff7ba3
Author: Xiangrui Meng <[email protected]>
Date:   Fri Jan 24 13:23:22 2014 -0800

    init impl of IndependentlySampledRDD
Fixed typo in scaladoc

Author: Stevo Slavić <[email protected]>

== Merge branch commits ==

commit 0a77f78
Author: Stevo Slavić <[email protected]>
Date:   Tue Feb 4 15:30:27 2014 +0100

    Fixed typo in scaladoc
Fixed wrong path to compute-classpath.cmd

compute-classpath.cmd is in bin, not in sbin directory

Author: Stevo Slavić <[email protected]>

== Merge branch commits ==

commit 23deca3
Author: Stevo Slavić <[email protected]>
Date:   Tue Feb 4 15:01:47 2014 +0100

    Fixed wrong path to compute-classpath.cmd

    compute-classpath.cmd is in bin, not in sbin directory
Fix line end character stripping for Windows

LogQuery Spark example would produce unwanted results when run on the Windows platform because of different, platform-specific trailing line end characters (not only \n but \r too).

This fix makes use of Scala's standard library string functions to properly strip all trailing line end characters, letting Scala handle the platform-specific stuff.
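For instance, Scala's `stripLineEnd` does exactly this (a sketch of the idea, not the exact example code):

```scala
// stripLineEnd drops a trailing \n, \r\n, or \r, so the same code behaves
// identically on Unix and Windows input.
val line = "GET /index.html 200\r\n"   // as read on Windows
assert(line.stripLineEnd == "GET /index.html 200")
```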

Author: Stevo Slavić <[email protected]>

== Merge branch commits ==

commit 1e43ba0
Author: Stevo Slavić <[email protected]>
Date:   Wed Feb 5 14:48:29 2014 +0100

    Fix line end character stripping for Windows

    LogQuery Spark example would produce unwanted results when run on the Windows platform because of different, platform-specific trailing line end characters (not only \n but \r too).

    This fix makes use of Scala's standard library string functions to properly strip all trailing line end characters, letting Scala handle the platform-specific stuff.
…#544.

Fixed warnings in test compilation.

This commit fixes two problems: a redundant import, and a
deprecated function.

Author: Kay Ousterhout <[email protected]>

== Merge branch commits ==

commit da9d2e1
Author: Kay Ousterhout <[email protected]>
Date:   Wed Feb 5 11:41:51 2014 -0800

    Fixed warnings in test compilation.

    This commit fixes two problems: a redundant import, and a
    deprecated function.
remove actorToWorker in master.scala, which is actually not used

actorToWorker is actually not used in the code... just remove it

Author: CodingCat <[email protected]>

== Merge branch commits ==

commit 52656c2
Author: CodingCat <[email protected]>
Date:   Thu Feb 6 00:28:26 2014 -0500

    remove actorToWorker in master.scala, which is actually not used
…s #526.

spark on yarn - yarn-client mode doesn't always exit immediately

https://spark-project.atlassian.net/browse/SPARK-1049

If you run in yarn-client mode but you don't get all the workers you requested right away and then exit your application, the application master stays around until it gets the number of workers you initially requested. This is a waste of resources. The AM should exit immediately once the client goes away.

This fix simply checks whether the driver has closed while it's waiting for the initial number of workers.

Author: Thomas Graves <[email protected]>

== Merge branch commits ==

commit 03f40a6
Author: Thomas Graves <[email protected]>
Date:   Fri Jan 31 11:23:10 2014 -0600

    spark on yarn - yarn-client mode doesn't always exit immediately
Fix off-by-one error with task progress info log.

Author: Kay Ousterhout <[email protected]>

== Merge branch commits ==

commit 29798fc
Author: Kay Ousterhout <[email protected]>
Date:   Wed Feb 5 13:40:01 2014 -0800

    Fix off-by-one error with task progress info log.
Python api additions

Author: Prashant Sharma <[email protected]>

== Merge branch commits ==

commit 8b51591
Author: Prashant Sharma <[email protected]>
Date:   Fri Jan 24 11:50:29 2014 +0530

    Josh's and Patrick's review comments.

commit d37f967
Author: Prashant Sharma <[email protected]>
Date:   Thu Jan 23 17:27:17 2014 +0530

    fixed doc tests

commit 27cb54b
Author: Prashant Sharma <[email protected]>
Date:   Thu Jan 23 16:48:43 2014 +0530

    Added keys and values methods for PairFunctions in python

commit 4ce76b3
Author: Prashant Sharma <[email protected]>
Date:   Thu Jan 23 13:51:26 2014 +0530

    Added foreachPartition

commit 05f0534
Author: Prashant Sharma <[email protected]>
Date:   Thu Jan 23 13:02:59 2014 +0530

    Added coalesce function to Python API

commit 6568d2c
Author: Prashant Sharma <[email protected]>
Date:   Thu Jan 23 12:52:44 2014 +0530

    added repartition function to python API.
SPARK-1056. Fix header comment in Executor to not imply that it's only used for Mesos and Standalone.

Author: Sandy Ryza <[email protected]>

== Merge branch commits ==

commit 1f2443d
Author: Sandy Ryza <[email protected]>
Date:   Thu Feb 6 15:03:50 2014 -0800

    SPARK-1056. Fix header comment in Executor to not imply that it's only used for Mesos and Standalone
Inform DAG scheduler about all started/finished tasks.

Previously, the DAG scheduler was not always informed
when tasks started and finished. The simplest example here
is for speculated tasks: the DAGScheduler was only told about
the first attempt of a task, meaning that SparkListeners were
also not told about multiple task attempts, so users can't see
what's going on with speculation in the UI.  The DAGScheduler
also wasn't always told about finished tasks, so in the UI, some
tasks will never be shown as finished (this occurs, for example,
if a task set gets killed).

The other problem is that the fairness accounting was wrong
-- the number of running tasks in a pool was decreased when a
task set was considered done, even if all of its tasks hadn't
yet finished.

Author: Kay Ousterhout <[email protected]>

== Merge branch commits ==

commit c8d547d
Author: Kay Ousterhout <[email protected]>
Date:   Wed Jan 15 16:47:33 2014 -0800

    Addressed Reynold's review comments.

    Always use a TaskEndReason (remove the option), and explicitly
    signal when we don't know the reason. Also, always tell
    DAGScheduler (and associated listeners) about started tasks, even
    when they're speculated.

commit 3fee1e2
Author: Kay Ousterhout <[email protected]>
Date:   Wed Jan 8 22:58:13 2014 -0800

    Fixed broken test and improved logging

commit ff12fca
Author: Kay Ousterhout <[email protected]>
Date:   Sun Dec 29 21:08:20 2013 -0800

    Inform DAG scheduler about all finished tasks.

    Previously, the DAG scheduler was not always informed
    when tasks finished. For example, when a task set was
    aborted, the DAG scheduler was never told when the tasks
    in that task set finished. The DAG scheduler was also
    never told about the completion of speculated tasks.
    This led to confusion with SparkListeners because information
    about the completion of those tasks was never passed on to
    the listeners (so in the UI, for example, some tasks will never
    be shown as finished).

    The other problem is that the fairness accounting was wrong
    -- the number of running tasks in a pool was decreased when a
    task set was considered done, even if all of its tasks hadn't
    yet finished.
Only run ResubmitFailedStages event after a fetch fails

Previously, the ResubmitFailedStages event was called every
200 milliseconds, leading to a lot of unnecessary event processing
and clogged DAGScheduler logs.

Author: Kay Ousterhout <[email protected]>

== Merge branch commits ==

commit e603784
Author: Kay Ousterhout <[email protected]>
Date:   Wed Feb 5 11:34:41 2014 -0800

    Re-add check for empty set of failed stages

commit d258f0e
Author: Kay Ousterhout <[email protected]>
Date:   Wed Jan 15 23:35:41 2014 -0800

    Only run ResubmitFailedStages event after a fetch fails

    Previously, the ResubmitFailedStages event was called every
    200 milliseconds, leading to a lot of unnecessary event processing
    and clogged DAGScheduler logs.
External spilling - generalize batching logic

The existing implementation consists of a hack for Kryo specifically and only works for LZF compression. Introducing an intermediate batch-level stream takes care of pre-fetching and other arbitrary behavior of higher level streams in a more general way.
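One way to picture that intermediate layer: bound the raw file stream at each batch's length so whatever is stacked on top can pre-fetch freely without crossing a batch boundary. A hedged sketch using Guava's `ByteStreams.limit` (the on-disk layout and names are assumptions, not the actual spill code):

```scala
import java.io.{DataInputStream, InputStream}
import com.google.common.io.ByteStreams

// Assumed layout: each serialized batch is preceded by its byte length.
// The returned stream reports EOF at the batch boundary, so compression
// and deserialization streams built on it cannot read into the next batch.
def nextBatchStream(file: DataInputStream): InputStream = {
  val batchLength = file.readLong()
  ByteStreams.limit(file, batchLength)
}
```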

Author: Andrew Or <[email protected]>

== Merge branch commits ==

commit 3ddeb7e
Author: Andrew Or <[email protected]>
Date:   Wed Feb 5 12:09:32 2014 -0800

    Also privatize fields

commit 090544a
Author: Andrew Or <[email protected]>
Date:   Wed Feb 5 10:58:23 2014 -0800

    Privatize methods

commit 13920c9
Author: Andrew Or <[email protected]>
Date:   Tue Feb 4 16:34:15 2014 -0800

    Update docs

commit bd5a1d7
Author: Andrew Or <[email protected]>
Date:   Tue Feb 4 13:44:24 2014 -0800

    Typo: phyiscal -> physical

commit 287ef44
Author: Andrew Or <[email protected]>
Date:   Tue Feb 4 13:38:32 2014 -0800

    Avoid reading the entire batch into memory; also simplify streaming logic

    Additionally, address formatting comments.

commit 3df7005
Merge: a531d2e 164489d
Author: Andrew Or <[email protected]>
Date:   Mon Feb 3 18:27:49 2014 -0800

    Merge branch 'master' of github.com:andrewor14/incubator-spark

commit a531d2e
Author: Andrew Or <[email protected]>
Date:   Mon Feb 3 18:18:04 2014 -0800

    Relax assumptions on compressors and serializers when batching

    This commit introduces an intermediate layer of an input stream on the batch level.
    This guards against interference from higher level streams (i.e. compression and
    deserialization streams), especially pre-fetching, without specifically targeting
    particular libraries (Kryo) and forcing shuffle spill compression to use LZF.

commit 164489d
Author: Andrew Or <[email protected]>
Date:   Mon Feb 3 18:18:04 2014 -0800

    Relax assumptions on compressors and serializers when batching

    This commit introduces an intermediate layer of an input stream on the batch level.
    This guards against interference from higher level streams (i.e. compression and
    deserialization streams), especially pre-fetching, without specifically targeting
    particular libraries (Kryo) and forcing shuffle spill compression to use LZF.
SPARK-1062 Add rdd.intersection(otherRdd) method
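Usage is straightforward; note the shuffle the merge commits below mention (`sc` is an assumed SparkContext):

```scala
// intersection() returns the distinct elements present in both RDDs and
// shuffles both inputs to co-locate equal elements.
val a = sc.parallelize(Seq(1, 2, 3, 4))
val b = sc.parallelize(Seq(3, 4, 5, 6))
val common = a.intersection(b)   // contains 3 and 4
```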

Author: Andrew Ash <[email protected]>

== Merge branch commits ==

commit 5d9982b
Author: Andrew Ash <[email protected]>
Date:   Thu Feb 6 18:11:45 2014 -0800

    Minor fixes

    - style: (v,null) => (v, null)
    - mention the shuffle in Javadoc

commit b86d02f
Author: Andrew Ash <[email protected]>
Date:   Sun Feb 2 13:17:40 2014 -0800

    Overload .intersection() for numPartitions and custom Partitioner

commit bcaa349
Author: Andrew Ash <[email protected]>
Date:   Sun Feb 2 13:05:40 2014 -0800

    Better naming of parameters in intersection's filter

commit b10a6af
Author: Andrew Ash <[email protected]>
Date:   Sat Jan 25 23:06:26 2014 -0800

    Follow spark code format conventions of tab => 2 spaces

commit 965256e
Author: Andrew Ash <[email protected]>
Date:   Fri Jan 24 00:28:01 2014 -0800

    Add rdd.intersection(otherRdd) method
TeX formulas in the documentation

using MathJax, and splitting the MLlib documentation by techniques

see jira
https://spark-project.atlassian.net/browse/MLLIB-19
and
https://github.com/shivaram/spark/compare/mathjax

Author: Martin Jaggi <[email protected]>

== Merge branch commits ==

commit 0364bfa
Author: Martin Jaggi <[email protected]>
Date:   Fri Feb 7 03:19:38 2014 +0100

    minor polishing, as suggested by @pwendell

commit dcd2142
Author: Martin Jaggi <[email protected]>
Date:   Thu Feb 6 18:04:26 2014 +0100

    enabling inline latex formulas with $.$

    same mathjax configuration as used in math.stackexchange.com

    sample usage in the linear algebra (SVD) documentation

commit bbafafd
Author: Martin Jaggi <[email protected]>
Date:   Thu Feb 6 17:31:29 2014 +0100

    split MLlib documentation by techniques

    and linked from the main mllib-guide.md site

commit d1c5212
Author: Martin Jaggi <[email protected]>
Date:   Thu Feb 6 16:59:43 2014 +0100

    enable mathjax formula in the .md documentation files

    code by @shivaram

commit d73948d
Author: Martin Jaggi <[email protected]>
Date:   Thu Feb 6 16:57:23 2014 +0100

    minor update on how to compile the documentation
Make sbt download an atomic operation

Modifies the `sbt/sbt` script to gracefully recover when a previous invocation died in the middle of downloading the SBT jar.

Author: Jey Kottalam <[email protected]>

== Merge branch commits ==

commit 6c600eb
Author: Jey Kottalam <[email protected]>
Date:   Fri Jan 17 10:43:54 2014 -0800

    Make sbt download an atomic operation
Kill drivers in postStop() for Worker.

JIRA SPARK-1068: https://spark-project.atlassian.net/browse/SPARK-1068

Author: Qiuzhuang Lian <[email protected]>

== Merge branch commits ==

commit 9c19ce6
Author: Qiuzhuang Lian <[email protected]>
Date:   Sat Feb 8 16:07:39 2014 +0800

    Kill drivers in postStop() for Worker.
     JIRA SPARK-1068:https://spark-project.atlassian.net/browse/SPARK-1068
Version number to 1.0.0-SNAPSHOT

Since 0.9.0-incubating is done and out the door, we shouldn't be building 0.9.0-incubating-SNAPSHOT anymore.

@pwendell

Author: Mark Hamstra <[email protected]>

== Merge branch commits ==

commit 1b00a8a
Author: Mark Hamstra <[email protected]>
Date:   Wed Feb 5 09:30:32 2014 -0800

    Version number to 1.0.0-SNAPSHOT
SPARK-1066: Add developer scripts to repository.

These are some developer scripts I've been maintaining in a separate public repo. This patch adds them to the Spark repository so they can evolve here and are clearly accessible to all committers.

I may do some small additional clean-up in this PR, but wanted to put them here in case others want to review. There are a few types of scripts here:

1. A tool to merge pull requests.
2. A script for packaging releases.
3. A script for auditing release candidates.

Author: Patrick Wendell <[email protected]>

== Merge branch commits ==

commit 5d5d331
Author: Patrick Wendell <[email protected]>
Date:   Sat Feb 8 22:11:47 2014 -0800

    SPARK-1066: Add developer scripts to repository.
[WIP] SPARK-1067: Default log4j initialization causes errors for those not using log4j

To fix this - we add a check when initializing log4j.
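Presumably the check is whether log4j already has appenders configured on the root logger. A hedged sketch of that idea (log4j 1.x API; the resource name is a guess, not confirmed from the patch):

```scala
import org.apache.log4j.{LogManager, PropertyConfigurator}

// If the root logger has no appenders, assume the user never configured
// log4j and install Spark's bundled defaults; otherwise leave theirs alone.
val log4jInitialized = LogManager.getRootLogger.getAllAppenders.hasMoreElements
if (!log4jInitialized) {
  val defaults = getClass.getClassLoader
    .getResource("org/apache/spark/log4j-defaults.properties")  // assumed path
  PropertyConfigurator.configure(defaults)
}
```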

Author: Patrick Wendell <[email protected]>

== Merge branch commits ==

commit ffdce51
Author: Patrick Wendell <[email protected]>
Date:   Fri Feb 7 15:22:29 2014 -0800

    Logging fix
Added example Python code for sort

I added example Python code for sort. Right now, PySpark has limited examples for new people wanting to use the project. This example code sorts integers stored in a file. I was able to sort 5 million, 10 million and 25 million integers with this code.

Author: jyotiska <[email protected]>

== Merge branch commits ==

commit 8ad8faf
Author: jyotiska <[email protected]>
Date:   Sun Feb 9 11:00:41 2014 +0530

    Added comments in code on collect() method

commit 6f98f1e
Author: jyotiska <[email protected]>
Date:   Sat Feb 8 13:12:37 2014 +0530

    Updated python example code sort.py

commit 945e39a
Author: jyotiska <[email protected]>
Date:   Sat Feb 8 12:59:09 2014 +0530

    Added example python code for sort
[SPARK-1060] startJettyServer should explicitly use IP information

https://spark-project.atlassian.net/browse/SPARK-1060

In the current implementation, the webserver in Master/Worker is started with

val (srv, bPort) = JettyUtils.startJettyServer("0.0.0.0", port, handlers)

inside startJettyServer:

val server = new Server(currentPort) // here, the Server takes "0.0.0.0" as the hostname, i.e. it will always bind to the IP address of the first NIC

this can cause the wrong IP binding, e.g. if the host has two NICs, N1 and N2, and the user specifies SPARK_LOCAL_IP as N2's IP address, then when starting the web server, for the reason stated above, it will always bind to N1's address
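The fix, then, is to bind Jetty to an explicit address instead of the port-only constructor. A minimal sketch against the Jetty API (`hostName` and `port` are assumed values, and this is not the actual JettyUtils code):

```scala
import java.net.InetSocketAddress
import org.eclipse.jetty.server.Server

// Binding to an InetSocketAddress pins the listener to the intended NIC
// instead of whatever the wildcard/port-only constructor resolves to.
val hostName = "192.168.1.2"   // e.g. the value of SPARK_LOCAL_IP
val port = 8080
val server = new Server(new InetSocketAddress(hostName, port))
server.start()
```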

Author: CodingCat <[email protected]>

== Merge branch commits ==

commit 6c6d9a8
Author: CodingCat <[email protected]>
Date:   Thu Feb 6 14:53:34 2014 -0500

    startJettyServer should explicitly use IP information