prashan_pul #1

Open · wants to merge 3,444 commits into master

This pull request is big! We’re only showing the most recent 250 commits.

Commits on Jan 13, 2014

  1. 97cd27e
  2. 27311b1
  3. Merge pull request #2 from jegonzal/GraphXCCIssue

    Improving documentation and identifying potential bug in CC calculation.
    ankurdave committed Jan 13, 2014 · 8038da2
  4. Updated JavaStreamingContext to make scaladoc compile.

    `sbt/sbt doc` used to fail. This fixed it.
    rxin committed Jan 13, 2014 · 30328c3

Commits on Jan 14, 2014

  1. e2d25d2
  2. Merge pull request #410 from rxin/scaladoc1

    Updated JavaStreamingContext to make scaladoc compile.
    
    `sbt/sbt doc` used to fail. This fixed it.
    rxin committed Jan 14, 2014 · 01c0d72
  3. dc041cd
  4. c0bb38e
  5. Remove aggregateNeighbors

    ankurdave committed Jan 14, 2014 · 1bd5cef
  6. Add EdgeDirection.Either and use it to fix CC bug

    The bug was due to a misunderstanding of the activeSetOpt parameter to
    Graph.mapReduceTriplets. Passing EdgeDirection.Both causes
    mapReduceTriplets to run only on edges with *both* vertices in the
    active set. This commit adds EdgeDirection.Either, which causes
    mapReduceTriplets to run on edges with *either* vertex in the active
    set. This is what connected components needed.
    ankurdave committed Jan 14, 2014 · ae4b75d
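
    A sketch of how this plays out for connected components, using the public GraphX Pregel API (illustrative code, not the actual diff):

    ```scala
    import scala.reflect.ClassTag
    import org.apache.spark.graphx._

    // Connected components via Pregel: every vertex starts with its own ID
    // and repeatedly adopts the smallest ID seen among its neighbors.
    def connectedComponents[VD: ClassTag, ED: ClassTag](
        graph: Graph[VD, ED]): Graph[VertexId, ED] = {
      val ccGraph = graph.mapVertices { case (vid, _) => vid }
      def sendMessage(e: EdgeTriplet[VertexId, ED]): Iterator[(VertexId, VertexId)] =
        if (e.srcAttr < e.dstAttr) Iterator((e.dstId, e.srcAttr))
        else if (e.srcAttr > e.dstAttr) Iterator((e.srcId, e.dstAttr))
        else Iterator.empty
      // EdgeDirection.Either reruns sendMessage on an edge when *either*
      // endpoint received a new ID last round; EdgeDirection.Both would wait
      // for both endpoints to be active, which stalls the propagation.
      Pregel(ccGraph, Long.MaxValue, activeDirection = EdgeDirection.Either)(
        vprog = (id, attr, msg) => math.min(attr, msg),
        sendMsg = sendMessage,
        mergeMsg = (a, b) => math.min(a, b))
    }
    ```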
  7. Improvements in example code for the programming guide, as well as adding serialization support for GraphImpl to address issues with failed closure capture.

    jegonzal committed Jan 14, 2014 · cfe4a29
  8. 1233b3d
  9. Miscellaneous doc update.

    rxin committed Jan 14, 2014 · 02a8f54
  10. Merge branch 'graphx' of github.com:ankurdave/incubator-spark into graphx
    
    Conflicts:
    	graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala
    rxin committed Jan 14, 2014 · a4e12af
  11. Made more things private.

    rxin committed Jan 14, 2014 · 87f335d
  12. Updated GraphGenerator.

    rxin committed Jan 14, 2014 · ae06d2c
  13. 1dce9ce
  14. Yarn Client refactor

    colorant committed Jan 14, 2014 · 79a5ba3
  15. Yarn workerRunnable refactor

    colorant committed Jan 14, 2014 · 161ab93
  16. 622b7f7
  17. 552de5d
  18. 4c22c55
  19. 8e5c732
  20. More cleanup.

    rxin committed Jan 14, 2014 · 9317286
  21. Updated doc for PageRank.

    rxin committed Jan 14, 2014 · 0b18bfb
  22. 0fbc0b0
  23. Fix for Kryo Serializer

    pwendell committed Jan 14, 2014 · d4cd5de
  24. ee8931d
  25. Add default value for HadoopRDD's cloneRecords constructor arg, to maintain backwards compatibility.

    harveyfeng committed Jan 14, 2014 · 9e84e70
  26. Merge pull request #411 from tdas/filestream-fix

    Improved logic of finding new files in FileInputDStream
    
    Earlier, if HDFS has a hiccup and reports the existence of a new file (mod time T sec) at time T + 1 sec, then fileStream could have missed that file. With this change, it should be able to find files that are delayed by up to <batch size> seconds. That is, even if a file is reported at T + <batch time> sec, the file stream should be able to catch it.
    
    The new logic, at a high level, is as follows. It keeps track of the new files found in the previous interval and the mod time of the oldest of those files (call it X). In the current interval, it ignores files that were seen in the previous interval and files whose mod time is older than X. So a new file reported by HDFS in the current interval but with a mod time in the previous interval will still be considered. However, files with mod times earlier than the previous interval (that is, earlier than X) will be ignored. This is the current limitation, and future versions will improve this behavior further.
    
    Also reduced line lengths in DStream to <=100 chars.
    pwendell committed Jan 14, 2014 · a2fee38
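
    A minimal sketch of the bookkeeping described above (hypothetical names, not the actual FileInputDStream code):

    ```scala
    // Track the files accepted in the previous interval and the mod time of
    // the oldest of them (the "X" above).
    final case class SeenState(prevFiles: Set[String], oldestModTime: Long)

    // `listing` is the (path, modTime) directory listing for this interval.
    def findNewFiles(listing: Seq[(String, Long)],
                     state: SeenState): (Seq[String], SeenState) = {
      val accepted = listing.filter { case (path, modTime) =>
        modTime >= state.oldestModTime &&   // not older than X: late HDFS reports still count
        !state.prevFiles.contains(path)     // but skip files already seen last interval
      }
      val next = SeenState(
        prevFiles = accepted.map(_._1).toSet,
        oldestModTime =
          if (accepted.nonEmpty) accepted.map(_._2).min else state.oldestModTime)
      (accepted.map(_._1), next)
    }
    ```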
  27. 33022d6
  28. Merge pull request #412 from harveyfeng/master

    Add default value for HadoopRDD's `cloneRecords` constructor arg
    
    Small mend to https://github.com/apache/incubator-spark/pull/359/files#diff-1 for backwards compatibility
    pwendell committed Jan 14, 2014 · b07bc02
  29. cc93c2a
  30. 8399341
  31. d4d9ece
  32. 84d6af8
  33. Fix infinite loop in GraphGenerators.generateRandomEdges

    The loop occurred when numEdges < numVertices. This commit fixes it by
    allowing generateRandomEdges to generate a multigraph.
    ankurdave committed Jan 14, 2014 · c6023be
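
    A sketch of the idea (hypothetical signature, not the GraphX source): sampling each edge independently permits duplicates, so generation always terminates regardless of how numEdges compares to the vertex count.

    ```scala
    import scala.util.Random

    // Draw `numEdges` destinations with replacement -- a multigraph is allowed,
    // so there is no retry loop that could spin forever.
    def generateRandomEdges(src: Int, numEdges: Int, maxVertexId: Int,
                            rng: Random = new Random): Seq[(Int, Int)] =
      Seq.fill(numEdges)((src, rng.nextInt(maxVertexId)))
    ```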
  34. 59e4384
  35. Improve scaladoc links

    ankurdave committed Jan 14, 2014 · c28e5a0
  36. e14a14b
  37. 67795db
  38. 6f6f8c9
  39. c6dbfd1
  40. 76ebdae
  41. Merge pull request #409 from tdas/unpersist

    Automatically unpersisting RDDs that have been cleaned up from DStreams
    
    Earlier, RDDs generated by DStreams were forgotten but not unpersisted. The system relied on the natural BlockManager LRU to drop the data. The cleaner.ttl was a hammer to clean up RDDs, but it needs to be set separately and very conservatively (at best, a few minutes). This automatic unpersisting lets the system handle cleanup itself, which reduces memory usage. As a side effect it will also improve GC performance, as fewer objects are stored in memory. In fact, for some workloads, it may allow RDDs to be cached as deserialized, which speeds up processing without too much GC overhead.
    
    This is disabled by default. To enable it, set the configuration spark.streaming.unpersist to true. In a future release, this will be set to true by default.
    
    Also, reduced the sleep time in TaskSchedulerImpl.stop() from 5 seconds to 1 second. From my conversation with Matei, there does not seem to be any good reason for the sleep (which lets messages be sent out) to be so long.
    pwendell committed Jan 14, 2014 · 08b9fec
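
    The configuration named in the message is an ordinary Spark property; a minimal sketch of opting in (app name is illustrative):

    ```scala
    import org.apache.spark.SparkConf

    // Disabled by default in this release; opt in explicitly.
    val conf = new SparkConf()
      .setAppName("StreamingApp")
      .set("spark.streaming.unpersist", "true")
    ```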
  42. Finish 6f6f8c9

    ankurdave committed Jan 14, 2014 · 2cd9358
  43. af645be
  44. Merge pull request #401 from andrewor14/master

    External sorting - Add number of bytes spilled to Web UI
    
    Additionally, update test suite for external sorting to induce spilling.
    pwendell committed Jan 14, 2014 · 0ca0d4d
  45. Code clean up for mllib

    soulmachine committed Jan 14, 2014 · 0d94d74
  46. Since getLong() and getInt() have side effects, get back parentheses, and remove an empty line

    soulmachine committed Jan 14, 2014 · 12386b3
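
    The Scala convention being applied, for illustration: a method with side effects is declared and invoked with parentheses, while a pure accessor may drop them.

    ```scala
    var counter = 0
    def getInt(): Int = { counter += 1; counter } // side-effecting: keep the ()
    val n = getInt() // call site keeps the parentheses as well
    ```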
  47. Merge pull request #413 from rxin/scaladoc

    Adjusted visibility of various components and documentation for 0.9.0 release.
    pwendell committed Jan 14, 2014 · 68641bc
  48. 4bafc4f
  49. Merge pull request #408 from pwendell/external-serializers

    Improvements to external sorting
    
    1. Adds the option of compressing outputs.
    2. Adds batching to the serialization to prevent OOM on the read side.
    3. Slight renaming of config options.
    4. Use Spark's buffer size for reads in addition to writes.
    pwendell committed Jan 14, 2014 · 945fe7a
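
    For illustration, spill compression is controlled by an ordinary Spark property (key name as it appears in later Spark docs; shown as an assumption, not the diff itself):

    ```scala
    import org.apache.spark.SparkConf

    // Compress data spilled to disk during external sorting.
    val conf = new SparkConf().set("spark.shuffle.spill.compress", "true")
    ```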
  50. 80e73ed
  51. Merge pull request #367 from ankurdave/graphx

    GraphX: Unifying Graphs and Tables
    
    GraphX extends Spark's distributed fault-tolerant collections API and interactive console with a new graph API which leverages recent advances in graph systems (e.g., [GraphLab](http://graphlab.org)) to enable users to easily and interactively build, transform, and reason about graph structured data at scale. See http://amplab.github.io/graphx/.
    
    Thanks to @jegonzal, @rxin, @ankurdave, @dcrankshaw, @jianpingjwang, @amatsukawa, @kellrott, and @adamnovak.
    
    Tasks left:
    - [x] Graph-level uncache
    - [x] Uncache previous iterations in Pregel
    - [x] ~~Uncache previous iterations in GraphLab~~ (postponed to post-release)
       - [x] Describe GC issue with GraphLab
    - [ ] Write `docs/graphx-programming-guide.md`
       - [x] Mention future Bagel support in docs
       - [ ] Section on caching/uncaching in docs: As with Spark, cache something that is used more than once. In an iterative algorithm, try to cache and force (i.e., materialize) something every iteration, then uncache the cached things that depended on the newly materialized RDD but that won't be referenced again.
    - [x] Undo modifications to core collections and instead copy them to org.apache.spark.graphx
    - [x] Make Graph serializable to work around capture in Spark shell
    - [x] Rename graph -> graphx in package name and subproject
    - [x] Remove standalone PageRank
    - [x] ~~Fix amplab/graphx#52 by checking `iter.hasNext`~~
    pwendell committed Jan 14, 2014 · 4a805af
  52. Indent two spaces

    soulmachine committed Jan 14, 2014 · c2852cf
  53. Merge pull request #380 from mateiz/py-bayes

    Add Naive Bayes to Python MLlib, and some API fixes
    
    - Added a Python wrapper for Naive Bayes
    - Updated the Scala Naive Bayes to match the style of our other
      algorithms better and in particular make it easier to call from Java
      (added builder pattern, removed default value in train method)
    - Updated Python MLlib functions to not require a SparkContext; we can
      get that from the RDD the user gives
    - Added a toString method in LabeledPoint
    - Made the Python MLlib tests run as part of run-tests as well (before
      they could only be run individually through each file)
    pwendell committed Jan 14, 2014 · fdaabdc
  54. Removed StreamingContext.registerInputStream and registerOutputStream - they were useless as InputDStream has been made to register itself. Also made DStream.register() private[streaming] - not useful to expose the confusing function. Updated a lot of documentation.

    tdas committed Jan 14, 2014 · 4e497db
  55. 0984647
  56. Merge pull request #415 from pwendell/shuffle-compress

    Enable compression by default for spills
    pwendell committed Jan 14, 2014 · 055be5c
  57. a3da468
  58. 845e568
  59. Merge remote-tracking branch 'apache/master' into filestream-fix

    Conflicts:
    	streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala
    tdas committed Jan 14, 2014 · f8e239e
  60. Fixed loose ends in docs.

    tdas committed Jan 14, 2014 · f8bd828
  61. Merge pull request #416 from tdas/filestream-fix

    Removed unnecessary DStream operations and updated docs
    
    Removed StreamingContext.registerInputStream and registerOutputStream - they were useless. InputDStream has been made to register itself, and just registering a DStream as an output stream causes RDD objects to be created but the RDDs will not be computed at all. Also made DStream.register() private[streaming] for the same reasons.
    
    Updated docs, especially adding package documentation for the streaming package.
    
    Also, changed NetworkWordCount's input storage level to MEMORY_ONLY, since replication on the local machine causes warning messages (as replication fails), which is scary for a new user trying out his/her first example.
    pwendell committed Jan 14, 2014 · 980250b
  62. Modifications as suggested in PR feedback:

    - more variants of mapPartitions added to JavaRDDLike
    - move setGenerator to JavaRDDLike
    - clean up
    Saurabh Rawat committed Jan 14, 2014 · 1442cd5
  63. Add missing header files

    pwendell committed Jan 14, 2014 · 2303479
  64. Merge pull request #420 from pwendell/header-files

    Add missing header files
    pwendell committed Jan 14, 2014 · fa75e5e
  65. 57fcfc7
  66. 486f37c
  67. Merge pull request #423 from jegonzal/GraphXProgrammingGuide

    Improving the graphx-programming-guide
    
    This PR will track a few minor improvements to the content and formatting of the graphx-programming-guide.
    rxin committed Jan 14, 2014 · 3fcc68b
  68. 0bba773
  69. Broadcast variable visibility change & doc update.

    Note that previously the Broadcast class was accidentally marked as private[spark]. It needs to be public
    for broadcast variables to work. Also exposing the broadcast variable id.
    rxin committed Jan 14, 2014 · 71b3007
  70. 6a12b9e
  71. f8c12e9
  72. 55db774
  73. Maintain Serializable API compatibility by reverting back to java.io.Serializable for Broadcast and Accumulator.

    rxin committed Jan 14, 2014 · 1b5623f
  74. f12e506
  75. 6f965a4
  76. 938e4a0
  77. b683608
  78. 5b3a3e2
  79. Merge pull request #425 from rxin/scaladoc

    API doc update & make Broadcast public
    
    In #413 Broadcast was mistakenly made private[spark]. I changed it to public again. Also exposed the id publicly, given that the R frontend requires it.
    
    Copied some of the documentation from the programming guide to API Doc for Broadcast and Accumulator.
    
    This should be cherry picked into branch-0.9 as well for 0.9.0 release.
    rxin committed Jan 14, 2014 · 2ce23a5
  80. 8ea2cd5
  81. Style fix

    pwendell committed Jan 14, 2014 · b1b22b7
  82. 8ea056d
  83. Merge pull request #427 from pwendell/deprecate-aggregator

    Deprecate rather than remove old combineValuesByKey function
    rxin committed Jan 14, 2014 · d601a76
  84. Merge pull request #429 from ankurdave/graphx-examples-pom.xml

    Add GraphX dependency to examples/pom.xml
    rxin committed Jan 14, 2014 · 193a075
  85. Merge pull request #428 from pwendell/writeable-objects

    Don't clone records for text files
    rxin committed Jan 14, 2014 · 74b46ac

Commits on Jan 15, 2014

  1. 1210ec2
  2. Merge pull request #431 from ankurdave/graphx-caching-doc

    Describe caching and uncaching in GraphX programming guide
    rxin committed Jan 15, 2014 · ad294db
  3. Merge pull request #424 from jegonzal/GraphXProgrammingGuide

    Additional edits for clarity in the graphx programming guide.
    
    Added an overview of the Graph and GraphOps functions and fixed numerous typos.
    rxin committed Jan 15, 2014 · 3a386e2
  4. 148757e
  5. VertexID -> VertexId

    ankurdave committed Jan 15, 2014 · f4d9019
  6. 147a943
  7. dfb1524
  8. Changed SparkConf to not be serializable. And also fixed unit-test log paths in log4j.properties of external modules.

    tdas committed Jan 15, 2014 · 1f4718c
  9. 0e15bd7
  10. Merge pull request #434 from rxin/graphxmaven

    Fixed SVDPlusPlusSuite in Maven build.
    
    This should go into 0.9.0 also.
    pwendell committed Jan 15, 2014 · 087487e
  11. Merge pull request #435 from tdas/filestream-fix

    Fixed the flaky tests by making SparkConf not serializable
    
    SparkConf was being serialized with CoGroupedRDD and Aggregator, which somehow caused OptionalJavaException while being deserialized as part of a ShuffleMapTask. SparkConf should not even be serializable (according to conversation with Matei). This change fixes that.
    
    @mateiz @pwendell
    pwendell committed Jan 15, 2014 · 139c24e
  12. Expose method and class - so that we can use it from user code (particularly since the checkpoint directory is autogenerated now).

    mridulm committed Jan 15, 2014 · 0aea33d
  13. Merge pull request #436 from ankurdave/VertexId-case

    Rename VertexID -> VertexId in GraphX
    rxin committed Jan 15, 2014 · 3d9e66d
  14. remove "-XX:+UseCompressedStrings" option

    remove "-XX:+UseCompressedStrings" option from tuning guide since jdk7 no longer supports this.
    CrazyJvm committed Jan 15, 2014 · 263933d
  15. Merge pull request #366 from colorant/yarn-dev

    More yarn code refactor
    
    Try to factor out common code in yarn alpha/stable for Client and WorkerRunnable to reduce duplicated code, by putting it into a trait in a common dir and extending it.
    
    The same could be done for the remaining files in alpha/stable, but those files have much more overlapping code with different API calls here and there within functions, and would need a much closer review; it might also divide functions into overly small pieces, so it may not be worth doing in this way.
    
    So just make it work for these two files first.
    tgravescs committed Jan 15, 2014 · cef2af9
  16. Merge pull request #433 from markhamstra/debFix

    Updated Debian packaging
    pwendell committed Jan 15, 2014 · 494d3c0
  17. 9259d70
  18. 00a3f7e
  19. Merge pull request #441 from pwendell/graphx-build

    GraphX shouldn't list Spark as provided.
    
    I noticed this when building an application against GraphX to audit the released artifacts.
    pwendell committed Jan 15, 2014 · 5fecd25
  20. 9e63753
  21. Merge pull request #443 from tdas/filestream-fix

    Made some classes private[streaming] and deprecated a method in JavaStreamingContext.
    
    Classes `RawTextHelper`, `RawTextSender` and `RateLimitedOutputStream` are not useful in the streaming API. They are not used by the core functionality and were there as support classes for an obscure example. One of the classes, `RawTextSender`, has a main function which can be executed using bin/spark-class even if it is made private[streaming]. In the future, I will probably completely remove these classes. For the time being, I am just converting them to private[streaming].
    
    Accessing the underlying JavaSparkContext in JavaStreamingContext was through `JavaStreamingContext.sc`. This is deprecated; the preferred method is `JavaStreamingContext.sparkContext`, to keep it consistent with `StreamingContext.sparkContext`.
    pwendell committed Jan 15, 2014 · 2a05403
  22. Merge pull request #442 from pwendell/standalone

    Workers should use working directory as spark home if it's not specified
    
    If users don't set SPARK_HOME in their environment file when launching an application, the standalone cluster should default to the spark home of the worker.
    pwendell committed Jan 15, 2014 · 59f475c
  23. 2ffdaef
  24. Merge pull request #444 from mateiz/py-version

    Clarify that Python 2.7 is only needed for MLlib
    pwendell committed Jan 15, 2014 · 4f0c361

Commits on Jan 16, 2014

  1. Fail rather than hanging if a task crashes the JVM.

    Prior to this commit, if a task crashes the JVM, the task (and
    all other tasks running on that executor) is marked as KILLED rather
    than FAILED. As a result, the TaskSetManager will retry the task
    indefinitely rather than failing the job after maxFailures. This
    commit fixes that problem by marking tasks as FAILED rather than
    KILLED when an executor is lost.
    
    The downside of this commit is that if task A fails because another
    task running on the same executor caused the VM to crash, the failure
    will incorrectly be counted as a failure of task A. This should not
    be an issue because we typically set maxFailures to 3, and it is
    unlikely that a task will be co-located with a JVM-crashing task
    multiple times.
    kayousterhout committed Jan 16, 2014 · a268d63
  2. Merge pull request #439 from CrazyJvm/master

    SPARK-1024 Remove "-XX:+UseCompressedStrings" option from tuning guide
    
    remove "-XX:+UseCompressedStrings" option from tuning guide since jdk7 no longer supports this.
    rxin committed Jan 16, 2014 · 0675ca5
  3. fix "set MASTER automatically fails" bug.

    spark-shell intends to set MASTER automatically if we do not provide the option when we start the shell, but there's a problem.
    The condition is "if [[ "x" != "x$SPARK_MASTER_IP" && "y" != "y$SPARK_MASTER_PORT" ]];". We will surely set SPARK_MASTER_IP explicitly, but we probably do not set SPARK_MASTER_PORT, instead just relying on Spark's default port 7077. So if we do not set SPARK_MASTER_PORT, the condition will never be true. I think we should just use the default port if users do not set the port explicitly.
    CrazyJvm committed Jan 16, 2014 · 7a0c5b5
  4. fix some formatting problems.

    CrazyJvm committed Jan 16, 2014 · 8400536
  5. Merge pull request #414 from soulmachine/code-style

    Code clean up for mllib
    
    * Removed unnecessary parentheses
    * Removed unused imports
    * Simplified `filter...size()` to `count ...`
    * Removed comments for obsolete parameters
    rxin committed Jan 16, 2014 · 84595ea
  6. 718a13c
  7. Merge pull request #445 from kayousterhout/exec_lost

    Fail rather than hanging if a task crashes the JVM.
    
    Prior to this commit, if a task crashes the JVM, the task (and
    all other tasks running on that executor) is marked as KILLED rather
    than FAILED. As a result, the TaskSetManager will retry the task
    indefinitely rather than failing the job after maxFailures. Eventually,
    this makes the job hang, because the Standalone Scheduler removes
    the application after 10 workers have failed, and then the app is left
    in a state where it's disconnected from the master and waiting to reconnect.
    This commit fixes that problem by marking tasks as FAILED rather than
    KILLED when an executor is lost.
    
    The downside of this commit is that if task A fails because another
    task running on the same executor caused the VM to crash, the failure
    will incorrectly be counted as a failure of task A. This should not
    be an issue because we typically set maxFailures to 3, and it is
    unlikely that a task will be co-located with a JVM-crashing task
    multiple times.
    rxin committed Jan 16, 2014 · c06a307
  8. 4e510b0
  9. Address review comments

    mridulm committed Jan 16, 2014 · 1a0da89
  10. Use method, not variable

    mridulm committed Jan 16, 2014 · edd82c5
  11. Updated java API docs for streaming, along with very minor changes in the code examples.

    tdas committed Jan 16, 2014 · 11e6534

Commits on Jan 17, 2014

  1. fcb4fc6
  2. Merge pull request #438 from ScrapCodes/clone-records-java-api

    Clone records java api
    pwendell committed Jan 17, 2014 · d4fd89e
  3. Merge pull request #451 from Qiuzhuang/master

    Fixed Windows spark shell launch script error.
    
    JIRA SPARK-1029: https://spark-project.atlassian.net/browse/SPARK-1029
    pwendell committed Jan 17, 2014 · d749d47
  4. Address review comment

    mridulm committed Jan 17, 2014 · b690e11
  5. changes from PR

    rezazadeh committed Jan 17, 2014 · d28bf41
  6. use 0-indexing

    rezazadeh committed Jan 17, 2014 · cb13b15
  7. replace this.type with SVD

    rezazadeh committed Jan 17, 2014 · eb2d8c4
  8. add rename computeSVD

    rezazadeh committed Jan 17, 2014 · dbec69b
  9. prettify

    rezazadeh committed Jan 17, 2014 · c9b4845
  10. 0-index docs

    rezazadeh committed Jan 17, 2014 · 5c639d7
  11. make example 0-indexed

    rezazadeh committed Jan 17, 2014 · 4e96757
  12. caf97a2
  13. rename to MatrixSVD

    rezazadeh committed Jan 17, 2014 · fa32998
  14. rename to MatrixSVD

    rezazadeh committed Jan 17, 2014 · 85b95d0

Commits on Jan 18, 2014

  1. e91ad3f
  2. 5316bca
  3. Merge pull request #461 from pwendell/master

    Use renamed shuffle spill config in CoGroupedRDD.scala
    
    This one got missed when it was renamed.
    pwendell committed Jan 18, 2014 · aa981e4
  4. Allow files added through SparkContext.addFile() to be overwritten

    This is useful for the cases when a file needs to be refreshed and downloaded
    by the executors periodically.
    
    Signed-off-by: Yinan Li <[email protected]>
    liyinan926 committed Jan 18, 2014 · fd833e7
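
    A sketch of the use case (the path is made up; SparkContext.addFile and SparkFiles.get are the standard public API):

    ```scala
    import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("demo"))
    // Re-adding a file whose contents changed: with this patch, executors can
    // overwrite their cached copy instead of keeping the stale one.
    sc.addFile("hdfs:///config/latest.txt")
    val contents = sc.parallelize(1 to 2).map { _ =>
      scala.io.Source.fromFile(SparkFiles.get("latest.txt")).mkString
    }.collect()
    ```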

Commits on Jan 19, 2014

  1. Merge pull request #462 from mateiz/conf-file-fix

    Remove Typesafe Config usage and conf files to fix nested property names
    
    With Typesafe Config we had the subtle problem of no longer allowing
    nested property names, which are used for a few of our properties:
    http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html
    
    This PR is for branch 0.9 but should be added into master too.
    (cherry picked from commit 34e911c)
    
    Signed-off-by: Patrick Wendell <[email protected]>
    pwendell committed Jan 19, 2014 · bf56995
  2. Merge pull request #426 from mateiz/py-ml-tests

    Re-enable Python MLlib tests (require Python 2.7 and NumPy 1.7+)
    
    We disabled these earlier because Jenkins didn't have these versions.
    pwendell committed Jan 19, 2014 · 4c16f79
  3. Merge pull request #437 from mridulm/master

    Minor api usability changes
    
    - Expose checkpoint directory - since it is autogenerated now
    - null check for jars
    - Expose SparkHadoopUtil: so that configuration creation is abstracted even from user code, to avoid duplication of functionality already in spark.
    pwendell committed Jan 19, 2014 · 73dfd42
  4. Merge pull request #459 from srowen/UpdaterL2Regularization

    Correct L2 regularized weight update with canonical form
    
    Per thread on the user@ mailing list, and comments from Ameet, I believe the weight update for L2 regularization needs to be corrected. See http://mail-archives.apache.org/mod_mbox/spark-user/201401.mbox/%3CCAH3_EVMetuQuhj3__NdUniDLc4P-FMmmrmxw9TS14or8nT4BNQ%40mail.gmail.com%3E
    pwendell committed Jan 19, 2014 · fe8a354
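
    For reference, the canonical gradient step with L2 regularization that the fix refers to (standard form, my paraphrase of the thread, not the patch itself):

    ```latex
    w_{t+1} = w_t - \eta \left( \nabla L(w_t) + \lambda w_t \right)
            = (1 - \eta \lambda)\, w_t - \eta \nabla L(w_t)
    ```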
  5. Addressed comments from Reynold

    Signed-off-by: Yinan Li <[email protected]>
    liyinan926 committed Jan 19, 2014 · 584323c
  6. LocalSparkContext for MLlib

    ajtulloch committed Jan 19, 2014 · 720836a
  7. ceb79a3
  8. update comment

    tgravescs committed Jan 19, 2014 · dd56b21
  9. Merge pull request #458 from tdas/docs-update

    Updated java API docs for streaming, along with very minor changes in the code examples.
    
    Docs updated for:
    Scala: StreamingContext, DStream, PairDStreamFunctions
    Java: JavaStreamingContext, JavaDStream, JavaPairDStream
    
    Example updated:
    JavaQueueStream: Not use deprecated method
    ActorWordCount: Use the public interface the right way.
    pwendell committed Jan 19, 2014 · 256a355
  10. Merge pull request #470 from tgravescs/fix_spark_examples_yarn

    Only log an error on a missing jar, to allow the spark examples to run on Yarn.
    
    Right now, to run the spark examples on Yarn you have to use the --addJars option and put the jar in hdfs. To make that nicer, so the user doesn't have to specify the --addJars option, change it to simply log an error instead of throwing.
    pwendell committed Jan 19, 2014 · 792d908

Commits on Jan 20, 2014

  1. f9a95d6
  2. fix for SPARK-1027

    change TestClient & Worker to Some("xxx")
    
    kill manager if it is started
    
    remove unnecessary .get when fetching "SPARK_HOME" values
    CodingCat committed Jan 20, 2014 · 29f4b6a
  3. 3e85b87

Commits on Jan 21, 2014

  1. cdb003e
  2. Minor fixes

    pwendell committed Jan 21, 2014 · 54867e9
  3. 1b29914
  4. c324ac1
  5. Fixing speculation bug

    pwendell committed Jan 21, 2014 · f84400e
  6. de526ad
  7. d46df96
  8. 2e95174
  9. Restricting /lib to top level directory in .gitignore

    This patch was proposed by Sean Mackrory.
    pwendell committed Jan 21, 2014 · e437069
  10. e0b741d
  11. Merge pull request #483 from pwendell/gitignore

    Restricting /lib to top level directory in .gitignore
    
    This patch was proposed by Sean Mackrory.
    rxin committed Jan 21, 2014 · 7373ffb
  12. Merge pull request #482 from tdas/streaming-example-fix

    Added StreamingContext.awaitTermination to streaming examples
    
    StreamingContext.start() currently starts a non-daemon thread which prevents termination of a Spark Streaming program even if the main function has exited. Since the expected behavior of a streaming program is to run until explicitly killed, this was sort of fine when Spark Streaming applications were launched from the command line. However, when launched in Yarn-standalone mode, this did not work, as the driver effectively got terminated when the main function exited. So Spark Streaming examples did not work on Yarn.
    
    This addition to the examples ensures that the examples work on Yarn, and also teaches everyone that StreamingContext.awaitTermination() is necessary for Spark Streaming programs to keep running.
    
    The true bug-fix of making sure all threads started by Spark Streaming are daemon threads is left for post-0.9.
    pwendell committed Jan 21, 2014 · 0367981
  13. Merge pull request #449 from CrazyJvm/master

    SPARK-1028 : fix "set MASTER automatically fails" bug.
    
    spark-shell intends to set MASTER automatically if we do not provide the option when we start the shell, but there's a problem.
    The condition is "if [[ "x" != "x$SPARK_MASTER_IP" && "y" != "y$SPARK_MASTER_PORT" ]];". We will surely set SPARK_MASTER_IP explicitly, but we probably do not set SPARK_MASTER_PORT, instead just relying on Spark's default port 7077. So if we do not set SPARK_MASTER_PORT, the condition will never be true. I think we should just use the default port if users do not set the port explicitly.
    rxin committed Jan 21, 2014 · 6b4eed7
  14. Adding small code comment

    pwendell committed Jan 21, 2014 · a917a87
  15. 65869f8
  16. Merge pull request #484 from tdas/run-example-fix

    Made run-example respect SPARK_JAVA_OPTS and SPARK_MEM.
    
    The bin/run-example script was not passing Java properties set through SPARK_JAVA_OPTS to the example. This is important for examples like Twitter** as the Twitter authentication information must be set through Java properties. Hence added the same JAVA_OPTS code in run-example as in the bin/spark-class script.
    
    Also added SPARK_MEM, in case someone wants to run the example with different amounts of memory. This can be removed if it is not in tune with the intended semantics of the run-example scripts.
    
    @matei Please check this soon I want this to go in 0.9-rc4
    pwendell committed Jan 21, 2014 · c67d3d8
  17. Style clean-up

    pwendell committed Jan 21, 2014 · a9bcc98
  18. Merge pull request #480 from pwendell/0.9-fixes

    Handful of 0.9 fixes
    
    This patch addresses a few fixes for Spark 0.9.0 based on the last release candidate.
    
    @mridulm gets credit for reporting most of the issues here. Many of the fixes here are based on his work in #477 and follow up discussion with him.
    pwendell committed Jan 21, 2014 · 77b986f
  19. Incorporate Tom's comments - update doc and code to reflect that core requests may not always be honored

    sryza committed Jan 21, 2014 · adf4261
  20. Fixed import order

    ajtulloch committed Jan 21, 2014 · 3a067b4
  21. Merge pull request #469 from ajtulloch/use-local-spark-context-in-tests-for-mllib
    
    [MLlib] Use a LocalSparkContext trait in test suites
    
    Replaces the 9 instances of
    
    ```scala
    class XXXSuite extends FunSuite with BeforeAndAfterAll {
      @transient private var sc: SparkContext = _
    
      override def beforeAll() {
        sc = new SparkContext("local", "test")
      }
    
      override def afterAll() {
        sc.stop()
        System.clearProperty("spark.driver.port")
      }
    }
    ```
    
    with
    
    ```scala
    class XXXSuite extends FunSuite with LocalSparkContext {
    ```
    rxin committed Jan 21, 2014 · f854498
  22. Clarify spark.default.parallelism

    It's the task count across the cluster, not per worker, per machine, per core, or anything else.
    ash211 committed Jan 21, 2014 · 069bb94
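
    Since the value is a cluster-wide total, setting it looks like this (the number is illustrative):

    ```scala
    import org.apache.spark.SparkConf

    // Default *total* number of tasks for distributed shuffle operations --
    // across the whole cluster, not per worker, per machine, or per core.
    val conf = new SparkConf().set("spark.default.parallelism", "64")
    ```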
  23. Merge pull request #489 from ash211/patch-6

    Clarify spark.default.parallelism
    
    It's the task count across the cluster, not per worker, per machine, per core, or anything else.
    rxin committed Jan 21, 2014 · 749f842

Commits on Jan 22, 2014

  1. Replace the code to check for Option != None with an Option.isDefined call in Scala code.
    
    This hopefully will make the code cleaner.
    hsaputra committed Jan 22, 2014 · 90ea9d5
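
    The pattern being swapped, for illustration (plain standard-library Scala):

    ```scala
    val sparkHome: Option[String] = sys.env.get("SPARK_HOME")

    val before = sparkHome != None   // works, but compares against the None object
    val after  = sparkHome.isDefined // idiomatic equivalent
    ```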
  2. 36f9a64
  3. Fixed bug where task set managers are added to queue twice

    This bug leads to a small performance hit because task
    set managers will get offered each rejected resource
    offer twice, but doesn't lead to any incorrect functionality.
    kayousterhout committed Jan 22, 2014 · 19da82c
  4. Merge pull request #315 from rezazadeh/sparsesvd

    Sparse SVD
    
    # Singular Value Decomposition
    Given an *m x n* matrix *A*, compute matrices *U, S, V* such that
    
    *A = U * S * V^T*
    
    There is no restriction on m, but we require n^2 doubles to fit in memory.
    Further, n should be less than m.
    
    The decomposition is computed by first computing *A^TA = V S^2 V^T*,
    computing svd locally on that (since n x n is small),
    from which we recover S and V.
    Then we compute U via easy matrix multiplication
    as *U =  A * V * S^-1*
    
    Only singular vectors associated with the largest k singular values are recovered.
    If there are k such values, then the dimensions of the return will be:
    
    * *S* is *k x k* and diagonal, holding the singular values on diagonal.
    * *U* is *m x k* and satisfies U^T*U = eye(k).
    * *V* is *n x k* and satisfies V^TV = eye(k).
    
    All input and output is expected in sparse matrix format, 0-indexed
    as tuples of the form ((i,j),value) all in RDDs.
    
    # Testing
    Tests included. They test:
    - Decomposition promise (A = USV^T)
    - For small matrices, output is compared to that of jblas
    - Rank 1 matrix test included
    - Full Rank matrix test included
    - Middle-rank matrix forced via k included
    
    # Example Usage
    
    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.linalg.SVD
    import org.apache.spark.mllib.linalg.SparseMatrix
    import org.apache.spark.mllib.linalg.MatrixEntry
    
    // Load and parse the data file
    val data = sc.textFile("mllib/data/als/test.data").map { line =>
          val parts = line.split(',')
          MatrixEntry(parts(0).toInt, parts(1).toInt, parts(2).toDouble)
    }
    val m = 4
    val n = 4
    
    // recover top 1 singular vector
    val decomposed = SVD.sparseSVD(SparseMatrix(data, m, n), 1)
    
    println("singular values = " + decomposed.S.data.toArray.mkString)
    
    # Documentation
    Added to docs/mllib-guide.md
    mateiz committed Jan 22, 2014
    Configuration menu
    Copy the full SHA
    d009b17 View commit details
    Browse the repository at this point in the history
  5. Merge pull request #493 from kayousterhout/double_add

    Fixed bug where task set managers are added to queue twice
    
    @mateiz can you verify that this is a bug and wasn't intentional? (https://github.com/apache/incubator-spark/commit/90a04dab8d9a2a9a372cea7cdf46cc0fd0f2f76c#diff-7fa4f84a961750c374f2120ca70e96edR551)
    
    This bug leads to a small performance hit because task
    set managers will get offered each rejected resource
    offer twice, but doesn't lead to any incorrect functionality.
    
    Thanks to @hdc1112 for pointing this out.
    mateiz committed Jan 22, 2014
    5bcfd79
  6. Merge pull request #478 from sryza/sandy-spark-1033

    SPARK-1033. Ask for cores in Yarn container requests
    
    Tested on a pseudo-distributed cluster against the Fair Scheduler and observed a worker taking more than a single core.
    pwendell committed Jan 22, 2014
    576c4a4
  7. Depend on Commons Math explicitly instead of accidentally getting it from Hadoop (which stops working in 2.2.x) and also use the newer commons-math3
    srowen committed Jan 22, 2014
    fd0c5b8
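
    For reference, declaring the dependency explicitly in sbt looks roughly like this (a sketch; the exact commons-math3 version chosen by the patch is an assumption):

        // Depend on commons-math3 directly instead of relying on it
        // arriving transitively through Hadoop's artifacts
        libraryDependencies += "org.apache.commons" % "commons-math3" % "3.2"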
  8. Merge pull request #492 from skicavs/master

    fixed job name and usage information for the JavaSparkPi example
    pwendell committed Jan 22, 2014
    a1238bb
  9. 4476398
  10. Merge pull request #495 from srowen/GraphXCommonsMathDependency

    Fix graphx Commons Math dependency
    
    `graphx` depends on Commons Math (2.x) in `SVDPlusPlus.scala`. However, the module doesn't declare this dependency. It happens to work because it is included by Hadoop artifacts. But that stopped being true a month or so ago: building against recent Hadoop would fail. (That's how we noticed.)
    
    The simple fix is to declare the dependency, as it should be. But it's also worth noting that `commons-math` is the old-ish 2.x line, while `commons-math3` is where newer 3.x releases are. It is nearly a drop-in replacement, but with a different artifact and package name. Changing this lone usage to `commons-math3` works, tests pass, and it isn't surprising that they do, so it is probably also worth changing. (A comment in some test code also references `commons-math3`, FWIW.)
    
    It does raise another question though: `mllib` looks like it uses the `jblas` `DoubleMatrix` for general purpose vector/matrix stuff. Should `graphx` really use Commons Math for this? Beyond the tiny scope here but worth asking.
    pwendell committed Jan 22, 2014
    3184fac

Commits on Jan 23, 2014

  1. refactor sparkHome to val

    clean code
    CodingCat committed Jan 23, 2014
    2b3c461
  2. Fix bug in worker clean-up in UI

    Introduced in d5a96fe. This should be picked into 0.8 and 0.9 as well.
    pwendell committed Jan 23, 2014
    6285513
  3. Merge pull request #447 from CodingCat/SPARK-1027

    fix for SPARK-1027
    
    fix for SPARK-1027  (https://spark-project.atlassian.net/browse/SPARK-1027)
    
    FIXES
    
    1. change sparkHome from String to Option[String] in ApplicationDesc
    
    2. remove sparkhome parameter in LaunchExecutor message
    
    3. adjust involved files
    pwendell committed Jan 23, 2014
    034dce2
  4. Merge pull request #496 from pwendell/master

    Fix bug in worker clean-up in UI
    
    Introduced in d5a96fe (/cc @aarondav).
    
    This should be picked into 0.8 and 0.9 as well. The bug causes old (zombie) workers on a node to not disappear immediately from the UI when a new one registers.
    pwendell committed Jan 23, 2014
    a1cd185
  5. Replace commons-math with jblas

    jdk8 committed Jan 23, 2014
    cc0fd33
  6. Add jblas dependency

    jdk8 committed Jan 23, 2014
    a5a513e
  7. Add jblas dependency

    jdk8 committed Jan 23, 2014
    19a01c1
  8. fixed ClassTag in mapPartitions

    eklavya committed Jan 23, 2014
    60e7457
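
    For context on why the ClassTag matters here (a minimal sketch with a simplified signature, not the actual JavaRDD code):

        import scala.reflect.ClassTag
        import org.apache.spark.rdd.RDD

        // A generic transformation must carry an implicit ClassTag[U] so the
        // resulting RDD knows its element type at runtime; supplying the wrong
        // tag (e.g. a cast ClassTag[AnyRef]) can surface as ClassCastExceptions
        def mapPartitionsExample[T, U: ClassTag](rdd: RDD[T])(
            f: Iterator[T] => Iterator[U]): RDD[U] =
          rdd.mapPartitions(f)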
  9. Merge pull request #499 from jianpingjwang/dev1

    Replace commons-math with jblas in SVDPlusPlus
    rxin committed Jan 23, 2014
    a2b47da
  10. Merge pull request #406 from eklavya/master

    Extending Java API coverage
    
    Hi,
    
    I have added three new methods to JavaRDD.
    
    Please review and merge.
    JoshRosen committed Jan 23, 2014
    fad6aac
  11. 0035dbb
  12. 6156990

Commits on Jan 24, 2014

  1. Remove Hadoop object cloning and warn users making Hadoop RDD's.

    The code introduced in #359 used Hadoop's WritableUtils.clone() to
    duplicate objects when reading from Hadoop files. Some users have
    reported exceptions when cloning data in various file formats,
    including Avro and another custom format.
    
    This patch removes that functionality to ensure stability for the
    0.9 release. Instead, it puts a clear warning in the documentation
    that copying may be necessary for Hadoop data sets.
    pwendell committed Jan 24, 2014
    7101017
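
    With cloning removed, the documented advice is to copy records out of Hadoop's reused Writable objects before caching them. A minimal sketch of that pattern (the input path is illustrative):

        import org.apache.hadoop.io.{LongWritable, Text}
        import org.apache.hadoop.mapred.TextInputFormat

        val lines = sc
          .hadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///data/input")
          // Hadoop reuses the same Writable instance across records, so
          // materialize an immutable copy of each value before caching
          .map { case (_, text) => text.toString }
          .cache()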
  2. Fix bug on read-side of external sort when using Snappy.

    This case wasn't handled correctly and this patch fixes it.
    pwendell committed Jan 24, 2014
    0213b40
  3. Response to Matei's review

    pwendell committed Jan 24, 2014
    c58d4ea
  4. f830684
  5. 268ecbd
  6. Merge pull request #501 from JoshRosen/cartesian-rdd-fixes

    Fix two bugs in PySpark cartesian(): SPARK-978 and SPARK-1034
    
    This pull request fixes two bugs in PySpark's `cartesian()` method:
    
    - [SPARK-978](https://spark-project.atlassian.net/browse/SPARK-978): PySpark's cartesian method throws ClassCastException exception
    - [SPARK-1034](https://spark-project.atlassian.net/browse/SPARK-1034): Py4JException on PySpark Cartesian Result
    
    The JIRAs have more details describing the fixes.
    pwendell committed Jan 24, 2014
    cad3002
  7. Merge pull request #502 from pwendell/clone-1

    Remove Hadoop object cloning and warn users making Hadoop RDD's.
    
    The code introduced in #359 used Hadoop's WritableUtils.clone() to
    duplicate objects when reading from Hadoop files. Some users have
    reported exceptions when cloning data in various file formats,
    including Avro and another custom format.
    
    This patch removes that functionality to ensure stability for the
    0.9 release. Instead, it puts a clear warning in the documentation
    that copying may be necessary for Hadoop data sets.
    pwendell committed Jan 24, 2014
    c319617
  8. Minor fix

    pwendell committed Jan 24, 2014
    ff44732
  9. Merge pull request #503 from pwendell/master

    Fix bug on read-side of external sort when using Snappy.
    
    This case wasn't handled correctly and this patch fixes it.
    pwendell committed Jan 24, 2014
    3d6e754
  10. Deprecate mapPartitionsWithSplit in PySpark.

    Also, replace the last reference to it in the docs.
    
    This fixes SPARK-1026.
    JoshRosen committed Jan 24, 2014
    4cebb79
  11. Merge pull request #505 from JoshRosen/SPARK-1026

    Deprecate mapPartitionsWithSplit in PySpark (SPARK-1026)
    
    This commit deprecates `mapPartitionsWithSplit` in PySpark (see [SPARK-1026](https://spark-project.atlassian.net/browse/SPARK-1026)) and removes the remaining references to it from the docs.
    pwendell committed Jan 24, 2014
    05be704

Commits on Jan 26, 2014

  1. Increase JUnit test verbosity under SBT.

    Upgrade junit-interface plugin from 0.9 to 0.10.
    
    I noticed that the JavaAPISuite tests didn't
    appear to display any output locally or under
    Jenkins, making it difficult to know whether they
    were running.  This change increases the verbosity
    to more closely match the ScalaTest tests.
    JoshRosen committed Jan 26, 2014
    531d9d7
  2. Fix ClassCastException in JavaPairRDD.collectAsMap() (SPARK-1040)

    This fixes an issue where collectAsMap() could
    fail when called on a JavaPairRDD that was derived
    by transforming a non-JavaPairRDD.
    
    The root problem was that we were creating the
    JavaPairRDD's ClassTag by casting a
    ClassTag[AnyRef] to a ClassTag[Tuple2[K2, V2]].
    To fix this, I cast a ClassTag[Tuple2[_, _]]
    instead, since this actually produces a ClassTag
    of the appropriate type because ClassTags don't
    capture type parameters:
    
    scala> implicitly[ClassTag[Tuple2[_, _]]] == implicitly[ClassTag[Tuple2[Int, Int]]]
    res8: Boolean = true
    
    scala> implicitly[ClassTag[AnyRef]].asInstanceOf[ClassTag[Tuple2[Int, Int]]] == implicitly[ClassTag[Tuple2[Int, Int]]]
    res9: Boolean = false
    JoshRosen committed Jan 26, 2014
    740e865
  3. Merge pull request #511 from JoshRosen/SPARK-1040

    Fix ClassCastException in JavaPairRDD.collectAsMap() (SPARK-1040)
    
    This fixes [SPARK-1040](https://spark-project.atlassian.net/browse/SPARK-1040), an issue where JavaPairRDD.collectAsMap() could sometimes fail with ClassCastException.  I applied the same fix to the Spark Streaming Java APIs.  The commit message describes the fix in more detail.
    
    I also increased the verbosity of JUnit test output under SBT to make it easier to verify that the Java tests are actually running.
    rxin committed Jan 26, 2014
    c66a2ef
  4. Merge pull request #504 from JoshRosen/SPARK-1025

    Fix PySpark hang when input files are deleted (SPARK-1025)
    
    This pull request addresses [SPARK-1025](https://spark-project.atlassian.net/browse/SPARK-1025), an issue where PySpark could hang if its input files were deleted.
    rxin committed Jan 26, 2014
    c40619d

Commits on Jan 27, 2014

  1. 6a5af7b
  2. Merge pull request #460 from srowen/RandomInitialALSVectors

    Choose initial user/item vectors uniformly on the unit sphere
    
    ...rather than within the unit square to possibly avoid bias in the initial state and improve convergence.
    
    The current implementation picks the N vector elements uniformly at random from [0,1). This means they all point into one quadrant of the vector space. As N gets just a little large, the vectors tend strongly to point into the "corner", towards (1,1,1,...,1). The vectors are not unit vectors either.
    
    I suggest choosing the elements as Gaussian ~ N(0,1) and normalizing. This gets you uniform random choices on the unit sphere, which is closer to what's of interest here. It has worked a little better for me in the past.
    
    This is pretty minor, but I wanted to warm up by suggesting a few tweaks to ALS.
    Please excuse my Scala; I'm pretty new to it.
    
    Author: Sean Owen <[email protected]>
    
    == Merge branch commits ==
    
    commit 492b13a
    Author: Sean Owen <[email protected]>
    Date:   Mon Jan 27 08:05:25 2014 +0000
    
        Style: spaces around binary operators
    
    commit ce2b5b5
    Author: Sean Owen <[email protected]>
    Date:   Sun Jan 19 22:50:03 2014 +0000
    
        Generate factors with all positive components, per discussion in https://github.com/apache/incubator-spark/pull/460
    
    commit b6f7a8a
    Author: Sean Owen <[email protected]>
    Date:   Sat Jan 18 15:54:42 2014 +0000
    
        Choose initial user/item vectors uniformly on the unit sphere rather than within the unit square to possibly avoid bias in the initial state and improve convergence
    srowen authored and pwendell committed Jan 27, 2014
    f67ce3e
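
    The proposed initialization, sketched in isolation (a minimal example of the technique, not the patch itself):

        import java.util.Random

        // Draw each component from N(0, 1), then normalize: the resulting
        // vectors are uniformly distributed on the unit sphere, avoiding the
        // bias toward the (1, 1, ..., 1) corner described above
        def randomUnitVector(n: Int, rand: Random): Array[Double] = {
          val v = Array.fill(n)(rand.nextGaussian())
          val norm = math.sqrt(v.map(x => x * x).sum)
          v.map(_ / norm)
        }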
  3. Merge pull request #490 from hsaputra/modify_checkoption_with_isdefined

    Replace the check for None Option with isDefined and isEmpty in Scala code
    
    Propose to replace the Scala check for Option "!= None" with Option.isDefined and "=== None" with Option.isEmpty.
    
    I think using a method call where possible, rather than an operator plus argument, will make the Scala code easier to read and understand.
    
    Passes compile and tests.
    rxin committed Jan 27, 2014
    f16c21e

Commits on Jan 28, 2014

  1. Merge pull request #516 from sarutak/master

    modified SparkPluginBuild.scala to use https protocol for accessing gith...
    
    We cannot build Spark behind a proxy, even when we execute sbt with the -Dhttp(s).proxyHost -Dhttp(s).proxyPort -Dhttp(s).proxyUser -Dhttp(s).proxyPassword options.
    That's because the git protocol is used to clone junit_xml_listener.git.
    I could build after modifying SparkPluginBuild.scala.
    
    I reported this issue to JIRA.
    https://spark-project.atlassian.net/browse/SPARK-1046
    rxin committed Jan 28, 2014
    3d5c03e
  2. Merge pull request #466 from liyinan926/file-overwrite-new

    Allow files added through SparkContext.addFile() to be overwritten
    
    This is useful for the cases when a file needs to be refreshed and downloaded by the executors periodically. For example, a possible use case is: the driver periodically renews a Hadoop delegation token and writes it to a token file. The token file needs to be downloaded by the executors whenever it gets renewed. However, the current implementation throws an exception when the target file exists and its contents do not match those of the new source. This PR adds an option to allow files to be overwritten to support use cases similar to the above.
    rxin committed Jan 28, 2014
    84670f2
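
    A sketch of how an application might opt into the new behavior (the property name is an assumption based on the documented `spark.files.overwrite` setting, and the path is illustrative):

        import org.apache.spark.{SparkConf, SparkContext}

        // Allow executors to re-download a file added via addFile() when its
        // contents change, instead of failing on the mismatch
        val conf = new SparkConf()
          .setMaster("local[*]")
          .setAppName("file-overwrite-example")
          .set("spark.files.overwrite", "true")
        val sc = new SparkContext(conf)
        sc.addFile("/path/to/token.file")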

Commits on Jan 29, 2014

  1. Switch from MUTF8 to UTF8 in PySpark serializers.

    This fixes SPARK-1043, a bug introduced in 0.9.0
    where PySpark couldn't serialize strings > 64kB.
    
    This fix was written by @tyro89 and @bouk in #512.
    This commit squashes and rebases their pull request
    in order to fix some merge conflicts.
    JoshRosen committed Jan 29, 2014
    1381fc7
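
    The 64kB ceiling comes from java.io.DataOutputStream.writeUTF, which encodes strings as "modified UTF-8" with an unsigned 2-byte length prefix. A sketch of the difference (illustrative, not the serializer code itself):

        import java.io.{ByteArrayOutputStream, DataOutputStream}

        val out = new DataOutputStream(new ByteArrayOutputStream())
        val big = "x" * 70000

        // MUTF-8: writeUTF throws UTFDataFormatException for any string
        // whose encoding exceeds 65535 bytes
        // out.writeUTF(big)

        // Plain UTF-8: write a 4-byte length followed by the raw bytes,
        // which carries no such limit
        val bytes = big.getBytes("utf-8")
        out.writeInt(bytes.length)
        out.write(bytes)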
  2. Merge pull request #523 from JoshRosen/SPARK-1043

    Switch from MUTF8 to UTF8 in PySpark serializers.
    
    This fixes SPARK-1043, a bug introduced in 0.9.0 where PySpark couldn't serialize strings > 64kB.
    
    This fix was written by @tyro89 and @bouk in #512. This commit squashes and rebases their pull request in order to fix some merge conflicts.
    JoshRosen committed Jan 29, 2014
    f8c742c
  3. Merge pull request #497 from tdas/docs-update

    Updated Spark Streaming Programming Guide
    
    Here is the updated version of the Spark Streaming Programming Guide. This is still a work in progress, but the major changes are in place. So feedback is most welcome.
    
    In general, I have tried to make the guide easier to understand even if the reader does not know much about Spark. The updated website is hosted here -
    
    http://www.eecs.berkeley.edu/~tdas/spark_docs/streaming-programming-guide.html
    
    The major changes are:
    - Overview illustrates the use cases of Spark Streaming - various input sources and various output sources
    - An example right after the overview to quickly give an idea of what a Spark Streaming program looks like
    - Made Java API and examples a first class citizen like Scala by using tabs to show both Scala and Java examples (similar to AMPCamp tutorial's code tabs)
    - Highlighted the DStream operations updateStateByKey and transform because of their powerful nature
    - Updated driver node failure recovery text to highlight automatic recovery in Spark standalone mode
    - Added information about linking and using the external input sources like Kafka and Flume
    - In general, reorganized the sections to better show the Basic section and the more advanced sections like Tuning and Recovery.
    
    Todos:
    - Links to the docs of external Kafka, Flume, etc
    - Illustrate window operation with figure as well as example.
    
    Author: Tathagata Das <[email protected]>
    
    == Merge branch commits ==
    
    commit 18ff105
    Author: Tathagata Das <[email protected]>
    Date:   Tue Jan 28 21:49:30 2014 -0800
    
        Fixed a lot of broken links.
    
    commit 34a5a60
    Author: Tathagata Das <[email protected]>
    Date:   Tue Jan 28 18:02:28 2014 -0800
    
        Updated github url to use SPARK_GITHUB_URL variable.
    
    commit f338a60
    Author: Tathagata Das <[email protected]>
    Date:   Mon Jan 27 22:42:42 2014 -0800
    
        More updates based on Patrick and Harvey's comments.
    
    commit 89a81ff
    Author: Tathagata Das <[email protected]>
    Date:   Mon Jan 27 13:08:34 2014 -0800
    
        Updated docs based on Patrick's PR comments.
    
    commit d5b6196
    Author: Tathagata Das <[email protected]>
    Date:   Sun Jan 26 20:15:58 2014 -0800
    
        Added spark.streaming.unpersist config and info on StreamingListener interface.
    
    commit e3dcb46
    Author: Tathagata Das <[email protected]>
    Date:   Sun Jan 26 18:41:12 2014 -0800
    
        Fixed docs on StreamingContext.getOrCreate.
    
    commit 6c29524
    Author: Tathagata Das <[email protected]>
    Date:   Thu Jan 23 18:49:39 2014 -0800
    
        Added example and figure for window operations, and links to Kafka and Flume API docs.
    
    commit f06b964
    Author: Tathagata Das <[email protected]>
    Date:   Wed Jan 22 22:49:12 2014 -0800
    
        Fixed missing endhighlight tag in the MLlib guide.
    
    commit 036a7d4
    Merge: eab351d a1cd185
    Author: Tathagata Das <[email protected]>
    Date:   Wed Jan 22 22:17:42 2014 -0800
    
        Merge remote-tracking branch 'apache/master' into docs-update
    
    commit eab351d
    Author: Tathagata Das <[email protected]>
    Date:   Wed Jan 22 22:17:15 2014 -0800
    
        Update Spark Streaming Programming Guide.
    tdas authored and pwendell committed Jan 29, 2014
    7930209
  4. Merge pull request #494 from tyro89/worker_registration_issue

    Issue with failed worker registrations
    
    I've been going through the spark source after having some odd issues with workers dying and not coming back. After some digging (I'm very new to scala and spark) I believe I've found a worker registration issue. It looks to me like a failed registration follows the same code path as a successful registration, which ends up with workers believing they are connected (since they received a `RegisteredWorker` event) even though they are not registered on the Master.
    
    This is a quick fix that I hope addresses this issue (assuming I didn't completely misread the code and I'm about to look like a silly person :P)
    
    I'm opening this pr now to start a chat with you guys while I do some more testing on my side :)
    
    Author: Erik Selin <[email protected]>
    
    == Merge branch commits ==
    
    commit 973012f
    Author: Erik Selin <[email protected]>
    Date:   Tue Jan 28 23:36:12 2014 -0500
    
        break logwarning into two lines to respect line character limit.
    
    commit e3754dc
    Author: Erik Selin <[email protected]>
    Date:   Tue Jan 28 21:16:21 2014 -0500
    
        add log warning when worker registration fails due to attempt to re-register on same address.
    
    commit 14baca2
    Author: Erik Selin <[email protected]>
    Date:   Wed Jan 22 21:23:26 2014 -0500
    
        address code style comment
    
    commit 71c0d7e
    Author: Erik Selin <[email protected]>
    Date:   Wed Jan 22 16:01:42 2014 -0500
    
        Make a failed registration not persist, not send a `RegisteredWorker` event and not run `schedule` but rather send a `RegisterWorkerFailed` message to the worker attempting to register.
    Erik Selin authored and pwendell committed Jan 29, 2014
    0ff38c2

Commits on Jan 30, 2014

  1. Merge pull request #524 from rxin/doc

    Added spark.shuffle.file.buffer.kb to configuration doc.
    
    Author: Reynold Xin <[email protected]>
    
    == Merge branch commits ==
    
    commit 0eea1d7
    Author: Reynold Xin <[email protected]>
    Date:   Wed Jan 29 14:40:48 2014 -0800
    
        Added spark.shuffle.file.buffer.kb to configuration doc.
    rxin committed Jan 30, 2014
    ac712e4

Commits on Feb 1, 2014

  1. Merge pull request #527 from ankurdave/graphx-assembly-pom

    Add GraphX to assembly/pom.xml
    
    Author: Ankur Dave <[email protected]>
    
    == Merge branch commits ==
    
    commit bb0b33e
    Author: Ankur Dave <[email protected]>
    Date:   Fri Jan 31 15:24:52 2014 -0800
    
        Add GraphX to assembly/pom.xml
    ankurdave authored and pwendell committed Feb 1, 2014
    a8cf3ec

Commits on Feb 3, 2014

  1. Merge pull request #529 from hsaputra/cleanup_right_arrowop_scala

    Change the ⇒ character (maybe from scalariform) to => in Scala code for style consistency
    
    Looks like there are some ⇒ Unicode characters (maybe from scalariform) in Scala code.
    This PR is to change them to => to get some consistency in the Scala code.
    
    If we want to use ⇒ as default we could use sbt plugin scalariform to make sure all Scala code has ⇒ instead of =>
    
    And remove unused imports found in TwitterInputDStream.scala while I was there =)
    
    Author: Henry Saputra <[email protected]>
    
    == Merge branch commits ==
    
    commit 29c1771
    Author: Henry Saputra <[email protected]>
    Date:   Sat Feb 1 22:05:16 2014 -0800
    
        Change the ⇒ character (maybe from scalariform) to => in Scala code for style consistency.
    hsaputra authored and rxin committed Feb 3, 2014
    0386f42
  2. Merge pull request #530 from aarondav/cleanup. Closes #530.

    Remove explicit conversion to PairRDDFunctions in cogroup()
    
    As SparkContext._ is already imported, using the implicit conversion appears to make the code much cleaner. Perhaps there was some sinister reason for doing the conversion explicitly, however.
    
    Author: Aaron Davidson <[email protected]>
    
    == Merge branch commits ==
    
    commit aa4a63f
    Author: Aaron Davidson <[email protected]>
    Date:   Sun Feb 2 23:48:04 2014 -0800
    
        Remove explicit conversion to PairRDDFunctions in cogroup()
    
        As SparkContext._ is already imported, using the implicit conversion
        appears to make the code much cleaner. Perhaps there was some sinister
        reason for doing the conversion explicitly, however.
    aarondav authored and rxin committed Feb 3, 2014
    1625d8c
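
    A minimal sketch of the pattern being cleaned up (simplified; not the cogroup() code itself):

        import org.apache.spark.SparkContext._   // provides the implicit conversion
        import org.apache.spark.rdd.{PairRDDFunctions, RDD}

        def example(rdd: RDD[(String, Int)]): Unit = {
          // Explicit wrapping, the style this change removes:
          val explicit = new PairRDDFunctions(rdd).reduceByKey(_ + _)

          // Implicit conversion via SparkContext._, which reads cleaner:
          val viaImplicit = rdd.reduceByKey(_ + _)
        }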
  3. Merge pull request #528 from mengxr/sample. Closes #528.

     Refactor RDD sampling and add randomSplit to RDD (update)
    
    Replace SampledRDD by PartitionwiseSampledRDD, which accepts a RandomSampler instance as input. The current sample with/without replacement can be easily integrated via BernoulliSampler and PoissonSampler. The benefits are:
    
    1) RDD.randomSplit is implemented in the same way, related to https://github.com/apache/incubator-spark/pull/513
    2) Stratified sampling and importance sampling can be implemented in the same manner as well.
    
    Unit tests are included for samplers and RDD.randomSplit.
    
    This should perform better than my previous request, where the BernoulliSampler creates many Iterator instances:
    https://github.com/apache/incubator-spark/pull/513
    
    Author: Xiangrui Meng <[email protected]>
    
    == Merge branch commits ==
    
    commit e8ce957
    Author: Xiangrui Meng <[email protected]>
    Date:   Mon Feb 3 12:21:08 2014 -0800
    
        more docs to PartitionwiseSampledRDD
    
    commit fbb4586
    Author: Xiangrui Meng <[email protected]>
    Date:   Mon Feb 3 00:44:23 2014 -0800
    
        move XORShiftRandom to util.random and use it in BernoulliSampler
    
    commit 987456b
    Author: Xiangrui Meng <[email protected]>
    Date:   Sat Feb 1 11:06:59 2014 -0800
    
        relax assertions in SortingSuite because the RangePartitioner has large variance in this case
    
    commit 3690aae
    Author: Xiangrui Meng <[email protected]>
    Date:   Sat Feb 1 09:56:28 2014 -0800
    
        test split ratio of RDD.randomSplit
    
    commit 8a410bc
    Author: Xiangrui Meng <[email protected]>
    Date:   Sat Feb 1 09:25:22 2014 -0800
    
        add a test to ensure seed distribution and minor style update
    
    commit ce7e866
    Author: Xiangrui Meng <[email protected]>
    Date:   Fri Jan 31 18:06:22 2014 -0800
    
        minor style change
    
    commit 750912b
    Author: Xiangrui Meng <[email protected]>
    Date:   Fri Jan 31 18:04:54 2014 -0800
    
        fix some long lines
    
    commit c446a25
    Author: Xiangrui Meng <[email protected]>
    Date:   Fri Jan 31 17:59:59 2014 -0800
    
        add complement to BernoulliSampler and minor style changes
    
    commit dbe2bc2
    Author: Xiangrui Meng <[email protected]>
    Date:   Fri Jan 31 17:45:08 2014 -0800
    
        switch to partition-wise sampling for better performance
    
    commit a1fca52
    Merge: ac712e4 cf6128f
    Author: Xiangrui Meng <[email protected]>
    Date:   Fri Jan 31 16:33:09 2014 -0800
    
        Merge branch 'sample' of github.com:mengxr/incubator-spark into sample
    
    commit cf6128f
    Author: Xiangrui Meng <[email protected]>
    Date:   Sun Jan 26 14:40:07 2014 -0800
    
        set SampledRDD deprecated in 1.0
    
    commit f430f84
    Author: Xiangrui Meng <[email protected]>
    Date:   Sun Jan 26 14:38:59 2014 -0800
    
        update code style
    
    commit a8b5e20
    Author: Xiangrui Meng <[email protected]>
    Date:   Sun Jan 26 12:56:27 2014 -0800
    
        move package random to util.random
    
    commit ab0fa2c
    Author: Xiangrui Meng <[email protected]>
    Date:   Sun Jan 26 12:50:35 2014 -0800
    
        add Apache headers and update code style
    
    commit 985609f
    Author: Xiangrui Meng <[email protected]>
    Date:   Sun Jan 26 11:49:25 2014 -0800
    
        add new lines
    
    commit b21bddf
    Author: Xiangrui Meng <[email protected]>
    Date:   Sun Jan 26 11:46:35 2014 -0800
    
        move samplers to random.IndependentRandomSampler and add tests
    
    commit c02dacb
    Author: Xiangrui Meng <[email protected]>
    Date:   Sat Jan 25 15:20:24 2014 -0800
    
        add RandomSampler
    
    commit 8ff7ba3
    Author: Xiangrui Meng <[email protected]>
    Date:   Fri Jan 24 13:23:22 2014 -0800
    
        init impl of IndependentlySampledRDD
    mengxr authored and rxin committed Feb 3, 2014
    23af00f
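
    Typical usage of the new randomSplit API (a sketch; the RDD and weights are illustrative):

        // Split an RDD into 70% / 30% pieces; weights are normalized, and a
        // fixed seed makes the split reproducible
        val data = sc.parallelize(1 to 1000)
        val Array(training, test) = data.randomSplit(Array(0.7, 0.3), seed = 42L)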

Commits on Feb 4, 2014

  1. Merge pull request #535 from sslavic/patch-2. Closes #535.

    Fixed typo in scaladoc
    
    Author: Stevo Slavić <[email protected]>
    
    == Merge branch commits ==
    
    commit 0a77f78
    Author: Stevo Slavić <[email protected]>
    Date:   Tue Feb 4 15:30:27 2014 +0100
    
        Fixed typo in scaladoc
    sslavic authored and rxin committed Feb 4, 2014
    0c05cd3
  2. Merge pull request #534 from sslavic/patch-1. Closes #534.

    Fixed wrong path to compute-classpath.cmd
    
    compute-classpath.cmd is in bin, not in sbin directory
    
    Author: Stevo Slavić <[email protected]>
    
    == Merge branch commits ==
    
    commit 23deca3
    Author: Stevo Slavić <[email protected]>
    Date:   Tue Feb 4 15:01:47 2014 +0100
    
        Fixed wrong path to compute-classpath.cmd
    
        compute-classpath.cmd is in bin, not in sbin directory
    sslavic authored and rxin committed Feb 4, 2014
    9209287

Commits on Feb 5, 2014

  1. Merge pull request #540 from sslavic/patch-3. Closes #540.

    Fix line end character stripping for Windows
    
    The LogQuery Spark example would produce an unwanted result when run on the Windows platform because of different, platform-specific trailing line-end characters (not only \n but \r too).
    
    This fix makes use of Scala's standard library string functions to properly strip all trailing line end characters, letting Scala handle the platform specific stuff.
    
    Author: Stevo Slavić <[email protected]>
    
    == Merge branch commits ==
    
    commit 1e43ba0
    Author: Stevo Slavić <[email protected]>
    Date:   Wed Feb 5 14:48:29 2014 +0100
    
        Fix line end character stripping for Windows
    
        LogQuery Spark example would produce unwanted result when run on Windows platform because of different, platform specific trailing line end characters (not only \n but \r too).
    
        This fix makes use of Scala's standard library string functions to properly strip all trailing line end characters, letting Scala handle the platform specific stuff.
    sslavic authored and rxin committed Feb 5, 2014
    f7fd80d
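
    The relevant standard-library call, sketched on its own (the sample string is illustrative):

        // stripLineEnd removes a trailing \n, \r, or \r\n, so the same code
        // behaves identically on Unix and Windows input
        val line = "GET /index.html HTTP/1.1\r\n"
        val cleaned = line.stripLineEnd   // "GET /index.html HTTP/1.1"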
  2. Merge pull request #544 from kayousterhout/fix_test_warnings. Closes #544.
    
    Fixed warnings in test compilation.
    
    This commit fixes two problems: a redundant import, and a
    deprecated function.
    
    Author: Kay Ousterhout <[email protected]>
    
    == Merge branch commits ==
    
    commit da9d2e1
    Author: Kay Ousterhout <[email protected]>
    Date:   Wed Feb 5 11:41:51 2014 -0800
    
        Fixed warnings in test compilation.
    
        This commit fixes two problems: a redundant import, and a
        deprecated function.
    kayousterhout authored and rxin committed Feb 5, 2014
    cc14ba9

Commits on Feb 6, 2014

  1. Merge pull request #549 from CodingCat/deadcode_master. Closes #549.

    remove actorToWorker in master.scala, which is actually not used
    
    actorToWorker is actually not used in the code, so just remove it
    
    Author: CodingCat <[email protected]>
    
    == Merge branch commits ==
    
    commit 52656c2
    Author: CodingCat <[email protected]>
    Date:   Thu Feb 6 00:28:26 2014 -0500
    
        remove actorToWorker in master.scala, which is actually not used
    CodingCat authored and rxin committed Feb 6, 2014
    18c4ee7
  2. Merge pull request #526 from tgravescs/yarn_client_stop_am_fix. Closes #526.
    
    spark on yarn - yarn-client mode doesn't always exit immediately
    
    https://spark-project.atlassian.net/browse/SPARK-1049
    
    If you run in yarn-client mode but don't get all the workers you requested right away, and then you exit your application, the application master stays around until it gets the number of workers you initially requested. This is a waste of resources. The AM should exit immediately upon the client going away.
    
    This fix simply checks to see if the driver closed while it's waiting for the initial number of workers.
    
    Author: Thomas Graves <[email protected]>
    
    == Merge branch commits ==
    
    commit 03f40a6
    Author: Thomas Graves <[email protected]>
    Date:   Fri Jan 31 11:23:10 2014 -0600
    
        spark on yarn - yarn-client mode doesn't always exit immediately
    tgravescs authored and rxin committed Feb 6, 2014
    3802096
  3. Merge pull request #545 from kayousterhout/fix_progress. Closes #545.

    Fix off-by-one error with task progress info log.
    
    Author: Kay Ousterhout <[email protected]>
    
    == Merge branch commits ==
    
    commit 29798fc
    Author: Kay Ousterhout <[email protected]>
    Date:   Wed Feb 5 13:40:01 2014 -0800
    
        Fix off-by-one error with task progress info log.
    kayousterhout authored and rxin committed Feb 6, 2014
    79c9552
  4. Merge pull request #498 from ScrapCodes/python-api. Closes #498.

    Python api additions
    
    Author: Prashant Sharma <[email protected]>
    
    == Merge branch commits ==
    
    commit 8b51591
    Author: Prashant Sharma <[email protected]>
    Date:   Fri Jan 24 11:50:29 2014 +0530
    
        Josh's and Patrick's review comments.
    
    commit d37f967
    Author: Prashant Sharma <[email protected]>
    Date:   Thu Jan 23 17:27:17 2014 +0530
    
        fixed doc tests
    
    commit 27cb54b
    Author: Prashant Sharma <[email protected]>
    Date:   Thu Jan 23 16:48:43 2014 +0530
    
        Added keys and values methods for PairFunctions in python
    
    commit 4ce76b3
    Author: Prashant Sharma <[email protected]>
    Date:   Thu Jan 23 13:51:26 2014 +0530
    
        Added foreachPartition
    
    commit 05f0534
    Author: Prashant Sharma <[email protected]>
    Date:   Thu Jan 23 13:02:59 2014 +0530
    
        Added coalesce function to python API
    
    commit 6568d2c
    Author: Prashant Sharma <[email protected]>
    Date:   Thu Jan 23 12:52:44 2014 +0530
    
        added repartition function to python API.
    ScrapCodes authored and JoshRosen committed Feb 6, 2014
    084839b
  5. Merge pull request #554 from sryza/sandy-spark-1056. Closes #554.

    SPARK-1056. Fix header comment in Executor to not imply that it's only used for Mesos and Standalone.
    
    Author: Sandy Ryza <[email protected]>
    
    == Merge branch commits ==
    
    commit 1f2443d
    Author: Sandy Ryza <[email protected]>
    Date:   Thu Feb 6 15:03:50 2014 -0800
    
        SPARK-1056. Fix header comment in Executor to not imply that it's only used for Mesos and Standalone
    sryza authored and rxin committed Feb 6, 2014
    446403b

Commits on Feb 7, 2014

  1. Merge pull request #321 from kayousterhout/ui_kill_fix. Closes #321.

    Inform DAG scheduler about all started/finished tasks.
    
    Previously, the DAG scheduler was not always informed
    when tasks started and finished. The simplest example here
    is for speculated tasks: the DAGScheduler was only told about
    the first attempt of a task, meaning that SparkListeners were
    also not told about multiple task attempts, so users can't see
    what's going on with speculation in the UI.  The DAGScheduler
    also wasn't always told about finished tasks, so in the UI, some
    tasks will never be shown as finished (this occurs, for example,
    if a task set gets killed).
    
    The other problem is that the fairness accounting was wrong
    -- the number of running tasks in a pool was decreased when a
    task set was considered done, even if all of its tasks hadn't
    yet finished.
    
    Author: Kay Ousterhout <[email protected]>
    
    == Merge branch commits ==
    
    commit c8d547d
    Author: Kay Ousterhout <[email protected]>
    Date:   Wed Jan 15 16:47:33 2014 -0800
    
        Addressed Reynold's review comments.
    
        Always use a TaskEndReason (remove the option), and explicitly
        signal when we don't know the reason. Also, always tell
        DAGScheduler (and associated listeners) about started tasks, even
        when they're speculated.
    
    commit 3fee1e2
    Author: Kay Ousterhout <[email protected]>
    Date:   Wed Jan 8 22:58:13 2014 -0800
    
        Fixed broken test and improved logging
    
    commit ff12fca
    Author: Kay Ousterhout <[email protected]>
    Date:   Sun Dec 29 21:08:20 2013 -0800
    
        Inform DAG scheduler about all finished tasks.
    
        Previously, the DAG scheduler was not always informed
        when tasks finished. For example, when a task set was
        aborted, the DAG scheduler was never told when the tasks
        in that task set finished. The DAG scheduler was also
        never told about the completion of speculated tasks.
        This led to confusion with SparkListeners because information
        about the completion of those tasks was never passed on to
        the listeners (so in the UI, for example, some tasks will never
        be shown as finished).
    
        The other problem is that the fairness accounting was wrong
        -- the number of running tasks in a pool was decreased when a
        task set was considered done, even if all of its tasks hadn't
        yet finished.
    kayousterhout authored and pwendell committed Feb 7, 2014
    18ad59e
  2. Merge pull request #450 from kayousterhout/fetch_failures. Closes #450.

    Only run ResubmitFailedStages event after a fetch fails
    
    Previously, the ResubmitFailedStages event was called every
    200 milliseconds, leading to a lot of unnecessary event processing
    and clogged DAGScheduler logs.
    
    Author: Kay Ousterhout <[email protected]>
    
    == Merge branch commits ==
    
    commit e603784
    Author: Kay Ousterhout <[email protected]>
    Date:   Wed Feb 5 11:34:41 2014 -0800
    
        Re-add check for empty set of failed stages
    
    commit d258f0e
    Author: Kay Ousterhout <[email protected]>
    Date:   Wed Jan 15 23:35:41 2014 -0800
    
        Only run ResubmitFailedStages event after a fetch fails
    
        Previously, the ResubmitFailedStages event was called every
        200 milliseconds, leading to a lot of unnecessary event processing
        and clogged DAGScheduler logs.
    kayousterhout authored and pwendell committed Feb 7, 2014
    0b448df
  3. Merge pull request #533 from andrewor14/master. Closes #533.

    External spilling - generalize batching logic
    
    The existing implementation consists of a hack for Kryo specifically and only works for LZF compression. Introducing an intermediate batch-level stream takes care of pre-fetching and other arbitrary behavior of higher level streams in a more general way.
    
    Author: Andrew Or <[email protected]>
    
    == Merge branch commits ==
    
    commit 3ddeb7e
    Author: Andrew Or <[email protected]>
    Date:   Wed Feb 5 12:09:32 2014 -0800
    
        Also privatize fields
    
    commit 090544a
    Author: Andrew Or <[email protected]>
    Date:   Wed Feb 5 10:58:23 2014 -0800
    
        Privatize methods
    
    commit 13920c9
    Author: Andrew Or <[email protected]>
    Date:   Tue Feb 4 16:34:15 2014 -0800
    
        Update docs
    
    commit bd5a1d7
    Author: Andrew Or <[email protected]>
    Date:   Tue Feb 4 13:44:24 2014 -0800
    
        Typo: phyiscal -> physical
    
    commit 287ef44
    Author: Andrew Or <[email protected]>
    Date:   Tue Feb 4 13:38:32 2014 -0800
    
        Avoid reading the entire batch into memory; also simplify streaming logic
    
        Additionally, address formatting comments.
    
    commit 3df7005
    Merge: a531d2e 164489d
    Author: Andrew Or <[email protected]>
    Date:   Mon Feb 3 18:27:49 2014 -0800
    
        Merge branch 'master' of github.com:andrewor14/incubator-spark
    
    commit a531d2e
    Author: Andrew Or <[email protected]>
    Date:   Mon Feb 3 18:18:04 2014 -0800
    
        Relax assumptions on compressors and serializers when batching
    
        This commit introduces an intermediate layer of an input stream on the batch level.
        This guards against interference from higher level streams (i.e. compression and
        deserialization streams), especially pre-fetching, without specifically targeting
        particular libraries (Kryo) and forcing shuffle spill compression to use LZF.
    
    commit 164489d
    Author: Andrew Or <[email protected]>
    Date:   Mon Feb 3 18:18:04 2014 -0800
    
        Relax assumptions on compressors and serializers when batching
    
        This commit introduces an intermediate layer of an input stream on the batch level.
        This guards against interference from higher level streams (i.e. compression and
        deserialization streams), especially pre-fetching, without specifically targeting
        particular libraries (Kryo) and forcing shuffle spill compression to use LZF.
    andrewor14 authored and pwendell committed Feb 7, 2014
    1896c6e
  4. Merge pull request #506 from ash211/intersection. Closes #506.

    SPARK-1062 Add rdd.intersection(otherRdd) method
    
    Author: Andrew Ash <[email protected]>
    
    == Merge branch commits ==
    
    commit 5d9982b
    Author: Andrew Ash <[email protected]>
    Date:   Thu Feb 6 18:11:45 2014 -0800
    
        Minor fixes
    
        - style: (v,null) => (v, null)
        - mention the shuffle in Javadoc
    
    commit b86d02f
    Author: Andrew Ash <[email protected]>
    Date:   Sun Feb 2 13:17:40 2014 -0800
    
        Overload .intersection() for numPartitions and custom Partitioner
    
    commit bcaa349
    Author: Andrew Ash <[email protected]>
    Date:   Sun Feb 2 13:05:40 2014 -0800
    
        Better naming of parameters in intersection's filter
    
    commit b10a6af
    Author: Andrew Ash <[email protected]>
    Date:   Sat Jan 25 23:06:26 2014 -0800
    
        Follow spark code format conventions of tab => 2 spaces
    
    commit 965256e
    Author: Andrew Ash <[email protected]>
    Date:   Fri Jan 24 00:28:01 2014 -0800
    
        Add rdd.intersection(otherRdd) method
    ash211 authored and pwendell committed Feb 7, 2014
    3a9d82c
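
    Usage of the new method (a sketch; the sample data is illustrative):

        val a = sc.parallelize(Seq(1, 2, 3, 4, 4))
        val b = sc.parallelize(Seq(3, 4, 5))

        // intersection() de-duplicates its output and performs a shuffle,
        // as the Javadoc note added here points out
        val common = a.intersection(b)   // contains 3 and 4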

Commits on Feb 8, 2014

  1. Merge pull request #552 from martinjaggi/master. Closes #552.

    tex formulas in the documentation
    
    using MathJax,
    and splitting the MLlib documentation by techniques
    
    see jira
    https://spark-project.atlassian.net/browse/MLLIB-19
    and
    https://github.com/shivaram/spark/compare/mathjax
    
    Author: Martin Jaggi <[email protected]>
    
    == Merge branch commits ==
    
    commit 0364bfa
    Author: Martin Jaggi <[email protected]>
    Date:   Fri Feb 7 03:19:38 2014 +0100
    
        minor polishing, as suggested by @pwendell
    
    commit dcd2142
    Author: Martin Jaggi <[email protected]>
    Date:   Thu Feb 6 18:04:26 2014 +0100
    
        enabling inline latex formulas with $.$
    
        same mathjax configuration as used in math.stackexchange.com
    
        sample usage in the linear algebra (SVD) documentation
    
    commit bbafafd
    Author: Martin Jaggi <[email protected]>
    Date:   Thu Feb 6 17:31:29 2014 +0100
    
        split MLlib documentation by techniques
    
        and linked from the main mllib-guide.md site
    
    commit d1c5212
    Author: Martin Jaggi <[email protected]>
    Date:   Thu Feb 6 16:59:43 2014 +0100
    
        enable mathjax formula in the .md documentation files
    
        code by @shivaram
    
    commit d73948d
    Author: Martin Jaggi <[email protected]>
    Date:   Thu Feb 6 16:57:23 2014 +0100
    
        minor update on how to compile the documentation
    martinjaggi authored and pwendell committed Feb 8, 2014
    fabf174
  2. Merge pull request #454 from jey/atomic-sbt-download. Closes #454.

    Make sbt download an atomic operation
    
    Modifies the `sbt/sbt` script to gracefully recover when a previous invocation died in the middle of downloading the SBT jar.
    
    Author: Jey Kottalam <[email protected]>
    
    == Merge branch commits ==
    
    commit 6c600eb
    Author: Jey Kottalam <[email protected]>
    Date:   Fri Jan 17 10:43:54 2014 -0800
    
        Make sbt download an atomic operation
    jey authored and pwendell committed Feb 8, 2014
    7805080
  3. Merge pull request #561 from Qiuzhuang/master. Closes #561.

    Kill drivers in postStop() for Worker.
    
    JIRA SPARK-1068: https://spark-project.atlassian.net/browse/SPARK-1068
    
    Author: Qiuzhuang Lian <[email protected]>
    
    == Merge branch commits ==
    
    commit 9c19ce6
    Author: Qiuzhuang Lian <[email protected]>
    Date:   Sat Feb 8 16:07:39 2014 +0800
    
        Kill drivers in postStop() for Worker.
         JIRA SPARK-1068:https://spark-project.atlassian.net/browse/SPARK-1068
    Qiuzhuang authored and pwendell committed Feb 8, 2014
    f0ce736

Commits on Feb 9, 2014

  1. Merge pull request #542 from markhamstra/versionBump. Closes #542.

    Version number to 1.0.0-SNAPSHOT
    
    Since 0.9.0-incubating is done and out the door, we shouldn't be building 0.9.0-incubating-SNAPSHOT anymore.
    
    @pwendell
    
    Author: Mark Hamstra <[email protected]>
    
    == Merge branch commits ==
    
    commit 1b00a8a
    Author: Mark Hamstra <[email protected]>
    Date:   Wed Feb 5 09:30:32 2014 -0800
    
        Version number to 1.0.0-SNAPSHOT
    markhamstra authored and pwendell committed Feb 9, 2014
    c2341c9
  2. Merge pull request #565 from pwendell/dev-scripts. Closes #565.

    SPARK-1066: Add developer scripts to repository.
    
    These are some developer scripts I've been maintaining in a separate public repo. This patch adds them to the Spark repository so they can evolve here and are clearly accessible to all committers.
    
    I may do some small additional clean-up in this PR, but wanted to put them here in case others want to review. There are a few types of scripts here:
    
    1. A tool to merge pull requests.
    2. A script for packaging releases.
    3. A script for auditing release candidates.
    
    Author: Patrick Wendell <[email protected]>
    
    == Merge branch commits ==
    
    commit 5d5d331
    Author: Patrick Wendell <[email protected]>
    Date:   Sat Feb 8 22:11:47 2014 -0800
    
        SPARK-1066: Add developer scripts to repository.
    pwendell authored and rxin committed Feb 9, 2014
    f892da8
  3. Merge pull request #560 from pwendell/logging. Closes #560.

    [WIP] SPARK-1067: Default log4j initialization causes errors for those not using log4j
    
    To fix this - we add a check when initializing log4j.
    
    Author: Patrick Wendell <[email protected]>
    
    == Merge branch commits ==
    
    commit ffdce51
    Author: Patrick Wendell <[email protected]>
    Date:   Fri Feb 7 15:22:29 2014 -0800
    
        Logging fix
    pwendell authored and rxin committed Feb 9, 2014
    b6d40b7
  4. Merge pull request #562 from jyotiska/master. Closes #562.

    Added example Python code for sort
    
    I added example Python code for sort. Right now, PySpark has limited examples for new people wanting to use the project. This example code sorts integers stored in a file. I was able to sort 5 million, 10 million and 25 million integers with this code.
    
    Author: jyotiska <[email protected]>
    
    == Merge branch commits ==
    
    commit 8ad8faf
    Author: jyotiska <[email protected]>
    Date:   Sun Feb 9 11:00:41 2014 +0530
    
        Added comments in code on collect() method
    
    commit 6f98f1e
    Author: jyotiska <[email protected]>
    Date:   Sat Feb 8 13:12:37 2014 +0530
    
        Updated python example code sort.py
    
    commit 945e39a
    Author: jyotiska <[email protected]>
    Date:   Sat Feb 8 12:59:09 2014 +0530
    
        Added example python code for sort
    jyotiska authored and rxin committed Feb 9, 2014
    2ef37c9
  5. Merge pull request #556 from CodingCat/JettyUtil. Closes #556.

    [SPARK-1060] startJettyServer should explicitly use IP information
    
    https://spark-project.atlassian.net/browse/SPARK-1060
    
    In the current implementation, the webserver in Master/Worker is started with
    
    val (srv, bPort) = JettyUtils.startJettyServer("0.0.0.0", port, handlers)
    
    inside startJettyServer:
    
    val server = new Server(currentPort) //here, the Server will take "0.0.0.0" as the hostname, i.e. will always bind to the IP address of the first NIC
    
    this can cause wrong IP binding, e.g. if the host has two NICs, N1 and N2, and the user specifies SPARK_LOCAL_IP as N2's IP address, then, when starting the web server, for the reason stated above, it will always bind to N1's address
    
    Author: CodingCat <[email protected]>
    
    == Merge branch commits ==
    
    commit 6c6d9a8
    Author: CodingCat <[email protected]>
    Date:   Thu Feb 6 14:53:34 2014 -0500
    
        startJettyServer should explicitly use IP information
    CodingCat authored and rxin committed Feb 9, 2014
    b6dba10
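
    A sketch of binding Jetty to an explicit address rather than a bare port (illustrative, not the JettyUtils code itself):

        import java.net.InetSocketAddress
        import org.eclipse.jetty.server.Server

        // new Server(port) takes only a port, so any hostname the caller
        // resolved is ignored; passing an InetSocketAddress binds the
        // listener to the intended NIC
        val host = sys.env.getOrElse("SPARK_LOCAL_IP", "127.0.0.1")
        val server = new Server(new InetSocketAddress(host, 8080))
        server.start()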