0.5.0

@nirandaperera nirandaperera released this 16 Dec 19:29

Cylon 0.5.0 is a major release. We are excited to present GCylon, a cuDF-based distributed
DataFrame for Nvidia GPUs, along with UCX integration, Anaconda support, and much more.

Features

Cylon C++ and Python

  • Adding UCX integration with MPI
  • Adding read distribution
  • Changing join column naming convention to match SQL and pandas
  • Adding DataFrame.applymap, DataFrame.isin
  • Adding iloc operation to DataFrame
  • Adding null handling to table operators and Comparators
  • Adding Equal and distributed Equal operators
  • Adding array flattening
  • Adding Repartition
  • Adding mapreduce style group-by aggregators
  • Adding table level AllGather, Gather and Broadcast operators
  • Performance improvements and bug fixes
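
To illustrate what "mapreduce style" means for a group-by aggregation, here is a minimal pure-Python sketch. It is not the Cylon API: the helper names (local_partial_mean, reduce_partials, finalize_mean, groupby_mean) are hypothetical, and plain lists stand in for the partitions held by each worker. The idea is that each worker first builds a small partial state per key, the partial states are then merged, and the final value is computed only at the end.

```python
from collections import defaultdict

# Hypothetical helper names for illustration; this is not the Cylon API.

def local_partial_mean(rows):
    """Map/combine step: build a per-key [sum, count] state for one partition."""
    partial = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        partial[key][0] += value
        partial[key][1] += 1
    return partial

def reduce_partials(partials):
    """Reduce step: merge the [sum, count] states that share a key."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            merged[key][0] += total
            merged[key][1] += count
    return merged

def finalize_mean(merged):
    """Finalize step: turn each [sum, count] state into a mean."""
    return {key: total / count for key, (total, count) in merged.items()}

def groupby_mean(partitions):
    """Run the three steps over a list of partitions."""
    return finalize_mean(reduce_partials(local_partial_mean(p) for p in partitions))

# Two partitions standing in for the rows held by two workers.
partitions = [
    [("a", 1.0), ("b", 4.0), ("a", 3.0)],
    [("a", 5.0), ("b", 6.0)],
]
result = groupby_mean(partitions)
print(result)
```

In a distributed setting the partial states would typically be shuffled by key between the first and second steps, so that only the compact [sum, count] pairs cross the network rather than the raw rows.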

Build

  • Updating to Arrow 5.0.x
  • Windows build support
  • macOS build support
  • Conda build is the default build
  • Improving docker build

GCylon

First release of GCylon, which supports distributed DataFrame processing on Nvidia GPUs using cuDF:

  • Shuffling and distributed sorting
  • Distributed Join/merge
  • Distributed GroupBy
  • DataFrame Set operations
  • Repartitioning DataFrames
  • Distributed IO for reading/writing CSV, JSON and Parquet files
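
Shuffling underlies the distributed join, group-by, and repartition operations listed above. The following is a minimal sketch of a hash-partitioned shuffle in pure Python, assuming rows are tuples and workers are simulated as plain lists. It does not use MPI, UCX, or cuDF, and the function names (hash_partition, shuffle) are hypothetical rather than GCylon's API.

```python
# Hypothetical helper names for illustration; this is not the GCylon API.

def hash_partition(rows, key_index, num_workers):
    """Split one worker's rows into one bucket per destination worker."""
    buckets = [[] for _ in range(num_workers)]
    for row in rows:
        dest = hash(row[key_index]) % num_workers
        buckets[dest].append(row)
    return buckets

def shuffle(worker_rows, key_index):
    """Simulate an all-to-all exchange: rows with equal keys end up on one worker."""
    num_workers = len(worker_rows)
    received = [[] for _ in range(num_workers)]
    for rows in worker_rows:
        for dest, bucket in enumerate(hash_partition(rows, key_index, num_workers)):
            received[dest].extend(bucket)
    return received

# Two simulated workers, each holding (key, payload) rows.
worker_rows = [
    [(1, "a"), (2, "b"), (3, "c")],
    [(2, "d"), (1, "e"), (4, "f")],
]
shuffled = shuffle(worker_rows, key_index=0)
print(shuffled)
```

After the shuffle, every row with a given key lives on the same worker, so a join or group-by on that key can proceed locally without further communication.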

You can download the source code from GitHub.
Conda binaries are available on Anaconda.

Commits

3344bf9 Mapreduce style group-by aggregators (#535)
50ef890 Remove minor warnings (#544)
559e8eb Adding CPU serializer (#539)
abb4404 fixed unused variable/parameter and casting warnings (#542)
62a3f08 Distributed IO (#533)
15d06d6 Bump color-string from 1.5.4 to 1.7.4 in /docs (#534)
810c4ed fixing RNG issue (#538)
fbb049b fixing build error (#536)
a10e052 Bump algoliasearch-helper from 3.3.3 to 3.6.2 in /docs (#532)
112ea97 Repartition - CPU (#526)
79c4b73 create a MacOS yml file (#530)
b9e7a8c Repartition - GPU (#528)
2191b9f fixed function name change in cudf api from gcylon test files (#529)
3e9036e Upgrading to arrow 5.0.0 (#525)
24d182a Groupby values null handling (#527)
54a5074 Null handling for Comparators (#524)
0b9516e Adding array flattening (#522)
b3fc2a2 Implemented MergeOrSort when merging sorted tables (#523)
1e061b2 Feature/equal (#499)
e378d1d reformatted gcylon codes with tab size 2, non-functional changes (#521)
8450d9b Added support for sliced tables in gather, broadcast and sorting (#520)
92b8124 Update windows.yml
1f9790d Update macos.yml
d33f9ac Update conda-actions.yml
963d491 Update c-cpp.yml
2229981 added mpi datatype dispatching for primitive data types (#519)
d9936b4 Head tail operators (#512)
ac99d00 Formatting code (#518)
fff84cc Code formatting (#517)
f32f04d Null handling in splitters and build arrays (#511)
4cab7ca Delete files from CPP example folder that are not needed (#516)
d174430 moving tutorial repo to (#514)
9cd7911 Python example cleanup (#513)
fe4caf3 Distributed sorting (#510)
2302f58 Minor improvements to the Table API (#508)
71eb80a adding new test utils (#507)
24b83dd Adding to docker docs (#498)
6f2faf8 Update conda.md
4f8f3c7 Gcylon docs (#501)
a786258 Adding contributing guide to documentation (#496)
8ab8b2d changing join column naming convention to match SQL and pandas (#487)
f18b91f improvements to ucx build from conda (#484)
912fb54 Windows build (#482)
216758a making improvements to the build (#483)
4e2894e Add functions to dataframe (#481)
1f1ddd9 Documentation update (#479)
e623315 Bump tar from 6.1.5 to 6.1.11 in /docs (#477)
1e5db7b improve docs (#476)
58c0595 removing extra examples (#474)
3c823f6 Gcylon integration (#470)
92748eb Cpp example cleanup (#475)
fa14527 Docs improvements (#469)
1306220 Bump url-parse from 1.4.7 to 1.5.3 in /docs (#473)
8234ae7 Bump path-parse from 1.0.6 to 1.0.7 in /docs (#472)
c8b435b Bump tar from 6.0.5 to 6.1.5 in /docs (#471)
1cc28dd Performance improvements (#453)
9092bbf MacOS build (#464)
d59d91e Add iloc operation to DataFrame (#465)
8d7a8dc Removed glog files from the header files (#463)
ea62eef License updates (#462)
2f56265 changed all relative Cylon header references to global (#461)
123c93c Building in conda env without using conda-build (#457)
3b3a285 Compilation document improvements (#454)
8578b1f Adding barrier at the end of the test case (#458)
e6eded5 Fix for empty df (#455)
8f14992 Fixed mpi test case (#456)
cb06998 Changes to the Docs (#451)
4ce1d7e updates to the docker readme
e011e0f enhancing readme
adfa6c0 adding read distribution (#432)
bd2e024 UCX integration (#439)
a42d04a Bump ws from 6.2.1 to 6.2.2 in /docs (#437)
710b562 Bump dns-packet from 1.3.1 to 1.3.4 in /docs (#435)
07aee74 adding new operators to DataFrame API (#429)
71e57f8 Updating to arrow 4.0 (#418)
a490dc2 changing ctx to const reference in methods (#419)
18a5447 missing docs (#428)
38534f5 0.4.1 release (#427)
10f5a6a Enabling scalars in df set_item (#425)
0be7897 Op bench refactor (#417)
ec964d8 Bug fixes in dataframe (#420)
e0ba964 Update c-cpp.yml
0200c02 adding finalize check and removing destructor finalize call. (#412)
149919c Update README.md
016c5c9 adding missing test case
5609535 Update README.md
e3ca0bf 0.4.0 release (#411)

Contributors

Ahmet Uyar
Chathura Widanage
Damitha Sandeepa Lenadora
dependabot[bot]
Hasara Maithree
Kaiying Shan
niranda perera
Supun Kamburugamuve
Vibhatha Lakmal Abeykoon
Ziyao22

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0