0.5.0
Cylon 0.5.0 is a major release. We are excited to present GCylon, a cuDF-based distributed
DataFrame for Nvidia GPUs, along with UCX integration, Anaconda support, and much more.
Features
Cylon C++ and Python
- Adding UCX integration with MPI
- Adding read distribution
- Changing join column naming convention to match SQL and pandas
- Adding DataFrame.applymap and DataFrame.isin
- Adding iloc operation to DataFrame
- Adding null handling to table operators and Comparators
- Adding Equal and distributed Equal operators
- Adding array flattening
- Adding Repartition
- Adding mapreduce style group-by aggregators
- Adding table level AllGather, Gather and Broadcast operators
- Performance improvements and bug fixes
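
As a rough illustration of what "mapreduce style" group-by aggregation generally means, the plain-Python sketch below computes a per-key mean in three steps: each worker combines its local rows into partial states, the partial states are merged key by key, and a final step turns the merged state into the result. This is not Cylon's API; the function names (local_combine, reduce_partials, finalize_mean) and the two in-memory "workers" are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def local_combine(rows):
    """Map + combine on one worker: build a partial (sum, count) per key."""
    partial = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        state = partial[key]
        state[0] += value  # running sum for this key
        state[1] += 1      # running count for this key
    return dict(partial)


def reduce_partials(partials):
    """Reduce: merge the partial states from all workers key by key."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            merged[key][0] += total
            merged[key][1] += count
    return dict(merged)


def finalize_mean(merged):
    """Finalize: turn each merged (sum, count) state into a mean."""
    return {key: total / count for key, (total, count) in merged.items()}


# Two hypothetical workers, each holding a slice of (key, value) rows.
worker_a = [("x", 1.0), ("y", 4.0), ("x", 3.0)]
worker_b = [("x", 5.0), ("y", 6.0)]

partials = [local_combine(worker_a), local_combine(worker_b)]
means = finalize_mean(reduce_partials(partials))
print(means)
```

In a distributed setting the partial states would typically be exchanged between processes by key before the reduce step; here they are simply passed as a list to keep the sketch self-contained.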
Build
- Updating to Arrow 5.0.x
- Windows build support
- MacOS build support
- Conda build is the default build
- Improving docker build
GCylon
First release of GCylon, which supports distributed DataFrame processing on Nvidia GPUs using cuDF:
- Implemented shuffling and distributed sorting
- Distributed Join/merge
- Distributed GroupBy
- DataFrame Set operations
- Repartitioning DataFrames
- Distributed IO for reading/writing CSV, JSON and Parquet files
You can download the source code from GitHub.
Conda binaries are available on Anaconda.
Commits
3344bf9 Mapreduce style group-by aggregators (#535)
50ef890 Remove minor warnings (#544)
559e8eb Adding CPU serializer (#539)
abb4404 fixed unused variable/parameter and casting warnings (#542)
62a3f08 Distributed IO (#533)
15d06d6 Bump color-string from 1.5.4 to 1.7.4 in /docs (#534)
810c4ed fixing RNG issue (#538)
fbb049b fixing build error (#536)
a10e052 Bump algoliasearch-helper from 3.3.3 to 3.6.2 in /docs (#532)
112ea97 Repartition - CPU (#526)
79c4b73 create a MacOS yml file (#530)
b9e7a8c Repartition - GPU (#528)
2191b9f fixed function name change in cudf api from gcylon test files (#529)
3e9036e Upgrading to arrow 5.0.0 (#525)
24d182a Groupby values null handling (#527)
54a5074 Null handling for Comparators (#524)
0b9516e Adding array flattening (#522)
b3fc2a2 Implemented MergeOrSort when merging sorted tables (#523)
1e061b2 Feature/equal (#499)
e378d1d reformatted gcylon codes with tab size 2, non-functional changes (#521)
8450d9b Added support for sliced tables in gather, broadcast and sorting (#520)
92b8124 Update windows.yml
1f9790d Update macos.yml
d33f9ac Update conda-actions.yml
963d491 Update c-cpp.yml
2229981 added mpi datatype dispatching for primitive data types (#519)
d9936b4 Head tail operators (#512)
ac99d00 Formatting code (#518)
fff84cc Code formatting (#517)
f32f04d Null handling in splitters and build arrays (#511)
4cab7ca Delete files from CPP example folder that are not needed (#516)
d174430 moving tutorial repo to (#514)
9cd7911 Python example cleanup (#513)
fe4caf3 Distributed sorting (#510)
2302f58 Minor improvements to the Table API (#508)
71eb80a adding new test utils (#507)
24b83dd Adding to docker docs (#498)
6f2faf8 Update conda.md
4f8f3c7 Gcylon docs (#501)
a786258 Adding contributing guide to documentation (#496)
8ab8b2d changing join column naming convention to match SQL and pandas (#487)
f18b91f improvements to ucx build from conda (#484)
912fb54 Windows build (#482)
216758a making improvements to the build (#483)
4e2894e Add functions to dataframe (#481)
1f1ddd9 Documentation update (#479)
e623315 Bump tar from 6.1.5 to 6.1.11 in /docs (#477)
1e5db7b improve docs (#476)
58c0595 removing extra examples (#474)
3c823f6 Gcylon integration (#470)
92748eb Cpp example cleanup (#475)
fa14527 Docs improvements (#469)
1306220 Bump url-parse from 1.4.7 to 1.5.3 in /docs (#473)
8234ae7 Bump path-parse from 1.0.6 to 1.0.7 in /docs (#472)
c8b435b Bump tar from 6.0.5 to 6.1.5 in /docs (#471)
1cc28dd Performance improvements (#453)
9092bbf MacOS build (#464)
d59d91e Add iloc operation to DataFrame (#465)
8d7a8dc Removed glog files from the header files (#463)
ea62eef License updates (#462)
2f56265 changed all relative Cylon header references to global (#461)
123c93c Building in conda env without using conda-build (#457)
3b3a285 Compilation document improvements (#454)
8578b1f Adding barrier at the end of the test case (#458)
e6eded5 Fix for empty df (#455)
8f14992 Fixed mpi test case (#456)
cb06998 Changes to the Docs (#451)
4ce1d7e updates to the docker readme
e011e0f enhancing readme
adfa6c0 adding read distribution (#432)
bd2e024 UCX integration (#439)
a42d04a Bump ws from 6.2.1 to 6.2.2 in /docs (#437)
710b562 Bump dns-packet from 1.3.1 to 1.3.4 in /docs (#435)
07aee74 adding new operators to DataFrame API (#429)
71e57f8 Updating to arrow 4.0 (#418)
a490dc2 changing ctx to const reference in methods (#419)
18a5447 missing docs (#428)
38534f5 0.4.1 release (#427)
10f5a6a Enabling scalars in df set_item (#425)
0be7897 Op bench refactor (#417)
ec964d8 Bug fixes in dataframe (#420)
e0ba964 Update c-cpp.yml
0200c02 adding finalize check and removing destructor finalize call. (#412)
149919c Update README.md
016c5c9 adding missing test case
5609535 Update README.md
e3ca0bf 0.4.0 release (#411)
Contributors
Ahmet Uyar
Chathura Widanage
Damitha Sandeepa Lenadora
dependabot[bot]
Hasara Maithree
Kaiying Shan
niranda perera
Supun Kamburugamuve
Vibhatha Lakmal Abeykoon
Ziyao22
License
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0