gomem

Packages for working with Apache Arrow in Go.

Included in gomem is a DataFrame implementation. It uses Apache Arrow (Go) under the hood to store and manipulate data in a columnar format.

Packages

Package	Description	Link
dataframe	A DataFrame implementation using Arrow.	code
collection	Abstract access to Arrow arrays using gomem Objects.	code
iterator	Iterators for iterating over Arrow arrays.	code
logical	Abstract logical types.	code
object	Abstract object type capable of automatically converting Object types.	code
smartbuilder	Abstract Arrow array builder.	code

dataframe

A DataFrame built on Apache Arrow.

Installation

Add the package to your go.mod file:

require github.com/gomem/gomem master

Or, clone the repository:

git clone --branch master https://github.com/gomem/gomem.git $GOPATH/src/github.com/gomem/gomem

A complete example:

mkdir my-dataframe-app && cd my-dataframe-app

cat > go.mod <<-END
  module my-dataframe-app

  require github.com/gomem/gomem master
END

cat > main.go <<-END
  package main

  import (
    "fmt"

    "github.com/apache/arrow/go/arrow/memory"
    "github.com/gomem/gomem/pkg/dataframe"
  )

  func main() {
    pool := memory.NewGoAllocator()
    df, _ := dataframe.NewDataFrameFromMem(pool, dataframe.Dict{
      "col1": []int32{1, 2, 3, 4, 5},
      "col2": []float64{1.1, 2.2, 3.3, 4.4, 5},
      "col3": []string{"foo", "bar", "ping", "", "pong"},
      "col4": []interface{}{2, 4, 6, nil, 8},
    })
    defer df.Release()
    fmt.Printf("DataFrame:\n%s\n", df.Display(0))
  }

  // DataFrame:
  // rec[0]["col1"]: [1 2 3 4 5]
  // rec[0]["col2"]: [1.1 2.2 3.3 4.4 5]
  // rec[0]["col3"]: ["foo" "bar" "ping" "" "pong"]
  // rec[0]["col4"]: [2 4 6 (null) 8]
END

go run main.go

Arrow Array Usage

See the DataFrame tests for extensive usage examples.

Reference Counting

From the arrow/go README...

The library makes use of reference counting so that it can track when memory buffers are no longer used. This allows Arrow to update resource accounting, pool memory, and track overall memory usage as objects are created and released. Types expose two methods to deal with this pattern. The Retain method will increase the reference count by 1 and the Release method will reduce the count by 1. Once the reference count of an object is zero, any associated object will be freed. Retain and Release are safe to call from multiple goroutines.
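The counting scheme described above can be sketched with the standard library alone. This is a minimal illustration, not Arrow's implementation: the Buffer type and its Refs and Freed helpers are hypothetical names introduced here, and only the Retain and Release method names come from the text.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Buffer is a hypothetical reference-counted resource used for illustration.
// It is not an Arrow type.
type Buffer struct {
	refs int64
	data []byte
}

// NewBuffer returns a Buffer with a reference count of 1, owned by the caller.
func NewBuffer(n int) *Buffer {
	return &Buffer{refs: 1, data: make([]byte, n)}
}

// Retain increases the reference count by 1.
func (b *Buffer) Retain() {
	atomic.AddInt64(&b.refs, 1)
}

// Release decreases the reference count by 1 and drops the data at zero.
func (b *Buffer) Release() {
	if atomic.AddInt64(&b.refs, -1) == 0 {
		b.data = nil
	}
}

// Refs reports the current reference count.
func (b *Buffer) Refs() int64 {
	return atomic.LoadInt64(&b.refs)
}

// Freed reports whether the reference count has reached zero.
func (b *Buffer) Freed() bool {
	return b.Refs() == 0
}

func main() {
	buf := NewBuffer(16) // count is 1
	buf.Retain()         // count is 2
	buf.Release()        // count is 1
	fmt.Println(buf.Refs(), buf.Freed()) // prints "1 false"
	buf.Release()                        // count is 0, data dropped
	fmt.Println(buf.Refs(), buf.Freed()) // prints "0 true"
}
```

The atomic counter is what makes Retain and Release safe to call from multiple goroutines in this sketch.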

When to call `Retain` / `Release`?

If you are passed an object and wish to take ownership of it, you must call Retain. You must later pair this with a call to Release when you no longer need the object. "Taking ownership" typically means you wish to access the object outside the scope of the current function call.
You own any object that you create via a function whose name begins with New or Copy, that is returned by any operation producing a new immutable DataFrame, or that you receive over a channel. You must therefore call Release once you no longer need the object.
If you send an object over a channel, you must call Retain before sending it, because the receiver is assumed to own the object and will later call Release when it no longer needs it.
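The channel rule is the easiest of these to get wrong, so here is a minimal sketch of it using only the standard library. The Record type and the Consume and Run functions are hypothetical stand-ins rather than gomem or Arrow APIs. The sender retains before sending, the receiver releases what it receives, and the creator still releases its own reference.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Record is a hypothetical reference-counted value standing in for an
// Arrow-backed object. It is not a gomem or Arrow type.
type Record struct {
	refs int64
	rows int
}

// NewRecord returns a Record owned by the caller (reference count 1).
func NewRecord(rows int) *Record { return &Record{refs: 1, rows: rows} }

// Retain increases the reference count by 1.
func (r *Record) Retain() { atomic.AddInt64(&r.refs, 1) }

// Release decreases the reference count by 1.
func (r *Record) Release() { atomic.AddInt64(&r.refs, -1) }

// Refs reports the current reference count.
func (r *Record) Refs() int64 { return atomic.LoadInt64(&r.refs) }

// Consume owns every Record it receives, so it releases each one after use.
func Consume(ch <-chan *Record, wg *sync.WaitGroup, total *int64) {
	defer wg.Done()
	for r := range ch {
		atomic.AddInt64(total, int64(r.rows))
		r.Release()
	}
}

// Run sends one Record to a consumer and returns the rows seen by the
// consumer together with the final reference count.
func Run() (total, refs int64) {
	rec := NewRecord(5) // created via New, so this function owns it

	ch := make(chan *Record)
	var wg sync.WaitGroup
	wg.Add(1)
	go Consume(ch, &wg, &total)

	rec.Retain() // the extra reference travels with the value
	ch <- rec
	close(ch)
	wg.Wait()

	rec.Release() // drop the reference held by the creator
	return total, rec.Refs()
}

func main() {
	fmt.Println(Run()) // prints "5 0"
}
```

If the sender skipped Retain, the receiver's Release would drop the count to zero while the sender still held the object.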

Note: You can write a test using memory.NewCheckedAllocator to assert that you have released all resources properly. See: tests

TODO

This DataFrame currently implements most of the scalar types we've come across. There is still work to be done on some of the list and struct types. Feel free to submit a PR if you find you need them. This library will let you know when you do.

Implement all Arrow DataTypes.
Add a filter function to DataFrame.
Add an order by function to DataFrame.

Contributing

Pull requests are welcome!
