Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[r] [RFC] Add SOMASparseNDArray$read_spam_matrix() #1367

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

mojaveazure
Copy link
Member

Add support for reading spam matrices from sparse arrays. This allows users to read in data exceding the capacity of Matrix-based sparse matrices (nnz > .Machine$integer.max || dim(mat) > .Machine$integer.max)

Implemented SOMA methods:

  • SOMASparseNDArray$read_spam_matrix(): allows reading in an array as a spam matrix (both 32- and 64-bit)

Future work:

  • determine if this is desired functionality
  • we may want to combine all sparse-matrix readers into one read_sparse_matrix() or read_matrix() method with options to determine how to realize the matrix

Note for reviewers:

I don't know if this is functionality we want to support. I'm more than happy to say that we don't want to support spam, in which case we can close and move on

partially addresses #1348

Add support for reading spam matrices from sparse arrays. This allows
users to read in data exceding the capacity of Matrix-based sparse
matrices (`nnz > .Machine$integer.max || dim(mat) > .Machine$integer.max`)

Implemented SOMA methods:

- `SOMASparseNDArray$read_spam_matrix()`: allows reading in an array as
  a spam matrix (both 32- and 64-bit)

Future work:

- determine if this is desired functionality
- we may want to combine all sparse-matrix readers into one
  `read_sparse_matrix()` or `read_matrix()` method with options to
determine how to realize the matrix

partially addresses 1348
Copy link
Contributor

@eddelbuettel eddelbuettel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

I think that is a great and reasonably "free" option for us on capital-l "Large" data

_Edit: Didn't see the test breakage. Odd. Any pointers on this @mojaveazure ?

@mojaveazure
Copy link
Member Author

mojaveazure commented May 11, 2023

I think it's because I call expect_no_condition() and package startup messages are conditions. This causes the test to fail and the resulting spam matrix to be not found, which causes errors when I try to access it downstream. I'm attaching spam in $read_spam_matrix as spam matrices are useless without the spam package being attached

I've updated the test to library(spam) before running anything (after the skip_if_not_installed) to ensure that spam is attached. That should prevent the expect_no_condition() from failing (and I like expect_no_condition() as it'll fail on warnings that I may not have handled properly)

@eddelbuettel
Copy link
Contributor

Looks like we get other repurcussions from being a spam object and not inheriting from Matrix?

@mojaveazure
Copy link
Member Author

Yeah; that and t,Matrix() is an S4 method using an implicit S4 generic defined by Matrix, not an S3 method on base::t()

> base::t
function (x) 
UseMethod("t")
<bytecode: 0x560e860bc968>
<environment: namespace:base>
> Matrix::t
standardGeneric for "t" defined from package "base"

function (x) 
standardGeneric("t")
<environment: 0x560e860e89c8>
Methods may be defined for arguments: x
Use  showMethods(t)  for currently available ones.

S4 generics/methods get weird with implicit declaration and overriding S3 generics

@codecov-commenter
Copy link

Codecov Report

Patch coverage has no change and project coverage change: -11.29 ⚠️

Comparison is base (473ab21) 65.11% compared to head (735b43f) 53.83%.

❗ Current head 735b43f differs from pull request most recent head f33a01b. Consider uploading reports for the commit f33a01b to get more accurate results

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1367       +/-   ##
===========================================
- Coverage   65.11%   53.83%   -11.29%     
===========================================
  Files          91       68       -23     
  Lines        7583     5430     -2153     
===========================================
- Hits         4938     2923     -2015     
+ Misses       2645     2507      -138     
Flag Coverage Δ
python ?
r 53.83% <ø> (+0.42%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

see 42 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@pablo-gar
Copy link
Member

pablo-gar commented May 12, 2023

look great!

Can we hold on on merging this PR? I introduced some design decisions on #1274. For example the addition of SOMASparseNDarray$convert_coords() and utils-readTransformers.R where we should put the arrow-to-spam conversion. And we would also likely want to have an iterated version of read_spam_matrix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants