
Expose n_bins argument to cebra_sklearn_helpers.align_embeddings instead of fixing default value internally #24

Closed · Fixed by #25
drsax93 opened this issue Jun 23, 2023 · 8 comments
Labels: enhancement (New feature or request)

Comments


drsax93 commented Jun 23, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Bug description

Hello,
I am trying to compute the consistency score across different embeddings from hippocampal population activity that have been obtained using 2d tracking position as the auxiliary variable.
To compute the consistency score I have tried to use as labels either the linearised 2d position or another discrete labelling, but I get an error in cebra_sklearn_helpers.align_embeddings when quantising the embeddings with the new labels. I believe it might be due to the high number of bins (n_bins) used within the _coarse_to_fine() function. What do you think the issue may be?

Operating System

operating system: Ubuntu 20.04

CEBRA version

cebra version 0.2.0

Device type

gpu

Steps To Reproduce

Here is a snippet of the code

# Between-datasets consistency, by aligning on the labels
import cebra

# embeddings as list of np.ndarrays
embds = [cebra_w[e][m] for e in exps for m in MICE[:2]]
# labels as list of 1d np.ndarrays with linearised tracking position
labels = [lineariseTrack(track[e][m][:, 0], track[e][m][:, 1], binsize=30)
          for e in exps for m in MICE[:2]]

scores, pairs, datasets = cebra.sklearn.metrics.consistency_score(embeddings=embds,
                                                                  labels=labels,
                                                                  between="datasets")

Relevant log output

ValueError                                Traceback (most recent call last)
Cell In[16], line 7
      3 embds = [cebra_w[e][m] for e in exps for m in MICE[:2]]
      4 labels = [lineariseTrack(track[e][m][:,0], track[e][m][:,1], binsize=30)\
      5           for e in exps for m in MICE[:2]]
----> 7 scores, pairs, datasets = cebra.sklearn.metrics.consistency_score(embeddings=embds,
      8                                                                   labels=labels,
      9                                                                   between="datasets")
     10 cebra.plot_consistency(scores, pairs=pairs, datasets=subjects, colorbar_label=None)

File /data/phar0731/anaconda3/envs/py38/lib/python3.8/site-packages/cebra/integrations/sklearn/metrics.py:362, in consistency_score(embeddings, between, labels, dataset_ids)
    359     scores, pairs, datasets = _consistency_runs(embeddings=embeddings,
    360                                                 dataset_ids=dataset_ids)
    361 elif between == "datasets":
--> 362     scores, pairs, datasets = _consistency_datasets(embeddings=embeddings,
    363                                                     dataset_ids=dataset_ids,
    364                                                     labels=labels)
    365 else:
    366     raise NotImplementedError(
    367         f"Invalid comparison, got between={between}, expects either datasets or runs."
    368     )

File /data/phar0731/anaconda3/envs/py38/lib/python3.8/site-packages/cebra/integrations/sklearn/metrics.py:205, in _consistency_datasets(embeddings, dataset_ids, labels)
    200     raise ValueError(
    201         "Invalid number of dataset_ids, expect more than one dataset to perform the comparison, "
    202         f"got {len(datasets)}")
    204 # NOTE(celia): with default values normalized=True and n_bins = 100
--> 205 aligned_embeddings = cebra_sklearn_helpers.align_embeddings(
    206     embeddings, labels)
    207 scores, pairs = _consistency_scores(aligned_embeddings,
    208                                     datasets=dataset_ids)
    209 between_dataset = [p[0] != p[1] for p in pairs]

File /data/phar0731/anaconda3/envs/py38/lib/python3.8/site-packages/cebra/integrations/sklearn/helpers.py:138, in align_embeddings(embeddings, labels, normalize, n_bins)
    133 digitized_labels = np.digitize(
    134     valid_labels, np.linspace(min_labels_value, max_labels_value,
    135                               n_bins))
    137 # quantize embedding based on the new labels
--> 138 quantized_embedding = [
    139     _coarse_to_fine(valid_embedding, digitized_labels, bin_idx)
    140     for bin_idx in range(n_bins)[1:]
    141 ]
    143 if normalize:  # normalize across dimensions
    144     quantized_embedding_norm = [
    145         quantized_sample / np.linalg.norm(quantized_sample, axis=0)
    146         for quantized_sample in quantized_embedding
    147     ]

File /data/phar0731/anaconda3/envs/py38/lib/python3.8/site-packages/cebra/integrations/sklearn/helpers.py:139, in <listcomp>(.0)
    133 digitized_labels = np.digitize(
    134     valid_labels, np.linspace(min_labels_value, max_labels_value,
    135                               n_bins))
    137 # quantize embedding based on the new labels
    138 quantized_embedding = [
--> 139     _coarse_to_fine(valid_embedding, digitized_labels, bin_idx)
    140     for bin_idx in range(n_bins)[1:]
    141 ]
    143 if normalize:  # normalize across dimensions
    144     quantized_embedding_norm = [
    145         quantized_sample / np.linalg.norm(quantized_sample, axis=0)
    146         for quantized_sample in quantized_embedding
    147     ]

File /data/phar0731/anaconda3/envs/py38/lib/python3.8/site-packages/cebra/integrations/sklearn/helpers.py:78, in _coarse_to_fine(data, digitized_labels, bin_idx)
     76     if quantized_data is not None:
     77         return quantized_data
---> 78 raise ValueError(
     79     f"Digitalized labels does not have elements close enough to bin index {bin_idx}. "
     80     f"The bin index should be in the range of the labels values.")

ValueError: Digitalized labels does not have elements close enough to bin index 95. The bin index should be in the range of the labels values.

Anything else?

The problematic bin_index varies depending on the discretisation of the position / labels.
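The failure can be reproduced with plain NumPy, independent of CEBRA: whenever the label distribution has a gap, some of the 100 fixed bins receive no samples. (The gap location and sample counts below are illustrative, not taken from the dataset above.)

```python
import numpy as np

# Synthetic linearised positions with a stretch of track (40-60)
# that the animal never visited.
rng = np.random.default_rng(0)
labels = np.concatenate([rng.uniform(0, 40, 500), rng.uniform(60, 100, 500)])

# Same binning as align_embeddings() with its fixed default n_bins=100.
n_bins = 100
edges = np.linspace(labels.min(), labels.max(), n_bins)
digitized = np.digitize(labels, edges)

# Any bin with no samples would trigger the ValueError raised
# inside _coarse_to_fine().
empty_bins = [b for b in range(1, n_bins) if not np.any(digitized == b)]
assert empty_bins, "with a gap in the labels, some of the 100 bins are empty"
```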

Code of Conduct


stes commented Jun 23, 2023

Thanks for reporting -- as a quick check, can you avoid the error by lowering the number of bins?


drsax93 commented Jun 23, 2023 via email


stes commented Jun 23, 2023

The easiest fix is to clone the repo and install it locally:

pip install -e .

We might consider exposing the number of bins to the API in the future -- thanks for catching this!


drsax93 commented Jun 23, 2023 via email

@stes stes added the enhancement New feature or request label Jun 23, 2023
@stes stes changed the title Consistency score across datasets -- issue in digitising labels Expose n_bins argument to cebra_sklearn_helpers.align_embeddings instead of fixing default value internally Jun 23, 2023

stes commented Jun 24, 2023

PR #25 now contains a suggestion -- let me know if that fixes your issue.


drsax93 commented Jun 26, 2023

Changing the number of bins works, thanks!
Could you comment on how to choose the appropriate number?

Reading through the consistency score demo, it says "Correlation matrices depict the R² after fitting a linear model between behavior-aligned embeddings of two animals, one as the target, one as the source (mean, n=10 runs)", but I don't see any shuffling / subsampling procedure in the code -- is that right?

Cheers


stes commented Jun 26, 2023

@drsax93 ,

Could you comment on how to choose the appropriate number?

An appropriate number of bins would be one that you could also use to plot a histogram of your data: there should be no empty bins (this is what caused your original error), but there should not be too few either (in the extreme case, a single bin would make the consistency always 100%).

So for best results, try to find the largest number of bins that avoids the issue you saw above.
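That rule of thumb can be automated: scan candidate bin counts from high to low and keep the first one whose digitised labels leave no interior bin empty. This is a NumPy sketch, not part of the CEBRA API; the helper name and synthetic labels are made up, and the binning mirrors the np.digitize() call visible in the traceback above.

```python
import numpy as np

def largest_valid_n_bins(labels, candidates=range(100, 1, -1)):
    """Return the largest candidate bin count for which no interior bin
    is empty, mirroring the np.digitize() call in align_embeddings()."""
    for n_bins in candidates:
        edges = np.linspace(labels.min(), labels.max(), n_bins)
        digitized = np.digitize(labels, edges)
        if all(np.any(digitized == b) for b in range(1, n_bins)):
            return n_bins
    raise ValueError("No candidate bin count leaves every bin occupied.")

# Synthetic labels with an unvisited stretch of the track (40-60).
rng = np.random.default_rng(0)
labels = np.concatenate([rng.uniform(0, 40, 500), rng.uniform(60, 100, 500)])
n_bins = largest_valid_n_bins(labels)
```

Passing the returned value as n_bins (once the argument is exposed, as suggested in PR #25) should avoid the ValueError while keeping the binning as fine as the data allows.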

Reading through the consistency score demo it says "Correlation matrices depict the R² after fitting a linear model between behavior-aligned embeddings of two animals, one as the target, one as the source (mean, n=10 runs)", but I don't see any shuffling / subsampling procedure in the code -- is it so?

The runs are with respect to fitting 10 independent CEBRA models. This is something you have to do as an input for that function, i.e., you would fit 10 models (in the simplest case, just running through a for loop), compute the embeddings, and pass the results to the function.

Does that make sense?


drsax93 commented Jun 26, 2023 via email

@stes stes closed this as completed in #25 Jul 12, 2023