Update the BIDS-like saving to align with template directory #77

Merged
merged 56 commits into main on Sep 16, 2024

@ibevers ibevers commented Aug 21, 2024

This pull request updates the BIDS-like structure of the output directory to more closely follow BIDS. It also adds tests and optimizes performance. Some of the changes included in this PR:

  • Adds descriptions for each field to help users of the dataset understand it better.

  • Adds an n_cores parameter that determines the number of processes to use when extracting features. When using Pydra, this is also the number of separate model instances that will be created.

  • Adds tests for the conversion code.

  • Adds transcriptions to the saved features, with the option to use different Whisper model sizes.

  • Adds a with_sensitive parameter that determines whether to include raw audio files and transcriptions in the output.

  • Adds a template BIDS-like directory structure.

  • Configures CHANGES.md to update automatically.

  • Removes references to the summer school.
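As a rough illustration of what a with_sensitive parameter like the one described above might control, the sketch below filters a list of output file names. The function name `select_outputs` and the `SENSITIVE_SUFFIXES` tuple are hypothetical and are not taken from this PR; only the idea that raw audio files and transcriptions are the sensitive outputs comes from the description.

```python
from typing import Iterable, List

# Assumption: sensitive outputs can be recognized by these file-name endings
# (raw audio and transcripts). The actual implementation may decide differently.
SENSITIVE_SUFFIXES = ("_audio.wav", "_transcript.txt")


def select_outputs(filenames: Iterable[str], with_sensitive: bool) -> List[str]:
    """Return the files to write, dropping raw audio and transcripts
    when with_sensitive is False."""
    if with_sensitive:
        return list(filenames)
    # str.endswith accepts a tuple, so one call covers both suffixes.
    return [name for name in filenames if not name.endswith(SENSITIVE_SUFFIXES)]
```

Keeping the decision in one small function means the flag is checked in a single place rather than at every write site.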

Ensure all of these are created in or copied to the BIDS-like output directory with the correct names:

  • CHANGES.md
  • README.md
  • dataset_description.json
  • participants.json
  • participants.tsv
  • phenotype
  • <measurement_tool_name>.json
  • <measurement_tool_name>.tsv
  • sub-<participant_id>
  • ses-<session_id>
  • beh
  • audio
  • sub-<participant_id>_ses-<session_id>_task-<task_name>_audio.wav
  • sub-<participant_id>_ses-<session_id>_task-<task_name>_features.pt
  • sub-<participant_id>_ses-<session_id>_task-<task_name>_metadata.json
  • sub-<participant_id>_ses-<session_id>_task-<task_name>_transcript.txt
  • sessions.tsv
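The per-task file names in the checklist above all follow one pattern. A minimal sketch of a helper that builds such names is shown here; `bids_like_filename` is a hypothetical name, not a function from this PR, and the non-empty check is an added assumption.

```python
def bids_like_filename(participant_id: str, session_id: str,
                       task_name: str, suffix: str) -> str:
    """Build sub-<participant_id>_ses-<session_id>_task-<task_name>_<suffix>.

    `suffix` includes the extension, e.g. "audio.wav" or "features.pt".
    """
    parts = {
        "participant_id": participant_id,
        "session_id": session_id,
        "task_name": task_name,
        "suffix": suffix,
    }
    # Assumption: empty components are always an error in this naming scheme.
    for label, value in parts.items():
        if not value:
            raise ValueError(f"{label} must be a non-empty string")
    return f"sub-{participant_id}_ses-{session_id}_task-{task_name}_{suffix}"
```

For example, `bids_like_filename("01", "02", "naming", "metadata.json")` yields `sub-01_ses-02_task-naming_metadata.json`.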

@ibevers ibevers self-assigned this Aug 21, 2024

ibevers commented Aug 26, 2024

Need to move all questionnaires in beh to phenotype. We don't have any behavioral experiments in the dataset. Therefore, we should not have a beh dir.

ibevers commented Aug 28, 2024

There should be no beh directory.

  • Move all audio-related JSON files in beh to audio
  • Remove beh directory
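The two steps above could be sketched for a single session directory as follows. This is an illustration under assumptions, not the code from this PR: `move_beh_json_to_audio` is a hypothetical name, every `*.json` file in beh is treated as audio-related, and beh is removed only once it is empty so that nothing still waiting to be moved elsewhere is deleted.

```python
import shutil
from pathlib import Path
from typing import List


def move_beh_json_to_audio(session_dir: Path) -> List[Path]:
    """Move JSON files from <session_dir>/beh to <session_dir>/audio,
    then remove beh if nothing is left in it."""
    beh_dir = session_dir / "beh"
    audio_dir = session_dir / "audio"
    moved: List[Path] = []
    if not beh_dir.is_dir():
        return moved  # nothing to do for this session
    audio_dir.mkdir(exist_ok=True)
    # Assumption: all JSON files found in beh are audio-related.
    for json_path in sorted(beh_dir.glob("*.json")):
        target = audio_dir / json_path.name
        shutil.move(str(json_path), str(target))
        moved.append(target)
    # Only delete beh when it is empty; leftover files keep the directory.
    if not any(beh_dir.iterdir()):
        beh_dir.rmdir()
    return moved
```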

ibevers commented Sep 10, 2024

  • Add a flag for excluding audio files, or otherwise make this clear

Collaborator

@alistairewj alistairewj left a comment

  • Add senselab>=0.12.0 to the requirements - the library doesn't work with earlier versions.
  • Are you sure we want to package the audio data in src? It makes the pip package much bigger.
  • Reformatting code and making changes in the same PR makes the changes very hard to review. In the future, if you plan on bulk reformatting, best practice is to only reformat and do it in its own PR. Then afterward you can make your code changes.

Still in the middle of doing my testing with the latest data dump, but I think it's worth approving this and if I come up with any further improvements I can make a new PR.

ibevers commented Sep 16, 2024

Hi @alistairewj, thank you for reviewing! I didn't realize that about reformatting code and making changes in the same PR. In the future, I will do my best to separate them 🌱

To-dos:

  • add senselab>=0.12.0 to requirements
  • move audio data out of src

ibevers commented Sep 16, 2024

  • Add license text to dataset_description.json
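One way this to-do might be done is sketched below. The BIDS specification defines a "License" field for dataset_description.json; whether this project stores its license text under that key is an assumption, and `add_license` is a hypothetical helper, not code from this PR.

```python
import json
from pathlib import Path


def add_license(description_path: Path, license_text: str) -> dict:
    """Set the license in an existing dataset_description.json and
    return the updated content."""
    description = json.loads(description_path.read_text(encoding="utf-8"))
    # Assumption: the license text belongs under the BIDS "License" key.
    description["License"] = license_text
    description_path.write_text(
        json.dumps(description, indent=2) + "\n", encoding="utf-8"
    )
    return description
```

All other fields in the file are preserved, so the helper can be run against a template that already contains them.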

@ibevers ibevers marked this pull request as ready for review September 16, 2024 20:16
@ibevers ibevers merged commit 67615bf into main Sep 16, 2024
2 checks passed