Add support for reading text files as media files #173

Open
hagenw opened this issue May 13, 2024 · 2 comments · May be fixed by #176
Labels
enhancement New feature or request question Further information is requested

Comments

@hagenw
Member

hagenw commented May 13, 2024

In audb 1.7.0 we added support for publishing not only audio and video files, but files of any format.
This means we should also adjust process_index(), process_file(), process_files(), and process_folder() to support other file types.

The question is how to best support text files:

  • Should we pre-define a list of file extensions that are then treated as text files?
  • Should we check the MIME type of a file to see how to handle it (might slow things down)?
  • Should we use try and except statements (could be tricky, as reading audio files with audiofile might also fail if ffmpeg is not installed)?
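
To make the first two options concrete, here is a minimal sketch using only the Python standard library. The set TEXT_EXTENSIONS and both function names are hypothetical and not part of audinterface. Note that mimetypes.guess_type() only looks at the file name, so it is cheap but carries no more information than the extension; inspecting the file content would need a different tool.

```python
import mimetypes
from pathlib import Path

# Hypothetical list; which extensions count as text is the open question here.
TEXT_EXTENSIONS = {".txt", ".json"}


def is_text_file_by_extension(path: str) -> bool:
    """Option 1: compare the lower-cased extension against a fixed set."""
    return Path(path).suffix.lower() in TEXT_EXTENSIONS


def is_text_file_by_mime_type(path: str) -> bool:
    """Option 2: use the MIME type guessed from the file name.

    mimetypes.guess_type() never opens the file,
    it maps the extension to a MIME type.
    """
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        return False
    return mime_type.startswith("text/") or mime_type == "application/json"
```

With a fixed extension set the behavior is fully predictable, whereas the MIME type table can differ between systems.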

And how to return the content of a text file:

  • Should it be a text string?
  • Should it be a JSON string?

/cc @maxschmitt

@hagenw hagenw added enhancement New feature or request question Further information is requested labels May 13, 2024
@maxschmitt
Contributor

I would go for this one:

Should we pre-define a list of file extensions that are then treated as text files?

It would also be great if we could support structured text, as in a JSON file. This is especially useful for dialog datasets with turn-level metadata.
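
As an illustration of such a file, here is a hypothetical dialog with turn-level metadata. All field names and values are made up for this sketch and do not come from an existing dataset.

```python
import json

# Hypothetical content of a dialog file; all field names are illustrative.
dialog_json = """
{
  "dialog_id": "d001",
  "turns": [
    {"speaker": "A", "text": "Hello, how can I help?", "emotion": "neutral"},
    {"speaker": "B", "text": "My order has not arrived.", "emotion": "angry"}
  ]
}
"""

dialog = json.loads(dialog_json)

# Turn-level metadata stays attached to each utterance.
speakers = [turn["speaker"] for turn in dialog["turns"]]
emotions = [turn["emotion"] for turn in dialog["turns"]]
```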

And how to return the content of a text file:
Should it be a text string?
Should it be a JSON string?

If it is a .txt file, it should be a text string; if it is a .json file, it should be a JSON string. IMHO this would be both the simplest and the clearest solution.
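
A minimal sketch of this rule, assuming UTF-8 encoded files. The function name read_text_file() is hypothetical and not an existing audinterface function. Parsing and re-serializing the JSON content is one possible design choice, so that invalid JSON fails early; returning the raw file content would be an equally valid reading of "JSON string".

```python
import json
from pathlib import Path


def read_text_file(path: str) -> str:
    """Return the content of a .txt or .json file as a string."""
    suffix = Path(path).suffix.lower()
    if suffix not in {".txt", ".json"}:
        raise ValueError(f"Unsupported text file extension: '{suffix}'")
    content = Path(path).read_text(encoding="utf-8")
    if suffix == ".json":
        # Parse and re-serialize, so that invalid JSON raises an error
        # and the returned JSON string has a normalized layout.
        return json.dumps(json.loads(content))
    return content
```

A caller that needs the structured content can pass the returned JSON string to json.loads().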

@hagenw
Member Author

hagenw commented May 14, 2024

Thanks for the feedback, that does sound like a good solution.

@hagenw hagenw linked a pull request Jun 27, 2024 that will close this issue