Create Recipes for common use case #410

Open
Tracked by #401
dennyabrain opened this issue Oct 15, 2024 · 8 comments

@dennyabrain
Contributor

dennyabrain commented Oct 15, 2024

Recipes are meant to be end-to-end examples of using Feluda for a particular use case. While Feluda itself should be easy for an experienced Python/ML engineer to use and configure, these recipes would provide easy copy-paste examples of using Feluda for specific use cases. As part of this issue, we should:

  1. Identify commonly useful recipes
  2. Write example recipes
  3. Fix bugs that we discover along the way
  4. Publish them in the wiki/website
@dennyabrain
Contributor Author

Examples of commonly useful recipes would be:

  1. Index a collection of images and search for similar images in it
  2. Index a collection of videos and search for similar videos in it
  3. Index a collection of audio files and search for similar audio in it
  4. Process a collection of videos and cluster them into predefined categories
  5. Process a collection of newspaper clippings and search through the text in them
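
To make the "index and search" recipes above concrete, here is a minimal sketch of similarity search over embedding vectors. It does not use Feluda's API (which is not shown in this thread); the embeddings, file names, and the `search` helper are all hypothetical stand-ins for whatever an operator would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(index, query, top_k=3):
    """Return the top_k (name, score) pairs most similar to the query."""
    scored = [(name, cosine_similarity(vec, query)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings; a real recipe would compute these from media files.
index = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_2.jpg": [0.8, 0.2, 0.1],
    "car_1.jpg": [0.0, 0.1, 0.9],
}
results = search(index, [1.0, 0.0, 0.0], top_k=2)
for name, score in results:
    print(name, round(score, 3))
```

A real recipe would likely swap the in-memory dictionary for a vector store, but the ranking idea stays the same.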

@plon-Susk7

Hi @dennyabrain, should we start with the examples you provided?

@dennyabrain
Contributor Author

Yes @plon-Susk7, let's do one of them. Video is interesting and sufficiently complex. Should we do the recipes related to videos?
So: 1. Demonstrating the use of Feluda to index and search through videos, and 2. Demonstrating the use of Feluda to cluster a collection of videos into groups.
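
As a rough illustration of the second recipe (grouping videos), here is a small k-means sketch in plain Python. It assumes each video has already been reduced to an embedding vector; the vectors below are made up, and the `kmeans` helper is hypothetical rather than anything Feluda provides.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Cluster points into k groups; returns (labels, centroids)."""
    # Deterministic farthest-point initialisation: start from the first point,
    # then repeatedly add the point farthest from all chosen centroids.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(squared_distance(p, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical 2-D video embeddings forming two obvious groups.
video_embeddings = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
labels, centroids = kmeans(video_embeddings, k=2)
print(labels)
```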

@aatmanvaidya might be able to direct you better on which operators to look at and any relevant documentation.

@plon-Susk7

Yeah, let's go with video first. I'll get additional details from @aatmanvaidya.

@plon-Susk7

plon-Susk7 commented Oct 25, 2024

Steps to run the notebook:

  1. Install JupyterLab inside the virtual environment (venv) first.
pip install jupyterlab
  2. After installing JupyterLab, deactivate the venv and run the following command.
jupyter lab --ip 0.0.0.0 --port 8888 --no-browser --allow-root --NotebookApp.token=''

You can then open the notebook by navigating to http://localhost:8888 in your browser.

@dennyabrain
Contributor Author

@aatmanvaidya @plon-Susk7 something to think about: should we consider creating Colab notebooks as well? I have two reasons to support this:

  1. It will give journalists and non-tech folks a cloud environment to use Feluda in, without worrying about installing Python and Feluda on their machine.
  2. Since Colab integrates well with Google Drive, they could mount data from their own drives.

Point 2 is useful because a lot of people are familiar with Google Drive and create their little personal "datasets" on it all the time. So being able to process their own data using our notebooks would make Feluda more usable for them.
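
For reference, mounting Drive in a Colab notebook could look roughly like the sketch below. `google.colab.drive.mount` is Colab's own helper; the `resolve_data_dir` and `list_media` functions, the fallback folder, and the extension list are assumptions made for illustration so the same cell also runs outside Colab.

```python
from pathlib import Path


def resolve_data_dir(local_fallback="."):
    """Return the user's Drive folder on Colab, or a local folder elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return Path(local_fallback)
    drive.mount("/content/drive")  # prompts the user to authorise access
    return Path("/content/drive/MyDrive")


def list_media(data_dir, extensions=(".jpg", ".png", ".mp4", ".wav")):
    """Recursively list files whose extension looks like supported media."""
    return sorted(
        p
        for p in Path(data_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )


if __name__ == "__main__":
    data_dir = resolve_data_dir()
    for path in list_media(data_dir):
        print(path)
```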

@aatmanvaidya
Collaborator

@dennyabrain I was thinking about this too.

But currently there would be some limitations:

  • We will only be able to use operators in Google Colab.
    • This is not bad, because clustering, t-SNE, text extraction from images and a lot of other useful things can still be done. Only store and search won't work.
  • Elasticsearch won't work there. We could think about using other vector databases from LangChain etc., but that's a different discussion.
  • Since Feluda is not a Python package yet, we will have to clone the entire repo in Google Colab (this is not a big deal). But this means that whatever operator a journalist or non-tech person wants to use, they would have to install it manually there, along with any dependencies that come with it, like ffmpeg, tesseract-ocr etc.
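
One way a notebook could soften the manual-install problem is to check up front for the system tools an operator needs. The sketch below uses only the standard library; the `missing_binaries` helper and the list of required tools are assumptions for illustration (`tesseract` is the executable name that the tesseract-ocr package provides).

```python
import shutil


def missing_binaries(names):
    """Return the executables from names that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Hypothetical requirements for a video or OCR operator.
required = ["ffmpeg", "tesseract"]
missing = missing_binaries(required)
if missing:
    print("Install before running the operators:", ", ".join(missing))
else:
    print("All system dependencies found.")
```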

I think we should definitely have examples on Google Colab, as it becomes a one-click way for someone to replicate and use Feluda. They don't have to worry about setting up Docker etc.

But I feel that, building on the work Priyash has done so far on writing example notebooks, we should first finalise the public API and then move towards examples on Colab.
What do you think, Denny?

@dennyabrain
Contributor Author

@aatmanvaidya point taken about the need to publish the library first and to finalize the API. I was getting ahead of myself. Let's do Colab later then.

Projects
Status: In Progress

3 participants