If you want to contribute, you can pick one of these topics and start working on it.
Abstract away the database system of papis to allow for backends such as whoosh for indexing and querying documents. However, the simple document structure of papis should be left intact.
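One possible shape for such an abstraction is sketched below. This is only an illustration: the class names `Database` and `InMemoryDatabase` and the methods `add` and `query` are assumptions made for this example, not existing papis code. A whoosh backend would be another subclass implementing the same two methods.

```python
from abc import ABC, abstractmethod


class Database(ABC):
    """Hypothetical interface that every backend would implement."""

    @abstractmethod
    def add(self, document):
        """Index a document, given as a plain dictionary."""

    @abstractmethod
    def query(self, text):
        """Return the list of documents matching the text."""


class InMemoryDatabase(Database):
    """Trivial backend: keeps documents in a list and scans them."""

    def __init__(self):
        self._documents = []

    def add(self, document):
        self._documents.append(document)

    def query(self, text):
        needle = text.lower()
        # A document matches if any of its values contains the text.
        return [
            doc for doc in self._documents
            if any(needle in str(value).lower() for value in doc.values())
        ]
```

Documents stay plain dictionaries here, so the simple document structure is untouched and only the indexing and querying are swapped out.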
A feature that many people appear to want is search-in-document, that is, the ability to search for keywords inside documents.
For this we would need a reliable way of turning PDF files, or any other format, into text, and of separating the trivial words of this text from its most representative keywords.
These keywords would then be stored in a local cache, and the user would be able to search in this text with a command like
papis open "text = 'neural GAN'"
or something like this. This would mean that the words neural and GAN should also be searched for in the cache of the text-converted file.
The problem is that it is difficult to make sure that users have good tools for converting documents into text, even for PDF alone, and there is no nice python library that does this.
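Leaving the conversion problem aside, the keyword selection step could start from something as simple as the following sketch. It assumes the text has already been extracted, uses a tiny hand-written stop-word list, and ranks words by plain frequency. The function name `extract_keywords` and the list `STOP_WORDS` are made up for this example.

```python
import re
from collections import Counter

# Deliberately tiny stop-word list, only for illustration.
STOP_WORDS = frozenset({
    "the", "a", "an", "and", "or", "of", "to", "in", "is",
    "for", "on", "with", "that", "this", "we",
})


def extract_keywords(text, max_keywords=10):
    """Return the most frequent non-trivial words of the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(
        word for word in words
        if word not in STOP_WORDS and len(word) > 2
    )
    return [word for word, _ in counts.most_common(max_keywords)]
```

A real implementation would probably want a better ranking than raw frequency, but the cached result would have the same shape: a short list of words per document.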
Implement BASE support
The great BASE search engine offers a free service for open source projects, and I have already been in contact with them. I just lack the time to delve into their API and implement it in clean python.
For the moment, whoever is interested in implementing this would have to create a file in the spirit of papis/arxiv.py, implementing a module-level function get_data with a signature similar (but not necessarily equal) to
    def get_data(
        query="",
        author="",
        title="",
        abstract="",
        comment="",
        journal="",
        report_number="",
        category="",
        id_list="",
        page=0,
        max_results=30
    ):
and so on. Also, the user agent sent by this function should be called papis, as already discussed with the people from BASE.
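As a rough sketch of the user agent requirement, the request could be built as shown below using only the standard library. The endpoint `BASE_URL` is a placeholder and the parameter names `query`, `offset` and `hits` are assumptions; the real values have to be taken from the BASE API documentation. The helper name `build_request` is also made up for this example.

```python
import urllib.parse
import urllib.request

# Placeholder endpoint, NOT the real BASE API address.
BASE_URL = "https://example.org/base-api"


def build_request(query="", page=0, max_results=30):
    """Build (but do not send) a search request identified as papis."""
    # Parameter names are assumptions for illustration only.
    params = urllib.parse.urlencode({
        "query": query,
        "offset": page * max_results,
        "hits": max_results,
    })
    return urllib.request.Request(
        BASE_URL + "?" + params,
        headers={"User-Agent": "papis"},
    )
```

A get_data function would then pass the returned object to `urllib.request.urlopen` and parse the response into dictionaries.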
We have been using the package [argcomplete](https://github.com/kislyuk/argcomplete) to provide a quite rudimentary bash autocompletion.
It would be nice to have an extensible bash and zsh autocompletion script that we can update by hand each time we update the cli. I insist it should be maintained by hand in order to ensure better performance of the autocompletion. At least to my knowledge, argcomplete has to run the program every time it produces completions, which in my opinion is suboptimal for papis.
Therefore, if someone is skilled in bash or zsh autocompletion, they are welcome to contribute one.
Use habanero for crossref
[X]
Right now papis parses crossref with a hand-made module. However, in order to maintain less code and benefit from a library built for this purpose, I think we should use habanero to interface with crossref, since it is a very powerful library written in python.
Right now papis has a downloaders module where downloaders for different services and journals are listed.
It also has some files like crossref.py and isbn.py where the functionality of crawlers or searchers is implemented.
These should be unified into the downloaders section, so that for the libgen service, for instance, one has the class papis.downloaders.libgen.Downloader as it now stands and also a class papis.downloaders.libgen.Crawler or the like.
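The pairing of a Downloader and a Crawler per service could look roughly like the sketch below. The base classes and their methods (`search`, `get_document_url`) are assumptions for illustration and do not reflect the current papis classes. The `Example*` classes use canned records in place of a real service.

```python
class Crawler:
    """Hypothetical base: searches a service, returns metadata dicts."""

    def search(self, query):
        raise NotImplementedError


class Downloader:
    """Hypothetical base: resolves the document behind a single URL."""

    def __init__(self, url):
        self.url = url

    def get_document_url(self):
        raise NotImplementedError


class ExampleCrawler(Crawler):
    # Canned records stand in for a real HTTP search.
    _records = [
        {"title": "Neural GAN", "url": "https://example.org/1"},
        {"title": "Graph theory", "url": "https://example.org/2"},
    ]

    def search(self, query):
        needle = query.lower()
        return [r for r in self._records if needle in r["title"].lower()]


class ExampleDownloader(Downloader):
    def get_document_url(self):
        # Made-up rule: the document lives at the page URL plus ".pdf".
        return self.url + ".pdf"
```

With both classes living in the same module, a search result from the Crawler can be handed directly to the matching Downloader.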
Right now one of the main things that papis needs is thorough unit testing. We have to find good ways of testing all command-line behaviour; a beginning for this can be found in
tests/bash/
The idea is to create a test library where all possible cli commands and behaviours can be tested.
For the downloaders, we also have to mock the html pages (that is, mock the http connections) so that the BeautifulSoup package can parse them using the existing code, which lets us test the parsers.
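The mocking idea can be sketched as follows. To keep the example self-contained it uses the standard library `html.parser` as a stand-in for BeautifulSoup, and `unittest.mock` to replace `urllib.request.urlopen`; the actual downloaders may use a different HTTP layer, in which case a different function would need patching. The names `TitleParser`, `fetch_title` and `fetch_title_offline` are made up for this example.

```python
import io
import unittest.mock
import urllib.request
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical parser: collects the text of the <title> tag."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch_title(url):
    """Code under test: downloads a page and parses its title."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8")
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch_title_offline(html):
    """Run fetch_title against a canned page, without any network."""
    fake_response = io.BytesIO(html.encode("utf-8"))
    with unittest.mock.patch(
        "urllib.request.urlopen", return_value=fake_response
    ):
        return fetch_title("https://example.org/paper")
```

The canned html would normally come from a file saved next to the tests, one per service.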
If someone has experience with YouTube, it would be nice to have a review on YouTube explaining the main uses of papis and showing their workflow.
The implemented downloaders fetch documents via a normal http connection. If users have an account at a university that pays for access to a journal, it would be nice if they could set, in their per-user configuration, a proxy that is used to download the paper.
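With the standard library, routing downloads through a proxy could be sketched as below. The function name `make_opener` is made up, and the `proxy` argument is assumed to come from some per-user configuration option that does not exist yet.

```python
import urllib.request


def make_opener(proxy=None):
    """Return a URL opener that uses the given proxy, if any."""
    handlers = []
    if proxy:
        # Use the same proxy for both plain and encrypted connections.
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        )
    return urllib.request.build_opener(*handlers)
```

A downloader would then call `make_opener(proxy).open(url)` instead of opening the url directly. When no proxy is configured, the opener falls back to the default behaviour of urllib.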
Papis should work on Windows; however, I am unable to test this.
It would be nice to have a logo.
(There is a running project here, so before implementing a GUI yourself you might want to help out there.)
It would be nice to have a GTK or Qt based GUI. There is a branch with a GTK GUI; however, I am a little afraid of implementing a GUI and becoming JabRef. A GUI for papis should be extremely simple and uncluttered.
People should be able to control everything through configurable keyboard shortcuts. I do not know, however, whether papis needs a GUI, so this is a low priority.
It would be nice to have packages for the common Linux distributions and for brew on macOS.
- Brew (macOS)
- Debian/Ubuntu
- Arch Linux
- NixOS
- Other