Files retrieval from a index #6
Comments
It sounds like the hashes are printed to something other than stdout (likely stderr). Can you try redirecting stderr to stdout in the first part of the pipeline? Something like this:
The -e option to runscript uses environment variables for authentication. If they are not set then it falls back to prompting for authentication to CRITs. Please take a look at |
After adding 2>&1 it returns an error: Also, I didn't mention that while running this command I received this output: |
I do have write permissions on the output directory, but still nothing is saved in it. Can you please advise? |
Have you configured CRITs to use services from wherever you put them? Most people put them in /data/crits_services and then configure CRITs to use that directory for services. |
First of all, thank you for your assistance; I really appreciate it. I've configured CRITs to use services, but I didn't put the services in the root directory. In the CRITs GUI all services are displayed as available, including the snugglefish service. I just can't use the snugglefish service because I can't finish indexing the sample files we've uploaded so far. Is there a log I can check to see why files are not being saved to the given directory? Thanks |
I can dig into this a bit on Monday when I'm back in the office. |
Great, looking forward to hearing any news from you. |
Can you provide more information about the various directories involved and where you are executing things from? Everything you are doing seems more or less right, but it's not clear exactly what you are doing. A direct copy/paste of the commands and the exact output would help, along with a description of where your services live. In particular the error: |
Hi,
The CRITs directory is /mnt/crits/crits-m
The first thing I did when starting to index for snugglefish was:
python manage.py runscript snugglefish_service index -- -a create -n CW -q "{'source.name': 'CW'}" -d /mnt/crits/data/crits_services/snugglefish_service/index
Then I run this command to check the status:
Then I try to get the first 500 files for the CW index:
The output that I see on the screen:
But when I check the directory mnt/crits/data/crits_services/snugglefish_service/wxs/CW/, it is empty. |
So the hashes are being printed to stderr (I still need to figure out how to get scripts to properly print to stdout instead of stderr). You can get the hashes to stdout by simply redirecting stderr to stdout, as I suggested earlier. Do this by executing:
Notice I am adding |
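To see why the redirect matters, here is a minimal shell sketch. The function `print_hashes` and the hash values are hypothetical stand-ins for the runscript command, which is assumed (as described above) to write its hashes to stderr rather than stdout.

```shell
# Hypothetical stand-in for the runscript command: it writes two
# made-up hashes to stderr, as the real script is said to do.
print_hashes() {
    printf '%s\n' d41d8cd9 0cc175b9 >&2
}

# Without 2>&1 nothing reaches the pipe, so xargs has no input.
# (stderr is sent to /dev/null here only to keep the demo output tidy;
# normally the hashes would appear on the terminal instead.)
without_redirect=$(print_hashes 2>/dev/null | xargs -I % echo "fetch %")

# With 2>&1 on the first command, stderr is merged into stdout and
# xargs receives one hash per line.
with_redirect=$(print_hashes 2>&1 | xargs -I % echo "fetch %")

echo "without redirect: [$without_redirect]"
echo "with redirect:"
echo "$with_redirect"
```

The same idea applies to the real pipeline: the `2>&1` belongs on the first command, immediately before the `|`, so that the hashes travel down the pipe to xargs.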
Hi, I've run this command: Output on the screen: Thanks, |
So the Read this for more information: https://github.com/crits/crits/wiki/runscript |
I added these environment variables with the "export" command, and when I run "printenv" I see CRITS_USER=my user CRITS_PASSWORD= my password, but it doesn't seem to work. Can you please advise what's wrong here as well? |
Hello, I have an additional question. Every time I upload new samples to CRITs, should I delete the existing index and create a new one? Is there any way to just update the existing index with the new sample files? Thank you, |
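One way to check that the variables are really visible to child processes is sketched below. It assumes that runscript reads CRITS_USER and CRITS_PASSWORD from its own environment; the values and the `NOT_EXPORTED` variable are placeholders for illustration only. A variable that is set without `export`, or exported in a different shell session, will not be inherited by the command.

```shell
# Placeholder credentials; substitute real values. Quote values so that
# spaces or special characters survive intact.
export CRITS_USER='analyst'
export CRITS_PASSWORD='example password'

# A variable that is set but not exported stays local to this shell.
NOT_EXPORTED='local only'

# Ask a child process what it can see. Any command launched from this
# shell inherits the same environment as this child.
child_user=$(sh -c 'printf "%s" "$CRITS_USER"')
child_pass_len=$(sh -c 'printf "%s" "${#CRITS_PASSWORD}"')
child_local=$(sh -c 'printf "%s" "${NOT_EXPORTED:-}"')

echo "child sees CRITS_USER=$child_user"
echo "child sees a password of length $child_pass_len"
echo "child sees NOT_EXPORTED=[$child_local]"
```

If the child process reports the expected values but the command still prompts, the variables are reaching the process and the cause lies elsewhere.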
You have a bunch of different things going on and are not providing enough information to debug.
This sounds like you don't have a git repository cloned for your CRITs install. Can you confirm that is true? Creating an empty .git directory was the wrong thing to do; please remove it.
Do not delete the indexes or create a new one. The method described should update the existing index with the number of new files.
Here's the rough outline of what should happen: You have no snugglefish indexes defined in CRITs, so you create one as described in the documentation. This example would create an index named
You then upload 500 samples to CRITs, all with the source FOO, and want to index them. The first thing you need in order to create snugglefish indexes is files to index. To get the files out of CRITs and onto disk where you can index them, you can use:
This will query CRITs for the snugglefish index named The next step is to actually index the files:
This will create indexes in The last step is to tell CRITs that you just indexed some files:
At this point you can remove the files created in
I also don't think the number is doubling in your case. I suspect what is happening is that you are repeatedly telling CRITs that you indexed 500 files (or whatever your number is) because you are not removing them once you index them.
If you need further help, I would humbly request that you provide exact copies of what you are doing: exactly what the commands are, where you are running them from, and exactly what the output is. Also try not to do more than one thing at once; otherwise the problems compound and become difficult to follow, and difficult for me to help with. |
Hi, It is correct that I don't have a git repository cloned for my CRITs install. I just ran "git init", which created an empty .git directory. I have now removed it as you instructed. Can you please advise what I should do next to resolve the "Not a git repository" error? Which repository should I clone? Regarding new samples: in my case someone else uploads new samples every single day. I indexed all samples yesterday, and I imagine that the samples he adds today will not be in that index. Am I right? If so, how can I update my index with all the new samples added today? Is there any way to automate this process? Thank you, |
The missing .git directory is not a hard error. I believe it can be ignored. Once the files are indexed you can remove them from disk, then move the newly created indexes to the directory specified in CRITs. You can do this daily if you want, just automate it with a script. |
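The daily routine described above could be wrapped in a script along these lines. This is only a sketch: `STAGING_DIR`, `fetch_new_files`, `index_files`, `report_indexed`, and `refresh_index` are hypothetical names, and the three step functions are deliberate no-op placeholders to be filled in with the real commands for your install. The point it illustrates is the ordering: count what was staged, index, report that count once, then clean up so the same files are not reported again.

```shell
# Hypothetical staging directory where files pulled from CRITs land.
STAGING_DIR="${STAGING_DIR:-/tmp/snugglefish_staging}"

# Placeholder steps: replace each body with the real command for your
# install. They are intentionally no-ops in this sketch.
fetch_new_files() { :; }   # pull new files from CRITs into STAGING_DIR
index_files()     { :; }   # index the files found in STAGING_DIR
report_indexed()  { :; }   # tell CRITs how many files were indexed ($1)

refresh_index() {
    mkdir -p "$STAGING_DIR" || return 1
    fetch_new_files || return 1

    # Count the files currently staged.
    count=$(find "$STAGING_DIR" -type f | wc -l | tr -d ' ')
    if [ "$count" -eq 0 ]; then
        echo "no new files to index"
        return 0
    fi

    index_files || return 1
    report_indexed "$count" || return 1

    # Remove staged files so the next run does not report them again.
    find "$STAGING_DIR" -type f -exec rm -f {} +
    echo "indexed $count files"
}
```

Calling `refresh_index` once a day from a scheduler of your choice would then cover the "do this daily" case, assuming each placeholder has been replaced with a working command.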
And yes, you are right. The "getnew" command will get all new files since the last time you ran that command. |
So, I just add the newly created index to the already existing one? If at the moment I have one index named CW.index00000000, will the next one be CW.index00000001? |
Also, regarding setting the environment variables CRITS_USER and CRITS_PASSWORD: the "Not a git repository" error makes it impossible for me to enter my username, because it looks like this every time: Thanks, |
Hello, I am receiving "No objects found". What might be the problem? Thanks, |
Any response please? |
Hello, I am receiving the following errors: There are two files I want to index. When I index one file at a time it works fine, but when I try to index the two files together I receive the errors above. |
When indexing new files for an existing index, you need to have the existing index already in the output directory. That way, I believe, the new data will be appended to it. The exact error seems to be a problem when writing to a file. Are you sure the permissions on the output directory are correct? |
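A quick way to answer the permissions question is to try creating a file in the output directory as the same user that runs the indexing command. The sketch below does that; `OUTPUT_DIR` and `can_write_to` are hypothetical names, and the default path is only there so the example runs on its own.

```shell
# Hypothetical output directory; point this at your real index directory.
OUTPUT_DIR="${OUTPUT_DIR:-/tmp/snugglefish_output}"

# Returns 0 if the current user can create and delete a file in $1.
can_write_to() {
    dir=$1
    [ -d "$dir" ] || { echo "missing directory: $dir"; return 1; }
    probe=$(mktemp "$dir/.write_test.XXXXXX" 2>/dev/null) || {
        echo "cannot write to: $dir (running as $(id -un))"
        return 1
    }
    rm -f "$probe"
    echo "writable: $dir"
}

mkdir -p "$OUTPUT_DIR"    # only so the sketch runs standalone
can_write_to "$OUTPUT_DIR"
```

Running the check from the same shell and user account as the indexing command matters, since a directory can be writable for one account and not for another.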
Hello,
When I run this command:
python manage.py runscript -e snugglefish_service index -- -a getnew -n FOO -c 500000 | xargs -I % -n 1 python manage.py runscript -e crits_scripts get_file -- -m % -o /home/wxs/FOO/%
I receive the list of hashes as screen output, but the directory /home/wxs/FOO remains empty.
What might be the cause of this?
Also, it seems that the -e option is not working for me: every time I run this command it asks for authentication.
Thank you.