Suggestion: Wrapper function to run Annotator on larger datasets #37

iainmwallace · 2017-07-07T17:48:10Z

Hi,

It would be useful to have a wrapper function for Annotator for cases where the list of IDs has more than 500 entries. Ideally it would run the queries in parallel.

Cheers,

Iain

iainmwallace · 2017-07-07T18:21:46Z

This is the best I could come up with:

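library(solvebio)  # Annotator.annotate
library(dplyr)     # %>%, filter, as_tibble, bind_rows
library(pbapply)   # pblapply: lapply with a progress bar

# Split the ids into chunks of at most 500
x <- split(my_ids, ceiling(seq_along(my_ids) / 500))

# Annotate only the rows of full_data whose id is in my_id_subset
annotate_data <- function(my_id_subset, full_data, my_fields) {
  mini_dataset <- full_data %>% filter(id %in% my_id_subset)
  annotated_dataset <- Annotator.annotate(mini_dataset, my_fields) %>% as_tibble() # annotation should return lists
  return(annotated_dataset)
}

# a is the full dataset and fields the annotation fields, both defined elsewhere
y <- pblapply(x, annotate_data, a, fields) %>% bind_rows()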

davecap · 2017-07-07T18:23:51Z

Not bad! We actually do something similar in the Python client: https://github.com/solvebio/solvebio-python/blob/master/solvebio/annotate.py#L33

Shouldn't be too hard to get it integrated in the R client.
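
For illustration, here is a minimal sketch of how such a wrapper might look in R. It is not the client's implementation: `annotate_in_chunks` and its parameters are hypothetical names, and the annotation call is passed in as `annotate_fn` (assumed to take a data frame plus fields and return a data frame) so the sketch can run without the SolveBio API. It uses `mclapply` from the base `parallel` package, which forks worker processes and therefore only runs in parallel on Unix-like systems; on Windows `cores` has to stay at 1.

```r
library(parallel)  # ships with base R

# Hypothetical helper, not part of the solvebio R client.
# records:     data frame of records to annotate
# fields:      annotation fields, passed through to annotate_fn unchanged
# annotate_fn: function(records, fields) returning a data frame (assumption);
#              with the solvebio client this role would be played by
#              Annotator.annotate, possibly wrapped to return a data frame
# chunk_size:  maximum number of records sent per call
# cores:       number of worker processes for mclapply (keep at 1 on Windows)
annotate_in_chunks <- function(records, fields, annotate_fn,
                               chunk_size = 500, cores = 1) {
  if (nrow(records) == 0) {
    return(records)
  }

  # Assign each row to a chunk of at most chunk_size rows
  chunk_index <- ceiling(seq_len(nrow(records)) / chunk_size)
  chunks <- split(records, chunk_index)

  # Annotate each chunk; results come back in chunk order
  results <- mclapply(
    chunks,
    function(chunk) annotate_fn(chunk, fields),
    mc.cores = cores
  )

  # Stack the per-chunk results into one data frame
  combined <- do.call(rbind, unname(results))
  rownames(combined) <- NULL
  combined
}

# Stand-in for the real annotation call, for illustration only
fake_annotate <- function(records, fields) {
  records$annotated <- TRUE
  records
}

demo <- data.frame(id = 1:1200)
out <- annotate_in_chunks(demo, fields = list(), annotate_fn = fake_annotate)
```

With the snippet above, something like `annotate_in_chunks(a, fields, Annotator.annotate, cores = 4)` would be the parallel equivalent, provided the annotation result can be row-bound as a data frame. `pblapply` also accepts a `cl` argument, which may be a simpler route if the progress bar matters.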

davecap modified the milestones: v2.0.0 Release, v2.0.1 Release Jul 22, 2017

davecap modified the milestone: v2.0.1 Release Aug 5, 2017
