Add DiscoveryBench to OpenHands' evaluation suite. DiscoveryBench contains 264 tasks collected across 6 diverse domains, such as biology, economics, and sociology. It incorporates discovery workflows from published papers to approximate the real-world challenges faced by researchers.
Do you have thoughts on the technical implementation?
The implementation will consist of:
- An inference script to solve a DiscoveryBench task (goal and datasets)
- A faceted evaluation script to rigorously evaluate the answers
- Documentation for OpenHands users
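To make the first two pieces more concrete, here is a rough sketch of how a task (goal plus datasets) and a per-facet scorer might be represented. Everything in it is an assumption for illustration: `DiscoveryTask`, `build_prompt`, `faceted_score`, and the facet names used in the usage note are hypothetical, and the exact-match comparison is only a placeholder for whatever the real evaluation ends up doing. It does not reflect the actual DiscoveryBench or OpenHands code.

```python
from dataclasses import dataclass, field


@dataclass
class DiscoveryTask:
    """Hypothetical container: a natural-language goal plus the datasets it refers to."""
    goal: str
    dataset_paths: list[str] = field(default_factory=list)


def build_prompt(task: DiscoveryTask) -> str:
    """Format the goal and dataset list into a single instruction string for an agent."""
    datasets = "\n".join(f"- {path}" for path in task.dataset_paths)
    return f"Goal: {task.goal}\nDatasets:\n{datasets}"


def faceted_score(predicted: dict[str, str], gold: dict[str, str]) -> dict[str, float]:
    """Score each gold facet separately.

    Placeholder logic (an assumption): case-insensitive exact match per facet.
    A facet missing from the prediction scores 0.0.
    """
    return {
        facet: float(predicted.get(facet, "").strip().lower() == answer.strip().lower())
        for facet, answer in gold.items()
    }
```

For example, `faceted_score({"context": "X "}, {"context": "x", "variables": "y"})` scores the matched facet as 1.0 and the missing one as 0.0. Keeping one score per facet, rather than a single pass/fail, is what would let the evaluation report where an answer went wrong.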
Additional context
We are working on a PR for this and will seek input from OpenHands contributors to finalize it. Tagging the other contributors to the PR, @Ethan0456, @majumderb, and @neubig, who helped us chart out the integration.
https://github.com/allenai/discoverybench/
https://x.com/mbodhisattwa/status/1811524569410531333