All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- The public methods of `Surrogate` models now operate on dataframes in experimental representation instead of tensors in computational representation
- `Surrogate.posterior` now returns a `Posterior` object
- `param_bounds_comp` of `SearchSpace`, `SubspaceDiscrete` and `SubspaceContinuous` has been replaced with `comp_rep_bounds`, which returns a dataframe
- `py.typed` file to enable the use of type checkers on the user side
- `GaussianSurrogate` base class for surrogate models with Gaussian posteriors
- `comp_rep_columns` property for `Parameter`, `SearchSpace`, `SubspaceDiscrete` and `SubspaceContinuous` classes
- New mechanisms for surrogate input/output scaling configurable per class
- `SurrogateProtocol` as an interface for user-defined surrogate architectures
- The transition from experimental to computational representation no longer happens in the recommender but in the surrogate
- Fallback models created by `catch_constant_targets` are stored outside the surrogate
- `to_tensor` now also handles `numpy` arrays
- `MIN` mode of `NumericalTarget` is now implemented via the acquisition function instead of negating the computational representation
- `CategoricalParameter` and `TaskParameter` no longer incorrectly coerce a single string input to categories/tasks
- `farthest_point_sampling` no longer depends on the provided point order
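
One way a farthest point sampling routine can be made independent of the input order is to seed the selection deterministically instead of starting from the first provided point. The following is a minimal pure-Python sketch of that idea, not BayBE's implementation: the function body, the seeding rule (start from the most distant pair) and the sample points are assumptions chosen for illustration.

```python
from itertools import combinations
from math import dist


def farthest_point_sampling(points, n_samples):
    """Greedy max-min selection seeded with the most distant pair (sketch)."""
    if not 1 <= n_samples <= len(points):
        raise ValueError("n_samples must be between 1 and the number of points")
    if len(points) == 1:
        return [0]
    # Seed with the two points that are farthest apart. This choice does not
    # depend on the order of `points` as long as there are no distance ties.
    i, j = max(
        combinations(range(len(points)), 2),
        key=lambda pair: dist(points[pair[0]], points[pair[1]]),
    )
    selected = [i, j][:n_samples]
    while len(selected) < n_samples:
        remaining = [k for k in range(len(points)) if k not in selected]
        # Add the point whose nearest selected neighbour is as far away as possible.
        best = max(
            remaining,
            key=lambda k: min(dist(points[k], points[s]) for s in selected),
        )
        selected.append(best)
    return selected


# Hypothetical data: the same points in two different orders.
points = [(0.0, 0.0), (1.0, 0.2), (5.0, 5.1), (0.1, 4.0), (2.5, 2.4)]
shuffled = [points[3], points[0], points[4], points[2], points[1]]

chosen = {points[k] for k in farthest_point_sampling(points, 3)}
chosen_shuffled = {shuffled[k] for k in farthest_point_sampling(shuffled, 3)}
```

Both calls select the same three points. With exact distance ties the result can still depend on the order, so a stricter variant would need an explicit tie-breaking rule.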
- `register_custom_architecture` decorator
- `Scaler` and `DefaultScaler` classes
- The role of `register_custom_architecture` has been taken over by `baybe.surrogates.base.SurrogateProtocol`
- Providing an explicit `batch_size` is now mandatory when asking for recommendations
- `RecommenderProtocol.recommend` now accepts an optional `Objective`
- `RecommenderProtocol.recommend` now expects training data to be provided as a single dataframe in experimental representation instead of two separate dataframes in computational representation
- `Parameter.is_numeric` has been replaced with `Parameter.is_numerical`
- `DiscreteParameter.transform_rep_exp2comp` has been replaced with `DiscreteParameter.transform`
- `filter_attributes` has been replaced with `match_attributes`
- `Surrogate` base class now exposes a `to_botorch` method
- `SubspaceDiscrete.to_searchspace` and `SubspaceContinuous.to_searchspace` convenience constructors
- Validators for `Campaign` attributes
- `_optional` subpackage for managing optional dependencies
- New acquisition functions for active learning: `qNIPV` (negative integrated posterior variance) and `PSTD` (posterior standard deviation)
- Acquisition function: `qKG` (knowledge gradient)
- Abstract `ContinuousNonlinearConstraint` class
- Abstract `CardinalityConstraint` class and `DiscreteCardinalityConstraint`/`ContinuousCardinalityConstraint` subclasses
- Uniform sampling mechanism for continuous spaces with cardinality constraints
- `register_hooks` utility enabling user-defined augmentation of arbitrary callables
- `transform` methods of `SearchSpace`, `SubspaceDiscrete` and `SubspaceContinuous` now take additional `allow_missing` and `allow_extra` keyword arguments
- More details to the transfer learning user guide
- Activated doctests
- `SubspaceDiscrete.from_parameter`, `SubspaceContinuous.from_parameter`, `SubspaceContinuous.from_product` and `SearchSpace.from_parameter` convenience constructors
- `DiscreteParameter.to_subspace`, `ContinuousParameter.to_subspace` and `Parameter.to_searchspace` convenience constructors
- Utilities for permutation and dependency data augmentation
- Validation and translation tests for kernels
- `BasicKernel` and `CompositeKernel` base classes
- Activated `pre-commit.ci` with auto-update
- User guide for active learning
- Polars expressions for `DiscreteSumConstraint`, `DiscreteProductConstraint`, `DiscreteExcludeConstraint`, `DiscreteLinkedParametersConstraint` and `DiscreteNoLabelDuplicatesConstraint`
- Discrete search space Cartesian product can be created lazily via Polars
- Examples demonstrating the `register_hooks` utility: basic registration mechanism, monitoring the probability of improvement, and automatic campaign stopping
- Documentation building now uses a lockfile to fix the exact environment
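
The general idea behind augmenting an arbitrary callable with user-defined hooks, as the `register_hooks` entries above describe, can be sketched in plain Python. The helper `with_hooks` below is hypothetical and does not reproduce the signature or behavior of BayBE's `register_hooks`. It assumes that hooks receive the same arguments as the wrapped callable and that their return values are ignored.

```python
from functools import wraps


def with_hooks(target, pre_hooks=(), post_hooks=()):
    """Return a wrapper that runs hooks before and after `target` (sketch)."""

    @wraps(target)
    def wrapper(*args, **kwargs):
        for hook in pre_hooks:
            hook(*args, **kwargs)  # assumption: hooks share the target's signature
        result = target(*args, **kwargs)
        for hook in post_hooks:
            hook(*args, **kwargs)
        return result

    return wrapper


# Hypothetical stand-in for a recommendation call.
def recommend(batch_size):
    return list(range(batch_size))


calls = []


def log_request(batch_size):
    """Example hook that records each requested batch size."""
    calls.append(batch_size)


monitored = with_hooks(recommend, post_hooks=[log_request])
result = monitored(3)
```

A monitoring or stopping criterion can be attached in the same way, for example a post-hook that raises an exception once a user-defined condition is met.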
- Passing an `Objective` to `Campaign` is now optional
- `GaussianProcessSurrogate` models are no longer wrapped when cast to BoTorch
- Restrict upper versions of main dependencies, motivated by major `numpy` release
- Sampling methods in `qNIPV` and `BotorchRecommender` are now specified via `DiscreteSamplingMethod` enum
- `Interval` class now supports degenerate intervals containing only one element
- `add_fake_results` now directly processes `Target` objects instead of a `Campaign`
- `path` argument in plotting utility is now optional and defaults to `Path(".")`
- `UnusedObjectWarning` by non-predictive recommenders is now ignored during simulations
- The default kernel factory now avoids strong jumps by linearly interpolating between two fixed low and high dimensional prior regimes
- The previous default kernel factory has been renamed to `EDBOKernelFactory` and now fully reflects the original logic
- The default acquisition function has been changed from `qEI` to `qLogEI` for improved numerical stability
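
The kernel factory entry above describes replacing a hard switch between prior regimes with a linear interpolation. A minimal sketch of such an interpolation follows. The function name, the dimension thresholds and the parameter values are hypothetical placeholders, not the values used by the actual default kernel factory.

```python
def interpolate_prior_parameter(n_dims, low_dim, high_dim, low_value, high_value):
    """Linearly interpolate a prior parameter between two fixed regimes (sketch)."""
    if n_dims <= low_dim:
        return low_value  # at or below the low-dimensional regime
    if n_dims >= high_dim:
        return high_value  # at or above the high-dimensional regime
    weight = (n_dims - low_dim) / (high_dim - low_dim)
    return (1 - weight) * low_value + weight * high_value


# Hypothetical regimes: value 1.2 up to 8 dimensions, value 2.5 from 50 dimensions.
at_low = interpolate_prior_parameter(8, 8, 50, 1.2, 2.5)
halfway = interpolate_prior_parameter(29, 8, 50, 1.2, 2.5)
at_high = interpolate_prior_parameter(50, 8, 50, 1.2, 2.5)
```

Because the value changes continuously with the number of dimensions, adding a single parameter to the search space no longer causes a jump in the prior.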
- Support for Python 3.9 removed due to new BoTorch requirements and guidelines from Scientific Python
- Linter `typos` for spellchecking
- `sequential` flag of `SequentialGreedyRecommender` is now set to `True`
- Serialization bug related to class layout of `SKLearnClusteringRecommender`
- `MetaRecommender`s no longer trigger warnings about non-empty objectives or measurements when calling a `NonPredictiveRecommender`
- Bug introduced in 0.9.0 (PR #221, commit 3078f3), where arguments to `to_gpytorch` are not passed on to the GPyTorch kernels
- Positive-valued kernel attributes are now correctly handled by validators and hypothesis strategies
- As a temporary workaround to compensate for missing `IndexKernel` priors, `fit_gpytorch_mll_torch` is used instead of `fit_gpytorch_mll` when a `TaskParameter` is present, which acts as regularization via early stopping during model fitting
- `SequentialGreedyRecommender` class replaced with `BotorchRecommender`
- `SubspaceContinuous.samples_random` has been replaced with `SubspaceContinuous.sample_uniform`
- `SubspaceContinuous.samples_full_factorial` has been replaced with `SubspaceContinuous.sample_from_full_factorial`
- Passing a dataframe via the `data` argument to the `transform` methods of `SearchSpace`, `SubspaceDiscrete` and `SubspaceContinuous` is no longer possible. The dataframe must now be passed as a positional argument.
- The new `allow_extra` flag is automatically set to `True` in `transform` methods of search space classes when left unspecified
- `Interval.is_finite` property
- Specifying target configs without type information
- Specifying parameters/constraints at the top level of campaign configs
- Passing `numerical_measurements_must_be_within_tolerance` to `Campaign`
- `batch_quantity` argument
- Passing `allow_repeated_recommendations` or `allow_recommending_already_measured` to `MetaRecommender` (or former `Strategy`)
- `*Strategy` classes and `baybe.strategies` subpackage
- Specifying `MetaRecommender` (or former `Strategy`) configs without type information
- Discrete search space memory estimate is now natively represented in bytes
- Non-GP surrogates not working with `deepcopy` and the simulation package due to slotted base class
- Datatype inconsistencies for various parameters' `values` and `comp_df` and `SubSelectionCondition`'s `selection` related to floating point precision
- Class hierarchy for objectives
- `AdditiveKernel`, `LinearKernel`, `MaternKernel`, `PeriodicKernel`, `PiecewisePolynomialKernel`, `PolynomialKernel`, `ProductKernel`, `RBFKernel`, `RFFKernel`, `RQKernel`, `ScaleKernel` classes
- `KernelFactory` protocol enabling context-dependent construction of kernels
- Preset mechanism for `GaussianProcessSurrogate`
- `hypothesis` strategies and roundtrip test for kernels, constraints, objectives, priors and acquisition functions
- New acquisition functions: `qSR`, `qNEI`, `LogEI`, `qLogEI`, `qLogNEI`
- `GammaPrior`, `HalfCauchyPrior`, `NormalPrior`, `HalfNormalPrior`, `LogNormalPrior` and `SmoothedBoxPrior` classes
- Possibility to deserialize classes from optional class name abbreviations
- Basic deserialization tests using different class type specifiers
- Serialization user guide
- Environment variables user guide
- Utility for estimating memory requirements of discrete product search space
- `mypy` for search space and objectives
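
To illustrate why a memory estimate for a discrete product search space is useful, the following back-of-the-envelope sketch multiplies the number of parameter combinations by the number of columns and the size of one stored value. It is not the utility mentioned above. The function name, the assumption of one column per parameter and the 8 bytes per value are all illustrative choices.

```python
from math import prod


def estimate_product_space_bytes(parameter_sizes, bytes_per_value=8):
    """Rough size in bytes of the full Cartesian product table (sketch)."""
    n_rows = prod(parameter_sizes)  # one row per parameter combination
    n_cols = len(parameter_sizes)  # assumption: one column per parameter
    return n_rows * n_cols * bytes_per_value


# Hypothetical space: three parameters with 10, 20 and 5 possible values.
estimate = estimate_product_space_bytes([10, 20, 5])
```

The row count grows multiplicatively with every added parameter, so checking such an estimate before materializing the table can prevent running out of memory. Encodings that expand a parameter into several columns would increase the estimate further.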
- Reorganized acquisition.py into `acquisition` subpackage
- Reorganized simulation.py into `simulation` subpackage
- Reorganized gaussian_process.py into `gaussian_process` subpackage
- Acquisition functions are now their own objects
- `acquisition_function_cls` constructor parameter renamed to `acquisition_function`
- User guide now explains the new objective classes
- Telemetry deactivation warning is only shown to developers
- `torch`, `gpytorch` and `botorch` are lazy-loaded for improved startup time
- If an exception is encountered during simulation, incomplete results are returned with a warning instead of passing through the uncaught exception
- Environment variables `BAYBE_NUMPY_USE_SINGLE_PRECISION` and `BAYBE_TORCH_USE_SINGLE_PRECISION` to enforce single-precision usage
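
Boolean environment variables such as the two precision switches above are typically read once and mapped to a setting. The sketch below shows one possible way to do this. The helper `read_flag` and the set of accepted strings are assumptions, and the way the package actually parses these variables may differ.

```python
import os

# Assumption: these spellings are accepted; the real parser may accept others.
TRUTHY = {"1", "true", "yes", "on"}
FALSY = {"0", "false", "no", "off", ""}


def read_flag(name, default=False):
    """Interpret an environment variable as a boolean switch (sketch)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in TRUTHY:
        return True
    if value in FALSY:
        return False
    raise ValueError(f"Cannot interpret {name}={raw!r} as a boolean")


# Set the variable for this process only, then derive a float width from it.
os.environ["BAYBE_NUMPY_USE_SINGLE_PRECISION"] = "True"
use_single = read_flag("BAYBE_NUMPY_USE_SINGLE_PRECISION")
float_bits = 32 if use_single else 64
```

In practice such variables are usually set in the shell or at the very top of a script, before the package that reads them is imported.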
- `model_params` attribute from `Surrogate` base class, `GaussianProcessSurrogate` and `CustomONNXSurrogate`
- Dependency on `requests` package
- `n_task_params` now evaluates to 1 if `task_idx == 0`
- Simulation no longer fails in `ignore` mode when lookup dataframe contains duplicate parameter configurations
- Simulation no longer fails for targets in `MATCH` mode
- `closest_element` now works for array-like input of all kinds
- Structuring concrete subclasses no longer requires providing an explicit `type` field
- `_target(s)` attributes of `Objectives` are now de-/serialized without leading underscore to support user-friendly serialization strings
- Telemetry does not execute any code if it was disabled
- Running simulations no longer alters the states of the global random number generators
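
Leaving global random number generators untouched usually means saving their state before seeding and restoring it afterwards. The following sketch shows the pattern for Python's built-in `random` module only. The context manager `temporary_seed` is hypothetical, and a complete solution would presumably have to treat the NumPy and PyTorch generators in the same way.

```python
import random
from contextlib import contextmanager


@contextmanager
def temporary_seed(seed):
    """Seed the global generator inside the block, then restore its state."""
    state = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        random.setstate(state)  # restored even if the block raises


# Reference: the first draw after seeding with 0.
random.seed(0)
expected = random.random()

# Same seed, but with a "simulation" drawing numbers in between.
random.seed(0)
with temporary_seed(1337):
    simulated = random.random()
observed = random.random()
```

Here `observed` equals `expected`, which shows that the draws made inside the block did not advance the surrounding random stream.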
- The former `baybe.objective.Objective` class has been replaced with `SingleTargetObjective` and `DesirabilityObjective`
- `acquisition_function_cls` constructor parameter for `BayesianRecommender`
- `VarUCB` and `qVarUCB` acquisition functions
- `BayBE` class
- `baybe.surrogate` module
- `baybe.targets.Objective` class
- `baybe.strategies.Strategy` class
- Simulation user guide
- Example for transfer learning backtesting utility
- `pyupgrade` pre-commit hook
- Better human readable `__str__` representation of objective and targets
- Alternative dataframe deserialization from `pd.DataFrame` constructors
- More detailed and sophisticated search space user guide
- Support for Python 3.12
- Upgraded syntax to Python 3.9
- Bumped `onnx` version to fix vulnerability
- Increased threshold for low-dimensional GP priors
- Replaced `fit_gpytorch_mll_torch` with `fit_gpytorch_mll`
- Use `tox-uv` in pipelines
- `telemetry` dependency is no longer a group (enables Poetry installation)
- Better human readable `__str__` representation of campaign
- README now contains an example on substance encoding results
- Transfer learning user guide
- `from_simplex` constructor now also takes and applies optional constraints
- Full lookup backtesting example now tests different substance encodings
- Replaced unmaintained `mordred` dependency with `mordredcommunity`
- `SearchSpace`s now use `ndarray` instead of `Tensor`
- `from_simplex` is now efficiently validated in `Campaign.validate_config`
- BoTorch dependency bumped to `>=0.9.3`
- Workaround for BoTorch hybrid recommender data type
- Support for Python 3.8
- Subpackages for the available recommender types
- Multi-style plotting capabilities for generated example plots
- JSON file for plotting themes
- Smoke testing in relevant tox environments
- `ContinuousParameter` base class
- New environment variable `BAYBE_CACHE_DIR` that can be used to customize the disk cache directory or turn off disk caching entirely
- Options to control the number of nonzero parameters in `SubspaceDiscrete.from_simplex`
- Temporarily ignore ONNX vulnerabilities
- Better human readable `__str__` representation of search spaces
- `pretty_print_df` function for printing shortened versions of dataframes
- Basic Transfer Learning example
- Repo now has reminders (https://github.com/marketplace/actions/issue-reminder) enabled
- `mypy` for recommenders
- `Recommender`s now share their core logic via their base class
- Remove progress bars in examples
- Strategies are now called `MetaRecommender`s and are part of the `recommenders.meta` module
- `Recommender`s are now called `PureRecommender`s and are part of the `recommenders.pure` module
- `strategy` keyword of `Campaign` renamed to `recommender`
- `NaiveHybridRecommender` renamed to `NaiveHybridSpaceRecommender`
- Unhandled exception in telemetry when username could not be inferred on Windows
- Metadata is now correctly updated for hybrid spaces
- Unintended deactivation of telemetry due to import problem
- Line wrapping in examples
- `TwoPhaseStrategy`, `SequentialStrategy` and `StreamingSequentialStrategy` have been replaced with their new `MetaRecommender` versions
- Copy button for code blocks in documentation
- `mypy` for campaign, constraints and telemetry
- Top-level example summaries
- `RecommenderProtocol` as common interface for `Strategy` and `Recommender`
- `SubspaceDiscrete.from_simplex` convenience constructor
- Order of README sections
- Imports from top level `baybe.utils` no longer possible
- Renamed `utils.numeric` to `utils.numerical`
- Optional `chem` dependencies are lazily imported, improving startup time
- Several minor issues in documentation
- Visibility and constructor exposure of `Campaign` attributes that should be private
- `TaskParameter`s no longer disappear from computational representation when the search space contains only one task parameter value
- Failing `baybe` import from environments containing only core dependencies caused by eagerly loading `chem` dependencies
- `tox` `coretest` now uses correct environment and skips unavailable tests
- Basic serialization example no longer requires optional `chem` dependencies
- Detailed headings in table of contents of examples
- Passing `numerical_measurements_must_be_within_tolerance` to the `Campaign` constructor is no longer supported. Instead, `Campaign.add_measurements` now takes an additional parameter to control the behavior.
- `batch_quantity` replaced with `batch_size`
- `allow_repeated_recommendations` and `allow_recommending_already_measured` are now attributes of `Recommender` and no longer attributes of `Strategy`
- Target enums
- `mypy` for targets and intervals
- Tests for code blocks in README and user guides
- `hypothesis` strategies and roundtrip tests for targets, intervals, and dataframes
- De-/serialization of target subclasses via base class
- Docs building check now part of CI
- Automatic formatting checks for code examples in documentation
- Deserialization of classes with classmethod constructors can now be customized by providing an optional `constructor` field
- `SearchSpace.from_dataframe` convenience constructor
- Renamed `bounds_transform_func` target attribute to `transformation`
- `Interval.is_bounded` now implements the mathematical definition of boundedness
- Moved and renamed target transform utility functions
- Examples have two levels of headings in the table of contents
- Fixed order of examples in the table of contents
- `DiscreteCustomConstraint` validator now expects dataframe instead of series
- `ignore_example` flag builds but does not execute examples when building documentation
- New user guide versions for campaigns, targets and objectives
- Binarization of dataframes now happens via pickling
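
In the mathematical sense referred to by the `Interval.is_bounded` entry above, an interval is bounded only if both of its endpoints are finite. The sketch below illustrates that definition with a small stand-alone class. `SketchInterval` and its properties are hypothetical and are not the package's `Interval` class.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchInterval:
    """Closed interval with possibly infinite endpoints (illustration only)."""

    lower: float = -math.inf
    upper: float = math.inf

    def __post_init__(self):
        if self.lower > self.upper:
            raise ValueError("lower bound must not exceed upper bound")

    @property
    def is_bounded(self) -> bool:
        # Bounded means that both endpoints are finite.
        return math.isfinite(self.lower) and math.isfinite(self.upper)

    @property
    def is_half_bounded(self) -> bool:
        # Exactly one of the two endpoints is finite.
        return math.isfinite(self.lower) != math.isfinite(self.upper)


unit = SketchInterval(0.0, 1.0)
ray = SketchInterval(0.0, math.inf)
real_line = SketchInterval()
```

Under this definition `unit` is bounded, `ray` is only half-bounded, and `real_line` is neither.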
- Wrong use of `tolerance` argument in constraints user guide
- Errors with generics and type aliases in documentation
- Deduplication bug in substance_data `hypothesis` strategy
- Use pydoclint as flake8 plugin and not as a stand-alone linter
- Margins in documentation for desktop and mobile version
- `Interval`s can now also be deserialized from a bounds iterable
- `SubspaceDiscrete` and `SubspaceContinuous` now have de-/serialization methods
- Conda install instructions and version badge
- Early fail for different Python versions in regular pipeline
- `Interval.is_finite` replaced with `Interval.is_bounded`
- Specifying target configs without explicit type information is deprecated
- Specifying parameters/constraints at the top level of a campaign configuration JSON is deprecated. Instead, an explicit `searchspace` field must be provided with an optional `constructor` entry
- Release pipeline now also publishes source distributions
- `hypothesis` strategies and tests for parameters package
- Reworked validation tests for parameters package
- `SubstanceParameter` now collects inconsistent user input in an `ExceptionGroup`
- Link handling in documentation
- GitHub CI pipelines
- GitHub documentation pipeline
- Optional `--force` option for building the documentation despite errors
- Enabled passing optional arguments to `tox -e docs` calls
calls - Logo and banner images
- Project metadata for pyproject.toml
- PyPI release pipeline
- Favicon for homepage
- More literature references
- First drafts of first user guides
- Reworked README for GitHub landing page
- Now has concise contribution guidelines
- Use Furo theme for documentation
- `--debug` flag for documentation building
- Script for building HTML documentation and corresponding `tox` environment
- Linter `typos` for spellchecking
- Parameter encoding enums
- `mypy` for parameters package
- `tox` environments for `mypy`
- Replaced `pylint`, `flake8`, `µfmt` and `usort` with `ruff`
- Markdown based documentation replaced with HTML based documentation
- `encoding` is no longer a class variable
- Now installed with correct `pandas` dependency flag
- `comp_df` column names for `CustomDiscreteParameter` are now safe
- `Raises` section for validators and corresponding contributing guideline
- Bring your own model: surrogate classes for custom model architectures and pre-trained ONNX models
- Test module for deprecation warnings
- Option to control the switching point of `TwoPhaseStrategy` (former `Strategy`)
- `SequentialStrategy` and `StreamingSequentialStrategy` classes
- Telemetry env variable `BAYBE_TELEMETRY_VPN_CHECK` turning the initial connectivity check on/off
- Telemetry env variable `BAYBE_TELEMETRY_VPN_CHECK_TIMEOUT` for setting the connectivity check timeout
- Reorganized modules into subpackages
- Serialization no longer relies on cattrs' global converter
- Refined (un-)structuring logic
- Telemetry env variable `BAYBE_TELEMETRY_HOST` renamed to `BAYBE_TELEMETRY_ENDPOINT`
- Telemetry env variable `BAYBE_DEBUG_FAKE_USERHASH` renamed to `BAYBE_TELEMETRY_USERNAME`
- Telemetry env variable `BAYBE_DEBUG_FAKE_HOSTHASH` renamed to `BAYBE_TELEMETRY_HOSTNAME`
- Bumped cattrs version
- Now supports Python 3.11
- Removed `pyarrow` version pin
- `TaskParameter` added to serialization test
- Deserialization (e.g. from config) no longer silently drops unknown arguments
- `BayBE` class replaced with `Campaign`
- `baybe.surrogate` replaced with `baybe.surrogates`
- `baybe.targets.Objective` replaced with `baybe.objective.Objective`
- `baybe.strategies.Strategy` replaced with `baybe.strategies.TwoPhaseStrategy`
- Linear in-/equality constraints over continuous parameters
- Constrained optimization for `SequentialGreedyRecommender`
- `RandomRecommender` now supports linear in-/equality constraints via polytope sampling
- Include linting for all functions
- Rewrite functions to distinguish between private and public ones
- Unreachable telemetry endpoints now automatically disable telemetry and no longer cause any data submission loops
- `add_fake_results` utility now considers potential target bounds
- Constraint names have been refactored to indicate whether they operate on discrete or continuous parameters
- Random recommendation failing for small discrete (sub-)spaces
- Deserialization issue with `TaskParameter`
- `TaskParameter` for multitask modelling
- Basic transfer learning capability using multitask kernels
- Advanced simulation mechanisms for transfer learning and search space partitioning
- Extensive docstring documentation in all files
- Autodoc using sphinx
- Script for automatic code documentation
- New `tox` environments for a full and a core-only pytest run
- Discrete subspaces require unique indices
- Simulation function signatures are redesigned (but largely backwards compatible)
- Docstring contents and style (numpy -> google)
- Regrouped additional dependencies
- Test environments for multiple Python versions via `tox`
- Removed `environment.yml`
- Telemetry host endpoint is now flexible via the environment variable `BAYBE_TELEMETRY_HOST`
- Inference for `__version__`
- Vulnerability check via `pip-audit`
- `tests` dependency group
- Removed no longer required `fsspec` dependency
- Scipy vulnerability by bumping version to 1.10.1
- Missing `pyarrow` dependency
- `from_dataframe` convenience constructors for discrete and continuous subspaces
- `from_bounds` convenience constructor for continuous subspaces
- `empty` convenience constructors for discrete and continuous subspaces
- `baybe`, `strategies` and `utils` namespace for convenient imports
- Simple test for config validation
- `VarUCB` and `qVarUCB` acquisition functions emulating maximum variance for active learning
- Surrogate model serialization
- Surrogate model parameter passing
- Renamed `create` constructors to `from_product`
- Renamed `empty` checks for subspaces to `is_empty`
- Fixed inconsistent class names in surrogate.py
- Fixed inconsistent class names in parameters.py
- Cached recommendations are now private
- Parameters, targets and objectives are now immutable
- Adjusted comments in example files
- Accelerated the slowest tests
- Removed try blocks from config examples
- Upgraded numpy requirement to >= 1.24.1
- Requires `protobuf<=3.20.3`
- `SearchSpace` parameters in surrogate models are now handled in `fit`
- Dataframes are encoded in binary for serialization
- `comp_rep` is loaded directly from the serialization string
- Include scaling in FPS recommender
- Support for pandas>=2.0.0
- Constraints serialization
- A maximum of one `DependenciesConstraint` is allowed
- Bumped numpy and matplotlib versions
- Code coverage check with pytest-cov
- Hybrid mode for `SequentialGreedyRecommender`
- Removed support for infinite parameter bounds
- Removed not yet implemented MULTI objective mode
- Changelog assert in Azure pipeline
- Bug: telemetry could not be fully deactivated
- `Interval` class for representing parameter/target bounds
- Activated mypy for the first few modules and fixed their type issues
- Automatic (de-)serialization and `SerialMixin` class
- Basic serialization example, demo and tests
- Mechanisms for loading and validating config files
- Telemetry via OpenTelemetry
- More detailed package installation info
- Fallback mechanism for `NonPredictiveRecommender`
- Introduce naive hybrid recommender
- Switched from pydantic to attrs in all modules except constraints.py
- Removed subclass initialization hooks and `type` attribute
- Refactored class attributes and their conversion/validation/initialization
- Removed no longer needed `HashableDict` class
- Refactored strategy and recommendation module structures
- Replaced dict-based configuration logic with object-based logic
- Overall versioning scheme and version inference for telemetry
- No longer using private telemetry imports
- Fixed package versions for dev tools
- Revised "Getting Started" section in README.md
- Revised examples
- Telemetry no longer crashing when package was not installed
- Tests for different search space types and their compatible recommenders
- Initial strategies converted to recommenders
- Config keyword `initial_strategy` replaced by `initial_recommender_cls`
- Config keywords for the clustering recommenders changed from `x` to `CLUSTERING_x`
- scikit-learn-extra is now an optional dependency in the [extra] group
- Type identifiers of greedy recommenders changed to 'SEQUENTIAL_GREEDY_x'
- Parameter bounds now only contain dimensions that actually appear in the search space
- Parsing for continuous parameters
- Caching of recommendations to avoid unnecessary computations
- Strategy support for hybrid spaces
- Custom discrete constraint with user-provided validator
- Parameter class hierarchy
- `SearchSpace` now has a discrete and a continuous subspace
- Model fit now done upon requesting recommendations
- Updated BoTorch and GPyTorch versions are also used in pyproject.toml
- `SearchSpace` class
- Code testing with pytest
- Option to specify initial data for backtesting simulations
- SequentialGreedyRecommender class
- Switched from miniconda to micromamba in Azure pipeline
- BoTorch version upgrade to fix critical bug (pytorch/botorch#1454)
- Parameters cannot be initialized with duplicate values
- Initial strategy: Farthest Point Sampling
- Initial strategy: Partitioning Around Medoids
- Initial strategy: K-means
- Initial strategy: Gaussian Mixture Model
- Constraints and conditions for discrete parameters
- Data scaling functionality
- Decorator for automatic model scaling
- Decorator for handling constant targets
- Decorator for handling batched model input
- Surrogate model: Mean prediction
- Surrogate model: Random forest
- Surrogate model: NGBoost
- Surrogate model: Bayesian linear
- Save/load functionality for BayBE objects
- UCB now usable as acquisition function, hard-set beta parameter to 1.0
- Temporary GP priors now exactly reproduce EDBO setting
- Code skeleton with a central object to access functionality
- Basic parser for categorical parameters with one-hot encoding
- Basic parser for discrete numerical parameters
- Azure pipeline for code formatting and linting
- Single-task Gaussian process strategy
- Streamlit dashboard for comparing single-task strategies
- Input functionality to read measurements including automatic matching to search space
- Integer encoding for categorical parameters
- Parser for numerical discrete parameters
- Single numerical target with Min and Max mode
- Recommendation functionality
- Parameter scaling depending on parameter types and user-chosen scalers
- Noise and fake-measurement utilities
- Internal metadata storing various info about datapoints in the search space
- BayBE options controlling recommendation and data addition behavior
- Config parsing and validation using pydantic
- Global random seed control
- Strategy connection with BayBE object
- Custom parameters as labels with user-provided encodings
- Substance parameters which are encoded via cheminformatics descriptors
- Data cleaning utilities useful for descriptors
- Simulation capabilities for testing the package on existing data
- Parsing and preprocessing for multiple targets / desirability ansatz
- Basic README file
- Automatic publishing of tagged versions
- Caching of experimental parameters and chemical descriptors
- Choices for acquisition functions and their usage with arbitrary surrogate models
- Temporary logic for selecting GP priors