Pickling a SageMaker step corrupts part of the object. #3272

Closed
DRKolev-code opened this issue Aug 1, 2022 · 3 comments
Labels
bug component: pipelines Relates to the SageMaker Pipeline Platform

Comments

@DRKolev-code

Describe the bug
This issue is related to boto/boto3#3365. I have been trying to pickle and unpickle a SageMaker step. The part relevant to this repo is that, after unpickling, step.step_type becomes CONDITION regardless of what it was originally:

Original:
ProcessingStep(name='calls', display_name=None, description='Processing missing calldata.', step_type=<StepTypeEnum.PROCESSING: 'Processing'>, depends_on=None)

Becomes:
ProcessingStep(name='calls', display_name=None, description='Processing missing calldata.', step_type=<StepTypeEnum.CONDITION: 'Condition'>, depends_on=None)

To reproduce
I cannot share the complete code for the ProcessingStep, but I think any initialization would reproduce the bug.

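import copyreg
import json
import ssl
from pathlib import Path

from dill import dumps, loads
# Assumption: the report does not say where _reduce_socket and _rebuild_socket
# come from; the private helpers of the same name in multiprocessing.reduction
# are used here.
from multiprocessing.reduction import _rebuild_socket, _reduce_socket
from sagemaker.workflow.steps import ProcessingStep


def save_sslcontext(obj):
    # Rebuild an SSLContext from its protocol only.
    return obj.__class__, (obj.protocol,)


copyreg.pickle(ssl.SSLSocket, _reduce_socket, _rebuild_socket)
copyreg.pickle(ssl.SSLContext, save_sslcontext)

# Creating a ProcessingStep. I cannot share the script, but I imagine any processing step will behave the same way.
# script_processor, cache_config and catalog_variables are defined elsewhere and are not shown.
step = ProcessingStep(
    name='calls',
    description='Processing missing calldata.',
    processor=script_processor,
    cache_config=cache_config,
    code=(Path(__file__).parent / 'scripts' / 'calls.py').as_posix(),
    job_arguments=[
        '--catalog_variables',
        json.dumps(catalog_variables),
    ],
)

print(step)
print(loads(dumps(step)))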
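
For readers unfamiliar with copyreg, the following is a minimal, self-contained sketch of what the save_sslcontext registration above does. It uses the standard pickle module instead of dill, and the name reduce_sslcontext is only illustrative. Rebuilding from the protocol alone is carried over from the snippet above, so the restored context does not keep loaded certificates or other settings.

```python
import copyreg
import pickle
import ssl


def reduce_sslcontext(ctx):
    # Tell pickle to rebuild the context by calling SSLContext(protocol).
    # Certificates, ciphers and verification options are NOT carried over.
    return ctx.__class__, (ctx.protocol,)


# Register the reducer for ssl.SSLContext objects in this process.
copyreg.pickle(ssl.SSLContext, reduce_sslcontext)

original = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
restored = pickle.loads(pickle.dumps(original))

print(type(restored).__name__, restored.protocol == original.protocol)
```

Without the registration, pickling an SSLContext raises a TypeError, which is presumably why the workaround is needed for objects that hold a boto client.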

Expected behavior
Once unpickled, the object should not have corrupted fields; step.step_type should not change.
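
One way to check this expectation is to compare attributes before and after a round trip. The sketch below is an assumption-laden illustration, not part of the SDK: changed_fields is a hypothetical helper, it uses the standard pickle module by default, and a SimpleNamespace stands in for a real step.

```python
import pickle
from types import SimpleNamespace


def changed_fields(obj, dumps=pickle.dumps, loads=pickle.loads):
    """Return {attribute: (before, after)} for attributes altered by a round trip."""
    restored = loads(dumps(obj))
    before, after = vars(obj), vars(restored)
    return {
        name: (value, after.get(name))
        for name, value in before.items()
        if after.get(name) != value
    }


# Hypothetical stand-in for a step; a real check would pass the ProcessingStep.
step_like = SimpleNamespace(name="calls", step_type="Processing")
print(changed_fields(step_like))
```

With dill, the same helper could be called as changed_fields(step, dumps=dill.dumps, loads=dill.loads). Attributes whose classes do not define equality will be reported as changed even when they are intact, so the result is a starting point rather than proof of corruption.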

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.98.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): ProcessingStep, TuningStep, probably all of the other ones.
  • Framework version: na
  • Python version: 3.10
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): Either

Additional context
To pickle a SageMaker step, you sometimes have to add the following code as well. This is due to botocore and is related to boto/boto3#3365.

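# Changes I made to get `dill` to work.
import copyreg
import ssl

import botocore.client
import botocore.errorfactory
# Assumption: the report does not say where _reduce_socket and _rebuild_socket
# come from; the private helpers in multiprocessing.reduction are used here.
from multiprocessing.reduction import _rebuild_socket, _reduce_socket


def __getattr__client(self, item):
    # Note: __getattr__ only runs after normal lookup has failed, so this guard
    # normally raises every time and the code below it is rarely reached.
    if item not in vars(self):
        raise AttributeError(item)

    event_name = "getattr.%s.%s" % (self._service_model.service_id.hyphenize(), item)
    handler, event_response = self.meta.events.emit_until_response(
        event_name, client=self
    )

    if event_response is not None:
        return event_response

    raise AttributeError(
        "'%s' object has no attribute '%s'" % (self.__class__.__name__, item)
    )


botocore.client.BaseClient.__getattr__ = __getattr__client


def __getattr__errorfactory(self, name):
    # Same guard as above: bail out before touching any instance state.
    if name not in vars(self):
        raise AttributeError(name)

    exception_cls_names = [
        exception_cls.__name__ for exception_cls in self._code_to_exception.values()
    ]
    raise AttributeError(
        "%r object has no attribute %r. Valid exceptions are: %s"
        % (self, name, ", ".join(exception_cls_names))
    )


botocore.errorfactory.BaseClientExceptions.__getattr__ = __getattr__errorfactory


def save_sslcontext(obj):
    # Rebuild an SSLContext from its protocol only.
    return obj.__class__, (obj.protocol,)


copyreg.pickle(ssl.SSLSocket, _reduce_socket, _rebuild_socket)
copyreg.pickle(ssl.SSLContext, save_sslcontext)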
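
The report does not explain why the vars(self) guard helps. One plausible reason, offered here as an assumption, is a general Python pitfall: while unpickling, an object exists before its attributes are restored, and probing it for __setstate__ falls through to __getattr__. If __getattr__ itself reads an instance attribute, it calls itself without end. The classes below are hypothetical and only illustrate that pattern; they are not botocore code.

```python
class Naive:
    def __init__(self):
        self._known = {"region": "us-east-1"}

    def __getattr__(self, item):
        # If __init__ never ran, self._known is missing too, so this lookup
        # re-enters __getattr__ and recurses until RecursionError.
        try:
            return self._known[item]
        except KeyError:
            raise AttributeError(item) from None


class Guarded(Naive):
    def __getattr__(self, item):
        # Read instance state through vars() so no attribute lookup is triggered.
        known = vars(self).get("_known")
        if known is None or item not in known:
            raise AttributeError(item)
        return known[item]


def probe(cls):
    """Mimic an unpickler: create the object without __init__, then probe it."""
    bare = cls.__new__(cls)
    try:
        return hasattr(bare, "__setstate__")
    except RecursionError:
        return "RecursionError"


print(probe(Naive), probe(Guarded))
```

Unlike the guard in the workaround above, Guarded still resolves dynamic names once the object is fully initialized.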

staubhp added the "component: pipelines" label on Aug 1, 2022
@navaj0
Contributor

navaj0 commented Sep 1, 2022

There is a bug in this metaclass for the StepTypeEnum: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/workflow/entities.py#L41

Basically, StepTypeEnum("any string value") always returns StepTypeEnum.CONDITION, which is not the expected behavior.
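
To illustrate this failure mode, here is a minimal sketch. It is not the SDK's source: BrokenMeta, FixedMeta and both enum classes are hypothetical, written under the assumption that the metaclass is meant to return the first member when no value is given but only recognizes the value as a keyword argument. By-value unpickling of an enum member amounts to calling the class with the stored value, which is the call shown here.

```python
from enum import Enum, EnumMeta


class BrokenMeta(EnumMeta):
    _unset = object()

    def __call__(cls, *args, value=_unset, **kwargs):
        if value is BrokenMeta._unset:
            # Bug: a positional value lands in *args and is ignored.
            return next(iter(cls))
        return super().__call__(value, *args, **kwargs)


class FixedMeta(EnumMeta):
    _unset = object()

    def __call__(cls, value=_unset, *args, **kwargs):
        if value is FixedMeta._unset:
            # Only fall back to the first member when no value was given.
            return next(iter(cls))
        return super().__call__(value, *args, **kwargs)


class BrokenStepType(Enum, metaclass=BrokenMeta):
    CONDITION = "Condition"
    PROCESSING = "Processing"


class FixedStepType(Enum, metaclass=FixedMeta):
    CONDITION = "Condition"
    PROCESSING = "Processing"


print(BrokenStepType("Processing"))
print(FixedStepType("Processing"))
```

With the broken metaclass the lookup yields the CONDITION member, matching the symptom in the report; with the positional parameter it yields PROCESSING.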

@navaj0
Contributor

navaj0 commented Sep 1, 2022

Could you elaborate on your use case? Why would you like to pickle/unpickle the pipeline objects? If you would like to share the pipeline definition with someone else, you can either share the source code or call pipeline.definition() to compile the pipeline into JSON and share that.
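
A rough sketch of the JSON route, assuming only that pipeline.definition() returns a JSON string as described above. The helper export_definition and the StubPipeline class are hypothetical; the stub exists so the example runs without the SDK, and its payload is not the real definition schema.

```python
import json
import tempfile
from pathlib import Path


def export_definition(pipeline, path):
    """Write pipeline.definition() to `path` as indented JSON; return the parsed dict."""
    definition = json.loads(pipeline.definition())
    Path(path).write_text(json.dumps(definition, indent=2), encoding="utf-8")
    return definition


class StubPipeline:
    """Hypothetical stand-in for a sagemaker Pipeline object."""

    def definition(self):
        return json.dumps({"Steps": [{"Name": "calls", "Type": "Processing"}]})


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "pipeline.json"
    exported = export_definition(StubPipeline(), target)
    reloaded = json.loads(target.read_text(encoding="utf-8"))

print(reloaded["Steps"][0]["Type"])
```

Because the file contains plain JSON, the step type is stored as text and does not depend on how Python reconstructs enum members.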

Even if we fix the StepTypeEnum bug, I still see a couple of problems with pickling the pipeline objects. For example, dill cannot handle Enum objects: uqfoundation/dill#250

@brockwade633
Contributor

Closing this issue as the underlying StepTypeEnum bug has been fixed and the other questions have gone unanswered. Feel free to re-open.
