[Evaluation] Adding initial TSG and error message enhancement for remote tracking failure (Azure#38062)

* Handle storage access error

* add tsg link

* update

* add initial tsg

* fix lint issues

* Fix the link issues and broken tests

* fix the link issue again

* address comments

* update
ninghu authored Oct 24, 2024
1 parent 328ae9c commit c400ce2
Showing 5 changed files with 125 additions and 17 deletions.
5 changes: 3 additions & 2 deletions sdk/evaluation/azure-ai-evaluation/CHANGELOG.md
@@ -9,11 +9,11 @@
- Renamed environment variable `PF_EVALS_BATCH_USE_ASYNC` to `AI_EVALS_BATCH_USE_ASYNC`.
- The `AdversarialScenario` enum does not include `ADVERSARIAL_INDIRECT_JAILBREAK`; invoking IndirectJailbreak or XPIA should be done with `IndirectAttackSimulator`
- Outputs of `Simulator` and `AdversarialSimulator` previously had `to_eval_qa_json_lines` and now have `to_eval_qr_json_lines`. Where `to_eval_qa_json_lines` had:
  ```json
  {"question": <user_message>, "answer": <assistant_message>}
  ```
  `to_eval_qr_json_lines` now has:
  ```json
  {"query": <user_message>, "response": <assistant_message>}
  ```

@@ -32,6 +32,7 @@
- `GroundednessEvaluator`
- `SimilarityEvaluator`
- `RetrievalEvaluator`
- Improved the error message for storage access permission issues to provide clearer guidance for users.

## 1.0.0b4 (2024-10-16)

50 changes: 50 additions & 0 deletions sdk/evaluation/azure-ai-evaluation/TROUBLESHOOTING.md
@@ -0,0 +1,50 @@
# Troubleshoot AI Evaluation SDK Issues

This guide walks you through how to investigate failures and common errors in the `azure-ai-evaluation` SDK, and describes steps to mitigate these issues.

## Table of Contents

- [Handle Evaluate API Errors](#handle-evaluate-api-errors)
- [Troubleshoot Remote Tracking Issues](#troubleshoot-remote-tracking-issues)
- [Safety Metric Supported Regions](#safety-metric-supported-regions)
- [Handle Simulation Errors](#handle-simulation-errors)
- [Adversarial Simulation Supported Regions](#adversarial-simulation-supported-regions)
- [Logging](#logging)
- [Get additional help](#get-additional-help)

## Handle Evaluate API Errors

### Troubleshoot Remote Tracking Issues

- Before running `evaluate()`, make sure you are logged in by running `az login`; this is required to enable logging and tracing to your Azure AI project. An end-to-end sketch follows this list.
- Then install the following sub-package:

  ```Shell
  pip install azure-ai-evaluation[remote]
  ```

- Ensure that you assign the proper permissions to the storage account linked to your Azure AI Studio hub. This can be done with the following command; more information can be found [here](https://learn.microsoft.com/azure/ai-studio/how-to/disable-local-auth). You can verify the assignment with the first sketch after this list.

  ```Shell
  az role assignment create --role "Storage Blob Data Contributor" --scope /subscriptions/<mySubscriptionID>/resourceGroups/<myResourceGroupName> --assignee-principal-type User --assignee-object-id "<user-id>"
  ```

- Additionally, if your evaluation run upload fails because you're using a virtual network or private link, check out this [guide](https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network#access-data-using-the-studio).
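
Role assignments can take a few minutes to propagate. To sanity-check data-plane access before re-running the evaluation, you can attempt a small authenticated read against the linked storage account — a minimal sketch, assuming `azure-identity` and `azure-storage-blob` are installed; the account URL and container name are placeholders:

```python
# Sketch: confirm the signed-in identity can read blobs in the linked storage
# account. An HttpResponseError with status_code 403 here points at the same
# permission problem that makes the evaluation upload fail.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://<storage-account>.blob.core.windows.net",  # placeholder
    credential=DefaultAzureCredential(),
)
container = service.get_container_client("<container-name>")  # placeholder
for blob in container.list_blobs():
    print(blob.name)  # any successful listing confirms data-plane access
    break
```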
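
Once you are logged in, the `remote` extra is installed, and storage permissions are in place, passing your Azure AI project details to `evaluate()` is what enables remote tracking. A minimal sketch — the data file, model configuration, and project identifiers are placeholders, not part of this change:

```python
# Sketch: run an evaluation whose results are logged to an Azure AI project.
from azure.ai.evaluation import evaluate, RelevanceEvaluator

model_config = {
    "azure_endpoint": "https://<aoai-resource>.openai.azure.com",  # placeholder
    "azure_deployment": "<deployment-name>",  # placeholder
}

result = evaluate(
    data="data.jsonl",  # JSONL file with "query" and "response" fields
    evaluators={"relevance": RelevanceEvaluator(model_config)},
    # Supplying the project is what turns on logging and tracing to the cloud;
    # a storage permission problem surfaces here as a remote tracking error.
    azure_ai_project={
        "subscription_id": "<subscription-id>",
        "resource_group_name": "<resource-group>",
        "project_name": "<project-name>",
    },
)
print(result["metrics"])
```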
### Safety Metric Supported Regions

Risk and safety evaluators depend on the Azure AI Studio safety evaluation backend service. For a list of supported regions, please refer to the documentation [here](https://aka.ms/azureaisafetyeval-regionsupport).

## Handle Simulation Errors

### Adversarial Simulation Supported Regions

Adversarial simulators use the Azure AI Studio safety evaluation backend service to generate an adversarial dataset against your application. For a list of supported regions, please refer to the documentation [here](https://aka.ms/azureaiadvsimulator-regionsupport).

## Logging

You can set the logging level via the environment variable `PF_LOGGING_LEVEL`. Valid values are `CRITICAL`, `ERROR`, `WARNING`, `INFO`, and `DEBUG`; the default is `INFO`.
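
For example, to capture debug-level logs, set the variable in the process environment before the evaluation starts (a sketch; any of the listed level names works the same way):

```python
import os

# Must be set before the evaluation runs so the batch engine picks it up.
os.environ["PF_LOGGING_LEVEL"] = "DEBUG"
```
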
## Get Additional Help

Additional information on ways to reach out for support can be found in the [SUPPORT.md](https://github.com/Azure/azure-sdk-for-python/blob/main/SUPPORT.md) at the root of the repo.
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_eval_run.py
@@ -21,6 +21,7 @@
from azure.ai.evaluation._version import VERSION
from azure.core.pipeline.policies import RetryPolicy
from azure.core.rest import HttpResponse
+from azure.core.exceptions import HttpResponseError

LOGGER = logging.getLogger(__name__)

@@ -443,10 +444,26 @@ def log_artifact(self, artifact_folder: str, artifact_name: str = EVALUATION_ART
        datastore = self._ml_client.datastores.get_default(include_secrets=True)
        account_url = f"{datastore.account_name}.blob.{datastore.endpoint}"
        svc_client = BlobServiceClient(account_url=account_url, credential=self._get_datastore_credential(datastore))
-       for local, remote in zip(local_paths, remote_paths["paths"]):
-           blob_client = svc_client.get_blob_client(container=datastore.container_name, blob=remote["path"])
-           with open(local, "rb") as fp:
-               blob_client.upload_blob(fp, overwrite=True)
+       try:
+           for local, remote in zip(local_paths, remote_paths["paths"]):
+               blob_client = svc_client.get_blob_client(container=datastore.container_name, blob=remote["path"])
+               with open(local, "rb") as fp:
+                   blob_client.upload_blob(fp, overwrite=True)
+       except HttpResponseError as ex:
+           if ex.status_code == 403:
+               msg = (
+                   "Failed to upload evaluation run to the cloud due to insufficient permission to access the storage."
+                   " Please ensure that the necessary access rights are granted."
+               )
+               raise EvaluationException(
+                   message=msg,
+                   target=ErrorTarget.EVAL_RUN,
+                   category=ErrorCategory.FAILED_REMOTE_TRACKING,
+                   blame=ErrorBlame.USER_ERROR,
+                   tsg_link="https://aka.ms/azsdk/python/evaluation/remotetracking/troubleshoot",
+               ) from ex
+
+           raise ex

        # To show artifact in UI we will need to register it. If it is a promptflow run,
        # we are rewriting already registered artifact and need to skip this step.
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py
@@ -9,7 +9,7 @@

import pandas as pd
from promptflow._sdk._constants import LINE_NUMBER
-from promptflow._sdk._errors import MissingAzurePackage
+from promptflow._sdk._errors import MissingAzurePackage, UserAuthenticationError, UploadInternalError
from promptflow.client import PFClient
from promptflow.entities import Run

@@ -416,15 +416,31 @@ def _apply_target_to_data(
    _run_name = kwargs.get("_run_name")
    upload_target_snaphot = kwargs.get("_upload_target_snapshot", False)

-   with TargetRunContext(upload_target_snaphot):
-       run: Run = pf_client.run(
-           flow=target,
-           display_name=evaluation_name,
-           data=data,
-           properties={EvaluationRunProperties.RUN_TYPE: "eval_run", "isEvaluatorRun": "true"},
-           stream=True,
-           name=_run_name,
-       )
+   try:
+       with TargetRunContext(upload_target_snaphot):
+           run: Run = pf_client.run(
+               flow=target,
+               display_name=evaluation_name,
+               data=data,
+               properties={EvaluationRunProperties.RUN_TYPE: "eval_run", "isEvaluatorRun": "true"},
+               stream=True,
+               name=_run_name,
+           )
+   except (UserAuthenticationError, UploadInternalError) as ex:
+       if "Failed to upload run" in ex.message:
+           msg = (
+               "Failed to upload the target run to the cloud. "
+               "This may be caused by insufficient permission to access storage or other errors."
+           )
+           raise EvaluationException(
+               message=msg,
+               target=ErrorTarget.EVALUATE,
+               category=ErrorCategory.FAILED_REMOTE_TRACKING,
+               blame=ErrorBlame.USER_ERROR,
+               tsg_link="https://aka.ms/azsdk/python/evaluation/remotetracking/troubleshoot",
+           ) from ex
+
+       raise ex

    target_output: pd.DataFrame = pf_client.runs.get_details(run, all_results=True)
    # Remove input and output prefix
@@ -620,7 +636,17 @@ def evaluate(
                internal_message=error_message,
                target=ErrorTarget.EVALUATE,
                category=ErrorCategory.FAILED_EXECUTION,
-               blame=ErrorBlame.UNKNOWN,
+               blame=ErrorBlame.USER_ERROR,
            ) from e

+       # Ensure a consistent user experience when encountering errors by converting
+       # all other exceptions to EvaluationException.
+       if not isinstance(e, EvaluationException):
+           raise EvaluationException(
+               message=str(e),
+               target=ErrorTarget.EVALUATE,
+               category=ErrorCategory.FAILED_EXECUTION,
+               blame=ErrorBlame.SYSTEM_ERROR,
+           ) from e
+
        raise e
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_exceptions.py
@@ -22,6 +22,7 @@ class ErrorCategory(Enum):
    * FAILED_EXECUTION -> Execution failed
    * SERVICE_UNAVAILABLE -> Service is unavailable
    * MISSING_PACKAGE -> Required package is missing
+   * FAILED_REMOTE_TRACKING -> Remote tracking failed
    * UNKNOWN -> Undefined placeholder. Avoid using.
    """

@@ -33,6 +34,7 @@
    FAILED_EXECUTION = "FAILED_EXECUTION"
    SERVICE_UNAVAILABLE = "SERVICE UNAVAILABLE"
    MISSING_PACKAGE = "MISSING PACKAGE"
+   FAILED_REMOTE_TRACKING = "FAILED REMOTE TRACKING"
    UNKNOWN = "UNKNOWN"


@@ -90,6 +92,8 @@ class EvaluationException(AzureError):
    :type category: ~azure.ai.evaluation._exceptions.ErrorCategory
    :param blame: The source of blame for the error, defaults to Unknown.
    :type blame: ~azure.ai.evaluation._exceptions.ErrorBlame
+   :param tsg_link: A link to the TSG page for troubleshooting the error.
+   :type tsg_link: str
    """

    def __init__(
@@ -100,10 +104,20 @@
        target: ErrorTarget = ErrorTarget.UNKNOWN,
        category: ErrorCategory = ErrorCategory.UNKNOWN,
        blame: ErrorBlame = ErrorBlame.UNKNOWN,
+       tsg_link: Optional[str] = None,
        **kwargs,
    ) -> None:
        self.category = category
        self.target = target
        self.blame = blame
        self.internal_message = internal_message
+       self.tsg_link = tsg_link
        super().__init__(message, *args, **kwargs)

+   def __str__(self):
+       error_blame = "InternalError" if self.blame != ErrorBlame.USER_ERROR else "UserError"
+       msg = f"({error_blame}) {super().__str__()}"
+       if self.tsg_link:
+           msg += f"\nVisit {self.tsg_link} to troubleshoot this issue."
+
+       return msg
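
For illustration, here is how the new formatting surfaces a TSG link — a sketch with made-up message text; the import path is the private module touched by this commit and may change between releases:

```python
from azure.ai.evaluation._exceptions import (
    ErrorBlame,
    ErrorCategory,
    ErrorTarget,
    EvaluationException,
)

err = EvaluationException(
    message="Failed to upload evaluation run to the cloud.",
    target=ErrorTarget.EVAL_RUN,
    category=ErrorCategory.FAILED_REMOTE_TRACKING,
    blame=ErrorBlame.USER_ERROR,
    tsg_link="https://aka.ms/azsdk/python/evaluation/remotetracking/troubleshoot",
)

print(err)
# (UserError) Failed to upload evaluation run to the cloud.
# Visit https://aka.ms/azsdk/python/evaluation/remotetracking/troubleshoot to troubleshoot this issue.
```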
