[FEAT]: Unify evaluation prompt and episode rendering for human readers #164

XuhuiZhou · 2024-08-06T22:12:52Z

Description

Human readers could serve as judges. However, the evaluation prompt and the episode rendering for human readers are composed differently, and this discrepancy gets in the way of using them that way.

Additional Information

I am specifically talking about:

class EpisodeLog(JsonModel):
...
    def render_for_humans(self) -> tuple[list[AgentProfile], list[str]]:
...

This differs from how the evaluation prompt is composed:

    @gin.configurable
    @beartype
    async def __acall__(
        self,
        turn_number: int,
        messages: list[tuple[str, Message]] | None,
        history: str = "",
        temperature: float = 0.0,
    ) -> list[tuple[str, tuple[tuple[str, int | float | bool], str]]]:
        # filter did nothing
        if not history and messages:
            messages_filtered = [
                (x, y)
                for x, y in messages
                if "did nothing" not in y.to_natural_language()
            ]
            history = "\n".join(
                [
                    (
                        f"{x} {y.to_natural_language()}"
                        if x != "Environment"
                        else y.to_natural_language()
                    )
                    for x, y in messages_filtered
                ]
            )
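One possible way to unify the two paths is to move the history formatting into a single helper that both the evaluation prompt and `render_for_humans` call. The sketch below is only an illustration under assumptions: `StubMessage`, `render_history`, and `history_for_prompt` are hypothetical names that do not exist in the codebase, and the filtering and prefix rules are copied from the `__acall__` snippet above.

```python
from dataclasses import dataclass


@dataclass
class StubMessage:
    """Hypothetical stand-in for the project's Message type."""

    text: str

    def to_natural_language(self) -> str:
        return self.text


def render_history(messages: list[tuple[str, StubMessage]]) -> list[str]:
    """Render messages using the same rules as the evaluation prompt.

    Turns containing "did nothing" are dropped, and the speaker name is
    prefixed to every message except those from "Environment".
    """
    lines: list[str] = []
    for speaker, message in messages:
        text = message.to_natural_language()
        if "did nothing" in text:
            continue
        lines.append(text if speaker == "Environment" else f"{speaker} {text}")
    return lines


def history_for_prompt(messages: list[tuple[str, StubMessage]]) -> str:
    """Join the shared rendering into the string the evaluator would see."""
    return "\n".join(render_history(messages))
```

With a helper like this, the human-facing rendering could reuse the list returned by `render_history`, so human judges and the evaluation prompt would see the same turns.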

XuhuiZhou added the enhancement and priority labels Aug 6, 2024

XuhuiZhou mentioned this issue Aug 6, 2024

[BUG]: Reward prompt log is wrong due to the use of shared instance member across different coroutines #21

Closed
