Add Set terminate Option for user #985

Open
wants to merge 21 commits into
base: main
9 changes: 9 additions & 0 deletions src/generators.cpp
@@ -154,6 +154,9 @@ Generator::Generator(const Model& model, const GeneratorParams& params) : model_
}

void Generator::ComputeLogits() {
if (state_->params_->session_terminated) {
throw std::runtime_error("Session in Terminated state, exiting!");
}
if (computed_logits_)
throw std::runtime_error("ComputeLogits called again without calling GenerateNextToken first");

@@ -172,6 +175,9 @@ void Generator::ComputeLogits() {
}

bool Generator::IsDone() const {
if (state_->params_->session_terminated) {
throw std::runtime_error("Session in Terminated state, exiting!");
}
if (computed_logits_)
throw std::runtime_error("IsDone() can't be called in the middle of processing logits");

@@ -184,6 +190,9 @@ bool Generator::IsDone() const {
}

void Generator::GenerateNextToken() {
if (state_->params_->session_terminated) {
throw std::runtime_error("Session in Terminated state, exiting!");
}
if (!computed_logits_)
throw std::runtime_error("Must call ComputeLogits before GenerateNextToken");
computed_logits_ = false;
1 change: 1 addition & 0 deletions src/generators.h
@@ -66,6 +66,7 @@ struct GeneratorParams : std::enable_shared_from_this<GeneratorParams>, LeakChec
int batch_size{1};
int max_batch_size{0};
bool use_cuda_graph{};
mutable bool session_terminated{false};
int sequence_length{};
int BatchBeamSize() const { return search.num_beams * batch_size; }

20 changes: 18 additions & 2 deletions src/models/model.cpp
@@ -37,9 +37,9 @@ static std::string CurrentModulePath() {

namespace Generators {

State::State(const GeneratorParams& params, const Model& model)
State::State(const GeneratorParams& params_, const Model& model)
: model_{model},
params_{params.shared_from_this()} {}
params_{params_.shared_from_this()} {}

void State::Run(OrtSession& session, OrtRunOptions& run_options, int new_batch_size) {
auto captured_graph_info = GetCapturedGraphInfo();
@@ -76,7 +76,20 @@ void State::Run(OrtSession& session, OrtRunOptions& run_options, int new_batch_s
}
RyanUnderhill (Member), Oct 22, 2024

In session.Run(...) above on line 58, we don't do anything on termination?

The issue is a potential error race. If the user calls 'SetTerminate()' and session.Run() then throws an error that is unrelated to termination, the user has no way to tell that it was not a termination error (unless they inspect the error string).

Can we detect termination from session.Run? If so, we should rename session_terminated to session_terminate_set and add a second variable, session_terminated, that is set only when we actually hit the termination case (or when we call ThrowErrorIfSessionTerminated, since that catches the terminate cases outside session.Run).

This way we separate requesting termination from hitting termination, and IsTerminated() should return true only once termination has actually been hit. It does mean that IsTerminated() will return false right after SetTerminate() is called, and won't return true until a function that checks for termination is called. This could be a problem.

@baijumeswani can review my thinking also.
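
For reference, the split between requesting and hitting termination could look roughly like the sketch below. This is only an illustration, not code from this PR: `TerminationState` is a hypothetical stand-in for the flags on `GeneratorParams`, the use of `std::atomic<bool>` is an assumption (the PR uses a plain `mutable bool`), and `ThrowErrorIfSessionTerminated` and `IsTerminated` simply borrow the names used in the comment above.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the two flags discussed above; not the PR's actual types.
struct TerminationState {
  // Set when the user asks for termination (assumed atomic; the PR uses a plain bool).
  std::atomic<bool> session_terminate_set{false};
  // Set only once a termination check has actually tripped.
  std::atomic<bool> session_terminated{false};

  void SetTerminate() { session_terminate_set.store(true); }

  void UnsetTerminate() {
    session_terminate_set.store(false);
    session_terminated.store(false);
  }

  // Records that termination was hit, then reports it to the caller.
  void ThrowErrorIfSessionTerminated() {
    if (session_terminate_set.load()) {
      session_terminated.store(true);
      throw std::runtime_error("Session in Terminated state, exiting!");
    }
  }

  bool IsTerminated() const { return session_terminated.load(); }
};

// Walks through the sequence the comment describes.
inline void DemoTerminationState() {
  TerminationState state;
  state.SetTerminate();
  assert(!state.IsTerminated());  // requested, but no check has run yet
  try {
    state.ThrowErrorIfSessionTerminated();
  } catch (const std::runtime_error&) {
    // Expected: the request was observed here.
  }
  assert(state.IsTerminated());  // now termination has actually been hit
}
```

The demo also shows the drawback noted above: between `SetTerminate()` and the first check, `IsTerminated()` still reports false.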

Contributor Author

That's an interesting question. I believe the condition you mention is rare: if the session is already terminated, an error from session.Run() may well differ simply because of the terminated state. And if the error has some other cause, it should show up again on a later run in the non-terminated state, where it can be caught.
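
One way a caller could tell the two cases apart without inspecting the error string is a dedicated exception type. The sketch below is an assumption for illustration only: this PR throws `std::runtime_error` everywhere, and `SessionTerminatedError`, `StepOutcome`, and `RunStep` are hypothetical names that exist in neither the PR nor the library.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical exception type; the PR itself throws std::runtime_error.
struct SessionTerminatedError : std::runtime_error {
  SessionTerminatedError()
      : std::runtime_error("Session in Terminated state, exiting!") {}
};

enum class StepOutcome { kOk, kTerminated, kFailed };

// Runs one step (any callable) and classifies how it ended.
template <typename Step>
StepOutcome RunStep(Step&& step) {
  try {
    step();
    return StepOutcome::kOk;
  } catch (const SessionTerminatedError&) {
    return StepOutcome::kTerminated;  // termination was requested and observed
  } catch (const std::exception&) {
    return StepOutcome::kFailed;  // any other error, e.g. a failure inside Run
  }
}

// Exercises the three outcomes.
inline void DemoRunStep() {
  assert(RunStep([] {}) == StepOutcome::kOk);
  assert(RunStep([] { throw SessionTerminatedError(); }) == StepOutcome::kTerminated);
  assert(RunStep([] { throw std::runtime_error("other"); }) == StepOutcome::kFailed);
}
```

Because the derived type is caught before `std::exception`, a non-termination error raised while the terminate flag is set would still be classified as `kFailed`.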

}

void State::SetTerminate() {
params_->session_terminated = true;
model_.run_options_->SetTerminate();
}

void State::UnsetTerminate() {
params_->session_terminated = false;
model_.run_options_->UnsetTerminate();
}

OrtValue* State::GetInput(const char* name) {
if (params_->session_terminated) {
throw std::runtime_error("Session in Terminated state, exiting!");
}
for (size_t i = 0; i < input_names_.size(); i++) {
if (std::strcmp(input_names_[i], name) == 0) {
return inputs_[i];
@@ -86,6 +99,9 @@ OrtValue* State::GetInput(const char* name) {
}

OrtValue* State::GetOutput(const char* name) {
if (params_->session_terminated) {
throw std::runtime_error("Session in Terminated state, exiting!");
}
for (size_t i = 0; i < output_names_.size(); i++) {
if (std::strcmp(output_names_[i], name) == 0) {
return outputs_[i];
2 changes: 2 additions & 0 deletions src/models/model.h
@@ -33,6 +33,8 @@ struct State {
virtual const CapturedGraphInfo* GetCapturedGraphInfo() const { return nullptr; }
virtual void Finalize() {}

void SetTerminate();
void UnsetTerminate();
OrtValue* GetInput(const char* name);

virtual OrtValue* GetOutput(const char* name);
8 changes: 8 additions & 0 deletions src/ort_genai.h
@@ -244,6 +244,14 @@ struct OgaGenerator : OgaAbstract {
OgaCheckResult(OgaGenerator_GenerateNextToken(this));
}

void SetTerminate() {
OgaCheckResult(OgaGenerator_SetTerminate(this));
}

void UnsetTerminate() {
OgaCheckResult(OgaGenerator_UnsetTerminate(this));
}

size_t GetSequenceCount(size_t index) const {
return OgaGenerator_GetSequenceCount(this, index);
}
16 changes: 16 additions & 0 deletions src/ort_genai_c.cpp
@@ -265,6 +265,22 @@ OgaResult* OGA_API_CALL OgaGenerator_GenerateNextToken(OgaGenerator* generator)
OGA_CATCH
}

OgaResult* OGA_API_CALL OgaGenerator_SetTerminate(OgaGenerator* oga_generator) {
OGA_TRY
auto& generator = *reinterpret_cast<const Generators::Generator*>(oga_generator);
generator.state_->SetTerminate();
return nullptr;
OGA_CATCH
}

OgaResult* OGA_API_CALL OgaGenerator_UnsetTerminate(OgaGenerator* oga_generator) {
OGA_TRY
auto& generator = *reinterpret_cast<const Generators::Generator*>(oga_generator);
generator.state_->UnsetTerminate();
return nullptr;
OGA_CATCH
}

OgaResult* OGA_API_CALL OgaGenerator_GetOutput(const OgaGenerator* oga_generator, const char* name, OgaTensor** out) {
OGA_TRY
auto& generator = *reinterpret_cast<const Generators::Generator*>(oga_generator);
2 changes: 2 additions & 0 deletions src/ort_genai_c.h
@@ -253,6 +253,8 @@ OGA_EXPORT bool OGA_API_CALL OgaGenerator_IsDone(const OgaGenerator* generator);
*/
OGA_EXPORT OgaResult* OGA_API_CALL OgaGenerator_ComputeLogits(OgaGenerator* generator);
OGA_EXPORT OgaResult* OGA_API_CALL OgaGenerator_GenerateNextToken(OgaGenerator* generator);
OGA_EXPORT OgaResult* OGA_API_CALL OgaGenerator_SetTerminate(OgaGenerator* generator);
OGA_EXPORT OgaResult* OGA_API_CALL OgaGenerator_UnsetTerminate(OgaGenerator* generator);

/*
* \brief Returns a copy of the model output identified by the given name as an OgaTensor on CPU. The buffer is owned by returned OgaTensor