Dialogue is a client-side library for HTTP-based RPC, designed to work well with Conjure-defined APIs.


  • ConcurrencyLimiters: additive increase multiplicative decrease (AIMD) concurrency limiters ensure bursty traffic doesn't overload upstream servers.
  • Client-side node selection: by making load balancing decisions in the client, Dialogue avoids the necessity for an L7 proxy (and its associated latency penalty).
  • Queue: in the case where all nodes are limited (e.g. during a spike in traffic), requests are added to a FIFO queue and processed as soon as one of the ConcurrencyLimiters has capacity.
  • Retries: requests are retried a constant number of times, if possible.
  • Live reloading: uris can be added or removed without losing ConcurrencyLimiter or node selection states.
  • Content decoding: JSON, SMILE and CBOR are supported by default, with user-defined encodings also supported.
  • Streaming: requests and responses are streamed without buffering the entire body into memory.


  • Zipkin-style tracing: internal operations are instrumented using Zipkin-style tracing-java spans, and X-B3-TraceId headers are propagated.
  • Metrics: Timers, meters, and gauges are defined using metric-schema and stored in a Tritium TaggedMetricRegistry.
  • Structured logging: SLF4J logs are designed to be rendered as JSON, with every parameter declaratively named.


Dialogue works best with Conjure-generated client bindings, i.e. for a given Conjure-defined FooService, the conjure-java code generator produces two java interfaces: FooServiceBlocking and FooServiceAsync. See the conjure-java generated client bindings section below for more details.

Production usage

Your server framework should provide an abstraction to create clients that handle uri live-reloading and reuse connection pools. For example, in Witchcraft, you can create a FooServiceBlocking like so:

FooServiceBlocking fooService = witchcraft.conjureClients().client(FooServiceBlocking.class, "foo-service").get();

// network call:
List<Item> items = fooService.getItems();

The non-blocking instance can be constructed similarly:

FooServiceAsync fooService = witchcraft.conjureClients().client(FooServiceAsync.class, "foo-service").get();

ListenableFuture<List<Item>> items = fooService.getItems();

Under the hood

If the Witchcraft method above is not available, you can construct clients manually using a factory provided by com.palantir.dialogue:dialogue-clients.

DialogueClients.ReloadingFactory clients = DialogueClients.create(servicesConfigBlockRefreshable).withUserAgent(agent);

FooServiceBlocking client = clients.get(FooServiceBlocking.class, "foo-service");
FooServiceAsync client2 = clients.get(FooServiceAsync.class, "foo-service");

It's important to construct the ReloadingFactory once for the lifetime of your JVM, and construct all clients from this single factory instance. This ensures that underlying Apache connection pools are reused correctly, and that many clients all talking to the same URI will share the same concurrency limiter.

This abstraction uses Apache HttpClient under the hood, as it is a reliable and performant HTTP client. (See alternatives below.)

It is possible to construct a DialogueChannel manually which allows you to use alternative raw clients such as OkHttp instead of Apache.

conjure-java generated client bindings

Dialogue works best with generated client bindings, i.e. for a given Conjure-defined FooService, the conjure-java code generator can produce two java interfaces: FooServiceBlocking and FooServiceAsync. Generating these at compile-time means that making a request involves zero reflection - all serializers and deserializers are already set up in advance, so that zero efficiency compromises are made. A sample getThing endpoint with some path params, query params and a request body looks like this:

public ListenableFuture<Thing> getThing(
        AuthHeader authHeader, String pathParam, List<ResourceIdentifier> queryKey, MyRequest body) {
    Request.Builder _request = Request.builder();
    _request.putHeaderParams("Authorization", plainSerDe.serializeBearerToken(authHeader.getBearerToken()));
    _request.putPathParams("path", plainSerDe.serializeString(pathParam));
    for (ResourceIdentifier queryKeyElement : queryKey) {
        _request.putQueryParams("queryKey", plainSerDe.serializeRid(queryKeyElement));
    }
    return runtime.clients()
            .call(channel, DialogueSampleEndpoints.getThing, _request.build(), thingDeserializer);
}

Blocking or async

Of the two generated interfaces, FooServiceBlocking and FooServiceAsync, the blocking version is usually appropriate for 98% of use-cases, and results in much simpler control flow and error-handling. The async version returns Guava ListenableFutures, making it a lot more fiddly to use. Futures.addCallback and FluentFuture are your friends here.

dialogue-annotations-processor generated client bindings

dialogue-annotations-processor is a Retrofit replacement for use-cases where a service needs to talk to a non-Conjure server.

To set up the annotation processor, add the following dependencies (make sure you are using gradle-processors):

dependencies {
    annotationProcessor 'com.palantir.dialogue:dialogue-annotations-processor'
    implementation 'com.palantir.dialogue:dialogue-annotations'
}

Next, create an annotated interface that describes the service you need to talk to, appropriately annotated with @DialogueService:

import com.palantir.dialogue.DialogueService;
import com.palantir.dialogue.HttpMethod;
import com.palantir.dialogue.RequestBody;
import com.palantir.dialogue.Response;
import com.palantir.dialogue.annotations.Request;
import java.util.OptionalInt;
import java.util.UUID;

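// The factory class name passed to @DialogueService is assumed here to follow the generated-class naming.
@DialogueService(MyServiceDialogueServiceFactory.class)
public interface MyService {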

    @Request(method = HttpMethod.POST, path = "/params/{myPathParam}/{myPathParam2}", accept=MyCustomResponseDeserializer.class)
    MyCustomResponse params(
            @Request.QueryParam("q") String query,
            // Path parameter variable name must match the request path component
            @Request.PathParam UUID myPathParam,
            @Request.PathParam(encoder = MyCustomParamTypeEncoder.class) MyCustomParamType myPathParam2,
            // converts a custom type into a Multimap<String, String>
            @Request.QueryMap(encoder = MyCustomQueryTypeEncoder.class) MyCustomQueryParamType myCustomQueryParam,
            @Request.Header("Custom-Header") int requestHeaderValue,
            // Headers can be optional
            @Request.Header("Custom-Optional-Header") OptionalInt maybeRequestHeaderValue,
            // converts a custom type into a Multimap<String, String>
            @Request.HeaderMap(encoder = MyCustomHeaderTypeEncoder.class) MyCustomHeaderParamType myCustomHeaderParam,
            // Custom encoding classes may be provided for the request and response.
            @Request.Body(MySerializableTypeBodySerializer.class) MySerializableType body);
}


  • Async request handling: simply make your method return ListenableFuture<Foo>.
  • Custom parameter types: @Request.(Header|PathParam|QueryParam)(encoder=MyCustomParamTypeEncoder.class).
  • Custom serialization/deserialization: add @Request.Body(MySerializableTypeBodySerializer.class) or @Request(accept=MyCustomResponseDeserializer.class).
  • Custom serialization from Map, Multimap, and custom types into query parameters and header parameters. This functions similarly to the Feign QueryMap and HeaderMap features, but with added control of customizing the serialization.
  • Authentication: builtin Authorization header handling if an annotated method has an AuthHeader parameter.

See more examples on how to define clients and use the generated code.


Dialogue is built around the Channel abstraction, with many different internal implementations that often add a little bit of behaviour and then delegate to another inner Channel.

public interface Channel {
    ListenableFuture<Response> execute(Endpoint endpoint, Request request);
}

For example, the TraceEnrichingChannel just augments the request with zipkin-style tracing headers and then calls a delegate.

This API is influenced by gRPC's Java library, which has a similar Channel concept.
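
The delegation pattern described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: SimpleChannel, SimpleRequest and HeaderAddingChannel are hypothetical stand-ins (not Dialogue types), and CompletableFuture is used in place of ListenableFuture so the example needs nothing beyond the JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

final class ChannelSketch {
    /** Hypothetical stand-in for a request: just a mutable header map. */
    static final class SimpleRequest {
        final Map<String, String> headers = new LinkedHashMap<>();
    }

    /** Hypothetical stand-in for Channel; the future carries only a status code. */
    interface SimpleChannel {
        CompletableFuture<Integer> execute(String endpoint, SimpleRequest request);
    }

    /** Adds one small piece of behaviour (a header), then delegates to the inner channel. */
    static final class HeaderAddingChannel implements SimpleChannel {
        private final SimpleChannel delegate;
        private final String name;
        private final String value;

        HeaderAddingChannel(SimpleChannel delegate, String name, String value) {
            this.delegate = delegate;
            this.name = name;
            this.value = value;
        }

        @Override
        public CompletableFuture<Integer> execute(String endpoint, SimpleRequest request) {
            request.headers.put(name, value);
            return delegate.execute(endpoint, request);
        }
    }

    public static void main(String[] args) {
        // Innermost channel: pretends to be the network and checks for the header.
        SimpleChannel terminal = (endpoint, request) -> CompletableFuture.completedFuture(
                request.headers.containsKey("X-B3-TraceId") ? 200 : 400);
        SimpleChannel traced = new HeaderAddingChannel(terminal, "X-B3-TraceId", "0123456789abcdef");
        System.out.println(traced.execute("getThing", new SimpleRequest()).join()); // prints 200
    }
}
```

Because every layer implements the same single-method interface, layers can be stacked in any order without the caller knowing how many there are.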


Concurrency limits

Every request passes through a pair of AIMD concurrency limiters. There are two types of concurrency limiter: per-host, and per-endpoint. Each concurrency limiter operates in conjunction with a queue to stage pending requests until a permit becomes available.
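
To make the AIMD idea concrete, here is a minimal sketch of a limiter that grows its limit additively on success and shrinks it multiplicatively on failure. The class name and the constants (initial limit 20, backoff ratio 0.9, minimum 1) are assumptions chosen for illustration and are not taken from Dialogue's implementation.

```java
final class AimdLimiterSketch {
    // Illustrative constants only; not Dialogue's actual values.
    private static final double INITIAL_LIMIT = 20;
    private static final double MIN_LIMIT = 1;
    private static final double BACKOFF_RATIO = 0.9;

    private double limit = INITIAL_LIMIT;
    private int inFlight = 0;

    /** Tries to take a permit. A caller that gets false would wait in the associated queue. */
    synchronized boolean tryAcquire() {
        if (inFlight >= (int) limit) {
            return false;
        }
        inFlight++;
        return true;
    }

    /** Additive increase: a success releases the permit and raises the limit by one. */
    synchronized void onSuccess() {
        inFlight--;
        limit += 1;
    }

    /** Multiplicative decrease: a failure releases the permit and scales the limit down. */
    synchronized void onFailure() {
        inFlight--;
        limit = Math.max(MIN_LIMIT, limit * BACKOFF_RATIO);
    }

    /** Releases the permit without changing the limit. */
    synchronized void onIgnore() {
        inFlight--;
    }

    synchronized double limit() {
        return limit;
    }

    public static void main(String[] args) {
        AimdLimiterSketch limiter = new AimdLimiterSketch();
        limiter.tryAcquire();
        limiter.onFailure();
        System.out.println(limiter.limit());
    }
}
```

The asymmetry is the point of the design: the limit creeps up slowly while things are healthy and drops quickly once failures appear.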

Limiter Diagram

+---------+   +-------------+   +-------------+    +--------------------+   +--------------------------+   +----------------------------+
| Request +-->+Request Queue+-->+Node Selector+--->+Host Limiter (node0)+-->+Endpoint Queue(node0,ping)+-->+Endpoint Limiter(node0,ping)+---------+
+---------+   +-------------+   +--------------+   +---------------------+  +--------------------------+   +----------------------------+         |
                                               |                         |                                                                        |
                                               |                         |  +---------------------------+  +-----------------------------+        |
                                               |                         +->+Endpoint Queue(node0,hello)+->+Endpoint Limiter(node0,hello)+----v   v
                                               |                            +---------------------------+  +-----------------------------+   +----+-------+
                                               |                                                                                             |HTTP Request|
                                               |                                                                                             +--+-+-------+
                                               |   +--------------------+   +--------------------------+   +----------------------------+       ^ ^
                                               +-->+Host Limiter (node1)+-->+Endpoint Queue(node1,ping)+-->+Endpoint Limiter(node1,ping)+-------+ |
                                                   +---------------------+  +--------------------------+   +----------------------------+         |
                                                                         |                                                                        |
                                                                         |  +---------------------------+  +-----------------------------+        |
                                                                         +->+Endpoint Queue(node1,hello)+->+Endpoint Limiter(node1,hello)+--------+
                                                                            +---------------------------+  +-----------------------------+

Host limits

Each host has a concurrency limiter which protects servers by stopping requests getting out the door on the client-side. Permits are decreased after receiving a 308 or 501-599 response, or encountering a network error (IOException). 429 and 500 responses cause no change. Otherwise, permits are increased.

Host limits are based on failures that indicate the target host overall is in a degraded state.

Endpoint limits

Each endpoint has a concurrency limiter which is distinct for each host. This allows servers to provide per-endpoint backpressure in the form of 429 status QoS responses. Permits are decreased after receiving a 429 or 500 response code. 501-599 responses cause no change. Otherwise, permits are increased.

Endpoint limits are based on failures that are coupled to an individual endpoint.
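
The two sets of rules above can be written side by side as a pair of small functions. This is a sketch of the rules exactly as stated in this section, not Dialogue's code: the class, enum and method names are hypothetical.

```java
final class LimiterSignalsSketch {
    enum Signal { DECREASE, IGNORE, INCREASE }

    /** Host limiter: 308 and 501-599 shrink the limit, 429 and 500 leave it unchanged. */
    static Signal hostSignal(int status) {
        if (status == 308 || (status >= 501 && status <= 599)) {
            return Signal.DECREASE;
        }
        if (status == 429 || status == 500) {
            return Signal.IGNORE;
        }
        return Signal.INCREASE;
    }

    /** Host limiter: a network error (IOException) also shrinks the limit. */
    static Signal hostSignalForNetworkError() {
        return Signal.DECREASE;
    }

    /** Endpoint limiter: 429 and 500 shrink the limit, 501-599 leave it unchanged. */
    static Signal endpointSignal(int status) {
        if (status == 429 || status == 500) {
            return Signal.DECREASE;
        }
        if (status >= 501 && status <= 599) {
            return Signal.IGNORE;
        }
        return Signal.INCREASE;
    }

    public static void main(String[] args) {
        System.out.println(hostSignal(503));     // prints DECREASE
        System.out.println(endpointSignal(503)); // prints IGNORE
        System.out.println(hostSignal(429));     // prints IGNORE
        System.out.println(endpointSignal(429)); // prints DECREASE
    }
}
```

Read together, the two functions are complementary for error responses: a status that shrinks one limiter is ignored by the other.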

Node selection strategies

When configured with multiple uris, Dialogue has several strategies for choosing which upstream to route requests to. The default strategy is PIN_UNTIL_ERROR, although users can choose alternatives such as ROUND_ROBIN when building a ClientConfiguration object. Note that the choice of an appropriate strategy usually depends on the upstream server's behaviour, i.e. if its performance relies heavily on warm caches, or if successive requests must land on the same node to successfully complete a transaction. To solve this problem without needing code changes in all clients, servers can recommend a NodeSelectionStrategy (see below).
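
To illustrate why the choice matters, the following sketch contrasts the two behaviours in their simplest possible form. It assumes a naive reading of the strategy names (rotate on every request versus stay on one node until a failure); the class names are hypothetical and the real implementations are more involved.

```java
import java.util.List;

final class NodeSelectionSketch {
    /** Naive round robin: every request goes to the next uri in the list. */
    static final class RoundRobin {
        private final List<String> uris;
        private int next = 0;

        RoundRobin(List<String> uris) {
            this.uris = uris;
        }

        String select() {
            String uri = uris.get(next);
            next = (next + 1) % uris.size();
            return uri;
        }
    }

    /** Naive pin-until-error: keep using one uri and only move on after a failure. */
    static final class PinUntilError {
        private final List<String> uris;
        private int current = 0;

        PinUntilError(List<String> uris) {
            this.uris = uris;
        }

        String select() {
            return uris.get(current);
        }

        void onError() {
            current = (current + 1) % uris.size();
        }
    }

    public static void main(String[] args) {
        List<String> uris = List.of("https://node0", "https://node1");
        RoundRobin roundRobin = new RoundRobin(uris);
        PinUntilError pinned = new PinUntilError(uris);
        for (int i = 0; i < 3; i++) {
            System.out.println(roundRobin.select() + " vs " + pinned.select());
        }
    }
}
```

With pinning, successive requests keep hitting the same warm node, which is what cache-heavy or transactional servers rely on; rotation spreads load but gives up that affinity.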

Server-recommended node selection strategies

Servers can inform clients of their recommended strategies by including the Node-Selection-Strategy response header. Values are separated by commas and are ordered by preference. See available strategies.

Node-Selection-Strategy: BALANCED,PIN_UNTIL_ERROR

When the header is present, it takes precedence over user-selected strategies. Servers are free to omit this value.
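
A client could interpret the header value roughly as follows. This is a hedged sketch, not Dialogue's parser: the class and method names are hypothetical, and it assumes that unrecognised values are skipped and that names are matched exactly after trimming whitespace.

```java
import java.util.Optional;
import java.util.Set;

final class StrategyHeaderSketch {
    /** Returns the first value in the comma-separated header that the client supports. */
    static Optional<String> firstSupported(String headerValue, Set<String> supported) {
        for (String candidate : headerValue.split(",")) {
            String name = candidate.trim();
            if (supported.contains(name)) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Set<String> supported = Set.of("BALANCED", "PIN_UNTIL_ERROR");
        // The values are ordered by preference, so the first supported one wins.
        System.out.println(firstSupported("BALANCED,PIN_UNTIL_ERROR", supported).orElse("(none)"));
    }
}
```

Skipping unknown names lets a server list a newer strategy first while older clients fall back to the next entry they understand.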


The BALANCED strategy balances requests across many servers better than the default PIN_UNTIL_ERROR. The algorithm has evolved from naive Round Robin, then to Random Selection, and now makes smarter decisions based on stats about each host. This fixes a dramatic failure mode when a single server is very slow (this can be seen empirically in the simulations). Note that unlike concurrency limiters, this node selection strategy never prevents a request getting out the door; it just ranks hosts to try to deliver the best possible client-perceived response time (and success rate).

Specifically, it keeps track of the number of in flight requests for each host, and also records every failure it sees for each host. A request is then routed to the host with the lowest inflight + 10*recent_failures.
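
The scoring rule above is small enough to show directly. In this sketch the HostStats class and its field names are hypothetical, and how "recent" failures are counted or expire is left out; only the inflight + 10*recent_failures formula comes from the text.

```java
import java.util.Comparator;
import java.util.List;

final class BalancedScoreSketch {
    /** Hypothetical per-host statistics snapshot. */
    static final class HostStats {
        final String uri;
        final int inflight;
        final int recentFailures;

        HostStats(String uri, int inflight, int recentFailures) {
            this.uri = uri;
            this.inflight = inflight;
            this.recentFailures = recentFailures;
        }

        /** Lower is better: each recent failure weighs as much as ten in-flight requests. */
        int score() {
            return inflight + 10 * recentFailures;
        }
    }

    /** Picks the host with the lowest score. */
    static HostStats pick(List<HostStats> hosts) {
        return hosts.stream()
                .min(Comparator.comparingInt(HostStats::score))
                .orElseThrow(() -> new IllegalArgumentException("no hosts"));
    }

    public static void main(String[] args) {
        HostStats busy = new HostStats("https://node0", 5, 0);    // score 5
        HostStats failing = new HostStats("https://node1", 0, 1); // score 10
        System.out.println(pick(List.of(busy, failing)).uri);     // prints https://node0
    }
}
```

Note that an idle host with a single recent failure still ranks behind a host that is handling five requests successfully.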

The ROUND_ROBIN strategy is not appropriate for transactional use cases where successive requests must land on the same node, and it's also not optimal for use-cases where there are many nodes and cache affinity is very important.

Sticky requests

Dialogue channels can be configured to stick to a single host: after the first request is successfully executed on a host, all subsequent requests will be routed to the same host. This strategy is useful for transactional workflows, where all requests tied to a particular transaction may need to be executed on the same host.

The implementation reuses the same limiter pipeline, with some adjustments (simplified to show a single host/endpoint only):

Sticky Request +-->+Per Sticky Channel queue+-->+Per Sticky Channel Limiter+-->+Node Selector+--->+Host Limiter (node0)+-->+Endpoint Queue(node0,ping)+-->+Endpoint Limiter(node0,ping)

Each sticky channel gets its own queue. If a sticky channel has no requests in-flight, Host Limiter is sidestepped (the request is let through regardless of current concurrency limits). This means there is a potential for many low-bandwidth sticky channels to compete with regular channels.

Supported Per-Endpoint Conjure Tags

  • dialogue-disable-endpoint-concurrency-limiting: Opts a single endpoint out of per-endpoint concurrency limiting. Per-host concurrency limiting continues to apply.
  • prefer-compressed-response: Forces requests to always include Accept-Encoding: gzip, rather than attempting to opt out of response compression for in-environment requests. This usually shouldn't be used because compression can be much more expensive than network transfer.
  • compress-request: Request bodies are gzip compressed. This requires prior knowledge that the receiving server handles Content-Encoding: gzip request bodies.
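
For readers unfamiliar with what a gzip-compressed request body involves, the sketch below compresses a payload with the JDK's java.util.zip classes and reads it back, which is what a receiving server must be prepared to do. It is a standalone illustration with hypothetical class and method names, not the code Dialogue runs for the compress-request tag.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipBodySketch {
    /** Client side: compress the body that would be sent with Content-Encoding: gzip. */
    static byte[] gzip(byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Server side: decompress the received body. */
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] body = "{\"items\":[1,2,3]}".getBytes(StandardCharsets.UTF_8);
        byte[] roundTripped = gunzip(gzip(body));
        System.out.println(Arrays.equals(body, roundTripped)); // prints true
    }
}
```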

Alternative HTTP clients

Dialogue is not coupled to a single HTTP client library. This repo contains implementations based on OkHttp, Java's HttpURLConnection, the new Java 11 HttpClient, and the aforementioned Apache HttpClient. We endorse the Apache client because it performed the best in our benchmarks and affords granular control over connection pools.


Dialogue is the product of years of learning from operating thousands of Java servers across hundreds of deployments. Previous incarnations relied on Feign, Retrofit2 and OkHttp.


