Frequently Asked Questions


This FAQ is intended to help operators quickly diagnose Loggregator-related issues.

It consolidates troubleshooting steps for common questions that the Loggregator team has received.


Q: How can I debug my Loggregator components?

Loggregator is a complex subsystem of Cloud Foundry with several components of its own. The questions below describe how to troubleshoot Loggregator when you are having problems seeing your logs.

Q: How can I check the health of my etcd cluster?

Metron uses etcd for service discovery to find the Doppler cluster. If Metron is unable to read from etcd, or if the Dopplers cannot properly advertise themselves via etcd, Metron will panic.

# Run the following curl for each etcd node
curl -vvv http://<etcd server>:4001/v2/stats/leader

Make sure that there is only one leader across all the nodes. Unfortunately, we have come across a scenario where the etcdctl tool reports the cluster as healthy while it is actually in a state with multiple leaders. This can be caused by a network partition.
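Running the curl by hand on every node gets tedious. Below is a minimal bash sketch of the same check. It assumes plain-HTTP etcd on port 4001, as in the curl above, and that only a node that believes it is the leader answers GET /v2/stats/leader with HTTP 200, while followers answer with an error or a redirect. The function names are hypothetical helpers, not part of Loggregator or etcd.

```shell
#!/usr/bin/env bash
# Sketch: count how many etcd nodes claim to be the leader.
# Assumption: only a node that believes it is leader answers
# GET /v2/stats/leader with HTTP 200; followers return an error or redirect.

# count_leaders: reads "<node> <http_code>" lines on stdin and prints
# how many nodes answered with 200.
count_leaders() {
  awk '$2 == "200" { n++ } END { print n + 0 }'
}

# probe_leaders NODE...: prints "<node> <http_code>" for each etcd node.
# curl reports 000 when it cannot connect at all.
probe_leaders() {
  local node code
  for node in "$@"; do
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' \
      "http://${node}:4001/v2/stats/leader")
    printf '%s %s\n' "$node" "$code"
  done
}

# check_single_leader NODE...: succeeds only if exactly one node is leader.
check_single_leader() {
  local leaders
  leaders=$(probe_leaders "$@" | count_leaders)
  echo "nodes reporting themselves as leader: ${leaders}"
  [ "$leaders" -eq 1 ]
}
```

For example, check_single_leader <etcd_ip_1> <etcd_ip_2> <etcd_ip_3> prints how many nodes reported themselves as leader and exits non-zero unless that count is exactly one. Counting status codes keeps the script free of a JSON parser, at the cost of not showing which leader each follower believes in.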

The fastest way to resolve this issue is to restart each etcd node one at a time so that the cluster can re-establish quorum.
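When restarting nodes one at a time, it helps to wait for each node to come back before touching the next one. A minimal bash sketch, assuming plain-HTTP etcd on port 4001 and the etcd v2 /v2/stats/self endpoint; wait_for_etcd is a hypothetical helper. A 200 response only means the node is serving requests again, not that the cluster has settled on a single leader, so repeat the /v2/stats/leader check above once every node is back.

```shell
#!/usr/bin/env bash
# Sketch: after restarting one etcd node, poll it until it answers HTTP 200.
# Assumptions: plain-HTTP etcd on port 4001 and the /v2/stats/self endpoint.
# A 200 means the node is up, not that the cluster has exactly one leader.

# wait_for_etcd NODE [TRIES] [DELAY]: returns 0 once NODE answers 200,
# or 1 after TRIES attempts spaced DELAY seconds apart.
wait_for_etcd() {
  local node=$1 tries=${2:-30} delay=${3:-2} code i
  for ((i = 1; i <= tries; i++)); do
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' \
      "http://${node}:4001/v2/stats/self")
    if [ "$code" = "200" ]; then
      return 0
    fi
    sleep "$delay"
  done
  echo "etcd on ${node} did not answer after ${tries} attempts" >&2
  return 1
}
```

For example, after restarting etcd on one VM (on BOSH VMs this is often done with sudo /var/vcap/bosh/bin/monit restart etcd, assuming your etcd job is managed by monit under that name), run wait_for_etcd <etcd server> before moving on to the next node.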

Once the etcd cluster has been restarted and restored, the Dopplers and Metrons need to be restarted as well to ensure they are properly communicating with the etcd cluster.

Q: How do I get etcd data when it is in TLS mode?

If your CF environment has etcd deployed in TLS mode, you can no longer simply curl the data out. The following steps show how to read it for troubleshooting.

  1. bosh ssh etcd_z1/0
  2. cd /var/vcap/packages/etcd/
  3. List the available keys:
./etcdctl \
--cert-file /var/vcap/jobs/etcd/config/certs/client.crt \
--key-file /var/vcap/jobs/etcd/config/certs/client.key \
--ca-file /var/vcap/jobs/etcd/config/certs/server-ca.crt \
-C <etcd_url> \
ls doppler/meta --recursive

The output lists the keys under /doppler/meta, including one key per Doppler such as /doppler/meta/z1/doppler_z1/e27e8ab6-e29c-446d-a0dd-c692c7d16dd1.

  4. Get the value of a key:
./etcdctl \
--cert-file /var/vcap/jobs/etcd/config/certs/client.crt \
--key-file /var/vcap/jobs/etcd/config/certs/client.key \
--ca-file /var/vcap/jobs/etcd/config/certs/server-ca.crt \
-C <etcd_url> \
get /doppler/meta/z1/doppler_z1/e27e8ab6-e29c-446d-a0dd-c692c7d16dd1

Note: The value for -C can be found in the EtcdUrls property of the component config files, for example /var/vcap/jobs/doppler/config/doppler.json.
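Retyping the three certificate flags for every command is error-prone. A minimal sketch of a bash wrapper, assuming the certificate paths shown in the steps above, etcdctl in /var/vcap/packages/etcd/ (overridable through a hypothetical ETCDCTL variable), and a hypothetical ETCD_URL variable holding one of the EtcdUrls values.

```shell
#!/usr/bin/env bash
# Sketch: run etcdctl with the TLS flags from the steps above.
# Assumptions: certs live in /var/vcap/jobs/etcd/config/certs, etcdctl is in
# /var/vcap/packages/etcd/ unless ETCDCTL says otherwise, and ETCD_URL
# (a hypothetical variable) holds one of the EtcdUrls values.

# etcdctl_tls ARGS...: passes ARGS through to etcdctl after the TLS flags.
etcdctl_tls() {
  local cert_dir=/var/vcap/jobs/etcd/config/certs
  # Fail early with a clear message if the endpoint was not provided.
  : "${ETCD_URL:?set ETCD_URL to a value from EtcdUrls}"
  "${ETCDCTL:-/var/vcap/packages/etcd/etcdctl}" \
    --cert-file "${cert_dir}/client.crt" \
    --key-file "${cert_dir}/client.key" \
    --ca-file "${cert_dir}/server-ca.crt" \
    -C "$ETCD_URL" \
    "$@"
}
```

For example, export ETCD_URL=<etcd_url>, then run etcdctl_tls ls doppler/meta --recursive to list the keys, or etcdctl_tls get <key> to read one of them.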

Q: How do I disable UAA for the Traffic Controller?

Traffic Controller has a property in its spec called traffic_controller.disable_access_control.

By default this is false. This is not a property in the Traffic Controller's config file but rather a flag passed to the Traffic Controller process.

Setting this property to true makes the logAccessAuthorizer and the adminAuthorizer always allow access to the app logs and the firehose.

This feature was originally created so that Loggregator could be used in Lattice.

Q: Why do I get the "can't forward message: loggregator client pool is empty" error?

This error message shows up in the Metron logs if it doesn't have any registered Dopplers in its client pool.
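To confirm that a Metron is hitting this error, and whether it coincides with etcd connection failures (Issue 1 below), you can count both messages in its logs. A minimal bash sketch; the helper names are hypothetical, and the log location /var/vcap/sys/log/metron_agent/ is an assumption based on the usual BOSH layout, so pass whichever log files your deployment actually writes.

```shell
#!/usr/bin/env bash
# Sketch: count the two Metron log messages discussed in this FAQ.
# Assumption: you pass the Metron log files explicitly, e.g. files under
# /var/vcap/sys/log/metron_agent/ on a typical BOSH VM.

# count_matches TEXT FILE...: prints how many lines in FILE... contain TEXT.
count_matches() {
  local text=$1
  shift
  # Without files, cat would read stdin; report zero instead.
  [ "$#" -gt 0 ] || { echo 0; return 0; }
  # grep -c prints 0 and exits 1 on no match; "|| true" keeps the 0.
  cat -- "$@" 2>/dev/null | grep -cF -- "$text" || true
}

# scan_metron_logs FILE...: prints one count per message.
scan_metron_logs() {
  echo "client pool is empty: $(count_matches 'loggregator client pool is empty' "$@")"
  echo "failed to connect to etcd: $(count_matches 'Failed to connect to etcd' "$@")"
}
```

For example: scan_metron_logs /var/vcap/sys/log/metron_agent/*.log. A non-zero etcd count points at Issue 1; a non-zero pool count with no etcd failures points at Issue 2 or 3.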

Issue 1 - Can't find etcd

Metron or Doppler may be unable to communicate with its key-value store, etcd.

  1. Look for the error message "Failed to connect to etcd" in the logs.
  2. Verify that you can access etcd.
  • Verify the etcd URLs in the Metron config /var/vcap/jobs/metron_agent/config/metron_agent.json.
  • Query etcd to see whether Doppler has advertised itself correctly.
# Old Doppler endpoint (<port> is your etcd client port, e.g. 4001)
curl "http://<your_etcd_ip>:<port>/v2/keys/healthstatus/doppler?recursive=true"

# New Doppler endpoint
curl "http://<your_etcd_ip>:<port>/v2/keys/doppler/meta?recursive=true"

The old endpoint contains just the Doppler IP. The new endpoint contains JSON that may look like this:

{ "version": 1, "endpoints":["udp://<doppler_ip>:<port>", "tls://<doppler_ip>:<port>"]}
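The new-endpoint check can be scripted so that it prints just the advertised endpoints. A minimal bash sketch, assuming plain-HTTP etcd (port defaulting to 4001) and the udp:// and tls:// schemes shown above. It uses grep instead of a JSON parser because etcd returns each Doppler's value as an escaped JSON string inside its own JSON response. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the Doppler endpoints advertised under /doppler/meta.
# Assumptions: plain-HTTP etcd (port defaults to 4001) and the udp:// and
# tls:// schemes shown in the example value above.

# doppler_endpoints: reads an etcd JSON response on stdin and prints each
# endpoint URL on its own line. The character class stops at a quote or a
# backslash, so it also works on the JSON-escaped values etcd returns.
doppler_endpoints() {
  grep -oE '(udp|tls)://[^"\\]+' || true
}

# check_doppler_meta ETCD_IP [PORT]: prints the endpoints, or fails if
# etcd has none advertised (or could not be reached).
check_doppler_meta() {
  local etcd_ip=$1 port=${2:-4001} endpoints
  endpoints=$(curl -s --max-time 5 \
    "http://${etcd_ip}:${port}/v2/keys/doppler/meta?recursive=true" | doppler_endpoints)
  if [ -z "$endpoints" ]; then
    echo "no Doppler endpoints advertised in etcd at ${etcd_ip}:${port}" >&2
    return 1
  fi
  printf '%s\n' "$endpoints"
}
```

For example, check_doppler_meta <your_etcd_ip> prints one line per advertised endpoint. An empty result means either etcd is unreachable or no Doppler has written its key, which is exactly the condition that leaves Metron's client pool empty.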

Issue 2 - Mismatched etcd keys

If you see values populated in either of the endpoints, then your Doppler and Metron can both see etcd and read from and write to it.

  • Look at the etcd key that Doppler is advertising. It should have the following structure.

    # Old
    /healthstatus/doppler/...
    # New
    /doppler/meta/<zone>/<job>/<id>
    Compare each of these properties to the config within Metron; they should match.

    We have come across scenarios where Doppler was in a different zone and advertised zone1, whereas Metron was configured with "Zone": "zone2".

    This makes Metron look for a different key, so it cannot find the Doppler IP and protocol.
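The zone comparison can also be scripted. A minimal bash sketch, assuming keys follow the /doppler/meta/<zone>/... layout of the etcdctl example earlier in this FAQ and that Metron's config contains a "Zone": "<zone>" entry, as in the example above. It reads a saved etcd response and the Metron config file; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check that the zone Metron is configured with is one that
# Dopplers actually advertise in etcd.
# Assumptions: keys look like /doppler/meta/<zone>/<job>/<id> and the Metron
# config contains a line such as "Zone": "z1".

# advertised_zones: reads an etcd JSON response on stdin, prints unique zones.
advertised_zones() {
  grep -oE '"key": *"/doppler/meta/[^/"]+' | sed -E 's|.*/||' | sort -u
}

# metron_zone FILE: prints the first "Zone" value found in a Metron config.
metron_zone() {
  grep -oE '"Zone": *"[^"]*"' "$1" | head -n 1 | sed -E 's/.*"([^"]*)"$/\1/'
}

# check_zone_match ETCD_JSON_FILE METRON_CONFIG: succeeds if Metron's zone
# appears among the advertised zones.
check_zone_match() {
  local zone
  zone=$(metron_zone "$2")
  if advertised_zones < "$1" | grep -qxF -- "$zone"; then
    echo "OK: a Doppler is advertised in zone ${zone}"
  else
    echo "MISMATCH: Metron zone '${zone}' not in: $(advertised_zones < "$1" | paste -sd ' ' -)" >&2
    return 1
  fi
}
```

For example, save the new-endpoint response with curl -s "http://<your_etcd_ip>:<port>/v2/keys/doppler/meta?recursive=true" > meta.json, then run check_zone_match meta.json /var/vcap/jobs/metron_agent/config/metron_agent.json on the Metron VM.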

Issue 3 - etcd is in a bad state

We came across a situation where etcd got into a bad state and its process needed to be restarted. A Tracker story was filed for this and should be resolved.

The workaround is to kill the etcd process on the affected node:

killall etcd

