Locating Applications on Cloud Foundry Diego

September 28, 2018

Cloud Foundry deploys application containers on Diego cells. Each Diego cell runs a number of application containers and exposes the applications through random ports on the cell. This blog post shows two debugging and analysis tricks for Diego that use the cfdot command line utility: first, we determine the Cloud Foundry app that belongs to a Diego container; second, we locate the containers behind a specific application URL.

How to Find Which Apps Are Running in a Diego Cell

From time to time it is useful to identify the application that is running in a specific container. This is particularly interesting when a container behaves in an unusual way, for example when it uses more CPU or network than the average container in your deployment. To find out what's going on, we need to identify the application that is running inside that container on the Diego cell. Having spotted the container, we already know the IP address of the Diego cell and the host port on which the container is exposed. This is all we need to determine the app that is running in the container.

Entering the Diego Cell

To determine the app running in the container, we first need to open an SSH session on the Diego cell the container is running on. With a Pivotal Cloud Foundry deployment you may SSH into the OpsManager VM and continue from there using the BOSH CLI:

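# find the name of the Diego cell we are interested in
# (--vitals adds the load, CPU and memory columns; header rows kept for readability)
$ bosh instances --vitals | grep diego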
Instance                                           Process    AZ   IPs           Load              CPU    CPU   CPU   CPU   Memory
                                                   State                         (1m, 5m, 15m)     Total  User  Sys   Wait  Usage
diego_cell/4cc18e61-b1d3-468d-8a72-de3334463ac4    running    AZ1  10.18.144.18  0.37, 0.20, 0.18  -      1.9%  2.3%  0.1%  21% (3.4 GB)
diego_cell/32cbae32-9558-43e8-ab57-efbac71639e2    running    AZ2  10.18.144.19  2.25, 2.33, 2.29  -      13.0% 12.7% 19.1% 27% (4.4 GB)
diego_cell/32e1ae65-ad0a-4232-965a-c8b1da62fa88    running    AZ1  10.18.144.11  0.23, 0.21, 0.23  -      2.6%  2.9%  0.0%  20% (3.3 GB)
diego_cell/5a6d1cf5-752d-4218-b53c-46e952a0da17    running    AZ3  10.18.144.23  0.31, 0.29, 0.28  -      3.4%  3.3%  0.0%  21% (3.5 GB)
...

# We identified an unusually high CPU wait on the 2nd Diego cell.
# Let's open an SSH connection to the suspicious Diego cell:
$ bosh ssh diego_cell/32cbae32-9558-43e8-ab57-efbac71639e2

If you are unfamiliar with the CLI tools and how to authenticate with each of them, make sure to read our blog post on authenticating Cloud Foundry CLI tools.

Using cfdot to Access the Diego Brain

Now that we are on a Diego cell, we have access to the cfdot command line tool, which comes preinstalled on each Diego component. cfdot ships many commands to inspect (and even change!) the Diego state.

To see the containers running on the current cell you may use the following snippet:

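# Determine cell IP address on the outbound interface
# (take the value after "src"; newer iproute2 versions append "uid ..." to the line)
$ CELL_IP=$(ip route get 1 | awk '{for (i = 1; i < NF; i++) if ($i == "src") {print $(i + 1); exit}}')
# Get the list of active long running processes on Diego
$ cfdot actual-lrp-groups | grep "\"$CELL_IP\"" | jq .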
{
  "instance": {
    "process_guid": "08c9d685-c3db-480c-a0cc-c3ebf8f37fa3-2f2103f5-bfe4-4652-90ae-209a5d5393b1",
    "index": 0,
    "domain": "cf-apps",
    "instance_guid": "a777bf30-522a-4aee-6bae-bcb6",
    "address": "10.18.144.19",
    "ports": [
      {
        "container_port": 8080,
        "host_port": 61003
      },
      {
        "container_port": 2222,
        "host_port": 61004
      }
    ],
    "instance_address": "10.255.154.167",
    "crash_count": 0,
    "state": "RUNNING",
    # ...
  }
}
# ...

Because cfdot actual-lrp-groups prints a single line per app container, we can simply grep for the host port as well instead of writing a more complicated jq query. The application GUID as known to the Cloud Foundry API makes up the first 36 characters of the process_guid element (the remainder is the app version GUID), so we cut it out to finally identify the app:

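# Cell IP is the value after "src" (newer iproute2 versions append "uid ..." to the line)
$ CELL_IP=$(ip route get 1 | awk '{for (i = 1; i < NF; i++) if ($i == "src") {print $(i + 1); exit}}')
$ cfdot actual-lrp-groups | grep "\"$CELL_IP\"" | grep <DIEGO_CELL_PORT> | jq -r '.instance.process_guid' | cut -c1-36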
c5c223a0-8541-4b4f-88f9-43e513c612f1
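
If grepping for the port feels too loose (the port number could also show up inside a GUID or another field), the same lookup can be sketched as an exact jq match. This is only a sketch under assumptions: the function name app_guid_for_port is made up for illustration and is not part of cfdot, and it relies on the address, ports[].host_port and process_guid fields looking like they do in the sample output above.

```shell
# Hypothetical helper (not part of cfdot): reads actual-lrp-groups JSON lines
# on stdin and prints the app GUID of the container that is exposed on the
# given Diego cell IP ($1) and host port ($2).
# Usage on a Diego cell: cfdot actual-lrp-groups | app_guid_for_port "$CELL_IP" 61003
app_guid_for_port() {
  jq -r --arg ip "$1" --argjson port "$2" '
    # exact match on the cell IP, not a substring match
    select(.instance.address == $ip)
    # exact match on one of the host ports of the container
    | select(any(.instance.ports[]?; .host_port == $port))
    # the first 36 characters of the process GUID are the app GUID
    | .instance.process_guid[0:36]
  '
}
```

Compared to the grep pipeline this is more to type, but it cannot accidentally match a line in which the number only appears as a container port or as part of another value.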

To look up the app information, use the CF API from a machine that has the cf command line installed (it is not available on Diego components):

$ cf curl "/v3/apps/c5c223a0-8541-4b4f-88f9-43e513c612f1"
{
   "guid": "c5c223a0-8541-4b4f-88f9-43e513c612f1",
   "name": "awesome-application",
   "state": "STARTED",
   "created_at": "2018-09-23T09:09:20Z",
   "updated_at": "2018-09-23T14:52:58Z",
   # ...

How to Find Out on Which Diego Cells an App Is Running

Given the route of a Cloud Foundry application, we sometimes want to know exactly on which Diego cells and in which containers the application instances run. To do this, we use the cfdot tool again: first we query the desired long-running processes (LRPs) for the application route, then we find the matching instances in the actual LRPs:

# find the process GUIDs for the route (for example "my-awesome-app.cfapps.io")
$ PGUIDS=$(cfdot desired-lrps | grep <APP_ROUTE> | jq -r '.process_guid')
$ echo $PGUIDS
1145ece5-3dcd-4580-a91c-89d57b26d579-842d8fb3-c29b-4157-bfe3-6a3f19a92439
5b127875-92e8-458c-b5ce-5b3773086c25-72e66556-f5f2-4547-9ed6-f18fd1ba8471

Of course, grepping may yield more process GUIDs than we are looking for, so make sure the pattern matches only the URLs you are interested in. For a precise result you can write a jq select query instead of grepping. Once you have the process GUIDs in a variable, you can use them to locate the containers as follows:
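
The jq select query mentioned above could look roughly like the following sketch. Treat it as an assumption rather than a reference: the helper name process_guids_for_route is invented for this example, and the filter assumes that HTTP routes appear in the desired LRP JSON under routes, in a "cf-router" list whose entries carry a hostnames array. Check one line of your own cfdot desired-lrps output before relying on it.

```shell
# Hypothetical helper (not part of cfdot): reads desired-lrps JSON lines on
# stdin and prints the process GUIDs whose HTTP routes contain exactly the
# hostname given as $1.
# ASSUMPTION: routes are found at .routes["cf-router"][].hostnames[]
# Usage on a Diego cell: PGUIDS=$(cfdot desired-lrps | process_guids_for_route my-awesome-app.cfapps.io)
process_guids_for_route() {
  jq -r --arg host "$1" '
    # keep only LRPs that list the hostname verbatim; LRPs without
    # HTTP routes simply produce no candidates and are dropped
    select(any(.routes["cf-router"][]?.hostnames[]?; . == $host))
    | .process_guid
  '
}
```

Unlike a plain grep, an exact comparison does not pick up hostnames that merely contain the route as a substring.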

$ cfdot actual-lrp-groups | grep "$PGUIDS" | jq '{address: .instance.instance_address, ports: .instance.ports}'
{
  "address": "10.255.172.142",
  "ports": [
    {
      "container_port": 8080,
      "host_port": 61027
    },
    {
      "container_port": 2222,
      "host_port": 61028
    }
  ]
}
# ...
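
In the sample actual LRP output earlier in this post, address holds the Diego cell IP while instance_address holds the container IP. To get both in one go, the lookup can also be sketched purely in jq. The helper name locate_instances and the output keys cell, container and host_ports are made up for this example; the input fields are the ones visible in the sample output.

```shell
# Hypothetical helper (not part of cfdot): reads actual-lrp-groups JSON lines
# on stdin and, for every instance whose process GUID is listed in $1
# (newline-separated, as in $PGUIDS), prints the Diego cell IP, the container
# IP and the host ports as one compact JSON object per line.
# Usage on a Diego cell: cfdot actual-lrp-groups | locate_instances "$PGUIDS"
locate_instances() {
  jq -c --arg guids "$1" '
    ($guids | split("\n")) as $wanted
    | .instance
    # skip groups that carry no instance entry
    | select(. != null)
    # exact comparison against each wanted process GUID
    | select(.process_guid as $pg | any($wanted[]; . == $pg))
    | {cell: .address, container: .instance_address, host_ports: [.ports[]?.host_port]}
  '
}
```

The cell value is the IP to pass on to BOSH when looking for the Diego cell, and each entry in host_ports is a port on that cell that forwards into the container.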

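Note that instance_address is the internal IP address of the container; the IP address of the Diego cell hosting it is available as .instance.address in the same output. Once you know the Diego cell IP address and the host ports the containers are exposed on, you may connect to the Diego cell for further investigation of the application containers.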

Conclusion

The cfdot command line tool is a powerful way to debug a Diego deployment. As it is preinstalled and preconfigured on Diego cells (certificate environment variables are already in place :-) ), it is a really nice tool for advanced debugging that any Cloud Foundry operator should know. If you did something cool using cfdot, please let us know in the comments!

About the author: Fabian Keller

Loves cloud technologies, high code quality and craftsmanship. Passionate about woodworking when not facing a screen.