complement/internal
Eric Eastwood ec492656f8
Debug logs for port bindings when Docker deploy image fails (#761)
This is spawning from `bind: address already in use` failures we were seeing in CI and will hopefully give a better picture of what's happening.

Example `bind: address already in use` error:
```
Deploy: Deploy returned error Deploy: Failed to deploy image {Containers:-1 Created:1738100303 ID:sha256:4d01dfad2d62669c5784270111944528031170173ddc0b66d96f58ae700812d5 Labels:map[complement_blueprint:5_servers complement_context:sidecar_tests.5_servers.hs4 complement_hs_name:hs4 complement_pkg:sidecar_tests gitsha1:68e27e962c946074d6f9e18e98e55b3fe4f4c128 org.opencontainers.image.documentation:https://github.com/element-hq/synapse/blob/master/docker/README.md org.opencontainers.image.licenses:AGPL-3.0-or-later org.opencontainers.image.source:https://github.com/element-hq/synapse.git org.opencontainers.image.url:https://matrix.org/docs/projects/server/synapse] ParentID:sha256:2ae79e697d44ba986e3369e5a5197328e7b9638fa20102c9ca259e885ae71eea RepoDigests:[] RepoTags:[localhost/complement:sidecar_tests.5_servers.hs4] SharedSize:-1 Size:1400625929 VirtualSize:0} : Error response from daemon: driver failed programming external connectivity on endpoint complement_sidecar_tests_3_sideca
  r_tests.5_servers.hs4_2 (fc1bbb210acbcb5c13e64dc1336b36e914f4c8a5069db8d0ded1e12f0f26f10f): Error starting userland proxy: listen tcp4 127.0.0.1:33090: bind: address already in use
```


### Example logs that this PR will output

```sh
$ COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh -run TestFederationRoomsInvite
...

============== While deploying complement_fed_1_fed.2_servers.hs2_2 : START ALL COMPLEMENT DOCKER PORT BINDINGS ==============
Container: 6c4665d6a12f063f0a0c0707fe826ac7bb1ef7642d9cc2b8189206c28aa30c12: [/complement_fed_1_fed.2_servers.hs1_1]
    (host) -> (container)
    0.0.0.0:33494 -> 8009/tcp
    0.0.0.0:33495 -> 8080/tcp
    127.0.0.1:33107 -> 8448/tcp
    127.0.0.1:33106 -> 8008/tcp
Container: 8cb4e7a3725d291a3bb6762d7ac2af4cf6f66be64b8947542f8d299e4a01b645: [/complement_fed_1_fed.2_servers.hs2_2]
    (host) -> (container)
    127.0.0.1:33105 -> 8448/tcp
    127.0.0.1:33104 -> 8008/tcp
    0.0.0.0:33492 -> 8009/tcp
    0.0.0.0:33493 -> 8080/tcp
Container: f3f21372d83bc31bd17ccd1636c1084d9187fad9bcb6ce987348d01576ee0406: [/complement_fed.2_servers.hs2]
    (host) -> (container)
Container: aaa75bed21d55804aead3ab5bb10695803bb5c0a1f72d9b892f99d02d2548659: [/complement_fed.2_servers.hs1]
    (host) -> (container)
=============== While deploying complement_fed_1_fed.2_servers.hs2_2 : END ALL COMPLEMENT DOCKER PORT BINDINGS ===============
```


### What's the root cause?

I suspect that we might be running into bugs with the Docker port allocation code since [Complement is creating the images in parallel](8f0c7639a3/internal/docker/deployer.go (L186-L195)) (I think).

Compounding the problem might be that we're exposing many more ports than usual in the out-of-repo Complement image that we're using and Complement is using [`PublishAllPorts: true`](8f0c7639a3/internal/docker/deployer.go (L372)) when deploying the image which will take all of our exposed ports and make port bindings to the host for each.

Bugs in this part of Docker is not uncommon, just [search for `"address already in use"` in the Docker codebase](https://github.com/moby/moby/issues?q=is%3Aissue%20state%3Aopen%20%22address%20already%20in%20use%22). For example, https://github.com/moby/moby/issues/48274 has some smoke.

---

The first layer of parallelism that we can control is multiple images within a single deploy which I think we can add a mutex around things to control.

Multiple test packages running in parallel (controlled by `-p n`) is a good second factor that is probably harder to figure out.

```sh
$ go help build
...

-p n
    the number of programs, such as build commands or
    test binaries, that can be run in parallel.
    The default is GOMAXPROCS, normally the number of CPUs available.
```
2025-02-11 09:27:40 -06:00
..
config Add COMPLEMENT_ENABLE_DIRTY_RUNS to vastly speed up test runs (#677) 2023-10-18 14:23:40 +01:00
data Drop attachment requirements for media download (#646) 2023-09-29 07:20:08 -04:00
docker Debug logs for port bindings when Docker deploy image fails (#761) 2025-02-11 09:27:40 -06:00
instruction Remove libolm dependency (#738) 2024-09-25 12:27:04 +01:00
web Use http.server.Close (not .Shutdown) for cleanup (#540) 2022-10-28 16:15:52 +01:00