CoreOS Update

I recently updated CoreOS from 1298.7.0 to 1353.6.0, and for some reason some of the bridge interfaces for my containers wouldn't start up. After the update I only saw the following interfaces:

core ~ # ip -4 a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: enp1s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 192.168.0.106/24 brd 192.168.0.255 scope global enp1s0f0
       valid_lft forever preferred_lft forever
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    inet 10.2.37.0/16 scope global flannel.1
       valid_lft forever preferred_lft forever
6: br-4a79a183632a: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    inet 172.25.0.1/16 scope global br-4a79a183632a
       valid_lft forever preferred_lft forever
7: br-868395bd74c5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    inet 172.19.0.1/16 scope global br-868395bd74c5
       valid_lft forever preferred_lft forever
8: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
9: br-44c028eed818: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    inet 172.18.0.1/16 scope global br-44c028eed818

Before the update I had quite a few more; some of the bridge interfaces simply failed to come up.
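
In hindsight, a quick way to see what was going on with those bridges would have been to check the systemd-networkd journal and the bridge links directly. This is just a generic diagnostic sketch, not something tied to this specific failure:

$ journalctl -u systemd-networkd --no-pager | tail -n 20
$ ip -d link show type bridge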

Dedicated Bridges with Docker Compose

As I kept looking at the setup, I realized that for each container deployment I did with docker-compose, a dedicated bridge and docker network was created:

core ~ # docker network ls
NETWORK ID          NAME                 DRIVER              SCOPE
749f7512f466        bridge               bridge              local
a60a4439c7b3        elk_default          bridge              local
aac3a00b7c01        host                 host                local
6df70ed8f399        mariadb_default      bridge              local
b2c32fd3ea00        none                 null                local
8ba94d526efb        shipyard_default     bridge              local
44c028eed818        splunk_default       bridge              local
868395bd74c5        watchtower_default   bridge              local
f6a4ffcb974a        zabbix_default       bridge              local

This is actually known behavior and is described in Compose network configuration not following docker0 network setup. You can specify the following in the docker-compose.yml config:

network_mode: bridge

and the containers will attach to the default docker0 bridge instead.
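
For example, a minimal docker-compose.yml using that option could look like this sketch (the service name and image are placeholders, not from my actual setup):

version: '2'
services:
  app:                      # placeholder service name
    image: alpine           # placeholder image
    command: sleep 3600
    network_mode: bridge    # attach to docker0 instead of a per-project bridge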

Flannel with Docker

Since I was already using CoreOS, I knew that flannel comes pre-installed with it. Flannel is an overlay network that allows containers to talk to each other across multiple Docker hosts. There is a nice diagram of the packet flow in their documentation:

flannel-packet-flow

What I liked about flannel is that it sits in front of docker, and the containers don't really need to know that they are using an overlay network. You can just configure the containers to use the docker0 (bridge network) interface and it will work without issues. If you are not using CoreOS, the flannel setup is covered in the following guides:

  1. Multi-Host Networking Overlay with Flannel
  2. Configuring a flannel overlay network with VXLAN for Docker on IBM Power Systems servers

The gist of the setup is: first, add the flannel network configuration to etcd2:

$ etcdctl set /coreos.com/network/config '{"Network": "10.0.0.0/8", "SubnetLen": 20, "SubnetMin": "10.10.0.0","SubnetMax": "10.99.0.0","Backend": {"Type": "vxlan","VNI": 100,"Port": 8472}}'

Then confirm it’s set:

$ etcdctl get /coreos.com/network/config

Then start flanneld and specify your external/public IP:

$ nohup sudo flanneld --iface=192.168.205.10 &

That will create a flannel.* interface on the network that you specified. Then restart the docker daemon, making sure it uses the flannel subnet that was allocated from the network you defined initially:

$ sudo service docker stop
$ source /run/flannel/subnet.env
$ sudo ifconfig docker0 ${FLANNEL_SUBNET}
$ sudo docker daemon --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU} &

And then you can just start a container and it will be on the flannel network:

$ sudo docker run -d --name test1  busybox sh -c "while true; do sleep 3600; done"
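
To double check that the container really picked up an address from the flannel range, something like this should work (test1 is the container started above):

$ sudo docker inspect -f '{{ .NetworkSettings.IPAddress }}' test1
$ sudo docker exec test1 ip -4 addr show eth0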

Flannel with CoreOS

With CoreOS all we have to do is update the cloud-config.yaml file and apply the configuration. Here are the relevant flannel, etcd2, and units sections (this is covered in Configuring flannel for container networking):

flannel:
  interface: 192.168.0.106
etcd2:
  name: "core"
  #discovery: "https://discovery.etcd.io/<token>"
  advertise-client-urls: "http://192.168.0.106:2379"
  initial-advertise-peer-urls: "http://192.168.0.106:2380"
  listen-client-urls: "http://0.0.0.0:2379,http://0.0.0.0:4001"
  listen-peer-urls: "http://192.168.0.106:2380,http://192.168.0.106:7001"
units:
  - name: etcd2.service
    command: start
    drop-ins:
      - name: 50-network-config.conf
        content: |
          [Service]
          Restart=always

          [Install]
          WantedBy=multi-user.target
  - name: docker.service
    command: start
    drop-ins:
      - name: 50-insecure-registry.conf
        content: |
          [Unit]
          [Service]
          Environment=DOCKER_OPTS='--insecure-registry="0.0.0.0/0"'

      - name: 60-docker-wait-for-flannel-config.conf
        content: |
          [Unit]
          After=flanneld.service
          Requires=flanneld.service

          [Service]
          Restart=always

          [Install]
          WantedBy=multi-user.target
  - name: flanneld.service
    command: start
    drop-ins:
      - name: 50-network-config.conf
        content: |
          [Unit]
          After=etcd2.service
          Requires=etcd2.service

          [Service]
          ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{"Network":"10.2.0.0/16", "Backend": {"Type": "vxlan"}}'

          [Install]
          WantedBy=multi-user.target

Then to apply it we can just run this:

$ coreos-cloudinit -validate --from-file cloud-config.yaml
$ coreos-cloudinit --from-file cloud-config.yaml

And if you want the configuration to persist across reboots, you can copy it into the install location:

$ cp cloud-config.yaml /var/lib/coreos-install/user_data
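
It doesn't hurt to re-run the validation against the copied file as well, the same way as above:

$ coreos-cloudinit -validate --from-file /var/lib/coreos-install/user_data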

After that’s done, you will see the flanneld daemon running:

core ~ # systemctl status flanneld -l --no-pager
● flanneld.service - flannel - Network fabric for containers (System Application Container)
   Loaded: loaded (/usr/lib/systemd/system/flanneld.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/flanneld.service.d
           └─50-network-config.conf
   Active: active (running) since Wed 2017-04-26 11:07:33 MDT; 5h 1min ago
     Docs: https://github.com/coreos/flannel
  Process: 4683 ExecStop=/usr/bin/rkt stop --uuid-file=/var/lib/coreos/flannel-wrapper.uuid (code=exited, status=0/SUCCESS)
  Process: 4765 ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config {"Network":"10.2.0.0/16", "Backend": {"Type": "vxlan"}} (code=exited, status=0/SUCCESS)
  Process: 4734 ExecStartPre=/usr/bin/rkt rm --uuid-file=/var/lib/coreos/flannel-wrapper.uuid (code=exited, status=0/SUCCESS)
  Process: 4731 ExecStartPre=/usr/bin/mkdir --parents /var/lib/coreos /run/flannel (code=exited, status=0/SUCCESS)
  Process: 4725 ExecStartPre=/sbin/modprobe ip_tables (code=exited, status=0/SUCCESS)
 Main PID: 4775 (flanneld)
    Tasks: 15 (limit: 32768)
   Memory: 10.0M
      CPU: 3.959s
   CGroup: /system.slice/flanneld.service
           └─4775 /opt/bin/flanneld --ip-masq=true

Apr 26 11:07:33 core flannel-wrapper[4775]: I0426 17:07:33.300976    4775 manager.go:149] Using interface with name enp1s0f0 and address 192.168.0.106
Apr 26 11:07:33 core flannel-wrapper[4775]: I0426 17:07:33.300993    4775 manager.go:166] Defaulting external address to interface address (192.168.0.106)
Apr 26 11:07:33 core flannel-wrapper[4775]: I0426 17:07:33.313707    4775 local_manager.go:134] Found lease (10.2.37.0/24) for current IP (192.168.0.106), reusing
Apr 26 11:07:33 core flannel-wrapper[4775]: I0426 17:07:33.319892    4775 ipmasq.go:47] Adding iptables rule: -s 10.2.0.0/16 -d 10.2.0.0/16 -j RETURN
Apr 26 11:07:33 core flannel-wrapper[4775]: I0426 17:07:33.321837    4775 ipmasq.go:47] Adding iptables rule: -s 10.2.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
Apr 26 11:07:33 core flannel-wrapper[4775]: I0426 17:07:33.323857    4775 ipmasq.go:47] Adding iptables rule: ! -s 10.2.0.0/16 -d 10.2.0.0/16 -j MASQUERADE
Apr 26 11:07:33 core flannel-wrapper[4775]: I0426 17:07:33.325706    4775 manager.go:250] Lease acquired: 10.2.37.0/24
Apr 26 11:07:33 core flannel-wrapper[4775]: I0426 17:07:33.325946    4775 network.go:58] Watching for L3 misses
Apr 26 11:07:33 core flannel-wrapper[4775]: I0426 17:07:33.325981    4775 network.go:66] Watching for new subnet leases
Apr 26 11:07:33 core systemd[1]: Started flannel - Network fabric for containers (System Application Container).

And the docker daemon will also start up on the new flannel network:

core ~ # systemctl status docker -l --no-pager
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─50-insecure-registry.conf, 60-docker-wait-for-flannel-config.conf
   Active: active (running) since Wed 2017-04-26 11:07:36 MDT; 5h 2min ago
     Docs: http://docs.docker.com
 Main PID: 4892 (dockerd)
    Tasks: 107
   Memory: 60.6M
      CPU: 14.329s
   CGroup: /system.slice/docker.service
           ├─ 4892 dockerd --host=fd:// --containerd=/var/run/docker/libcontainerd/docker-containerd.sock --insecure-registry=0.0.0.0/0 --bip=10.2.37.1/24 --mtu=1450 --ip-masq=false --selinux-enabled
           ├─ 5022 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 3306 -container-ip 10.2.37.9 -container-port 3306

Apr 26 11:07:36 core dockerd[4892]: time="2017-04-26T11:07:36.101420161-06:00" level=info msg="Loading containers: done."
Apr 26 11:07:36 core dockerd[4892]: time="2017-04-26T11:07:36.101509944-06:00" level=info msg="Daemon has completed initialization"
Apr 26 11:07:36 core dockerd[4892]: time="2017-04-26T11:07:36.101537005-06:00" level=info msg="Docker daemon" commit=d5236f0 graphdriver=overlay version=1.12.6
Apr 26 11:07:36 core dockerd[4892]: time="2017-04-26T11:07:36.117838732-06:00" level=info msg="API listen on /var/run/docker.sock"
Apr 26 11:07:36 core systemd[1]: Started Docker Application Container Engine.

And you will see the configuration pushed to etcd2:

core ~ # etcdctl get /coreos.com/network/config
{"Network":"10.2.0.0/16", "Backend": {"Type": "vxlan"}}

And the config was created for docker to read:

core ~ # cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.2.0.0/16
FLANNEL_SUBNET=10.2.37.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
core ~ # cat /run/flannel/flannel_docker_opts.env
DOCKER_OPT_BIP="--bip=10.2.37.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=false"
DOCKER_OPT_MTU="--mtu=1450"
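
As a quick sanity check, docker0 itself should now be carrying the address from FLANNEL_SUBNET (10.2.37.1/24) instead of the old 172.17.0.1/16:

core ~ # ip -4 addr show docker0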

Testing out the Flannel Network

Now I can start up a container and make sure I can reach another container on the same bridge:

core ~ # docker run --rm -it alpine /bin/sh
/ # ip -4 a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
59: eth0@if60: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP
    inet 10.2.37.13/24 scope global eth0
       valid_lft forever preferred_lft forever
/ # ping -c 1 10.2.37.5
PING 10.2.37.5 (10.2.37.5): 56 data bytes
64 bytes from 10.2.37.5: seq=0 ttl=64 time=0.125 ms

--- 10.2.37.5 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.125/0.125/0.125 ms
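
Flannel also records the per-host subnet leases in etcd under the same key prefix, which becomes handy once more hosts join the overlay; listing them should show the 10.2.37.0/24 lease seen in the flanneld logs above:

core ~ # etcdctl ls /coreos.com/network/subnets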

And now I just have one bridge with multiple containers connected to it:

core ~ # brctl show
bridge name     bridge id               STP enabled     interfaces
docker0         8000.024238119fbf       no              veth0f54902
                                                        veth103c116
                                                        veth42c1fba
                                                        veth560dee2
                                                        veth6943b27
                                                        veth6f481f3
                                                        veth7472f10
                                                        veth7c445fb
                                                        vethb428e5d
                                                        vethf78b563
                                                        vethfb91608

One thing to note is that, from a container's point of view, requests arriving from outside the host will appear to come from the docker0 address (the containers' default gateway, 10.2.37.1 here) rather than the original source IP. You can do another quick test to confirm:

core ~ # docker run --rm -it -e 12345 -p 12345:12345 alpine /bin/sh
/ # apk update
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/community/x86_64/APKINDEX.tar.gz
v3.5.2-55-g4a95ad60e8 [http://dl-cdn.alpinelinux.org/alpine/v3.5/main]
v3.5.2-49-g2cff35f5fc [http://dl-cdn.alpinelinux.org/alpine/v3.5/community]
OK: 7964 distinct packages available
/ # apk add tcpdump
(1/2) Installing libpcap (1.7.4-r1)
(2/2) Installing tcpdump (4.9.0-r0)
Executing busybox-1.25.1-r0.trigger
OK: 5 MiB in 13 packages
/ # tcpdump -i eth0 tcp port 12345 -nne
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
22:31:56.539387 02:42:38:11:9f:bf > 02:42:0a:02:25:0d, ethertype IPv4 (0x0800), length 74: 10.2.37.1.54980 > 10.2.37.13.12345: Flags [S], seq 3436635296, win 29200, options [mss 1460,sackOK,TS val 7733117 ecr 0,nop,wscale 7], length 0
22:31:56.539425 02:42:0a:02:25:0d > 02:42:38:11:9f:bf, ethertype IPv4 (0x0800), length 54: 10.2.37.13.12345 > 10.2.37.1.54980: Flags [R.], seq 0, ack 3436635297, win 0, length 0

So if any of the converted containers were referencing the old bridge IPs, those references will have to be updated.

Converting docker-compose.yml to use the default bridge network

To do the conversion, I just stopped all the containers:

# docker stop $(docker ps -a -q)

Then deleted the old networks:

# docker network rm elk_default mariadb_default shipyard_default splunk_default

Then added the following line to all the docker-compose.yml files:

network_mode: bridge

and lastly started them up:

# docker-compose up -d
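
After they come back up, you can confirm the containers landed on the default bridge network and picked up addresses from the flannel range, for example:

core ~ # docker network inspect bridge | grep -E '"Name"|"IPv4Address"'
core ~ # brctl show docker0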

Iptables Configuration

You will notice that for each port you expose, an iptables rule is added (in the DOCKER chain of the filter table) to allow traffic on that port to reach the container:

core ~ # iptables -L -n -v
Chain INPUT (policy ACCEPT 1185 packets, 207K bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 189 packets, 32822 bytes)
 pkts bytes target     prot opt in     out     source               destination
1154K  517M DOCKER-ISOLATION  all  --  *      *       0.0.0.0/0            0.0.0.0/0
 740K  155M DOCKER     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0
 151K   32M ACCEPT     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
 414K  363M ACCEPT     all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0
   46  2784 ACCEPT     all  --  docker0 docker0  0.0.0.0/0            0.0.0.0/0

Chain OUTPUT (policy ACCEPT 974 packets, 448K bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain DOCKER (1 references)
 pkts bytes target     prot opt in     out     source               destination
 395K   52M ACCEPT     tcp  --  !docker0 docker0  0.0.0.0/0            10.2.37.9            tcp dpt:3306
    0     0 ACCEPT     tcp  --  !docker0 docker0  0.0.0.0/0            10.2.37.5            tcp dpt:9300
Chain DOCKER-ISOLATION (1 references)
 pkts bytes target     prot opt in     out     source               destination
1154K  517M RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0

And since I was also publishing external ports in my configuration, I saw DNATs (in the DOCKER chain referenced from PREROUTING) and MASQUERADEs (in the POSTROUTING chain) added as well, to allow inbound traffic from the outside and outbound traffic from the containers:

core ~ # iptables -L -n -v -t nat
Chain PREROUTING (policy ACCEPT 96 packets, 11911 bytes)
 pkts bytes target     prot opt in     out     source               destination
23111 1457K DOCKER     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 94 packets, 11775 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 8 packets, 1496 bytes)
 pkts bytes target     prot opt in     out     source               destination
    2   120 DOCKER     all  --  *      *       0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 8 packets, 1496 bytes)
 pkts bytes target     prot opt in     out     source               destination
  492  103K RETURN     all  --  *      *       10.2.0.0/16          10.2.0.0/16
  311 19896 MASQUERADE  all  --  *      *       10.2.0.0/16         !224.0.0.0/4
23095 1456K MASQUERADE  all  --  *      *      !10.2.0.0/16          10.2.0.0/16
    0     0 MASQUERADE  tcp  --  *      *       10.2.37.9            10.2.37.9            tcp dpt:3306
    0     0 MASQUERADE  tcp  --  *      *       10.2.37.5            10.2.37.5            tcp dpt:9300
Chain DOCKER (2 references)
 pkts bytes target     prot opt in     out     source               destination
 4623  277K DNAT       tcp  --  !docker0 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:3306 to:10.2.37.9:3306
    0     0 DNAT       tcp  --  !docker0 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:9300 to:10.2.37.5:9300
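
If you want to watch these rules appear, publishing a throwaway port is an easy way to check; the container name and port below are arbitrary:

core ~ # docker run -d --name porttest -p 12399:12399 alpine sleep 3600
core ~ # iptables -L DOCKER -n | grep 12399
core ~ # iptables -t nat -L DOCKER -n | grep 12399
core ~ # docker rm -f porttest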

CoreOS issue with docker network bridges

Later on, as I was doing some research, I ran into this CoreOS issue. It looks like the new CoreOS version (1353.6.0) introduced a race condition between docker and systemd-networkd over which one claims the bridge interfaces. The following modification:

[Match]
 Type=bridge
-Name=docker*
+Name=docker* br-*

in the 50-docker.network file would probably have fixed my original issue, but I was still happy to move my containers to flannel for future expansion. And it looks like version 1353.7.0 fixes that specific issue as well.
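
If you are hit by the same race, networkctl is a quick way to see whether systemd-networkd has claimed the docker bridges and which .network file they matched:

core ~ # networkctl list
core ~ # networkctl status docker0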