Installing CoreOS and Shipyard

I wanted to try out CoreOS, an OS that is optimized for Docker Containers.

Installing to disk

Most of the setup is covered in Installing CoreOS Container Linux to disk. Just download the ISO and then dd it to a USB stick:

$ sudo dd if=coreos_production_iso_image.iso of=/dev/sdc bs=1M status=progress

After the machine boots from the USB stick, it automatically logs in as the core user. I set the user’s password with the following command:

$ sudo passwd core

Since the machine was able to get a DHCP address, then I just ssh‘ed into the machine:

<> ssh core@192.168.1.230
Password:
Last login: Sun Jan 15 22:46:52 UTC 2017 on tty1
Container Linux by CoreOS stable (1235.6.0)
Update Strategy: No Reboots
Failed Units: 1
  tcsd.service
core@localhost ~ $

And I did the rest from there (I also became root sudo su -). First find your disk (mine was /dev/nvme0n1):

$ sudo fdisk -l
Disk /dev/loop0: 233.4 MiB, 244736000 bytes, 478000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/nvme0n1: 477 GiB, 512110190592 bytes, 1000215216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x48d52bce

Device         Boot   Start     End Sectors  Size Id Type
/dev/nvme0n1p1 *       2048 1050623 1048576  512M  b W95 FAT32
/dev/nvme0n1p2      1050624 2099199 1048576  512M  b W95 FAT32
/dev/nvme0n1p3      2099200 3147775 1048576  512M  b W95 FAT32

Then you have to create a cloud-config.yaml file to configure your install. Here is the cloud-config I ended up creating:

localhost ~ # cat ~core/cloud-config.yaml
#cloud-config
hostname: core
coreos:
  update:
    reboot-strategy: etcd-lock
  etcd2:
    name: core
    initial-advertise-peer-urls: http://127.0.0.1:2380
    initial-cluster-token: core_etcd
    initial-cluster: core=http://127.0.0.1:2380
    initial-cluster-state: new
    listen-peer-urls: http://0.0.0.0:2380,http://0.0.0.0:7001
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    advertise-client-urls:  http://0.0.0.0:2379,http://0.0.0.0:4001
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
    - name: docker-tcp.socket
      command: start
      enable: true
      content: |
        [Unit]
        Description=Docker Socket for the API

        [Socket]
        ListenStream=2375
        BindIPv6Only=both
        Service=docker.service

        [Install]
        WantedBy=sockets.target
    - name: docker.service
      command: start
      drop-ins:
        - name: 50-insecure-registry.conf
          content: |
            [Unit]
            [Service]
            Environment=DOCKER_OPTS='--insecure-registry="0.0.0.0/0"'
    - name: flanneld.service
      command: start
      drop-ins:
      - name: 50-network-config.conf
        content: |
          [Service]
          ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{"Network":"10.2.0.0/16", "Backend": {"Type": "vxlan"}}'
    - name: 00-eno1.network
      runtime: true
      content: |
        [Match]
        Name=eno1

        [Network]
        DNS=192.168.1.1
        DNS=192.168.56.1
        Address=192.168.1.106/24
        Gateway=192.168.1.1
        LinkLocalAddressing=no
        IPv6AcceptRA=no
write-files:
 - path: /etc/conf.d/nfs
   permissions: '0644'
   content: |
     OPTS_RPC_MOUNTD=""

users:
  - name: "elatov"
    passwd: "$6$pO1"
    groups:
      - "sudo"
      - "docker"
    ssh-authorized-keys:
      - "ssh-rsa TFnaJYPYKp elatov@me"

ssh_authorized_keys:
  - ssh-rsa TFnaJYPYKp elatov@me

I had some help from these web sites for the cloud-config file parameters:

And now for the install:

localhost ~ # coreos-install -d /dev/nvme0n1 -C stable -c ~core/cloud-config.yaml
2017/01/15 23:38:32 Checking availability of "local-file"
2017/01/15 23:38:32 Fetching user-data from datasource of type "local-file"
Downloading the signature for https://stable.release.core-os.net/amd64-usr/1235.6.0/coreos_production_image.bin.bz2...
2017-01-15 23:38:33 URL:https://stable.release.core-os.net/amd64-usr/1235.6.0/coreos_production_image.bin.bz2.sig [564/564] -> "/tmp/coreos-install.QCfpdhiZq4/coreos_production_image.bin.bz2.sig" [1]
Downloading, writing and verifying coreos_production_image.bin.bz2...
2017-01-15 23:39:05 URL:https://stable.release.core-os.net/amd64-usr/1235.6.0/coreos_production_image.bin.bz2 [276173809/276173809] -> "-" [1]
gpg: Signature made Tue Jan 10 05:48:31 2017 UTC
gpg:                using RSA key 48F9B96A2E16137F
gpg:                issuer "buildbot@coreos.com"
gpg: key 50E0885593D2DCB4 marked as ultimately trusted
gpg: checking the trustdb
gpg: marginals needed: 3  completes needed: 1  trust model: pgp
gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
gpg: Good signature from "CoreOS Buildbot (Offical Builds) <buildbot@coreos.com>" [ultimate]
Installing cloud-config...
Success! CoreOS stable 1235.6.0 is installed on /dev/nvme0n1

After a reboot, the machine will start up CoreOS and you can ssh using the keys that you defined in the cloud-config file.

Installing Shipyard

Note: It looks like the Shipyard Project is no longer maintained.

I then wanted to install Shipyard which can help with managing Docker Containers. Looking over the GitHub Project, it should be as easy as this:

curl -sSL https://shipyard-project.com/deploy | bash -s

But running that yieled the following error:

Error starting userland proxy: listen tcp 0.0.0.0:7001: bind: address already in use.

Then following the instructions laid out in Install on coreos issues (solved) #755 help out. I manually deployed the containers:

core ~ # docker run -ti -d --restart=always --name shipyard-swarm-manager swarm:latest manage --host tcp://0.0.0.0:3375 etcd://172.17.0.1:4001
cfc33357c008d94841e4470b580706203d6d54f6d7c8b3462370f18134587024

To make sure shipyard-swarm-manager is able to connect to etcd2, find it’s id:

core ~ # docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                            NAMES
cfc33357c008        swarm:latest        "/swarm manage --host"   4 seconds ago       Up 3 seconds        2375/tcp                         shipyard-swarm-manager
ca24bde9016b        rethinkdb           "rethinkdb --bind all"   38 minutes ago      Up 38 minutes       8080/tcp, 28015/tcp, 29015/tcp   shipyard-rethinkdb

Then check out the logs:

core ~ # docker logs cfc33357c008
INFO[0000] Initializing discovery without TLS
INFO[0000] Listening for HTTP                            addr=0.0.0.0:3375 proto=tcp

I figure out that I should use the IP that the docker0 interface is listening on:

macm ~ # ip -4 a s dev docker0
5: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever

Also you can login to one of the containers and make sure you can reach those end points:

macm ~ # docker exec -it ca24bde9016b /bin/sh
# curl -L http://172.17.0.1:2375
{"message":"page not found"}
# curl -L http://172.17.0.1:4001/v2/keys
{"action":"get","node":{"dir":true,"nodes":[{"key":"/coreos.com","dir":true,"modifiedIndex":4,"createdIndex":4},{"key":"/docker","dir":true,"modifiedIndex":1409,"createdIndex":1409}]}}

I had to change my cloud-config.yaml file and update it to change the config for etcd2, to listen on it’s docker0 interface (more on that below). So keep going and deploy the rest of the containers:

core ~ # docker run \
>     -ti \
>     -d \
>     --restart=always \
>     --name shipyard-swarm-agent \
>     swarm:latest \
>     join --addr 172.17.0.1:2375 etcd://172.17.0.1:2379
556eac9be28fbe5a5a41d13694c86a56fce65b5cb00e34056c4b4ae606ce5d49

Do the same thing and confirm the logs look good:

core ~ # docker logs 556eac9be28f
INFO[0000] Initializing discovery without TLS
INFO[0000] Registering on the discovery service every 1m0s...  addr=172.17.0.1:2375 discovery=etcd://172.17.0.1:2379

And finally run the controller:

core ~ # docker run \
>     -ti \
>     -d \
>     --restart=always \
>     --name shipyard-controller \
>     --link shipyard-rethinkdb:rethinkdb \
>     --link shipyard-swarm-manager:swarm \
>     -p 8080:8080 \
>     shipyard/shipyard:latest \
>     server \
>     -d tcp://swarm:3375
73252834a961c267e8e54492086f1e2a2df235548b67543730f1f63b127f5263

Now you can go to http://IP:8080 and login with admin/shipyard and you can see all the containers that are currently running:

sy-containers

And the available nodes (I only had one for now):

sy-nodes

Change the Cloud-Config And Update it

First copy the config that was used during the install:

core ~ # cp /var/lib/coreos-install/user_data /tmp/cc.yml

Then modify what you need:

core ~ # vi /tmp/cc.yml

Then confirm the config is okay:

core ~ # coreos-cloudinit -validate --from-file /tmp/cc.yml
2017/01/16 00:46:05 Checking availability of "local-file"
2017/01/16 00:46:05 Fetching user-data from datasource of type "local-file"

And lastly go ahead and apply the config:

core ~ # coreos-cloudinit --from-file /tmp/cc.yml
2017/01/16 00:46:28 Checking availability of "local-file"
2017/01/16 00:46:28 Fetching user-data from datasource of type "local-file"
2017/01/16 00:46:28 Fetching meta-data from datasource of type "local-file"
2017/01/16 00:46:28 Parsing user-data as cloud-config
2017/01/16 00:46:28 Merging cloud-config from meta-data and user-data
2017/01/16 00:46:28 Set hostname to core
2017/01/16 00:46:28 User 'elatov' exists, ignoring creation-time fields
2017/01/16 00:46:28 Setting 'elatov' user's password
2017/01/16 00:46:28 Authorizing 2 SSH keys for user 'elatov'
2017/01/16 00:46:28 Authorized SSH keys for core user
2017/01/16 00:46:28 Writing file to "/etc/conf.d/nfs"
2017/01/16 00:46:28 Wrote file to "/etc/conf.d/nfs"
2017/01/16 00:46:28 Wrote file /etc/conf.d/nfs to filesystem
2017/01/16 00:46:28 Writing file to "/etc/coreos/update.conf"
2017/01/16 00:46:28 Wrote file to "/etc/coreos/update.conf"
2017/01/16 00:46:28 Wrote file /etc/coreos/update.conf to filesystem
2017/01/16 00:46:28 Writing unit "docker-tcp.socket" to filesystem
2017/01/16 00:46:28 Writing file to "/etc/systemd/system/docker-tcp.socket"
2017/01/16 00:46:28 Wrote file to "/etc/systemd/system/docker-tcp.socket"
2017/01/16 00:46:28 Wrote unit "docker-tcp.socket"
2017/01/16 00:46:28 Enabling unit file "docker-tcp.socket"
2017/01/16 00:46:28 Enabled unit "docker-tcp.socket"
2017/01/16 00:46:28 Writing drop-in unit "50-insecure-registry.conf" to filesystem
2017/01/16 00:46:28 Writing file to "/etc/systemd/system/docker.service.d/50-insecure-registry.conf"
2017/01/16 00:46:28 Wrote file to "/etc/systemd/system/docker.service.d/50-insecure-registry.conf"
2017/01/16 00:46:28 Wrote drop-in unit "50-insecure-registry.conf"
2017/01/16 00:46:28 Writing drop-in unit "50-network-config.conf" to filesystem
2017/01/16 00:46:28 Writing file to "/etc/systemd/system/flanneld.service.d/50-network-config.conf"
2017/01/16 00:46:28 Wrote file to "/etc/systemd/system/flanneld.service.d/50-network-config.conf"
2017/01/16 00:46:28 Wrote drop-in unit "50-network-config.conf"
2017/01/16 00:46:28 Writing unit "00-eno1.network" to filesystem
2017/01/16 00:46:28 Writing file to "/run/systemd/network/00-eno1.network"
2017/01/16 00:46:28 Wrote file to "/run/systemd/network/00-eno1.network"
2017/01/16 00:46:28 Wrote unit "00-eno1.network"
2017/01/16 00:46:28 Ensuring runtime unit file "00-eno1.network" is unmasked
2017/01/16 00:46:28 /run/systemd/network/00-eno1.network is not null or empty, refusing to unmask
2017/01/16 00:46:28 Ensuring runtime unit file "etcd.service" is unmasked
2017/01/16 00:46:28 Ensuring runtime unit file "etcd2.service" is unmasked
2017/01/16 00:46:28 Ensuring runtime unit file "fleet.service" is unmasked
2017/01/16 00:46:28 Ensuring runtime unit file "locksmithd.service" is unmasked
2017/01/16 00:46:28 Ensuring runtime unit file "locksmithd.service" is unmasked
2017/01/16 00:46:28 Restarting systemd-networkd
2017/01/16 00:46:29 Restarted systemd-networkd (done)
2017/01/16 00:46:29 Calling unit command "start" on "docker-tcp.socket"'
2017/01/16 00:46:29 Result of "start" on "docker-tcp.socket": done
2017/01/16 00:46:29 Calling unit command "start" on "docker.service"'
2017/01/16 00:46:29 Result of "start" on "docker.service": done
2017/01/16 00:46:29 Calling unit command "start" on "flanneld.service"'
2017/01/16 00:46:29 Result of "start" on "flanneld.service": done
2017/01/16 00:46:29 Calling unit command "restart" on "locksmithd.service"'
2017/01/16 00:46:29 Result of "restart" on "locksmithd.service": done

After that make sure the services are listening on the right IPs:

macm ~ # ss -lnt
State      Recv-Q Send-Q      Local Address:Port  Peer Address:Port
LISTEN     0      128         172.17.0.1:2380                *:*
LISTEN     0      128         172.17.0.1:7001                *:*
LISTEN     0      128                 :::2375               :::*
LISTEN     0      128                 :::2379               :::*
LISTEN     0      128                 :::8080               :::*
LISTEN     0      128                 :::22                 :::*
LISTEN     0      128                 :::4001               :::*

Here are the config locations: Cloud-Config Locations. Since I was making changes to the etc2 service, I could confirm it’s running:

macm ~ # systemctl status etcd2
● etcd2.service - etcd2
   Loaded: loaded (/usr/lib/systemd/system/etcd2.service; enabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/etcd2.service.d
           └─20-cloudinit.conf
   Active: active (running) since Mon 2017-02-20 06:02:06 UTC; 24min ago
 Main PID: 2997 (etcd2)
    Tasks: 14
   Memory: 18.4M
      CPU: 5.639s
   CGroup: /system.slice/etcd2.service
           └─2997 /usr/bin/etcd2

Feb 20 06:02:06 macm etcd2[2997]: added member ce2a822cea30bfca [http://localhost:2380 http://localhost:7001] to cluster 7e27652122e8b2ae from
Feb 20 06:02:06 macm etcd2[2997]: set the cluster version to 2.3 from store
Feb 20 06:02:06 macm etcd2[2997]: starting server... [version: 2.3.7, cluster version: 2.3]
Feb 20 06:02:06 macm systemd[1]: Started etcd2.
Feb 20 06:02:07 macm etcd2[2997]: ce2a822cea30bfca is starting a new election at term 10
Feb 20 06:02:07 macm etcd2[2997]: ce2a822cea30bfca became candidate at term 11
Feb 20 06:02:07 macm etcd2[2997]: ce2a822cea30bfca received vote from ce2a822cea30bfca at term 11
Feb 20 06:02:07 macm etcd2[2997]: ce2a822cea30bfca became leader at term 11
Feb 20 06:02:07 macm etcd2[2997]: raft.node: ce2a822cea30bfca elected leader ce2a822cea30bfca at term 11
Feb 20 06:02:07 macm etcd2[2997]: published {Name:0e7ed956df4d4f599f5038340d14867a ClientURLs:[http://192.168.1.109:2379]} to cluster 7e276521

and also check the new etcd2 service config that it generates:

macm ~ # cat /run/systemd/system/etcd2.service.d/20-cloudinit.conf
[Service]
Environment="ETCD_ADVERTISE_CLIENT_URLS=http://192.168.1.109:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=http://172.17.0.1:2380"
Environment="ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379,http://0.0.0.0:4001"
Environment="ETCD_LISTEN_PEER_URLS=http://172.17.0.1:2380,http://172.17.0.1:7001"

Disable etcd2 Service on CoreOS

If you want, you could also disable the etcd2 service and just install shipyard the automated way, which installs it’s own etcd version. This is discussed at etcd keeps getting started in place of etcd2 #3211. Here is relevant config to just unmask the service:

# cat /var/lib/coreos-install/user_data
#cloud-config
hostname: core
coreos:
  update:
    reboot-strategy: etcd-lock
  units:
    - name: etcd2.service
      mask: true
    - name: docker-tcp.socket
      command: start

You can then stop all the containers and reboot, to apply the config:

core ~ # docker stop $(docker ps -a -q)
73252834a961
556eac9be28f
cfc33357c008
ca24bde9016b
core ~ # reboot

Also as an FYI, you can use the update_engine_client command to check for update:

macm ~ # update_engine_client -check_for_update --status
I0224 05:45:46.781448  8135 update_engine_client.cc:237] Querying Update Engine status...
LAST_CHECKED_TIME=1487915080
PROGRESS=0.000000
CURRENT_OP=UPDATE_STATUS_IDLE
NEW_VERSION=0.0.0
NEW_SIZE=0