Building a multi-container application with vSphere Integrated Containers v1.1.1

Running demo containers on vSphere Integrated Containers (aka VIC) is pretty exciting, but eventually the time comes to get our hands dirty and build an actual application that needs to live on a virtual container host (aka VCH) in a resilient way. So in this post, I’ll try to cover the steps to build an application with two layers (an application/UI layer and a database layer) and how to work around the challenges that the VIC engine imposes. The high-level objectives are:

  • Select a proper application
  • Open the required ports on ESXi servers
  • Create and configure a VCH
  • Create docker-compose configuration file
  • Run the containers and test the application

The component versions that I use for this task are:

  • vSphere Integrated Containers: 1.1.1, build 56a309f
  • docker-compose: 1.13.0, build 1719ceb

Selection of a proper application:

The application that I’m going to deploy is an open source project on GitHub, called Gogs. Gogs is defined by its creators as a painless, self-hosted Git service. The goal of the project is to provide the easiest, fastest, and most painless way of setting up a self-hosted Git service, written in Go. This enables an independent binary distribution across all platforms that Go supports, including Linux, Mac OS X, Windows and even ARM.

There are many ways to install Gogs, such as installation from source code, from packages or with Vagrant, but of course we will focus on installation as a Docker container. Normally, one container is more than enough to run Gogs, but it also supports remote databases, so we will take advantage of that to create a two-layered application.

Required ports on ESXi servers:

Before we start, I assume that the VIC appliance is deployed and configured properly. We also need a vic-machine box in order to run vic commands; this is going to be a CentOS box in my environment. If we still don’t have the vic-machine binaries, we can easily get them with curl from the VIC appliance (sddcvic is my appliance; if using self-signed certificates, append --insecure to the curl command).

curl -L --insecure -o /root/vic_1.1.1.tar.gz https://sddcvic:9443/vic_1.1.1.tar.gz
gzip -d /root/vic_1.1.1.tar.gz
tar -xvf /root/vic_1.1.1.tar

ESXi hosts communicate with the VCHs through port 2377 via Serial Over LAN. For the deployment of a VCH to succeed, port 2377 must be open for outgoing connections on all ESXi hosts before you run vic-machine-xxx create to deploy a VCH. The vic-machine utility includes an update firewall command that we can use to modify the firewall on a standalone ESXi host or on all of the ESXi hosts in a cluster. This command allows outgoing tcp/2377 connections on all ESXi servers under the cluster defined with the compute-resource option.

./vic-machine-linux update firewall \
    --target=sddcvcs.domain.sddc \
    --user=orcunuso \
    --compute-resource=SDDC.pCluster \
    --thumbprint="3F:6E:2F:16:FA:76:53:74:18:3F:26:9D:1A:58:40:AD:E5:D8:3E:52" \
    --allow

Initially, we may not know the thumbprint of the vCenter Server. The trick here is to run the command without the thumbprint option, get the thumbprint from the error message, add the option and re-run the command.
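Alternatively, if openssl is available on the vic-machine box, we can read the SHA1 fingerprint of the vCenter certificate directly; a small sketch (adjust the hostname to your own vCenter):

echo | openssl s_client -connect sddcvcs.domain.sddc:443 2>/dev/null | openssl x509 -noout -fingerprint -sha1

The output is in the same colon-separated format that the --thumbprint option expects.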

Deployment of the VCH:

Normally, a few options (such as name, target, user and TLS support mode) will suffice, but in order to deploy a more customized VCH, there are many options that we can provide (here is the full list). Below is the command that I used to deploy mine.

./vic-machine-linux create \
    --name=vch02.domain.sddc \
    --target=sddcvcs.domain.sddc/SDDC.Datacenter \
    --thumbprint="3F:6E:2F:16:FA:76:53:74:18:3F:26:9D:1A:58:40:AD:E5:D8:3E:52" \
    --user=orcunuso \
    --compute-resource=SDDC.Container \
    --image-store=VMFS01/VCHPOOL/vch02 \
    --volume-store=VMFS01/VCHPOOL/vch02:default \
    --bridge-network=LSW10_Bridge \
    --public-network=LSW10_Mgmt \
    --client-network=LSW10_Mgmt \
    --management-network=LSW10_Mgmt \
    --dns-server=10.10.100.10 \
    --public-network-ip=10.10.100.22/24 \
    --public-network-gateway=10.10.100.1 \
    --registry-ca=/root/ssl/sddcvic.crt \
    --no-tls

The option --no-tls disables TLS authentication of connections between the docker clients and the VCH, so the VCH uses neither client nor server certificates. In this case, docker clients connect to the VCH via port 2375 instead of port 2376.

Disabling TLS authentication is not the recommended way of deploying a VCH, because it lets any docker client connect to the VCH in an insecure manner. But in this exercise we will use docker-compose to build our application, and I’ve encountered many issues with a TLS-enabled VCH. It’s on my to-do list.
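Once the VCH is up, we can point a docker client at it over plain port 2375. A quick sanity check might look like this (10.10.100.22 is the public IP assigned to my VCH above):

export DOCKER_HOST=tcp://10.10.100.22:2375
docker info
docker network ls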

Creating docker-compose.yml configuration file:

Docker-compose is a tool for defining and running multi-container docker applications. With compose, we use a YAML file to configure our application’s services. Then, using a single command, we can create and start all the services from our configuration. First we need to get the docker-compose binary if it is not already in place; simply run curl to download it from GitHub.

curl -L https://github.com/docker/compose/releases/download/1.13.0/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose

Now we need to create our configuration file (docker-compose.yml).

version: "3"
services:
  gogsapp:
    image: sddcvic.domain.sddc/gogs/gogs
    container_name: gogsapp
    restart: on-failure
    depends_on:
      - "gogsdb"
    volumes:
      - "gogsvol2:/data"
    ports:
      - "10080:3000"
      - "10022:22"
    networks:
      - "gogsnet"
  gogsdb:
    image: sddcvic.domain.sddc/library/postgres:9.6
    container_name: gogsdb
    environment:
      - POSTGRES_PASSWORD=gogsdbpassword
      - POSTGRES_USER=gogsuser
      - POSTGRES_DB=gogsdb
      - PGDATA=/var/lib/postgresql/data/data
    restart: on-failure
    volumes:
      - "gogsvol1:/var/lib/postgresql/data"
    networks:
      - "gogsnet"
networks:
  gogsnet:
    driver: bridge
volumes:
  gogsvol1:
    driver: vsphere
  gogsvol2:
    driver: vsphere

From an architectural point of view, we have two services that communicate via a bridge network that we call “gogsnet”. The Gogs application requires two ports to be exposed to the outside world, 22 and 3000. The postgres container accepts connections via port 5432, but that port does not have to be exposed because it is only used for internal communication.

To make this application more reliable, it is good practice to use persistent volumes, so whenever we recreate the containers from the images that are already pushed to the private registry, the data will not be lost and the application will resume as expected. For this purpose, we create two volumes, gogsvol1 and gogsvol2, and map them to the relevant directories. The default volume driver with VIC is vsphere, so VIC will create two VMDK files in the location that we provided with the --volume-store option during the VCH deployment phase and attach those VMDKs to the containers.
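As a side note, we can see the vsphere driver in action before running compose by creating a throwaway volume against the VCH (a small sketch, assuming the DOCKER_HOST variable from earlier still points at the VCH; "testvol" is just an example name):

docker volume create --name testvol
docker volume ls
docker volume rm testvol

The DRIVER column in the volume ls output shows vsphere, and a matching VMDK appears under the volume store we configured.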

Normally, we are not supposed to specify an alternate location for the postgres database files, but in this case VIC uses a VMDK for the volume and, as it is a new volume, it contains a lost+found folder which causes the postgres init scripts to quit with exit code 1, so the container exits as well. That is why we use the PGDATA environment variable and specify a subdirectory to contain the data.

At the end of the day, this is how it will look:

Run and test the application:

Before running the app, let’s make sure that everything is in place. Our checklist includes:

  • A functional registry server
  • Required images ready to be pulled from the registry server (gogs and postgres)
  • A functional VCH deployed with --no-tls
  • Docker-compose and yaml file

Now let’s build our application:

docker-compose -f /root/gogs/docker-compose-vic.yml up -d

Excellent!!! Our containers are up and running. Let’s connect to our application via port tcp/10080 and make the initial configuration that needs to be done for the first run. We enter the values that we specified as environment variables when defining the postgres container.
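Before opening the browser, a quick check from the shell doesn’t hurt (a hedged sketch; I’m assuming the published ports are reachable on the VCH public address, 10.10.100.22 in my case):

docker-compose -f /root/gogs/docker-compose-vic.yml ps
curl -I http://10.10.100.22:10080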

And voila!!! Our two-layered, containerized, self-hosted git service is up and running on virtual container host backed by vSphere Integrated Containers registry service (aka Harbor). Enjoy 🙂

Push Windows image layers to Harbor registry

If you ever try to push Windows-based images to a private registry, only the custom layers that you create will be pushed, not the initial layers that come with the base image. To demonstrate this phenomenon:

Pull microsoft/nanoserver from Docker Hub.

docker pull microsoft/nanoserver

Tag the base image appropriately (sddcvic is my private Harbor registry and windows is the name of the project that I created before).

docker tag microsoft/nanoserver sddcvic.domain.sddc/windows/nanoserver:3.0

Log in to Harbor and push the image that we tagged.

docker login -u admin -p password sddcvic.domain.sddc 
docker push sddcvic.domain.sddc/windows/nanoserver:3.0
docker logout sddcvic.domain.sddc

After the push command, the first two layers, which are marked as foreign, are skipped and not pushed to the registry. This is a common phenomenon and not specific to Harbor; it would happen with any private registry service.

P.S. If you use Harbor as the image registry and your push operation errors out with a “blob unknown to registry” message, please refer to my previous post.

As of Docker Registry v2.5.0, a new version of the docker image manifest (Image Manifest Version 2, Schema 2) was introduced. Whenever an image gets pushed to a repository, an image manifest is also uploaded that provides a configuration and a set of layers for the container image. One of the most important changes with this schema version is the introduction of foreign layers, which are widely used by Windows-based containers. By their nature, foreign layers cannot be pushed to any registry other than the URLs that exist in their descriptor files. This is specifically necessary to support downloading Windows base layers from Microsoft servers, since only Microsoft is allowed to distribute them. This makes any image built on them require internet connectivity to download the bits, which is not appropriate for most enterprise environments. Even if it is allowed, the amount of data that needs to be downloaded from the internet to the docker host cache every time will be huge (Windows images do not have a good reputation when it comes to image size), so it contradicts the main benefits of containers such as mobility and flexibility. Luckily, there is a trick to work around this issue.

Disclaimer: This procedure might not be recommended or supported by Microsoft, Docker or VMware!!!

In order to make a layer “non-foreign”, we need to manipulate its descriptor file. If there are many images and layers cached on the docker host, it takes some effort to find the right descriptor.json files. The descriptor files exist under the folder C:\ProgramData\docker\image\windowsfilter\layerdb\sha256\*\.

First we need to get the hashes.

docker inspect --format "{{.RootFS.Layers}}" microsoft/nanoserver

The hashes that we get here are diff hashes (for the layers created during the docker image build process), and they can be found verbatim in the diff files that sit in the same folders as descriptor.json. The commands below search for the diff hashes within all the diff files and return the full paths, which we can use to locate the descriptor files.

Select-String -Pattern "6c357baed9f5177e8c8fd1fa35b39266f329535ec8801385134790eb08d8787d" -Path "C:\ProgramData\docker\image\windowsfilter\layerdb\sha256\*\diff"
Select-String -Pattern "0a051a1149b43239af90a8c11824a685a737c9417387caea392b8c8fee7e3889" -Path "C:\ProgramData\docker\image\windowsfilter\layerdb\sha256\*\diff"

Now that we have the full paths to the right descriptor files, we need to modify them. In our scenario, this is how a regular descriptor.json file looks:

{
   "mediaType": "application/vnd.docker.image.rootfs.foreign.diff.tar.gzip",
   "size": 252691002,
   "digest": "sha256:bce2fbc256ea437a87dadac2f69aabd25bed4f56255549090056c1131fad0277",
   "urls": ["https://go.microsoft.com/fwlink/?linkid=837858"]
}

We need to change the mediaType as below and remove the urls field.

{
   "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
   "size": 252691002,
   "digest": "sha256:bce2fbc256ea437a87dadac2f69aabd25bed4f56255549090056c1131fad0277"
}

As a final step, restart the docker service.

Restart-Service docker

After all those steps, we are ready to push our own image that we have created just by tagging the official nanoserver image… and voila!!

The good side is that we now have the ability to push this image to any repository regardless of its location, including Harbor. The downside is that Microsoft tends to patch its base images every month, so we need to repeat this procedure and recreate our images occasionally.

How to verify that data is really pushed to Harbor

If you have a suspicious mind like me and cannot help wondering whether the data is “really” pushed to the repository, it does no harm to verify that the data is actually there. All this task requires is the hash of the image manifest that was uploaded with the layers, which we can easily get from the docker push output. In our case, that is 2261d13476c671ba182e117001f1bc6ff7c0aa188c8225e6fa5bf0cddebce561.

On Harbor, the blobs exist under the /data/harbor/registry/docker/registry/v2/blobs/ directory, but once again we have to find the right sub-directory, which is based on (surprise!!) hashes. So we start with the hash of the image manifest and get the data inside. Do not forget to add the first two characters of the hash to the full path, as below.

cat /data/harbor/registry/docker/registry/v2/blobs/sha256/22/2261d13476c671ba182e117001f1bc6ff7c0aa188c8225e6fa5bf0cddebce561/data
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
   "config": {
      "mediaType": "application/vnd.docker.container.image.v1+json",
      "size": 774,
      "digest": "sha256:6c367cf4cb9815b10e47545dc9539ee4bd5cd0f8697d33f4d9cb1e1850546403"
   },
   "layers": [
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 259260404,
         "digest": "sha256:35aba4b22d486db55a401eebdef3ffa69d00539497b83d9f4b62e3582cb4ced7"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 125427565,
         "digest": "sha256:e1870ee27293b87c78bf45e078f94acd29c44a0a7963c91932c5c31dfbeeb510"
      }
   ]
}

Now we have a bit more information about the layers, such as the MIME type of the referenced object, the size and the digest of the content. With the help of the layer digests, we can locate the blobs and verify that the data was actually uploaded, consumes storage on disk as expected and is ready to be pulled on request.

root@sddcvic [ / ]# ls -lh /data/harbor/registry/docker/registry/v2/blobs/sha256/35/35aba4b22d486db55a401eebdef3ffa69d00539497b83d9f4b62e3582cb4ced7
total 248M
-rw-r--r-- 1 root root 248M May 31 10:04 data
root@sddcvic [ / ]# ls -lh /data/harbor/registry/docker/registry/v2/blobs/sha256/e1/e1870ee27293b87c78bf45e078f94acd29c44a0a7963c91932c5c31dfbeeb510
total 120M
-rw-r--r-- 1 root root 120M May 31 10:03 data

Now considering that everything is in place, I can have my cup of coffee and enjoy pulling and pushing Windows container images as a whole 🙂

Push windows container images to harbor registry

I’ve been pulling and pushing container images to the harbor registry (that comes with vSphere Integrated Containers 1.1, as told here) for a while without any problem. Last week, I decided to play with native windows containers on Windows 2016 but failed to push an image that I built from the microsoft/nanoserver image.

[As of today, the official version that can be downloaded from My VMware Portal is v1.1.1 and this has been tested with harbor registry version 1.1.0 and 1.1.1. Please note that this might not be supported by VMware]

Here is what it looks like from the client when I try to push the container image to the registry: all layers get pushed, but when the client pushes the manifest, it fails.

manifest blob unknown: blob unknown to registry

What happens under the hood is that whenever a container image is pushed to the registry, an image manifest is also uploaded that provides a configuration and a set of layers for the container image. The format of the manifest must be recognized by the container registry; if not, the registry returns a 404 error code and the push operation fails.

As can be clearly seen from the first screenshot, the initial two layers of the microsoft nanoserver container image are defined as foreign layers, which require a new image manifest schema (image manifest version 2, schema 2). There is one container running in the VIC appliance that is responsible for the registry service, vmware/registry:photon-2.6.0, and in order to remediate this issue we need to swap the registry container version to 2.6.1, which recognizes and accepts this new schema. To accomplish the task:


docker pull vmware/registry:2.6.1-photon
docker-compose -f docker-compose.notary.yml -f docker-compose.yml down
sed -i -- 's/photon-2.6.0/2.6.1-photon/g' /etc/vmware/harbor/docker-compose.yml
docker-compose -f docker-compose.notary.yml -f docker-compose.yml up -d

  1. First, pull the 2.6.1-tagged version of the vmware/registry repository from docker hub.
  2. Stop all running containers gracefully.
  3. Modify the yml file to swap the container image to run.
  4. Start all the containers again.

After we spin up the containers, we can verify with the docker ps command that the registry container has been swapped with the newer one. So it’s time to push our windows-based container image to harbor again… and voila!
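For the record, the verification is a one-liner on the appliance (the image name should now match what we put into the yml file with sed above):

docker ps --format "table {{.Names}}\t{{.Image}}" | grep registry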

P.S. Foreign layers are still skipped and not pushed to the harbor registry because of their nature. That is another topic that I will touch on in the next post, where I propose a workaround to push them to our private registry.

vSphere Integrated Containers 1.1

Much has changed since my last blog post about vSphere Integrated Containers, so I thought it would be a good time to make a fresh start.

In April 2017, VMware released vSphere Integrated Containers v1.1 (aka VIC). VIC consists of three components that can be installed with a unified OVA package:

  • VMware vSphere Integrated Containers Engine (aka VIC Engine), a container runtime for vSphere that allows developers who are familiar with Docker to develop in containers and deploy them alongside traditional VM-based workloads on vSphere clusters. vSphere administrators can manage these workloads by using vSphere in a way that is familiar.
  • VMware vSphere Integrated Containers Registry (aka Harbor), an enterprise-class container registry server that stores and distributes container images. vSphere Integrated Containers Registry extends the Docker Distribution open source project by adding the functionalities that an enterprise requires, such as security, identity and management.
  • VMware vSphere Integrated Containers Management Portal (aka Admiral), a container management portal that provides a UI for DevOps teams to provision and manage containers, including retrieving stats and info about container instances. Cloud administrators can manage container hosts and apply governance to their usage, including capacity quotas and approval workflows. When integrated with vRealize Automation, more advanced capabilities become available, such as deployment blueprints and enterprise-grade Containers-as-a-Service.

These are the key highlights of this new version, but when you go deeper, you will find that there are many improvements under the hood, which I plan to cover in the blog posts to come. So the key new features are:

  • A unified OVA installer for all three components
  • Official support for vSphere Integrated Containers Management Portal
  • A unified UI for vSphere Integrated Containers Registry and vSphere Integrated Containers Management Portal
  • A plug-in for the HTML5 vSphere Client
  • Support for Docker Client 1.13 and Docker API version 1.25
  • Support for using Notary with vSphere Integrated Containers Registry
  • Support for additional Docker commands. For the list of Docker commands that this release supports, see Supported Docker Commands in Developing Container Applications with vSphere Integrated Containers.

The installation process is as easy as deploying an OVA appliance, so I will not go deeper into the installation; I already touched on it briefly in my previous post, and it is pretty much the same. All the components installed after the deployment run as containers, just like in the older version. The difference is that there are more containers running, as more services are bundled in the appliance. To list the running containers:

docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Command}}"

The applications running in the appliance are multi-container applications so VMware leveraged docker-compose in order to define and run those containers. Under /etc/vmware/harbor there are two yml files.

  • docker-compose.yml: Composes harbor components
  • docker-compose.notary.yml: Composes docker notary components

You can start and stop harbor and notary by running docker-compose up and docker-compose down commands.

cd /etc/vmware/harbor
docker-compose -f docker-compose.notary.yml -f docker-compose.yml down
docker-compose -f docker-compose.notary.yml -f docker-compose.yml up -d

The only container that is not managed by docker-compose is vic-admiral. This is a standalone container and can be started and stopped individually with standard docker commands.
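For example, to bounce just the management portal (using the container name as reported by docker ps):

docker stop vic-admiral
docker start vic-admiral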

Harbor, an enterprise class registry server

newharbor0

With the GA release of vSphere Integrated Containers (aka VIC) in December, VMware also announced an enterprise-class registry server based on the open source Docker Distribution that allows us to store and distribute container images in our own datacenter. While the VIC Engine (actually the core component of VIC) is great for providing a container runtime for vSphere, enterprise customers governed by strict regulations require a more private solution to store and manage container images rather than the default cloud-based registry service from Docker. Docker already has a private solution, called Docker Trusted Registry, but the lack of enterprise-grade functionality such as increased control and security, identity and management triggered this open source project, Project Harbor.

Project Harbor extends the open source Docker Distribution and adds the following features:

  • Role based access control: Users and repositories are well organized and a user can have different permissions for images under a project.
  • Policy based image replication: Images can be replicated (synchronized) between multiple registry instances.
  • LDAP/AD support: Harbor integrates with existing enterprise LDAP/AD for user authentication and management.
  • Image deletion & garbage collection: Images can be deleted and their space can be recycled.
  • Graphical user portal: Users can easily browse, search repositories and manage projects.
  • Auditing: All the operations to the repositories are tracked.
  • RESTful API: RESTful APIs for most administrative operations, easy to integrate with external systems.
  • Easy deployment: Provides both an online and an offline installer. Besides, a virtual appliance (OVA) for the vSphere platform is available.

It’s possible to download the binary required to install Harbor as a virtual appliance from the My VMware portal. If we are not eligible to access the binary through the portal, we can always visit the GitHub page of the project for manual installation instructions and the OVA file.

Let’s assume that we have decided to use the OVA format for ease of deployment. There are many options we need to provide, such as regular virtual appliance options (hostname, datastore, IP configuration) and application-specific configurations (passwords, SMTP configuration and a few harbor-specific options), but the most important one configures the authentication method that Harbor will use. It can be either LDAP authentication or database authentication, and it cannot be modified after the installation; if a change is required, we have to install a fresh instance. LDAP authentication is a good practice because it saves us from managing custom users within the database and is more secure. Below are the options that we need to provide (please modify according to your domain):

  • Authentication mode: ldap_auth
  • LDAP URL: ldap://DomainController.demo.local
  • LDAP Search DN: CN=User2MakeQueries,OU=Users,DC=demo,DC=local
  • LDAP Search Password: Password of the above user
  • LDAP Base DN: OU=OrganizationUnitInWhichUsersWillBeQueried,DC=demo,DC=local
  • LDAP UID: The filter to query the users, such as uid, cn, email, sAMAccountName or any other attribute.

After the installation, it’s highly recommended to change the admin password of Harbor that we provided during the deployment phase, because it persists in the configuration file (/harbor/harbor/harbor.cfg) in plain text.
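You can see for yourself that the password sits there in clear text; a harmless check over SSH (the exact key name may differ between versions, so I just grep broadly):

grep -i password /harbor/harbor/harbor.cfg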

It will take a few minutes to complete the installation. If we set the “Permit Root Login” option to ‘true’ during the deployment phase, we can connect to the server via SSH with root credentials and begin to play around. The deployed operating system is Photon OS, and the sub-components of Harbor actually run as containers. When we run docker ps -a, all those running containers come to light. Harbor consists of six containers composed by docker-compose:

  • Proxy: This is a NGINX reverse-proxy. The proxy forwards requests from browsers and Docker clients to backend services such as the registry service and core services.
  • Registry: This registry service is based on Docker Registry 2.5 and is responsible for storing Docker images and processing Docker push/pull commands.
  • Core Services: Harbor’s core functions, which mainly provides the following services:
    • UI: A graphical user interface to help users manage images on the Registry
    • Webhook: Webhook is a mechanism configured in the Registry so that image status changes in the Registry can be populated to the Webhook endpoint of Harbor. Harbor uses webhook to update logs, initiate replications, and some other functions.
    • Token service: Responsible for issuing a token for every docker push/pull command according to a user’s role of a project. If there is no token in a request sent from a Docker client, the Registry will redirect the request to the token service.
  • Database: Derived from the official MySQL image, this service is responsible for storing the metadata of projects, users, roles, replication policies and images.
  • Job Services: This service is responsible for image replication to other Harbor instances (if there are any).
  • Log Collector: This service is responsible for collecting logs of other modules in a single place.

And this is what it looks like from an architectural point of view;

All the blue boxes shown in the diagram are running as containers. If we would like to know more about those containers and how they are configured, we can always run docker inspect commands on them.


docker inspect nginx
docker inspect harbor-jobservice
docker inspect harbor-db
docker inspect registry
docker inspect harbor-ui
docker inspect harbor-log

As a result of the inspect commands, the thing that attracts my attention is the persistent volume configuration. By their nature, containers are immutable and disposable, so in order to keep the data and the service configurations persistent, Harbor takes advantage of volume mounts between the docker host and the containers. These mounts are also useful for modifying the configuration of the services and for replacing the UI certificate.

This is the list of the volume mounts (sources and destinations) used in all containers.
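If you prefer to generate that list yourself rather than trawling through the full inspect output, a small loop over the container names does the trick (a sketch; the Go template only prints the source and destination of each mount):

for c in nginx registry harbor-ui harbor-db harbor-jobservice harbor-log; do
    echo "== $c =="
    docker inspect --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}' $c
done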

We can now enjoy pushing images to our brand new on-prem registry server.

Introduction to PhotonOS

photonos-1

Since the cloud native landscape has been greatly embraced by developers and the open source community, we are witnessing increasing momentum around container runtimes. The focus is shifting from virtual machines to containers, and running microservices on minimalistic operating systems is becoming mainstream (ok, not that fast, but we will definitely get there). So the increasing popularity of minimal operating systems such as CoreOS, RancherOS, Red Hat Atomic or even Windows Nano Server has pushed VMware to build a lightweight operating system that is optimized to run containers, in order to support its cloud native strategy.

VMware introduced Photon OS as an open source project in April 2015 and released the first generally available version (v1.0) in June 2016. Briefly, Photon OS is a minimal Linux container host designed to have a small footprint and optimized for VMware platforms. With the 1.0 release, the library of packages within Photon OS was greatly expanded, making Photon OS more broadly applicable to a range of use-cases while keeping both the disk and memory footprints extremely small (kernel boot times are around 200ms, with a 384MB memory footprint and 396MB on disk for the minimal installation).

Photon OS is compatible with container runtimes such as Docker, rkt and Garden (Pivotal), and with container scheduling frameworks like Kubernetes. It contains a new, open source, yum-compatible package manager (TDNF, Tiny Dandified Yum) that keeps the system as small as possible while preserving robust yum package management capabilities.
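Package management with tdnf feels just like yum; for instance (hedged examples, the package names are only illustrative):

tdnf list installed | wc -l
tdnf install -y wget
tdnf info docker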

P.S. In conjunction with the lightweight nature of Photon OS, we are already starting to see use-cases other than just running containers (such as virtual appliances owned by VMware). vCenter Server Appliance 6.5 runs on Photon OS, and I expect more virtual appliances to follow vCSA in the near future.

Now, let’s see how we can spin up a Photon OS based virtual machine. The first thing we need to do is download the binaries from GitHub. Photon OS is available in a few different pre-packaged binary formats, but two of them make the most sense for us:

  • ISO Image: The full ISO contains everything we need to install either the minimal or full installation of Photon OS. The bootable ISO has a manual installer or can be used with PXE/kickstart environments for automated installations.
  • OVA format: The OVA is a pre-installed minimal environment. An OVA is a complete virtual machine definition, which is why it exists in two different versions, hardware version 10 and 11. This allows compatibility with several versions of VMware platforms.

Although deploying Photon OS from the OVA is a valid and easy option with fewer steps, I will go with the ISO method so we can walk through the installation. Now, as step 2 (step 1 was getting the binaries), we need to create a fresh virtual machine. These are the values that I used:

  • Disk space: 8 GB
  • Compatibility: Hardware Level 11 (ESXi 6.0)
  • Guest OS version: Other 3.x or later Linux (64-bit)
  • vCPU: 1
  • vRAM: 512 MB

Now, mount the ISO to the CD-ROM drive, make sure that the “Connect at power on” checkbox is selected, and power the VM on. The installation is very straightforward; we pass through the welcome page, the licensing agreement and the disk formatting sections. Probably the most important selection is the Photon OS type that we want to install.

photonos-2

Each install option provides a different runtime environment:

  • Photon Minimal: It is a very lightweight version of the container host runtime that is best suited for container management and hosting. There is sufficient packaging and functionality to allow most common operations around modifying existing containers, as well as being a highly performant and full-featured runtime.
  • Photon Full: Photon Full includes several additional packages to enhance the authoring and packaging of containerized applications and system customization. It’s better to use Photon Full for developing and packaging the application that will be run as a container, as well as authoring the container itself.
  • Photon OSTree Host: This installation profile creates a Photon OS instance that will source its packages from a central rpm-ostree server and continue to have the library and state of packages managed by the definition that is maintained on the central rpm-ostree server.
  • Photon OSTree Server: This installation profile will create the server instance that will host the filesystem tree and managed definitions for rpm-ostree managed hosts created with the Photon OSTree Host installation profile.

After selecting the installation type, we will be prompted for a hostname. The installer comes up with a randomly generated hostname; we can enter our own hostname right now, or we can always modify it after the installation with the hostnamectl command. Lastly, we enter the root password and the installation starts. In my lab environment, it took 157 seconds to complete the full installation.

photonos-3

Voila, we have an up and running Photon OS server. Photon OS is pre-configured to obtain its IP address dynamically, but if we don’t have a DHCP server in our environment, we have to configure it manually.

Network Configuration:

The network service, which is enabled by default, starts when the system boots. We manage the network with systemd components such as systemd-networkd, systemd-resolved and networkctl. The network configuration is based on .network files that live in the /etc/systemd/network/ and /usr/lib/systemd/network folders. By default, when Photon OS starts, it creates a DHCP network configuration file, but we are free to add our own configuration files. Photon OS applies the configuration files in the alphabetical order of the file names; once Photon OS matches an interface in a file, it ignores that interface if it appears in files processed later in the alphabetical order. So, to set a static IP address, we do the following (see the sketch after the list):

  • create a configuration file with .network extension
  • place it in the /etc/systemd/network directory
  • set the file’s mode bits to 644
  • restart the systemd-networkd service
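Here is a minimal sketch of such a static configuration, reusing the addressing from my lab environment (the file name and values are just examples; adjust the interface name and addresses to your own network):

cat > /etc/systemd/network/10-static-eth0.network << "EOF"
[Match]
Name=eth0
[Network]
Address=10.10.100.50/24
Gateway=10.10.100.1
DNS=10.10.100.10
EOF

chmod 644 /etc/systemd/network/10-static-eth0.network
systemctl restart systemd-networkd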

photonos-4

Please refer to this guide for further instructions on systemd.network service.

SSH Configuration:

The default iptables policy accepts SSH connections, but the sshd configuration file on the full version of Photon OS is set to reject root login over SSH. To permit root login over SSH, we need to open /etc/ssh/sshd_config with a text editor such as vim, set PermitRootLogin to yes and restart the SSH daemon.
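If you prefer not to fire up vim, the same change can be scripted; a sketch, assuming the directive is present and uncommented in the default config:

sed -i 's/^PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config
systemctl restart sshd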

photonos-6

Enable Docker:

Among all these configuration steps, I have almost forgotten why we are doing this: the ultimate objective is to run containers, right? The full version of Photon OS includes the open source version of Docker (v1.12.1), but it is not started by default, so we need to start the daemon and enable it.

  • Start docker: systemctl start docker
  • Enable docker: systemctl enable docker

Now, we are ready to run docker commands and spin up some containers.

photonos-7

Interrupt Remapping and PSoD effects on HPE servers

[This post is also available in Turkish]

Yesterday, I received a critical customer advisory email from HPE which warns about a LINT1/NMI error and an inevitable PSoD (purple screen of death) under certain circumstances. What I found interesting is that this advisory is not new; it was originally written in 2015 and had not been updated since. So what has triggered these successive updates within two days?

First, let’s touch on the concept of the Interrupt Remapping feature, which is implemented by Intel VT-d. According to Intel’s official documentation:

The interrupt-remapping architecture enables system software to control and censor external interrupt requests generated by all sources including those from interrupt controllers (I/OxAPICs), MSI/MSI-X capable devices including endpoints, root-ports and Root-Complex integrated end-points.

Translated for mortal human beings, it simply enables more efficient IRQ routing and should improve overall performance. This feature is provided by the chipset vendor, and ESXi/ESX 4.1 introduced support for interrupt remapping, which is enabled by default. While everything seems great, there can be huge side effects if the hardware does not support the feature. In that case, VMware recommends disabling it by simply changing a kernel parameter.


esxcli system settings kernel set --setting=iovDisableIR -v TRUE

Meanwhile, VMware strongly advises modifying this setting only if the specific alert is observed in the vmkernel or messages log files:

ALERT: APIC: 1823: APICID 0x00000000 – ESR = 0x40

So, there is another side of the coin. HPE ProLiant servers do not have the reported issue with the BIOS described in the VMware KB article; the HPE BIOS for ProLiant servers supports vmkernel remapping of PCIe devices without any additional upgrades. But if you set iovDisableIR to TRUE (meaning you disable interrupt remapping), you may encounter a PSoD, and HPE recommends re-enabling the feature:


esxcli system settings kernel set --setting=iovDisableIR -v FALSE

iovdisableir

The important thing that I want to mention is that since the introduction of interrupt remapping support in ESXi, the default value of the iovDisableIR parameter has been FALSE, and everything was fine with HPE ProLiant servers. But after the recent 2016-Q4 ESXi patches, the default value switched to TRUE. I could not find any information about this change in the release notes (ESXi 6.0 and ESXi 5.5), but the change is real and affects both ESXi 5.5 and 6.0.
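To check a single host directly from its shell, the plain esxcli equivalent looks like this (it prints the Default, Configured and Runtime values of the parameter):

esxcli system settings kernel list -o iovDisableIR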

For this very reason, it is crucial to be sure what the runtime value of the iovDisableIR kernel parameter is, and whether it is aligned with hardware vendor best practices. Luckily, it is possible to check the value on all ESXi servers managed by a vCenter with a PowerCLI function:


function Get-iovDisableIR {
    $VMHosts = Get-VMHost | Sort-Object Name
    foreach ($VMHost in $VMHosts) {
        $esxcli = Get-EsxCli -VMHost $VMHost
        $item = $esxcli.system.settings.kernel.list($false,"iovDisableIR")
        Write-Host ("{0} ({1}) -> Def:{2}-Con:{3}-Run:{4}" -f $VMHost.Name,$VMHost.Build,$item.Default,$item.Configured,$item.Runtime)
    }
}

The output of this function displays the ESXi hosts, their build numbers (in parentheses) and the Default, Configured and Runtime values of the iovDisableIR kernel parameter. The example below shows that two of our hosts have not been patched yet and their default values are FALSE, as expected. The rest are patched, so I had to explicitly set the value to FALSE.

iovdisableir2

To wrap up, if we run our hosts on HPE DL/ML/BL Gen8 servers and have recently updated the vmkernel, it is highly recommended to set the iovDisableIR kernel parameter to FALSE and reboot the server so that the change takes effect at runtime, in order not to encounter a PSoD. I know that VMware and HPE are working on this issue, but until they come up with a permanent solution, this is the workaround we need to apply. To modify the value from esxcli, we can refer to the HPE customer advisory. To bulk-modify the value, once again, PowerCLI is the best way to go:


function Disable-iovDisableIR {
    $VMHosts = Get-VMHost | Sort-Object Name
    foreach ($VMHost in $VMHosts) {
        $esxcli = Get-EsxCli -VMHost $VMHost
        $item = $esxcli.system.settings.kernel.set("iovDisableIR","FALSE")
        Write-Host ("{0}: iovDisableIR is disabled, Please reboot the server" -f $VMHost.Name)
    }
}