Push Windows image layers to Harbor registry

If you ever try to push a Windows-based image to a private registry, only the custom layers that you created will be pushed, not the base layers that the image was built from. To demonstrate this phenomenon:

Pull microsoft/nanoserver from Docker Hub.

docker pull microsoft/nanoserver

Tag the base image appropriately (sddcvic is my private Harbor registry and windows is the name of the project that I created earlier).

docker tag microsoft/nanoserver sddcvic.domain.sddc/windows/nanoserver:3.0

Log in to Harbor and push the image that we tagged.

docker login -u admin -p password sddcvic.domain.sddc 
docker push sddcvic.domain.sddc/windows/nanoserver:3.0
docker logout sddcvic.domain.sddc

After the push command completes, the first two layers, which are marked as foreign, are skipped and not pushed to the registry. This behavior is not specific to Harbor; it happens with any private registry service.

P.S. If you use Harbor as the image registry and your push operation errors out with a “blob unknown to registry” message, please refer to my previous post.

As of Docker Registry v2.5.0, a new version of the docker image manifest (Image Manifest Version 2, Schema 2) was introduced. Whenever an image is pushed to a repository, an image manifest is uploaded along with it, providing the configuration and the set of layers for the container image. One of the most important changes in this schema version is the introduction of foreign layers, which are widely used by Windows-based containers. By their nature, foreign layers cannot be pushed to any registry other than the URLs listed in their descriptor files. This is specifically meant to support downloading Windows base layers from Microsoft servers, since only Microsoft is allowed to distribute them. The consequence is that any image built on top of them requires internet connectivity to download the bits, which is not acceptable in most enterprise environments. Even where it is allowed, the amount of data that has to be downloaded from the internet into every docker host's cache is huge (Windows images do not have a good reputation when it comes to size), which contradicts core benefits of containers such as mobility and flexibility. Luckily, there is a trick to work around this issue.
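Before we dive in, you can see these foreign layer entries (with their mediaType and urls fields) for yourself without pulling anything, by inspecting the manifest on Docker Hub. Note that the manifest subcommand was experimental for a long time and may need to be enabled in your Docker CLI:

docker manifest inspect microsoft/nanoserver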

Disclaimer: This procedure might not be recommended or supported by Microsoft, Docker or VMware!!!

In order to make a layer “non-foreign“, we need to manipulate its descriptor file. If there are many images and layers cached on the docker host, it can take some effort to find the right descriptor.json files. The descriptor files live under the folder: C:\ProgramData\docker\image\windowsfilter\layerdb\sha256\*\
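On a host with many cached images, a quick count shows how many candidate descriptors you are up against (assuming Docker's default data root):

Get-ChildItem 'C:\ProgramData\docker\image\windowsfilter\layerdb\sha256\*\descriptor.json' | Measure-Object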

First, we need to get the layer hashes.

docker inspect --format "{{.RootFS.Layers}}" microsoft/nanoserver

The hashes that we get here are diff hashes (the IDs of the layers created during the docker image build process), and each one is stored verbatim in a file named diff that sits in the same folder as its descriptor.json. The commands below search for the diff hashes across all the diff files and return the full paths, which point us to the right descriptor files.

Select-String -Pattern "6c357baed9f5177e8c8fd1fa35b39266f329535ec8801385134790eb08d8787d" -Path "C:\ProgramData\docker\image\windowsfilter\layerdb\sha256\*\diff"
Select-String -Pattern "0a051a1149b43239af90a8c11824a685a737c9417387caea392b8c8fee7e3889" -Path "C:\ProgramData\docker\image\windowsfilter\layerdb\sha256\*\diff"

Now that we have the full paths to the right descriptor files, we need to modify them. In our scenario, this is what a regular descriptor.json file looks like:

{
   "mediaType": "application/vnd.docker.image.rootfs.foreign.diff.tar.gzip",
   "size": 252691002,
   "digest": "sha256:bce2fbc256ea437a87dadac2f69aabd25bed4f56255549090056c1131fad0277",
   "urls": ["https://go.microsoft.com/fwlink/?linkid=837858"]
}

We need to change the mediaType as below and remove the urls field.

{
   "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
   "size": 252691002,
   "digest": "sha256:bce2fbc256ea437a87dadac2f69aabd25bed4f56255549090056c1131fad0277"
}
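You can make this edit by hand in a text editor, or script it. Here is a minimal PowerShell sketch, where <layerdb-id> is a placeholder for one of the folders found above; take a backup first, since a malformed descriptor can break the layer store:

# Path to one of the descriptor.json files located earlier (<layerdb-id> is a placeholder)
$path = 'C:\ProgramData\docker\image\windowsfilter\layerdb\sha256\<layerdb-id>\descriptor.json'

# Keep a backup, just in case
Copy-Item $path "$path.bak"

# Load the descriptor, switch the mediaType to non-foreign, drop the urls field, write back
$json = Get-Content $path -Raw | ConvertFrom-Json
$json.mediaType = 'application/vnd.docker.image.rootfs.diff.tar.gzip'
$json.PSObject.Properties.Remove('urls')
$json | ConvertTo-Json -Compress | Set-Content $path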

As a final step, restart the docker service.

Restart-Service docker

After all those steps, we are ready to push our own image, the one we created earlier just by tagging the official nanoserver image… and voilà!!
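That is, the exact same push command from before now uploads every layer, the formerly foreign ones included:

docker push sddcvic.domain.sddc/windows/nanoserver:3.0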

The upside is that we can now push this image to any registry regardless of its location, including Harbor. The downside is that Microsoft patches its base images every month, so we need to repeat this procedure and rebuild our images regularly.

How to verify that the data was really pushed to Harbor

If you have a suspicious mind like me and cannot help wondering whether the data was “really” pushed to the repository, it does no harm to verify that the data is actually there. All this task requires is the hash of the image manifest that was uploaded with the layers, which we can easily get from the docker push command output. In our case, that is 2261d13476c671ba182e117001f1bc6ff7c0aa188c8225e6fa5bf0cddebce561.
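As a side note, you can also request the manifest over the Docker Registry v2 API instead of touching the filesystem. A sketch using the credentials and hostname from earlier (-k because of my self-signed lab certificate; depending on your Harbor version, you may need to obtain a bearer token from its token service rather than using basic auth):

curl -k -u admin:password -H "Accept: application/vnd.docker.distribution.manifest.v2+json" https://sddcvic.domain.sddc/v2/windows/nanoserver/manifests/3.0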

On the Harbor host, the blobs live under the /data/harbor/registry/docker/registry/v2/blobs/ directory, but once again we have to find the right sub-directories, which are (surprise!!) named after hashes. So we start with the hash of the image manifest and read the data inside. Do not forget that the full path includes an intermediate directory named after the first two characters of the hash, as below.

cat /data/harbor/registry/docker/registry/v2/blobs/sha256/22/2261d13476c671ba182e117001f1bc6ff7c0aa188c8225e6fa5bf0cddebce561/data
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
   "config": {
      "mediaType": "application/vnd.docker.container.image.v1+json",
      "size": 774,
      "digest": "sha256:6c367cf4cb9815b10e47545dc9539ee4bd5cd0f8697d33f4d9cb1e1850546403"
   },
   "layers": [
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 259260404,
         "digest": "sha256:35aba4b22d486db55a401eebdef3ffa69d00539497b83d9f4b62e3582cb4ced7"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 125427565,
         "digest": "sha256:e1870ee27293b87c78bf45e078f94acd29c44a0a7963c91932c5c31dfbeeb510"
      }
   ]
}

Now we have a bit more information about the layers, such as the MIME type of the referenced object, its size, and the digest of its content. With the help of the layer digests, we can locate the blobs and verify that the data was actually uploaded, consumes storage on disk as expected, and is ready to be pulled on request.

root@sddcvic [ / ]# ls -lh /data/harbor/registry/docker/registry/v2/blobs/sha256/35/35aba4b22d486db55a401eebdef3ffa69d00539497b83d9f4b62e3582cb4ced7
total 248M
-rw-r--r-- 1 root root 248M May 31 10:04 data
root@sddcvic [ / ]# ls -lh /data/harbor/registry/docker/registry/v2/blobs/sha256/e1/e1870ee27293b87c78bf45e078f94acd29c44a0a7963c91932c5c31dfbeeb510
total 120M
-rw-r--r-- 1 root root 120M May 31 10:03 data
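If the file sizes alone do not convince you, hashing the blob contents must reproduce the layer digests, since the registry stores blobs content-addressably:

sha256sum /data/harbor/registry/docker/registry/v2/blobs/sha256/35/35aba4b22d486db55a401eebdef3ffa69d00539497b83d9f4b62e3582cb4ced7/data
sha256sum /data/harbor/registry/docker/registry/v2/blobs/sha256/e1/e1870ee27293b87c78bf45e078f94acd29c44a0a7963c91932c5c31dfbeeb510/data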

Now that everything is in place, I can have my cup of coffee and enjoy pulling and pushing Windows container images as a whole 🙂
