The Mystery of a Docker Image Created 51 Years Ago

Running docker images lists an image that was supposedly created 51 years ago:

$ sudo docker images
REPOSITORY                                      TAG                 IMAGE ID            CREATED             SIZE
...
anjia0532/distroless.python3                    latest              1709d022036c        51 years ago        50.7MB

And it cannot be removed:

$ docker rmi 1709d022036c
Error: No such image: 1709d022036c

That is odd. Start by using strace to capture the system calls made while docker images runs:

$ strace -f -o images_list docker images

Analyze the output, starting with the system call that printed this row:

151212 write(1, "anjia0532/distroless.python3", 28) = 28

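This write sits in a long run of write calls. By the usual program logic, the data was read from somewhere before being written to standard output, and the source could be a file or a socket. So look through the surrounding context for the nearest preceding read call: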

151212 read(8,  <unfinished ...>
151215 <... nanosleep resumed>NULL)     = 0
151212 <... read resumed>"ull,\"ParentId\":\"\",\"RepoDigests\":"..., 4096) = 3190
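With -f, strace interleaves threads, so one call can be split into an "<unfinished ...>" line and a later "<... resumed>" line, as with the read above. Here is a minimal sketch of stitching such pairs back together before searching. The function name merge_split_calls and the regular expressions are my own illustration, not part of strace.

```python
import re

# Hypothetical helper patterns for strace -f output (PID prefix on every line).
UNFINISHED = re.compile(r"^(\d+)\s+(.*?)\s*<unfinished \.\.\.>\s*$")
RESUMED = re.compile(r"^(\d+)\s+<\.\.\. (\w+) resumed>\s*(.*)$")


def merge_split_calls(lines):
    """Join each '<unfinished ...>' line with its later '<... resumed>' line."""
    pending = {}  # PID -> first half of a call still in flight
    merged = []
    for line in lines:
        m = UNFINISHED.match(line)
        if m:
            pending[m.group(1)] = m.group(2)
            continue
        m = RESUMED.match(line)
        if m and m.group(1) in pending:
            head = pending.pop(m.group(1))
            merged.append(f"{m.group(1)} {head} {m.group(3)}")
            continue
        # Complete calls, and resumed halves whose start we never saw, pass through.
        merged.append(line)
    return merged


sample = [
    "151212 read(8,  <unfinished ...>",
    "151215 <... nanosleep resumed>NULL)     = 0",
    r'151212 <... read resumed>"ull,\"ParentId\":\"\",\"RepoDigests\":"..., 4096) = 3190',
]
merged = merge_split_calls(sample)
for line in merged:
    print(line)
```

The merged read appears at the position of its resumed half, which is when the call actually returned.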

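read is reading from file descriptor 8 (see man 2 read if the arguments are unfamiliar). Keep going upward to find the system call that returned descriptor 8. Given strace's output format, searching backward for " = 8" finds: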

151212 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 8
151212 connect(8, {sa_family=AF_UNIX, sun_path="/var/run/docker.sock"}, 23) = 0

Descriptor 8 is a UNIX socket connected to the socket file /var/run/docker.sock, so the data came from the Docker daemon.
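The backward search for the call that returned descriptor 8 can be sketched as follows. A plain text search for " = 8" also matches, say, a read that returned 8 bytes, so this version only accepts calls that create descriptors. The name find_fd_origin and the FD_CREATORS list are assumptions made for illustration, and the list is not exhaustive.

```python
import re

# Assumed, non-exhaustive set of system calls whose return value is a new descriptor.
FD_CREATORS = {"open", "openat", "socket", "accept", "accept4", "dup"}
CALL = re.compile(r"^\d+\s+(\w+)\(.*\)\s+=\s+(-?\d+)")


def find_fd_origin(lines, fd, before):
    """Walk backward from index `before` to the call that returned `fd`."""
    for i in range(before - 1, -1, -1):
        m = CALL.match(lines[i])
        if m and m.group(1) in FD_CREATORS and int(m.group(2)) == fd:
            return i
    return None


trace = [
    "151212 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 8",
    '151212 connect(8, {sa_family=AF_UNIX, sun_path="/var/run/docker.sock"}, 23) = 0',
    # The read is shown here as a single line, with its two halves already joined.
    r'151212 read(8, "ull,\"ParentId\":\"\",\"RepoDigests\":"..., 4096) = 3190',
]
origin = find_fd_origin(trace, fd=8, before=2)
print(trace[origin])
```

Descriptor numbers are reused after close, which is why the search stops at the nearest match above the read rather than the first one in the file.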

To save effort, trace dockerd's system calls directly:

$ ps aux | fgrep dockerd # find dockerd's PID
$ sudo strace -o /tmp/dockerd_trace -f -p 136391

Run docker images again, then interrupt strace. Search /tmp/dockerd_trace for the image ID:

$ fgrep 1709d022036c /tmp/dockerd_trace
136440 openat(AT_FDCWD, "/var/lib/docker/image/overlay2/imagedb/content/sha256/1709d022036c43292700cfb56e8228c45690dc22cc88029b212b9642060a9138", O_RDONLY|O_CLOEXEC) = 22
136440 openat(AT_FDCWD, "/var/lib/docker/image/overlay2/imagedb/content/sha256/1709d022036c43292700cfb56e8228c45690dc22cc88029b212b9642060a9138", O_RDONLY|O_CLOEXEC) = 22
136440 openat(AT_FDCWD, "/var/lib/docker/image/overlay2/imagedb/metadata/sha256/1709d022036c43292700cfb56e8228c45690dc22cc88029b212b9642060a9138/parent", O_RDONLY|O_CLOEXEC) = -1 ENOENT (没有那个文件或目录)
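When the trace is large, it can help to pull out only the opens that failed. Here is a minimal sketch that assumes the openat line format shown above. The regular expression and the name failed_opens are hypothetical, and the sample lines are shortened copies of the trace with the localized error text left out.

```python
import re

# Captures the path, the return value, and the errno name if there is one.
OPENAT = re.compile(
    r'^\d+\s+openat\(\w+, "([^"]+)", [^)]*\)\s+=\s+(-?\d+)(?:\s+([A-Z]+))?'
)


def failed_opens(lines):
    """Return unique (path, errno) pairs for openat calls that returned < 0."""
    failures = []
    for line in lines:
        m = OPENAT.match(line)
        if m and int(m.group(2)) < 0:
            item = (m.group(1), m.group(3))
            if item not in failures:
                failures.append(item)
    return failures


IMAGE = "1709d022036c43292700cfb56e8228c45690dc22cc88029b212b9642060a9138"
BASE = "/var/lib/docker/image/overlay2/imagedb"
trace = [
    f'136440 openat(AT_FDCWD, "{BASE}/content/sha256/{IMAGE}", O_RDONLY|O_CLOEXEC) = 22',
    f'136440 openat(AT_FDCWD, "{BASE}/content/sha256/{IMAGE}", O_RDONLY|O_CLOEXEC) = 22',
    f'136440 openat(AT_FDCWD, "{BASE}/metadata/sha256/{IMAGE}/parent", O_RDONLY|O_CLOEXEC) = -1 ENOENT',
]
failures = failed_opens(trace)
for path, err in failures:
    print(err, path)
```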

dockerd reads the file /var/lib/docker/image/overlay2/imagedb/content/sha256/1709d022036c43292700cfb56e8228c45690dc22cc88029b212b9642060a9138. Here is its content:

{"architecture": "amd64", "author": "Bazel", "config": {"Entrypoint": ["/usr/bin/python3.5"], "Env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt"]}, "created": "1970-01-01T00:00:00Z", "history": [{"author": "Bazel", "created": "1970-01-01T00:00:00Z", "created_by": "bazel build ..."}, {"author": "Bazel", "created": "1970-01-01T00:00:00Z", "created_by": "bazel build ..."}, {"author": "Bazel", "created": "1970-01-01T00:00:00Z", "created_by": "bazel build ..."}], "os": "linux", "rootfs": {"diff_ids": ["sha256:668afdbd44627e124b4e875a3aacf9efedf3aedd56f157af29559bcf693e6ba1", "sha256:6189abe095d53c1c9f2bfc8f50128ee876b9a5d10f9eda1564e5f5357d6ffe61", "sha256:3135e6239459e68f8b64f8fd6fa4138e5215fc8a4fcd773d4b9dffb74d5a9603"], "type": "layers"}}

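Note that the created field is 1970-01-01T00:00:00Z, the Unix epoch (timestamp 0), which is why the image is reported as created 51 years ago. So the odd age comes from the image's own metadata, not from Docker. The author field is Bazel, and the zeroed timestamp is most likely deliberate: Bazel-built images commonly pin the creation time to the epoch so that builds are reproducible.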
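The arithmetic can be checked with a short sketch. It assumes that the CLI rounds an image's age down to whole 365-day years, which is my understanding of Docker's human-readable duration formatting and is not verified here. The value of now is an assumed date chosen for illustration, and the JSON is a trimmed copy of the config above.

```python
import json
from datetime import datetime, timezone


def parse_created(value):
    """Parse an RFC 3339 UTC timestamp of the form used in the image config."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def years_ago(created, now):
    """Age in whole 365-day years, rounded down (assumed display rule)."""
    hours = int((now - created).total_seconds() // 3600)
    return hours // 24 // 365


# Trimmed copy of the image config; only the fields used here are kept.
config = json.loads('{"author": "Bazel", "created": "1970-01-01T00:00:00Z"}')
created = parse_created(config["created"])

# Assumed "current" date, used only to make the example deterministic.
now = datetime(2021, 6, 1, tzinfo=timezone.utc)

print(created.timestamp(), years_ago(created, now))
```

Because the timestamp is exactly 0, the displayed age grows by one every year regardless of when the image was actually built.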

As for why it cannot be removed, I noticed this line, where openat fails with ENOENT (no such file or directory):

136440 openat(AT_FDCWD, "/var/lib/docker/image/overlay2/imagedb/metadata/sha256/1709d022036c43292700cfb56e8228c45690dc22cc88029b212b9642060a9138/parent", O_RDONLY|O_CLOEXEC) = -1 ENOENT (没有那个文件或目录)

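My guess is that the removal fails because this metadata path does not exist. I removed the directory by hand, and the problem went away.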