this idea to the rest of your site.)
[DD]: https://www.docker.com/products/docker-desktop/
[DE]: https://docs.docker.com/engine/
[DNT]: ./server/debian/nginx.md
### 6.1 <a id="runc" name="containerd"></a>Stripping Docker Engine Down
The core of Docker Engine is its [`containerd`][ctrd] daemon and the
[`runc`][runc] container runner. It’s possible to dig into the subtree
managed by `containerd` on the build host and extract what we need to
run our Fossil container elsewhere with `runc`, leaving out all the
rest. `runc` alone is about 18 MiB, and you can do without `containerd`
entirely, if you want.
The method isn’t complicated, but it *is* cryptic enough to want a shell
script:
----
```shell
#!/bin/sh
c=fossil
b=$HOME/containers/$c
r=$b/rootfs
m=/run/containerd/io.containerd.runtime.v2.task/moby
if [ -d "$t" ] && mkdir -p $r
then
docker container start $c
docker container export $c | sudo tar -C $r -xf -
id=$(docker inspect --format="{{.Id}}" $c)
sudo cat $m/$id/config.json |
jq '.root.path = "'$r'"' |
jq '.linux.cgroupsPath = ""' |
jq 'del(.linux.sysctl)' |
jq 'del(.linux.namespaces[] | select(.type == "network"))' |
jq 'del(.mounts[] | select(.destination == "/etc/hostname"))' |
jq 'del(.mounts[] | select(.destination == "/etc/resolv.conf"))' |
jq 'del(.mounts[] | select(.destination == "/etc/hosts"))' |
jq 'del(.hooks)' > $b/config.json
fi
```
----
The first several lines list configurables:
* **`b`**: the path of the exported container, called the “bundle” in OCI
jargon
* **`c`**: the name of the Docker container you’re bundling up for use
with `runc`
* **`m`**: the directory holding the running machines, configurable
because:
* it’s long
* it’s been known to change from one version of Docker to the next
* you might be using [Podman](#podman)/[`crun`](#crun), so it has
to be “`/run/user/$UID/crun`” instead
* **`r`**: the path of the directory containing the bundle’s root file
system.
That last doesn’t have to be called `rootfs/`, and it doesn’t have to
live in the same directory as `config.json`, but it is conventional.
Because some OCI tools use those names as defaults, it’s best to follow
suit.
The rest is generic, but you’re welcome to freestyle here. We’ll show an
example of this below.
We’re using [jq] for two separate purposes:
1. To automatically transmogrify Docker’s container configuration so it
will work with `runc`:
* point it where we unpacked the container’s exported rootfs
* accede to its wish to [manage cgroups by itself][ecg]
* remove the `sysctl` calls that will break after…
* …we remove the network namespace to allow Fossil’s TCP listening
port to be available on the host; `runc` doesn’t offer the
equivalent of `docker create --publish`, and we can’t be
bothered to set up a manual mapping from the host port into the
container
* remove file bindings that point into the local runtime managed
directories; one of the things we give up by using a bare
container runner is automatic management of these files
* remove the hooks for essentially the same reason
2. To make the Docker-managed machine-readable `config.json` more
human-readable, in case there are other things you want changed in
this version of the container. Exposing the `config.json` file like
this means you don’t have to rebuild the container merely to change
a value like a mount point, the kernel capability set, and so forth.
<a id="why-sudo"></a>
We have to do this transformation of `config.json` as the local root
user because it isn’t readable by your normal user. Additionally, that
input file is only available while the container is started, which is
why we ensure that before exporting the container’s rootfs.
With the container exported like this, you can start it as:
```
$ cd /path/to/bundle
$ c=any-name-you-like
$ sudo runc create $c
$ sudo runc start $c
$ sudo runc exec $c -t sh -l
~ $ ls museum
repo.fossil
~ $ ps -eaf
PID USER TIME COMMAND
1 fossil 0:00 bin/fossil server --create …
~ $ exit
$ sudo runc kill fossil-runc
$ sudo runc delete fossil-runc
```
If you’re doing this on the export host, the first command is “`cd $b`”
if we’re using the variables from the shell script above. We do this
because `runc` assumes you’re running it from the bundle directory. If
you prefer, the `runc` commands that care about this take a
`--bundle/-b` flag to let you avoid switching directories.
The rest should be straightforward: create and start the container as
root so the `chroot(2)` call inside the container will succeed, then get
into it with a login shell and poke around to prove to ourselves that
everything is working properly. It is. Yay!
The remaining commands show shutting the container down and destroying
it, simply to show how these commands change relative to using the
Docker Engine commands. It’s “kill,” not “stop,” and it’s “delete,” not
“rm.”
If you want the bundle to run on a remote host, the local and remote
bundle directories likely will not match, as the shell script above
assumes. This is a more realistic shell script for that case:
----
```shell
#!/bin/bash -ex
c=fossil
b=/var/lib/machines/$c
h=my-host.example.com
m=/run/containerd/io.containerd.runtime.v2.task/moby
t=$(mktemp -d /tmp/$c-bundle.XXXXXX)
if [ -d "$t" ]
then
docker container start $c
docker container export $c > $t/rootfs.tar
id=$(docker inspect --format="{{.Id}}" $c)
sudo cat $m/$id/config.json |
jq '.root.path = "'$b/rootfs'"' |
jq '.linux.cgroupsPath = ""' |
jq 'del(.linux.sysctl)' |
jq 'del(.linux.namespaces[] | select(.type == "network"))' |
jq 'del(.mounts[] | select(.destination == "/etc/hostname"))' |
jq 'del(.mounts[] | select(.destination == "/etc/resolv.conf"))' |
jq 'del(.mounts[] | select(.destination == "/etc/hosts"))' |
jq 'del(.hooks)' > $t/config.json
scp -r $t $h:tmp
ssh -t $h "{
mv ./$t/config.json $b &&
sudo tar -C $b/rootfs -xf ./$t/rootfs.tar &&
rm -r ./$t
}"
rm -r $t
fi
```
----
We’ve introduced two new variables:
* **`h`**: the remote host name
* **`t`**: a temporary bundle directory we populate locally, then
`scp` to the remote machine, where it’s unpacked
We dropped the **`r`** variable because now we have two different
“rootfs” types: the tarball and the unpacked version of that tarball.
To avoid confusing ourselves between these cases, we’ve replaced uses of
`$r` with explicit paths.
You need to be aware that this script uses `sudo` for two different purposes:
1. To read the local `config.json` file out of the `containerd` managed
directory. ([Details above](#why-sudo).)
2. To unpack the bundle onto the remote machine. If you try to get
clever and unpack it locally, then `rsync` it to the remote host to
avoid re-copying files that haven’t changed since the last update,
you’ll find that it fails when it tries to copy device nodes, to
create files owned only by the remote root user, and so forth. If the
container bundle is small, it’s simpler to re-copy and unpack it
fresh each time.
I point that out because it might ask for your password twice: once for
the local sudo command, and once for the remote.
The default for the **`b`** variable is the convention for systemd based
machines, which will play into the [`nspawn` alternative below][sdnsp].
Even if you aren’t using `nspawn`, it’s a reasonable place to put
containers under the [Linux FHS rules][LFHS].
[ctrd]: https://containerd.io/
[ecg]: https://github.com/opencontainers/runc/pull/3131
[LFHS]: https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard
[jq]: https://stedolan.github.io/jq/
[sdnsp]: #nspawn
[runc]: https://github.com/opencontainers/runc
### 6.2 <a id="podman"></a>Podman
Although your humble author claims the `runc` methods above are not
complicated, merely cryptic, you might be fondly recollecting the
carefree commands at the top of this document, pondering whether you can
live without the abstractions a proper container runtime system
provides.
More than that, there’s a hidden cost to the `runc` method: there is no
layer sharing among containers. If you have multiple Fossil containers
on a single host — perhaps because each serves an independent section of
the overall web site — and you export them to a remote host using the
shell script above, you’ll end up with redundant copies of the `rootfs`
in each. A proper OCI container runtime knows they’re all derived from
the same base image, differing only in minor configuration details,
giving us one of the major advantages of containerization: if none of
the running containers can change these immutable base layers, it
doesn’t have to copy them.
A lighter-weight alternative to Docker Engine that doesn’t give up so
many of its administrator affordances is [Podman], initially created by
Red Hat and thus popular on that family of OSes, although it will run on
any flavor of Linux. It can even be made to run [on macOS via Homebrew][pmmac]
or [on Windows via WSL2][pmwin].
On Ubuntu 22.04, it’s about a quarter the size of Docker Engine. That
isn’t nearly so slim as `runc`, but we may be willing to pay this
overhead to get shorter and fewer commands.
Although Podman [bills itself][whatis] as a drop-in replacement for the
`docker` command and everything that sits behind it, some of the tool’s
design decisions affect how our Fossil containers run, as compared to
using Docker. The most important of these is that, by default, Podman
wants to run your container “rootless,” meaning that it runs as a
regular user. This is generally better for security, but [we dealt with
### 6.1 <a id="nerdctl" name="containerd"></a>Stripping Docker Engine Down
The core of Docker Engine is its [`containerd`][ctrd] daemon and the
[`runc`][runc] container runner. Add to this the out-of-core CLI program
[`nerdctl`][nerdctl] and you have enough of the engine to run Fossil
containers. The big things you’re missing are:
* **BuildKit**: The container build engine, which doesn’t matter if
you’re building elsewhere and using a container registry as an
intermediary between that build host and the deployment host.
* **SwarmKit**: A powerful yet simple orchestrator for Docker that you
probably aren’t using with Fossil anyway.
In exchange, you get a runtime that’s about half the size of Docker
Engine. The commands are essentially the same as above, but you say
“`nerdctl`” instead of “`docker`”. You might alias one to the other,
because you’re still going to be using Docker to build and ship your
container images.
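For illustration, deployment-host usage might look like the following
sketch, assuming you pushed the image under the hypothetical
`mydockername/fossil` name used later in this document; you could also
`alias docker=nerdctl` and keep typing the commands you’re used to:

```
$ sudo nerdctl pull docker.io/mydockername/fossil
$ sudo nerdctl run -d --name fossil --publish 9999:8080 \
    docker.io/mydockername/fossil
```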
[ctrd]: https://containerd.io/
[nerdctl]: https://github.com/containerd/nerdctl
[runc]: https://github.com/opencontainers/runc
### 6.2 <a id="podman"></a>Podman
A lighter-weight alternative to either of the prior options that doesn’t
give up the image builder is [Podman]. Initially created by
Red Hat and thus popular on that family of OSes, it will run on
any flavor of Linux. It can even be made to run [on macOS via Homebrew][pmmac]
or [on Windows via WSL2][pmwin].
On Ubuntu 22.04, it’s about a quarter the size of Docker Engine, or half
that of the “full” distribution of `nerdctl` and all its dependencies.
Although Podman [bills itself][whatis] as a drop-in replacement for the
`docker` command and everything that sits behind it, some of the tool’s
design decisions affect how our Fossil containers run, as compared to
using Docker. The most important of these is that, by default, Podman
wants to run your container “rootless,” meaning that it runs as a
regular user. This is generally better for security, but [we dealt with
around in. That shouldn’t be enough to let them break out of the
container entirely, but they’ll have powerful tools like `wget`, and
they’ll be connected to the network the container runs on. Once the bad
guy is inside the house, he doesn’t necessarily have to go after the
residents directly to cause problems for them.
#### 6.2.2 <a id="podman-rootful"></a>Fossil in a Rootful Podman Container
##### Simple Method
Fortunately, it’s easy enough to have it both ways. Simply run your
`podman` commands as root:
```
$ sudo podman build -t fossil --cap-add MKNOD .
$ sudo podman create \
--name fossil \
We have to do the build under `sudo` in part because we’re doing rootly
things with the file system image layers we’re building up. Just because
it’s done inside a container runtime’s build environment doesn’t mean we
can get away without root privileges to do things like create the
`/jail/dev/null` node.
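For a concrete (if hypothetical) illustration, the step in question
boils down to a plain `mknod(1)` call, which the kernel only permits a
sufficiently privileged process to make:

```
$ mknod -m 666 /jail/dev/null c 1 3   ← /dev/null is character device 1,3
```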
The other reason we need “`sudo podman build`” is because it puts the result
into root’s Podman image registry, where the next steps look for it.
That in turn explains why we need “`sudo podman create`:” because it’s
creating a container based on an image that was created by root. If you
ran that step without `sudo`, it wouldn’t be able to find the image.
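If you want to see the split between the two image stores for yourself,
a quick hypothetical check:

```
$ podman images fossil          ← rootless store: nothing found
$ sudo podman images fossil     ← root’s store: lists the image
```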
If Docker is looking better and better to you as a result of all this,
realize that it’s doing the same thing. It just hides it better by
```
$ sudo podman create \
--any-options-you-like \
docker.io/mydockername/fossil
```
This round-trip through the public image registry has another side
benefit: your local system might be a lot faster than your remote one,
as when the remote is a small VPS. Even with the overhead of schlepping
container images across the Internet, it can be a net win in terms of
build time.
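Concretely, the round trip might look like this, reusing the
hypothetical `mydockername/fossil` image name from above:

```
$ docker push mydockername/fossil                  ← on the big local machine
$ sudo podman pull docker.io/mydockername/fossil   ← on the small VPS
```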
### 6.3 <a id="barebones"></a>Bare-Bones OCI Bundle Runners
If even the Podman stack is too big for you, you still have options for
running containers that are considerably slimmer, at the cost of greater
administrative complexity and fewer features.
Part of the OCI standard is the notion of a “bundle,” being a consistent
way to present a pre-built and configured container to the runtime.
Essentially, it consists of a directory containing a `config.json` file
and a `rootfs/` subdirectory containing the root filesystem image. Many
tools can produce these for you. We’ll show only one method in the first
section below, then reuse that in the following sections.
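To make that concrete, a finished bundle is nothing more exotic than
this, shown here with the `/var/lib/machines/fossil` path the script
below defaults to:

```
$ ls -F /var/lib/machines/fossil
config.json  rootfs/
```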
#### 6.3.1 <a id="runc"></a>`runc`
We mentioned `runc` [above](#nerdctl), but it’s possible to use it
standalone, without `containerd` or its CLI frontend `nerdctl`. You also
lose the build engine, intelligent image layer sharing, image registry
connections, and much more. The plus side is that `runc` alone is only
about 18 MiB.
Using it without all the support tooling isn’t complicated, but it *is*
cryptic enough to want a shell script. Let’s say we want to build on our
big desktop machine but ship the resulting container to a small remote
host. This should serve:
----
```shell
#!/bin/bash -ex
c=fossil
b=/var/lib/machines/$c
h=my-host.example.com
m=/run/containerd/io.containerd.runtime.v2.task/moby
t=$(mktemp -d /tmp/$c-bundle.XXXXXX)
if [ -d "$t" ]
then
docker container start $c
docker container export $c > $t/rootfs.tar
id=$(docker inspect --format="{{.Id}}" $c)
sudo cat $m/$id/config.json \
| jq '.root.path = "'$b/rootfs'"' \
| jq '.linux.cgroupsPath = ""' \
| jq 'del(.linux.sysctl)' \
| jq 'del(.linux.namespaces[] | select(.type == "network"))' \
| jq 'del(.mounts[] | select(.destination == "/etc/hostname"))' \
| jq 'del(.mounts[] | select(.destination == "/etc/resolv.conf"))' \
| jq 'del(.mounts[] | select(.destination == "/etc/hosts"))' \
| jq 'del(.hooks)' > $t/config.json
scp -r $t $h:tmp
ssh -t $h "{
mv ./$t/config.json $b &&
sudo tar -C $b/rootfs -xf ./$t/rootfs.tar &&
rm -r ./$t
}"
rm -r $t
fi
```
----
The first several lines list configurables:
* **`c`**: the name of the Docker container you’re bundling up for use
with `runc`
* **`b`**: the path of the exported container, called the “bundle” in
OCI jargon; we’re using the [`nspawn`](#nspawn) convention, a
reasonable choice under the [Linux FHS rules][LFHS]
* **`h`**: the remote host name
* **`m`**: the local directory holding the running machines, configurable
because:
* the path name is longer than we want to use inline
* it’s been known to change from one version of Docker to the next
* you might be building and testing with [Podman](#podman), so it
has to be “`/run/user/$UID/crun`” instead
* **`t`**: the temporary bundle directory we populate locally, then
`scp` to the remote machine, where it’s unpacked
[LFHS]: https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard
##### Why All That `sudo` Stuff?
This script uses `sudo` for two different purposes:
1. To read the local `config.json` file out of the `containerd` managed
directory, which is owned by `root` on Docker systems. Additionally,
that input file is only available while the container is started, so
we must ensure that before extracting it.
2. To unpack the bundle onto the remote machine. If you try to get
clever and unpack it locally, then `rsync` it to the remote host to
avoid re-copying files that haven’t changed since the last update,
you’ll find that it fails when it tries to copy device nodes, to
create files owned only by the remote root user, and so forth. If the
container bundle is small, it’s simpler to re-copy and unpack it
fresh each time.
I point all this out because it might ask for your password twice: once for
the local sudo command, and once for the remote.
##### Why All That `jq` Stuff?
We’re using [jq] for two separate purposes:
1. To automatically transmogrify Docker’s container configuration so it
will work with `runc`:
* point it where we unpacked the container’s exported rootfs
* accede to its wish to [manage cgroups by itself][ecg]
* remove the `sysctl` calls that will break after…
* …we remove the network namespace to allow Fossil’s TCP listening
port to be available on the host; `runc` doesn’t offer the
equivalent of `docker create --publish`, and we can’t be
bothered to set up a manual mapping from the host port into the
container
* remove file bindings that point into the local runtime managed
directories; one of the things we give up by using a bare
container runner is automatic management of these files
* remove the hooks for essentially the same reason
2. To make the Docker-managed machine-readable `config.json` more
human-readable, in case there are other things you want changed in
this version of the container. Exposing the `config.json` file like
this means you don’t have to rebuild the container merely to change
a value like a mount point, the kernel capability set, and so forth.
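For instance, a hypothetical after-the-fact tweak of that sort — capping
the container’s memory without rebuilding anything — needs nothing more
than another `jq` pass; the field name comes from the OCI runtime spec:

```
$ jq '.linux.resources.memory.limit = 67108864' config.json > config.json.new
$ mv config.json.new config.json
```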
##### Running the Bundle
With the container exported to a bundle like this, you can start it as:
```
$ cd /path/to/bundle
$ c=fossil-runc ← …or anything else you prefer
$ sudo runc create $c
$ sudo runc start $c
$ sudo runc exec $c -t sh -l
~ $ ls museum
repo.fossil
~ $ ps -eaf
PID USER TIME COMMAND
1 fossil 0:00 bin/fossil server --create …
~ $ exit
$ sudo runc kill $c
$ sudo runc delete $c
```
If you’re doing this on the remote host, the first command is “`cd $b`”,
using the variables from the shell script above; we do it because `runc`
assumes you’re running it from the bundle directory. Alternately, the
`runc` subcommands that need to read the bundle files take a
`--bundle/-b` flag to let you avoid switching directories.
The rest should be straightforward: create and start the container as
root so the `chroot(2)` call inside the container will succeed, then get
into it with a login shell and poke around to prove to ourselves that
everything is working properly. It is. Yay!
The remaining commands show shutting the container down and destroying
it, simply to show how these commands change relative to using the
Docker Engine commands. It’s “kill,” not “stop,” and it’s “delete,” not
“rm.”
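As an aside, `runc` also offers a combined `run` subcommand that folds
the create and start steps into one and cleans up after the container’s
init process exits; a minimal sketch:

```
$ cd /path/to/bundle
$ sudo runc run fossil-runc     ← runs Fossil in the foreground
```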
[ecg]: https://github.com/opencontainers/runc/pull/3131
[jq]: https://stedolan.github.io/jq/
##### Lack of Layer Sharing
The bundle export process collapses Docker’s union filesystem down to a
single layer. Atop that, it makes all files mutable.
All of this is fine for tiny remote hosts with a single container, or at
least one where none of the containers share base layers. Where it
becomes a problem is when you have multiple Fossil containers on a
single host, since they all derive from the same base image.
The full-featured container runtimes above will intelligently share
these immutable base layers among the containers, storing only the
differences in each individual container. More, when pulling images from
a registry host, they’ll transfer only the layers you don’t have copies
of locally, so you don’t have to burn bandwidth sending copies of Alpine
and BusyBox each time, since they’re unlikely to change from one
build to the next.
#### 6.3.2 <a id="crun"></a>`crun`
In the same way that [Docker Engine is based on `runc`](#runc), Podman’s
engine is based on [`crun`][crun], a lighter-weight alternative to
`runc`. It’s only 1.4 MiB on the system I tested it on, yet it will run
the same container bundles as in my `runc` examples above. We saved
more than that by compressing the container’s Fossil executable with
UPX, making the runtime virtually free in this case. The only question
is whether you can put up with its limitations, which are the same as
for `runc`.
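In practice, that means the earlier transcript works after a one-word
substitution; a hypothetical spot check:

```
$ cd /var/lib/machines/fossil
$ sudo crun create fossil
$ sudo crun start fossil
$ sudo crun list
```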
[crun]: https://github.com/containers/crun
#### 6.3.3 <a id="nspawn"></a>`systemd-nspawn`
As of `systemd` version 242, its optional `nspawn` piece
[reportedly](https://www.phoronix.com/news/Systemd-Nspawn-OCI-Runtime)
got the ability to run OCI bundles directly. You might
have it installed already, but if not, it’s only about 2 MiB. It’s
in the `systemd-containers` package as of Ubuntu 22.04 LTS:
```
$ sudo apt install systemd-containers
```
--machine=fossil \
--network-veth \
--port=127.0.0.1:127.0.0.1:9999:8080
$ sudo machinectl list
No machines.
```
This is why I wrote “reportedly” above: I couldn’t get it to work on two different
Linux distributions, and I can’t see why. I’m leaving this here to give
someone else a leg up, with the hope that they will work out what’s
needed to get the container running and registered with `machinectl`.
As of this writing, the tool expects an OCI container version of
“1.0.0”. I had to edit this at the top of my `config.json` file to get
the first command to read the bundle. The fact that it errored out when
I had “`1.0.2-dev`” in there proves it’s reading the file, but it
doesn’t seem able to make sense of what it finds there, and it doesn’t
give any diagnostics to say why.
<div style="height:50em" id="this-space-intentionally-left-blank"></div>