This shows us remapping the container’s internal TCP listening port to
9999 on the host. This feature of OCI runtimes means there’s little
point to using the “`fossil server --port`” feature inside the
container. We can let Fossil default to 8080 internally, then remap it
to wherever we want it on the host instead.
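
For reference, a minimal “run” command of the shape this paragraph
assumes might look like this; the container and image names are
illustrative:

```
  $ docker run \
    --publish 9999:8080 \
    --name fossil \
    fossil
```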

Our stock `Dockerfile` configures Fossil with the default feature set,
so you may wish to modify the `Dockerfile` to add configuration options,
add APK packages to support those options, and so forth.
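
For example, a hypothetical builder-stage tweak that adds build-time
packages and enables Fossil’s JSON API might look like this; the exact
lines depend on how your copy of the `Dockerfile` invokes the build:

      # hypothetical example: package list and configure flags will vary
      RUN apk add --no-cache openssl-dev zlib-dev
      RUN ./configure --json && make -j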

The Fossil `Makefile` provides two convenience targets,
“`make container-image`” and “`make container-run`”. The first creates a
versioned container image, and the second does that and then launches a
container based on that image.
### <a id="repo-inside"></a> 2.1 Storing the Repo Inside the Container

The simplest method is to stop the container if it was running, then
say:

```
  $ docker cp /path/to/my-project.fossil fossil:/museum/repo.fossil
  $ docker start fossil
  $ docker exec fossil chown -R 499 /museum
```

That copies the local Fossil repo into the container where the server
expects to find it, so that the “start” command causes it to serve from
that copied-in file instead. Since it lives atop the immutable base
layers, it persists as part of the container proper, surviving restarts.
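
Because the repo then lives only inside the container, the same
mechanism works in reverse; for instance, to pull a backup copy back
out to the host:

```
  $ docker cp fossil:/museum/repo.fossil ./my-project-backup.fossil
```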

destroyed, too. The solution is to replace the “run” command above with
the following:

```
  $ docker run \
    --publish 9999:8080 \
    --name fossil-bind-mount \
    --volume ~/museum:/museum \
    fossil
```

Because this bind mount maps a host-side directory (`~/museum`) into the
container, you don’t need to `docker cp` the repo into the container at
all. It still expects to find the repository as `repo.fossil` under that
directory, but now both the host and the container can see that repo DB.
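
A first-time setup on the host side might therefore look like this,
reusing the repo path from the example above:

```
  $ mkdir -p ~/museum
  $ cp /path/to/my-project.fossil ~/museum/repo.fossil
```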

You might be aware that OCI containers allow mapping a single file into
the container rather than a whole directory.  Since Fossil repositories
are specially-formatted SQLite databases, you might be wondering why we
don’t say things like:

```
  --volume ~/museum/my-project.fossil:/museum/repo.fossil
```

That lets us have a convenient file name for the project outside the
container while letting the configuration inside the container refer to
the generic “`/museum/repo.fossil`” name. Why should we have to name
the repo generically on the outside merely to placate the container?


[dbcorr]: https://www.sqlite.org/howtocorrupt.html#_deleting_a_hot_journal
[wal]:    https://www.sqlite.org/wal.html


## 3. <a id="security"></a>Security

### 3.1 <a id="chroot"></a>Why Not Chroot?

Prior to 2023.03.26, the stock Fossil container made use of [the chroot
jail feature](./chroot.md) in order to wall away the shell and other
tools provided by [BusyBox](https://www.busybox.net/BusyBox.html).  This
author made a living for years in the early 1990s using Unix systems
that offered less power, so there was a legitimate worry that if someone
ever figured out how to get a shell on one of these Fossil containers,
it would constitute a powerful island from which to attack the rest of
the network.

The thing is, Fossil is self-contained, needing none of that power in
the main-line use cases.  The only reason we included BusyBox in the
container at all was on the off chance that someone needed it for
debugging.


That justification collapsed when we realized you could restore this
basic shell environment on an as-needed basis with a one-line change to
the `Dockerfile`, as we show in the next section.


### 3.2 <a id="run"></a>Swapping Out the Run Layer

If you want a basic shell environment for temporary debugging of the
running container, that’s easily added. Simply change this line in the
`Dockerfile`…

      FROM scratch AS run

…to this:

      FROM busybox AS run

Rebuild, redeploy, and your Fossil container now has a BusyBox-based
shell environment that you can get into via:


      $ docker exec -it -u fossil $(make container-version) sh

(That command assumes you built the container via “`make container`” and
are therefore using its versioning scheme.)
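
If you instead created and named the container “fossil” by hand, the
equivalent is:

      $ docker exec -it -u fossil fossil sh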


Another case where you might need to replace this bare-bones “`run`”
layer with something more functional is that you’ve installed a [server
extension](./serverext.wiki) and you need an interpreter for that
script. The advice above won’t work except in the unlikely case that
it’s written in one of the bare-bones script interpreters that BusyBox
ships.(^BusyBox’s `/bin/sh` is based on the old 4.4BSD Lite Almquist
shell, implementing little more than what POSIX specified in 1989, plus
equally stripped-down versions of AWK and `sed`.)

Let’s say the extension is written in Python. You could inject that into
the stock container via one of the “[distroless]” images. Because this
will conflict with the bare-bones “`os`” layer we create, the method is
more complicated. Essentially, you replace everything in STAGE 2 and 3
inside the `Dockerfile` with:

      FROM gcr.io/distroless/python3-debian11 AS run
      ARG UID=499
      RUN set -x                                                              \
          && install -d -m 700 -o fossil -g fossil log museum                 \
          && echo "fossil:x:${UID}:${UID}:User:/museum:/false" >> /etc/passwd \
          && echo "fossil:x:${UID}:fossil"                     >> /etc/group
      COPY --from=builder /tmp/fossil /bin/

Another case is that you’re setting up [email alerts](./alerts.md) and
need some way to integrate with the host’s [MTA]. There are a number of
alternatives in that linked document, so for the sake of discussion,
we’ll say you’ve chosen Method 2, which requires a Tcl interpreter to
push messages into the outbound email queue DB, presumably bind-mounted
into the container. As of this writing, Google offers no “distroless”
container images for Tcl, but you *could* replace the `FROM` line above
with:

      FROM alpine AS run
      RUN apk add --no-cache tcl

Everything else remains the same as in the distroless Python example
because even Alpine will conflict with the way we set up core Linux
directories like `/etc` and `/tmp` in the absence of any OS image.


Beware that there’s a limit to how much the über-jail nature of
containers can save you when you go and provide a more capable OS layer
like this. For instance, you might have enabled Fossil’s [risky TH1 docs
feature][th1docrisk] along with the Tcl integration feature, which
effectively gives anyone with check-in rights on your repo the ability
to run arbitrary Tcl code on the host when that document is rendered.
The container layer should stop that script from accessing any files out
on the host that you haven’t explicitly mounted into the container’s
namespace, but it *can* still make network connections, modify the repo
DB inside the container, and who knows what else.

[distroless]: https://github.com/GoogleContainerTools/distroless
[MTA]:        https://en.wikipedia.org/wiki/Message_transfer_agent
[th1docrisk]: https://fossil-scm.org/forum/forumpost/42e0c16544


### 3.3 <a id="caps"></a>Dropping Unnecessary Capabilities

The example commands above create the container with [a default set of
Linux kernel capabilities][defcap]. Although Docker strips away almost
all of the traditional root capabilities by default, and Fossil needs
none of the ones it does take away, Docker leaves some enabled that
Fossil doesn’t actually need. You can tighten the scope of capabilities
by adding “`--cap-drop`” options to your container creation commands.
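
For Docker, a sketch of such a command, dropping the capabilities
discussed below, might look like this; the container name, port
mapping, and exact list are illustrative:

```
  $ docker create \
    --name fossil \
    --cap-drop AUDIT_WRITE \
    --cap-drop CHOWN \
    --cap-drop FSETID \
    --cap-drop KILL \
    --cap-drop MKNOD \
    --cap-drop NET_BIND_SERVICE \
    --cap-drop NET_RAW \
    --cap-drop SETFCAP \
    --cap-drop SETPCAP \
    --publish 127.0.0.1:9999:8080 \
    fossil
```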

*   **`CHOWN`**: The Fossil server never even calls `chown(2)`, and our
    image build process sets up all file ownership properly, to the
    extent that this is possible under the limitations of our
    automation.

    Curiously, stripping this capability doesn’t affect your ability to
    run commands like “`chown -R fossil:fossil /museum`” when
    you’re using bind mounts or external volumes — as we recommend
    [above](#bind-mount) — because it’s the host OS’s kernel
    capabilities that affect the underlying `chown(2)` call in that
    case, not those of the container.

    If for some reason you did have to change file ownership of
    in-container files, it’s best to do that by changing the

    [backoffice], and then only for processes it created on earlier
    runs; it doesn’t need the ability to kill processes created by other
    users. You might wish for this ability as an administrator shelled
    into the container, but you can pass the “`docker exec --user`”
    option to run commands within your container as the legitimate owner
    of the process, removing the need for this capability.

*   **`MKNOD`**: As of 2023.03.26, the stock container uses the
    runtime’s default `/dev` node tree. Prior to this, we had to create
    `/dev/null` and `/dev/urandom` inside [the chroot jail](#chroot),
    but even then, these device nodes were created at build time and
    were never changed at run time, so we didn’t need this run-time
    capability even then.

*   **`NET_BIND_SERVICE`**: With containerized deployment, Fossil never
    needs the ability to bind the server to low-numbered TCP ports, not
    even if you’re running the server in production with TLS enabled and
    want the service bound to port 443. It’s perfectly fine to let the
    Fossil instance inside the container bind to its default port (8080)
    because you can rebind it on the host with the
    “`docker create --publish 443:8080`” option. It’s the container’s
    _host_ that needs this ability, not the container itself.

    (Even the container runtime might not need that capability if you’re
    [terminating TLS with a front-end proxy](./ssl.wiki#server). You’re
    more likely to say something like “`-p localhost:12345:8080`” and then
    configure the reverse proxy to translate external HTTPS calls into
    HTTP directed at this internal port 12345.)

*   **`NET_RAW`**: Fossil itself doesn’t use raw sockets, and our build
    process leaves out all the BusyBox utilities that require them.
    Although that set includes common tools like `ping`, we foresee no
    compelling reason to use that or any of these other elided utilities
    — `ether-wake`, `netstat`, `traceroute`, and `udhcp` — inside the
    container. If you need to ping something, do it on the host.

    If we did not take this hard-line stance, an attacker that broke
    into the container and gained root privileges might use raw sockets

back-to-basics nature makes static builds work the way they used to,
back in the day. If that’s all you’re after, you can do so as easily as
this:

```
  $ docker build -t fossil .
  $ docker create --name fossil-static-tmp fossil
  $ docker cp fossil-static-tmp:/bin/fossil .
  $ docker container rm fossil-static-tmp
```

The resulting binary is the single largest file inside that container,
at about 6 MiB. (It’s built stripped.)
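
To sanity-check the extracted binary, you might say:

```
  $ file fossil       # expect “statically linked” in the output
  $ ./fossil version
```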

[lsl]: https://stackoverflow.com/questions/3430400/linux-static-linking-is-dead
[ctrd]:    https://containerd.io/
[nerdctl]: https://github.com/containerd/nerdctl
[runc]:    https://github.com/opencontainers/runc


### 6.2 <a id="podman"></a>Podman

A lighter-weight alternative that doesn’t
give up the image builder is [Podman]. Initially created by
Red Hat and thus popular on that family of OSes, it will run on
any flavor of Linux. It can even be made to run [on macOS via Homebrew][pmmac]
or [on Windows via WSL2][pmwin].

On Ubuntu 22.04, the installation size is about 38&nbsp;MiB, roughly a
tenth the size of Docker Engine.

Although Podman [bills itself][whatis] as a drop-in replacement for the
`docker` command and everything that sits behind it, some of the tool’s
design decisions affect how our Fossil containers run, as compared to
using Docker.

The most important of these is that, by default, Podman wants to build
and run your container “[rootless].” This is generally better for
security, but there’s something you need to be aware of: each user has
their own local container registry. Let’s say you’re following good
security practice by building the container on the server as a regular
user, but you then want to start it as root because your server OS of
choice won’t start user-level `systemd` units until and unless that user
logs in first. The problem is, the root user can’t see the unprivileged
user’s container registry, so even though that user built the image, you
can’t create the actual container from it, since that needs to be done
as root.

The simple way to deal with this is to bounce the container through a
registry that both users can see, such as [Docker
Hub](https://hub.docker.com):

```
  $ podman login
  $ podman build -t fossil .
  $ podman tag fossil:latest mydockername/fossil:latest
  $ podman image push mydockername/fossil:latest
```

That will push the image up to your account, so that you can then say:

```
  $ sudo podman create \
    --any-options-you-like \
    docker.io/mydockername/fossil
```

This round-trip through the public image registry has another side
benefit: it lets you build on a local system that might be a lot faster
than your remote one, as when the remote is a small VPS. Even with the
overhead of schlepping container images across the Internet, it can be a
net win in terms of build time.

Another oddity compared to Docker is that Podman doesn’t have the same
[default Linux kernel capability set](#caps).  The changes distill to:

```
  $ podman create \
    --name fossil \
    --cap-drop CHOWN \
    --cap-drop FSETID \
    --cap-drop KILL \
    --cap-drop NET_BIND_SERVICE \
    --cap-drop SETFCAP \
    --cap-drop SETPCAP \
    --publish 127.0.0.1:9999:8080 \
    localhost/fossil
  $ podman start fossil
```

[pmmac]:    https://podman.io/getting-started/installation.html#macos
[pmwin]:    https://github.com/containers/podman/blob/main/docs/tutorials/podman-for-windows.md
[Podman]:   https://podman.io/
[rootless]: https://github.com/containers/podman/blob/main/docs/tutorials/rootless_tutorial.md
[whatis]:   https://podman.io/whatis.html

### 6.3 <a id="nspawn"></a>`systemd-container`

If even the Podman stack is too big for you, the next-best option I’m
aware of is the `systemd-container` infrastructure on modern Linuxes,
available since version 239 or so.  Its runtime tooling requires only

Next, create `/etc/systemd/nspawn/myproject.nspawn`:

----

```
[Exec]
WorkingDirectory=/
Parameters=bin/fossil server                \
    --baseurl https://example.com/myproject \
    --create                                \
    --jsmode bundled                        \
    --localhost                             \
    --port 9000                             \
    --scgi                                  \
    --user admin                            \
    museum/repo.fossil

DropCapability=          \
    CAP_AUDIT_WRITE      \
    CAP_CHOWN            \
    CAP_FSETID           \
    CAP_KILL             \
    CAP_MKNOD            \
    CAP_NET_BIND_SERVICE \
    CAP_NET_RAW          \
    CAP_SETFCAP          \
    CAP_SETPCAP
ProcessTwo=yes
LinkJournal=no
Timezone=no

[Files]
Bind=/home/fossil/museum/myproject:/museum

[Network]
VirtualEthernet=no
```

----

*   The command given in the `Parameters` directive assumes you’re
    setting up [SCGI proxying via nginx][DNT], but with adjustment,
    it’ll work with the other repository service methods we’ve
    [documented][srv].

*   The path in the host-side part of the `Bind` value must point at the
    directory containing the `repo.fossil` file referenced in said
    command so that `/museum/repo.fossil` refers to your repo out
    on the host, for the reasons given [above](#bind-mount); see the
    sketch after this list.
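
Concretely, a hypothetical host-side layout matching the `Bind` value
above would look like this:

```
  $ ls /home/fossil/museum/myproject
  repo.fossil
```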

That being done, we also need a generic systemd unit file called
`/etc/systemd/system/fossil@.service`, containing:

----


public using nginx via SCGI. If you aren’t using a front-end proxy
and want Fossil exposed to the world via HTTPS, you might say this instead in
the `*.nspawn` file:

```
Parameters=bin/fossil server \
    --cert /path/to/cert.pem \
    --create                 \
    --jsmode bundled         \
    --port 443               \
    --user admin             \
    museum/repo.fossil
```

rather than “boot” an OS image. That causes a bunch of commands to fail:

*   **`machinectl poweroff`** will fail because the container
    isn’t running dbus.

*   **`machinectl start`** will try to find an `/sbin/init`
    program in the rootfs, which we haven’t got.  We could
    rename `/bin/fossil` to `/sbin/init` and then hack
    the chroot scheme to match, but ick.  (This, incidentally,
    is why we set `ProcessTwo=yes` above even though Fossil is
    perfectly capable of running as PID 1, a fact we depend on
    in the other methods above.)

*   **`machinectl shell`** will fail because there is no login
    daemon running, which we purposefully avoided adding by