Fossil

Artifact [34a0b0637f]

Artifact 34a0b0637fd50d761de694c0c61afffddc3b898a5b151ac938c3a117dc30cb80:


{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Image Format vs Fossil Repository Size\n",
    "\n",
    "## Prerequisites\n",
    "\n",
    "This notebook was developed with [JupyterLab][jl]. To follow in my footsteps, install it along with the required Python packages:\n",
    "\n",
    "    $ pip install jupyterlab matplotlib pandas wand\n",
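    "\n",
    "[Wand][wp] is a binding to the ImageMagick library, so ImageMagick itself must also be installed. So must the `fossil` executable, which the experiment calls through the shell.\n",
    "\n",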
    "In principle, it should also work with [Anaconda Navigator][an], but because [Wand][wp] is not currently in the Anaconda base package set, you may run into difficulties making it work, as we did on macOS. These problems might not occur on Windows or Linux.\n",
    "\n",
    "This notebook uses the Python 2 kernel because macOS does not ship with Python 3, and we don't want to make installing it a prerequisite for those re-running this experiment on their own macOS systems. The code was written with Python 3 syntax changes in mind, but we haven't yet tried it successfully in a Python 3 Jupyter kernel.\n",
    "\n",
    "[an]: https://www.anaconda.com/distribution/\n",
    "[jl]: https://github.com/jupyterlab/\n",
    "[wp]: http://wand-py.org/\n",
    "\n",
    "\n",
    "## Running\n",
    "\n",
    "The next cell generates the test repositories. This takes about 45 seconds to run, primarily because of the one-second `time.sleep(1.0)` synchronization call, which the main test loop makes 40 times: 10 iterations for each of the 4 image formats.\n",
    "\n",
    "The one after that produces the bar chart from the collected data, all but instantaneously.\n",
    "\n",
    "This split allows you to generate the expensive experimental data in a single pass, then play as many games as you like with the generated data.\n",
    "\n",
    "\n",
    "## Discussion\n",
    "\n",
    "The discussion of the results is kept in [a separate document](image-format-vs-repo-size.md) so that Fossil's Markdown renderer can display it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import random\n",
    "import time\n",
    "\n",
    "from wand.color import Color\n",
    "from wand.drawing import Drawing\n",
    "from wand.image import Image\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "size = 256\n",
    "iterations = 10\n",
    "start = time.time()\n",
    "repo_sizes = []\n",
    "\n",
    "formats = ['JPEG', 'BMP', 'TIFF', 'PNG']\n",
    "for f in formats:\n",
    "    ext = f.lower()\n",
    "    tdir = 'test' + '-' + ext\n",
    "    repo = tdir + '.fossil'\n",
    "    ifn = 'test.' + ext\n",
    "    ipath = os.path.join(tdir, ifn)\n",
    "    rs = []\n",
    "    \n",
    "    def add_repo_size():\n",
    "        rs.append(os.path.getsize(repo) / 1024.0 / 1024.0)\n",
    "\n",
    "    try:\n",
    "        # Create test repo\n",
    "        if not os.path.exists(tdir): os.mkdir(tdir, 0o700)\n",
    "        cmd = 'cd {0} ; fossil init ../{1} && fossil open ../{1} && fossil set binary-glob \"*.{2}\"'.format(\n",
    "            tdir, repo, ext\n",
    "        )\n",
    "        if os.system(cmd) != 0:\n",
    "            raise RuntimeError('Failed to create test repo ' + repo)\n",
    "        add_repo_size()\n",
    "\n",
    "        # Create test image and add it to the repo\n",
    "        img = Image(width = size, height = size, depth = 8,\n",
    "                    background = 'white')\n",
    "        img.alpha_channel = 'remove'\n",
    "        img.evaluate('gaussiannoise', 1.0)\n",
    "        img.save(filename = ipath)\n",
    "        cmd = 'cd {0} ; fossil add {1} && fossil ci -m \"initial\"'.format(\n",
    "            tdir, ifn\n",
    "        )\n",
    "        if os.system(cmd) != 0:\n",
    "            raise RuntimeError('Failed to add ' + ifn + ' to test repo')\n",
    "        #print(\"Created test repo \" + repo + \" for format \" + f + \".\")\n",
    "        add_repo_size()\n",
    "\n",
    "        # Change a random pixel to a random RGB value and check the\n",
    "        # result in; repeat this `iterations` times.\n",
    "        for i in range(iterations):\n",
    "            with Drawing() as draw:\n",
    "                x = random.randint(0, size - 1)\n",
    "                y = random.randint(0, size - 1)\n",
    "\n",
    "                r = random.randint(0, 255)\n",
    "                g = random.randint(0, 255)\n",
    "                b = random.randint(0, 255)\n",
    "                \n",
    "                draw.fill_color = Color('rgb({0},{1},{2})'.format(\n",
    "                    r, g, b\n",
    "                ))\n",
    "                draw.color(x, y, 'point')\n",
    "                draw(img)\n",
    "                img.save(filename = ipath)\n",
    "                \n",
    "                # ImageMagick appears to use some kind of asynchronous\n",
    "                # file saving mechanism, so we have to give it time to\n",
    "                # complete.\n",
    "                time.sleep(1.0)\n",
    "  \n",
    "                cmd = 'cd {0} ; fossil ci -m \"change {1} step {2}\"'.format(\n",
    "                    tdir, f, i\n",
    "                )\n",
    "                if os.system(cmd) != 0:\n",
    "                    raise RuntimeError('Failed to change ' + f + ' image, step ' + str(i))\n",
    "                add_repo_size()\n",
    "                \n",
    "        # Repo complete for this format\n",
    "        repo_sizes.append(pd.Series(rs, name=f))\n",
    "\n",
    "    finally:\n",
    "        if os.path.exists(ipath): os.remove(ipath)\n",
    "        if os.path.exists(tdir):\n",
    "            os.system('cd ' + tdir + ' ; fossil close -f')\n",
    "            os.rmdir(tdir)\n",
    "        if os.path.exists(repo): os.remove(repo)\n",
    "            \n",
    "print(\"Experiment completed in \" + str(time.time() - start) + \" seconds.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%config InlineBackend.figure_formats = ['svg']\n",
    "\n",
    "import matplotlib as mpl\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Merge per-format test data into a single DataFrame, dropping the\n",
    "# first 3 rows: the initial empty repo state (boring) and the repo DB\n",
    "# size as it \"settles\" in its first few checkins.\n",
    "data = pd.concat(repo_sizes, axis=1).drop(range(3))\n",
    "\n",
    "mpl.rcParams['figure.figsize'] = (6, 4)\n",
    "ax = data.plot(kind = 'bar', colormap = 'coolwarm',\n",
    "               grid = False, width = 0.8,\n",
    "               edgecolor = 'white', linewidth = 2)\n",
    "ax.set_xlabel('Checkin index')\n",
    "ax.set_ylabel('Repo size (MiB)')\n",
    "plt.savefig('image-format-vs-repo-size.svg', transparent=True)\n",
    "plt.show()"
   ]
  },
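  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As one example of the follow-on games mentioned above, the next cell sketches a per-checkin growth summary. It is an optional addition, not part of the original experiment, and it assumes the `data` DataFrame built by the charting cell. It takes the difference between consecutive repo sizes for each format and converts it from MiB to KiB, so `describe()` reports how much a single-pixel checkin typically grew each repository. The helper name `per_checkin_growth_kib` is hypothetical, chosen only for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "def per_checkin_growth_kib(sizes_mib):\n",
    "    # Hypothetical helper: difference between consecutive rows, i.e. how\n",
    "    # much each checkin grew the repo. The first row has no predecessor,\n",
    "    # so its NaN is dropped. Multiplying by 1024 converts MiB to KiB.\n",
    "    return sizes_mib.diff().dropna() * 1024.0\n",
    "\n",
    "# Assumes `data` from the charting cell above.\n",
    "growth = per_checkin_growth_kib(data)\n",
    "print(growth.describe())"
   ]
  },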
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}