Order of files inside ZIP archives
One of the most annoying bugs I ever had to track down has to do with the fact that files in the ZIP archive are ordered. As a ZIP file consumer, you won't notice it: if you get two archives where only order of the files is changed and unpack them, you won't see any difference.
However, once you embark on a journey of reproducible builds, there's nothing as frustrating as a cache miss with unchanged sources - and the resulting ZIPs only differ when compared by checksum.
Reproducible example:
>>> from zipfile import ZipFile
>>> zf1 = ZipFile('test1.zip', 'w')
>>> zf1.writestr('a', '')
>>> zf1.writestr('b', '')
>>> zf1.close()
>>> zf2 = ZipFile('test2.zip', 'w')
>>> zf2.writestr('b', '')
>>> zf2.writestr('a', '')
>>> zf2.close()
$ md5sum test1.zip
72bf1a6d0e0ad7a0c2548168b701e4ad test1.zip
$ md5sum test2.zip
f5fc3ae82cbd94c70b27f2d479648b26 test2.zip
This is the fix we applied - file names are pre-sorted before getting packaged into the archive.
2019-03-28