I added an articles section to my website with all blog posts up until now.
I also fixed the very, very old archived blog from 2014.
I retired at 31, and get asked about it sometimes. I wrote an article about the math of retirement, which explains how I retired early (and, to some extent, why). And of course, how and why you might want to as well.
I want to be able to keep editing my finance articles, so this one is on my website instead: https://za3k.com/finance/retire_forever
There will probably be some more finance articles to come soon.
qr-backup is a program to back up digital documents to physical paper. Restore is done with a webcam, video camera, or scanner. Someday smart phone cameras will work.
I’ve been making some progress on qr-backup v1.1. So far I’ve added:
--restore does a one-step restore for you, instead of needing a one-line bash restore command.
--encrypt provides password-based encryption.
--instructions gives more fine-tuned control over printing instructions. There’s a “plain English” explanation of how qr-backup works that you can attach to the backup.
--note adds an arbitrary message to every sheet.
v1.1 will be released when I make qr-backup feature complete.
v1.2 will focus on adding a GUI and support for Windows, Mac, and Android. Moving away from zbar is a requirement for multi-platform support, and will likely improve storage density.
Year 0 – I filled 10 32-GB Kingston flash drives with random data.
Year 1 – Tested drive 1, zero bit rot. Re-wrote the drive with the same data.
Year 2 – Re-tested drive 1, zero bit rot. Tested drive 2, zero bit rot. Re-wrote both with the same data.
They have been stored in a box on my shelf, with a 1-month period in a moving van (probably below freezing) this year.
Will report back in 1 more year when I test the third 🙂
FAQs:
Here’s a list of books I read in 2021. The ones in bold I recommend.
Fiction:
Enigma by Graeme Base
City of Stairs by Robert Jackson Bennett
Look to Windward (Culture 7) by Iain M Banks
Surface Detail (Culture 8) by Iain M Banks
Pump Six by Paolo Bacigalupi
Six of Crows by Leigh Bardugo
Lexicon by Max Barry
Mage Errant 1 by John Bierce
Mage Errant 2 by John Bierce
Mage Errant 3 by John Bierce
Mage Errant 4 by John Bierce
Mage Errant 5 by John Bierce
The Atlas Six by Olivie Blake
Lilith’s Brood (Xenogenesis 1) by Octavia E Butler
Elegy Beach (Change 2) by Steven Boyett
Curse of Chalion by Lois McMaster Bujold
Xenocide by Orson Scott Card
Bohemian Gospel by Dana Chamblee Carpenter
Convergence (Foreigner 18) by C J Cherryh
Emergence (Foreigner 19) by C J Cherryh
Divergence (Foreigner 21) by C J Cherryh
Iron Prince by Bryce O’Conner and Luke Chmilenko
Murder on the Orient Express by Agatha Christie
The Alchemist by Paulo Coelho
Artemis Fowl (Artemis Fowl 1) by Eoin Colfer
The Arctic Incident (Artemis Fowl 2) by Eoin Colfer
Eternity Code (Artemis Fowl 3) by Eoin Colfer
Opal Deception (Artemis Fowl 4) by Eoin Colfer
Space Between Worlds by J Conrad and Micaiah Johnson
Little Brother by Cory Doctorow
Homeland (Little Brother 2) by Cory Doctorow
Children of Chaos by Dave Duncan
The Alchemist’s Apprentice by Dave Duncan
The Alchemist’s Code by Dave Duncan
The Alchemist’s Pursuit by Dave Duncan
The Cutting Edge by Dave Duncan
Upland Outlaws by Dave Duncan
The Stricken Field by Dave Duncan
Queen of Blood by Sarah Beth Durst
Vita Nostra by Maryna and Serhiy Dyachenko
How Rory Thorne Destroyed the Multiverse by K. Eason
Malazan (Malazan 1) by Steven Erikson
Daughter of the Empire by Raymond Feist and Janny Wurts
Mistress of the Empire by Raymond Feist and Janny Wurts
Servant of the Empire by Raymond Feist and Janny Wurts
Dragon’s Egg (Cheela 1) by Robert L Forward
Mother of Learning by Domagoj Kurmaic/nobody103
Books of Magic by Neil Gaiman
The Midnight Library by Matt Haig
The Warehouse by Rob Hart
Forging Hephaestus by Drew Hayes
Super Powereds, v1 by Drew Hayes
Super Powereds, v2 by Drew Hayes
Super Powereds, v3 by Drew Hayes
Super Powereds, v4 by Drew Hayes
Johannes Cabal by Jonathan L. Howard
The Medusa Plague by Mary Kirchoff
Six Wakes by Mur Lafferty
King of Thorns by Mark Lawrence
Emperor of Thorns by Mark Lawrence
First Contacts by Murray Leinster
Futurological Congress by Stanislaw Lem
Perfect Vacuum by Stanislaw Lem
Tuf Voyaging by George R R Martin
A Memory Called Empire by Arkady Martine
A Desolation Called Peace by Arkady Martine
Middlegame by Seanan McGuire
The Host by Stephenie Meyer
The City & the City by China Miéville
*The House That Made the Sixteen Loops of Time by Tamsyn Muir
Harrow the Ninth by Tamsyn Muir
Convenience Store Woman by Sayaka Murata
A Deadly Education by Naomi Novik
The Last Graduate (Scholomance 2) by Naomi Novik
Stiletto (Checquy, book 2) by Daniel O’Malley
Special Topics in Calamity Physics by Marisha Pessl
Carpe Jugulum by Terry Pratchett
Guards! Guards! by Terry Pratchett
Jingo by Terry Pratchett
The Last Continent by Terry Pratchett
Monstrous Regiment by Terry Pratchett
Men at Arms by Terry Pratchett
Night Watch by Terry Pratchett
Snuff by Terry Pratchett
Sourcery by Terry Pratchett
The Truth by Terry Pratchett
The Woven Ring (Sol’s Harvest 1) by M D Presley
The Years of Rice and Salt by Kim Stanley Robinson
The Torch That Ignites the Stars by Andrew Rowe
Sleep Donation by Karen Russell
A Darker Shade of Magic by V E Schwab
Invisible Life of Addie LaRue by V E Schwab
Vicious by V E Schwab
Vengeance by V E Schwab
Grasshopper Jungle by Andrew Smith
Why Is This Night Different from All Other Nights? by Lemony Snicket
Dark Storm (Rhenwars 1) by M L Spencer
Anathem by Neal Stephenson
Cryptonomicon by Neal Stephenson
Nimona by Noelle Stevenson
Hunter x Hunter manga v1-36 by Yoshihiro Togashi
Worth the Candle by Alexander Wales
Educated by Tara Westover
Soulsmith (Cradle 2) by Will Wight
Blackflame (Cradle 3) by Will Wight
Skysworn (Cradle 4) by Will Wight
Ghostwater (Cradle 5) by Will Wight
Underlord (Cradle 6) by Will Wight
Uncrowned (Cradle 7) by Will Wight
Wintersteel (Cradle 8) by Will Wight
Bloodline (Cradle 9) by Will Wight
Reaper (Cradle 10) by Will Wight
The Crimson Vault (Traveler’s Gate 2) by Will Wight
*Dinosaurs by Walter Jon Williams
Blind Lake by Robert Charles Wilson
Thousand Li by Tao Wong
Thousand Li 2 by Tao Wong
Thousand Li 3 by Tao Wong
Thousand Li 4 by Tao Wong
Thousand Li 5 by Tao Wong
Sorcerer’s Legacy by Janny Wurts (see also Feist)
Heretical Edge by ceruleanscrawling
Mark of the Fool by UnstoppableJuggernaut
there is no antimemetics division by qntm
Only Villains Do That by Webbonomicon
Worm by wildbow
Nonfiction:
Compiling with Continuations by Andrew W. Appel
The Rule of Benedict by St Benedict (read the front material only)
Programming Pearls by Jon Bentley
Whole Brain Emulation Roadmap by Nick Bostrom
Data Matching by Peter Christen
Attack and Defense by James Davies and Akira Ishida
Engines of Creation by K. Eric Drexler
Class by Paul Fussell
The Food Lab by J Kenji Lopez-Alt
Primitive Technology by John Plant
Monero whitepaper by Nicolas van Saberhagen
Secrets and Lies by Bruce Schneier
The Cuckoo’s Egg by Clifford Stoll
I downloaded all 27 million Go games from online-go.com, aka OGS, with permission. They are available on Internet Archive or here as SGF files or JSON. You can use them for whatever you like.
Recently I wrote a scraper. First, I downloaded all the HTML files. Next, I wanted to parse the content. However, real-world data is pretty messy. I would run the scraper, and it would get partway through the file and fail. Then I would improve it, and it would get further and fail. I’d improve it more, and it would finish the whole file, but fail on the fifth one. Then I’d re-run things, and it would fail on file #52, #1035, and #553,956.
To make testing faster, I added a scaffold. Whenever my parser hit an error, it would print the filename (for me, the tester) and record the filename to an error log. Then, it would immediately exit. When I re-ran the parser, it would test all the files where it had hit a problem first. That way, I didn’t have to wait 20 minutes until it got to the failure case.
import json
import os

import tqdm  # progress bar library (pip install tqdm)

# parse_file(f) is the site-specific parser, defined elsewhere; it returns one dict per file.

if __name__ == "__main__":
    if os.path.exists("failures.log"):
        # Quicker failures: re-test every file that has broken the parser before
        with open("failures.log", "r") as f:
            failures = {line.strip() for line in f if line.strip()}
        for path in tqdm.tqdm(sorted(failures), desc="re-checking known tricky files"):
            try:
                with open(path, "r") as f:
                    parse_file(f)
            except Exception:
                print(path, "failed again (already failed once)")
                raise

    paths = []
    for root, dirs, files in os.walk("html"):
        for file in files:
            paths.append(os.path.join(root, file))
    paths.sort()

    with open("output.json", "w") as out:
        # tqdm is just a progress bar; you can also use 'for path in paths:'
        for path in tqdm.tqdm(paths, desc="parse files"):
            with open(path, "r") as f:
                try:
                    result = parse_file(f)
                except Exception:
                    print(path, "failed, adding to quick-fail test list")
                    with open("failures.log", "a") as fatal:
                        print(path, file=fatal)
                    raise
            json.dump(result, out, sort_keys=True)  # my desired output is one JSON dict per line
            out.write("\n")
I was looking into building a Raspberry Pi-based supercomputer lately. Here’s the background research I did comparing Pi models. Most of this information is sourced from raspberrypi.org. I was especially interested in prices, and in which boot methods work on which models, since that information is very scattered.
Let’s take a look at the gzip format. Why might you want to do this?
Let’s work a few examples and look at the format in close detail. For all these examples, I’m using GNU gzip 1.10-3 on an x86_64 machine.
I recommend checking out the linked resources below for a deeper conceptual overview if you want to learn more. That said, these are the only worked examples of gzip and/or DEFLATE of which I’m aware, so they’re a great companion to one another. In particular, you may want to learn what a prefix code is ahead of time.
References:
[1] RFC 1951, DEFLATE standard, by Peter Deutsch
[2] RFC 1952, gzip standard, by Peter Deutsch
[3] infgen, by Mark Adler (one of the zlib/gzip/DEFLATE authors), a tool for disassembling and printing a gzip or DEFLATE stream. I found it useful for figuring out the endian-ness of bitfields, and somewhat useful for understanding the dynamic huffman decoding process. Documentation is here.
[4] An explanation of the ‘deflate’ algorithm by Antaeus Feldspar. A great conceptual overview of LZ77 and Huffman coding. I recommend reading this before reading my DEFLATE explanation.
[5] LZ77 compression, Wikipedia.
[6] Prefix-free codes generally and Huffman’s algorithm specifically
[7] After writing this, I learned about puff.c, a reference (simple) implementation of a DEFLATE decompressor by Mark Adler.
Let’s take a look at our first example. If you’re on Linux, feel free to run the examples I use as we go.
echo "hello hello hello hello" | gzip
The bytes gzip outputs are below. You can use xxd or any other hex dump tool to view binary files. Notice that the original is 24 bytes, while the compressed version is 29 bytes. gzip is not really intended for data this short, so every example in this article actually gets bigger.
Byte | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 |
Hex | 1f | 8b | 08 | 00 | 00 | 00 | 00 | 00 | 00 | 03 | cb | 48 | cd | c9 | c9 | 57 | c8 | 40 | 27 | b9 | 00 | 00 | 88 | 59 | 0b | 18 | 00 | 00 | 00 |
The first 10 bytes (0-9) are the gzip header and the last 8 bytes (21-28) are the gzip footer. The bytes in between are the DEFLATE data. I learned the details of the format by reading RFC 1952: gzip.
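Python's struct module is enough to split out the fixed parts of the header and footer. The sketch below is minimal and written from RFC 1952; the function name is hypothetical. It unpacks only the 10 fixed header bytes and the 8 footer bytes. It ignores any optional header fields that the flags byte might announce.

```python
import struct

def parse_gzip_fixed_fields(data: bytes) -> dict:
    """Unpack the fixed 10-byte gzip header and the 8-byte footer (RFC 1952)."""
    id1, id2, cm, flg, mtime, xfl, os_id = struct.unpack("<BBBBIBB", data[:10])
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip file")
    crc32, isize = struct.unpack("<II", data[-8:])  # footer fields are little-endian
    return {
        "method": cm,      # 8 means DEFLATE
        "flags": flg,      # 0: no optional fields, so DEFLATE data starts at byte 10
        "mtime": mtime,    # modification time in Unix seconds (0 means none available)
        "xfl": xfl,
        "os": os_id,       # 3 means Unix
        "crc32": crc32,    # CRC-32 of the uncompressed data
        "isize": isize,    # uncompressed size, modulo 2**32
    }

first_example = bytes.fromhex(
    "1f 8b 08 00 00 00 00 00 00 03"        # header, bytes 0-9
    " cb 48 cd c9 c9 57 c8 40 27 b9 00"    # DEFLATE data, bytes 10-20
    " 00 88 59 0b 18 00 00 00"             # footer, bytes 21-28
)
print(parse_gzip_fixed_fields(first_example))
```

For the first example this reports:

- method 8
- no flags
- an MTIME of 0
- OS 3
- an ISIZE of 24, which is the length of "hello hello hello hello\n"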
Byte | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
Hex | cb | 48 | cd | c9 | c9 | 57 | c8 | 40 | 27 | b9 | 00 |
Binary | 11001011 | 01001000 | 11001101 | 11001001 | 11001001 | 01010111 | 11001000 | 01000000 | 00100111 | 10111001 | 00000000 |
R. Bin. | 11010011 | 00010010 | 10110011 | 10010011 | 10010011 | 11101010 | 00010011 | 00000010 | 11100100 | 10011101 | 00000000 |
DEFLATE is the actual compression format used inside gzip. The format is detailed in RFC 1951: DEFLATE. DEFLATE is a dense format which uses bits instead of bytes, so we need to take a look at the binary, not the hex, and things will not be byte-aligned. The endian-ness is a little confusing in gzip, so we’ll usually be looking at the “reversed binary” row.
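Byte 10: 1 10 10011. The first bit, 1, is BFINAL: this is the last (and only) block. The next 2 bits are the block type. We reverse them (the block type is always 2 bits, and we reverse any fixed number of bits) to get 01, fixed huffman coding.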
00: Not compressed
01: Fixed huffman coding.
10: Dynamic huffman coding.
11: Reserved (error).
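Reversing bits is handy when reading the stream by hand. In code, the usual approach is to read the stream least-significant bit first. Multi-bit header fields like BTYPE then come out as ordinary integers, with no reversal needed. Here is a small Python sketch (hypothetical function name), applied to the first DEFLATE byte of each example in this post:

```python
BLOCK_TYPES = {0: "not compressed", 1: "fixed huffman", 2: "dynamic huffman", 3: "reserved (error)"}

def block_header(first_byte: int):
    """Return (BFINAL, block type) for a DEFLATE block starting at bit 0 of first_byte."""
    bfinal = first_byte & 1             # bit 0: 1 means this is the last block
    btype = (first_byte >> 1) & 0b11    # bits 1-2, read least-significant bit first
    return bfinal, BLOCK_TYPES[btype]

# First DEFLATE byte of each example in this post
for label, byte in [("example 1", 0xCB), ("example 2", 0x01), ("example 3", 0x1D)]:
    print(label, block_header(byte))
# example 1 (1, 'fixed huffman')
# example 2 (1, 'not compressed')
# example 3 (1, 'dynamic huffman')
```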
Binary | Bits | Extra bits | Type | Code |
---|---|---|---|---|
00110000-10111111 | 8 | 0 | Literal byte | 0-143 |
110010000-111111111 | 9 | 0 | Literal byte | 144-255 |
0000000 | 7 | 0 | End of block | 256 |
0000001-0010111 | 7 | varies | Length | 257-279 |
11000000-11000111 | 8 | varies | Length | 280-285 |
Binary Code | Bits | Extra bits | Type | Value |
---|---|---|---|---|
00000-11111 | 5 | varies | Distance | 0-31 |
Code | Binary | Meaning | Extra bits |
---|---|---|---|
267 | 0001011 | Length 15-16 | 1 |
Code | Binary | Meaning | Extra bits |
---|---|---|---|
4 | 00100 | Distance 5-6 | 1 |
Our final output is “hello hello hello hello\n”, which is exactly what we expected.
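The walk-through above can be checked mechanically. Below is a minimal Python sketch of a decoder for a single fixed-Huffman block, written from RFC 1951 and the tables above. It is not how zlib is implemented, and it deliberately skips stored and dynamic blocks. Note the two bit orders:

- Huffman codes are read most-significant bit first.
- Extra bits and header fields are read least-significant bit first.

The class and function names are hypothetical.

```python
import zlib

# Base lengths and extra-bit counts for length codes 257-285 (RFC 1951, section 3.2.5)
LEN_BASE = [3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
            35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258]
LEN_EXTRA = [0] * 8 + [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4 + [5] * 4 + [0]
# Base distances and extra-bit counts for distance codes 0-29
DIST_BASE = [1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193, 257, 385,
             513, 769, 1025, 1537, 2049, 3073, 4097, 6145, 8193, 12289, 16385, 24577]
DIST_EXTRA = [0, 0, 0, 0] + [n for n in range(1, 14) for _ in range(2)]


class BitReader:
    """Reads a byte string one bit at a time, starting from each byte's lowest bit."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # bit position from the start of data

    def bit(self) -> int:
        b = (self.data[self.pos >> 3] >> (self.pos & 7)) & 1
        self.pos += 1
        return b

    def bits(self, n: int) -> int:
        """Read an n-bit number stored least-significant bit first."""
        return sum(self.bit() << i for i in range(n))

    def huffman(self, n: int, code: int = 0) -> int:
        """Append n more bits to a Huffman code, most-significant bit first."""
        for _ in range(n):
            code = (code << 1) | self.bit()
        return code


def read_fixed_litlen(r: BitReader) -> int:
    """Decode one literal/length symbol with the fixed code table."""
    code = r.huffman(7)
    if code <= 0b0010111:                    # 7 bits: symbols 256-279
        return 256 + code
    code = r.huffman(1, code)
    if 0b00110000 <= code <= 0b10111111:     # 8 bits: literals 0-143
        return code - 0b00110000
    if 0b11000000 <= code <= 0b11000111:     # 8 bits: symbols 280-287
        return 280 + code - 0b11000000
    code = r.huffman(1, code)                # 9 bits: literals 144-255
    return 144 + code - 0b110010000


def inflate_fixed(data: bytes) -> bytes:
    """Decode a stream holding exactly one final fixed-Huffman block."""
    r = BitReader(data)
    bfinal, btype = r.bits(1), r.bits(2)
    if bfinal != 1 or btype != 1:
        raise ValueError("this sketch only handles a single, final fixed-Huffman block")
    out = bytearray()
    while True:
        sym = read_fixed_litlen(r)
        if sym < 256:                        # literal byte
            out.append(sym)
        elif sym == 256:                     # end of block
            return bytes(out)
        elif sym > 285:
            raise ValueError("invalid length code")
        else:                                # <length, distance> back-reference
            length = LEN_BASE[sym - 257] + r.bits(LEN_EXTRA[sym - 257])
            dcode = r.huffman(5)             # fixed distance codes are plain 5-bit numbers
            dist = DIST_BASE[dcode] + r.bits(DIST_EXTRA[dcode])
            for _ in range(length):          # copy byte by byte so overlapping copies work
                out.append(out[-dist])


deflate_bytes = bytes.fromhex("cb 48 cd c9 c9 57 c8 40 27 b9 00")  # bytes 10-20 above
print(inflate_fixed(deflate_bytes))
print(inflate_fixed(deflate_bytes) == zlib.decompress(deflate_bytes, -15))  # cross-check
```

This should print b'hello hello hello hello\n' and then True. A negative wbits (here -15) tells Python's zlib to expect raw DEFLATE data with no zlib or gzip wrapper.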
Let’s generate a second example using a file.
echo -en "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8\xf7\xf6\xf5\xf4\xf3\xf2\xf1" >test.bin
gzip test.bin
This input file is pretty weird. In fact, it’s so weird that gzip compression will fail to reduce its size at all. We’ll take a look at what happens when compression fails in the next DEFLATE section below. But first, let’s see how gzip changes with a file instead of a stdin stream.
Byte | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19-38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 |
Hex | 1f | 8b | 08 | 08 | 9f | 08 | ea | 60 | 00 | 03 | 74 | 65 | 73 | 74 | 2e | 62 | 69 | 6e | 00 | see below | c6 | d3 | 15 | 7e | 0f | 00 | 00 | 00 |
Okay, let’s take a look at how the header and footer changed.
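Compared to the first example, two header fields changed. FLG is now 08 (FNAME), so a zero-terminated file name, test.bin, follows the fixed header. MTIME is no longer zero. The footer's ISIZE is now 0f 00 00 00, which is 15 bytes. Here is a minimal Python sketch (hypothetical function name) that reads the name and timestamp. It assumes none of the other optional fields (FEXTRA, FCOMMENT, FHCRC) are present.

```python
import struct
from datetime import datetime, timezone

FNAME = 0x08  # FLG bit 3: a zero-terminated original file name follows the fixed header

def read_name_and_mtime(data: bytes):
    """Return (file name or None, MTIME as a UTC datetime or None, offset of the DEFLATE data).

    Sketch only: assumes FEXTRA, FCOMMENT and FHCRC are not set.
    """
    flg = data[3]
    (mtime,) = struct.unpack("<I", data[4:8])  # little-endian seconds since the Unix epoch
    if flg & ~FNAME:
        raise ValueError("this sketch only handles the FNAME flag")
    offset, name = 10, None
    if flg & FNAME:
        end = data.index(b"\x00", offset)          # the name ends at the first zero byte
        name = data[offset:end].decode("latin-1")  # RFC 1952: ISO 8859-1 characters
        offset = end + 1
    when = datetime.fromtimestamp(mtime, tz=timezone.utc) if mtime else None
    return name, when, offset

# Bytes 0-18 of the second example
header = bytes.fromhex("1f 8b 08 08 9f 08 ea 60 00 03 74 65 73 74 2e 62 69 6e 00")
print(read_name_and_mtime(header))
```

For the second example this gives:

- the name test.bin
- a modification time in July 2021
- DEFLATE data starting at byte 19, which matches the table above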
Uncompressed data is fairly rare in the wild from what I’ve seen, but for the sake of completeness we’ll cover it.
Byte | 19 | 20 | 21 | 22 | 23 | 24-38 |
Hex | 01 | 0f | 00 | f0 | ff | ff fe fd fc fb fa f9 f8 f7 f6 f5 f4 f3 f2 f1 |
Binary | 00000001 | 00001111 | 00000000 | 11110000 | 11111111 | omitted |
R. Binary | 10000000 | 11110000 | 00000000 | 00001111 | 11111111 | omitted |
Byte 19: 100 00000. Not compressed. For a non-compressed block only, we also skip until the end of the byte.
00: Not compressed
01: Fixed huffman coding.
10: Dynamic huffman coding.
11: Reserved (error).
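RFC 1951 lays out a non-compressed block as follows. After its 3 header bits, the block skips to the next byte boundary. It then stores:

- LEN: 2 bytes, little-endian.
- NLEN: the one's complement of LEN, as a sanity check.
- LEN bytes of raw data.

Here LEN is 0f 00 = 15 and NLEN is f0 ff. Below is a minimal Python sketch of reading such a block (hypothetical function name). It assumes the block starts at the very beginning of the DEFLATE data, as it does in this example.

```python
import struct
import zlib

def read_stored_block(data: bytes) -> bytes:
    """Decode one stored (BTYPE 00) DEFLATE block that begins at bit 0 of data."""
    if ((data[0] >> 1) & 0b11) != 0:
        raise ValueError("not a stored block")
    # The other 5 bits of byte 0 are padding: stored blocks skip to the next byte boundary.
    length, nlength = struct.unpack("<HH", data[1:5])  # LEN, NLEN: little-endian 16-bit
    if length != nlength ^ 0xFFFF:
        raise ValueError("NLEN is not the one's complement of LEN")
    payload = data[5:5 + length]
    if len(payload) != length:
        raise ValueError("truncated stored block")
    return payload

# Bytes 19-38 of the second example (the input file is the 15 bytes ff down to f1)
block = bytes.fromhex("01 0f 00 f0 ff ff fe fd fc fb fa f9 f8 f7 f6 f5 f4 f3 f2 f1")
print(read_stored_block(block).hex())                          # fffefdfcfbfaf9f8f7f6f5f4f3f2f1
print(read_stored_block(block) == zlib.decompress(block, -15))  # True
```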
Dynamic huffman coding is by far the most complicated part of the DEFLATE and gzip specs. It also shows up a lot in practice, so we need to learn this too. Let’s take a look with a third and final example.
echo -n "abaabbbabaababbaababaaaabaaabbbbbaa" | gzip
The bytes we get are:
We’ve already seen everything interesting in the gzip format, so we’ll skip the header and footer, and move straight to looking at DEFLATE this time.
Byte | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 |
Hex | 1d | c6 | 49 | 01 | 00 | 00 | 10 | 40 | c0 | ac | a3 | 7f | 88 | 3d | 3c | 20 | 2a | 97 | 9d | 37 | 5e | 1d | 0c |
Binary | 00011101 | 11000110 | 01001001 | 00000001 | 00000000 | 00000000 | 00010000 | 01000000 | 11000000 | 10101100 | 10100011 | 01111111 | 10001000 | 00111101 | 00111100 | 00100000 | 00101010 | 10010111 | 10011101 | 00110111 | 01011110 | 00011101 | 00001100 |
R. Binary | 10111000 | 01100011 | 10010010 | 10000000 | 00000000 | 00000000 | 00001000 | 00000010 | 00000011 | 00110101 | 11000101 | 11111110 | 00010001 | 10111100 | 00111100 | 00000100 | 01010100 | 11101001 | 10111001 | 11101100 | 01111010 | 10111000 | 00110000 |
Byte 10: 1 01 11000. BFINAL is 1 again, and reversing the two block-type bits gives 10: dynamic huffman coding.
00: Not compressed
01: Fixed huffman coding.
10: Dynamic huffman coding.
11: Reserved (error).
Okay, so what does “dynamic” huffman coding mean? A fixed huffman code had several hardcoded values defined by the spec. Some are still hardcoded, but some will now be defined by the gzip file.
So basically, the possible lengths and distances are still the same (fixed) ranges, and the literals are still the same fixed literals. But where we had two hardcoded tables before, now we will load these two tables from the file. Since storing a table is bulky, the DEFLATE authors heavily compressed the representation of the tables, which is why dynamic huffman coding is so complicated.
Suppose we have a set of prefix-free codewords: 0, 10, 1100, 1101, 1110, 1111. Forget about what each codeword means for a second, we’re just going to look at the codewords themselves.
We can store the lengths as a list: 1, 2, 4, 4, 4, 4.
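A list of lengths is enough because DEFLATE uses canonical Huffman codes. RFC 1951 (section 3.2.2) defines how to rebuild the exact codewords from the lengths alone:

- Shorter codes come first.
- Codes of the same length are handed out consecutively, in symbol order.

Here is a minimal Python sketch of that procedure (hypothetical function name):

```python
def canonical_codes(lengths):
    """Rebuild canonical Huffman codewords from per-symbol code lengths (RFC 1951, 3.2.2).

    lengths[i] is the code length of symbol i; 0 means symbol i is unused.
    Returns {symbol: codeword as a string of '0' and '1'}.
    """
    max_len = max(lengths, default=0)
    # Step 1: count the number of codes of each length
    bl_count = [0] * (max_len + 1)
    for n in lengths:
        if n:
            bl_count[n] += 1
    # Step 2: find the smallest code value for each length
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    # Step 3: assign consecutive values to symbols of the same length, in symbol order
    codes = {}
    for sym, n in enumerate(lengths):
        if n:
            codes[sym] = format(next_code[n], f"0{n}b")
            next_code[n] += 1
    return codes

print(canonical_codes([1, 2, 4, 4, 4, 4]))
# {0: '0', 1: '10', 2: '1100', 3: '1101', 4: '1110', 5: '1111'}
```

Symbols that never appear get length 0 and no codeword. That is why a table covering hundreds of possible symbols can end up with only a handful of real codes.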
Finally, we need to make them correspond to symbols, so we actually store
What’s a “code length”? It’s yet another hardcoded lookup table, which explains how to compress the dynamic huffman code tree itself. We’ll get to it in a second. For now, the important thing is that the table has 19 codes (0-18). The binary column (not yet filled in) is what we’re about to decode.
Binary | Code | What it means | Extra bits |
---|---|---|---|
? | 0-15 | Code length 0-15 | 0 |
? | 16 | Copy the previous code length 3-6 times | 2 |
? | 17 | Copy “0” code length 3-10 times | 3 |
? | 18 | Copy “0” code length 11-138 times | 7 |
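Codes 16-18 are a run-length encoding layered on top of the code lengths. The minimal Python sketch below expands them. It assumes the code length symbols and their extra-bit values have already been decoded into (symbol, extra value) pairs; that input format is hypothetical, chosen just for illustration.

The sample pairs come from decoding the third example's header by hand, with extra bits read least-significant bit first. In that decoding, HLIT gives 260 literal/length codes and HDIST gives 7 distance codes. Treat the pairs as illustrative and double-check them with infgen.

```python
def expand_code_lengths(pairs):
    """Expand (code length symbol, extra-bits value) pairs into a flat list of code lengths."""
    lengths = []
    for sym, extra in pairs:
        if sym <= 15:                  # a literal code length; no extra bits
            lengths.append(sym)
        elif sym == 16:                # repeat the previous length 3-6 times (2 extra bits)
            if not lengths:
                raise ValueError("code 16 with no previous length")
            lengths += [lengths[-1]] * (3 + extra)
        elif sym == 17:                # 3-10 zeros (3 extra bits)
            lengths += [0] * (3 + extra)
        elif sym == 18:                # 11-138 zeros (7 extra bits)
            lengths += [0] * (11 + extra)
        else:
            raise ValueError(f"invalid code length symbol {sym}")
    return lengths

# Hand-decoded from the third example: 260 literal/length lengths, then 7 distance lengths
pairs = [(18, 86), (1, 0), (2, 0), (18, 127), (18, 8), (4, 0), (16, 0),
         (2, 0), (17, 0), (2, 0), (2, 0), (2, 0)]
lengths = expand_code_lengths(pairs)
lit_lengths, dist_lengths = lengths[:260], lengths[260:]
print({sym: n for sym, n in enumerate(lit_lengths) if n})  # {97: 1, 98: 2, 256: 4, 257: 4, 258: 4, 259: 4}
print(dist_lengths)                                         # [2, 0, 0, 0, 2, 2, 2]
```

The non-zero lengths line up with the literal and distance tables below.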
Binary | Code | What it means | Extra bits |
---|---|---|---|
1100 | 1 | Code length 1 | 0 |
0 | 2 | Code length 2 | 0 |
1101 | 4 | Code length 4 | 0 |
1110 | 16 | Copy the previous code length 3-6 times | 2 |
1111 | 17 | Copy “0” code length 3-10 times | 3 |
10 | 18 | Copy “0” code length 11-138 times | 7 |
Literal Code | Code Length | Binary | Meaning | Extra bits |
---|---|---|---|---|
97 | 1 | 0 | Literal ‘a’ | 0 |
98 | 2 | 10 | Literal ‘b’ | 0 |
256 | 4 | 1100 | End-of-block | 0 |
257 | 4 | 1101 | Length 3 | 0 |
258 | 4 | 1110 | Length 4 | 0 |
259 | 4 | 1111 | Length 5 | 0 |
Code | Bits | Binary | Meaning | Extra Bits |
---|---|---|---|---|
0 | 2 | 00 | Distance 1 | 0 |
4 | 2 | 01 | Distance 5-6 | 1 |
5 | 2 | 10 | Distance 7-8 | 1 |
6 | 2 | 11 | Distance 9-12 | 2 |
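Putting the pieces together, here is a minimal Python sketch of a decoder for a single dynamic-Huffman block, written from RFC 1951 for illustration. It is not zlib's implementation, and it skips fixed and stored blocks. It works in three steps:

- Read the header and rebuild the code length code.
- Expand the literal/length and distance code lengths.
- Decode the data using the two tables above.

The function names are hypothetical.

```python
import zlib

LEN_BASE = [3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
            35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258]
LEN_EXTRA = [0] * 8 + [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4 + [5] * 4 + [0]
DIST_BASE = [1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193, 257, 385,
             513, 769, 1025, 1537, 2049, 3073, 4097, 6145, 8193, 12289, 16385, 24577]
DIST_EXTRA = [0, 0, 0, 0] + [n for n in range(1, 14) for _ in range(2)]
# Order in which the 3-bit lengths of the code length code are stored (RFC 1951, 3.2.7)
CL_ORDER = [16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15]


class BitReader:
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def bit(self) -> int:
        b = (self.data[self.pos >> 3] >> (self.pos & 7)) & 1
        self.pos += 1
        return b

    def bits(self, n: int) -> int:
        """n-bit number, least-significant bit first (header fields and extra bits)."""
        return sum(self.bit() << i for i in range(n))


def build_decoder(lengths):
    """Canonical Huffman code: map (code length, code value) -> symbol."""
    table, code = {}, 0
    for n in range(1, max(lengths, default=0) + 1):
        for sym, length in enumerate(lengths):
            if length == n:            # same-length codes are consecutive, in symbol order
                table[(n, code)] = sym
                code += 1
        code <<= 1
    return table


def read_symbol(r: BitReader, table) -> int:
    """Read one Huffman-coded symbol, most-significant bit first."""
    code = 0
    for n in range(1, 16):
        code = (code << 1) | r.bit()
        if (n, code) in table:
            return table[(n, code)]
    raise ValueError("invalid Huffman code")


def inflate_dynamic(data: bytes) -> bytes:
    """Decode a stream holding exactly one final dynamic-Huffman block."""
    r = BitReader(data)
    if r.bits(1) != 1 or r.bits(2) != 2:
        raise ValueError("this sketch only handles a single, final dynamic block")
    hlit, hdist, hclen = r.bits(5) + 257, r.bits(5) + 1, r.bits(4) + 4
    # 1. Code lengths for the code length alphabet (symbols 0-18)
    cl_lengths = [0] * 19
    for sym in CL_ORDER[:hclen]:
        cl_lengths[sym] = r.bits(3)
    cl_table = build_decoder(cl_lengths)
    # 2. Literal/length and distance code lengths, run-length encoded with 16/17/18
    lengths = []
    while len(lengths) < hlit + hdist:
        sym = read_symbol(r, cl_table)
        if sym < 16:
            lengths.append(sym)
        elif sym == 16:
            lengths += [lengths[-1]] * (3 + r.bits(2))
        elif sym == 17:
            lengths += [0] * (3 + r.bits(3))
        else:
            lengths += [0] * (11 + r.bits(7))
    lit_table = build_decoder(lengths[:hlit])
    dist_table = build_decoder(lengths[hlit:hlit + hdist])
    # 3. The compressed data itself
    out = bytearray()
    while True:
        sym = read_symbol(r, lit_table)
        if sym < 256:
            out.append(sym)
        elif sym == 256:
            return bytes(out)
        else:
            length = LEN_BASE[sym - 257] + r.bits(LEN_EXTRA[sym - 257])
            d = read_symbol(r, dist_table)
            dist = DIST_BASE[d] + r.bits(DIST_EXTRA[d])
            for _ in range(length):    # byte by byte, so overlapping copies work
                out.append(out[-dist])


data = bytes.fromhex("1d c6 49 01 00 00 10 40 c0 ac a3 7f 88 3d 3c 20 2a 97 9d 37 5e 1d 0c")
print(inflate_dynamic(data))
print(inflate_dynamic(data) == zlib.decompress(data, -15))  # cross-check with zlib
```

This should print the original input, b'abaabbbabaababbaababaaaabaaabbbbbaa', followed by True.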
I want my debian boot to work as follows:
As in part 1, this guide is debian-specific. To learn more about the Linux boot process, see part 1.
First, we need to prepare the USB stick. Use ‘dmesg’ and/or ‘lsblk’ to make a note of the USB stick’s path (/dev/sdae for me). I chose to write to a filesystem rather than a raw block device.
sudo mkfs.ext4 /dev/sdae # Make a filesystem directly on the device. No partition table.
sudo blkid /dev/sdae # Make a note of the filesystem UUID for later
Next, we’ll generate a key.
sudo mount /dev/sdae /mnt
sudo dd if=/dev/urandom of=/mnt/root-disk.key bs=1000 count=8
Add the key to your root so it can actually decrypt things. You’ll be prompted for your password:
sudo cryptsetup luksAddKey ROOT_DISK_DEVICE /mnt/root-disk.key
Make a script at /usr/local/sbin/unlockusbkey.sh
#!/bin/sh
USB_DEVICE=/dev/disk/by-uuid/a4b190b8-39d0-43cd-b3c9-7f13d807da48 # copy from blkid's output UUID=XXXX
if [ -b "$USB_DEVICE" ]; then
    # if device exists then output the keyfile from the usb key
    mkdir -p /usb
    mount "$USB_DEVICE" -t ext4 -o ro /usb
    cat /usb/root-disk.key
    umount /usb
    rmdir /usb
    echo "Loaded decryption key from USB key." >&2
else
    echo "FAILED to get USB key file ..." >&2
    /lib/cryptsetup/askpass "Enter passphrase"
fi
Mark the script as executable, and optionally test it.
sudo chmod +x /usr/local/sbin/unlockusbkey.sh
sudo /usr/local/sbin/unlockusbkey.sh | cmp /mnt/root-disk.key
Edit /etc/crypttab to add the script.
root PARTLABEL=root_cipher none luks,keyscript=/usr/local/sbin/unlockusbkey.sh
Finally, re-generate your initramfs. I recommend either having a live USB or keeping a backup initramfs.
sudo update-initramfs -u
[1] This post is loosely based on a chain of tutorials based on each other, including this
[2] However, those collectively looked both out of date and like they were written without true understanding, and I wanted to clean up the mess. More definitive information was sourced from the actual cryptsetup documentation.