I'm throwing my hat in the ring for the Computercraft Competition to make a self-replicating turtle. It's a bit of a late entry -- the deadline was Nov 1, 2012, and the forum has long since closed. But I love computercraft, so who cares!
Computercraft is a mod for minecraft. In it you program Lua code to control little turtles.
Turtles can only interact with the three blocks directly adjacent to them: above, below, and in front.
They can:
Move up, down, and forward. This costs 1 fuel.
Look up, down, and forward (1 block) -- beyond that, they can't see their environment
Mine blocks up, down, and forward.
Place blocks up, down, and forward.
Turn left or right.
Look in and manage an inventory of 16 slots
Craft items, assuming their inventory is completely empty other than the crafting ingredients.
Refuel, using any item that can be used as fuel in a furnace.
Take the FIRST item out of a chest, or dump items in a chest
So... they can mine and turn for free, but moving costs fuel. And the biggest problem is the list of things they can't do:
They don't know their own position or orientation
They have no idea what's in any block around them, other than directly above, below, and in front of them
They can't interact with a chest other than the very first slot
They have a few capabilities that were added since the 2012 post, which I'm taking full advantage of:
They can move an item from one slot in a chest to another slot, and generally look at the list of items in a chest
They can detect what item is in their inventory, or what's in front of them. So they learn it's "oak_planks". Previously, all they could do was check whether it was the same as another item in their inventory! Much harder.
This brings us to the challenge, which is to use a computercraft turtle... to build two computercraft turtles. Possible in theory, but in practice I've only seen maybe 1 completion of the challenge. You're guaranteed that the turtle starts at the bottom of an oak tree. There are various additional requirements for the challenge, which I've basically ignored, but I did display the status for the human watcher.
Here's a video of it happening. There's no sound or audio commentary. Sorry!
I proceed in hardcoded phases:
Chop a single log, craft it into planks, and consume it for fuel, so the turtle can move.
Chop down the first tree. Place a block at the top, so only small oak trees grow (not large oaks, which are more complicated to chop down). Also craft a chest to store materials we gather.
Note: At this point I speed up tick speed and place an automatic bonemeal machine to grow the tree, so it's more fun to watch.
Continue to chop down trees until we build up enough planks and fuel for later phases. We also add a sign to the left, to update the player on where we're at (phase, fuel, material-gathering progress).
Determine the turtle's height by going to some known height and counting back to where we were. We could either go down to bedrock, or up to world height. Since bedrock is bumpy, I picked world height.
Dig at the ideal gold ore height and gather gold. Along the way, we also pick up some cobblestone.
Dig just above bedrock, gathering diamonds and redstone.
Dig sideways at sea level in a straight line, looking for sand. Note that I temporarily slow down tick speed, because if the turtle moves itself out of loaded chunks, it shuts off and forgets everything.
Craft and place a furnace. Smelt the gold and sand.
Craft: a glass pane, a computer, a pickaxe, a crafting table, a turtle, and finally a crafting-mining turtle -- the same kind we started with.
Along the way, the turtle refuels when it gets low on fuel, and deposits items in the chest or drops them to clear space for crafting and more gathering.
How long does it take to make two copies? Well, in a deep sense it doesn't matter, because you can keep doubling indefinitely. But just for amusement, let's find out. I added some profiling/logging code to find out what the slow steps are, and it gives us the answer.
I sped up the tick rate, but luckily the internal clock gets adjusted the same way, so we can measure what the clock time would have been, no problem: main (1 time): 6959 seconds
We also bonemealed the trees! So we'd better take that into account too: awaitTree (22 times): 175 seconds. Let's change that to a more average value. A minecraft tree takes an average of 16 minutes to grow (provided there's space and light -- we actually set it to perpetual noon, but since it would be easy to place a torch, I'll ignore that).
So the real time is 5.8 hours waiting for trees to grow, plus 1.9 hours for everything else -- a total of 7.75 hours.
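Here's that arithmetic as a quick Python check. The only number not taken from the profiling output is the assumed 16-minute average growth time:

```python
# Back-of-the-envelope replication time, using the profiling numbers above.
total_measured = 6959        # seconds for main(), with bonemeal-accelerated trees
tree_measured = 175          # seconds spent in awaitTree() across 22 trees
trees = 22
natural_growth = 16 * 60     # assumed average seconds for a tree to grow without bonemeal

everything_else = total_measured - tree_measured   # 6784 s, about 1.88 h
waiting_on_trees = trees * natural_growth          # 21120 s, about 5.87 h

print((waiting_on_trees + everything_else) / 3600)  # about 7.75 hours per doubling
```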
If you kept re-placing the turtles, that means you'd have over 1 million turtles in a week. (Well, you wouldn't, because chunkloading--but that's something you could do with turtles too, in theory.)
Recently I was making a villager trading hall in minecraft.
One of the main goals of a trading hall is to collect all villager trades. One of the trickiest is books, provided by a librarian. I got to wondering -- how long is this going to take?
Well, we can do some math to find out.
There are currently (as of Minecraft Java Edition 1.21.11) 40 trade-able books. 36 of them are available from the enchanting table, treasure chests[1], or trading.
4 are available only from treasure chests and trading. These are called Treasure enchantments:
Curse of Binding
Curse of Vanishing
Frost Walker I and II
Mending
There are also three books which can be found only in treasure chests. We don't care about them for trading halls:
Soul Speed
Swift Sneak
Wind Burst
There are no books available only from enchanting and not trading.
The core mechanic of searching for book trades is resetting. If we look at a librarian and find it has a trade we don't want, we reset it.
Villagers remember their profession and trades forever once you've traded with them. But if we haven't traded with a villager, we can simply remove its profession and then give it a profession again. Then we can see if we like the new starting trades better.
This is very useful for librarians, because they have every book available as a starting trade, so there's no need to investigate later trades for books.
These are the options I'm aware of to make a villager forget its profession:
Ignore the villager and get a new one (for example with a breeder), moving or killing the old one. This isn't a "reset" per se, but it acts similarly.
Break the profession block manually. In the case of a librarian, the lectern. When breaking the block, the villager loses their profession instantly.
Block the path between the villager and their profession block. I haven't seen this documented, but they reset at the same time as trades reset (twice per minecraft day). I did this by dropping the villager 1 block using a piston.
Move the villager at least 48 taxicab blocks from their profession block. (Not tested)
Move the profession block with a piston. This is an instant reset, but you can't do it for a lectern in Java edition. (Not tested)
Some of these are instant, some take longer. Once villagers are shown a profession block, it only takes them a couple seconds to get their new profession, so that part is easy.
I found breaking and re-placing blocks to be annoying, so I settled on moving librarians up and down with pistons. It takes about 5 real-time minutes for them to reset, so I used about 50 librarians to counteract that. By the time I finished checking all 50 librarians, they were ready to reset again because 5 minutes had passed.
Then the question is: How many librarians do we need to look at, to get every book?
Well, the first question is: what are we interested in? Let's say we're interested in getting each of the 40 enchantments.
Well, it turns out each enchantment is equally likely: there's a 1/40 chance of getting it, given that a book trade is offered at all. Per librarian it's really 1/60, because there's a 1/3 chance no book trade is offered.
The number of librarians to look at turns out to be (3/2) × n × H(n), where H(n) is the n-th harmonic number. For n = 40, H(40) = 1/1 + 1/2 + 1/3 + ... + 1/39 + 1/40 ≈ 4.2785. So we need to check about 257 librarians on average to get every enchantment.
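As a quick sanity check of that formula in Python (the 3/2 factor is there because only two out of three librarians offer a book trade at all):

```python
from fractions import Fraction

n = 40                                            # distinct book enchantments
H = sum(Fraction(1, k) for k in range(1, n + 1))  # harmonic number H(40)
expected = Fraction(3, 2) * n * H                 # expected librarians to check

print(float(H))         # ~4.2785
print(float(expected))  # ~256.7, i.e. about 257 librarians
```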
But, are we really okay with that result? Given that Efficiency V is available as a starting trade, I want a librarian with Efficiency V, not Efficiency I!
Here's the full list of tradeable enchantments and their maximum levels:

| Enchantment | Max level |
| --- | --- |
| Aqua Affinity | I |
| Channeling | I |
| Curse of Binding | I |
| Curse of Vanishing | I |
| Flame | I |
| Infinity | I |
| Mending | I |
| Multishot | I |
| Silk Touch | I |
| Fire Aspect | II |
| Frost Walker | II |
| Knockback | II |
| Punch | II |
| Depth Strider | III |
| Fortune | III |
| Looting | III |
| Loyalty | III |
| Luck of the Sea | III |
| Lunge | III |
| Lure | III |
| Quick Charge | III |
| Respiration | III |
| Riptide | III |
| Sweeping Edge | III |
| Thorns | III |
| Unbreaking | III |
| Blast Protection | IV |
| Breach | IV |
| Feather Falling | IV |
| Fire Protection | IV |
| Piercing | IV |
| Projectile Protection | IV |
| Protection | IV |
| Bane of Arthropods | V |
| Density | V |
| Efficiency | V |
| Impaling | V |
| Power | V |
| Sharpness | V |
| Smite | V |

That is, there are:
9 tradable enchantments with a max level of I
4 with a max level of II
13 with a max level of III
7 with a max level of IV
7 with a max level of V
What's the chance of getting each level of an enchantment? They're all equally likely. So for Mending, there's a 1/60 chance to get Mending I, because it's the only choice. For Efficiency, there's a 2/3 × 1/40 × 1/5 = 1/300 chance each of getting Efficiency I, Efficiency II, and so on up to Efficiency V.
How do we calculate the coupon collector's problem for un-equal probabilities? Well... it's really complicated[2].
But the answer is that we will have to talk to an average of 933 librarians to get all enchants at max level.
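The real computation is the inclusion-exclusion code in footnote [2], but here's a rough Monte Carlo sketch of the same setup. The assumptions are mine: each librarian independently offers a book trade with probability 2/3, the enchantment is uniform over the 40 options, and the level is uniform up to that enchantment's maximum. If those match the real trade mechanics, the average should land near the same answer.

```python
import random

# Max levels of the 40 tradeable book enchantments:
# 9 cap at I, 4 at II, 13 at III, 7 at IV, 7 at V.
max_levels = [1] * 9 + [2] * 4 + [3] * 13 + [4] * 7 + [5] * 7

def librarians_until_all_max(rng):
    missing = set(range(len(max_levels)))      # enchantments we still need at max level
    checked = 0
    while missing:
        checked += 1
        if rng.random() >= 2 / 3:              # this librarian offers no book trade
            continue
        e = rng.randrange(len(max_levels))     # which enchantment is offered
        level = rng.randint(1, max_levels[e])  # which level is offered
        if level == max_levels[e]:
            missing.discard(e)
    return checked

rng = random.Random(0)
trials = 2000
print(sum(librarians_until_all_max(rng) for _ in range(trials)) / trials)
```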
But hey. We can buy Efficiency V for 17 emeralds, if we get the right trade. Are we really okay getting a 64 emerald trade? What if we want only the best trades?
| Enchantment | Level | Cost (emeralds) |
| --- | --- | --- |
| Aqua Affinity | I | 5-19 |
| Bane of Arthropods | V | 17-71 |
| Blast Protection | IV | 14-58 |
| Breach | IV | 14-58 |
| Channeling | I | 5-19 |
| **Curse of Binding** | I | 10-38 |
| **Curse of Vanishing** | I | 10-38 |
| Depth Strider | III | 11-45 |
| Density | V | 17-71 |
| Efficiency | V | 17-71 |
| Feather Falling | IV | 14-58 |
| Fire Aspect | II | 8-32 |
| Fire Protection | IV | 14-58 |
| Flame | I | 5-19 |
| Fortune | III | 11-45 |
| **Frost Walker** | II | 16-64 |
| Impaling | V | 17-71 |
| Infinity | I | 5-19 |
| Knockback | II | 8-32 |
| Looting | III | 11-45 |
| Loyalty | III | 11-45 |
| Luck of the Sea | III | 11-45 |
| Lunge | III | 11-45 |
| Lure | III | 11-45 |
| **Mending** | I | 10-38 |
| Multishot | I | 5-19 |
| Piercing | IV | 14-58 |
| Power | V | 17-71 |
| Projectile Protection | IV | 14-58 |
| Protection | IV | 14-58 |
| Punch | II | 8-32 |
| Quick Charge | III | 11-45 |
| Respiration | III | 11-45 |
| Riptide | III | 11-45 |
| Sharpness | V | 17-71 |
| Silk Touch | I | 5-19 |
| Smite | V | 17-71 |
| Sweeping Edge | III | 11-45 |
| Thorns | III | 11-45 |
| Unbreaking | III | 11-45 |
Mostly, the price range is based only on the level, but there are a few minor complications:
Some price ranges go above 64! In the game, these get capped. For this reason, you're 8 times more likely to get Efficiency V for 64 emeralds than any other number.
Treasure enchantments (in bold above) cost double the price of any other enchantment of the same level. It's literally a doubling -- they're never offered for odd numbers of emeralds. Interesting!
The chance of getting an Efficiency V book at the best possible price is 2/3 × 1/40 × 1/5 × 1/55 = 1/16,500 (there are 55 different possible prices in the 17-71 range, counting the ones above 64).
To get every max-level enchant at its best possible price, we'd need to talk to an average of 45,594 librarians[2].
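As a quick check of that arithmetic with exact fractions:

```python
from fractions import Fraction

# book trade offered x right enchantment x right level x cheapest of the 55 possible prices
p = Fraction(2, 3) * Fraction(1, 40) * Fraction(1, 5) * Fraction(1, 55)
print(p)  # 1/16500
```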
[1]: I think
[2]: Source code here. This uses the inclusion-exclusion principle to estimate set sizes, together with optimizations to take care of repeat probabilities.
Today's project was a vibe-coded chat program. For those unfamiliar, "vibe coding" is programming where an AI does the majority of the coding, and in fact is often undertaken by non-programmers. In my case I took an approach a bit closer to "architect" than entirely hands-off, but an LLM did all the heavy lifting.
The code is here -- roughly one commit per interaction, with a few combined. The prompts are not included.
I've mostly used AI very little during hack-a-day... sometimes to help debug, and in one case to write another "boring bit" (converting a Minecraft world to JSON, for the voxel engine). Doing stuff with an AI might get stuff done, but it doesn't build the same set of skills as doing it myself. And I'm generally a bit wary of using AI, because it can really just spew some absolute bullshit, which then sits in my head afterwards.
I've had a relatively better experience using Anthropic's Claude (for which I have a paid plan) than most other products. Unfortunately it has very opaque usage caps, and I hit the limits repeatedly during this project. Then it would say "please try again at 4pm" (in 3 hours). So I pretty much ran out of LLM usage on this one.
Overall I'd say I got to do some coding I usually wouldn't. The project was a curses frontend for a chat (and backend, but that didn't really get done yet). Something like making a curses interface would usually be a bit too boring for me--being able to collaborate with an LLM, who doesn't find such things boring, is great. Other than tooling issues, the main problem is that Claude doesn't write the best code. It generally has a very "junior programmer" vibe, with no use of abstraction, and tends toward the verbose.
My general take on AI, though, is that someone showed me a horse that can write an essay, and I'm complaining its penmanship is atrocious. It's pretty amazing stuff, and we're probably all going to be dead soon.
In the meantime it's pretty fun to mess about with.
PS: I do plan to update this one further, it just will require a bit of work each day given the rate limits. I had really grand plans, but we only got the bare minimum done.
Today's hack-a-day project was the Pokédex -- the fictional companion that tells you about pokemon in the game. My main goal was just to get the info into a reasonable database format, but along the way I built a little viewer too.
The plan is to make some kind of art game where you do pokemon fanart, a coloring book, or even a tracing game in the coming days. And now I'm ready, with art of each pokemon on hand.
Yikes, been having some back pain, and the past few days it's been tougher to work. I've started four projects in four days, without too much to show for it.
Day 01 project is waiting on computation to run; overall I'm happy with it but will post when I get the results.
Day 02 project I barely started and won't finish, most likely. It takes a photo of a Go board and tries to output the game. I'd learn some image processing doing it, but I'm sure there's plenty of existing and better tools to do the same thing.
Day 03 project was a bit ambitious. Will post it if I finish (and hopefully I will, it's cool!)
Today's hack-a-day project was Reverse Vibe Coding. I sometimes use LLMs such as Claude for "vibe coding", mostly on throwaway type projects. It didn't seem fair for that to go only one way, so today I offered to vibe code for Claude -- it picks what I should make, and I code it up for Claude.
The result is the Conversation Flow Visualizer. This graphs when new topics come up in conversation, and what they are.
Frankly I think it's dumb and useless, but Claude is the boss, so there ya go! Can't pick who you work fo... okay, I guess I could this time.
In any case, it was pretty relaxing to be a junior dev and just do as I'm told for a bit, honestly. Easy win.
I honestly think this would be a good way to learn a new programming language or a new library.
Hack-a-Day is my self-imposed challenge to do one project a day, for all of November.
How do you render 3D graphics? Here's a picture of a cube:
a 3D cube
But when you draw it on paper or a screen, it's flattened. All you see are these three faces.
a 2D cube
In fact, if you turn off your brain, it's just three weird polygons. And we can figure out the corners of the polygons. For example, I figured out these with a ruler, measuring where they are on the paper in centimeters.
some polygons
So to draw a cube, we just need to draw polygons. That's the essence of today's project.
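Here's a toy version of that idea as a Python sketch: take the cube's 3D corners, flatten each one onto the page by dividing by its depth (a standard perspective projection; my actual engine differs in the details), and what's left is exactly the kind of 2D polygon corners you could measure with a ruler.

```python
# Toy perspective projection: 3D cube corners -> 2D polygon corners.
def project(point, focal=2.0):
    x, y, z = point
    return (focal * x / z, focal * y / z)  # divide by depth

# A unit cube sitting in front of and slightly off-axis from the camera
# (camera at the origin, looking down +z).
corners = [(x + 1, y + 1, z + 4) for x in (0, 1) for y in (0, 1) for z in (0, 1)]

# The three faces the camera can see, each just a polygon of four corner indices.
faces = {
    "front":  (0, 2, 6, 4),  # z = 4
    "left":   (0, 1, 3, 2),  # x = 1
    "bottom": (0, 1, 5, 4),  # y = 1
}

for name, idx in faces.items():
    polygon = [project(corners[i]) for i in idx]
    print(name, [(round(u, 2), round(v, 2)) for u, v in polygon])
```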
Here's a minecraft world.
my minecraft base
Here's the same thing in my voxel engine. If you squint, you might be able to recognize they're the same thing. Ignore the stripe at the top.
my "minecraft" base
Here's a much simpler scene. If you click, you can explore it online
The board game Go has been revolutionized in recent years by computer play. In 2016, AlphaGo beat Lee Sedol, a top Go player. This was the equivalent of what happened in Chess in 1997.
Since then, computers have continued to outstrip human players, but we have been learning a lot from Go engines. In this article I did some investigation using KataGo, which I understand to essentially be an open-source clone of the AlphaGo architecture.
This article assumes familiarity with the board game. If you're not familiar, I encourage you to give it a try sometime! Find a local group to play with, or play online.
We have only one operation available, and it's the only one we'll use in this article: we can ask KataGo to analyze a position and tell us how good that position is. (We also have to tell KataGo what the komi is.)
KataGo returns two pieces of information for a position. An estimate of the score, and a percentage win chance.
Score is B+12, black win rate is 99.8%
The estimate of the score is determined (according to my very poor understanding) using an estimator that looks at the board but doesn't try any moves. This is a fast, but low-quality, metric.
On the other hand, the win rate is determined by (simplifying some details) playing the game out a bunch of times really fast and seeing how often black wins. It's slower, but more accurate.
Our first question is: How much should komi be?
Using only our one tool, let's figure out what KataGo thinks.
Well, in theory, what does a "good" komi mean? It means black and white should both win about 50% of the time. So let's just try every possible komi, and find the one that gives a win rate closest to 50%.
Or, we could use the fast score estimator on an empty board with zero komi. If it thinks black is ahead by 6.0, maybe we could set komi to 6.0.
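Here's what the win-rate approach looks like as a sketch. The actual KataGo plumbing (its JSON analysis engine) is hidden behind a hypothetical winrate(position, komi) helper that returns Black's win probability; everything else is just a scan over candidate komi values.

```python
def fair_komi(position, winrate, lo=-30.0, hi=30.0, step=0.5):
    """Find the komi that brings Black's win rate closest to 50%.

    winrate(position, komi) is assumed to query KataGo and return Black's
    win probability for the position when scored with the given komi.
    """
    candidates = [lo + i * step for i in range(int((hi - lo) / step) + 1)]
    return min(candidates, key=lambda komi: abs(winrate(position, komi) - 0.5))
```

The position values later in the post come from the same kind of scan, just applied to non-empty boards.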
| Board size | Komi estimate (winrate) | Komi estimate (neural) |
| --- | --- | --- |
| 3 | +14.0 | +4.4 |
| 4 | +0.5 | +2.4 |
| 5 | +25.5 | +23.3 |
| 6 | +3.0 | +4.3 |
| 7 | +8.0 | +8.7 |
| 8 | +9.0 | +6.6 |
| 9 | +6.0 | +6.0 |
| 10 | +6.0 | +5.6 |
| 11 | +6.0 | +5.5 |
| 12 | +6.0 | +5.5 |
| 13 | +6.0 | +5.6 |
| 14 | +6.0 | +5.5 |
| 15 | +6.0 | +5.7 |
| 16 | +6.0 | +5.8 |
| 17 | +6.0 | +5.9 |
| 18 | +6.0 | +6.1 |
| 19 | +6.5 | +6.2 |
It turns out both methods give similar results. We're going to use the win rate method going forward, because in general I've been told it's more accurate for many board positions.
In fact, we can use the same method to evaluate any board position accurately. We can figure out what komi would make that board position 50-50 for white or black to win. And then we can treat that as the "value" of the position.
For the rest of the article, we're going to simplify, and only ask the value of board positions. We don't care which method we use, but I'll mark the fast-and-simple method as "neural", and the winrate method as "komi" or "winrate" in pictures.
Our next question is, what are different starting moves worth? Well, let's just play every one and see what KataGo says the score is.
estimating score by finding win rates around 50%
estimating score with the fast neural estimator
Note that all scores are relative to +6.5 for the empty board, which is why some values are negative.
Okay, easy enough. What about different numbers of handicap stones? Using standard placements, we get:
| Board size | Handicap | Value estimate (winrate) | Value estimate (neural) |
| --- | --- | --- | --- |
| 19 | 1 | +6.5 | +6.2 |
| 19 | 2 | +20.0 | +19.2 |
| 19 | 3 | +32.5 | +32.5 |
| 19 | 4 | +47.5 | +46.6 |
| 19 | 5 | +59.5 | +58.3 |
| 19 | 6 | +72.5 | +71.8 |
| 19 | 7 | +86.0 | +85.1 |
| 19 | 8 | +100.5 | +100.3 |
| 19 | 9 | +115.5 | +114.7 |
| 13 | 1 | +6.0 | +5.6 |
| 13 | 2 | +19.5 | +18.6 |
| 13 | 3 | +32.5 | +30.7 |
| 13 | 4 | +48.0 | +47.4 |
| 13 | 5 | +59.0 | +58.6 |
| 13 | 6 | +75.0 | +74.5 |
| 13 | 7 | +87.0 | +84.0 |
| 13 | 8 | +100.5 | +96.1 |
| 13 | 9 | +109.5 | +102.3 |
| 9 | 1 | +6.0 | +6.0 |
| 9 | 2 | +16.0 | +16.0 |
| 9 | 3 | +27.5 | +27.1 |
| 9 | 4 | +75.0 | +53.5 |
| 9 | 5 | +74.5 | +79.0 |
Now let's make things more spicy. I keep winning every 1-stone game, but losing every 2-stone game. I want a 1.5 stone handicap. Well we can't add fractional stones, but we can look for something worth between 6.5 and 20 points.
Or, let's find something worth 0.0 points. I want a board position we can start with and not need that dumb komi rule.
Let's do the full analysis. Every possible starting board position. Then we'll look for one that KataGo says is worth around... say, 12 points.
Of course, we can't really analyze every board position, so I just did ones with up to 2 stones. I included ones with white stones, because why not?
Here's what the ones with two black stones on 19x19 look like. It might take a bit to load, and you'll need to zoom in.
9x9, positions closest to exact point values winrate
You can also get the raw score of 2-stone (and lower) positions on 9x9 and 19x19 boards. The code to do analysis and generate the pictures is on github, as are details on exact software settings used.
Thanks to Google for AlphaGo, and to lightvector for Katago (and Katago support).
Addendum.
After doing this project, I found it had already been done (better) at katagobooks.com. Apparently what I've done is called an "opening book", even if my goal was a bit different.
As previously mentioned, I have switched off wordpress. Hopefully, you can't tell. It's meant to be behind-the-scenes.
The only change should be the new comment system. Feel free to try it out by commenting below. You could be the very first commenter!
The rest of this post is for anyone curious why and how, which I skipped last time.
If all is well, my blog looks exactly the same. All links should continue to work. The RSS feed should keep working. Basically it should be a behind-the-scenes change.
I want to edit markdown locally, not use the Wordpress editor, which is getting increasingly bloated.
My server (a VPS) has previously been hacked due to an insecure wordpress installation. Hopefully it can't happen again due to some security changes I made, but that's always a danger. Static sites have almost no security problems.
Static site generators are just nice.
After some discussion with folks on IRC, I realized I could do the migration easier than I thought. (I didn't do it the easy way, but I could have.)
Why not make the change?
It's a lot of work. Not doing things is easier than doing them. Specifically, I have about 200 posts here, so migrating would be a lot of work. Starting a new blog is a valid avenue I didn't take either.
Really, seriously, it's a lot of work.
Comments are hard to deal with on a static site generator. You can have no comments at all (but I like comments), you can have someone else like Disqus host them (which is icky), or you can host them yourself (which leaves security problems). In addition, most static-site comment systems require javascript, which is sort of a shame.
It's pretty hard to check whether you've done it right. Reviewing 200 posts is no joke. If you want a computer to check, you'd need the before and after to match exactly, which may not be quite the right goal -- an exact match is only a reasonable goal if it was perfect before.
Nonetheless, I forged on and decided to change. It was probably not worth the work, but since I put in the work, I'll at least share what I did.
Let's talk about how, rather than why, for the rest of the post. This took the better part of a month.
I thought about what I wanted to use. There were a few good options -- Jekyll and Hugo both came recommended, and I've used Jekyll before. They both use a format called frontmatter. Below is an example of a frontmatter document. The top is YAML and the bottom is HTML.
---
type: blog post
title: The worst types of pizza
---
<ol>
<li> Ham and Pineapple
<li> Anchovy
<li> Reheated in the microwave
</ol>
Basically, frontmatter consists of a "front" metadata section, in YAML or TOML or JSON, all of which are different ways of representing metadata. Metadata for a blog post includes things like the title of the blog post, when it was published and updated, and the author. And then below that, is a main content section in HTML or markdown. For a blog post, the main section is the text of the blog post.
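Splitting a frontmatter document apart is only a few lines. Here's a sketch of the idea using PyYAML (the filename is made up, and my real tooling has more error handling):

```python
import yaml  # PyYAML

def parse_frontmatter(text):
    """Split a frontmatter document into (metadata dict, body string)."""
    # The file starts with "---", then YAML metadata, then "---", then the body.
    _, meta, body = text.split("---", 2)
    return yaml.safe_load(meta), body.lstrip("\n")

meta, body = parse_frontmatter(open("worst-pizza.md").read())
print(meta["title"])  # "The worst types of pizza"
```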
I wasn't sure what engine I wanted to use, but I decided to use frontmatter. The content would just be the HTML, verbatim and unchanged from my existing blog. That way, everything would display right. I could write new posts in a new format. Old posts would be ugly behind the scenes, but it would work, and I wouldn't have to migrate 200 posts.
I also really, really, didn't want to break the blog. I hate people who break a website changing things halfway. My work would only see the light of day once it was ready to wholly replace my existing blog. All the old links would work perfectly, even if I had to hand-code 200 redirects.
First, I wanted to have my existing posts in some format. Wordpress stores everything in a database. There are a couple options to get them out:
- We could do a database dump. (This is very ugly. Don't do it.)
- You can export them as an XML file. This is probably the best option.
- You can download your website as HTML by crawling it. This is what I did, because I wanted to be sure I could have a blog that looked the same as my current one, and it seemed pretty foolproof.
So I had a big directory full of HTML blog posts, images, comments, etc. Next, I wrote an extractor. It looked at each file corresponding to a post, and grabbed the <article> element with the content of the post, together with any comments. It also extracted some relevant info like the author, publication date, title, and so on. It put them all together into a file. Now I had something that Jekyll and Hugo could use.
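The extractor was nothing fancy. Here's a sketch of the approach with BeautifulSoup; the path, element, and class handling are illustrative, since the real WordPress markup has its own names:

```python
from bs4 import BeautifulSoup

def extract_post(html):
    soup = BeautifulSoup(html, "html.parser")
    article = soup.find("article")                # the post content, kept verbatim as HTML
    title = soup.find("h1").get_text(strip=True)  # illustrative; the real markup differs
    return {"title": title, "content": str(article)}

post = extract_post(open("crawl/some-old-post/index.html").read())
```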
I took a look at Hugo. Wow, was it big. It supported YAML, TOML, JSON, HTML, Markdown. It had an asset pipeline. It had three different module systems to extend it. It did overlay filesystem mounts. Templating in Golang's templates. I slowly backed away.
I took a look at Jekyll--small, very opinionated. I generally like that in software. But, absolutely no customization. You had to put everything in a folder called _posts, and the publication date had to be the first part of the name. YAML only for the top. Etc. It seemed good, but I wasn't quite feeling it.
I decided I would roll my own. This was a small project. I only wanted a very limited set of functionality.
I wrote a template. It was an HTML page with a hole in it. You put the blog article HTML in the hole, and you got a finished HTML document. Looked fine. I used mustache for the templating, because I remembered liking it in the past. I got a blog showing. It looked great. It loaded lots of files (like icons, images, and styling) from the live site, rather than having a local copy. Most of the links went to the live site too.
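To give a flavor of it, the template was roughly this shape, rendered with a mustache implementation (pystache here; the real template had a lot more in it):

```python
import pystache

TEMPLATE = """<html>
  <head><title>{{title}}</title></head>
  <body>
    <article>{{{content}}}</article>  <!-- triple braces: insert the post HTML unescaped -->
  </body>
</html>"""

page = pystache.render(TEMPLATE, {"title": "Some post", "content": "<p>Hello</p>"})
```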
I converted all the links. I wrote a checker to search for dead links. I decided to generate a page for each tag, since those would change over time. I noticed the tag pages and the post pages had most of the layout in common, so I factored that out. I discovered my python mustache library didn't do "factoring out", and only the javascript library did. I realized I had never liked mustache--I had been thinking of handlebars or spacebars. I decided to put it off--switching templating engines was easy, but it's better not to switch horses mid-stream. I factored out the tag cloud. I got the number of dead links down to just the page of links by one author and the RSS feed. I generated those too. I started generating more of the blog post--the title and author and comments section, too. The HTML shrunk. I had a working version.
I started feeling super disheartened. This was a giant mess. I just wasn't feeling motivated. I took a step back. Was it the work? No, I decided. It was that I didn't want to put in a ton of work, to get a system I wasn't all that happy with. Wordpress was already okay. It wasn't perfect, but it was alright. If I was going to put in work, I wanted the new system to be better. I wanted... I think I wanted to convert the old HTML posts to markdown?
Hoo boy. That was going to be a lot of work.
I took a look around. A year ago (the last time I saw a gorgeous Amal Murali blog post), I had tried a wordpress conversion. I had tried wordpress-export-to-markdown, but I remembered not liking the output that much. Things had been missing. They hadn't looked right. But it had done 80% of things correctly. I checked what it used. Hmm, turndown. A javascript tool to convert HTML to markdown. Sounded promising.
I converted everything to markdown. I took a look at the output. Seemed... reasonable. I'd have to take a look before I decided anything past that. So I needed a tool to convert markdown back to HTML. I was using Python, so I picked markdown2 -- the markdown (1?) page seemed pretty... theoretical. User comfort seemed like maybe a fourth priority. It hadn't been updated in a few years. markdown2 seemed to care about speed and user comfort. It had lots of plugins. It had been updated last week, though it looked like they hadn't done anything major in a couple years. I gave it a try.
I took HTML, converted it to markdown, converted it back to HTML, and looked at the result. It was... eh. It had some of the same content, but it didn't look quite right. I looked at the HTML. Oh, I had forgotten to wrap it in an <article> tag with all those special wordpress classes. I gave it another try. WOW! That looked almost identical. I made a webpage to look at them side-by-side.
before and after view
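The Python half of the round trip was tiny; roughly this, with the <article> wrapper added so the WordPress stylesheet would apply (the class names below are placeholders for the real WordPress ones):

```python
import markdown2

def markdown_to_post_html(md_text):
    body = markdown2.markdown(md_text, extras=["fenced-code-blocks", "tables"])
    # Wrap the output the way WordPress wraps a post, so the CSS matches.
    return f'<article class="post type-post">{body}</article>'
```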
Okay, I could do this. There was going to be a list of problems, but I could get through them one by one.
I started looking at articles. Okay, this one was missing a class. Galleries were just a series of images now. iframes were being dropped. This was all stuff I could fix. Some of it was problems converting HTML to markdown. Cases where vital information got stripped were especially problematic, because I couldn't fix them later.
Some problems happened when converting markdown to HTML--code blocks inside lists disappeared and became regular text. I started looking into fixes. I was annoyed how hard it was to extend Turndown. I considered writing my own HTML to markdown converter. That was the easy direction--anyone can parse HTML, there are libraries for it. Outputting is easy in any language. Wait, I thought. Turndown would disappear in the final version. Once I had converted the old HTML, that was it. How many problems were there, really? If it was just a few articles, I should fix it by hand instead. That would be easier. I decided I'd wait until I had a better overview.
Other problems happened when converting markdown back to HTML. Parsing markdown would be a nightmare, so I crossed my fingers and prayed I wouldn't have to. I hoped markdown2 was easy to extend. I started thinking with distaste about whether I would have to... rewrite the HTML output. Shudder. I put things off--disappearing information was more important.
I decided to take stock. How would I tell if I was making progress? What if fixing one thing broke another? I had some kind of visual diff tool in mind. If the HTML and markdown versions looked the same, that was good enough for me. But would they? I don't care about little changes. One font slightly different, a section a few pixels to the left. I was worried I would compare the before and after, and none of them would match. I don't know how to tell a computer to ignore that stuff. Oh well, I'd check. Maybe it would work.
I ran a first check using puppeteer to take Chrome screenshots. 24% of posts were identical, right out the gate. That was more than 0%. That meant that yes, this method would actually let me make a TODO list. 0% would have been bad. OK. I started opening up articles. Yes, they actually looked different. It wasn't a few pixels. Every page I opened, seemed to have genuine differences I wanted to fix.
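The check itself was simple: take a Chrome screenshot of each post on the old site and on the generated site, and test whether the two images are pixel-identical. A sketch of the comparison half with Pillow (paths are illustrative):

```python
from PIL import Image, ImageChops

def identical(path_a, path_b):
    a = Image.open(path_a).convert("RGB")
    b = Image.open(path_b).convert("RGB")
    if a.size != b.size:
        return False
    # The difference image is all black (getbbox() is None) only when no pixel differs.
    return ImageChops.difference(a, b).getbbox() is None

print(identical("shots/wordpress/some-post.png", "shots/static/some-post.png"))
```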
I started fixing the problems. Some big problems got fixed. Smaller ones started cropping up. The first one I found was this pair. They were being flagged as different. Was that right?
see the difference? me neither
I stared. I saw nothing. I visually showed the difference. The fonts were highlighted in red. Was it a font issue? I looked at the HTML. Oh, one gray was 10% lighter. Should I fix it? No wait, I didn't want things to be pixel-perfect identical. That was just a tool to measure how close I was to done, let's not lose track of the actual goal. Hmm.
I was starting to feel burnt-out. I wasn't sure where to go next. I talked to friends. I ended up using a heuristic to rank the pages from most to least similar. I'd tackle the big problems. As it happened, some contractors were jackhammering my basement for a few hours, so I had time to kill where I couldn't focus anyway. I opened all ~100 blog posts in chromium, and made little notes about each problem before I closed the tab. If I would be fine not fixing a problem, I didn't write it down. If I saw the same problem twice, I'd add a little + mark next to it. At the end, I had some problems with a lot of + marks next to them. Those were the ones I'd tackle first. Maybe more importantly, I had a good idea of the total amount of work. It was maybe 10 or 20 things to fix, even if I was very fussy. I was okay with that. I could do it.
I went in and started fixing. I found out that Turndown was pretty unmaintained, just like I suspected. I made about 5 PRs--none had any response, so I used a local fork. python-markdown2 usually worked. Every time I thought I found a bug, it was my fault--I hadn't understood something about the nuances of markdown. In one case a bug was real but already fixed in a newer version.
After fixing a dozen problems, I was done. I took a look through the articles again. Most of them looked fine now. I generated the markdown one more time, and then hand-fixed 5-10 articles with problems. I filed fixed articles into a "finished" folder, so they wouldn't be overwritten if I changed my mind and did an automated rebuild.
It was done. I looked, and looked again. Then I deleted all the HTML sources. The side-by-side view. The visual comparison tools. The dead link checker. The crawler that extracted the original HTML. I was left with a single tool--it took markdown, and generated a blog. It was tiny again. I rejoiced, and took a well-needed break.
At this point, I had a working blog. Posts were YAML frontmatter, and markdown content. I could write new posts easily in markdown, and all my old posts were in markdown too. I was pretty happy.
I had two more big tasks. One, which I'm punting indefinitely, is to re-style the blog. My current approach is to just have a copy of the old wordpress CSS in one file. It's 7,838 lines long, which is too long. I could reduce it, but it's probably equally reasonable to just make an entirely new stylesheet from scratch. I'm not sure whether old articles will keep the old stylesheet. Probably yes, just to avoid breaking anything. That is... not urgent. I'll do it sometime.
The other part, which I did care about, was to get comments working again. I looked around at a few static site commenting options, and settled on Isso. The user-friendly front page encouraged me. It didn't require registration, it had email moderation where you click a link to approve a comment, comments could use markdown, there was no database setup. And it supported wordpress comment import (although I didn't do this actually).
Great! Now how to install? Oh... the debian package is discontinued? Okay, it was actually a bit of work.
I didn't bother with RSS -- no one reads an RSS feed of comments, and they get included in the RSS feed of posts.
I ran isso by hand:
sudo -u isso /bin/isso &
tail -f /var/log/isso
Added an nginx frontend proxy:
# Run as isso.service
upstream isso {
    server 127.0.0.1:9007;
}

server {
    listen [::]:443 ssl;
    server_name blog.za3k.com;

    [... rest of blog.za3k.com ... ]

    # comments
    location = /comments {
        return 302 /comments/;
    }
    location = /comments/ {
        proxy_pass http://isso/;
    }
    location /comments/ {
        proxy_pass http://isso/;
    }
}
Added some code to the static generation:
<script src="https://blog.za3k.com/comments/js/embed.min.js"></script>
<section id="isso-thread">
<noscript>Javascript needs to be activated to view comments.</noscript>
</section>
And debugged a few errors here and there. Then I added a systemd unit, which I enabled and started: