Time log transcribed

I write down everything I do.

I transcribed my journals by hand. That is, I typed them up myself, instead of trying to use handwriting recognition or outsourcing to Mechanical Turk.

  • I started on 2019-11-02, and finished today, 2020-11-20. That’s roughly one year.
  • The 15 journals transcribed go from 2011 to 2020, 10 years. The 2011-2015 period is sparser.
  • Of the 15 journals, 13 of them them I transcribed from the physical version. Two I had thrown out, because my old scanner was feed-through, and I had to destroy the spines to scan the books.
  • That’s 1779 pages total (small ones, these are pocket journals). It’s also 32,000 lines, and 164K words. The text is 1.1MB, the scanned PNG files are 12GB (12000 MB).
  • In general, it takes me 1 hour to transcribe the last week of notes. Going farther back is harder, partly because my handwriting gets more readable as time progresses (due at least as much to my choice of pen, as my neatness), and partly because I have a harder time guessing at poor handwriting without memory to fill it in, and partly because I didn’t use standard formats back then.
  • I do have exact numbers I could check, but a lower bound based on this rate is that was overall 90 hours of work. It probably didn’t take more than twice that.

fabric1 AUR package

Fabric is a system administration tool used to run commands on remote machines over SSH. You program it using python. In 2018, Fabric 2 came out. In a lot of ways it’s better, but it’s incompatible, and removes some features I really need. I talked to the Fabric dev (bitprophet) and he seemed on board with keeping a Fabric 1 package around (and maybe renaming the current package to Fabric 2).

Here’s an arch package: https://aur.archlinux.org/packages/fabric1/

Currently Fabric 1 runs only on Python2. But there was a project to port it to Python 3 (confusingly named fabric3), which is currently attempting to merge into mainline fabric. Once that’s done, I’m hoping to see a ‘fabric1’ and ‘fabric2’ package in all the main distros.

mon(8)

I had previously hand-rolled a status monitor, status.za3k.com, which I am in the process of replacing (new version). I am replacing it with a linux monitoring daemon, mon, which I recommend. It is targeted at working system administrators. ‘mon’ adds many features over my own system, but still has a very bare-bones feeling.

The old service, ‘simple-status‘ worked as follows:

  • You visited the URL. Then, the status page would (live) kick of about 30 parallel jobs, to check the status of 30 services
  • The list of services is one-per-file in a the services.d directory.
  • For each service, it ran a short script, with no command line arguments.
  • All output is displayed in a simple html table, with the name of the service, the status (with color coding), and a short output line.
  • The script could return with a success (0) or non-success status code. If it returned success, that status line would display in green for success. If it failed, the line would be highlighted red for failure.
  • Scripts can be anything, but I wrote several utility functions to be called from scripts. For example, “ping?” checks whether a host is pingable.
  • Each script was wrapped in timeout. If the script took too long to run, it would be highlighted yellow.
  • The reason all scripts ran every time, is to prevent a failure mode where the information could ever be stale without me noticing.

Mon works as follows

  • The list of 30 services is defined in /etc/mon/con.cf.
  • For each service, it runs a single-line command (monitor) with arguments. The hostname(s) are added to the command line automatically.
  • All output can be displayed in a simple html table, with the name of the service, the status (with color coding), the time of last and next run, and a short output line. Or, I use ‘monshow‘, which is similar but in a text format.
  • Monitors can be anything, but several useful ones are provided in /usr/lib/mon/mon.d (on debian). For example the monitor “ping” checks whether a host is pingable.
  • The script could return with a success (0) or non-success status code. If it returned success, the status line would display in green for success (on the web interface), or red for failure.
  • All scripts run periodically. A script have many states, not just “success” or “failure”. For example “untested” (not yet run) or “dependency failing” (I assume, not yet seen).

As you can see, the two have a very similar approach to the component scripts, which is natural in the Linux world. Here is a comparison.

  • ‘simple-status’ does exactly one thing. ‘mon’ has many features, but does the minimum possible to provide each.
  • ‘simple-status’ is stateless. ‘mon’ has state.
  • ‘simple-status’ runs on demand. ‘mon’ is a daemon which runs monitors periodically.
  • Input is different. ‘simple-status’ is one script which takes a timeout. ‘mon’ listens for trap signals and talks to clients who want to know its state.
  • both can show an HTML status page that looks about the same, with some CGI parameters accepted.
  • ‘mon’ can also show a text status page.
  • both run monitors which return success based on status code, and provide extra information as standard output. ‘mon’ scripts are expected to be able to run on a list of hosts, rather than just one.
  • ‘mon’ has a config file. ‘simple-status’ has no options.
  • ‘simple-status’ is simple (27 lines). ‘mon’ has longer code (4922 lines)
  • ‘simple-status’ is written in bash, and does not expose this. ‘mon’ is written in perl, all the monitors are written in perl, and it allows inline perl in the config file
  • ‘simple-status’ limits the execution time of monitors. ‘mon’ does not.
  • ‘mon’ allows alerting, which call an arbitrary program to deliver the alert (email is common)
  • ‘mon’ supports traps, which are active alerts
  • ‘mon’ supports watchdog/heartbeat style alerts, where if a trap is not regularly received, it marks a service as failed.
  • ‘mon’ supports dependencies
  • ‘mon’ allows defining a service for several hosts at once

Overall I think that ‘mon’ is much more complex, but only to add features, and it doesn’t have a lot of features I wouldn’t use. It still is pretty simple with a simple interface. I recommend it as both good, and overall better than my system.

My only complaint is that it’s basically impossible to Google, which is why I’m writing a recommendation for it here.

Cron email, and sending email to only one address

So you want to know when your monitoring system fails, or your cron jobs don’t run? Add this to your crontab:

Now install a mail-sending agent. I like ‘nullmailer‘, which is much smaller than most mail-sending agents. It can’t receive or forward mail, only send it, which is what I like about it. No chance of a spammer using my server for something nasty.

The way I have it set up, I’ll have a server (avalanche) sending all email from one address (nullmailer@avalanche.za3k.com) to one email (admin@za3k.com), and that’s it. Here’s my setup on debian:

Now just run echo "Subject: sendmail test" | /usr/lib/sendmail -v admin@za3k.com to test and you’re done!

Postmortem: bs-store

Between 2020-03-14 and 2020-12-03 I ran an experimental computer storage setup. I movied or copied 90% of my files into a content-addressable storage system. I’m doing a writeup of why I did it, how I did it, and why I stopped. My hope is that it will be useful to anyone considering using a similar system.

The assumption behind this setup, is that 99% of my files never change, so it’s fine to store only one, static copy of them. (Think movies, photos… they’re most of your computer space, and you’re never going to modify them). There are files you change, I just didn’t put them into this system. If you run a database, this ain’t for you.

Because I have quite a lot of files and 42 drives (7 in my computer, ~35 in a huge media server chassis), there is a problem of how to organize files across drives. To explain why it’s a problem, let’s look at the two default approaches:

  • One Block Device / RAID 0. Use some form of system that unifies block devices, such as RAID0 or a ZFS’s striped vdevs. Writing files is very easy, you see a single 3000GB drive.
    • Many forms of RAID0 use striping. Striping splits each file across all available drives. 42 drives could spin up to read one file (wasteful).
    • You need all the drives mounted to read anything–I have ~40 drives, and I’d like a solution that works if I move and can’t keep my giant media server running. Also, it’s just more reassuring that nothing can fail if you can read each drive individually.
  • JBOD / Just a Bunch of Disks. Label each drive with a category (ex. ‘movies’), and mount them individually.
    • It’s hard to aim for 100% (or even >80%) drive use. Say you have 4x 1000GB drives, and you have 800GB movies, 800GB home video footage, 100GB photographs, and 300GB datasets. How do you arrange that? One drive per dataset is pretty wasteful, as everything fits on 3. But, with three drives, you’ll need to split at least one dataset across drives. Say you put together 800GB home video and 100GB photographs. If you get 200GB more photographs, do you split a 300 GB collection across drives, or move the entire thing to another drive? It’s a lot of manual management and shifting things around for little reason.

Neither approach adds any redundancy, and 42 drives is a bit too many to deal with for most things. Step 1 is to split the 42 drives into 7 ZFS vdevs, each with 2-drive redundancy. That way, if a drive fails or there is a small data corruption (likely), everything will keep working. So now we only have to think about accessing 7 drives (but keep in mind, many physical drives will spin up for each disk access).

The ideal solution:

  • Will not involve a lot of manual management
  • Will fill up each drive in turn to 100%, rather than all drives at an equal %.
  • Will deduplicate identical content (this is a “nice to have”)
  • Will only involve accessing one drive to access one file
  • Will allow me to get and remove drives, ideally across heterogenous systems.

I decided a content-addressable system was ideal for me. That is, you’d look up each file by its hash. I don’t like having an extra step to access files, so files would be accessed by symlink–no frontend program. Also, it was important to me that I be able to transparently swap out the set of drives backing this. I wanted to make the content-addressable system basically a set of 7 content-addressable systems, and somehow wrap those all into one big content-addressable system with the same interface. Here’s what I settled on:

  • (My drives are mounted as /zpool/bs0, /zpool/bs1, … /zpool/bs6)
  • Files will be stored in each pool in turn by hash. So my movie ‘cat.mpg’ with sha hash ‘8323f58d8b92e6fdf190c56594ed767df90f1b6d’ gets stored in /zpool/bs0/83/23/f58d8b9 [shortened for readability]
  • Initially, we just copy files into the content-addressable system, we don’t delete the original. I’m cautious, and I wanted to make everything worked before getting rid of the originals.
  • To access a file, I used read-only unionfs-fuse for this. This checks each of /zpool/bs{0..6}/<hash> in turn. So in the final version, /data/movies/cat.mpg would be a symlink to ‘/bs-union/83/23/f58d8b9’
  • We store some extra metadata on the original file (if not replaced by a symlink) and the storied copy–what collection it’s part of, when it was added, how big it is, what it’s hash is, etc. I chose to use xattrs.

The plan here is that it would be really easy to swap out one backing blockstore of 30GB, for two of 20GB–just copy the files to the new drives and add it to the unionfs.

Here’s what went well:

  • No problems during development–only copying files meant it was easy and safe to debug prototypes.
  • Everything was trivial to access (except see note about mounting disks below)
  • It was easy to add things to the system
  • Holding off on deleting the original content until I was 100% out of room on my room disks, meant it was easy to migrate off of, rather risk-free
  • Running the entire thing on top of zfs ZRAID2 was the right decision, I had no worries about failing drives or data corruption, despite a lot of hardware issues developing at one point.
  • My assumption that files would never change was correct. I made the unionfs filesystem read-only as a guard against error, but it was never a problem.
  • Migrating off the system went smoothly

Here are the implementation problems I found

  • I wrote the entire thing as bash scripts operating directly on files, which was OK for access and putting stuff in the store, but just awful for trying to get an overview of data or migrating things. I definitely should have used a database. I maybe should have used a programming language.
  • Because there was no database, there wasn’t really any kind of regular check for orphans (content in the blobstore with no symlinks to it), and other similar checks.
  • unionfs-fuse suuucks. Every union filesystem I’ve tried sucks. Its read bandwidth is much lower than the component devices (unclear, probably), it doesn’t cache where to look things up, and it has zero xattrs support (can’t read xattrs from the underlying filesystem).
  • gotcha: zfs xattrs waste a lot of space by default, you need to reconfigure the default.

But the biggest problem was disk access patterns:

  • I thought I could cool 42 drives spinning, or at least a good portion of them. This was WRONG by far, and I am not sure how possible it is in a home setup. To give you an idea how bad this was, I had to write a monitor to shut off my computer if the drives went above 60C, and I was developing fevers in my bedroom (where the server is) from overheating. Not healthy.
  • unionfs has to check each backing drive. So we see 42 drives spin up. I have ideas on fixing this, but it doesn’t deal with the other problems
  • To fix this, you could use double-indirection.
    • Rather than pointing a symlink at a unionfs: /data/cat.mpg -> /bs-union/83/23/f58d8b9 (which accesses /zpool/bs0/83/23/f58d8b9)
    • Point a symlink at another symlink that points directly to the data: /data/cat.mpg -> /bs-indirect/83/23/f58d8b9 -> /zpool/bs0/83/23/f58d8b9
  • The idea is that backing stores are kinda “whatever, just shove it somewhere”. But, actually it would be good to have a collection in one place–not only to make it easy to copy, but to spin up only one drive when you go through everything in a collection. It might even be a good idea to have a separate drive for more frequently-accessed content. This wasn’t a huge deal for me since migrating existing content meant it coincidentally ended up pretty localized.
  • Because I couldn’t spin up all 42 drives, I had to keep a lot of the array unmounted, and mount the drives I needed into the unionfs manually.

So although I could have tried to fix things with double-indirection, I decided there were some other disadvantages to symlinks: estimating sizes, making offsite backups foolproof. I decided to migrate off the system entirely. The migration went well, although it required running all the drives at once, so some hardware errors popped up. I’m currently on a semi-JBOD system (still on top of the same 7 ZRAID2 devices).

Hopefully this is useful to someone planning a similar system someday. If you learned something useful, or there are existing systems I should have used, feel free to leave a comment.

Printing on the Brother HL-2270DW printer using a Raspberry Pi

Although the below directions work on Raspberry Pi, they should also work on any other system. The brother-provided driver does not run on arm processors[1] like the raspberry pi, so we will instead use the open-source brlaser[2].

Edit: This setup should also work on the following Brother monochrome printers, just substitute the name where needed:

  • brlaser 4, just install from package manager: DCP-1510, DCP-1600 series, DCP-7030, DCP-7040, DCP-7055, DCP-7055W, DCP-7060D, DCP-7065DN, DCP-7080, DCP-L2500D series, HL-1110 series, HL-1200 series, HL-L2300D series, HL-L2320D series, HL-L2340D series, HL-L2360D series, HL-L2375DW series, HL-L2390DW, MFC-1910W, MFC-7240, MFC-7360N, MFC-7365DN, MFC-7420, MFC-7460DN, MFC-7840W, MFC-L2710DW series
  • brlaser 6, follow full steps below: DCP-L2520D series, DCP-L2520DW series, DCP-L2540DW series (unclear, may only need 4), HL-2030 series, HL-2140 series, HL-2220 series, HL-2270DW series, HL-5030 series

Also, all these steps are command-line based, and you can do the whole setup headless (no monitor or keyboard) using SSH.

  1. Get the latest raspbian image up and running on your pi, with working networking. At the time of writing the latest version is 10 (buster)–once 11+ is released this will be much easier. I have written a convenience tool[3] for this step, but you can also find any number of standard guides. Log into your raspberry pi to run the following steps
  2. (Option 1, not recommended) Upgrade to Debian 11 bullseye (current testing release). This is because we need brlaser 6, not brlaser 4 from debian 10 buster (current stable release). Then, install the print system and driver[2]:
    sudo apt-get update && sudo apt-get install lpr cups ghostscript printer-driver-brlaser
  3. (Option 2, recommended) Install ‘brlaser’ from source.
    1. Install print system and build tools
      sudo apt-get update && sudo apt-get install lpr cups ghostscript git cmake libcups2-dev libcupsimage2-dev
    2. Download the source
      wget https://github.com/pdewacht/brlaser/archive/v6.tar.gz && tar xf v6.tar.gz
    3. Build the source and install
      cd brlaser-6 && cmake . && make && sudo make install
  4. Plug in the printer, verify that it shows up using sudo lsusb or sudo dmesg. (author’s shameful note: if you’re not looking, I find it surprisingly easy to plug USB B into the ethernet jack)
  5. Install the printer.
    1. Run sudo lpinfo -m | grep usb to get the device name of your printer. It will be something like usb://Brother/HL-2270DW%20series?serial=D4N207646
      If you’re following this in the hopes that it will work on another printer, run sudo lpinfo -m | grep HL-2270DW to get the PPD file for your printer.
    2. Install and enable the printer
      sudo lpadmin -p HL-2270DW -E -v usb://Brother/HL-2270DW%20series?serial=D4N207646 -m drv:///brlaser.drv/br2270dw.ppd
      Note, -p HL-2270DW is just the name I’m using for the printer, feel free to name the printer whatever you like.
    3. Enable the printer (did not work for me)
      sudo lpadmin -p HL-2270DW -E
    4. (Optional) Set the printer as the default destination
      sudo lpoptions -d HL-2270DW
    5. (Optional) Set any default options you want for the printer
      sudo lpoptions -p HL-2270DW -o media=letter
  6. Test the printer (I’m in the USA so we use ‘letter’ size paper, you can substitute whichever paper you have such as ‘a4’).
    1. echo "Hello World" | PRINTER=HL-2270DW lp -o media=letter (Make sure anything prints)
    2. cat <test document> | PRINTER=HL-2270DW lp -o media=letter (Print an actual test page to test alignment, etc)
    3. cat <test document> | PRINTER=HL-2270DW lp -o media=letter -o sides=two-sided-short-edge (Make sure duplex works if you plan to use that)
  7. (Optional) Set up an scp print server, so any file you copy to a /printme directory gets printed. For the 2270DW, I also have a /printme.duplex directory.

Links
[1] brother driver does not work on arm (also verified myself)
[2] brlaser, the open-source Brother printer driver
[3] rpi-setup, my convenience command-line script for headless raspberry pi setup
[4] stack overflow answer on how to install one package from testing in debian

Streaming Linux->Twitch using ffmpeg and ALSA

I stopped using OBS a while ago for a couple reasons–the main one was that it didn’t support my video capture card, but I also had issues with it crashing or lagging behind with no clear indication of what it was doing. I ended up switching to ffmpeg for live streaming, because it’s very easy to tell when ffmpeg is lagging behind. OBS uses ffmpeg internally for video. I don’t especially recommend this setup, but I thought I’d document it in case someone can’t use a nice GUI setup like OBS or similar.

I’m prefer less layers, so I’m still on ALSA. My setup is:

  • I have one computer, running linux. It runs what I’m streaming (typically minecraft), and captures everything, encodes everything, and sends it to twitch
  • Video is captured using libxcb (captures X11 desktop)
  • Audio is captured using ALSA. My mic is captured directly, while the rest of my desktop audio is sent to a loopback device which acts as a second mic.
  • Everything is encoded together into one video stream. The video is a Flash video container with x264 video and AAC audio, because that’s what twitch wants. Hopefully we’ll all switch to AV1 soon.
  • That stream is sent to twitch by ffmpeg
  • There is no way to pause the stream, do scenes, adjust audio, see audio levels, etc while the stream is going. I just have to adjust program volumes independently.

Here’s my .asoundrc:

My ffmpeg build line:

And most imporantly, my ffmpeg command:

Let’s break that monster down a bit. ffmpeg structures its command line into input streams, transformations, and output streams.

ffmpeg input streams

-video_size 1280x720 -framerate 30 -f x11grab -s 1280x720 -r 30 -i :0.0:
Grab 720p video (-video_size 1280x720) at 30fps (-framerate 30) using x11grab/libxcb (-f x11grab), and we also want to output that video at the same resolution and framerate (-s 1280x720 -r 30). We grab :0.0 (-i :0.0)–that’s X language for first X server (you only have one, probably), first display/monitor. And, since we don’t say otherwise, we grab the whole thing, so the monitor better be 720p.

-f alsa -ac 1 -ar 48000 -i hw:1,0:
Using alsa (-f alsa), capture mono (-ac 1, 1 audio channel) at the standard PC sample rate (-ar 48000, audio rate=48000 Hz). The ALSA device is hw:1,0 (-i hw:1,0), my microphone, which happens to be mono.

-f alsa -ac 2 -ar 48000 -i hw:Loopback,1:
Using alsa (-f alsa), capture stereo (-ac 2, 2 audio channels) at the standard PC sample rate (-ar 48000, audio rate=48000 Hz). The ALSA device is hw:Loopback,1. In the ALSA config file .asoundrc given above, you can see we send all computer audio to hw:Loopback,0. Something sent to play on hw:Loopback,0 is made available to record as hw:Loopback,1, that’s just the convention for how snd-aloop devices work.

ffmpeg transforms

-filter_complex '[1:a][1:a]amerge=inputs=2[stereo1] ; [2:a][stereo1]amerge=inputs=2[a]' -ac 2:
All right, this one was a bit of a doozy to figure out. In ffmpeg’s special filter notation, 1:a means “stream #1, audio” (where stream #0 is the first one).

First we take the mic input [1:a][1:a] and convert it from a mono channel to stereo, by just duplicating the sound to both ears (amerge=inputs=2[stereo1]). Then, we combine the stereo mic and the stereo computer audio ([2:a][stereo1]) into a single stereo stream using the default mixer (amerge=inputs=2[a]).

-map '[a]' -map 0:v :
By default, ffmpeg just keeps all the streams around, so we now have one mono, and two stereo streams, and it won’t default to picking the last one. So we manually tell it to keep that last audio stream we made (-map '[a]'), and the video stream from the first input (-map 0:v, the only video stream).

ffmpeg output streams

-f flv -ac 2 -ar 48000:
We want the output format to be Flash video (-f flv) with stereo audio (-ac 2) at 48000Hz (-ar 48000). Why do we want that? Because we’re streaming to Twitch and that’s what Twitch says they want–that’s basically why everything in the output format.

-vcodec libx264 -g 60 -keyint_min 30 -b:v3000k -minrate 3000k -maxrate 3000k -pix_fmt yuv420p -s 1280x720 -preset ultrafast -tune film:
Ah, the magic. Now we do x264 encoding (-vcodec libx264), a modern wonder. A lot of the options here are just what Twitch requests. They want keyframes every 2 seconds (-g 60 -keyint_min 30, where 60=30*2=FPS*2, 30=FPS). They want a constant bitrate (-b:v3000k -minrate 3000k -maxrate 3000k) between 1K-6K/s at the time of writing–I picked 3K because it’s appropriate for 720p video, but you could go with 6K for 1080p. Here are Twitch’s recommendations. The pixel format is standard (-pix_gmt yub720p) and we still don’t want to change the resolution (-s 1280x720). Finally the options you might want to change. You want to set the preset as high as it will go with your computer keeping up–mine sucks (-preset ultrafast, where the options go ultrafast,superfast,veryfast,faster,fast,medium, with a 2-10X jump in CPU power needed for each step). And I’m broadcasting minecraft, which in terms of encoders is close to film (-tune film)–lots of panning, relatively complicated stuff on screen. If you want to re-encode cartoons you want something else.

-c:a libfdk_aac -b:a 160k:
We use AAC (-c:a libfdk_aac). Note that libfdk is many times faster than the default implementation, but it’s not available by default in debian’s ffmpeg for (dumb) license reasons. We use 160k bitrate (-b:a 160k ) audio since I’ve found that’s good, and 96K-160K is Twitch’s allowable range. -strict normal

-strict normal: Just an ffmpeg option. Not interesting.
-bufsize 3000k: One second of buffer with CBR video

rtmp://live-sjc.twitch.tv/app/${TWITCH_KEY}:
The twitch streaming URL. Replace ${TWITCH_KEY} with your actual key, of course.

Sources:

  • jrayhawk on IRC (alsa)
  • ffmpeg wiki and docs (pretty good)
  • ALSA docs (not that good)
  • Twitch documentation, which is pretty good once you can find it
  • mark hills on how to set up snd-aloop

Storage Prices 2020-01

I did a survey of the cost of buying hard drives (of all sorts), CDs, DVDs, Blue-rays, and tape media (for tape drives).

Here are the 2020-01 results: https://za3k.com/archive/storage-2020-01.sc.txt
2019-07: https://za3k.com/archive/storage-2019-07.sc.txt
2018-10: https://za3k.com/archive/storage-2018-10.sc.txt
2018-06: https://za3k.com/archive/storage-2017-06.sc.txt
2018-01: https://za3k.com/archive/storage-2017-01.sc.txt

Changes this year

  • I excluded Seagate drives (except where they’re the only drives in class)
  • Amazon’s search got much worse, and they started having listings for refurbished drives
  • Corrected paper archival density, added photographic film
  • Added SSDs (both 2.5″ and M.2 formats)
  • Prices did not go up or down significantly in the last 6 months.

Some conclusions that are useful to know

  • The cheapest option is tape media, but tape reader/writers for LTO 6, 7, and 8 are very expensive.
  • The second-cheapest option is to buy external hard drives, and then open the cases and take out the hard drives. This gives you reliable drives with no warrantee.
  • Blu-ray and DVD are more expensive than buying hard drives