Migrating an existing debian installation to encrypted root

In this article, I migrate an existing debian 10 (buster) installation from an unencrypted root drive to an encrypted root. I used a second hard drive because it’s safer–this is NOT an in-place migration guide. We will be encrypting / (root) only, not /boot. My computer uses UEFI. This guide is specific to debian–I happen to know these steps would be different on Arch Linux, for example. They probably work fine on a different debian version, and might even work on something debian-based like Ubuntu.

In part 2, I add an optional extra where root decrypts using a special USB stick rather than a keyboard passphrase, for unattended boot.

Apologies if I forget any steps–I wrote this after I did the migration, and not during, so it’s not copy-paste.

Q: Why aren’t we encrypting /boot too?

  1. Encrypting /boot doesn’t add much security. Anyone can guess what’s on my /boot–it’s the same as on every debian install. And encrypting /boot doesn’t prevent tampering–someone can easily replace my encrypted partition with an unencrypted one without my noticing. Something like Secure Boot would resist tampering, but still doesn’t require an encrypted /boot.
  2. I pull a special trick in part 2. Grub2 has built-in encryption support, which is what would allow encrypting /boot. But grub2 can’t handle keyfiles or keyscripts as of writing, which I use.

How boot works

For anyone that doesn’t know, here’s how a typical boot process works:

  1. Your computer has built-in firmware, which on my computer meets a standard called UEFI. On older computers this is called BIOS. The firmware is built-in, closed-source, and often specific to your computer. You can replace it with something open-source if you wish.
  2. The firmware has some settings for what order to boot hard disks, CD drives, and USB sticks in. The firmware tries each option in turn, failing and using the next if needed.
  3. At the beginning of each hard disk is a partition table, a VERY short section describing what partitions are on the disk, and where they are. There are two partition table types: MBR (older) and GPT (newer). UEFI can only read GPT partition tables. The first thing the firmware does for each boot disk is read the partition table, to figure out which partitions are there.
  4. For UEFI, the firmware looks for an “EFI” partition on the boot disk–a special partition which contains bootloader executables. EFI always has a FAT filesystem on it. The firmware runs an EFI executable from the partition–which one is configured in the UEFI settings. In my setup there’s only one executable–the grub2 bootloader–so it runs that without special configuration.
  5. Grub2 starts. The first thing Grub2 does is… read the partition table(s) again. It finds the /boot partition, which contains grub.cfg, and reads grub.cfg. (There is a file in the efi partition right next to the executable, which tells grub where and how to find /boot/grub/grub.cfg. This second file is confusingly also called grub.cfg, so let’s forget it exists, we don’t care about it).
  6. Grub2 invokes the Linux kernel specified in grub.cfg, with the options specified in grub.cfg, including an option to use a particular initramfs. Both the Linux kernel and the initramfs are also in /boot.
  7. Now the kernel starts, using the initramfs. initramfs is a tiny, compressed, read-only filesystem only used in the bootloading process. The initramfs’s only job is to find the real root filesystem and open it. grub2 is pretty smart/big, which means the initramfs may have had nothing left to do on your system before you added encryption. If you’re doing decryption, it happens here. This is also how Linux handles weird filesystems (ZFS, btrfs, squashfs), some network filesystems, or hardware the bootloader doesn’t know about. At the end of the process, we have switched over to the REAL root filesystem.
  8. The main system starts. We are now big boys who can take care of ourselves, and the bootloading process is over. The kernel runs /sbin/init from the real filesystem, which on my system is a symlink to systemd. This does all the usual startup stuff (start any SSH server, print a bunch of messages to the console, show a graphical login, etc).
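
If you want to poke at each of these stages on a running Debian UEFI system, a few read-only commands show the pieces involved (paths assume the standard Debian layout):

efibootmgr -v                                     # firmware boot entries, and which EFI executable each points to
ls /boot/efi/EFI                                  # bootloader executables on the EFI partition
grep "menuentry '" /boot/grub/grub.cfg | head     # boot entries grub2 will offer
lsinitramfs /boot/initrd.img-$(uname -r) | head   # a peek inside the initramfs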

Setting up the encrypted disk

First off, I used TWO hard drives–this is not an in-place migration, and that way nothing is broken if you mess up. One disk held my existing root filesystem, and stayed in the computer the whole time. The other I connected via USB.

Here’s the output of gdisk -l on my original disk:

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048         1050623   512.0 MiB   EF00  # EFI, mounted at /boot/efi
   2         1050624       354803711   168.7 GiB   8300  # ext4, mounted at /
   3       354803712       488396799   63.7 GiB    8200  # swap

Here’s what gdisk -l will show on the new disk when we’re done:

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048          526335   256.0 MiB   EF00  efi # EFI, mounted at /boot/efi
   2         1050624       135268351   64.0 GiB    8200  swap # swap
   3       135268352       937703054   382.6 GiB   8300  root_cipher # ext4-on-LUKS. ext4 mounted at /
   4          526336         1050623   256.0 MiB   8300  boot # ext4, mounted at /boot
  1. Stop anything else running. We’re going to do a “live” copy from the running system, so at minimum stop any other activity on the machine. Also, most of the commands in this guide need root (sudo).
  2. Format the new disk. I used gdisk and you must select a gpt partition table. Basically I just made everything match the original. The one change I need is to add a /boot partition, so grub2 will be able to do the second stage. I also added partition labels with the c gdisk command to all partitions: boot, root_cipher, efi, and swap. I decided I’d like to be able to migrate to a larger disk later without updating a bunch of GUIDs, and filesystem or partition labels are a good method.
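    If you prefer a scriptable tool, sgdisk can create an equivalent layout. This is only a sketch–/dev/sdb, the sizes, and the partition numbering are assumptions (my real table above was made interactively with gdisk, and boot ended up as partition 4 there; since everything is referenced by partition label, the numbering doesn’t matter):

    sgdisk -o /dev/sdb                                      # fresh GPT partition table (destroys the disk's contents!)
    sgdisk -n 1:0:+256M -t 1:ef00 -c 1:efi         /dev/sdb
    sgdisk -n 2:0:+256M -t 2:8300 -c 2:boot        /dev/sdb
    sgdisk -n 3:0:+64G  -t 3:8200 -c 3:swap        /dev/sdb
    sgdisk -n 4:0:0     -t 4:8300 -c 4:root_cipher /dev/sdb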
  3. Add encryption. I like filesystem-on-LUKS, but most other debian guides use filesystem-in-LVM-on-LUKS. You’ll enter your new disk password twice–once to make an encrypted partition, once to open the partition.

    cryptsetup luksFormat /dev/disk/by-partlabel/root_cipher
    cryptsetup open /dev/disk/by-partlabel/root_cipher root
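
    You can sanity-check the result before going further (optional, read-only)–luksDump shows the LUKS header on the partition, and the opened mapping should now exist under /dev/mapper:

    cryptsetup luksDump /dev/disk/by-partlabel/root_cipher
    ls -l /dev/mapper/root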
    
  4. Make the filesystems. For my setup:

    mkfs.ext4 /dev/mapper/root   # the opened LUKS mapping from the previous step
    mkfs.ext4 /dev/disk/by-partlabel/boot  
    mkfs.vfat /dev/disk/by-partlabel/efi
    
  5. Mount all the new filesystems at /mnt. Make sure everything (cryptsetup included) uses EXACTLY the same mount paths (ex /dev/disk/by-partlabel/boot instead of /dev/sda1) as your final system will, because debian will examine your mounts to generate boot config files.

    mount /dev/mapper/root /mnt
    mkdir /mnt/boot && mount /dev/disk/by-partlabel/boot /mnt/boot 
    mkdir /mnt/boot/efi && mount /dev/disk/by-partlabel/efi /mnt/boot/efi
    mkdir /mnt/dev && mount --bind /dev /mnt/dev # for chroot
    mkdir /mnt/sys && mount --bind /sys /mnt/sys
    mkdir /mnt/proc && mount --bind /proc /mnt/proc
    
  6. Copy everything over. I used rsync -axAX, but you can also use cp -ax. To learn what all these options are, read the man page. Make sure to keep the trailing slashes in the folder paths for rsync.

    rsync -xavHAX / /mnt/ --no-i-r --info=progress2
    rsync -xavHAX /boot/ /mnt/boot/
    rsync -xavHAX /boot/efi/ /mnt/boot/efi/
    
  7. Chroot in. You will now be “in” the new system using your existing kernel.
    chroot /mnt

  8. Edit /etc/crypttab. Add:
    root PARTLABEL=root_cipher none luks
  9. Edit /etc/fstab. Mine looks like this:

    /dev/mapper/root / ext4 errors=remount-ro 0 1
    PARTLABEL=boot /boot ext4 defaults,nofail 0 1
    PARTLABEL=efi /boot/efi vfat umask=0077,nofail 0 1
    PARTLABEL=swap none swap sw,nofail 0 0
    tmpfs /tmp tmpfs mode=1777,nosuid,nodev 0 0
    
  10. Edit /etc/default/grub. On debian you don’t need to edit GRUB_CMDLINE_LINUX.

    GRUB_DISABLE_LINUX_UUID=true
    GRUB_ENABLE_LINUX_PARTLABEL=true
    
  11. Run grub-install. This will install the bootloader to efi. I forget the options to run it with… sorry!
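    For what it’s worth, on a UEFI debian system the invocation typically looks something like the line below. Treat it as a hedged reminder rather than the exact command I ran:

    grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=debian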

  12. Run update-grub (with no options). This will update /boot/grub/grub.cfg so it knows how to find your new drive. You can verify the file by hand if you know how.
  13. Run update-initramfs -u. This will update the initramfs so it can decrypt your root drive.
  14. If there were any warnings or errors printed in the last three steps, something is wrong. Figure out what–it won’t boot otherwise. Especially make sure your /etc/fstab and /etc/crypttab exactly match what you’ve already used to mount filesystems.
  15. Exit the chroot. Make sure any changes are synced to disk (you can unmount everything under /mnt in reverse order to make sure if you want)
  16. Shut down your computer. Remove your root disk and boot from the new one. It should work now, asking for your password during boot.
  17. Once you boot successfully and verify everything mounted, you can remove the nofail from /etc/fstab if you want.
  18. (In my case, I also set up the swap partition after successful boot.) Edit: Oh, also don’t use unencrypted swap with encrypted root. That was dumb.
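
A few read-only commands are handy for confirming the result after the first successful boot (nothing here changes anything):

lsblk -f                 # root_cipher should show as crypto_LUKS, with the ext4 "root" mapping under it
findmnt /                # / should be mounted from /dev/mapper/root
cryptsetup status root   # shows the cipher and the underlying partition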

Making a hardware random number generator

If you want a really good source of random numbers, you should get a hardware generator. But there’s not a lot of great options out there, and most people looking into this get (understandably) paranoid about backdoors. But, there’s a nice trick: if you combine multiple random sources together with xor, it doesn’t matter if one is backdoored, as long as they aren’t all backdoored. There are some exceptions–if the backdoor is actively looking at the output, it can still break your system. But as long as you’re just generating some random pads, instead of making a kernel entropy pool, you’re fine with this trick.

So! We just need a bunch of sources of randomness. Here are the options I’ve tried:

  • /dev/urandom (40,000KB/s) – this is nearly a pseudo-random number generator, so it’s not that good. But it’s good to throw in just in case. [Learn about /dev/random vs /dev/urandom if you haven’t. Then unlearn it again.]
  • random-stream (1,000 KB/s), an implementation of the Mersenne Twister pseudo-random-number generator. A worse version of /dev/urandom–use /dev/urandom instead, unless you don’t trust the Linux kernel for some reason.
  • infnoise (20-23 KB/s), a USB hardware random number generator. Optionally whitens using keccak. Mine is unfortunately broken (probably?) and outputs “USB read error” after a while
  • OneRNG (55 KiB/s), a USB hardware random number generator. I use a custom script which outputs raw data instead of the provided scripts (although they look totally innocuous, do recommend).
  • /dev/hwrng (123 KB/s), which accesses the hardware random number generator built into the raspberry pi. This device is provided by the raspbian package rng-tools. I learned about this option here.
  • rdrand-gen (5,800 KB/s), a command-line tool to output random numbers from the Intel hardware generator instruction, RDRAND.

At the end, you can use my xor program to combine the streams/files. Make sure to limit the output size if using files–by default it does not stop outputting data until EVERY file ends. The speed of the combined stream is at most that of the slowest component (plus a little slowdown to xor everything). Here’s my final command line:

#!/bin/bash
# Fill up the folder with 1 GB one-time pads. Requires 'rng-tools' and a raspberry pi. Run as sudo to access /dev/hwrng.
while true; do
  sh onerng.sh | dd bs=1K count=1000000 of=tmp-onerng.pad 2>/dev/null
  infnoise --raw | dd bs=1K count=1000000 of=tmp-infnoise.pad 2>/dev/null
  xor tmp-onerng.pad tmp-infnoise.pad /dev/urandom /dev/hwrng | dd bs=1K count=1000000 of=/home/pi/pads/1GB-`\date +%Y-%m-%d-%H%M%S`.pad 2>/dev/null;
done
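
If you want a (very rough) sanity check that a finished pad at least looks statistically random, the ‘ent’ package will analyze a file. The filename below is just an example of the pattern the script produces:

ent /home/pi/pads/1GB-2020-01-01-000000.pad

Statistical tests like this can’t detect a backdoored generator, of course–that’s what the xor trick is for.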

Great, now you have a good one-time-pad and can join ok-mixnet 🙂

P.S. If you really know what you’re doing and like shooting yourself in the foot, you could try combining and whitening entropy sources with a randomness sponge like keccak instead.


Crawling Etiquette

I participate in a mentoring program, and recently one of the people I mentor asked me about whether it was okay to crawl something. I thought I would share my response, which is posted below nearly verbatim.

For this article, I’m skipping the subject of how to scrape websites (as off-topic), or how to avoid bans.

People keep telling me that if I scrape pages like Amazon that I’ll get banned. I definitely don’t want this to happen! So, what is your opinion on this?

Generally bans are temporary (a day to two weeks). I’d advise getting used to it, if you want to do serious scraping! If it would be really inconvenient, either don’t scrape the site or learn to use a secondary IP, so when your scraper gets banned, you can still use the site as a user.

More important than getting banned, you should learn about why things like bans are in place, because they’re not easy to set up–someone decided it was a good idea. Try to be a good person. As a programmer, you can cause a computer to blindly access a website millions of times–you get a big multiplier on anything a normal person can do. As such, you can cause the owners and users of a site problems, even by accident. Learn scraping etiquette, and always remember there’s an actual computer sitting somewhere, and actual people running the site.

That said, there’s a big difference between sending a lot of traffic to a site that hosts local chili cookoff results, and amazon.com. You could make the chili cookoff site hard to access or run up a small bill for the owners if you screw up enough, while realistically there’s nothing you can do to slow down Amazon.com even if you tried.

Here are a couple reasons people want to ban automated scraping:

  1. It costs them money (bandwidth). Or, it makes the site unusable because too many “people” (all you) are trying to access it at once (congestion). Usually, it costs them money because the scraper is stupid–it’s something like a badly written search engine, which opens up every comment in a blog as a separate page, or opens up an infinite series of pages. For example, I host a bunch of large binaries (linux installers–big!), and I’ve had a search engine try to download every single one, once an hour. As a scraper, you can avoid causing these problems by (there’s a concrete wget example after this list):
    • rate-limiting your bot (ex. only scraping one page every 5-10 seconds, so you don’t overload their server). This is a good safety net–no matter what you do, you can’t break things too badly. If you’re downloading big files, you can also rate-limit your bandwidth or limit your total bandwidth quota.
    • examining what your scraper is doing as it runs (so you don’t download a bunch of unnecessary garbage, like computer-generated pages or a nearly-identical page for every blog comment)
    • obeying robots.txt, which you can probably get a scraping framework to do for you. you can choose to ignore robots.txt if you think you have a good reason to, but make sure you understand why robots.txt exists before you decide.
    • testing the site while you’re scraping by hand or with a computerized timer. If you see the site do something like load slower (even a little) because of what you’re doing, stop your scraper, and adjust your rate limit to be 10X smaller.
    • make your scraper smart. download only the pages you need. if you frequently stop and restart the scraper, have it remember the pages you downloaded–use some form of local cache to avoid re-downloading things. if you need to re-crawl (for example to maintain a mirror) pass if-modified-since HTTP headers.
    • declare an HTTP user-agent, which explains what you’re doing and how to contact you (email or phone) in case there is a problem. i’ve never had anyone actually contact me but as a site admin I have looked at user agents.
    • probably some more stuff i can’t think of off the top of my head
  2. They want to keep their information secret and proprietary, because having their information publicly available would lose them money. This is the main reason Amazon will ban you–they don’t want their product databases published. My personal ethics say I generally ignore this consideration, but you may decide differently.
  3. They have a problem with automated bots posting spam or making accounts. Since you’re not doing either, this doesn’t really apply to you, but your program may be caught by the same filters trying to keep non-humans out.
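
As promised above, here’s what some of that etiquette looks like as a single wget invocation–rate-limited, bandwidth-capped, and with a contact address in the user-agent (recursive wget also obeys robots.txt by default). The URL and contact address are placeholders:

wget --recursive --level=2 --wait=5 --random-wait --limit-rate=100k \
     --user-agent="my-crawler/0.1 (contact: you@example.com)" \
     https://example.com/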

For now I would advise not yet doing any of the above, because you’re basically not doing serious scraping yet. Grabbing all the pages on xkcd.com is fine, and won’t hurt anyone. If you’re going to download more than (say) 10,000 URLs per run, start looking at the list above. One exception–DO look at what your bot does by hand (the list of URLs, and maybe the HTML results), because it will be educational.

Also, in my web crawler project I eventually want to grab the text on each page crawled and analyze it using the requests library. Is something like this prohibited?

Prohibited by whom? Is it against an agreement you signed without reading with Amazon? Is it against US law? Would Amazon rather you didn’t, while having no actual means to stop you? These are questions you’ll have to figure out for yourself, and how much you care about each answer. You’ll also find the more you look into it that none of the three have very satisfactory answers.

The answer of “what bad thing might happen if I do this” is perhaps less satisfying if you’re trying to uphold what you perceive as your responsibilities, but easier to answer.

These are the things that may happen if you annoy a person or company on the internet by scraping their site. What happens will depend both on what you do, and what entity you are annoying (more on the second). Editor’s note: Some of the below is USA-specific, especially the presence/absence of legal or government action.

  • You may be shown CAPTCHAs to see if you are a human
  • Your scraper’s IP or IP block may be banned
  • You or your scraper may be blocked in some way you don’t understand
  • Your account may be deleted or banned (if your scraper uses an account, and rarely even if not)
  • They may yell at you, send you an angry email, or send you a polite email asking you to stop and/or informing you that you’re banned and who to contact if you’d like to change that
  • You may be sent a letter telling you to stop by a lawyer (a cease-and-desist letter), often with a threat of legal action if you do not
  • You may be sued. This could be either a legitimate attempt to sue you, or a sort of extra-intimidating cease-and-desist letter. The attempt could be successful, unsuccessful but require you to show up in court, or could be something you can ignore altogether.
  • You may be charged with some criminal charge such as computer, wire, or mail fraud. The only case I’m aware of offhand is Aaron Swartz
  • You may be brought up on some charge by the FBI, which will result in your computers being taken away and not returned, and possibly jailtime. This one will only happen if you are crawling a government site (and is not supposed to happen ever, but that’s the world we live in).

For what it’s worth, so far I have gotten up to the “polite email” section in my personal life. I do a reasonable amount of scraping, mostly of smaller sites.

[… section specific to Amazon cut …]

Craigslist, government sites, and traditional publishers (print, audio, and academic databases) are the only entities I know of that aggressively go after scrapers through legal means, instead of technical means. Craigslist will send you a letter telling you to stop first.

What a company will do once you publicly post all the information on their site is another matter, and I have less advice there. There are several sites that offer information about historical Amazon prices, for what that’s worth.

You may find this article interesting (but unhelpful) if you are concerned about being sued. Jason Scott is one of the main technical people at the Internet Archive, and people sometimes object to things he posts online.

In my personal opinion, suing people or bringing criminal charges does not work in general, because most people scraping do not live in the USA, and may use technical means to disguise who they are. Scrapers may be impossible to sue or charge with anything. In short, a policy of trying to sue people who scrape your site will result in your site still being scraped. Also, most people running a site don’t have the resources to sue anyone in any case. So you shouldn’t expect this to be a common outcome, but a small percentage of people (mostly crackpots) and companies (RIAA and publishers) may try.


qr-backup

I made a new project called qr-backup. It’s a command-line program to back up any file to physical paper, using a number of QR codes. You can then restore it, even WITHOUT the qr-backup program, using the provided instructions.

I’m fairly satisfied with its current state (can actually back up my files, makes a PDF). There’s definitely some future features I’m looking forward to adding, though.


OK-Mixnet

I made a new cryptosystem called OK-Mixnet. It has “perfect” security, as opposed to the usual pretty-good security. (Of course, it’s not magic–if your computer is hacked, the cryptosystem isn’t gonna protect your data). Despite the name, it’s not really a mixnet per se, it just similarly defends against SIGINT.

A writeup is here: https://za3k.com/ok-mixnet.md

The alpha codebase is here: https://github.com/za3k/ok-mixnet

Let me know if you’d like to join the open alpha. Email me your username and IP (you’ll need to forward a port).


fabric1 AUR package

Fabric is a system administration tool used to run commands on remote machines over SSH. You program it in Python. In 2018, Fabric 2 came out. In a lot of ways it’s better, but it’s incompatible, and removes some features I really need. I talked to the Fabric dev (bitprophet) and he seemed on board with keeping a Fabric 1 package around (and maybe renaming the current package to Fabric 2).

Here’s an arch package: https://aur.archlinux.org/packages/fabric1/

Currently Fabric 1 runs only on Python2. But there was a project to port it to Python 3 (confusingly named fabric3), which is currently attempting to merge into mainline fabric. Once that’s done, I’m hoping to see a ‘fabric1’ and ‘fabric2’ package in all the main distros.


mon(8)

I had previously hand-rolled a status monitor, status.za3k.com, which I am in the process of replacing (new version). I am replacing it with a linux monitoring daemon, mon, which I recommend. It is targeted at working system administrators. ‘mon’ adds many features over my own system, but still has a very bare-bones feeling.

The old service, ‘simple-status‘ worked as follows:

  • You visited the URL. Then, the status page would (live) kick off about 30 parallel jobs, to check the status of 30 services
  • The list of services is one-per-file in the services.d directory.
  • For each service, it ran a short script, with no command line arguments.
  • All output is displayed in a simple html table, with the name of the service, the status (with color coding), and a short output line.
  • The script could return with a success (0) or non-success status code. If it returned success, that status line would display in green; if it failed, the line would be highlighted red.
  • Scripts can be anything, but I wrote several utility functions to be called from scripts. For example, “ping?” checks whether a host is pingable.
  • Each script was wrapped in timeout. If the script took too long to run, it would be highlighted yellow.
  • The reason all scripts ran every time is to prevent a failure mode where the information could ever be stale without me noticing.
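
The real check scripts aren’t reproduced here, but a hypothetical services.d entry might have looked something like this (the name and URL are just examples):

#!/bin/bash
# services.d/webserver (hypothetical example)
# exit 0 = success (green), non-zero = failure (red); stdout becomes the short output line
if curl -fsS -m 5 https://za3k.com >/dev/null; then
    echo "webserver reachable"
else
    echo "webserver unreachable"
    exit 1
fi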

Mon works as follows:

  • The list of 30 services is defined in /etc/mon/mon.cf.
  • For each service, it runs a single-line command (monitor) with arguments. The hostname(s) are added to the command line automatically.
  • All output can be displayed in a simple html table, with the name of the service, the status (with color coding), the time of last and next run, and a short output line. Or, I use ‘monshow‘, which is similar but in a text format.
  • Monitors can be anything, but several useful ones are provided in /usr/lib/mon/mon.d (on debian). For example the monitor “ping” checks whether a host is pingable.
  • The script could return with a success (0) or non-success status code. If it returned success, the status line would display in green on the web interface; if it failed, red.
  • All scripts run periodically. A service can have many states, not just “success” or “failure”. For example “untested” (not yet run) or “dependency failing” (I assume, not yet seen).
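
For flavor, a mon config stanza looks roughly like this. This is a from-memory sketch, not my actual config–check the example mon.cf shipped with the debian package (fping.monitor and mail.alert are among the monitors/alerts it provides):

hostgroup servers avalanche.za3k.com

watch servers
    service ping
        interval 5m
        monitor fping.monitor
        period wd {Sun-Sat}
            alert mail.alert admin@za3k.com
            alertevery 1h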

As you can see, the two have a very similar approach to the component scripts, which is natural in the Linux world. Here is a comparison.

  • ‘simple-status’ does exactly one thing. ‘mon’ has many features, but does the minimum possible to provide each.
  • ‘simple-status’ is stateless. ‘mon’ has state.
  • ‘simple-status’ runs on demand. ‘mon’ is a daemon which runs monitors periodically.
  • Input is different. ‘simple-status’ is one script which takes a timeout. ‘mon’ listens for trap signals and talks to clients who want to know its state.
  • both can show an HTML status page that looks about the same, with some CGI parameters accepted.
  • ‘mon’ can also show a text status page.
  • both run monitors which return success based on status code, and provide extra information as standard output. ‘mon’ scripts are expected to be able to run on a list of hosts, rather than just one.
  • ‘mon’ has a config file. ‘simple-status’ has no options.
  • ‘simple-status’ is simple (27 lines). ‘mon’ has longer code (4922 lines)
  • ‘simple-status’ is written in bash, and does not expose this. ‘mon’ is written in perl, all the monitors are written in perl, and it allows inline perl in the config file
  • ‘simple-status’ limits the execution time of monitors. ‘mon’ does not.
  • ‘mon’ allows alerting, which calls an arbitrary program to deliver the alert (email is common)
  • ‘mon’ supports traps, which are active alerts
  • ‘mon’ supports watchdog/heartbeat style alerts, where if a trap is not regularly received, it marks a service as failed.
  • ‘mon’ supports dependencies
  • ‘mon’ allows defining a service for several hosts at once

Overall I think that ‘mon’ is much more complex, but only to add features, and it doesn’t have a lot of features I wouldn’t use. It still is pretty simple with a simple interface. I recommend it as both good, and overall better than my system.

My only complaint is that it’s basically impossible to Google, which is why I’m writing a recommendation for it here.


Cron email, and sending email to only one address

So you want to know when your monitoring system fails, or your cron jobs don’t run? Add this to your crontab:

MAILTO=me@me.com
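
In context, a minimal crontab might look like this (the backup job is a hypothetical placeholder–cron mails each job’s output, including error messages, to MAILTO):

MAILTO=me@me.com
17 3 * * * /usr/local/bin/backup.sh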

Now install a mail-sending agent. I like ‘nullmailer‘, which is much smaller than most mail-sending agents. It can’t receive or forward mail, only send it, which is what I like about it. No chance of a spammer using my server for something nasty.

The way I have it set up, I’ll have a server (avalanche) sending all email from one address (nullmailer@avalanche.za3k.com) to one email (admin@za3k.com), and that’s it. Here’s my setup on debian:

sudo apt-get install nullmailer
echo "admin@za3k.com" | sudo tee /etc/nullmailer/adminaddr # all mail is sent to here, except for certain patterns
echo "nullmailer@`hostname`.za3k.com" | sudo tee /etc/nullmailer/allmailfrom # all mail is sent from here
echo "`hostname`.za3k.com" | sudo tee /etc/nullmailer/defaultdomain # superseded by 'allmailfrom' and not used
echo "`hostname`.za3k.com" | sudo tee /etc/nullmailer/helohost # required to connect to my server; otherwise it defaults to 'me'
echo "smtp.za3k.com smtp --port=587 --starttls" | sudo tee /etc/nullmailer/remotes && sudo chmod 600 /etc/nullmailer/remotes

Now just run echo "Subject: sendmail test" | /usr/lib/sendmail -v admin@za3k.com to test and you’re done!


Postmortem: bs-store

Between 2020-03-14 and 2020-12-03 I ran an experimental computer storage setup. I moved or copied 90% of my files into a content-addressable storage system. I’m doing a writeup of why I did it, how I did it, and why I stopped. My hope is that it will be useful to anyone considering using a similar system.

The assumption behind this setup, is that 99% of my files never change, so it’s fine to store only one, static copy of them. (Think movies, photos… they’re most of your computer space, and you’re never going to modify them). There are files you change, I just didn’t put them into this system. If you run a database, this ain’t for you.

Because I have quite a lot of files and 42 drives (7 in my computer, ~35 in a huge media server chassis), there is a problem of how to organize files across drives. To explain why it’s a problem, let’s look at the two default approaches:

  • One Block Device / RAID 0. Use some form of system that unifies block devices, such as RAID0 or ZFS’s striped vdevs. Writing files is very easy, you see a single 3000GB drive.
    • Many forms of RAID0 use striping. Striping splits each file across all available drives. 42 drives could spin up to read one file (wasteful).
    • You need all the drives mounted to read anything–I have ~40 drives, and I’d like a solution that works if I move and can’t keep my giant media server running. Also, it’s just more reassuring that nothing can fail if you can read each drive individually.
  • JBOD / Just a Bunch of Disks. Label each drive with a category (ex. ‘movies’), and mount them individually.
    • It’s hard to aim for 100% (or even >80%) drive use. Say you have 4x 1000GB drives, and you have 800GB movies, 800GB home video footage, 100GB photographs, and 300GB datasets. How do you arrange that? One drive per dataset is pretty wasteful, as everything fits on 3. But, with three drives, you’ll need to split at least one dataset across drives. Say you put together 800GB home video and 100GB photographs. If you get 200GB more photographs, do you split a 300 GB collection across drives, or move the entire thing to another drive? It’s a lot of manual management and shifting things around for little reason.

Neither approach adds any redundancy, and 42 drives is a bit too many to deal with for most things. Step 1 is to split the 42 drives into 7 ZFS vdevs, each with 2-drive redundancy. That way, if a drive fails or there is a small data corruption (likely), everything will keep working. So now we only have to think about accessing 7 drives (but keep in mind, many physical drives will spin up for each disk access).

The ideal solution:

  • Will not involve a lot of manual management
  • Will fill up each drive in turn to 100%, rather than all drives at an equal %.
  • Will deduplicate identical content (this is a “nice to have”)
  • Will only involve accessing one drive to access one file
  • Will allow me to get and remove drives, ideally across heterogenous systems.

I decided a content-addressable system was ideal for me. That is, you’d look up each file by its hash. I don’t like having an extra step to access files, so files would be accessed by symlink–no frontend program. Also, it was important to me that I be able to transparently swap out the set of drives backing this. I wanted to make the content-addressable system basically a set of 7 content-addressable systems, and somehow wrap those all into one big content-addressable system with the same interface. Here’s what I settled on:

  • (My drives are mounted as /zpool/bs0, /zpool/bs1, … /zpool/bs6)
  • Files will be stored in each pool in turn by hash. So my movie ‘cat.mpg’ with sha hash ‘8323f58d8b92e6fdf190c56594ed767df90f1b6d’ gets stored in /zpool/bs0/83/23/f58d8b9 [shortened for readability]
  • Initially, we just copy files into the content-addressable system, we don’t delete the original. I’m cautious, and I wanted to make sure everything worked before getting rid of the originals.
  • To access a file, I used read-only unionfs-fuse. This checks each of /zpool/bs{0..6}/<hash> in turn. So in the final version, /data/movies/cat.mpg would be a symlink to ‘/bs-union/83/23/f58d8b9’
  • We store some extra metadata on the original file (if not replaced by a symlink) and the stored copy–what collection it’s part of, when it was added, how big it is, what its hash is, etc. I chose to use xattrs.

The plan here is that it would be really easy to swap out one backing blockstore of 30GB, for two of 20GB–just copy the files to the new drives and add it to the unionfs.
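
My actual scripts aren’t reproduced here, but the “add a file” step was conceptually something like this sketch (paths follow the layout above; the script name, xattr name, and pool choice are illustrative):

#!/bin/bash
# bs-add (hypothetical sketch): copy FILE into the blobstore, recording its hash
f="$1"
hash=$(sha1sum "$f" | cut -d' ' -f1)
dest="/zpool/bs0/${hash:0:2}/${hash:2:2}/${hash:4}"    # really: whichever bs pool has room
mkdir -p "$(dirname "$dest")"
cp -a "$f" "$dest"
setfattr -n user.bs.hash -v "$hash" "$f"               # metadata on the original, as described above
# later, once trusted, the original gets replaced with a symlink through the union mount:
# ln -sf "/bs-union/${hash:0:2}/${hash:2:2}/${hash:4}" "$f"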

Here’s what went well:

  • No problems during development–only copying files meant it was easy and safe to debug prototypes.
  • Everything was trivial to access (except see note about mounting disks below)
  • It was easy to add things to the system
  • Holding off on deleting the original content until I was 100% out of room on my old disks meant it was easy to migrate off of, and rather risk-free
  • Running the entire thing on top of ZFS RAIDZ2 was the right decision, I had no worries about failing drives or data corruption, despite a lot of hardware issues developing at one point.
  • My assumption that files would never change was correct. I made the unionfs filesystem read-only as a guard against error, but it was never a problem.
  • Migrating off the system went smoothly

Here are the implementation problems I found:

  • I wrote the entire thing as bash scripts operating directly on files, which was OK for access and putting stuff in the store, but just awful for trying to get an overview of data or migrating things. I definitely should have used a database. I maybe should have used a programming language.
  • Because there was no database, there wasn’t really any kind of regular check for orphans (content in the blobstore with no symlinks to it), and other similar checks.
  • unionfs-fuse suuucks. Every union filesystem I’ve tried sucks. Its read bandwidth is much lower than the component devices (unclear, probably), it doesn’t cache where to look things up, and it has zero xattrs support (can’t read xattrs from the underlying filesystem).
  • gotcha: zfs xattrs waste a lot of space by default, you need to reconfigure the default.

But the biggest problem was disk access patterns:

  • I thought I could cool 42 spinning drives, or at least a good portion of them. This was WRONG by far, and I am not sure how possible it is in a home setup. To give you an idea how bad this was, I had to write a monitor to shut off my computer if the drives went above 60C, and I was developing fevers in my bedroom (where the server is) from overheating. Not healthy.
  • unionfs has to check each backing drive. So we see 42 drives spin up. I have ideas on fixing this, but it doesn’t deal with the other problems
  • To fix this, you could use double-indirection.
    • Rather than pointing a symlink at a unionfs: /data/cat.mpg -> /bs-union/83/23/f58d8b9 (which accesses /zpool/bs0/83/23/f58d8b9)
    • Point a symlink at another symlink that points directly to the data: /data/cat.mpg -> /bs-indirect/83/23/f58d8b9 -> /zpool/bs0/83/23/f58d8b9
  • The idea is that backing stores are kinda “whatever, just shove it somewhere”. But, actually it would be good to have a collection in one place–not only to make it easy to copy, but to spin up only one drive when you go through everything in a collection. It might even be a good idea to have a separate drive for more frequently-accessed content. This wasn’t a huge deal for me since migrating existing content meant it coincidentally ended up pretty localized.
  • Because I couldn’t spin up all 42 drives, I had to keep a lot of the array unmounted, and mount the drives I needed into the unionfs manually.

So although I could have tried to fix things with double-indirection, I decided there were some other disadvantages to symlinks: estimating sizes, making offsite backups foolproof. I decided to migrate off the system entirely. The migration went well, although it required running all the drives at once, so some hardware errors popped up. I’m currently on a semi-JBOD system (still on top of the same 7 RAIDZ2 devices).

Hopefully this is useful to someone planning a similar system someday. If you learned something useful, or there are existing systems I should have used, feel free to leave a comment.


Printing on the Brother HL-2270DW printer using a Raspberry Pi

Although the directions below are written for the Raspberry Pi, they should also work on any other system. The Brother-provided driver does not run on ARM processors[1] like the raspberry pi, so we will instead use the open-source brlaser[2].

Edit: This setup should also work on the following Brother monochrome printers, just substitute the name where needed:

  • brlaser 4, just install from package manager: DCP-1510, DCP-1600 series, DCP-7030, DCP-7040, DCP-7055, DCP-7055W, DCP-7060D, DCP-7065DN, DCP-7080, DCP-L2500D series, HL-1110 series, HL-1200 series, HL-L2300D series, HL-L2320D series, HL-L2340D series, HL-L2360D series, HL-L2375DW series, HL-L2390DW, MFC-1910W, MFC-7240, MFC-7360N, MFC-7365DN, MFC-7420, MFC-7460DN, MFC-7840W, MFC-L2710DW series
  • brlaser 6, follow full steps below: DCP-L2520D series, DCP-L2520DW series, DCP-L2540DW series (unclear, may only need 4), HL-2030 series, HL-2140 series, HL-2220 series, HL-2270DW series, HL-5030 series

Also, all these steps are command-line based, and you can do the whole setup headless (no monitor or keyboard) using SSH.

  1. Get the latest raspbian image up and running on your pi, with working networking. At the time of writing the latest version is 10 (buster)–once 11+ is released this will be much easier. I have written a convenience tool[3] for this step, but you can also find any number of standard guides. Log into your raspberry pi to run the following steps
  2. (Option 1, not recommended) Upgrade to Debian 11 bullseye (current testing release). This is because we need brlaser 6, not brlaser 4 from debian 10 buster (current stable release). Then, install the print system and driver[2]:
    sudo apt-get update && sudo apt-get install lpr cups ghostscript printer-driver-brlaser
  3. (Option 2, recommended) Install ‘brlaser’ from source.
    1. Install print system and build tools
      sudo apt-get update && sudo apt-get install lpr cups ghostscript git cmake libcups2-dev libcupsimage2-dev
    2. Download the source
      wget https://github.com/pdewacht/brlaser/archive/v6.tar.gz && tar xf v6.tar.gz
    3. Build the source and install
      cd brlaser-6 && cmake . && make && sudo make install
  4. Plug in the printer, verify that it shows up using sudo lsusb or sudo dmesg. (author’s shameful note: if you’re not looking, I find it surprisingly easy to plug USB B into the ethernet jack)
  5. Install the printer.
    1. Run sudo lpinfo -v | grep usb to get the device name of your printer. It will be something like usb://Brother/HL-2270DW%20series?serial=D4N207646
      If you’re following this in the hopes that it will work on another printer, run sudo lpinfo -m | grep <your model> to get the PPD file for your printer.
    2. Install and enable the printer
      sudo lpadmin -p HL-2270DW -E -v usb://Brother/HL-2270DW%20series?serial=D4N207646 -m drv:///brlaser.drv/br2270dw.ppd
      Note, -p HL-2270DW is just the name I’m using for the printer, feel free to name the printer whatever you like.
    3. Enable the printer (did not work for me)
      sudo lpadmin -p HL-2270DW -E
    4. (Optional) Set the printer as the default destination
      sudo lpoptions -d HL-2270DW
    5. (Optional) Set any default options you want for the printer
      sudo lpoptions -p HL-2270DW -o media=letter
  6. Test the printer (I’m in the USA so we use ‘letter’ size paper, you can substitute whichever paper you have such as ‘a4’).
    1. echo "Hello World" | PRINTER=HL-2270DW lp -o media=letter (Make sure anything prints)
    2. cat <test document> | PRINTER=HL-2270DW lp -o media=letter (Print an actual test page to test alignment, etc)
    3. cat <test document> | PRINTER=HL-2270DW lp -o media=letter -o sides=two-sided-short-edge (Make sure duplex works if you plan to use that)
  7. (Optional) Set up an scp print server, so any file you copy to a /printme directory gets printed. For the 2270DW, I also have a /printme.duplex directory.
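    The scp server itself is just sshd plus a writable directory; the printing half can be a small watcher along these lines. This is a hypothetical sketch (it assumes the ‘inotify-tools’ package and the printer name used above), not my exact setup:

    #!/bin/bash
    # print anything copied into /printme (hypothetical sketch)
    mkdir -p /printme
    inotifywait -m -e close_write --format '%w%f' /printme | while read -r f; do
        PRINTER=HL-2270DW lp -o media=letter "$f" && rm "$f"
    done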

Links
[1] brother driver does not work on arm (also verified myself)
[2] brlaser, the open-source Brother printer driver
[3] rpi-setup, my convenience command-line script for headless raspberry pi setup
[4] stack overflow answer on how to install one package from testing in debian
