Hack-A-Day: Hack-A-Blog

It’s November, and I’ve decided that this month I’m going to do 30 projects in 30 days. It’s an all-month hack-a-thon!

Today’s project is the Hack-A-Blog. (demo, source).

Check out the link above to try out the live demo. I’m proud of getting this one done in time. I think the next days will be easier, as I figured some things out already.

e-ink “laptop”

I’ve been prototyping an e-ink laptop.

a wooden box with a keyboard inside and an e-ink screen mounted to it
a closed wooden box with a keyboard visible through a hole in the front
Closed “laptop”

I’m not the first; there have been many other such devices before. I came up with the idea independently, but the specifics are heavily inspired by the Ultimate Writer by NinjaTrappeur in 2018. Like him, my use case is typing without distractions, and reading books. E-ink displays are quite slow to update, so I don’t think it can serve as a general purpose computer. Here’s a video of it in action. It operates at one frame per second.

The electronics are not fully done. They need to be better secured, and I’m going to redo the cabling and power pack.

an e-ink screen reading "hello world"
I broke a screen over-tightening a nut. That said, I like this look pretty well! If the lid was thicker, I know how to avoid screws on the other side, too.
an e-ink screen loose on a desk, covered in garbage
Early screen progress. I got something to display, but not what I wanted.
a mechanical keyboard in a box
I found a really nice, cheap mechanical keyboard on eBay. The main downside is that it’s heavy: 730g. It also draws a lot of power, even when not in use. I have a nearly identical keyboard that doesn’t, which I’ll use for v2.
a homemade battery pack with four red lithium-ion batteries
I made my own lithium-ion battery pack. It works well, but it doesn’t quite fit so I’m going to redo it with one less cell. It also needs an on/off switch and a right angle USB cable.
a close-up of a raspberry pi in a box
The prototype is powered by a Raspberry Pi 3. The final version will use a microcontroller to save power. The Pi Zero can also be swapped in with no changes, and uses a third of the power. But it’s noticeably slower and takes 30 seconds to boot. For prototyping I’m using the Pi 3 for now.

I’m not the best woodworker, but I’m slowly learning. Here are pictures of case and lid action.

back view of a box with hinges
Hinged lid. The screen is on the bottom of the lid.
back view of a wooden stop, closed
A wooden stop on each side
back view of a wooden stop, open
Wooden stop with lid open. It hits the bottom, bringing the lid/screen to a rest at vertical.
picture of a latch, open
Latches on the side
close-up of a hinge in cracked plywood
Don’t put hinges sideways into plywood. But if you do, drill big pilot holes. Out of six screws, one cracked the plywood a little.

On the software end, shout outs to:

  • the creator of the ultimate-writer software, NinjaTrappeur, who has been encouraging (and explained the right way to rewrite the stack, if you wanted to today).
  • Ben Krasnow, who made a video about how to hack partial refresh on an e-ink display.

There are a few things I’d still like to polish. Even as a prototype, this isn’t fully done.

  • The Raspberry Pi and battery pack are currently sitting loose. They need to be secured, especially since they can fall out the open front.
  • The software has some major problems. It doesn’t support Control-C and similar key combinations in Linux, which is a must. It also doesn’t update the screen at boot until you press a key, which would be nice to fix.
  • There’s no power switch. Right now you have to unplug it manually.
  • I’d like to add a carrying handle.
  • I’d like to tuck away the electronics behind a panel. They’re ugly.
  • The wood looks rough in a few places. I want to hide some splintered wood, screw holes, etc.
  • The USB cables have too much stress on them. I need to make a little more room in the wood, and use a right-angled connector in one place.

There’s also no default software, but that’s a feature. A prototype is for figuring out how I want the interface to work, and what I want it to do.

Parts list

  • 7.5 inch e-ink screen from Waveshare (not particularly good) – $60
  • Raspberry Pi 3 (Pi Zero, etc also work with no changes) – $35 (but unavailable)
  • microsd card – $7
  • Plywood and boards, wood glue – $15
  • Plexiglass (to cover screen) – $10
  • Bolts, washers, and nuts to secure it. – $5
  • Circular window latch x2 – $8 (or use $10 smaller version)
  • Hinge x2 – $2
  • Total: $142

Power budget (at 5V):

  • Keyboard: 500mW. Other USB keyboards use zero to within my measurement abilities.
  • Screen: 0-250mW when updating. Hard to measure.
  • Pi 3: 2000mW. I have the wifi chip enabled (the default) but I’m not actively connected to wifi.
  • Pi Zero W: 650mW

A real-life test showed 5-6 hour battery life. Theory says (13Wh/battery * 4 batteries / 2.7 watts)=20 hours battery life. I’m investigating the discrepancy. In theory, swapping for a Pi Zero W and a better keyboard would give 72-hour battery life.
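The theoretical numbers above can be reproduced with a few lines of arithmetic. This is only a sketch: the figures come from the power budget list, the helper name battery_hours is made up for illustration, and it assumes an ideal pack with no converter losses (which may be part of the discrepancy, though that is a guess).

```python
# Rough battery-life estimate from the power budget above.
# Assumption: an ideal pack with no regulator losses or battery aging.

def battery_hours(capacity_wh: float, load_mw: float) -> float:
    """Ideal runtime in hours for a given capacity (Wh) and load (mW)."""
    return capacity_wh / (load_mw / 1000.0)

PACK_WH = 13 * 4  # 13 Wh per battery * 4 batteries = 52 Wh

# Current prototype: keyboard 500 mW + screen up to 250 mW + Pi 3 2000 mW
prototype_mw = 500 + 250 + 2000

# Planned swap: keyboard ~0 mW + Pi Zero W 650 mW, screen between 0 and 250 mW
lean_mw_best = 0 + 0 + 650
lean_mw_worst = 0 + 250 + 650

print(f"prototype:         {battery_hours(PACK_WH, prototype_mw):.1f} h")
print(f"Pi Zero W (best):  {battery_hours(PACK_WH, lean_mw_best):.1f} h")
print(f"Pi Zero W (worst): {battery_hours(PACK_WH, lean_mw_worst):.1f} h")
```

With the screen counted at its full 250mW the prototype comes out at roughly 19 hours, in line with the rounded 20-hour figure. The 72-hour estimate falls between the best and worst cases for the Pi Zero W.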

qr-backup v1.1

qr-backup v1.1 is released. qr-backup is a command-line Linux program. You can use it to back up a file as a series of QR codes. You can restore the QR codes using a webcam or scanner, and get back the original file.

The main features of qr-backup are ease-of-use and futureproofing (restore does not require qr-backup).

Please report any bugs on GitHub. Once this is stable, I will do the first pip/package manager release. To test the alpha, check out the latest code using git.

See also USAGE and extensive FAQ.

New features in v1.1:

  • Feature complete. New features are unlikely to be added. Future efforts will focus on quality, GUIs, and porting.
  • restore using qr-backup. Previously, the only restore was a bash one-liner (which still works).
    • qr-backup --restore restores using the webcam
    • qr-backup --restore IMAGE IMAGE IMAGE restores from scanned images
  • After generating a PDF backup, qr-backup automatically does a digital test of the restore process
  • Erasure coding. Lose up to 30% of QRs and restore will still work, as long as you are using qr-backup to restore
  • Increased code density, which about cancels out the erasure coding.
  • Back up directories and files. qr-backup makes a .tar file for you
  • Option to use password protection (encryption)
  • Option to print multiple copies of every QR code
  • Option to randomize order of QR codes
  • Optionally print extra cover sheet instructions on how to restore. For long-term archivists.
  • Option to add custom notes and labels to each page
  • Improved support for using qr-backup in a pipe
  • Various bugfixes
  • See CHANGELOG for complete details

P.S. As a special request, if anyone is on OS X, let me know if it works for you?

tty audit logs

I recently wrote a program that records all tty activity. That means bash sessions, ssh, raw tty access, screen and tmux sessions, the lot. I used the script command. The latest version of my software can be found on GitHub.

Note that it’s been tested only with bash so far, and there’s no encryption built in.

To just record all shell commands typed, use the standard eternal history tricks (bash).

problem-log.txt

One of the more useful things I did was to start logging all my technical problems. Whenever I hit a problem, I write an entry in problem-log.txt. Here’s an example:

2022-08-02
Q: Why isn't the printer working? [ SOLVED ]
A: sudo cupsenable HL-2270DW

// This isn't in the problem log, but the issue is that CUPS will silently disable the printer if it thinks there's an issue. This can happen if you pull a USB cord mid-print.

I write the date, the question, and the answer. Later, when I have a tough or annoying problem, I try to grep problem-log.txt. I’ll add a note if I solve a problem using the log, too.
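Plain grep prints only the matching line. If you want the whole entry (date, question, and answer) back, something like the following sketch could work. It assumes every entry starts with a YYYY-MM-DD line, as in the examples in this post; the function names and the sample text are only for illustration.

```python
import re

# Assumption: every entry begins with a line that is exactly a YYYY-MM-DD date.
DATE_LINE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def parse_entries(text):
    """Split a problem log into entries, one string per dated entry."""
    entries, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if DATE_LINE.match(line):
            current = [line]
            entries.append(current)
        elif current is not None and line:
            current.append(line)
    return ["\n".join(entry) for entry in entries]

def search(text, keyword):
    """Return every entry containing keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [entry for entry in parse_entries(text) if keyword in entry.lower()]

SAMPLE = """\
2018-10-21
Q: How do I connect to the small yellow router?

2022-08-02
Q: Why isn't the printer working? [ SOLVED ]
A: sudo cupsenable HL-2270DW
"""

for entry in search(SAMPLE, "printer"):
    print(entry)
    print()
```

To search the real log, read problem-log.txt into a string and pass it to search() in place of SAMPLE.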

This was an interesting project to look at 5 years later. I didn’t see benefits until 1-2 years later. It does not help me think through a problem. It’s hard to remember to do. But, over time it’s built up and become invaluable to me. I hit a tricky problem, and I can’t immediately find an answer on the web. I find out it’s in problem-log.txt. And, someone’s written it exactly with my hardware (and sometimes even my folder names) correctly in there. Cool!

Here’s another example:

2018-10-21
Q: How do I connect to the small yellow router?

Not every problem gets solved. Oh well.

Scan Organizer

I scan each and every piece of paper that passes through my hands. All my old to-do lists, bills people send me in the mail, the manual for my microwave, everything. I have a lot of scans.

scan-organizer is a tool I wrote to help me neatly organize and label everything, and make it searchable. It’s designed for going through a huge backlog by hand over the course of weeks, and then dumping a new set of raw scans in whenever afterwards. I have a specific processing pipeline discussed below. However, if you have even a little programming skill, I’ve designed this to be modified to suit your own workflow.

Input and output

The input is some raw scans. They could be handwritten notes, printed computer documents, photos, or whatever.

A movie ticket stub

The final product is that for each file like ticket.jpg, we end up with ticket.txt. This has metadata about the file (tags, category, notes) and a transcription of any text in the image, to make it searchable with grep & co.

---
category: movie tickets
filename: seven psychopaths ticket.jpg
tags:
- cleaned
- categorized
- named
- hand_transcribe
- transcribed
- verified
---
Rialto Cinemas Elmwood
SEVEN PSYCHOPAT
R
Sun Oct 28 1
7:15 PM
Adult $10.50
00504-3102812185308

Rialto Cinemas Gift Cards
Perfect For Movie Lovers!
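If you want to do more than grep, the .txt files are simple enough to read from a script. Here is a minimal sketch that splits one into metadata and transcription. It handles only the flat "key: value" and "- item" lines shown in the example above, not full YAML, and parse_scan_txt is a name invented for this illustration, not part of scan-organizer.

```python
def parse_scan_txt(text):
    """Split a '---'-delimited file into (metadata dict, transcription).

    Assumption: metadata is flat 'key: value' lines plus '- item' lists,
    as in the example above. Raises ValueError if the closing '---' is missing.
    """
    lines = text.splitlines()
    stripped = [line.strip() for line in lines]
    if not stripped or stripped[0] != "---":
        return {}, text  # no metadata block at all
    end = stripped.index("---", 1)  # position of the closing delimiter
    meta, key = {}, None
    for line in stripped[1:end]:
        if line.startswith("- ") and isinstance(meta.get(key), list):
            meta[key].append(line[2:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            meta[key] = value if value else []  # an empty value starts a list
    return meta, "\n".join(lines[end + 1:])

SAMPLE = """\
---
category: movie tickets
filename: seven psychopaths ticket.jpg
tags:
- cleaned
- hand_transcribe
---
Rialto Cinemas Elmwood
Adult $10.50
"""

meta, body = parse_scan_txt(SAMPLE)
print(meta["category"], meta["tags"])
print(body)
```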

Here are some screenshots of the process. Apologies if they’re a little big! I just took actual screenshots.

At any point I can exit the program, and all progress is saved. I have 6000 photos in the backlog–this isn’t going to be a one-session thing for me! Also, everything has keyboard shortcuts, which I prefer.

Phase 1: Rotating and Cropping

First, I clean up the images. Crop them, rotate them if they’re not facing the right way. I can rotate images with keyboard shortcuts, although there are also buttons at the bottom. Once I’m done, I press a button, and scan-organizer advances to the next un-cleaned photo.

Phase 2: Sorting into folders

Next, I sort things into folders, or “categories”. As I browse folders, I can preview what’s already in that folder.

Phase 3: Renaming Images

Renaming images comes next. For convenience, I can browse existing images in the folder, to help name everything in a standard way.

Phase 4: Tagging images

I tag my images with the type of text. They might be handwritten. Or they might be printed computer documents. You can imagine extending the process with other types of tagging for your use case.

Not yet done: OCR

Printed documents are run through OCR. This isn’t actually done yet, but it will be easy to plug in. I will probably use tesseract.

Phase 5: Transcribing by hand

I write up all my handwritten documents. I have not found any useful handwriting recognition software. I just do it all by hand.

The point of scan-organizer is to filter based on tags. So only images I’ve marked as needing hand transcription are shown in this phase.
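The tag filtering idea can be sketched in a few lines. This is not scan-organizer’s actual code, just an illustration of the approach: a phase shows an image when it carries all the tags the phase requires and does not yet carry the tag that marks the phase as done. The tag names are taken from the example file above; the file names and the function name are hypothetical.

```python
def needs_phase(tags, requires, done):
    """True if an image has every required tag but not the phase's 'done' tag."""
    tags = set(tags)
    return set(requires) <= tags and done not in tags

# Hypothetical backlog: file name -> tags applied so far
scans = {
    "ticket.jpg": ["cleaned", "categorized", "named", "hand_transcribe"],
    "bill.jpg": ["cleaned", "categorized", "named"],
    "notes.jpg": ["cleaned", "categorized", "named", "hand_transcribe", "transcribed"],
}

# Hand-transcription phase: marked for hand transcription, not yet transcribed
todo = sorted(
    name for name, tags in scans.items()
    if needs_phase(tags, requires=["hand_transcribe"], done="transcribed")
)
print(todo)  # ['ticket.jpg']
```

Adding a phase of your own then comes down to picking a new pair of "requires" and "done" tags.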

Phase 6: Verification

At the end of the whole process, I verify that each image looks good, and is correctly tagged and transcribed.

One Screenshot Per Minute

One of my archiving and backup contingencies is taking one screenshot per minute. You can also use this to get a good idea of how you spend your day, by turning it into a movie. Although with a tiling window manager like I use, it’s a headache to watch.

I send the screenshots over to another machine for storage, so they’re not cluttering my laptop. It uses up 10-20GB per year.

I’ll go over my exact setup below in case anyone is interested in doing the same:

/bin/screenlog

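GPG_KEY=Zachary
TEMPLATE=/var/screenlog/%Y-%m-%d/%Y-%m-%d.%H:%M:%S.jpg
export DISPLAY=:0
export XAUTHORITY=/tmp/XAuthority

# Expand the date template into the path for this screenshot
IMG=$(\date "+$TEMPLATE")
mkdir -p "$(dirname "$IMG")"
scrot "$IMG"
# Encrypt to "$IMG.gpg", then securely delete the unencrypted original
gpg --encrypt -r "$GPG_KEY" "$IMG"
shred -zu "$IMG"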

The script

  • Prints everything to stderr if you run it manually
  • Makes a per-day directory. We store everything in /var/screenlog/2022-07-10/ for the day
  • Takes a screenshot. By default, crontab doesn’t have X Windows (graphics) access. To allow it, the XAuthority file which allows access needs to be somewhere my crontab can reliably access. I picked /tmp/XAuthority. It doesn’t need any unusual permissions, but the default location has some random characters in it.
  • GPG-encrypts the screenshot with a public key and deletes the original. This is extra protection in case my backups somehow get shared, so I don’t literally leak all my habits, passwords, etc. I just use my standard key so I don’t lose it. It’s public-key crypto, so put the public key on your laptop. Put the private key on neither, one, or both, depending on which you want to be able to read the photos.
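To see how the TEMPLATE line turns into a per-day directory and a timestamped file name, here is the same expansion sketched in Python. Python’s strftime understands the same % codes that date does. The function name screenshot_path and the example timestamp are made up for illustration.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Same template as in /bin/screenlog
TEMPLATE = "/var/screenlog/%Y-%m-%d/%Y-%m-%d.%H:%M:%S.jpg"

def screenshot_path(when: datetime) -> PurePosixPath:
    """Expand the template for a given moment, like `date +$TEMPLATE` does."""
    return PurePosixPath(when.strftime(TEMPLATE))

example = screenshot_path(datetime(2022, 7, 10, 14, 5, 0))
print(example.parent)  # /var/screenlog/2022-07-10
print(example.name)    # 2022-07-10.14:05:00.jpg
```

Because the date is repeated in the file name, each screenshot stays identifiable even if it gets moved out of its per-day folder.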

/etc/cron.d/screenlog

* * * * * zachary  /bin/screenlog
20  * * * * zachary  rsync --remove-source-files -r /var/screenlog/ backup-machine:/data/screenlog/laptop
30  * * * * zachary  rmdir /var/screenlog/*

That’s

  • Take a screenshot once every minute. Change the first * to */5 for every 5 minutes, and so on.
  • Copy over the gpg-encrypted screenshots hourly, deleting the local copy
  • Also hourly, delete empty per-day folders after the contents are copied, so they don’t clutter things

~/.profile

export XAUTHORITY=/tmp/XAuthority

I mentioned /bin/screenlog needs to know where XAuthority is. In Arch Linux this is all I need to do.

youtube-autodl

I just wrote the first pass at youtube-autodl, a tool for automatically downloading youtube videos. It’s inspired by Popcorn Time, a similar program I never ended up using, for automatically pirating the latest video from a TV series coming out.

You explain what you want to download, where you want to download it to, and how to name videos. youtube-autodl takes care of the rest, including de-duplication and downloading things only once.

The easiest way to understand it is to take a look at the example config file, which is my actual config file.

Personally, I find youtube is pushing “watch this related” video and main-page feeds more and more, to the point where they actually succeed with me. I don’t want to accidentally waste time, so I wanted a way to avoid visiting youtube.com. This is my solution.

qr-backup

qr-backup is a program to back up digital documents to physical paper. Restore is done with a webcam, video camera, or scanner. Someday smart phone cameras will work.

I’ve been making some progress on qr-backup v1.1. So far I’ve added:

  • --restore, which does a one-step restore for you, instead of needing a bash one-line restore process
  • --encrypt provides password-based encryption
  • An automatic restore check that checks the generated PDF. This is mostly useful for me while maintaining qr-backup, but it also provides peace-of-mind to users.
  • --instructions to give more fine-tuned control over printing instructions. There’s a “plain english” explanation of how qr-backup works that you can attach to the backup.
  • --note for adding an arbitrary message to every sheet
  • Base-64 encoding is now per-QR code, so each QR is self-contained.
  • Codes are labeled N01/50 instead of 01/50, to support more code types in the future.
  • Code cleanup of QR generation process.
  • Several bugfixes.

v1.1 will be released when I make qr-backup feature complete:

  • Erasure coding, so you only need 70% of the QRs to do a restore.
  • Improve webcam restore slightly.

v1.2 will focus on adding a GUI and support for Windows, Mac, and Android. Switching off zbar is a requirement to allow multi-platform support, and will likely improve storage density.

Testing scrapers faster

Recently I wrote a scraper. First, I downloaded all the HTML files. Next, I wanted to parse the content. However, real world data is pretty messy. I would run the scraper, and it would get partway through the file and fail. Then I would improve it, and it would get further and fail. I’d improve it more, and it would finish the whole file, but fail on the fifth one. Then I’d re-run things, and it would fail on file #52, #1035, and #553,956.

To make testing faster, I added a scaffold. Whenever my parser hit an error, it would print the filename (for me, the tester) and record the filename to an error log. Then, it would immediately exit. When I re-ran the parser, it would test all the files where it had hit a problem first. That way, I didn’t have to wait 20 minutes until it got to the failure case.

import json
import os

import tqdm  # third-party progress bar (pip install tqdm)

# parse_file(f) is the parser being tested, defined elsewhere.
# It takes an open file and returns a dict.

if __name__ == "__main__":
    if os.path.exists("failures.log"):
        # Quicker failures: re-check files that failed on an earlier run first
        with open("failures.log", "r") as f:
            failures = set(x.strip() for x in f)
        for path in tqdm.tqdm(failures, desc="re-checking known tricky files"):
            try:
                with open(path) as infile:
                    parse_file(infile)
            except Exception:
                print(path, "failed again (already failed once)")
                raise

    paths = []
    for root, dirs, files in os.walk("html"):
        for name in sorted(files):
            paths.append(os.path.join(root, name))
    paths.sort()

    with open("output.json", "w") as out:
        # tqdm is just a progress bar. you can also use 'for path in paths:'
        for path in tqdm.tqdm(paths, desc="parse files"):
            with open(path, "r") as infile:
                try:
                    result = parse_file(infile)
                except Exception:
                    print(path, "failed, adding to quick-fail test list")
                    with open("failures.log", "a") as fatal:
                        print(path, file=fatal)
                    raise
                json.dump(result, out, sort_keys=True)  # my desired output is one JSON dict per line
                out.write("\n")