Paper archival

Previous work:

I wanted (for fun) to see if I could get data stored in paper formats. I’d read the previous work, and people put a lot of thought into density, but not much into ease of retrieval. First off, acid-free paper lasts 500 years or so, which is plenty given the environmental stresses (moisture, etc.) I expect any paper of mine to face.

Optar gets a density of 200kB / A4 page. By default, it requires a 600dpi printer and a 600+dpi scanner. It uses extended Golay codes for redundancy (each 24-bit codeword carries 12 data bits and can correct up to 3 bit errors), and spreads the bits out across the page in an okay fashion.

PaperBack gets a (theoretical) density of 500kB / A4 page. It needs a 600dpi printer and a ~900dpi scanner. It has configurable redundancy using Reed-Solomon codes. It looks completely unusable in practice (alignment issues), aside from being Windows-only.

Okay, so I think these are all stupid, because you need custom software to decode them, and in any situation where you’re decoding data stored on paper, you probably don’t have that software. I want to use standard barcodes, even if they’re lower density. Let’s look at our options. I’m going to skip linear barcodes (low density) and color barcodes (color printing is expensive). Since we need space between symbols, we want to pick the biggest version of each code we can. For one thing, the whitespace around codes matters more for layout efficiency than the codes’ own density does, and larger symbols are usually denser anyway. For another, if we’re scanning symbols one at a time, we want to scan as few as possible.

Aztec: from 15×15 to 151×151 square pixels. 1,914 bytes maximum. Configurable Reed-Solomon error correction.

Density: 11.9 pixels per byte

Data Matrix: from 10×10 to 144×144 square pixels. 1,555 bytes maximum. Large, non-configurable error correction.

Density: 13.3 pixels per byte

QR Code: from 21×21 to 177×177 square pixels. 2,953 bytes maximum. Somewhat configurable Reed-Solomon error correction (four levels: L, M, Q and H).

Density: 10.6 pixels per byte

PDF417: 17 high by 90-583 wide. 1,100 bytes maximum. Configurable Reed-Solomon error correction. PDF417 is a stacked linear barcode, so it can be read by much simpler scanners instead of cameras. It also has built-in cross-symbol linking (MacroPDF417), meaning you can scan a sequence of codes before getting output, which is handy for getting software to automatically link all the codes on a page.

Density: 9.01 pixels per byte
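Each density figure above is the symbol’s area in pixels divided by its capacity in bytes (for QR: 177 × 177 / 2,953 ≈ 10.6). A small sketch that reproduces the figures for the three square codes; `density` is a hypothetical helper name, and awk does the division because shell arithmetic is integer-only. PDF417 is left out since it isn’t square.

```shell
# density SIDE BYTES  (hypothetical helper)
# Pixels per byte for a square SIDE x SIDE symbol holding at most BYTES bytes,
# rounded to one decimal place.
density() {
  awk -v side="$1" -v bytes="$2" 'BEGIN { printf "%.1f\n", side * side / bytes }'
}

density 151 1914   # Aztec       → 11.9
density 144 1555   # Data Matrix → 13.3
density 177 2953   # QR Code     → 10.6
```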

QR codes and PDF417 look like our contenders. PDF417 turns out not to scan well (poorly in general, and especially at large symbol sizes), so despite its nice features, let’s pick QR codes. Back when I worked on a digital library, I made a component to generate QR codes on the fly, and from that I already know how to scan them with my phone and webcam, so they’d be pretty easy to use.

What density can we get on a sheet of A4 paper (8.27 in × 11.69 in, or 7.27 in × 10.69 in with half-inch margins)? I trust optar’s estimate (600 dpi = 200 pixels per inch) for printed/scanned pages, since they seem to have tested it. A max-size QR code is 177×177 pixels, or 0.885 × 0.885 inches at maximum density. We can fit 8 × 12 = 96 max-size QR codes on the page, fewer if we want decent spacing. That’s 96 QR codes × 2,953 bytes per QR code = 283,488 bytes ≈ 283 KB per page, at the lowest (L) error-correction level.
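The page arithmetic as a sketch: divide the usable width and height (in printed pixels, at optar’s 200 per inch) by the symbol’s side, round down, and multiply. `page_capacity` is a hypothetical helper; it places symbols edge to edge and ignores the quiet zone QR codes need around them, so treat its result as an upper bound. The example call uses A4’s actual size (210 × 297 mm, about 8.27 × 11.69 in) and the largest QR symbol, 177 × 177.

```shell
# page_capacity WIDTH_IN HEIGHT_IN MARGIN_IN PX_PER_IN SIDE BYTES  (hypothetical helper)
# How many SIDE x SIDE symbols fit edge to edge inside the margins, and how
# many bytes they hold in total. Quiet zones and spacing are ignored.
page_capacity() {
  awk -v w="$1" -v h="$2" -v m="$3" -v ppi="$4" -v side="$5" -v bytes="$6" 'BEGIN {
    cols = int((w - 2 * m) * ppi / side)
    rows = int((h - 2 * m) * ppi / side)
    printf "%d x %d = %d symbols, %d bytes\n", cols, rows, cols * rows, cols * rows * bytes
  }'
}

# A4, half-inch margins, 200 printed pixels per inch, version 40 QR codes
page_capacity 8.27 11.69 0.5 200 177 2953   # → 8 x 12 = 96 symbols, 283488 bytes
```

Raising the margin or adding a few pixels of spacing to SIDE shows quickly how much a comfortable layout costs.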

That’s totally comparable to the other approaches above, and you can read the results with off-the-shelf software.  Bam.


Backup android on plugin

In a previous post I discussed how to back up Android with rsync. In this post, I’ll improve on that solution so the backup runs automatically when you plug the phone in, instead of by hand. My solution assumes I have only one phone; adjust accordingly.

The process is:

  1. Plug the phone in
  2. Unlock the screen (you’ll see a prompt to do this).
  3. Backup starts automatically
  4. Wait for the backup to finish before unplugging

First, let’s add a udev rule that runs the appropriate scripts when the phone is plugged in or unplugged; the plug-in script then mounts the phone once it’s unlocked.

# 10-android.rules
ACTION=="add", SUBSYSTEM=="usb", ATTR{idVendor}=="18d1", ATTR{idProduct}=="4ee2", MODE="0660", GROUP="plugdev", SYMLINK+="android", RUN+="/usr/local/bin/android-connected"
ACTION=="remove", SUBSYSTEM=="usb", ENV{ID_MODEL}=="Nexus_4", RUN+="/usr/local/bin/android-disconnected"

Next, we’ll add android-connected and android-disconnected:

#!/bin/bash
# /usr/local/bin/android-connected
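# udev only allows short-running RUN+= programs and kills anything still
# running once the event is handled, so hand the real work to at(1) and
# return immediately; -f marks the detached second run.
if [[ "$1" != "-f" ]]
then
 echo "/usr/local/bin/android-connected -f" | /usr/bin/at now
 exit 0
fi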

sudo -u zachary DISPLAY=:0 /usr/bin/notify-send "Android plugged in, please unlock."
sudo -u zachary /usr/local/bin/android-mountfs
sudo -u zachary DISPLAY=:0 /usr/bin/notify-send "Mounted, backing up..."
/usr/bin/flock /var/lock/phone-backup.pid sudo -u zachary /usr/local/bin/phone-backup-xenu
sudo -u zachary DISPLAY=:0 /usr/bin/notify-send "Backup completed."

#!/bin/sh
# /usr/local/bin/android-disconnected
sudo -u zachary DISPLAY=:0 /usr/bin/notify-send "Android unplugged."
sudo -u zachary /usr/local/bin/android-umountfs

We’ll also need scripts to mount and unmount the phone. Mounting only works when the screen is unlocked, so we’ll put the mount in a loop that checks whether it worked:

#!/bin/sh
# /usr/local/bin/android-mountfs

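android_locked()
{
# ls exits with status 2 when it can't read the mount point,
# which is what a still-locked phone looks like over MTP
ls /media/android 2>/dev/null >/dev/null
[ "$?" -eq 2 ]
}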

jmtpfs /media/android # mount
while android_locked; do
  fusermount -u /media/android
  sleep 3
  jmtpfs /media/android # mount
done

#!/bin/sh
# /usr/local/bin/android-umountfs
fusermount -u /media/android
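The mount loop above spins forever if the phone is never unlocked. A bounded variant, as a sketch: `retry` is a hypothetical helper, not part of the original scripts, and it is plain sh (plus `local`, which dash and bash both support) so it can live in android-mountfs.

```shell
# retry TRIES DELAY COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND until it succeeds, at most TRIES times, sleeping DELAY
# seconds between attempts. Returns 0 on success, 1 if every attempt failed.
retry() {
  local tries="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$tries" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}
```

In android-mountfs, the `while android_locked` loop could then become a call such as `retry 40 3 try_mount`, where `try_mount` (also hypothetical) runs jmtpfs, succeeds if `android_locked` is false, and otherwise unmounts and fails. Forty tries three seconds apart gives up after about two minutes; both numbers are arbitrary.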

The contents of /usr/local/bin/phone-backup are pretty me-specific, so I’ll omit them, but the script copies /media/android over to a server. (Fun detail: MTP doesn’t expose all information, even on a rooted phone, so there’s more work to do.)


XP Boot USB Stick

Most of the following is taken from http://www.msfn.org/board/topic/151992-install-xp-from-usb-without-extra-tools/, just modified to include syslinux support.

Let me know if there are any omissions; my XP installer bluescreens on boot, so I can’t actually test this.

  1. Obtain an XP iso file
  2. Format the drive with one FAT partition, marked bootable.
  3. syslinux -i /dev/sdXX
    
  4. dd bs=440 count=1 conv=notrunc if=/usr/lib/syslinux/bios/mbr.bin of=/dev/sdX
    
  5. mount /dev/sdXX /mnt
    
  6. mkdir /tmp/xp_iso
    mount -o loop xp.iso /tmp/xp_iso
    cp -ar /tmp/xp_iso/* /mnt
    umount /tmp/xp_iso
    rmdir /tmp/xp_iso
    
  7. cp /usr/lib/syslinux/bios/{chain.c32,libutil.c32,menu.c32,libcom32.c32} /mnt
    
  8. cp /mnt/I386/{NTDETECT.COM,SETUPLDR.BIN,TXTSETUP.SIF} /mnt
    
  9. Edit /mnt/syslinux.cfg:

    UI menu.c32
    # Windows XP
    LABEL windows_xp
    MENU LABEL Run Windows ^XP Setup
    COM32 chain.c32
    APPEND fs ntldr=SETUPLDR.BIN
    
  10. umount /mnt
    
  11. Boot from the USB stick
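Steps 3 through 10 can be collected into one shell function. This is a rough sketch and, like the steps themselves, untested against real hardware: `make_xp_usb` and `xp_run` are made-up names, it writes mbr.bin with dd, and the syslinux module directory is an assumption you can override with `SYSLINUX_DIR`. By default it only prints the commands, since they overwrite the target disk; set `RUN=1` to execute them as root.

```shell
# Print a command, or run it when RUN=1 (hypothetical helper).
xp_run() {
  if [ "${RUN:-0}" = 1 ]; then
    "$@"
  else
    echo "$*"
  fi
}

# make_xp_usb DISK PART ISO  (hypothetical name)
#   DISK: the whole USB stick, e.g. /dev/sdX
#   PART: its FAT partition, e.g. /dev/sdX1
#   ISO:  the Windows XP installer image
# The body runs in a subshell with `set -e`, so a real run stops at the
# first failing step.
make_xp_usb() (
  set -e
  disk="$1" part="$2" iso="$3"
  lib="${SYSLINUX_DIR:-/usr/lib/syslinux/bios}"   # assumed module path

  xp_run syslinux -i "$part"
  xp_run dd bs=440 count=1 conv=notrunc if="$lib/mbr.bin" of="$disk"
  xp_run mount "$part" /mnt
  xp_run mkdir -p /tmp/xp_iso
  xp_run mount -o loop,ro "$iso" /tmp/xp_iso
  xp_run cp -a /tmp/xp_iso/. /mnt
  xp_run umount /tmp/xp_iso
  xp_run rmdir /tmp/xp_iso
  xp_run cp "$lib/chain.c32" "$lib/libutil.c32" "$lib/menu.c32" "$lib/libcom32.c32" /mnt
  xp_run cp /mnt/I386/NTDETECT.COM /mnt/I386/SETUPLDR.BIN /mnt/I386/TXTSETUP.SIF /mnt

  cfg='UI menu.c32
# Windows XP
LABEL windows_xp
MENU LABEL Run Windows ^XP Setup
COM32 chain.c32
APPEND fs ntldr=SETUPLDR.BIN'
  if [ "${RUN:-0}" = 1 ]; then
    printf '%s\n' "$cfg" > /mnt/syslinux.cfg
  else
    echo "write /mnt/syslinux.cfg:"
    printf '%s\n' "$cfg"
  fi

  xp_run umount /mnt
)
```

A dry run such as `make_xp_usb /dev/sdX /dev/sdX1 xp.iso` prints the plan for review before anything touches the stick.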


Roll-your-own git push-to-deploy, and markdown support

Today I added support for development of za3k.com using git:

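#!/bin/sh
# /git/bare-repos/za3k.com/hooks/post-update
cd ~za3k/public_html || exit 1
# Git exports GIT_DIR (and friends) to hooks; env -i clears them so the
# pull runs against public_html's own repository, not the bare one.
env -i git pull
echo "Deployed za3k.com"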

and markdown support, via a CGI markdown wrapper someone wrote for Apache (yes, I’m still using Apache).

Edit: I ended up wanting support for tables in markdown, so I switched to Ruby’s redcarpet markdown gem (the same thing GitHub uses; it supports this style of tables as well as code blocks).

CGI support via http://blog.tonns.org/2013/10/enabling-markdown-on-your-apache.html


Screen and Tmux IDEs

I don’t usually like IDEs. They’re hard to switch away from, and they do too much. They don’t let me customize things, and I always have to use external tools anyway. I’d much rather use a bunch of small tools, the Linux way. The problem is that if I close everything, I’ll have trouble getting started again. Saving state is one solution; quick start-up is another. Basically, write a checklist for myself that makes starting up easy (open such-and-such files in the editor, start the server in debug mode, etc.).

But we’re programmers, so obviously we’re not going to use a literal checklist. Instead, we’re going to write a little script to auto-start things in a new screen session:

#!/usr/bin/screen -c
# game_development.screen.conf
# Run stand-alone or with screen -c game_development.screen.conf
screen -t "Vim" 2 bash -c "vim -p *.t"
bind "r" screen -t "Game" 2 bash run.sh

Or if you prefer tmux:

# game_development.tmux.conf
# Run with tmux -f game_development.tmux.conf attach
new-session -s game_development
new-window -n "Vim" "bash -c 'vim -p *.t'"
bind r new-window -n "Game" "bash run.sh"

Note the main features being used: a shebang-line hack for screen, which lets this file be self-contained and executable; opening files in vim rather than an IDE’s built-in editor; and key bindings for unit tests, running the program, restarting the server, etc. A similar approach is to add new key bindings to the text editor, but I feel like text editors should edit text, and I like being able to document all the additions in help menus (which screen and tmux both support).

Note: ratpoison works much like screen/tmux, so you can do similar things in X.

One thing I’d love is for this kind of file to be easy to dump from the current state, especially for things like window positions. A little assistance is available, but not much: ratpoison and tmux let you dump sizing information, but nothing writes keybindings or a list of running programs with their windows back out as a config file like the ones above.

There is a program called tmuxinator that lets you write the same config as nested YAML of sessions, windows, and panes, which might appeal to some users.

Also, check out dtach if you don’t need panes and windows, and just want a detachable process.


KISS vs DRY

The best practice or goal emphasized above with respect to templates and views is KISS and DRY. As long as the implementation does not become overly complex and difficult to grok, keep the template code DRY, otherwise KISS principle overrides the need to have template code that does not repeat itself.

Vertebrae Framework

A nice illustration of conflicting positive principles and resolution.


moreorcs.com


My newest site: http://moreorcs.com/

The site generates orc-themed email addresses for you, which you can actually receive email at (completely insecurely: incoming mail just shows up on a public mailinator web page). Please check out mailinator’s site; it’s a really neat project.

Some samples:

  • the last small poop orc (thelastsmallpooporc@moreorcs.com)
  • poop gross green blood thirsty orc
  • 49 cross-eyed slightly intimidating poop dumb orcs
  • the last slightly intimidating orc
  • quite a lot of slightly intimidating small orcs
  • 73 slightly intimidating small pretty orcs
  • smelly orc
  • a few orcs
  • lots and lots and lots and lots and lots and lots of orcs

Archiving gmail

I set up an automatic archiver for Gmail using the special-purpose tool gmvault. It was fairly straightforward, so there’s no tutorial here. The daily sync:

@daily cd ~gmail && cronic gmvault sync -d "/home/gmail/vanceza@gmail.com" vanceza@gmail.com

I’m specifying a backup folder here (-d) so I can easily support multiple accounts, one per line.

Cronic is a tool designed to make cron’s default email behavior better, so I get emailed only on actual backup failures.


Archiving twitter

(Output)

I wanted to archive Twitter so that I could:

  1. Make sure old content was easily available
  2. Read tweets one per line, without ever logging into the site

twitter_ebooks is a framework for making Twitter bots, but it includes an ‘archive’ component to fetch an account’s historical tweets, which is apparently unique in that it 1) works with current TLS and 2) works with the current Twitter API. It stores the tweets as JSON, presumably matching the API’s return values. Usage is simple:


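# Reads account names from stdin, one per line
mkdir -p archive
while read -r account
do
    ebooks archive "${account}" "archive/${account}.json"
    # One tweet per line: shell-quoted timestamp, a tab, shell-quoted text
    jq -r 'reverse | .[] | "\(.created_at|@sh)\t \(.text|@sh)"' "archive/${account}.json" >"archive/${account}.txt"
done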

I ran into a bug caused by upstream incompatibilities, which is easily fixed. Another caveat is that the Twitter API only gives access to the most recent 3,200 tweets of an account, all the more reason to set up archiving ASAP. Twitter’s rate limiting is also extreme (15-180 requests per 15 minutes), and I’m worried that my naive script can’t make it through a list of more than 15 accounts, even with no updates.
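If the rate limit does become the bottleneck, one option is to pace the loop. A minimal sketch, assuming each `ebooks archive` call costs roughly one API request (an assumption, not something measured): at 15 requests per 15 minutes, a 60-second gap between accounts should stay under the limit. `archive_paced` is a hypothetical name, and the jq step is left out to keep the focus on pacing.

```shell
# archive_paced [DELAY]  (hypothetical helper)
# Reads account names from stdin, one per line, and archives each with
# `ebooks archive` as in the loop above, sleeping DELAY seconds (default 60)
# between accounts. Assumes roughly one API request per account.
archive_paced() {
  delay="${1:-60}"
  first=1
  mkdir -p archive
  while IFS= read -r account; do
    [ -n "$account" ] || continue   # skip blank lines
    if [ "$first" -eq 0 ]; then
      sleep "$delay"
    fi
    first=0
    ebooks archive "$account" "archive/${account}.json"
  done
}
```

Usage would look like `archive_paced 60 < accounts.txt`, where accounts.txt is whatever list of accounts you already feed the loop.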
