Archiving all bash commands typed

This one’s a quickie: just a snippet of my config to record all bash commands to a file (.bash_eternal_history) forever. The default bash HISTFILESIZE is 500. Setting it to a non-numeric value makes the history file grow forever (although not your in-memory history size, which is controlled by HISTSIZE).
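
In ~/.bashrc that looks something like this (a sketch; the exact values are placeholders):

    # Keep history in a separate file that is never truncated
    export HISTFILE=~/.bash_eternal_history
    export HISTFILESIZE=forever   # non-numeric: the file is never truncated
    export HISTSIZE=10000         # in-memory history stays bounded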

I do this in addition:
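
(A sketch of one way to do it: write each command to the history file as soon as it runs, rather than only at shell exit.)

    # Append to the history file after every command, so concurrent shells
    # and crashes don't lose anything
    shopt -s histappend
    PROMPT_COMMAND='history -a'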

Archiving all web traffic

Today I’m going to walk through how to archive all web (HTTP/S) traffic passing over your Linux desktop. The basic approach is to install a proxy that records traffic to WARC files. Since it’s an HTTP proxy, non-HTTP traffic (chat, email, and so on) won’t be captured.

The end result is pretty slow, for reasons I’m not totally sure of yet. It’s possible warcprox isn’t streaming responses.

  1. Install the server
  2. Make a warcprox user to run the proxy as.
  3. Make a root certificate. You’re going to intercept HTTPS traffic by pretending to be the website, so if anyone gets ahold of this, they can fake being every website to you. Don’t give it out.
  4. Set up a directory where you’re going to store the WARC files. You’re saving all web traffic, so this will get pretty big.
  5. Set up a boot script for warcprox; I’m using supervisord rather than systemd (see the sketch after this list).
  6. Set up any browsers, etc. to use localhost:18000 as your proxy (examples after this list). You could also do some kind of global firewall config. Chromium in particular was pretty irritating on Arch Linux: it doesn’t respect $http_proxy, so you have to pass it a separate flag. This is also a good point to make sure anything you don’t want recorded BYPASSES the proxy (for example, large things like youtube).
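
Concretely, the pieces look something like this. It’s a sketch: the paths, user name, and port are my choices, and warcprox’s flags shift a little between versions, so check warcprox --help.

    # Steps 1-4: install, service user, and storage. warcprox will generate
    # the CA certificate at the -c path on first run if it doesn't exist.
    pip install warcprox
    useradd -r -m -d /var/lib/warcprox warcprox
    mkdir -p /var/lib/warcprox/warcs
    chown -R warcprox /var/lib/warcprox

The supervisor program definition (step 5) is plain supervisord ini, e.g. /etc/supervisor.d/warcprox.ini:

    [program:warcprox]
    command=/usr/bin/warcprox -p 18000 -c /var/lib/warcprox/ca.pem -d /var/lib/warcprox/warcs -z
    directory=/var/lib/warcprox
    user=warcprox
    autostart=true
    autorestart=true

And for step 6, most tools respect the proxy environment variables, while Chromium needs its own flag:

    export http_proxy=http://localhost:18000
    export https_proxy=http://localhost:18000
    chromium --proxy-server=http://localhost:18000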

Archiving Twitch

Install jq and youtube-dl

Get a list of the last 100 URLs:
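
(A sketch against the old Kraken API; the channel name, Client-ID, and response fields are assumptions from memory, and the newer Helix API needs different auth.)

    # Pull the channel's 100 most recent VODs and keep just their URLs
    curl -s -H 'Client-ID: YOUR_CLIENT_ID' \
        'https://api.twitch.tv/kraken/channels/CHANNEL_NAME/videos?limit=100' \
        | jq -r '.videos[].url' > twitch-urls.txt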

Save them locally:
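
(Sketch; the output filename template is my preference, not part of the original.)

    # -a reads one URL per line; already-downloaded files are skipped
    youtube-dl -o '%(upload_date)s-%(title)s-%(id)s.%(ext)s' -a twitch-urls.txt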

That’s it. youtube-dl is smart enough to avoid re-downloading videos it already has, so as long as you run this often enough (I do it daily), you should avoid losing videos before they’re deleted.

Thanks jrayhawk for the API info.

Paper archival

Previous work:

I wanted (for fun) to see if I could get data stored in paper formats. I’d read the previous work; people put a lot of thought into density, but not a lot into ease of retrieval. First off, acid-free paper lasts 500 years or so, which is plenty long compared to the environmental stresses (moisture, etc.) I expect any paper of mine to face.

Optar gets a density of 200kB / A4 page. By default, it requires a 600dpi printer, and a 600+dpi scanner. It has 3-of-12 bit redundancy using Golay codes, and spaces out the bits in an okay fashion.

Paperback gets a (theoretical) density of 500kB / A4 page. It needs a 600dpi printer, and a ~900dpi scanner.  It has configurable redundancy using Reed-Solomon codes. It looks completely unusable in practice (alignment issues, aside from being Windows-only).

Okay, so I think these are all stupid, because they need custom software to decode, and in any scenario where you’re recovering data from paper you probably don’t have that software. I want to use standard barcodes, even if they’re lower density. Let’s look at our options. I’m going to skip linear barcodes (low density) and color barcodes (printing in color is expensive). Since we need space between symbols, we want the biggest version of each code we can get: the whitespace around codes dominates layout efficiency, larger symbols are usually denser, and if we’re scanning symbols one at a time we want as few of them as possible.

Aztec: from 15×15 to 151×151 square pixels. 1914 bytes maximum. Configurable Reed-Solomon error correction.

Density: 11.9 pixels per byte

Data Matrix: from 10×10 to 144×144 square pixels. 1555 bytes maximum. Large, non-configurable error correction.

Density: 13.3 pixels per byte

QR Code: from 21×21 to 177×177 square pixels. 2,953 bytes maximum. Somewhat configurable Reed-Solomon error correction.

Density: 10.6 pixels per byte

PDF417: 17 high by 90-583 wide. 1100 bytes maximum. Configurable Reed-Solomon error correction. PDF417 is a stacked linear barcode, and can be scanned by much simpler scanners instead of cameras. It also has built-in cross-symbol linking (MacroPDF417), meaning you can scan a sequence of codes before getting output, which is handy for getting software to automatically link all the codes on a page.

Density: 9.01 pixels per byte

QR codes and PDF417 look like our contenders. PDF417 turns out not to scan well (at any size, but especially at large symbol sizes), so despite some nice features, let’s pick QR codes. Back when I worked on a digital library I built a component to generate QR codes on the fly, and from that I already know how to scan them with my phone and webcam, so they’d be easy to use.

What density can we get on a sheet of A4 paper (8.27 in × 11.69 in, or about 7.27 in × 10.69 in with half-inch margins)? I trust optar’s estimate that 600 dpi printing and scanning reliably gives 200 data pixels per inch, since they seem to have tested it. A max-size QR code is 177×177 pixels, or about 0.89 in × 0.89 in at that density. We can fit 8 × 12 = 96 maximum-size QR codes on the page, fewer if we want decent spacing. That’s 96 QR codes × 2,953 bytes per code = 283,488 bytes, roughly 280 kB per page at the lowest error-correction level.

That’s totally comparable to the other approaches above, and you can read the results with off-the-shelf software.  Bam.
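
As a rough proof of concept with stock tools (a sketch: the chunk size assumes error-correction level L, the filenames are arbitrary, and zbar can be finicky about raw binary, so text-friendly data is safer):

    # Shard a file into QR-sized chunks and render one symbol per chunk
    split -b 2953 -d backup.tar chunk_
    for f in chunk_*; do
        qrencode -8 -l L -o "$f.png" < "$f"
    done

    # Recover the data later; --raw prints each symbol's payload
    # (zbarimg adds a newline per symbol, so binary data needs care)
    for f in chunk_*.png; do
        zbarimg --raw --quiet "$f"
    done > restored.tar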

Backup android on plugin

In a previous post I discussed how to back up android with rsync. In this post, I’ll improve on that solution so the backup happens when you plug the phone in, rather than manually. My solution assumes I have only one phone; adjust accordingly.

The process is:

  1. Plug the phone in
  2. Unlock the screen (you’ll see a prompt to do this).
  3. Backup starts automatically
  4. Wait for the backup to finish before unplugging

First, let’s add a udev rule to auto-mount the phone when it’s plugged in and unlocked, and run appropriate scripts.
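
The rule looks something like this (a sketch: the vendor/product IDs are placeholders for your phone’s, which lsusb will tell you, and since udev kills long-running RUN+= commands you may need to detach the connect script, e.g. with setsid, or hand off to a systemd unit):

    # /etc/udev/rules.d/90-android-backup.rules
    ACTION=="add", SUBSYSTEM=="usb", ATTR{idVendor}=="18d1", ATTR{idProduct}=="4ee1", RUN+="/usr/local/bin/android-connected"
    ACTION=="remove", SUBSYSTEM=="usb", ENV{ID_VENDOR_ID}=="18d1", ENV{ID_MODEL_ID}=="4ee1", RUN+="/usr/local/bin/android-disconnected"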

Next, we’ll add android-connected and android-disconnected scripts.

These mount and unmount the phone. Keeping in mind that mounting only works while the screen is unlocked, we’ll put the mount in a loop that retries until it succeeds.
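
A sketch of the connect script (the mountpoint, retry count, and sleep interval are my choices; phone-backup is covered below):

    #!/bin/bash
    # /usr/local/bin/android-connected: retry the MTP mount until the screen
    # is unlocked, then kick off the backup
    mountpoint=/media/android
    mkdir -p "$mountpoint"
    for attempt in $(seq 1 60); do
        if jmtpfs -o allow_other "$mountpoint"; then
            /usr/local/bin/phone-backup
            exit 0
        fi
        sleep 5
    done
    echo "android: never managed to mount (is the screen unlocked?)" >&2
    exit 1

And the disconnect script just cleans up:

    #!/bin/bash
    # /usr/local/bin/android-disconnected: drop the stale mount
    fusermount -u /media/android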

The contents of /usr/local/bin/phone-backup are pretty me-specific so I’ll omit them, but it copies /media/android over to a server. (Fun detail: MTP doesn’t expose everything even on a rooted phone, so there’s more work to do.)

Archiving twitter


I wanted to archive twitter so that I could

  1. Make sure old content was easily available
  2. Read twitter in a one-per-line format without ever logging into the site

twitter_ebooks is a framework for making twitter bots, but it includes an ‘archive’ component to fetch an account’s historical content, which is apparently unique in that it 1) works with current TLS and 2) works with the current twitter API. It stores the tweets in a JSON format which presumably matches the API return values. Usage is simple:
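
(Sketch; the account name and output path are placeholders.)

    # Fetch, or incrementally update, an account's tweet archive as JSON
    ebooks archive some_account corpus/some_account.json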

I ran into a bug with upstream incompatibilities, which was easily fixed. Another caveat is that the twitter API only allows access to the most recent 3200 tweets of an account, all the more reason to set up archiving ASAP. Twitter’s rate-limiting is also extreme (15-180 requests per 15 minutes), and I’m worried my naive script won’t make it through a list of more than 15 accounts, even when there’s nothing new to fetch.
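
For reference, the naive script is essentially just a loop over an accounts file, roughly like this (the file layout and sleep interval are my assumptions):

    # accounts.txt: one twitter username per line
    while read -r account; do
        ebooks archive "$account" "corpus/$account.json"
        sleep 60   # crude attempt to stay under the rate limit
    done < accounts.txt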

Android backup on arch linux

Edit: See here for an automatic version of the backup portion.

Connecting android to Windows and Mac is pretty easy. On arch linux? Major pain. Here’s what I did, mostly with the help of the arch wiki:

  1. Rooted my phone. Otherwise you can’t back up major parts of the file system (including text messages and most application data) [EDIT: Actually, you can’t back these up over MTP even once you root your phone. Oops.]
  2. Installed jmtpfs, a FUSE filesystem for mounting MTP, the new alternative to mount-as-storage on portable devices.
  3. Enabled ‘user_allow_other’ in /etc/fuse.conf. I’m not sure if I needed to, but I did.
  4. Plugged in the phone, and mounted the filesystem (commands sketched after this list):

    The biggest pitfall I had was that if the phone’s screen is not unlocked at this point, mysterious failures will pop up later.
  5. Synced the contents of the phone. For reasons I didn’t diagnose (I assume specific to FUSE), this actually fails as root, so run it as a regular user (again, see the sketch below).
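
A sketch of both commands (the mountpoint and destination are my choices; do this as a regular user with the phone’s screen unlocked):

    # Step 4: mount the phone over MTP
    mkdir -p ~/android
    jmtpfs -o allow_other ~/android

    # Step 5: sync everything to local disk; as root this failed for me,
    # as a normal user it works
    rsync -av ~/android/ ~/backups/android/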

Archiving gmail

I set up an automatic archiver for gmail, using the special-purpose tool Gmvault. It was fairly straightforward, so no tutorial here. The daily sync:
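
(Sketch; the database path and account are placeholders.)

    # crontab entry, one line per account; cronic keeps cron quiet unless it fails
    0 3 * * * cronic gmvault sync -d /home/backup/gmvault/me@example.com me@example.com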

I’m specifying a backup folder here (-d) so I can easily support multiple accounts, one per line.

Cronic is a tool designed to make cron’s default email behavior better, so I get emailed only on actual backup failures.

Archiving github

GitHub-Backup is a small project to archive github repos to a local computer. It advertises that one reason to use it is

You are paranoid tinfoil-hat wearer who needs to back up everything in triplicate on a variety of outdated tape media.

which describes perfectly why I went looking for it.

I made a new account on my server (github) and cloned their repo.

Despite being semi-unmaintained, everything mostly still works. There were two exceptions. First, there are some major design problems around private repos; I really only need to back up my public repos, so I ‘solved’ this by issuing an OAuth token that can only see public repos. Second, a small patch was needed to work around a bug with User objects in the underlying Github egg:

Then I just shoved everything into a cron task and we’re good to go.
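
The cron entry is nothing fancy (a sketch; the schedule and wrapper script name are mine, and the wrapper just runs GitHub-Backup with my username, the public-only OAuth token, and a target directory):

    # /etc/cron.d/github-backup: nightly run as the dedicated user,
    # with cronic emailing only on failure
    15 4 * * * github cronic /home/github/bin/backup-github-repos.sh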

Edit: There’s a similar project for bitbucket I haven’t tried out: https://bitbucket.org/fboender/bbcloner