Archiving twitter

(Output)

I wanted to archive twitter so that I could

  1. Make sure old content was easily available
  2. Read twitter in a one-per-line format without ever logging into the site

twitter_ebooks is a framework to make twitter bots, but it includes an ‘archive’ component to fetch historical account content, which is apparently unique in that it 1) works with current TLS and 2) works with the current twitter API. It stores the tweets in a JSON format which presumably matches the API return values. Usage is simple:

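while read -r account
do
    # Fetch whatever history the API will give for this account, as JSON
    ebooks archive "${account}" "archive/${account}.json"
    # Flatten to one tweet per line (in reversed order): quoted timestamp, tab, quoted text
    jq -r 'reverse | .[] | "\(.created_at|@sh)\t \(.text|@sh)"' "archive/${account}.json" >"archive/${account}.txt"
done <accounts.txt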

I ran into a bug with upstream incompatibilities which is easily fixed. Another caveat is that the twitter API only allows access to the most recent 3200 tweets for an account, which is all the more reason to set up archiving ASAP. Twitter’s rate-limiting is also extreme (15-180 req/15 min), and I’m worried that my naive script can’t make it through a list of more than 15 accounts even with no updates.
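
One way to ease the rate-limit worry would be to space the calls out instead of firing them back to back. This is a minimal sketch, not what the script above does: `delay_for_limit` and `archive_all` are made-up names, and it assumes each `ebooks archive` run costs a single request against the 15-minute window, which is a guess.

```shell
#!/bin/sh
# Seconds to wait between requests so that at most $1 requests
# land in one 15-minute (900 second) window. Rounds up.
delay_for_limit() {
    limit=$1
    echo $(( (900 + limit - 1) / limit ))
}

# Hypothetical paced version of the loop above: archive every account
# listed (one per line) in file $1, sleeping between accounts.
# $2 is the assumed request limit per window (default 15).
archive_all() {
    accounts_file=$1
    limit=${2:-15}
    delay=$(delay_for_limit "$limit")
    while read -r account
    do
        [ -n "$account" ] || continue   # skip blank lines
        # </dev/null keeps the command from eating the account list on stdin
        ebooks archive "$account" "archive/${account}.json" </dev/null
        sleep "$delay"
    done <"$accounts_file"
}
```

With a limit of 15 this waits a minute per account, so a long list gets slow but should at least finish: `archive_all accounts.txt 15`.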

 

 

Android backup on arch linux

Edit: See here for an automatic version of the backup portion.

Connecting android to Windows and Mac is pretty easy. On arch linux? Major pain. Here’s what I did, mostly with the help of the arch wiki:

  1. Rooted my phone. Otherwise you can’t back up major parts of the file system (including text messages and most application data) [EDIT: Actually, you can’t back these up over MTP even once you root your phone. Oops.]
  2. Installed jmtpfs, a FUSE filesystem for mounting MTP, the new alternative to mount-as-storage on portable devices.
  3. Enabled ‘user_allow_other’ in /etc/fuse.conf. I’m not sure if I needed to, but I did.
  4. Plugged in the phone, and mounted the filesystem:
    jmtpfs /media/android

    The biggest pitfall I had was that if the phone’s screen is not unlocked at this point, mysterious failures will pop up later.

  5. Synced the contents of the phone. For reasons I didn’t diagnose (I assume specific to FUSE), this actually fails as root:
    rsync -aAXv --progress --fake-super --one-file-system /media/android --delete --delete-excluded "$SYNC_DESTINATION"
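
Because the rsync call uses --delete, running it while the phone is locked or not mounted (so /media/android is empty) would empty out the existing backup too. A small guard, sketched here as an assumption and not part of the original setup, is to refuse to sync unless the mount point has something in it. `dir_has_content` and `backup_phone` are hypothetical names.

```shell
#!/bin/sh
# Succeeds if directory $1 exists and contains at least one entry.
dir_has_content() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Hypothetical wrapper around the rsync call above.
# $1 is the mount point, $2 the backup destination.
# Run it as the same (non-root) user that ran jmtpfs.
backup_phone() {
    src=${1:-/media/android}
    dest=$2
    if ! dir_has_content "$src"; then
        echo "refusing to sync: $src is empty (phone locked or not mounted?)" >&2
        return 1
    fi
    rsync -aAXv --progress --fake-super --one-file-system "$src" \
        --delete --delete-excluded "$dest"
}
```

Usage would look like `backup_phone /media/android "$SYNC_DESTINATION"`.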
    

Archiving gmail

I set up an automatic archiver for gmail, using the special-purpose tool gmvault. It was fairly straightforward, so no tutorial here. The daily sync:

@daily cd ~gmail && cronic gmvault sync -d "/home/gmail/vanceza@gmail.com" vanceza@gmail.com

I’m specifying a backup folder here (-d) so I can easily support multiple accounts, one per line.

Cronic is a tool designed to make cron’s default email behavior better, so I get emailed only on actual backup failures.
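
To make the one-per-line idea concrete, here is a minimal sketch that prints one crontab entry per account, each with its own backup folder. The helper name `gmvault_cron_line` and the second address are made up for illustration; the entry format is copied from the line above.

```shell
#!/bin/sh
# Print the crontab entry that syncs one Gmail account into its own
# folder under /home/gmail, in the same form as the entry above.
gmvault_cron_line() {
    account=$1
    printf '@daily cd ~gmail && cronic gmvault sync -d "/home/gmail/%s" %s\n' \
        "$account" "$account"
}

# Example: one line per account (the second address is a placeholder).
# gmvault_cron_line vanceza@gmail.com
# gmvault_cron_line someone-else@example.com
```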

The Double Lives of Books

Two forces pull at me: the desire to have few possessions and be able to travel flexibly, and the convenience of reading and referencing physical books. I discovered a third option: I have digital copies of all my books, so I can freely get rid of them at any time, or travel without inconvenience.

So that’s where we start. Here’s where I went.

I thought, if these books are just a local convenience for an online version, it’s more artistically satisfying to have some representation of that. So I printed up a card catalog of all my books, both the ones I have digital copies of and not:

An example catalog card

That’s what a card looks like. There’s information about the book up top, and a link in the form of a QR code in the middle. The link downloads a PDF version of that book. Obviously, being a programmer, I generate all the cards automatically.

Book with a card inside

For the books where I have a physical copy, I put the card in the book, and it feels like I’m touching the digital copy. My friends can pirate their own personal version of the book (saving me the sadness of lost lent-out books I’m sure we’ve all felt at times). And I just think it looks darn neat. Some physical books I don’t have a digital version of, since the world is not yet perfect. But at least I can identify them at a glance (and consider sending them off to a service like http://1dollarscan.com/).

Card catalog of digital books

And then, I have a box full of all the books I *don’t* have a physical copy of, so I can browse through them, and organize them into reading lists or recommendations. It’s not nearly as cool as the ones in books, but it’s sort of nice to keep around.

And if I ever decide to get rid of a book, I can just check to make sure there’s a card inside, and move the card into the box, reassured nothing is lost, giving away a physical artifact I no longer have the ability to support.

I sadly won’t provide a link to the library since that stuff is mostly pirated.

Interesting technical problems encountered during this project (you can stop reading now if you’re not technically inclined):

  • Making sure each card gets printed exactly once, in the face of printer failures and updating digital collections. This was hard and took up most of my time, but it’s also insanely boring so I’ll say no more.
  • Command-line QR code generation, especially without generating intermediate files. I used rqrcode_png in ruby. I can now hotlink qr.png?text=Hello%20World and see any text I want, it’s great.
  • Printing the cards. This is actually really difficult to automate: I generate the cards in HTML and it’s pretty difficult to print HTML, CSS, and included images. I ended up using the ‘wkhtmltoimage’ project, which as far as I can tell renders the page somewhere internally using webkit and screenshots it. There’s also a wkhtmltopdf available, which worked well but which I couldn’t get to cooperate with index-card sized paper. Nothing else really seems to handle CSS, etc. properly, and as horrifying as the fundamental approach is, it’s both correct and well-executed. (They solved a number of problems with upstream patches to Qt, for example, the sort of thing I love to hear.)
  • The zbarcam software (for scanning QR codes among other digital codes) is just absolute quality work and I can’t say enough good things about it. Scanning cards back into the computer was one of the most pleasant parts of this whole project. It has an intuitive command-line UI with all the format options I want, and camera feedback to show when it has scanned a QR code (which it does very quickly).
  • Future-proofed links to pirated books, the sort of link that usually goes down. I opted to use a SHA256 hash (the mysterious numbers at the bottom, which form a unique signature generated from the content of the book) and provide a small page on my website which gives you a download based on that. This is what the QR code links to. I was hoping there was some way to provide that without involving me, but I’m unaware of any service available. Alice Monday suggested just typing the SHA hash into Google, which sounded like the sort of clever idea which might work. It doesn’t.
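
The hash-based links and the print-exactly-once bookkeeping can share one key: the SHA256 of the file. The sketch below is a guess at the shape of such a setup, not the actual scripts. `book_id`, `book_link`, `needs_printing`, `mark_printed`, the ledger file, and the https://example.invalid/book URL are all hypothetical; `sha256sum` is the coreutils tool.

```shell
#!/bin/sh
# Print the SHA256 of file $1 as 64 hex characters.
book_id() {
    sha256sum <"$1" | cut -d' ' -f1
}

# Print a content-addressed link for file $1.
# BASE_URL is a placeholder, not the real site.
BASE_URL=${BASE_URL:-https://example.invalid/book}
book_link() {
    printf '%s/%s\n' "$BASE_URL" "$(book_id "$1")"
}

# Succeeds if the hash of file $1 is not yet listed in ledger file $2
# (one hash per line), i.e. the card still needs printing.
needs_printing() {
    ! grep -qxF "$(book_id "$1")" "$2" 2>/dev/null
}

# Record file $1 as printed in ledger $2; call this only after the
# printer reports success, so a failed job gets retried next run.
mark_printed() {
    book_id "$1" >>"$2"
}
```

Since the key comes from the file contents, renaming or moving a book does not trigger a reprint, while a changed file counts as a new book.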

Archiving github

GitHub-Backup is a small project to archive github repos to a local computer. It advertises that one reason to use it is

You are paranoid tinfoil-hat wearer who needs to back up everything in triplicate on a variety of outdated tape media.

which perfectly describes why I was searching it out.

I made a new account on my server (github) and cloned their repo.

sudo useradd -m github
sudo -i -u github
git clone git@github.com:clockfort/GitHub-Backup.git

Despite being semi-unmaintained, everything mostly still works. There were two exceptions. First, there are some major design problems around private repos. I only really need to back up my public repos, so I ‘solved’ this by issuing an OAuth token that only knows about public repos. Second, I needed a small patch to work around a bug with User objects in the underlying Github egg:

-       os.system("git config --local gitweb.owner %s"%(shell_escape("%s <%s>"%(repo.user.name, repo.user.email.encode("utf-8"))),))
+       if hasattr(repo.user, 'email') and repo.user.email:
+               os.system("git config --local gitweb.owner %s"%(shell_escape("%s <%s>"%(repo.user.name, repo.user.email.encode("utf-8"))),))

Then I just shoved everything into a cron task and we’re good to go.

@hourly GitHub-Backup/github-backup.py -m -t  vanceza /home/github/vanceza

Edit: There’s a similar project for bitbucket I haven’t tried out: https://bitbucket.org/fboender/bbcloner