Archiving twitter

(Output)

I wanted to archive twitter so that I could

  1. Make sure old content was easily available
  2. Read twitter in a one-per-line format without ever logging into the site

twitter_ebooks is a framework to make twitter bots, but it includes an ‘archive’ component to fetch historical account content which is apparently unique in that it 1) works with current TLS and 2) works the current twitter API. It stores the tweets in a JSON format which presumably matches the API return values. Usage is simple:


while read account
do
    ebooks archive "${account}" "archive/${account}.json"
    jq -r 'reverse | .\[\] | "\\(.created\_at|@sh)\\t \\(.text|@sh)"' "archive/${account}.json" >"archive/${account}.txt"
done 

I ran into a bug with upstream incompatibilities which is easily fixed. Another caveat is that the twitter API only allows access 3200 tweets back in time for an account–all the more reason to set up archiving ASAP. Twitter’s rate-limiting is also extreme (15-180 req/15 min), and I’m worried about a problem where my naive script can’t make it through a list of more than 15 accounts even with no updates.

Tagged , , , ,

Android backup on arch linux

Edit: See here for an automatic version of the backup portion.

Connecting android to Windows and Mac, pretty easy. On arch linux? Major pain. Here’s what I did, mostly via the help of the arch wiki:

  1. Rooted my phone. Otherwise you can’t back up major parts of the file system (including text messages and most application data) [EDIT: Actually, you can’t back these up over MTP even once you root your phone. Oops.]
  2. Installed jmtpfs, a FUSE filesystem for mounting MTP, the new alternative to mount-as-storage on portable devices.
  3. Enabled ‘user_allow_other’ in /etc/fuse.conf. I’m not sure if I needed to, but I did.
  4. Plugged in the phone, and mounted the filesystem:

    jmtpfs /media/android
    

    The biggest pitfall I had was that if the phone’s screen is not unlocked at this point, mysterious failures will pop up later.

  5. Synced the contents of the phone. For reasons I didn’t diagnose (I assume specific to FUSE), this actually fails as root:

    rsync -aAXv --progress --fake-super --one-file-system /media/android --delete --delete-excluded "$SYNC_DESTINATION"
    
Tagged , , , ,

Archiving gmail

I set up an automatic archiver for gmail, using the special-purpose tool gm-vault. It was fairly straightforward, no tutorial here. The daily sync:

@daily cd ~gmail && cronic gmvault sync -d "/home/gmail/vanceza@gmail.com" vanceza@gmail.com

I’m specifying a backup folder here (-d) so I can easily support multiple accounts, one per line.

Cronic is a tool designed to make cron’s default email behavior better, so I get emailed only on actual backup failures.

Tagged , ,

The Double Lives of Books

Two forces pull at me: the desire to have few possessions and be able to travel flexibly, and the convenience of reading and referencing physical books. I discovered a third option: I have digital copies of all my books, so I can freely get rid of them at any time, or travel without inconvenience.

So that’s where we start. Here’s where I went.

I thought, if these books are just a local convenience for an online version, it’s more artistically satisfying to have some representation of that. So I printed up a card catalog of all my books, both the ones I have digital copies of and not:

An example catalog card
An example catalog card

That’s what a card looks like. There’s information about the book up top, and a link in the form of a QR code in the middle. The link downloads a PDF version of that book. Obviously being a programmer, the cards all all automatically generated.

Book with a card inside
Book with a card inside

For the books where I have a physical copy, I put the card in the book, and it feels like I’m touching the digital copy. My friends can pirate their own personal version of the book (saving me the sadness of lost lent-out books I’m sure we’ve all felt at times). And I just thing it looks darn neat. Some physical books I don’t have a digital version of, since the world is not yet perfect. But at least I can identify them at a glance (and consider sending them off to a service like http://1dollarscan.com/)

Card catalog of digital books
Card catalog of digital books

And then, I have a box full of all the books I *don’t* have a physical copy of, so I can browse through them, and organize them into reading lists or recommendations. It’s not nearly as cool as the ones in books, but it’s sort of nice to keep around.

And if I ever decide to get rid of a book, I can just check to make sure there’s a card inside, and move the card into the box, reassured nothing is lost, giving away a physical artifact I no longer have the ability to support.

I sadly won’t provide a link to the library since that stuff is mostly pirated.

Interesting technical problems encountered during this project (you can stop reading now if you’re not technically inclined):

  • Making sure each card gets printed exactly once, in the face of printer failures and updating digital collections. This was hard and took up most of my time, but it’s also insanely boring so I’ll say no more.
  • Command-line QR code generation, especially without generating intermediate files. I used rqrcode_png in ruby. I can now hotlink link qr.png?text=Hello%20World and see any text I want, it’s great.
  • Printing the cards. This is actually really difficult to automate–I generate the cards in HTML and it’s pretty difficult to print HTML, CSS, and included images. I ended up using the ‘wkhtmltoimage‘ project, which as far as I can tell, renders the image somewhere internally using webkit and screenshots it. There’s also a wkhtmltopdf available, which worked well but I couldn’t get to cooperate with index-card sized paper. Nothing else really seems to handle CSS, etc properly and as horrifying as the fundamental approach is, it’s both correct and well-executed. (They solved a number of problems with upstream patches to Qt for example, the sort of thing I love to hear)
  • The zbarcam software (for scanning QR codes among other digital codes) is just absolute quality work and I can’t say enough good things about it. Scanning cards back into the computer was one of the most pleasant parts of this whole project. It has an intuitive command UI using all the format options I want, and camera feedback to show it’s scanned QR codes (which it does very quickly).
  • Future-proofed links to pirated books–the sort of link that usually goes down. I opted to use a SHA256 hash (the mysterious numbers at the bottom which form a unique signature generated from the content of the book) and provide a small page on my website which gives you a download based on that. This is what the QR code links to. I was hoping there was some way to provide that without involving me, but I’m unaware of any service available. Alice Monday suggested just typing the SHA hash into Google, which sounded like the sort of clever idea which might work. It doesn’t.

Archiving github

GitHub-Backup is a small project to archive github repos to a local computer. It advertises that one reason to use it is

You are paranoid tinfoil-hat wearer who needs to back up everything in triplicate on a variety of outdated tape media.

which describes why I was searching it out perfectly.

I made a new account on my server (github) and cloned their repo.

sudo useradd -m github
sudo -i- u github
git clone git@github.com:clockfort/GitHub-Backup.git

Despite being semi-unmaintained, everything mostly works still. There were two exceptions–some major design problems around private repos. I only need to back up my public repos really, so I ‘solved’ this by issuing an Oauth token that only knows about public repos. And second, a small patch to work around a bug with User objects in the underlying Github egg:

-       os.system("git config --local gitweb.owner %s"%(shell_escape("%s <%s>"%(repo.user.name, repo.user.email.encode("utf-8"))),))
+       if hasattr(repo.user, 'email') and repo.user.email:
+               os.system("git config --local gitweb.owner %s"%(shell_escape("%s <%s>"%(repo.user.name, repo.user.email.encode("utf-8"))),))

Then I just shoved everything into a cron task and we’re good to go.

@hourly GitHub-Backup/github-backup.py -m -t  vanceza /home/github/vanceza

Edit: There’s a similar project for bitbucket I haven’t tried out: https://bitbucket.org/fboender/bbcloner

Tagged , , ,

Configuring mailx’s .mailrc with Gmail

Here’s how I added gmail to .mailrc for the BSD program mailx, provided by the s-nail package in arch.

account gmail {
  set folder=imaps://example@gmail.com@imap.gmail.com
  set password-example@gmail.com@imap.gmail.com="PASS"
  set smtp-use-starttls
  set smtp=smtp://smtp.gmail.com:587
  set smtp-auth=login
  set smtp-auth-user=example@gmail.com
  set smtp-auth-password="PASS"
  set from="John Smith <example@gmail.com>"
}

Replace PASS with your actual password, and example@gmail.com with your actual email. Read the documentation if you want to avoid plaintext passwords.

You can send mail with ‘mail -A gmail ’. If you have only one account, remove the first and last line and use ‘mail

Tagged , ,

Setting up SSL certificates using StartSSL

  1. Generate an SSL/TLS key, which will be used to actually encrypt traffic.

    DOMAIN=nntp.za3k.com
    openssl genrsa -out ${DOMAIN}.key 4096
    chmod 700 ${DOMAIN}.key
    
  2. Generate a Certificate Signing Request, which is sent to your authentication provider. The details here will have to match the details they have on file (for StartSSL, just the domain name).

    # -subj "/C=US/ST=/L=/O=/CN=${DOMAIN}" can be omitted to fill in custom identification details
    # -sha512 is the hash of your key used for identification. This was the reasonable option in Oct 2014. It isn't supported by IE6
    openssl req -new -key ${DOMAIN}.key -out ${DOMAIN}.csr -subj "/C=US/ST=/L=/O=/CN=${DOMAIN}" -sha512
    
  3. Submit your Certificate Signing Request to your authentication provider. Assuming the signing request details match whatever they know about you, they’ll return you a certificate. You should also make sure to grab any intermediate and root certificates here.

    echo "Saved certificate" > ${DOMAIN}.crt
    wget https://www.startssl.com/certs/sca.server1.crt https://www.startssl.com/certs/ca.crt # Intermediate and root certificate for StartSSL
    
  4. Combine the chain of trust (key, CSR, certificate, intermediate certificates(s), root certificate) into a single file with concatenation. Leaving out the key will give you a combined certificate of trust for the key, which you may need for other applications.

    cat ${DOMAIN}.crt sca.server1.crt >${DOMAIN}.pem # Main cert
    cat ${DOMAIN}.key ${DOMAIN}.crt sca.server1.crt ca.crt >${DOMAIN}.full.pem
    chmod 700 ${DOMAIN}.full.pem
    

See also: https://github.com/Gordin/StartSSL_API

Tagged , , ,

Running a forge server on headless linux

I’ve had a lot of trouble getting Minecraft Forge to run headless. They have a friendly installer option that I just can’t use in my situation, but one of the devs seems actively hostile around providing help to headless servers, so I didn’t bother asking forge for help. I thought I’d write up what I had to do to get things working. As a warning, it requires some local work; you can’t do everything headless with these directions.

I’m running Minecraft 1.6.4, with the latest version of forge for that, 9.11.1.965.

  1. Locally, download and start the minecraft client for the correct version at least once. Not sure if you’ll need to ‘play online’ or not. If you have the current installer, you need to make a new profile with the correct minecraft version and play that.
  2. Copy ~/.minecraft/libraries to the headless machine.
  3. Download forge (the installer version, not the universal) from http://files.minecraftforge.net/. The non-adly version is the little star for non-interactive use.
  4. Run

    java -jar forge-1.6.4-9.11.1.965-installer.jar --installServer
    
  5. Delete the installer, you don’t need it any more.

  6. Install any mods you want to the ‘mods’ directory, edit server.properties, etc. Normal server setup.
  7. To execute the server, run the file indicated in the installer. In my case, I run

    java -jar minecraftforge-universal-1.6.4-9.11.1.965-v164-pregradle.jar nogui
    

Alternatively, you can install the entire server locally and copy it over.

Tagged

Amazon AWS

I was originally planning to write a rosetta-stone style guide for similar commands between digital ocean, google compute, and AWS. Instead, I spent all day writing this CLI tool for EC2 which wraps the enormous and unintuitive AWS command-line tool. It’s not totally polished, namely you’ll have to hand-substitute some stuff at the top of the script that should properly go in a config file, but hopefully someone will find it useful.

As a warning it terminates, not just stops, all amazon instances when asked.

Tagged ,