Archiving twitter

(Output)

I wanted to archive twitter so that I could

  1. Make sure old content was easily available
  2. Read twitter in a one-per-line format without ever logging into the site

twitter_ebooks is a framework to make twitter bots, but it includes an ‘archive’ component to fetch historical account content which is apparently unique in that it 1) works with current TLS and 2) works the current twitter API. It stores the tweets in a JSON format which presumably matches the API return values. Usage is simple:


while read account
do
    ebooks archive "${account}" "archive/${account}.json"
    jq -r 'reverse | .\[\] | "\\(.created\_at|@sh)\\t \\(.text|@sh)"' "archive/${account}.json" >"archive/${account}.txt"
done 

I ran into a bug with upstream incompatibilities which is easily fixed. Another caveat is that the twitter API only allows access 3200 tweets back in time for an account–all the more reason to set up archiving ASAP. Twitter’s rate-limiting is also extreme (15-180 req/15 min), and I’m worried about a problem where my naive script can’t make it through a list of more than 15 accounts even with no updates.

Tagged , , , ,

Android backup on arch linux

Edit: See here for an automatic version of the backup portion.

Connecting android to Windows and Mac, pretty easy. On arch linux? Major pain. Here’s what I did, mostly via the help of the arch wiki:

  1. Rooted my phone. Otherwise you can’t back up major parts of the file system (including text messages and most application data) [EDIT: Actually, you can’t back these up over MTP even once you root your phone. Oops.]
  2. Installed jmtpfs, a FUSE filesystem for mounting MTP, the new alternative to mount-as-storage on portable devices.
  3. Enabled ‘user_allow_other’ in /etc/fuse.conf. I’m not sure if I needed to, but I did.
  4. Plugged in the phone, and mounted the filesystem:

    jmtpfs /media/android
    

    The biggest pitfall I had was that if the phone’s screen is not unlocked at this point, mysterious failures will pop up later.

  5. Synced the contents of the phone. For reasons I didn’t diagnose (I assume specific to FUSE), this actually fails as root:

    rsync -aAXv --progress --fake-super --one-file-system /media/android --delete --delete-excluded "$SYNC_DESTINATION"
    
Tagged , , , ,

Archiving gmail

I set up an automatic archiver for gmail, using the special-purpose tool gm-vault. It was fairly straightforward, no tutorial here. The daily sync:

@daily cd ~gmail && cronic gmvault sync -d "/home/gmail/vanceza@gmail.com" vanceza@gmail.com

I’m specifying a backup folder here (-d) so I can easily support multiple accounts, one per line.

Cronic is a tool designed to make cron’s default email behavior better, so I get emailed only on actual backup failures.

Tagged , ,

Archiving github

GitHub-Backup is a small project to archive github repos to a local computer. It advertises that one reason to use it is

You are paranoid tinfoil-hat wearer who needs to back up everything in triplicate on a variety of outdated tape media.

which describes why I was searching it out perfectly.

I made a new account on my server (github) and cloned their repo.

sudo useradd -m github
sudo -i- u github
git clone git@github.com:clockfort/GitHub-Backup.git

Despite being semi-unmaintained, everything mostly works still. There were two exceptions–some major design problems around private repos. I only need to back up my public repos really, so I ‘solved’ this by issuing an Oauth token that only knows about public repos. And second, a small patch to work around a bug with User objects in the underlying Github egg:

-       os.system("git config --local gitweb.owner %s"%(shell_escape("%s <%s>"%(repo.user.name, repo.user.email.encode("utf-8"))),))
+       if hasattr(repo.user, 'email') and repo.user.email:
+               os.system("git config --local gitweb.owner %s"%(shell_escape("%s <%s>"%(repo.user.name, repo.user.email.encode("utf-8"))),))

Then I just shoved everything into a cron task and we’re good to go.

@hourly GitHub-Backup/github-backup.py -m -t  vanceza /home/github/vanceza

Edit: There’s a similar project for bitbucket I haven’t tried out: https://bitbucket.org/fboender/bbcloner

Tagged , , ,