Archiving all web traffic

Today I’m going to walk through a setup on how to archive all web (HTTP/S) traffic passing over your Linux desktop. The basic approach is going to be to install a proxy which records traffic. It will record the traffic to WARC files. You can’t proxy non-HTTP traffic (for example, chat or email) because we’re using an HTTP proxy approach.

The end result is pretty slow for reasons I’m not totally sure of yet. It’s possible warcproxy isn’t streaming results.

  1. Install the server
  2. Make a warcprox user to run the proxy as.
  3. Make a root certificate. You’re going to intercept HTTPS traffic by pretending to be the website, so if anyone gets ahold of this, they can fake being every website to you. Don’t give it out.
  4. Set up a directory where you’re going to store the WARC files. You’re saving all web traffic, so this will get pretty big.
  5. Set up a boot script for warcproxy. Here’s mine. I’m using supervisorctl rather than systemd.
  6. Set up any browers, etc to use localhost:18000 as your proxy. You could also do some kind of global firewall config. Chromium in particular was pretty irritating on Arch Linux. It doesn’t respect $http_proxy, so you have to pass it separate options. This is also a good point to make sure anything you don’t want recorded BYPASSES the proxy (for example, maybe large things like youtube, etc).

Setting up SSL certificates using StartSSL

    1. Generate an SSL/TLS key, which will be used to actually encrypt traffic.
    2. Generate a Certificate Signing Request, which is sent to your authentication provider. The details here will have to match the details they have on file (for StartSSL, just the domain name).
    3. Submit your Certificate Signing Request to your authentication provider. Assuming the signing request details match whatever they know about you, they’ll return you a certificate. You should also make sure to grab any intermediate and root certificates here.
    4. Combine the chain of trust (key, CSR, certificate, intermediate certificates(s), root certificate) into a single file with concatenation. Leaving out the key will give you a combined certificate of trust for the key, which you may need for other applications.

See also: https://github.com/Gordin/StartSSL_API