Streaming Linux->Twitch using ffmpeg and ALSA

I stopped using OBS a while ago for a couple of reasons–the main one was that it didn’t support my video capture card, but I also had issues with it crashing or lagging behind with no clear indication of what it was doing. I ended up switching to ffmpeg (which OBS uses internally for video anyway) for live streaming, because it’s very easy to tell when ffmpeg is lagging behind. I don’t especially recommend this setup, but I thought I’d document it in case someone can’t use a nice GUI setup like OBS.

I prefer fewer layers, so I’m still on ALSA. My setup is:

  • I have one computer, running linux. It runs what I’m streaming (typically minecraft), and captures everything, encodes everything, and sends it to twitch
  • Video is captured using libxcb (captures X11 desktop)
  • Audio is captured using ALSA. My mic is captured directly, while the rest of my desktop audio is sent to a loopback device which acts as a second mic.
  • Everything is encoded together into one stream: an FLV (Flash video) container with x264 video and AAC audio, because that’s what Twitch wants. Hopefully we’ll all switch to AV1 soon.
  • That stream is sent to twitch by ffmpeg
  • There is no way to pause the stream, do scenes, adjust audio, see audio levels, etc while the stream is going. I just have to adjust program volumes independently.

Here’s my .asoundrc:

# sudo modprobe snd-aloop is required to set up hw:Loopback
pcm.!default {
  type plug
  slave.pcm {
    type dmix
    ipc_key 99
    slave {
      pcm "hw:Loopback,0"
      rate 48000
      channels 2
      period_size 1024
    }
  }
}
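
To make the loopback module survive a reboot, something like this works on systemd-based distros (a sketch; the file name is just a convention):

# Load the ALSA loopback module at every boot
echo snd-aloop | sudo tee /etc/modules-load.d/snd-aloop.conf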

My ffmpeg build line:

./configure --enable-libfdk-aac --enable-nonfree --enable-libxcb --enable-indev=alsa --enable-outdev=alsa --prefix=/usr/local --extra-version='za3k' --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-libaom --enable-libass --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libmp3lame --enable-libopus --enable-libpulse --enable-libvorbis --enable-libvpx --enable-libx265 --enable-opengl --enable-libdrm --enable-libx264 --enable-shared --enable-librtmp && make -j 4 && sudo make install

And most importantly, my ffmpeg command:

ffmpeg 
  -video_size 1280x720 -framerate 30 -f x11grab -s 1280x720 -r 30 -i :0.0 
  -f alsa -ac 1 -ar 48000 -i hw:1,0 
  -f alsa -ac 2 -ar 48000 -i hw:Loopback,1
  -filter_complex '[1:a][1:a]amerge=inputs=2[stereo1] ; [2:a][stereo1]amerge=inputs=2[a]' -ac 2 
  -map '[a]' -map 0:v 
  -f flv -ac 2 -ar 48000 
  -vcodec libx264 -g 60 -keyint_min 30 -b:v 3000k -minrate 3000k -maxrate 3000k -pix_fmt yuv420p -s 1280x720 -preset ultrafast -tune film 
  -c:a libfdk_aac -b:a 160k -strict normal -bufsize 3000k 
  rtmp://live-sjc.twitch.tv/app/${TWITCH_KEY}

Let’s break that monster down a bit. ffmpeg structures its command line into input streams, transformations, and output streams.

ffmpeg input streams

-video_size 1280x720 -framerate 30 -f x11grab -s 1280x720 -r 30 -i :0.0:
Grab 720p video (-video_size 1280x720) at 30fps (-framerate 30) using x11grab/libxcb (-f x11grab), and we also want to output that video at the same resolution and framerate (-s 1280x720 -r 30). We grab :0.0 (-i :0.0)–that’s X notation for the first X server (you probably only have one), first display/monitor. And, since we don’t say otherwise, we grab the whole thing, so the monitor better be 720p.
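
If your monitor isn’t 720p, you don’t have to grab the whole screen: x11grab accepts a pixel offset after the display name. A sketch (the offset is made up; point it wherever your 1280x720 region starts):

# Grab a 1280x720 region whose top-left corner is at (100,200)
ffmpeg -video_size 1280x720 -framerate 30 -f x11grab -i :0.0+100,200 ...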

-f alsa -ac 1 -ar 48000 -i hw:1,0:
Using alsa (-f alsa), capture mono (-ac 1, 1 audio channel) at the standard PC sample rate (-ar 48000, audio rate=48000 Hz). The ALSA device is hw:1,0 (-i hw:1,0), my microphone, which happens to be mono.

-f alsa -ac 2 -ar 48000 -i hw:Loopback,1:
Using alsa (-f alsa), capture stereo (-ac 2, 2 audio channels) at the standard PC sample rate (-ar 48000, audio rate=48000 Hz). The ALSA device is hw:Loopback,1. In the ALSA config file .asoundrc given above, you can see we send all computer audio to hw:Loopback,0. Something sent to play on hw:Loopback,0 is made available to record as hw:Loopback,1, that’s just the convention for how snd-aloop devices work.
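
Before going live, you can sanity-check the loopback wiring with stock ALSA tools (assuming the .asoundrc above):

# Terminal 1: play a test tone into the default device, which dmixes into hw:Loopback,0
speaker-test -c 2 -r 48000
# Terminal 2: record what comes back out the other side
arecord -D hw:Loopback,1 -c 2 -r 48000 -f S16_LE check.wav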

ffmpeg transforms

-filter_complex '[1:a][1:a]amerge=inputs=2[stereo1] ; [2:a][stereo1]amerge=inputs=2[a]' -ac 2:
All right, this one was a bit of a doozy to figure out. In ffmpeg’s special filter notation, 1:a means “stream #1, audio” (where stream #0 is the first one).

First we take the mic input [1:a][1:a] and convert it from a mono channel to stereo, by just duplicating the sound to both ears (amerge=inputs=2[stereo1]). Then, we combine the stereo mic and the stereo computer audio ([2:a][stereo1]) into a single stereo stream using the default mixer (amerge=inputs=2[a]).

-map '[a]' -map 0:v :
By default, ffmpeg just keeps all the streams around, so we now have one mono, and two stereo streams, and it won’t default to picking the last one. So we manually tell it to keep that last audio stream we made (-map '[a]'), and the video stream from the first input (-map 0:v, the only video stream).

ffmpeg output streams

-f flv -ac 2 -ar 48000:
We want the output format to be Flash video (-f flv) with stereo audio (-ac 2) at 48000Hz (-ar 48000). Why do we want that? Because we’re streaming to Twitch and that’s what Twitch says they want–that’s basically the reason behind everything in the output format.

-vcodec libx264 -g 60 -keyint_min 30 -b:v 3000k -minrate 3000k -maxrate 3000k -pix_fmt yuv420p -s 1280x720 -preset ultrafast -tune film:
Ah, the magic. Now we do x264 encoding (-vcodec libx264), a modern wonder. A lot of the options here are just what Twitch requests. They want keyframes every 2 seconds (-g 60 -keyint_min 30, where 60=30*2=FPS*2, 30=FPS). They want a constant bitrate (-b:v 3000k -minrate 3000k -maxrate 3000k) between 1K-6K/s at the time of writing–I picked 3K because it’s appropriate for 720p video, but you could go with 6K for 1080p. Here are Twitch’s recommendations. The pixel format is standard (-pix_fmt yuv420p) and we still don’t want to change the resolution (-s 1280x720). Finally, the options you might want to change. You want to set the preset as high as it will go with your computer keeping up–mine sucks (-preset ultrafast, where the options go ultrafast, superfast, veryfast, faster, fast, medium, with a 2-10X jump in CPU power needed for each step). And I’m broadcasting minecraft, which in terms of encoders is close to film (-tune film)–lots of panning, relatively complicated stuff on screen. If you want to re-encode cartoons you want something else (x264 has -tune animation for that).
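
A quick way to find the slowest preset your machine can sustain is a dry run to a null output, watching the speed= readout ffmpeg prints (a sketch; match the capture flags to your own):

# If speed= stays at or above 1.0x, the preset can keep up in real time
ffmpeg -video_size 1280x720 -framerate 30 -f x11grab -i :0.0 \
  -vcodec libx264 -preset veryfast -tune film -f null -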

-c:a libfdk_aac -b:a 160k:
We use AAC (-c:a libfdk_aac). Note that libfdk is many times faster than the default implementation, but it’s not available by default in debian’s ffmpeg for (dumb) license reasons. We use 160k bitrate audio (-b:a 160k) since I’ve found that’s good, and 96K-160K is Twitch’s allowable range.

-strict normal: Just an ffmpeg option. Not interesting.
-bufsize 3000k: One second of buffer with CBR video

rtmp://live-sjc.twitch.tv/app/${TWITCH_KEY}:
The twitch streaming URL. Replace ${TWITCH_KEY} with your actual key, of course.

Sources:

  • jrayhawk on IRC (alsa)
  • ffmpeg wiki and docs (pretty good)
  • ALSA docs (not that good)
  • Twitch documentation, which is pretty good once you can find it
  • mark hills on how to set up snd-aloop

Capturing video on Debian Linux with the Blackmagic Intensity Pro 4K card

Most of this should apply to any linux system, other than the driver install step. Also, I believe most of it applies to other DeckLink and Intensity cards as well.

My main source is https://gist.github.com/afriza/879fed4ede539a5a6501e0f046f71463. I’ve re-written for clarity and Debian.

  1. Set up hardware. On the Intensity Pro 4K, I see a black screen on my TV when things are set up correctly (a clear rectangle, not just nothing).
  2. From the Blackmagic site, download “Desktop Video SDK” version 10.11.4 (not the latest). Get the matching “Desktop Video” software for Linux.
  3. Install the drivers. In my case, these were in desktopvideo_11.3a7_amd64.deb.
    After driver install, lsmod | grep blackmagic should show a driver loaded on debian.
    You can check that the PCI card is recognized with lspci | grep Blackmagic (I think this requires the driver but didn’t check)
  4. Update the firmware (optional). sudo BlackmagicFirmwareUpdater status will check for updates available. There were none for me.
  5. Extract the SDK. Move it somewhere easier to type. The relevant folder is Blackmagic DeckLink SDK 10.11.4/Linux/includes. Let’s assume you move that to ~/BM_SDK
  6. Build ffmpeg from source. I’m here copying from my source heavily.

    1. Get the latest ffmpeg source and extract it. Don’t match the debian version–it’s too old to work.
      wget https://ffmpeg.org/releases/ffmpeg-4.2.tar.bz2 && tar xf ffmpeg-*.tar.bz2 && cd ffmpeg-*
    2. Install build deps.
      sudo apt-get install nasm yasm libx264-dev libx265-dev libnuma-dev libvpx-dev libfdk-aac-dev libmp3lame-dev libopus-dev libvorbis-dev libass-dev
    3. Build.
      PKG_CONFIG_PATH="$HOME/ffmpeg_build/lib/pkgconfig" ./configure --prefix="$HOME/ffmpeg_build" --pkg-config-flags="--static" --extra-cflags="-I$HOME/ffmpeg_build/include -I$HOME/ffmpeg_sources/BMD_SDK/include" --extra-ldflags="-L$HOME/ffmpeg_build/lib" --extra-libs="-lpthread -lm" --enable-gpl --enable-libass --enable-libfdk-aac --enable-libfreetype --enable-libmp3lame --enable-libopus --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-nonfree --enable-decklink

      make -j $(nproc)

      sudo cp ffmpeg ffprobe /usr/local/bin/

  7. Use ffmpeg.
    ffmpeg -f decklink -list_devices 1 -i dummy should show your device now. Note the name for below.

    ffmpeg -f decklink -list_formats 1 -i 'Intensity Pro 4K' shows supported formats. Here’s what I see for the Intensity Pro 4K:

[decklink @ 0x561bd9881800] Supported formats for 'Intensity Pro 4K':
        format_code     description
        ntsc            720x486 at 30000/1001 fps (interlaced, lower field first)
        pal             720x576 at 25000/1000 fps (interlaced, upper field first)
        23ps            1920x1080 at 24000/1001 fps
        24ps            1920x1080 at 24000/1000 fps
        Hp25            1920x1080 at 25000/1000 fps
        Hp29            1920x1080 at 30000/1001 fps
        Hp30            1920x1080 at 30000/1000 fps
        Hp50            1920x1080 at 50000/1000 fps
        Hp59            1920x1080 at 60000/1001 fps
        Hp60            1920x1080 at 60000/1000 fps
        Hi50            1920x1080 at 25000/1000 fps (interlaced, upper field first)
        Hi59            1920x1080 at 30000/1001 fps (interlaced, upper field first)
        Hi60            1920x1080 at 30000/1000 fps (interlaced, upper field first)
        hp50            1280x720 at 50000/1000 fps
        hp59            1280x720 at 60000/1001 fps
        hp60            1280x720 at 60000/1000 fps
        4k23            3840x2160 at 24000/1001 fps
        4k24            3840x2160 at 24000/1000 fps
        4k25            3840x2160 at 25000/1000 fps
        4k29            3840x2160 at 30000/1001 fps
        4k30            3840x2160 at 30000/1000 fps

Capture some video: ffmpeg -raw_format argb -format_code Hp60 -f decklink -i 'Intensity Pro 4K' test.avi

The format (raw_format and format_code) will vary based on your input settings. In particular, note that -raw_format uyvy422 is the default, which I found did not match my computer output. I was able to switch either the command line or the computer output settings to fix it.
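
Raw captures get huge fast, so if your CPU can keep up it’s worth encoding during capture instead. A sketch reusing the device and format names from above:

ffmpeg -raw_format argb -format_code Hp60 -f decklink -i 'Intensity Pro 4K' \
  -c:v libx264 -preset fast -crf 18 -c:a aac test.mkv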

Troubleshooting

  • I’m not running any capture, but passthrough isn’t working. That’s how the Intensity Pro 4K works. Passthrough is not always-on. I’d recommend a splitter if you want this for streaming.
  • ffmpeg won’t compile. Your DeckLink SDK may be too new. Get 10.11.4 instead.
  • I can see a list of formats, but I can’t select one using -format_code. ffmpeg doesn’t recognize the option. Your ffmpeg is too old. Download a newer source.
  • When I look at the video, I see colored bars. The HDMI output turns on during recording. The Intensity Pro 4K outputs this when the resolution, hertz, or color format does not match the input. This also happens if your SDK and driver versions are mismatched.


github.com archive – Background Research

My current project is to archive git repos, starting with all of github.com. As you might imagine, size is an issue, so in this post I do some investigation on how to better compress things. It’s currently Oct, 2017, for when you read this years later and your eyes bug out at how tiny the numbers are.

Let’s look at the list of repositories and see what we can figure out.

  • Github has a very limited naming scheme. These are the valid characters for usernames and repositories: [-._0-9a-zA-Z].
  • Github has 68.8 million repositories
  • Their built-in fork detection is not very aggressive–they say they have 50% forks, and I’m guessing that’s too low. I’m unsure what github considers a fork (whether you have to click the “fork” button, or whether they look at git history). To be a little more aggressive, I’m looking at collections of repos with the same name instead. There are 21.3 million different repository names. 16.7 million repositories do not share a name with any other repository. Subtracting, that means there are 4.6 million repository names representing the other 52.1 million possibly-duplicated repositories.
  • Here are the most common repository names. It turns out Github is case-insensitive but I didn’t figure this out until later.
    • hello-world (548039)
    • test (421772)
    • datasciencecoursera (191498)
    • datasharing (185779)
    • dotfiles (120020)
    • ProgrammingAssignment2 (112149)
    • Test (110278)
    • Spoon-Knife (107525)
    • blog (80794)
    • bootstrap (74383)
    • Hello-World (68179)
    • learngit (59247)
    • – (59136)
  • Here’s the breakdown of how many copies of things there are, assuming things named the same are copies:
    • 1 copy (16663356, 24%)
    • 2 copies (4506958, 6.5%)
    • 3 copies (2351856, 3.4%)
    • 4-9 copies (5794539, 8.4%)
    • 10-99 copies (13389713, 19%)
    • 100-999 copies (13342937, 19%)
    • 1000-9999 copies (7922014, 12%)
    • 10000-99999 copies (3084797, 4.5%)
    • 100000+ copies (1797060, 2.6%)

That’s about everything I can get from the repo names. Next, I downloaded all repos named dotfiles. My goal is to pick a compression strategy for when I store repos. My strategy will include putting repos with the same name on the same disk, to improve deduplication. I figured ‘dotfiles’ was a usefully large dataset, and it would include interesting overlap–some combination of forks, duplicated files, and similar and dissimilar files. It’s not perfect–for example, it probably has a lot of small files and fewer authors than usual. So I may not get good estimates, but hopefully I’ll get decent compression approaches.

Here’s some information about dotfiles:

  • 102217 repos. The reason this doesn’t match my repo list number is that some repos have been deleted or made private.
  • 243G disk size after cloning (233G apparent). That’s an average of 2.3M per repo–pretty small.
  • Of these, 1873 are empty repos taking up 60K each (110M total). That’s only 16K apparent size–lots of small or empty files. An empty repo is a good estimate for per-repo overhead. 60K overhead for every repo would be 6GB total.
  • There are 161870 ‘refs’ objects, or about 1.6 per repo. A ‘ref’ is a branch, basically. Unless a repo is empty, it must have at least one ref (I don’t know if github enforces that you must have a ref called ‘master’).
  • Git objects are how git stores everything.
    • ‘Blob’ objects represent file content (just content). Rarely, blobs can store content other than files, like GPG signatures.
    • ‘Tree’ objects represent directory listings. These are where filenames and permissions are stored.
    • ‘Commit’ and ‘Tag’ objects are for git commits and tags. Makes sense. I think only annotated tags get stored in the object database.
  • Internally, git both stores diffs (for example, a 1 line file change is represented as close to 1 line of actual disk storage), and compresses the files and diffs. Below, I list a “virtual” size, representing the size of the uncompressed object, and a “disk” size representing the actual size as used by git (see the sketch after this list for one way to pull those numbers out of git). For more information on git internals, I recommend the excellent “Pro Git” (available for free online and as a book), and then if you want compression and bit-packing details the fine internals documentation has some information about objects, deltas, and packfile formats.
  • Git object counts and sizes:
    • Blob
      • 41031250 blobs (401 per repo)
      • taking up 721202919141 virtual bytes = 721GB
      • 239285368549 bytes on disk = 239GB (3.0:1 compression)
      • Average size per object: 17576 bytes virtual, 5831 bytes on disk
      • Average size per repo: 7056KB virtual, 2341KB on disk
    • Tree
      • 28467378 trees (278 per repo)
      • taking up 16837190691 virtual bytes = 17GB
      • 3335346365 bytes on disk = 3GB (5.0:1 compression)
      • Average size per object: 591 bytes virtual, 117 bytes on disk
      • Average size per repo: 160KB virtual, 33KB on disk
    • Commit
      • 14035853 commits (137 per repo)
      • taking up 4135686748 virtual bytes = 4GB
      • 2846759517 bytes on disk = 3GB (1.5:1 compression)
      • Average size per object: 295 bytes virtual, 203 bytes on disk
      • Average size per repo: 40KB virtual, 28KB on disk
    • Tag
      • 5428 tags (0.05 per repo)
      • taking up 1232092 virtual bytes = ~0GB
      • 1004941 bytes on disk = ~0GB (1.2:1 compression)
      • Average size: 227 bytes virtual, 185 bytes on disk
      • Average size per repo: 12 bytes virtual, 10 bytes on disk
    • Ref: ~2 refs, above
    • Combined
      • 83539909 objects (817 per repo)
      • taking up 742177028672 virtual bytes = 742GB
      • 245468479372 bytes on disk = 245GB
      • Average size: 8884 bytes virtual, 2938 bytes on disk
    • Usage
      • Blob, 49% of objects, 97% of virtual space, 97% of disk space
      • Tree, 34% of objects, 2.2% of virtual space, 1.3% of disk space
      • Commit, 17% of objects, 0.5% of virtual space, 1.2% of disk space
      • Tags: 0% ish

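As promised, here’s a sketch of how numbers like these can be pulled out of git. The plumbing command is standard; the awk aggregation is illustrative, not necessarily exactly what I ran. Run it inside a clone:

# For every object in the repo: type, uncompressed ("virtual") size, on-disk size
git cat-file --batch-all-objects \
    --batch-check='%(objecttype) %(objectsize) %(objectsize:disk)' |
  awk '{count[$1]++; virt[$1]+=$2; disk[$1]+=$3}
       END {for (t in count) print t, count[t], virt[t], disk[t]}'
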
Even though these numbers may not be representative, let’s use them to get some ballpark figures. If each repo had 817 objects (the combined average above), and there are 68.6 million repos on github, we would expect there to be 56 billion objects on github. At an average of 8,884 bytes per object, that’s 498TB of git objects (164TB on disk). At 40 bytes per hash, that would also be 2.2TB of hashes alone. Also interesting is that files represent 97% of storage–git is doing a good job of being low-overhead. If we pushed things, we could probably fit the non-file objects on a single disk.

Dotfiles are small, so this might be a small estimate. For better data, we’d want to randomly sample repos. Unfortunately, to figure out how deduplication works, we’d want to pull in some more repos. It turns out picking 1000 random repo names gets you 5% of github–so not really feasible.

164TB, huh? Let’s see if there’s some object duplication. Just the unique objects now:

  • Blob
    • 10930075 blobs (106 per repo, 3.8:1 deduplication)
    • taking up 359101708549 virtual bytes = 359GB (2.0:1 dedup)
    • 121217926520 bytes on disk = 121GB (3.0:1 compression, 2.0:1 dedup)
    • Average size per object: 32854 bytes virtual, 11090 bytes on disk
    • Average size per repo: 3513KB virtual, 1186KB on disk
  • Tree
    • 10286833 trees (101 per repo, 2.8:1 deduplication)
    • taking up 6888606565 virtual bytes = 7GB (2.4:1 dedup)
    • 1147147637 bytes on disk = 1GB (6.0:1 compression, 2.9:1 dedup)
    • Average size per object: 670 bytes virtual, 112 bytes on disk
    • Average size per repo: 67KB virtual, 11KB on disk
  • Commit
    • 4605485 commits (45 per repo, 3.0:1 deduplication)
    • taking up 1298375305 virtual bytes = 1.3GB (3.2:1 dedup)
    • 875615668 bytes on disk = 0.9GB (3.3:1 dedup)
    • Average size per object: 282 bytes virtual, 190 bytes on disk
    • Average size per repo: 13KB virtual, 9KB on disk
  • Tag
    • 2296 tags (0.02 per repo, 2.7:1 dedup)
    • taking up 582993 virtual bytes = ~0GB (2.1:1 dedup)
    • 482201 bytes on disk = ~0GB (1.2:1 compression, 2.1:1 dedup)
    • Average size per object: 254 virtual, 210 bytes on disk
    • Average size per repo: 6 bytes virtual, 5 bytes on disk
  • Combined
    • 25824689 objects (252 per repo, 3.2:1 dedup)
    • taking up 367289273412 virtual bytes = 367GB (2.0:1 dedup)
    • 123241172026 bytes of disk = 123GB (3.0:1 compression, 2.0:1 dedup)
    • Average size per object: 14222 bytes virtual, 4772 bytes on disk
    • Average size per repo: 3593KB, 1206KB on disk
  • Usage
    • Blob, 42% of objects, 97.8% virtual space, 98.4% disk space
    • Tree, 40% of objects, 1.9% virtual space, 1.0% disk space
    • Commit, 18% of objects, 0.4% virtual space, 0.3% disk space
    • Tags: 0% ish

All right, that’s 2:1 disk savings over the existing compression from git. Not bad. In our imaginary world where dotfiles are representative, that’s 82TB of data on github (1.2TB non-file objects and 0.7TB hashes)

Let’s try a few compression strategies and see how they fare:

  • 243GB (233GB apparent). Native git compression only
  • 243GB. Same, with ‘git repack -adk’
  • 237GB. As a ‘.tar’
  • 230GB. As a ‘.tar.gz’
  • 219GB. As a ‘.tar.xz’. We’re only going to do one round with ‘xz -9’ compression, because it took 3 days to compress on my machine.
  • 124GB. Using shallow checkouts. A shallow checkout is when you only grab the current revision, not the entire git history (see the sketch after this list). This is the only compression we try that loses data.
  • 125GB. Same, with ‘git repack -adk’

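For reference, a shallow checkout is just a depth-limited clone (hypothetical repo URL):

# Grab only the current revision, not the full history
git clone --depth 1 https://github.com/someuser/dotfiles
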
Throwing out everything but the objects allows other fun options, but there aren’t any standard tools and I’m out of time. Maybe next time. Ta for now.


Getting the Adafruit Pro Trinket 3.3V to work in Arch Linux

I’m on Linux, and here’s what I did to get the Adafruit Pro Trinket (3.3V version) to work. I think most of this should work for other Adafruit boards as well. I’m on Arch Linux, but other distros will be similar, just find the right paths for everything. Your version of udev may vary on older distros especially.

  1. Install the Arduino IDE. If you want to install the adafruit version, be my guest. It should work out of the box, minus the udev rule below. I have multiple microprocessors I want to support, so this wasn’t an option for me.
  2. Copy the hardware profiles to your Arduino install. pacman -Ql arduino shows me that I should be installing to /usr/share/arduino.  You can find the files you need at their source (copy the entire folder) or the same thing is packaged inside of the IDE installs.

    cp adafruit-git /usr/share/arduino/adafruit
    
  3. Re-configure “ATtiny85” to work with avrdude. On arch, pacman -Ql arduino | grep "avrdude.conf" says I should edit /usr/share/arduino/hardware/tools/avr/etc/avrdude.conf. Paste this revised “t85” section into avrdude.conf (credit to the author)

  4. Install a udev rule so you can program the Trinket Pro as yourself (and not as root).

    # /etc/udev/rules.d/adafruit-usbtiny.rules
    SUBSYSTEM=="usb", ATTR{product}=="USBtiny", ATTR{idProduct}=="0c9f", ATTRS{idVendor}=="1781", MODE="0660", GROUP="arduino"
    
  5. Add yourself as an arduino group user so you can program the device with usermod -G arduino -a <username>. Reload the udev rules and log in again to refresh the groups you’re in. Close and re-open the Arduino IDE if you have it open to refresh the hardware rules.
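
    The reload itself looks like this on most modern distros (a sketch; some setups also want the device unplugged and replugged):

    sudo udevadm control --reload-rules
    sudo udevadm trigger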

  6. You should be good to go! If you’re having trouble, start by making sure you can see the correct hardware, and that avrdude can recognize and program your device with simple test programs from the command line. The source links have some good specific suggestions.
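
    A minimal connectivity check, assuming the USBtiny programmer and the Pro Trinket’s ATmega328P (adjust -p for other boards):

    # Should print the device signature if wiring and permissions are right
    avrdude -c usbtiny -p m328p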

Sources:
http://www.bacspc.com/2015/07/28/arch-linux-and-trinket/
http://andijcr.github.io/blog/2014/07/31/notes-on-trinket-on-ubuntu-14.04/


Archiving all bash commands typed

This one’s a quickie. Just a snippet of my config to record all bash commands to a file (.bash_eternal_history) forever. The default bash HISTFILESIZE is 500. Setting it to a non-numeric value will make the history file grow forever (although not your in-memory history size, which is controlled by HISTSIZE).

I do this in addition:

#~/.bash.d/eternal-history
# don't put duplicate lines in the history
HISTCONTROL=ignoredups
# append to the history file, don't overwrite it
shopt -s histappend
# for setting history length see HISTSIZE and HISTFILESIZE in bash(1)
HISTFILESIZE=infinite
# Creates an eternal bash log in the form
# PID USER INDEX TIMESTAMP COMMAND
export HISTTIMEFORMAT="%s "

PROMPT_COMMAND="${PROMPT_COMMAND:+$PROMPT_COMMAND ; }"'echo $$ $USER \
"$(history 1)" >> ~/.bash_eternal_history'

Archiving all web traffic

Today I’m going to walk through a setup on how to archive all web (HTTP/S) traffic passing over your Linux desktop. The basic approach is going to be to install a proxy which records traffic. It will record the traffic to WARC files. You can’t proxy non-HTTP traffic (for example, chat or email) because we’re using an HTTP proxy approach.

The end result is pretty slow for reasons I’m not totally sure of yet. It’s possible warcprox isn’t streaming results.

  1. Install the server

    # pip install warcprox
    
  2. Make a warcprox user to run the proxy as.

    # useradd -M --shell=/bin/false warcprox
    
  3. Make a root certificate. You’re going to intercept HTTPS traffic by pretending to be the website, so if anyone gets ahold of this, they can fake being every website to you. Don’t give it out.

    # mkdir /etc/warcprox
    # cd /etc/warcprox
    # sudo openssl genrsa -out ca.key 4096
    # sudo openssl req -new -x509 -key ca.key -out ca.crt
    # cat ca.crt ca.key >ca.pem
    # chown root:warcprox ca.pem ca.key
    # chmod 640 ca.pem ca.key
    
  4. Set up a directory where you’re going to store the WARC files. You’re saving all web traffic, so this will get pretty big.

    # mkdir /var/warcprox
    # chown -R warcprox:warcprox /var/warcprox
    
  5. Set up a boot script for warcproxy. Here’s mine. I’m using supervisorctl rather than systemd.

    #/etc/supervisor.d/warcprox.ini
    [program:warcprox]
    command=/usr/bin/warcprox -p 18000 -c /etc/warcprox/ca.pem --certs-dir ./generated-certs -g sha1
    directory=/var/warcprox
    user=warcprox
    autostart=true
    autorestart=unexpected
    
  6. Set up any browsers, etc. to use localhost:18000 as your proxy. You could also do some kind of global firewall config. Chromium in particular was pretty irritating on Arch Linux. It doesn’t respect $http_proxy, so you have to pass it separate options (see below). This is also a good point to make sure anything you don’t want recorded BYPASSES the proxy (for example, maybe large things like youtube, etc).
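
    For reference, pointing clients at the proxy looks roughly like this (a sketch; you’ll also need to import /etc/warcprox/ca.crt as a trusted CA in each browser):

    # Most CLI tools respect these
    export http_proxy=http://localhost:18000
    export https_proxy=http://localhost:18000
    # Chromium ignores $http_proxy, so pass the flag explicitly
    chromium --proxy-server="http://localhost:18000"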


Mail filtering with Dovecot

This expands on my previous post about how to set up an email server.

We’re going to set up a few spam filters in Dovecot under Debian. We’re going to use Sieve, which lets the user set up whichever filters they want. However, we’re going to run a couple pre-baked spam filters regardless of what the user sets up.

  1. Install Sieve.

    sudo apt-get install dovecot-sieve dovecot-managesieved
    
  2. Add Sieve to Dovecot

    # /etc/dovecot/dovecot.conf
    # Sieve and ManageSieve
    protocols = $protocols sieve
    protocol lmtp {
     mail_plugins = $mail_plugins sieve
    }
    service managesieve-login {
     inet_listener sieve {
     port = 4190
     }
    }
    protocol sieve {
     managesieve_logout_format = bytes ( in=%i : out=%o )
    }
    plugin {
     # Settings for the Sieve and ManageSieve plugin
     sieve = file:~/sieve;active=~/.dovecot.sieve
     sieve_before = /etc/dovecot/sieve.d/
     sieve_dir = ~/sieve # For old version of ManageSieve
     #sieve_extensions = +vnd.dovecot.filter
     #sieve_plugins = sieve_extprograms
    }
    
  3. Install and update SpamAssassin, a heuristic perl script for spam filtering.

    sudo apt-get install spamassassin
    sudo sa-update
    
    # /etc/default/spamassassin
    ENABLED=1
    #CRON=1 # Update automatically
    
    # /etc/spamassassin/local.cf
    report_safe 0 # Don't modify headers
    
    sudo service spamassassin start
    
  4. There’s a lot of custom configuration and training you should do to get SpamAssassin to accurately categorize what you consider spam. I’m including a minimal amount here. The following will train SpamAssassin system-wide based on what users sort into spam folders.

    #!/bin/sh
    # /etc/cron.daily/spamassassin-train
    all_folders() {
            find /var/mail/vmail -type d -regextype posix-extended -regex '.*/(cur|new)$'
    }
    
    all_folders | grep "Spam" | sa-learn --spam -f - >/dev/null 2>/dev/null
    all_folders | grep -v "Spam" | sa-learn --ham -f - >/dev/null 2>/dev/null
    
  5. Make Postfix run SpamAssassin as a filter, so that it can add headers as mail comes in.

    # /etc/postfix/master.cf
    smtp inet n - - - - smtpd
     -o content_filter=spamassassin
    # ...
    spamassassin unix - n n - - pipe user=debian-spamd argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}
    
    sudo service postfix restart
    
  6. Add SpamAssassin to Sieve. Dovecot (via Sieve) will now move messages with spam headers from SpamAssassin to your spam folder. Make sure you have a “Spam” folder and that it’s set to autosubscribe.

    # /etc/dovecot/sieve.d/spam-assassin.sieve
    require ["fileinto"];
    # Move spam to spam folder
    if header :contains "X-Spam-Flag" "YES" {
     fileinto "Spam";
     # Stop here - if there are other rules, ignore them for spam messages
     stop;
    }
    
    cd /etc/dovecot/sieve.d
    sudo sievec spam-assassin.sieve
    
  7. Restart Dovecot

    sudo service dovecot restart
    
  8. Test spam handling. The GTUBE string is designed to always be flagged as spam. Set the content of your email to this:

    XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X
    
  9. You should also be able to create user-defined filters in Sieve, via the ManageSieve protocol. I tested this using a Sieve thunderbird extension. You’re on your own here, but a sketch of what such a filter looks like is below.
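
    A user filter is just another Sieve script, uploaded over ManageSieve. A hypothetical example that files mailing-list traffic into its own folder:

    require ["fileinto"];
    # "lists.example.com" is a made-up List-Id; substitute a real one
    if header :contains "List-Id" "lists.example.com" {
     fileinto "Lists";
    }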


Installing email with Postfix and Dovecot (with Postgres)

I’m posting my email setup here. The end result will:

  • Use Postfix for SMTP
  • Use Dovecot for IMAP and authentication
  • Store usernames, email forwards, and passwords in a Postgres SQL database
  • Only be accessible over encrypted channels
  • Pass all common spam checks
  • Support SMTP sending and IMAP email checking. I did not include POP3 because I don’t use it, but it should be easy to add
  • NOT add spam filtering or web mail (this article is long enough as it is, maybe in a follow-up)

Note: My set up is pretty standard, except that rDNS for smtp.za3k.com resolves to za3k.com because I only have one IP. You may need to change your hostnames if you’re using mail.example.com or smtp.example.com.

On to the install!

  1. Install debian packages

    # Postfix; Dovecot; Postgres; DKIM
    sudo apt-get install postfix \
          dovecot-core dovecot-imapd dovecot-lmtpd \
          postgresql dovecot-pgsql postfix-pgsql \
          opendkim opendkim-tools
    
  2. Set up security. The smtp.za3k.com cert is at /etc/ssl/certs/smtp.za3k.com.pem, the key is at /etc/ssl/private/smtp.za3k.com.key. dhparams for postfix are at /etc/postfix/dhparams.pem. (If you need a certificate and don’t know how to get one, you can read Setting up SSL certificates using StartSSL)

  3. Install Postfix

    # /etc/postfix/master.cf
    smtp       inet  n       -       -       -       -       smtpd
    submission inet  n       -       -       -       -       smtpd
      -o syslog_name=postfix/submission
      -o smtpd_tls_security_level=encrypt
      -o smtpd_sasl_auth_enable=yes
      -o smtpd_reject_unlisted_recipient=no
      -o milter_macro_daemon_name=ORIGINATING
    
    # /etc/postfix/main.cf additions
    # TLS parameters
    smtpd_tls_cert_file=/etc/ssl/certs/smtp.za3k.com.pem
    smtpd_tls_key_file=/etc/ssl/private/smtp.za3k.com.key
    smtpd_use_tls=yes
    smtpd_tls_mandatory_protocols=!SSLv2,!SSLv3
    smtp_tls_mandatory_protocols=!SSLv2,!SSLv3
    smtpd_tls_protocols=!SSLv2,!SSLv3
    smtp_tls_protocols=!SSLv2,!SSLv3
    smtpd_tls_exclude_ciphers = aNULL, eNULL, EXPORT, DES, RC4, MD5, PSK, aECDH, EDH-DSS-DES-CBC3-SHA, EDH-RSA-DES-CBC3-SHA, KRB5-DES-CBC3-SHA
    
    # Relay and recipient settings
    myhostname = za3k.com
    myorigin = /etc/mailname
    mydestination = za3k.com, smtp.za3k.com, localhost.com, localhost
    relayhost =
    mynetworks_style = host
    mailbox_size_limit = 0
    inet_interfaces = all
    smtpd_relay_restrictions = permit_mynetworks,
      permit_sasl_authenticated,
      reject_unauth_destination
    
    alias_maps = hash:/etc/aliases
    local_recipient_maps = $alias_maps
    mailbox_transport = lmtp:unix:private/dovecot-lmtp
    
  4. Install Dovecot

    # /etc/dovecot/dovecot.conf
    mail_privileged_group = mail # Local mail
    disable_plaintext_auth = no
    
    protocols = imap
    
    ssl=required
    ssl_cert = </etc/ssl/certs/imap.za3k.com.pem
    ssl_key = </etc/ssl/private/imap.za3k.com.key
    
    # IMAP Folders
    namespace {
     inbox = yes
     mailbox Trash {
     auto = create
     special_use = \Trash
     }
     mailbox Drafts {
     auto = no
     special_use = \Drafts
     }
     mailbox Sent {
     auto = subscribe
     special_use = \Sent
     }
     mailbox Spam {
     auto = subscribe
     special_use = \Junk
     }
    }
    
    # Expunging / deleting mail should FAIL, use the lazy_expunge plugin for this
    namespace {
     prefix = .EXPUNGED/
     hidden = yes
     list = no
     location = maildir:~/expunged
    }
    mail_plugins = $mail_plugins lazy_expunge
    plugin {
     lazy_expunge = .EXPUNGED/
    }
    
    # /etc/postfix/main.cf
    # SASL authentication is done through Dovecot to let users relay mail
    smtpd_sasl_type = dovecot
    smtpd_sasl_path = private/auth
    
  5. Set up the database and virtual users. Commands:

    # Create the user vmail for storing virtual mail
    # vmail:x:5000:5000::/var/mail/vmail:/usr/bin/nologin
    groupadd -g 5000 vmail
    mkdir /var/mail/vmail
    useradd -M -d /var/mail/vmail --shell=/usr/bin/nologin -u 5000 -g vmail vmail
    chown vmail:vmail /var/mail/vmail
    chmod 700 /var/mail/vmail
    
    psql -U postgres
    ; Set up the users
    CREATE USER postfix PASSWORD 'XXX';
    CREATE USER dovecot PASSWORD 'XXX';
    
    ; Create the database
    CREATE DATABASE email;
    \connect email
    
    ; Set up the schema 
    
    CREATE TABLE aliases (
        alias text NOT NULL,
        email text NOT NULL
    );
    
    CREATE TABLE users (
        username text NOT NULL,
        domain text NOT NULL,
        created timestamp with time zone DEFAULT now(),
        password text NOT NULL
    );
    
    REVOKE ALL ON TABLE aliases FROM PUBLIC;
    GRANT ALL ON TABLE aliases TO postfix;
    GRANT ALL ON TABLE aliases TO dovecot;
    
    REVOKE ALL ON TABLE users FROM PUBLIC;
    GRANT ALL ON TABLE users TO dovecot;
    GRANT ALL ON TABLE users TO postfix;
    
    # /etc/dovecot/dovecot.conf
    # Since we're giving each virtual user their own directory under /var/mail/vmail, just use that directly and not a subdirectory
    mail_location = maildir:~/
    
    # /etc/dovecot/dovecot-sql.conf defines the DB queries used for authorization
    passdb {
      driver = sql
      args = /etc/dovecot/dovecot-sql.conf
    }
    userdb {
      driver = prefetch
    }
    userdb {
      driver = sql
      args = /etc/dovecot/dovecot-sql.conf
    }
    
    # /etc/postfix/main.cf
    virtual_alias_maps = pgsql:/etc/postfix/pgsql-virtual-aliases.cf
    local_recipient_maps = pgsql:/etc/postfix/pgsql-virtual-mailbox.cf 
    
    # /etc/postfix/pgsql-virtual-aliases.cf
    # hosts = localhost
    user = postfix
    password = XXXXXX
    dbname = email
    
    query = SELECT email FROM aliases WHERE alias='%s'
    
    # /etc/postfix/pgsql-virtual-mailbox.cf
    # hosts = localhost
    user = postfix
    password = XXXXXX
    dbname = email
    
    query = SELECT concat(username,'@',domain,'/') as email FROM users WHERE username='%s'
    
    # /etc/dovecot/dovecot-sql.conf
    driver = pgsql
    connect = host=localhost dbname=email user=dovecot password=XXXXXX
    default_pass_scheme = SHA512
    password_query = SELECT \
      CONCAT(username,'@',domain) as user, \
      password, \
      'vmail' AS userdb_uid, \
      'vmail' AS userdb_gid, \
      '/var/mail/vmail/%u' as userdb_home \
      FROM users \
      WHERE concat(username,'@',domain) = '%u';
    
    user_query = SELECT username, \
      CONCAT('maildir:/var/mail/vmail/',username,'@',domain) as mail, \
      '/var/mail/vmail/%u' as home, \
      'vmail' as uid, \
      'vmail' as gid \
      FROM users \
      WHERE concat(username,'@',domain) = '%u';
    
  6. Set up users. Example user creation:

    # Generate a password
    $ doveadm pw -s sha512 -r 100
    Enter new password: ...
    Retype new password: ...
    {SHA512}.............................................................==
    
    psql -U dovecot -d email
    ; Create a user za3k@za3k.com
    mail=# INSERT INTO users (
        username,
        domain,
        password
    ) VALUES (
        'za3k',
        'za3k.com',
        '{SHA512}.............................................................=='
    );
    
  7. Set up aliases/redirects. Example redirect creation:

    psql -U dovecot -d email
    ; Redirect mail from foo@example.com to bar@example.net
    mail=# INSERT INTO aliases ( email, alias ) VALUES (
        'bar@example.net',
        'foo@example.com'
    );
    
  8. Test setup locally by hand. Try using TELNET (a sample session is below). Test remote setup using STARTTLS. This is similar to the previous step, but to start the connection use:

    openssl s_client -connect smtp.za3k.com:587 -starttls smtp
    

    Make sure to test email to addresses at your domain or that you’ve set up (final destination), and emails you’re trying to send somewhere else (relay email)

    A small digression: port 25 is used for unencrypted email and supports STARTTLS, 587 is used for STARTTLS only, and 465 (obsolete) is used for TLS. My ISP, Comcast, blocks access to port 25 on outgoing traffic.
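
    The hand-typed session looks something like this (addresses are placeholders; every line after the telnet command is typed by you):

    telnet localhost 25
    EHLO test.example.com
    MAIL FROM:<za3k@za3k.com>
    RCPT TO:<someone@example.net>
    DATA
    Subject: test

    It works!
    .
    QUIT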

  9. Make sure you’re not running an open relay at http://mxtoolbox.com/diagnostic.aspx

  10. Set your DNS so that the MX record points at your new mailserver. You’ll probably want a store and forward backup mail server (mine is provided by my registrar). Email should arrive at your mail server from now on. This is the absolute minimum setup. Everything from here on is to help the community combat spam (and you not to get blacklisted).
  11. Set up DKIM (DomainKeys Identified Mail). DKIM signs outgoing mail to show that it’s from your server, which helps you not get flagged as spam.
    None of these files or folders exist to begin with in debian.

    # Add to /etc/opendkim.conf
    KeyTable                /etc/opendkim/KeyTable
    SigningTable            /etc/opendkim/SigningTable
    ExternalIgnoreList      /etc/opendkim/TrustedHosts
    InternalHosts           /etc/opendkim/TrustedHosts
    LogWhy yes
    
    # /etc/opendkim/TrustedHosts
    127.0.0.1
    [::1]
    localhost
    za3k.com
    smtp.za3k.com
    
    mkdir -p /etc/opendkim/keys/za3k.com
    cd /etc/opendkim/keys/za3k.com
    opendkim-genkey -s default -d za3k.com
    chown opendkim:opendkim default.private
    
    # /etc/opendkim/KeyTable
    default._domainkey.za3k.com za3k.com:default:/etc/opendkim/keys/za3k.com/default.private
    
    # /etc/opendkim/SigningTable
    za3k.com default._domainkey.za3k.com
    

    Display the DNS public key to set in a TXT record with:

    # sudo cat /etc/opendkim/keys/za3k.com/default.txt
    default._domainkey      IN      TXT     ( "v=DKIM1; k=rsa; "
              "p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCggdv3OtQMek/fnu+hRrHYZTUcpUFcSGL/+Sbq+GffR98RCgabx/jjPJo3HmqsB8czaXf7yjO2UiSN/a8Ae6/yu23d7hyTPUDacatEM+2Xc4/zG+eAlAMQOLRJeo3z53sNiq0SmJET6R6yH4HCv9VkuS0TQczkvME5hApft+ZedwIDAQAB" )  ; ----- DKIM
    
    # My registrar doesn't support this syntax so it ends up looking like: 
    $ dig txt default._domainkey.za3k.com txt
    default._domainkey.za3k.com. 10800 IN   TXT     "v=DKIM1\; k=rsa\; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCggdv3OtQMek/fnu+hRrHYZTUcpUFcSGL/+Sbq+GffR98RCgabx/jjPJo3HmqsB8czaXf7yjO2UiSN/a8Ae6/yu23d7hyTPUDacatEM+2Xc4/zG+eAlAMQOLRJeo3z53sNiq0SmJET6R6yH4HCv9VkuS0TQczkvME5hApft+ZedwIDAQAB"
    
    # Uncomment in /etc/default/opendkim
    SOCKET="inet:12345@localhost" # listen on loopback on port 12345
    
    # /etc/postfix/main.cf
    # DKIM
    milter_default_action = accept
    milter_protocol = 6
    smtpd_milters = inet:localhost:12345
    non_smtpd_milters = inet:localhost:12345
    
  12. Set up SPF (Sender Policy Framework). SPF explains to other services which IPs can send email on your behalf. You can set up whatever policy you like. A guide to the syntax is at: http://www.openspf.org/SPF_Record_Syntax.  Mine is

    @ 10800 IN TXT "v=spf1 +a:za3k.com +mx:za3k.com ~all"
    

    You should also be verifying this on your end as part of combating spam, but as far as outgoing mail all you need to do is add a TXT record to your DNS record.

  13. Set your rDNS (reverse DNS) if it’s not already. This should point at the same hostname reported by Postfix during SMTP. This will be handled by whoever assigns your IP address (in my case, my hosting provider).

  14. Test your spam reputability using https://www.mail-tester.com or https://www.port25.com/support/authentication-center/email-verification. You can monitor if you’re on any blacklists at http://mxtoolbox.com/blacklists.aspx.
  15. Set up DMARC. DMARC declares your policy around DKIM being mandatory. You can set up whatever policy you like.  Mine is

    _dmarc 10800 IN TXT "v=DMARC1;p=reject;aspf=s;adkim=s;pct=100;rua=mailto:postmaster@za3k.com"
    


Takeaways

  • You can set up store-and-forward mail servers, so if your mail server goes down, you don’t lose all the email for that period. It’s generally a free thing.
  • Postfix’s configuration files were badly designed and crufty, so you might pick a different SMTP server.
  • Email was REALLY not designed to do authentication, which is why proving you’re not a spammer is so difficult. This would all be trivial with decent crypto baked in (or really, almost any backwards-incompatible change)
  • The option to specify a SQL query as a configuration file option is wonderful. Thanks, Dovecot.
  • Overall, although it was a lot of work, I do feel like it was worth it to run my own email server.

Dependency Resolution in Javascript

Sometimes I have a bunch of dependencies. Say, UI components that need other UI components to be loaded. I’d really just like to have everything declare its dependencies, and magically everything is loaded in the right order. It turns out that if you use “require”-type files this isn’t bad (google “dependency injection”), but for anything other than code loading you’re a bit lost. I did find dependency-graph, but it requires the full list of components up front. I wanted a version where you could add components whenever you wanted–an online framework.

My take is here: https://github.com/vanceza/dependencies-online

It has no dependencies of its own, and is available on npm as dependencies-online.
