Storage Prices 2022-07

I did a survey of the cost of buying hard drives (of all sorts), microSD/SD cards, USB sticks, CDs, DVDs, Blu-rays, and tape media (for tape drives).

Here are the 2022-07 results: https://za3k.com/archive/storage-2022-07.sc.txt

2020-01: https://za3k.com/archive/storage-2020-01.sc.txt
2019-07: https://za3k.com/archive/storage-2019-07.sc.txt
2018-10: https://za3k.com/archive/storage-2018-10.sc.txt
2018-06: https://za3k.com/archive/storage-2017-06.sc.txt
2018-01: https://za3k.com/archive/storage-2017-01.sc.txt

Useful conclusions:

  • Used or refurbished items were excluded. Multi-packs (e.g. 5 USB sticks) were excluded except for optical media. Seagate drives were excluded, because they are infamous for having a high failure rate and a bad returns process.
  • Per TB, the cheapest options are:
    • Tape media (LTO-8) at $4.74/TB, but I recommend against it. Tape drives are expensive ($3300 for LTO-8 new), giving a breakeven with HDDs at 350-400TB. Also, the world is down to only one tape drive manufacturer, so you could end up screwed in the future.
    • 3.5″ internal spinning hard drives, at $13.75/TB. Currently the best option is 4TB drives.
    • 3.5″ external spinning hard drives, at $17.00/TB. Currently the best is 18TB WD drives. If you want internal drives, you can buy external ones and open them up, although it voids your warranty.
    • 2.5″ external spinning hard drives, at $24.50/TB. 4-5TB is best.
    • Blu-ray discs, at $23.16/TB: 25GB is cheapest, then 50GB ($32.38/TB), then 100GB ($54.72/TB).
  • Be very careful buying internal hard drives online, and try to use a first-party seller. There are a lot of fake sellers and sellers who don’t actually provide a warranty. This is new in the last few years.
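The tape breakeven figure in the list above can be checked with a little arithmetic. This is a minimal sketch using only the prices quoted in this survey; the function name is made up for illustration, and it assumes the only extra cost of tape is the drive itself (no spare drives, no cleaning cartridges, no HDD enclosures).

```python
def tape_breakeven_tb(drive_cost, tape_per_tb, hdd_per_tb):
    """Capacity in TB at which (tape drive + tapes) costs the same as HDDs alone.

    Solves: drive_cost + tape_per_tb * x == hdd_per_tb * x
    """
    if hdd_per_tb <= tape_per_tb:
        raise ValueError("tape media must be cheaper per TB than hard drives")
    return drive_cost / (hdd_per_tb - tape_per_tb)


# Figures from the 2022-07 survey: $3300 LTO-8 drive, $4.74/TB tape,
# $13.75/TB for 3.5" internal hard drives.
breakeven = tape_breakeven_tb(3300, 4.74, 13.75)
print(f"Breakeven vs. internal HDDs: {breakeven:.0f} TB")  # roughly 366 TB
```

Below that capacity, hard drives are cheaper overall; above it, tape wins on media cost alone.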

Changes since the last survey 2 years ago:

  • Amazon’s search got much worse again. More sponsored listings, still refurbished drives.
  • Sketchy third-party sellers are showing up on Amazon and other vendors. The problem now is people not getting what they ordered, or getting it without the promised warranty. I tried to filter out such Amazon sellers, but had trouble, even though I do the survey by hand. At this point it would be hard to safely buy an internal hard drive on Amazon.
  • Spinning drives: Prices for spinning hard drives have not significantly dropped or risen since 2020.
  • Spinning drives: 18TB and 20TB 3.5″ hard drives became available
  • SSDs: 8TB is available (in both 2.5 inch and M.2 formats)
  • SSDs: Prices dropped by about half, per TB. The cheapest overall drives dropped about 30%.
  • USB: 2TB dropped back off the market, and appears unavailable.
  • USB: On the lower end, USB prices rose almost 2X. On the higher end, they dropped.
  • MicroSD/SD: Prices dropped
  • MicroSD/SD: A new player entered the cheap-end flash market, TEAMGROUP. Based on reading reviews, they make real drives, and sell them cheaper than they were available before. Complaints of buffer issues or problems with sustained write speeds are common.
  • MicroSD/SD: It’s no longer possible to buy slow microsd/sd cards, which is good. Basically everything is class 10 and above.
  • MicroSD/SD: I combined microSD and SD into one category to show a price comparison.
  • Optical: Optical prices mostly did not change. 100GB Blu-ray dropped by 60-70%, as did archival Blu-ray.
  • Tape: LTO-9 is available.
  • Tape: The cost of LTO-8 tape dropped 50%, which makes it the cheapest option.
  • Tape: This is not new, but there is still only one tape drive manufacturer (IBM) since around the introduction of LTO-8.

Postmortem: bs-store

Between 2020-03-14 and 2020-12-03 I ran an experimental computer storage setup. I moved or copied 90% of my files into a content-addressable storage system. I’m doing a writeup of why I did it, how I did it, and why I stopped. My hope is that it will be useful to anyone considering using a similar system.

The assumption behind this setup is that 99% of my files never change, so it’s fine to store only one, static copy of them. (Think movies, photos… they’re most of your computer space, and you’re never going to modify them). There are files you change; I just didn’t put them into this system. If you run a database, this ain’t for you.

Because I have quite a lot of files and 42 drives (7 in my computer, ~35 in a huge media server chassis), there is a problem of how to organize files across drives. To explain why it’s a problem, let’s look at the two default approaches:

  • One Block Device / RAID 0. Use some form of system that unifies block devices, such as RAID0 or ZFS’s striped vdevs. Writing files is very easy: you see a single 3000GB drive.
    • Many forms of RAID0 use striping. Striping splits each file across all available drives. 42 drives could spin up to read one file (wasteful).
    • You need all the drives mounted to read anything–I have ~40 drives, and I’d like a solution that works if I move and can’t keep my giant media server running. Also, it’s just more reassuring that nothing can fail if you can read each drive individually.
  • JBOD / Just a Bunch of Disks. Label each drive with a category (ex. ‘movies’), and mount them individually.
    • It’s hard to aim for 100% (or even >80%) drive use. Say you have 4x 1000GB drives, and you have 800GB movies, 800GB home video footage, 100GB photographs, and 300GB datasets. How do you arrange that? One drive per dataset is pretty wasteful, as everything fits on 3. But, with three drives, you’ll need to split at least one dataset across drives. Say you put together 800GB home video and 100GB photographs. If you get 200GB more photographs, do you split a 300 GB collection across drives, or move the entire thing to another drive? It’s a lot of manual management and shifting things around for little reason.

Neither approach adds any redundancy, and 42 drives is a bit too many to deal with for most things. Step 1 is to split the 42 drives into 7 ZFS vdevs, each with 2-drive redundancy. That way, if a drive fails or there is a small data corruption (likely), everything will keep working. So now we only have to think about accessing 7 drives (but keep in mind, many physical drives will spin up for each disk access).
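As a quick sanity check on what that split costs in capacity, here is a small sketch. It assumes the 42 drives are divided evenly (6 per vdev), that all drives are the same size, and it ignores filesystem overhead; none of that is stated above, and the function name is made up for illustration.

```python
def raidz_layout(total_drives, vdevs, parity):
    """Split drives evenly into vdevs and report how many hold data vs. parity."""
    if total_drives % vdevs != 0:
        raise ValueError("drives do not divide evenly into vdevs")
    per_vdev = total_drives // vdevs
    if parity >= per_vdev:
        raise ValueError("parity must be smaller than the vdev width")
    data_drives = (per_vdev - parity) * vdevs
    return {
        "drives_per_vdev": per_vdev,
        "data_drives": data_drives,
        "usable_fraction": data_drives / total_drives,
    }


layout = raidz_layout(total_drives=42, vdevs=7, parity=2)
print(layout["drives_per_vdev"])           # 6 drives spin up per vdev access
print(f"{layout['usable_fraction']:.0%}")  # about two thirds of raw capacity
```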

The ideal solution:

  • Will not involve a lot of manual management
  • Will fill up each drive in turn to 100%, rather than all drives at an equal %.
  • Will deduplicate identical content (this is a “nice to have”)
  • Will only involve accessing one drive to access one file
  • Will allow me to add and remove drives, ideally across heterogeneous systems.

I decided a content-addressable system was ideal for me. That is, you’d look up each file by its hash. I don’t like having an extra step to access files, so files would be accessed by symlink–no frontend program. Also, it was important to me that I be able to transparently swap out the set of drives backing this. I wanted to make the content-addressable system basically a set of 7 content-addressable systems, and somehow wrap those all into one big content-addressable system with the same interface. Here’s what I settled on:

  • (My drives are mounted as /zpool/bs0, /zpool/bs1, … /zpool/bs6)
  • Files will be stored in each pool in turn by hash. So my movie ‘cat.mpg’ with sha hash ‘8323f58d8b92e6fdf190c56594ed767df90f1b6d’ gets stored in /zpool/bs0/83/23/f58d8b9 [shortened for readability]
  • Initially, we just copy files into the content-addressable system; we don’t delete the original. I’m cautious, and I wanted to make sure everything worked before getting rid of the originals.
  • To access a file, I used a read-only unionfs-fuse mount. This checks each of /zpool/bs{0..6}/<hash> in turn. So in the final version, /data/movies/cat.mpg would be a symlink to ‘/bs-union/83/23/f58d8b9’
  • We store some extra metadata on the original file (if not replaced by a symlink) and the stored copy: what collection it’s part of, when it was added, how big it is, what its hash is, etc. I chose to use xattrs.
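The storage layout described above can be sketched roughly as follows. This is not the original implementation (that was a set of bash scripts); it is a minimal Python illustration. It assumes the hash is SHA-1 (the 40-character example hash is the right length, but the post only says "sha"), and the function names and the `min_free_bytes` parameter are invented for the example. It skips the xattr metadata and the symlink step entirely.

```python
import hashlib
import os
import shutil


def sha1_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large movies don't need to fit in memory."""
    h = hashlib.sha1()  # assumption: the post's "sha hash" is SHA-1
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def blob_relpath(digest):
    """'8323f58d...' -> '83/23/f58d...', matching the layout in the post."""
    return os.path.join(digest[:2], digest[2:4], digest[4:])


def store_file(src, pools, min_free_bytes=0):
    """Copy src into the first pool with room; the original is left in place."""
    rel = blob_relpath(sha1_of_file(src))
    # Deduplicate: identical content already stored in any pool is reused.
    for pool in pools:
        existing = os.path.join(pool, rel)
        if os.path.exists(existing):
            return existing
    size = os.path.getsize(src)
    # Fill pools in order, so earlier pools reach 100% before later ones are used.
    for pool in pools:
        if shutil.disk_usage(pool).free - size >= min_free_bytes:
            dest = os.path.join(pool, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy2(src, dest)
            return dest
    raise OSError(f"no pool has room for {src}")
```

With the mount points from the post, a call would look like `store_file("/data/movies/cat.mpg", [f"/zpool/bs{i}" for i in range(7)])`.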

The plan here is that it would be really easy to swap out one backing blockstore of 30GB for two of 20GB: just copy the files to the new drives and add them to the unionfs.

Here’s what went well:

  • No problems during development–only copying files meant it was easy and safe to debug prototypes.
  • Everything was trivial to access (except see note about mounting disks below)
  • It was easy to add things to the system
  • Holding off on deleting the original content until I was 100% out of room on my disks meant the system was easy to migrate off of, and rather risk-free
  • Running the entire thing on top of ZFS RAIDZ2 was the right decision. I had no worries about failing drives or data corruption, despite a lot of hardware issues developing at one point.
  • My assumption that files would never change was correct. I made the unionfs filesystem read-only as a guard against error, but it was never a problem.
  • Migrating off the system went smoothly

Here are the implementation problems I found:

  • I wrote the entire thing as bash scripts operating directly on files, which was OK for access and putting stuff in the store, but just awful for trying to get an overview of data or migrating things. I definitely should have used a database. I maybe should have used a programming language.
  • Because there was no database, there wasn’t really any kind of regular check for orphans (content in the blobstore with no symlinks to it), and other similar checks.
  • unionfs-fuse suuucks. Every union filesystem I’ve tried sucks. Its read bandwidth is probably much lower than that of the component devices (I’m not certain), it doesn’t cache where to look things up, and it has zero xattrs support (can’t read xattrs from the underlying filesystem).
  • Gotcha: ZFS xattrs waste a lot of space by default; you need to change the default setting.
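The orphan check mentioned in the list above is the kind of thing that is short to write once you are not in bash. A minimal sketch, assuming the layout from this post (symlinks under a data directory pointing at absolute paths under a union mount such as /bs-union, blobs stored under each pool with the same relative path). The function name and parameters are invented for illustration.

```python
import os


def find_orphans(blob_roots, data_roots, union_root):
    """Return blobs under blob_roots that no symlink under data_roots refers to."""
    prefix = union_root.rstrip(os.sep) + os.sep
    referenced = set()
    for root in data_roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                link = os.path.join(dirpath, name)
                if not os.path.islink(link):
                    continue
                target = os.readlink(link)
                # Only absolute links into the union mount count as references.
                if target.startswith(prefix):
                    referenced.add(os.path.relpath(target, union_root))
    orphans = []
    for root in blob_roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                blob = os.path.join(dirpath, name)
                # Compare by the hash-derived relative path, e.g. '83/23/f58d...'.
                if os.path.relpath(blob, root) not in referenced:
                    orphans.append(blob)
    return sorted(orphans)
```

It compares hash-derived relative paths rather than resolving the links, so it works even when the union filesystem is not mounted.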

But the biggest problem was disk access patterns:

  • I thought I could cool 42 drives spinning, or at least a good portion of them. This was WRONG by far, and I am not sure how possible it is in a home setup. To give you an idea how bad this was, I had to write a monitor to shut off my computer if the drives went above 60C, and I was developing fevers in my bedroom (where the server is) from overheating. Not healthy.
  • unionfs has to check each backing drive. So we see 42 drives spin up. I have ideas on fixing this, but they don’t deal with the other problems
  • To fix this, you could use double-indirection.
    • Rather than pointing a symlink at a unionfs: /data/cat.mpg -> /bs-union/83/23/f58d8b9 (which accesses /zpool/bs0/83/23/f58d8b9)
    • Point a symlink at another symlink that points directly to the data: /data/cat.mpg -> /bs-indirect/83/23/f58d8b9 -> /zpool/bs0/83/23/f58d8b9
  • The idea is that backing stores are kinda “whatever, just shove it somewhere”. But, actually it would be good to have a collection in one place–not only to make it easy to copy, but to spin up only one drive when you go through everything in a collection. It might even be a good idea to have a separate drive for more frequently-accessed content. This wasn’t a huge deal for me since migrating existing content meant it coincidentally ended up pretty localized.
  • Because I couldn’t spin up all 42 drives, I had to keep a lot of the array unmounted, and mount the drives I needed into the unionfs manually.
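The double-indirection idea in the list above was never built, but generating the intermediate symlink tree could look something like this. It is a sketch under assumptions: the /bs-indirect directory lives on an always-available disk, each pool stores blobs under the same hash-derived relative path, and the function name is hypothetical.

```python
import os


def build_indirect_links(pools, indirect_root):
    """Create indirect_root/<hash path> -> <pool>/<hash path> for every blob.

    Returns the number of symlinks created or repointed.
    """
    changed = 0
    for pool in pools:
        pool = os.path.abspath(pool)  # link targets must be absolute
        for dirpath, _dirs, files in os.walk(pool):
            for name in files:
                blob = os.path.join(dirpath, name)
                link = os.path.join(indirect_root, os.path.relpath(blob, pool))
                os.makedirs(os.path.dirname(link), exist_ok=True)
                if os.path.islink(link):
                    if os.readlink(link) == blob:
                        continue  # already points at the right pool
                    os.remove(link)  # blob now lives in a different pool
                os.symlink(blob, link)
                changed += 1
    return changed
```

Reading /data/cat.mpg then follows two symlinks and touches only the pool that actually holds the file, instead of probing every backing store. Swapping out a backing drive means copying its blobs and re-running the function.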

So although I could have tried to fix things with double-indirection, I decided there were some other disadvantages to symlinks: they make it harder to estimate sizes and to make offsite backups foolproof. I decided to migrate off the system entirely. The migration went well, although it required running all the drives at once, so some hardware errors popped up. I’m currently on a semi-JBOD system (still on top of the same 7 RAIDZ2 devices).

Hopefully this is useful to someone planning a similar system someday. If you learned something useful, or there are existing systems I should have used, feel free to leave a comment.

Storage Prices 2020-01

I did a survey of the cost of buying hard drives (of all sorts), CDs, DVDs, Blu-rays, and tape media (for tape drives).

Here are the 2020-01 results: https://za3k.com/archive/storage-2020-01.sc.txt
2019-07: https://za3k.com/archive/storage-2019-07.sc.txt
2018-10: https://za3k.com/archive/storage-2018-10.sc.txt
2018-06: https://za3k.com/archive/storage-2017-06.sc.txt
2018-01: https://za3k.com/archive/storage-2017-01.sc.txt

Changes this year

  • I excluded Seagate drives (except where they’re the only drives in class)
  • Amazon’s search got much worse, and they started having listings for refurbished drives
  • Corrected paper archival density, added photographic film
  • Added SSDs (both 2.5″ and M.2 formats)
  • Prices did not go up or down significantly in the last 6 months.

Some conclusions that are useful to know

  • The cheapest option is tape media, but tape reader/writers for LTO 6, 7, and 8 are very expensive.
  • The second-cheapest option is to buy external hard drives, and then open the cases and take out the hard drives. This gives you reliable drives with no warranty.
  • Blu-ray and DVD are more expensive than buying hard drives