Flash media longevity testing -- 5 years later

By admin | January 2, 2025 - 06:00 pm |January 2, 2025 Technical

Year 0 – I filled 10 32-GB Kingston flash drives with random data.
Year 1 – Tested drive 1, zero bit rot. Re-wrote drive 1 with the same data.
Year 2 – Tested drive 2, zero bit rot. Re-tested drive 1, zero bit rot. Re-wrote drives 1-2 with the same data.
Year 3 – Tested drive 3, zero bit rot. Re-tested drives 1-2, zero bit rot. Re-wrote drives 1-3 with the same data.
Year 4 – Tested drive 4, zero bit rot. Re-tested drives 1-3, zero bit rot. Re-wrote drives 1-4 with the same data.
Year 5 – Re-tested drives 1-3, zero bit rot. Re-wrote drives 1-3 with the same data.

Will report back in 1 more year when I test drive 5.

The full test plan is available in the year 4 blog post.

FAQ: https://blog.za3k.com/usb-flash-longevity-testing-year-2/

Tagged archiving, research, slow, usb


Flash media longevity testing – 4 years later

By admin | January 1, 2024 - 11:55 am |January 1, 2024 Technical

Year 0 – I filled 10 32-GB Kingston flash drives with random data.
Year 1 – Tested drive 1, zero bit rot. Re-wrote drive 1 with the same data.
Year 2 – Tested drive 2, zero bit rot. Re-tested drive 1, zero bit rot. Re-wrote drives 1-2 with the same data.
Year 3 – Tested drive 3, zero bit rot. Re-tested drives 1-2, zero bit rot. Re-wrote drives 1-3 with the same data.
Year 4 – Tested drive 4, zero bit rot. Re-tested drives 1-3, zero bit rot. Re-wrote drives 1-4 with the same data.

Will report back in 2 more years when I test the fifth. Since flash drives are likely to last more than 10 years, the plan has never been “test one new one each year”.

The years where I’ll first touch a new drive (assuming no errors) are: 1, 2, 3, 4, 6, 8, 11, 15, 20, 27

The full test plan:

YEAR 1: read+write  1                           [1s]
YEAR 2: read+write  1, 2                        [1s]
YEAR 3: read+write  1, 2, 3                     [1s]
YEAR 4: read+write  1, 2, 3, 4                  [2s] (every 2nd year)
year 5: read+write  1, 2, 3,
YEAR 6: read+write  1, 2, 3, 4, 5               [2s]
year 7: read+write  1, 2, 3,
YEAR 8: read+write  1, 2, 3, 4, 5, 6            [2s]
year 9: read+write  1, 2, 3,
year 10: read+write 1, 2, 3, 4, 5, 6
YEAR 11: read+write 1, 2, 3,         7          [4s]
year 12: read+write 1, 2, 3, 4, 5, 6
year 13: read+write 1, 2, 3
year 14: read+write 1, 2, 3, 4, 5, 6
YEAR 15: read+write 1, 2, 3,         7, 8       [4s]
year 16: read+write 1, 2, 3, 4, 5, 6
year 17: read+write 1, 2, 3
year 18: read+write 1, 2, 3, 4, 5, 6
year 19: read+write 1, 2, 3,         7, 8
YEAR 20: read+write 1, 2, 3, 4, 5, 6       9    [8s]
year 21: read+write 1, 2, 3
year 22: read+write 1, 2, 3, 4, 5, 6
year 23: read+write 1, 2, 3,         7, 8
year 24: read+write 1, 2, 3, 4, 5, 6
year 25: read+write 1, 2, 3
year 26: read+write 1, 2, 3, 4, 5, 6
YEAR 27: read+write 1, 2, 3          7, 8,   10 [8s]
year 28: read+write 1, 2, 3, 4, 5, 6       9
year 29+: repeat years 21-28
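
The table above can be regenerated from two numbers per drive: the year it is first touched and how often it is re-tested after that. Here is a minimal sketch in Python, assuming the [1s]/[2s]/[4s]/[8s] labels mean "re-test every 1/2/4/8 years" (that reading, and the names SCHEDULE and drives_for_year, are illustrative assumptions, not part of the original plan):

```python
# (first year touched, years between tests) per drive, read off the plan above.
# The intervals are an assumed reading of the [1s]/[2s]/[4s]/[8s] labels.
SCHEDULE = {
    1: (1, 1), 2: (2, 1), 3: (3, 1),
    4: (4, 2), 5: (6, 2), 6: (8, 2),
    7: (11, 4), 8: (15, 4),
    9: (20, 8), 10: (27, 8),
}


def drives_for_year(year):
    """Return the drives that get read and re-written in the given year."""
    return [
        drive
        for drive, (first, interval) in sorted(SCHEDULE.items())
        if year >= first and (year - first) % interval == 0
    ]


def first_touch_years():
    """Return the years in which a previously untouched drive is first tested."""
    return sorted(first for first, _interval in SCHEDULE.values())


if __name__ == "__main__":
    for year in range(1, 29):
        drives = ", ".join(str(d) for d in drives_for_year(year))
        print(f"year {year}: read+write {drives}")
```

Under this reading, years 29 and later fall out of the same rule, which matches the "repeat years 21-28" line.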

FAQ: https://blog.za3k.com/usb-flash-longevity-testing-year-2/

Tagged archiving, research, slow, usb


Storage Prices 2023-01

By admin | January 9, 2023 - 10:08 pm |January 9, 2023 Technical

I did a survey of the cost of buying hard drives (of all sorts), microsd/sd, USB sticks, CDs, DVDs, Blu-rays, and tape media (for tape drives).

I excluded used/refurbished options. Multi-packs (e.g. 5 USB sticks) were excluded, except for optical media like CD-ROMs. Seagate drives were excluded because Seagate has a poor reputation.

Here are the 2023-01 results: https://za3k.com/archive/storage-2023-01.sc.txt

2022-07: https://za3k.com/archive/storage-2022-07.sc.txt
2020-01: https://za3k.com/archive/storage-2020-01.sc.txt
2019-07: https://za3k.com/archive/storage-2019-07.sc.txt
2018-10: https://za3k.com/archive/storage-2018-10.sc.txt
2018-06: https://za3k.com/archive/storage-2017-06.sc.txt
2018-01: https://za3k.com/archive/storage-2017-01.sc.txt

Per TB, the options are (from cheapest to most expensive):

Tape media (LTO-8) at $4.52/TB, but I recommend against it. A tape drive is about $1,600 (twice that new). That’s a breakeven at 150-300TB. Also, the world is down to one tape drive manufacturer, so you may end up screwed in the future.
3.5″ internal spinning hard drives, at $15.00/TB. Currently the best option is 8TB drives.
Optical media, at $16.71/TB. 25GB blu-ray disks are cheapest.
3.5″ external hard drives, at $17.75/TB. Currently the best option is 18TB drives.
2.5″ portable spinning hard drives, at $22.00/TB. Currently the best option is 5TB drives.
SSD drives, at $42-$46/TB. Best option is 1TB.
USB sticks, at $59/TB. Best option is 128GB sticks.
MicroSD cards, at $62/TB. Best option is 512GB cards.
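
The tape break-even figure can be checked with a little arithmetic: tape only wins once the per-TB saving on media has paid for the drive. A rough sketch, assuming the comparison is against 3.5″ internal hard drives at $15.00/TB and that "twice that new" means about $3,200 (the function name is made up for illustration, and the calculation ignores everything except drive and media cost):

```python
def break_even_tb(drive_cost, tape_per_tb, hdd_per_tb):
    """Capacity in TB at which tape drive plus tapes costs the same as hard drives."""
    saving_per_tb = hdd_per_tb - tape_per_tb
    if saving_per_tb <= 0:
        raise ValueError("tape media is not cheaper per TB than hard drives")
    return drive_cost / saving_per_tb


# Prices from the list above; the $3,200 new-drive price is inferred, not quoted.
used = break_even_tb(1600, 4.52, 15.00)
new = break_even_tb(3200, 4.52, 15.00)
print(f"used drive: {used:.0f} TB, new drive: {new:.0f} TB")
# prints "used drive: 153 TB, new drive: 305 TB"
```

That lands on the 150-300TB range quoted above.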

Changes since the last survey (6 months ago):

Amazon’s search improved. Fewer refurbished drives and sponsored listings.
Spinning drives: 22TB 3.5″ drives became available.
Spinning drives: Prices for the previous cheapest option (4TB) rose, making 8TB the new cheap option.
SSDs: Prices dropped by about 30%.
MicroSD/SD: Prices dropped slightly.
Optical: The cheapest option (25GB blu-ray) dropped 30%.
Optical: I stopped gathering data on the cost of BD-RE.
Tape: LTO-7 tape drives are now available used, halving the break-even point on tape.

Tagged archiving, research, storage


2023 Flash media longevity testing (3 years later)

By admin | January 9, 2023 - 11:30 am |January 9, 2023 Non-Technical, Technical

Year 0 – I filled 10 32-GB Kingston flash drives with random data.
Year 1 – Tested drive 1, zero bit rot. Re-wrote drive 1 with the same data.
Year 2 – Tested drive 2, zero bit rot. Re-tested drive 1, zero bit rot. Re-wrote drives 1-2 with the same data.
Year 3 – Tested drive 3, zero bit rot. Re-tested drives 1-2, zero bit rot. Re-wrote drives 1-3 with the same data.

This year they were stored in a box on my shelf.

Will report back in 1 more year when I test the fourth 🙂

Tagged archiving, backup, research


tty audit logs

By admin | August 18, 2022 - 08:14 pm |August 18, 2022 Technical

I recently wrote a program that records all tty activity. That means bash sessions, ssh, raw tty access, screen and tmux sessions, the lot. I used script. The latest version of my software can be found on github.

Note that it’s been tested only with bash so far, and there’s no encryption built in.

To just record all shell commands typed, use the standard eternal history tricks (bash).

Tagged archiving, linux, system administration


Scan Organizer

By admin | July 20, 2022 - 09:43 pm |July 20, 2022 Non-Technical, Technical

I scan each and every piece of paper that passes through my hands. All my old to-do lists, bills people send me in the mail, the manual for my microwave, everything. I have a lot of scans.

scan-organizer is a tool I wrote to help me neatly organize and label everything, and make it searchable. It’s designed for going through a huge backlog by hand over the course of weeks, and then dumping a new set of raw scans in whenever afterwards. I have a specific processing pipeline, discussed below. However, if you have even a little programming skill, I’ve designed this to be modified to suit your own workflow.

Input and output

The input is some raw scans. They could be handwritten notes, printed computer documents, photos, or whatever.

The final product is that for each file like ticket.jpg, we end up with ticket.txt. This has metadata about the file (tags, category, notes) and a transcription of any text in the image, to make it searchable with grep & co.

---
category: movie tickets
filename: seven psychopaths ticket.jpg
tags:
- cleaned
- categorized
- named
- hand_transcribe
- transcribed
- verified
---
Rialto Cinemas Elmwood
SEVEN PSYCHOPAT
R
Sun Oct 28 1
7:15 PM
Adult $10.50
00504-3102812185308

Rialto Cinemas Gift Cards
Perfect For Movie Lovers!
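
If you want to script against these .txt files rather than just grep them, the layout above is simple enough to split by hand. The sketch below is not scan-organizer's own code. It is a hypothetical standard-library parser (parse_sidecar is a made-up name) that only understands the subset shown here: a "---" delimited header with "key: value" lines and "- item" lists, followed by the transcription.

```python
def parse_sidecar(text):
    """Split a sidecar file into (metadata dict, transcription text).

    Handles only the subset shown above: 'key: value' lines and '- item'
    lists between two '---' lines. Not a general YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no metadata header, everything is transcription

    meta = {}
    current_list = None
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        if line.startswith("- ") and current_list is not None:
            current_list.append(line[2:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            if value:
                meta[key.strip()] = value
                current_list = None
            else:
                # 'tags:' with nothing after it starts a list
                current_list = meta.setdefault(key.strip(), [])
    raise ValueError("metadata header is never closed with '---'")


# A shortened copy of the example above.
SAMPLE = """\
---
category: movie tickets
filename: seven psychopaths ticket.jpg
tags:
- cleaned
- hand_transcribe
- transcribed
---
Rialto Cinemas Elmwood
7:15 PM
Adult $10.50
"""

meta, body = parse_sidecar(SAMPLE)
print(meta["category"], meta["tags"])
```

Filtering by tag then becomes a one-line check such as "hand_transcribe" in meta["tags"].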

Here are some screenshots of the process. Apologies if they’re a little big! I just took actual screenshots.

At any point I can exit the program, and all progress is saved. I have 6000 photos in the backlog–this isn’t going to be a one-session thing for me! Also, everything has keyboard shortcuts, which I prefer.

Phase 1: Rotating and Cropping

First, I clean up the images. Crop them, rotate them if they’re not facing the right way. I can rotate images with keyboard shortcuts, although there are also buttons at the bottom. Once I’m done, I press a button, and scan-organizer advances to the next un-cleaned photo.

Phase 2: Sorting into folders

Next, I sort things into folders, or “categories”. As I browse folders, I can preview what’s already in that folder.

Phase 3: Renaming Images

Renaming images comes next. For convenience, I can browse existing images in the folder, to help name everything in a standard way.

Phase 4: Tagging images

I tag my images with the type of text. They might be handwritten. Or they might be printed computer documents. You can imagine extending the process with other types of tagging for your use case.

Not yet done: OCR

Printed documents are run through OCR. This isn’t actually done yet, but it will be easy to plug in. I will probably use tesseract.

Phase 5: Transcribing by hand

I write up all my handwritten documents. I have not found any useful handwriting recognition software. I just do it all by hand.

The point of scan-organizer is to filter based on tags. So only images I’ve marked as needing hand transcription are shown in this phase.

Phase 6: Verification

At the end of the whole process, I verify that each image looks good, and is correctly tagged and transcribed.

Tagged archiving, linux, physical, software


Storage Prices 2022-07

By admin | July 11, 2022 - 03:45 pm |July 11, 2022 Non-Technical

I did a survey of the cost of buying hard drives (of all sorts), microsd/sd, USB sticks, CDs, DVDs, Blu-rays, and tape media (for tape drives).

Here are the 2022-07 results: https://za3k.com/archive/storage-2022-07.sc.txt

2020-01: https://za3k.com/archive/storage-2020-01.sc.txt
2019-07: https://za3k.com/archive/storage-2019-07.sc.txt
2018-10: https://za3k.com/archive/storage-2018-10.sc.txt
2018-06: https://za3k.com/archive/storage-2017-06.sc.txt
2018-01: https://za3k.com/archive/storage-2017-01.sc.txt

Useful conclusions:

Used or refurbished items were excluded. Multi-packs (5 USB sticks) were excluded except for optical media. Seagate drives were excluded, because they are infamous for having a high failure rate and bad returns process.
Per TB, the cheapest options are:
- Tape media (LTO-8) at $4.74/TB, but I recommend against it. Tape drives are expensive ($3300 for LTO-8 new), giving a breakeven with HDDs at 350-400TB. Also, the world is down to only one tape drive manufacturer, so you could end up screwed in the future.
- 3.5″ internal spinning hard drives, at $13.75/TB. Currently the best option is 4TB drives.
- 3.5″ external spinning hard drives, at $17.00/TB. Currently the best is 18TB WD drives. If you want internal drives, you can buy external ones and open them up, although it voids your warranty.
- 2.5″ external spinning hard drives, at $24.50/TB. 4-5TB is best.
- Blu-ray disks, at $23.16/TB: 25GB is cheapest, then 50GB ($32.38/TB), then 100GB ($54.72/TB).
Be very careful buying internal hard drives online, and try to use a first-party seller. There are a lot of fake sellers and sellers who don’t actually provide a warranty. This is new in the last few years.

Changes since the last survey 2 years ago:

Amazon’s search got much worse again. More sponsored listings, still refurbished drives.
Sketchy third-party sellers are showing up on Amazon, and other vendors. At this point the problem is people not getting what they order, or getting it but without a promised warranty. I tried to filter out such Amazon sellers. I had trouble, even though I do the survey by hand. At this point it would be hard to safely buy an internal hard drive on Amazon.
Spinning drives: Prices for spinning hard drives have not significantly dropped or risen since 2020.
Spinning drives: 18TB and 20TB 3.5″ hard drives became available.
SSDs: 8TB is available (in both 2.5 inch and M.2 formats).
SSDs: Prices dropped by about half, per TB. The cheapest overall drives dropped about 30%.
USB: 2TB dropped back off the market, and appears unavailable.
USB: On the lower end, USB prices rose almost 2X. On the higher end, they dropped.
MicroSD/SD: Prices dropped
MicroSD/SD: A new player entered the cheap-end flash market, TEAMGROUP. Based on reading reviews, they make real drives, and sell them cheaper than they were available before. Complaints of buffer issues or problems with sustained write speeds are common.
MicroSD/SD: It’s no longer possible to buy slow microsd/sd cards, which is good. Basically everything is class 10 and above.
MicroSD/SD: Combined microsd and sd to show a price comparison.
Optical: Mostly optical prices did not change. 100GB Blu-Ray dropped by 60-70%. Archival Blu-Ray, too.
Tape: LTO-9 is available.
Tape: The cost of LTO-8 tape dropped 50%, which makes it the cheapest option.
Tape: This is not new, but there is still only one tape drive manufacturer (HP) since around the introduction of LTO-8.

Tagged archiving, prices, storage


youtube-autodl

By admin | July 8, 2022 - 12:02 pm |July 8, 2022 Non-Technical, Technical

I just wrote the first pass at youtube-autodl, a tool for automatically downloading youtube videos. It’s inspired by Popcorn Time, a similar program I never ended up using, for automatically pirating the latest video from a TV series coming out.

You explain what you want to download, where you want to download it to, and how to name videos. youtube-autodl takes care of the rest, including de-duplication and downloading things only once.

The easiest way to understand it is to take a look at the example config file, which is my actual config file.

Personally, I find youtube is pushing “watch this related” video and main-page feeds more and more, to the point where they actually succeed with me. I don’t want to accidentally waste time, so I wanted a way to avoid visiting youtube.com. This is my solution.

Tagged archiving, software, youtube


USB Flash Longevity Testing – Year 2

By admin | March 10, 2022 - 09:11 am |March 10, 2022 Non-Technical

Year 0 – I filled 10 32-GB Kingston flash drives with random data.
Year 1 – Tested drive 1, zero bit rot. Re-wrote the drive with the same data.
Year 2 – Re-tested drive 1, zero bit rot. Tested drive 2, zero bit rot. Re-wrote both with the same data.

They have been stored in a box on my shelf, with a 1-month period in a moving van (probably below freezing) this year.

Will report back in 1 more year when I test the third 🙂

FAQs:

Q: Why didn’t you test more kinds of drives?
A: Because I don’t have unlimited energy, time and money :). I encourage you to!
Q: You know you powered the drive by reading it, right?
A: Yes, that’s why I wrote 10 drives to begin with. We want to see how something works if left unpowered for 1 year, 2 years, etc.
Q: What drive model is this?
A: The drive tested was “Kingston Digital DataTraveler SE9 32GB USB 2.0 Flash Drive (DTSE9H/32GBZ)” from Amazon, model DTSE9H/32GBZ, barcode 740617206432, WO# 8463411X001, ID 2364, bl 1933, serial id 206432TWUS008463411X001005. It was not used for anything previously–I bought it just for this test.
Q: Which flash type is this model?
A: We don’t know. If you do know, please tell me.
Q: What data are you testing with?
A: (Repeatable) randomly generated bits
Q: What filesystem are you using? / Doesn’t the filesystem do error correction?
A: I’m writing data directly to the drive using Linux’s block devices.
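
For anyone who wants to try something similar, here is one way a "repeatable random data" check could look. It is a sketch under assumptions, not the script used for this test. It derives the byte stream from a seed with SHA-256 in counter mode (a swapped-in technique), and the names random_stream, write_pattern and count_bit_errors are made up for illustration.

```python
import hashlib
import io

CHUNK = 1 << 20  # bytes per chunk; a multiple of the 32-byte SHA-256 digest


def random_stream(seed, total):
    """Yield `total` bytes of repeatable pseudo-random data, chunk by chunk."""
    counter = 0
    produced = 0
    while produced < total:
        size = min(CHUNK, total - produced)
        out = bytearray()
        while len(out) < size:
            out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
            counter += 1
        yield bytes(out[:size])
        produced += size


def write_pattern(f, seed, total):
    """Write the seeded pattern to a file object opened for binary writing."""
    for chunk in random_stream(seed, total):
        f.write(chunk)


def count_bit_errors(f, seed, total):
    """Re-generate the pattern and count bits that differ from what f returns."""
    errors = 0
    for expected in random_stream(seed, total):
        actual = f.read(len(expected))
        if len(actual) != len(expected):
            raise IOError("short read: device returned less data than was written")
        if actual != expected:
            diff = int.from_bytes(actual, "big") ^ int.from_bytes(expected, "big")
            errors += bin(diff).count("1")
    return errors


# In-memory self-check: a clean copy has zero errors, a flipped bit is counted.
buf = io.BytesIO()
write_pattern(buf, b"example-seed", 100_000)
damaged = bytearray(buf.getvalue())
damaged[5] ^= 0x01
buf.seek(0)
print(count_bit_errors(buf, b"example-seed", 100_000))
print(count_bit_errors(io.BytesIO(bytes(damaged)), b"example-seed", 100_000))
```

Against a real drive, the file object would be the raw block device opened in binary mode (the exact device path depends on your system). Only the seed and the length need to be kept to check the drive years later.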

Tagged archiving, research, slow, usb


Storage Prices 2020-01

By admin | January 6, 2020 - 04:53 pm |January 6, 2020 Uncategorized

I did a survey of the cost of buying hard drives (of all sorts), CDs, DVDs, Blu-rays, and tape media (for tape drives).

Here are the 2020-01 results: https://za3k.com/archive/storage-2020-01.sc.txt
2019-07: https://za3k.com/archive/storage-2019-07.sc.txt
2018-10: https://za3k.com/archive/storage-2018-10.sc.txt
2018-06: https://za3k.com/archive/storage-2017-06.sc.txt
2018-01: https://za3k.com/archive/storage-2017-01.sc.txt

Changes this year

I excluded Seagate drives (except where they’re the only drives in class)
Amazon’s search got much worse, and they started having listings for refurbished drives
Corrected paper archival density, added photographic film
Added SSDs (both 2.5″ and M.2 formats)
Prices did not go up or down significantly in the last 6 months.

Some conclusions that are useful to know

The cheapest option is tape media, but tape reader/writers for LTO 6, 7, and 8 are very expensive.
The second-cheapest option is to buy external hard drives, and then open the cases and take out the hard drives. This gives you reliable drives with no warranty.
Blu-ray and DVD are more expensive than buying hard drives

Tagged archiving, prices, storage
