Tuesday, April 4, 2023

Comparison of par2cmdline, par2cmdline-turbo, klauspost/reedsolomon, and catid/wirehair

Programs tested: parchive/par2cmdline, animetosho/par2cmdline-turbo, klauspost/reedsolomon, catid/wirehair

=== 1 GB file with 1100 shards ===

1000 file shards, 100 recovery shards
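For reference, the par2cmdline runs used a configuration along these lines; -b sets the input block count and -c the recovery block count, and the file names here are placeholders:

par2 create -b1000 -c100 file.par2 file.bin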

par2cmdline: 30 seconds, 0.1 GB memory usage

par2cmdline-turbo: 5 seconds, 0.12 GB memory usage

wirehair: 4 seconds, 2.1 GB memory usage

reedsolomon: 4 seconds, 1.5 GB memory usage

The results of the test above show that par2cmdline-turbo offers the best overall trade-off: it is nearly as fast as wirehair and reedsolomon (5 seconds vs. 4) while using roughly a tenth of their memory (0.12 GB vs. 1.5-2.1 GB).

The memory usage of wirehair comes from holding the file in memory as well as from the encoder itself: the 1 GB file accounts for 1 GB, and the encoder's internal state takes up roughly another 1 GB, for a total of about 2 GB.

The memory usage of Klaus Post's Reed-Solomon implementation likewise comes from the file plus the encoder: the Reed-Solomon encoding matrix takes around 0.1 GB, the file itself around 1 GB, and encoding the shards around another 0.4 GB, for a total of roughly 1.5 GB.
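For context, a minimal sketch of the encode path used in this test, with the same shard counts as above (the file path and error handling are placeholders; the library calls are the documented klauspost/reedsolomon API):

package main

import (
	"log"
	"os"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// Read the whole 1 GB file into memory -- this alone accounts for ~1 GB.
	data, err := os.ReadFile("file.bin") // placeholder path
	if err != nil {
		log.Fatal(err)
	}

	// 1000 data shards + 100 parity shards: more than 256 total,
	// so the library selects its Leopard-RS (GF(2^16)) codec.
	enc, err := reedsolomon.New(1000, 100)
	if err != nil {
		log.Fatal(err)
	}

	// Split slices the data into 1000 equal shards and allocates
	// the 100 parity shards.
	shards, err := enc.Split(data)
	if err != nil {
		log.Fatal(err)
	}

	// Encode fills in the 100 parity shards.
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}
}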

All in all, wirehair and reedsolomon take similar amounts of time, though reedsolomon uses somewhat less memory. Both are much faster than par2cmdline but also use much more memory.

Note: klauspost/reedsolomon does support progressive encoding, but only for the regular code (which supports up to 256 shards), not the Leopard-RS code (which supports up to 65536 shards). This means that if you have more than 256 total shards, you cannot encode progressively. See the source code here (https://github.com/klauspost/reedsolomon/blob/4e9455c045bba7f15065ef2f216d488866decf2b/leopard.go):

func (r *leopardFF16) EncodeIdx(dataShard []byte, idx int, parity [][]byte) error {
	return ErrNotSupported
}
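With the regular code (256 total shards or fewer), progressive encoding looks roughly like the sketch below. The shard counts and sizes here are arbitrary placeholders; per EncodeIdx's documentation, the parity shards must start out zeroed and each data shard must be delivered exactly once:

package main

import (
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	const (
		dataShards   = 200     // placeholder: total must stay <= 256 for the regular code
		parityShards = 50
		shardSize    = 1 << 20 // placeholder: 1 MiB per shard
	)

	enc, err := reedsolomon.New(dataShards, parityShards)
	if err != nil {
		log.Fatal(err)
	}

	// Parity shards must start out zeroed; make() zero-fills them.
	parity := make([][]byte, parityShards)
	for i := range parity {
		parity[i] = make([]byte, shardSize)
	}

	// Feed data shards one at a time -- e.g. as they are read from disk --
	// instead of holding all of them in memory at once.
	for idx := 0; idx < dataShards; idx++ {
		shard := make([]byte, shardSize) // placeholder: read shard idx here
		if err := enc.EncodeIdx(shard, idx, parity); err != nil {
			log.Fatal(err)
		}
	}
}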

As for wirehair, the author said that there is no way to do progressive encoding:

https://github.com/catid/wirehair/issues/30

Q: Is there any way to not have to load entire files into ram?

catid: I think you'd have to use swap for that or split it into separate pieces. The row/column mixing travels all over the dataset. Might be improved a bit but at the end of the day requires everything to produce the intermediate symbols and then produce output!