Description
With the coming encryption changes (see #494), I think we need to start looking at other use cases users could have for encryption. The biggest one I see is the ability to take encrypted backups with zfs send to an untrusted machine. In addition, it would be nice for a system administrator to be able to take backups of encrypted datasets without needing to load encryption keys. With such a feature, ZFS could be a true platform for end-to-end encryption.
The fundamental implementation issue is that all of the ZFS send code is designed to go through the ARC and work with arc_buf_t buffers. The ARC currently stores data decompressed and decrypted, so that it can be used on request. That said, there are WIP changes coming that will enable the ARC to store compressed data (see the WIP at #4768) and further changes that will allow users to send data from the compressed ARC without decompressing (not yet upstreamed to OpenZFS). However, this approach doesn't really make sense for encryption because, fundamentally, compressed data can always be decompressed, but encrypted data can only be decrypted if the encryption key is loaded. Furthermore, from an efficiency standpoint, it makes sense to want compressed data in the ARC so that we can fit more data in there, while there is no real reason to want encrypted data in the ARC.
Therefore, I would like to put forth the idea of a raw send that reads data from disk instead of the ARC. The basic implementation here would be to abstract the arc_read() calls on the send side so that they can also use zio_read() with ZIO_FLAG_RAW. The receive side would simply need to write this data exactly as is to the disk.
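To make the read-side abstraction a bit more concrete, here is a rough, self-contained sketch. It is not real ZFS code: the mock_* types and functions, send_mode_t, and send_read_block() are all hypothetical names invented for illustration, and a toy XOR stands in for actual decryption. The only point is the shape of the dispatch: a normal send goes through the ARC-style path and needs the key, while a raw send copies the on-disk bytes untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a block on disk (not a real ZFS structure). */
typedef struct mock_block {
    unsigned char disk[16]; /* bytes exactly as stored on disk (ciphertext) */
    size_t len;             /* number of valid bytes in disk[] */
    int key_loaded;         /* nonzero if the encryption key is available */
    unsigned char key;      /* toy XOR key, standing in for real crypto */
} mock_block_t;

/* Hypothetical send modes: normal (plaintext) or raw (as stored). */
typedef enum { SEND_MODE_NORMAL, SEND_MODE_RAW } send_mode_t;

/* Stand-in for the arc_read() path: yields plaintext, needs the key. */
int mock_arc_read(const mock_block_t *bp, unsigned char *out)
{
    if (!bp->key_loaded)
        return -1; /* cannot decrypt without the key */
    for (size_t i = 0; i < bp->len; i++)
        out[i] = bp->disk[i] ^ bp->key;
    return 0;
}

/* Stand-in for zio_read() with ZIO_FLAG_RAW: copies on-disk bytes as is. */
int mock_zio_read_raw(const mock_block_t *bp, unsigned char *out)
{
    memcpy(out, bp->disk, bp->len);
    return 0;
}

/* The abstraction point: one read call on the send side, two backends. */
int send_read_block(const mock_block_t *bp, send_mode_t mode,
    unsigned char *out)
{
    assert(bp != NULL && out != NULL);
    if (mode == SEND_MODE_RAW)
        return mock_zio_read_raw(bp, out);
    return mock_arc_read(bp, out);
}
```

With the key unloaded, the normal path fails while the raw path still succeeds, which is the property that would let an administrator back up an encrypted dataset without loading keys.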
On top of that, we would need a way to send the DSL Keychain object, which currently resides in the MOS. I think this could be included as a new DRR_* enum type that would simply be handled slightly differently so that it is written to the MOS instead of the dataset. It is also possible, since the encryption patch isn't upstreamed yet, that we could move the DSL Keychain into its dataset, but I suspect that might be more trouble than it is worth (although I will look into it).
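As a rough sketch of how the receive side might route such a record, assuming a new record type exists: everything below is hypothetical. The MOCK_DRR_* values are placeholders and not the real DRR_* enum, MOCK_DRR_KEYCHAIN is an invented name for the proposed type, and the counters only stand in for actual writes.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder record types; MOCK_DRR_KEYCHAIN is the hypothetical new one. */
typedef enum {
    MOCK_DRR_OBJECT,
    MOCK_DRR_WRITE,
    MOCK_DRR_KEYCHAIN
} mock_drr_type_t;

/* Where a received record ends up. */
typedef enum { DEST_DATASET, DEST_MOS } recv_dest_t;

/* Counters standing in for real writes on the receive side. */
typedef struct mock_recv_state {
    size_t dataset_records;
    size_t mos_records;
} mock_recv_state_t;

/* Only the keychain record is routed to the MOS; the rest go to the dataset. */
recv_dest_t recv_destination(mock_drr_type_t type)
{
    switch (type) {
    case MOCK_DRR_KEYCHAIN:
        return DEST_MOS;
    case MOCK_DRR_OBJECT:
    case MOCK_DRR_WRITE:
    default:
        return DEST_DATASET;
    }
}

/* Process one record by bumping the counter for its destination. */
void recv_record(mock_recv_state_t *st, mock_drr_type_t type)
{
    assert(st != NULL);
    if (recv_destination(type) == DEST_MOS)
        st->mos_records++;
    else
        st->dataset_records++;
}
```

Keeping the routing decision in one small function means the rest of the receive path would not need to know about the special case.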
I am going to start working on a proof of concept of this soon and will submit it as a WIP when I feel it is close enough to needing review. Please let me know if anyone has any suggestions or issues.