When you store data in Amazon S3 through an S3FS/FUSE-based filesystem on Linux, here are the rsync options I recommend for pushing files up to it:
rsync -avW --progress --inplace --size-only
-a: archive mode (shorthand for -rlptgoD). This gives you the following:
- Recurse directories
- Include symlinks
- Include permissions
- Include modification times
- Include group (although in most cases this is meaningless since most users of S3FS use the hard-coded uid, gid and allow_other arguments to ensure the filesystem works correctly for the intended user)
- Include owner (similarly meaningless)
- Include devices (not sure if device objects can be represented in S3FS, haven’t tried it)
-v: verbose output; rsync lists each file as it is transferred.
-W: copy whole files. This turns off rsync's delta-transfer algorithm, which would otherwise checksum the destination file and replace only the pieces that changed. In the S3FS world that is counterproductive: to modify part of a file, the entire file has to come down, be modified, and be pushed back up, which is much worse than simply pushing up the new version. rsync doesn't realize the filesystem is remote.
--progress: useful to watch what it's doing since rsync'ing files over S3FS is sloooooow. This will also tell you if it has hung and give you speed stats.
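--size-only: decide whether to copy a file based on its size alone, not the date, time or checksum. I've found the date/time is often not very useful, especially if using the filesystem from multiple systems. The trade-off is that a file whose contents changed but whose size stayed the same will be skipped.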
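--inplace: write updates directly into the destination file instead of building a temporary copy and renaming it over the original. S3 has no native rename, so on S3FS a rename typically means a server-side copy followed by a delete; skipping that step saves S3 operations on every file transferred. Keep in mind that -W disables delta transfers, so changed files are still sent whole; --inplace on its own does not reduce the data uploaded for small changes to large files. (Credit to commenter for suggesting this)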
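The command above leaves out the source and destination. As a rough sketch of how the pieces fit together, here is a small wrapper function. The name push_to_s3fs and the example paths are made up for illustration, and the destination is assumed to live inside an already-mounted S3FS filesystem. Any extra arguments (for instance rsync's --dry-run, which reports what would be copied without writing anything) are passed straight through.

```shell
# Hypothetical helper: push a local directory into an S3FS mount
# using the options discussed above.
#
# Usage (paths are placeholders, adjust to your own layout):
#   push_to_s3fs /data/outgoing /mnt/s3/backup --dry-run   # preview only
#   push_to_s3fs /data/outgoing /mnt/s3/backup             # real transfer
push_to_s3fs() {
    # Normalise both paths to end in exactly one trailing slash.
    # On the source side this tells rsync to copy the directory's
    # contents rather than the directory itself.
    local src="${1%/}/"
    local dest="${2%/}/"
    shift 2

    # Remaining arguments ("$@") are forwarded to rsync unchanged.
    rsync -avW --progress --inplace --size-only "$@" "$src" "$dest"
}
```

Running it with --dry-run first is a cheap way to check that --size-only is picking up the files you expect before paying for a slow transfer.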