Mounting an Amazon S3 bucket into a CentOS 6 filesystem

Fortunately this is far easier to do on CentOS 6 than it is on CentOS 4.  If you do need the CentOS 4 directions, those are located here.

Prerequisites for this are installable via yum:

yum install libcurl libcurl-devel libxml2-devel make automake autoconf
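
If the box doesn't already have a compiler toolchain, you'll also need one to build FUSE and S3FS (S3FS is written in C++), so grab that as well:

yum install gcc gcc-c++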

There are also FUSE packages available in CentOS 6, but unfortunately they're at version 2.8.3, and later versions of S3FS (1.73+) require a newer FUSE. That being the case, you'll want to remove the fuse RPMs if they're already present, since they will conflict:

rpm -e fuse fuse-devel fuse-libs

For FUSE, go grab the latest from http://fuse.sourceforge.net/. At the time of this writing, that is version 2.9.3:

wget http://sourceforge.net/projects/fuse/files/fuse-2.X/2.9.3/fuse-2.9.3.tar.gz/download

Extract it, cd into the extracted directory, and run configure, make, and make install:

tar zxvf fuse-2.9.3.tar.gz
cd fuse-2.9.3
./configure --prefix=/usr
make
make install

Note that the above installation overrides the default install location of /usr/local/ and goes with /usr/ instead.  I chose to do this so I didn't have to modify any of the server's library locations, paths, etc. that would affect building S3FS or running its commands.

S3FS uses pkg-config to find information about the installed FUSE library, so before starting the S3FS build process, run this:

export PKG_CONFIG_PATH=/usr/lib/pkgconfig:/usr/lib64/pkgconfig/

I don't know if you're installing on 32-bit or 64-bit, so the above covers both.  If you chose to install FUSE in /usr/local/ instead, point pkg-config at the corresponding paths:
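
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:/usr/local/lib64/pkgconfig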

Next is S3FS.  At the time of this writing, 1.73 is the latest version.  You can download it from http://code.google.com/p/s3fs/wiki/FuseOverAmazon or, if 1.73 is still current, run this from the server:

wget http://s3fs.googlecode.com/files/s3fs-1.73.tar.gz

No special configure directives are needed, other than specifying the installation prefix as /usr if you also installed FUSE there as my directions specified:

tar zxvf s3fs-1.73.tar.gz
cd s3fs-1.73
autoreconf --install
./configure --prefix=/usr
make
make install

Hopefully you now have a working S3FS installation; test it out:

# s3fs --version
Amazon Simple Storage Service File System 1.73
Copyright (C) 2010 Randy Rizun <rrizun@gmail.com>
License GPL2: GNU GPL version 2 <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

The final step is hooking into Amazon.  In the Amazon S3 portal, click “Security Credentials” and create a new access key.  Put the new access key ID and secret key in /etc/passwd-s3fs and chmod 600 that file.  The format s3fs expects is the two values on one line, separated by a colon; for example (placeholder values, not real credentials):
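
echo 'ACCESS_KEY_ID:SECRET_ACCESS_KEY' > /etc/passwd-s3fs
chmod 600 /etc/passwd-s3fs

Now let’s try to mount: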

mkdir /var/s3cache
s3fs bucket_name /var/www/html/s3 -o use_cache=/var/s3cache,uid=500,gid=500,allow_other
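
If the mount worked, it will show up like any other filesystem; a quick sanity check against the mount point used above:

df -h /var/www/html/s3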

The use_cache directive should not be pointed at a small partition, because file writes will create files there and it can fill up.  If you don’t have an area of the system large enough to hold what you expect to be writing, it’s probably best to go without the cache.  If you do use the cache, you may also want a cron job to periodically clear old files out of it: most people use S3 as a way to get a huge amount of cheap storage, and if every file you write ends up on your server anyway via the cache, it hasn’t saved you anything.  Such a cron job would look something like this, removing any cached data older than two days:

find /var/s3cache/ -type f -mtime +2 -exec rm -f '{}' \;
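
To schedule that, a nightly crontab entry (added via crontab -e) pointing at the same cache directory would look like:

0 3 * * * find /var/s3cache/ -type f -mtime +2 -exec rm -f '{}' \;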

The bucket_name should be replaced by the name of your S3 bucket.  I recommend creating it in all lowercase, because past versions of S3FS only supported lowercase bucket names, and who knows if there are bugs left around from whatever made that a requirement.

/var/www/html/s3 should of course be replaced by wherever you choose to mount this new filesystem.

The uid and gid should be set to the user who owns the mount point, i.e. the user ID that will be making use of the storage.  If other users or processes (like Apache) need to get into the same directory, then allow_other must be specified.  Keep in mind that S3FS doesn’t support permissions and ownership, so once allow_other is turned on, anyone who can get into the directory has access to read, write, and delete, and that includes your hacked PHP scripts, so be careful with important data on a web-connected server.  allow_other basically takes the filesystem in question from usable by only one user to world-writable.

To mount an S3FS file system at boot time, add modprobe fuse to your /etc/rc.local file followed by your s3fs command to mount the filesystem.
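
For example, the end of /etc/rc.local might look like this, reusing the same example bucket and mount point from above:

modprobe fuse
/usr/bin/s3fs bucket_name /var/www/html/s3 -o use_cache=/var/s3cache,uid=500,gid=500,allow_other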

When copying files into the new filesystem, rsync is useful, but use some less common options to optimize things; I have an article on that here.
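
For example, something like this works (adjust the paths to taste; these flags are my suggestion here rather than a full substitute for that article): --inplace avoids rsync’s usual write-to-a-temp-file-then-rename step, since a rename on S3 becomes an expensive copy and delete, and --size-only skips timestamp comparisons, which S3FS doesn’t always handle reliably.

rsync -av --inplace --size-only /local/source/ /var/www/html/s3/destination/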

Finally, some things I’ve noticed about S3FS that you may want to take into account:

  • There’s a performance-tuning directive, max_stat_cache_size, which caches directory-listing (stat) results for up to that many entries.  I found max_stat_cache_size=100000 worked well for my needs; see the example mount command after this list.
  • A similar directive, enable_noobj_cache, tells it to cache the non-existence of files.  I’m not using that directive currently, so I don’t know if it’s reliable.
  • The multireq_max directive configures how many parallel requests are made when reading object listings.  I was having some performance issues when using it, so I stopped.
  • The parallel_count directive can be used for parallel writes of data; I had performance issues with that as well, so I stopped using it.  It defaults to 5.
  • Large directories written by prior versions of S3FS seem to have serious performance issues with later versions.  For example, a directory containing several thousand sub-directories written from a CentOS 4 server with S3FS version 1.19 would take about 15 minutes to output a simple ‘ls’ on a CentOS 6 server running S3FS 1.73.  I made a new bucket, from the CentOS 6 server, recreated all the data and directories, and the problems went away; directory listings now take between 1 and 4 seconds.
  • I had one instance where something went wrong with the cache after I had uploaded 60,000 files to S3 via S3FS.  When reading the files back, they’d come out as zero byte files.  Completely emptying the cache directory resolved the issue.
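
For reference, here is the earlier mount command with the stat cache tuning from the first bullet applied (same bucket, mount point, and cache path as before):

s3fs bucket_name /var/www/html/s3 -o use_cache=/var/s3cache,uid=500,gid=500,allow_other,max_stat_cache_size=100000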
