Mounting an Amazon S3 bucket as a filesystem on CentOS 4 using FUSE/s3fs

Well, this was more of a challenge than expected.  A customer had an old CentOS 4 server with about a terabyte of data on it, sitting by itself at a remote facility; i.e. there was no other machine on the same network to move the files to.  The files were relatively large and needed to be web accessible, and the primary office had too slow an internet connection to host the new server.  There was no confidential/personal data in the files, so Amazon S3 seemed like a reasonable place to put them.  That makes them (optionally) web accessible, and S3 is mostly reliable with a lot of redundancy, which matters since the customer has no way to back the files up themselves unless they download the archive to local media.

I decided to set up the FUSE kernel module, which provides filesystems in user space; i.e. it lets a regular user mount a remote filesystem.  The S3FS software lets you mount an Amazon S3 bucket as a filesystem in Linux via the FUSE kernel module.  Later versions of RedHat, and CentOS as a result, have FUSE modules built in, but the old CentOS 4 does not.

So, first, you’re going to need to install some prerequisites, which you can download from http://vault.centos.org/ since you likely don’t have your old CentOS 4 CD lying around:

  • guile
  • neon
  • subversion
  • swig
  • umb-scheme
  • kernel-devel (However, you need to match the devel package to the specific version of the kernel you’re running, and keep in mind that if you update the kernel later, your fuse module is going to break and need to be rebuilt.  The kernel-devel package also needs to match SMP/no-SMP, bigmem, etc., so make sure you get the correct one)

Install all of those using the standard rpm command.
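
As a rough aid for picking the right kernel-devel package, the sketch below derives the kernel source directory that the matching devel package should provide from the running kernel's release string.  The function name kernel_src_dir is made up for illustration, and the naming pattern (an EL suffix followed by an optional flavor such as smp) is an assumption based on the example path used in the configure step of this post, so check it against what is actually under /usr/src/kernels on your machine.

```shell
# kernel_src_dir RELEASE ARCH
# Hypothetical helper: guess the kernel-devel source directory for a
# CentOS 4 style release string such as "2.6.9-89.31.1.ELsmp".
# Only meaningful for release strings that contain "EL".
kernel_src_dir() {
    release=$1
    arch=$2
    base="${release%%EL*}EL"     # version part, e.g. 2.6.9-89.31.1.EL
    flavor="${release##*EL}"     # e.g. "smp", or empty for the plain kernel
    if [ -n "$flavor" ]; then
        echo "/usr/src/kernels/${base}-${flavor}-${arch}"
    else
        echo "/usr/src/kernels/${base}-${arch}"
    fi
}

# Usage: see whether the guessed directory exists for the running kernel
# ls -d "$(kernel_src_dir "$(uname -r)" "$(uname -m)")"
```

If the ls in the usage line reports no such directory, the installed kernel-devel package most likely does not match the running kernel.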

For FUSE, the latest versions do not work with the old kernel.  The last version that will build successfully on CentOS 4 is 2.7.6, so that is the one you’re going to need.  http://sourceforge.net/projects/fuse/files/fuse-2.X/2.7.6/

Extract it, cd into the extract directory and run:

./configure --with-kernel=/usr/src/kernels/2.6.9-89.31.1.EL-smp-i686 --prefix=/usr
make
make install

The above should of course be adjusted to match the specific kernel you have installed since the devel package will also be version-specific.  Assuming it builds without error, the following command *should* reveal the 2.7.6 version you installed:

pkg-config --modversion fuse

If you decided to install fuse without --prefix=/usr, then it ends up under /usr/local/, along with its pkgconfig file, so the above command will not work.  In that case, you’d first need to run:

export PKG_CONFIG_PATH=/usr/lib/pkgconfig:/usr/local/lib/pkgconfig

Use lib64 instead of lib if you’re on a 64-bit system.
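
If you are not sure which of those directories actually holds fuse.pc, a small search like the following can tell you.  This is only a sketch; find_pc_dir is a made-up helper name, and the directory list in the usage line is an assumption covering the usual 32-bit and 64-bit locations.

```shell
# find_pc_dir NAME DIR...
# Hypothetical helper: print the first directory that contains NAME.pc.
# Returns non-zero if none of the directories has it.
find_pc_dir() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$name.pc" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Usage:
# find_pc_dir fuse /usr/lib/pkgconfig /usr/local/lib/pkgconfig \
#     /usr/lib64/pkgconfig /usr/local/lib64/pkgconfig
```

Whatever directory it prints is the one that needs to be in PKG_CONFIG_PATH.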

Okay, with fuse installed and pkg-config showing the correct version, run: modprobe fuse

Make sure it loaded; this should now show the module:
lsmod | grep fuse
dmesg should show it too:

dmesg | tail -2

fuse init (API version 7.8)
 fuse distribution version: 2.7.6
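
If you want the same check in a script (for instance before attempting a mount), something along these lines should do.  It is a minimal sketch: fuse_module_loaded is a made-up name, and it reads /proc/modules directly instead of parsing lsmod output.

```shell
# fuse_module_loaded [MODULES_FILE]
# Hypothetical helper: succeed if a module named exactly "fuse" is listed.
# MODULES_FILE defaults to /proc/modules; the argument exists for testing.
fuse_module_loaded() {
    modules_file=${1:-/proc/modules}
    grep -q '^fuse ' "$modules_file"
}

# Usage:
# fuse_module_loaded || modprobe fuse
```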

Next is S3FS.  Unfortunately you have to go all the way back to version 1.19 for the last one that will configure successfully on CentOS 4; that download is:  http://s3fs.googlecode.com/files/s3fs-1.19.tar.gz

Normally you’d try the typical build procedure (./configure --prefix=/usr; make; make install), but it will fail saying that you do not have libcurl support, even if you have the curl-devel package installed.  The system version isn’t new enough, but that’s only half the problem.  The latest version of curl is too new, so you need one right in the middle; version 7.15 specifically:  http://curl.haxx.se/download/archeology/curl-7.15.0.tar.gz

The reason is the ancient version of pkg-config on CentOS 4.  I attempted to compile and install a later version of pkg-config, but that caused more problems, so I went back to the base version and worked around the issue.  If you attempt to use the later versions of curl, you’ll end up getting this error from pkg-config:  “Unknown keyword ‘URL’ in ‘/usr/lib/pkgconfig/libcurl.pc’”

Okay, so 7.15 is downloaded and extracted; we’re going to install it over top of the system’s curl.  I didn’t find any harmful side effects from doing this, and it eliminates conflicts from ldconfig finding curl libraries in two places, since you’re going to have a binary that depends on the newer one.  So build/install using “./configure --prefix=/usr; make; make install”

Think it’s going to work now?  Nope.  Now you’re going to get stuck on a missing libcrypto.pc for pkg-config.  The versions of openssl, openssl-devel and openssl096 for CentOS 4 do not include it.  I decided to download the latest openssl to get around this issue; that one you do NOT want to install over top of the system’s openssl.  Fortunately it doesn’t do that by default, so nothing special needs to happen there.  I grabbed the latest 1.0.1e from http://www.openssl.org/source/openssl-1.0.1e.tar.gz and did the ./config --prefix=/usr/local; make; make install (note that openssl’s configure script is named config).

Okay, giving the S3FS build one more go with ./configure: now it actually worked, and picked up the correct copy of openssl.  Go ahead with make; make install and test it out:

# s3fs --version
 Amazon Simple Storage Service File System 1.19
 Copyright (C) 2010 Randy Rizun <rrizun@gmail.com>
 License GPL2: GNU GPL version 2 <http://gnu.org/licenses/gpl.html>
 This is free software: you are free to change and redistribute it.
 There is NO WARRANTY, to the extent permitted by law.

Good to go so far.  Now, in the Amazon S3 portal, click “Security Credentials” and create a new access key.  Put the new access key id and secret key in /etc/passwd-s3fs, and chmod 600 that file.  Now let’s try to mount:

s3fs bucket_name /var/www/html/s3 -o use_cache=/tmp,uid=500,gid=500,allow_other
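
Before running the mount command above, the credentials file has to be in place.  As a sketch of that step: s3fs reads the file as a single accessKeyId:secretAccessKey line, and the helper below (write_s3fs_passwd is a made-up name, and the key values in the usage line are placeholders, not real credentials) writes it with restrictive permissions from the start.

```shell
# write_s3fs_passwd KEY_ID SECRET [FILE]
# Hypothetical helper: write the s3fs credentials file as one
# "accessKeyId:secretAccessKey" line, readable only by its owner.
# FILE defaults to /etc/passwd-s3fs.
write_s3fs_passwd() {
    key_id=$1
    secret=$2
    file=${3:-/etc/passwd-s3fs}
    # umask 077 in a subshell so the file is never created world-readable
    ( umask 077 && printf '%s:%s\n' "$key_id" "$secret" > "$file" ) || return 1
    # Also fix permissions if the file already existed with a looser mode
    chmod 600 "$file"
}

# Usage (placeholder values):
# write_s3fs_passwd EXAMPLEKEYID examplesecretkey
```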

The use_cache directive should not be pointed at a small partition because file writes are going to create files in there and it can fill up.  If you don’t have an area of the system large enough to hold what you expect to be writing, probably best to go without cache.
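
To get a feel for whether the cache location is big enough, you can check the free space on its partition first.  A minimal sketch, with cache_free_kb being a made-up helper name:

```shell
# cache_free_kb DIR
# Hypothetical helper: print the available space, in kilobytes, on the
# partition that holds DIR (uses the portable -P output format of df).
cache_free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Usage: compare free space on /tmp with the size of what you plan to write
# cache_free_kb /tmp
# du -sk /path/to/files/you/plan/to/copy
```

If the first number is not comfortably larger than the second, that is a hint to point use_cache somewhere else or leave it off.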

The bucket_name should be replaced by the name of your S3 bucket.  Create it in all lowercase because this version of S3FS only supports lowercase buckets.

/var/www/html/s3 should of course be replaced by wherever you choose to mount this new filesystem.

The uid and gid should be set to the user who owns the directory where the filesystem is mounted.  If other users or processes (like Apache) need to get into the same directory, then allow_other must be specified.  Keep in mind that S3FS doesn’t support permissions and ownership, so anyone who can get into the directory once allow_other is turned on is going to have access to read/write/delete, and that includes your hacked PHP scripts, so be careful with important data on a web-connected server.

To make this happen at boot time, add modprobe fuse to your /etc/rc.local file followed by your s3fs command to mount the filesystem.
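
A sketch of what that rc.local addition might look like follows.  The is_mounted guard is my own addition for illustration (a made-up helper, not something s3fs requires); it just avoids a second mount attempt if the script is ever run by hand.  The mount command itself is the one from earlier in this post, with the same placeholder bucket name and mount point.

```shell
# is_mounted MOUNTPOINT [MOUNTS_FILE]
# Hypothetical helper: succeed if MOUNTPOINT appears as a mount point in
# MOUNTS_FILE (defaults to /proc/mounts; the argument exists for testing).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Lines for /etc/rc.local (commented out here so the block is safe to source):
# modprobe fuse
# is_mounted /var/www/html/s3 || \
#     s3fs bucket_name /var/www/html/s3 -o use_cache=/tmp,uid=500,gid=500,allow_other
```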

When copying files into the new filesystem, rsync is useful, but run it with the -W (--whole-file) option so it doesn’t try to compute changes to the files; -W means just replace the whole file, so there’s no download, change, upload cycle.

One final caveat.  Unfortunately this combination of S3FS and FUSE versions does not work reliably with big directories; i.e. if you try to copy a directory that has thousands of files into the new filesystem, typically it will just hang and you won’t be able to kill the processes; you’ll have to reboot.  There may be some eventual timeout, but I waited an hour before giving up.  This issue was corrected in later versions of fuse, s3fs or both (I’m not sure where the bug is), but unfortunately I have not found a way to compile the later versions of either on CentOS 4.  You can probably work around this limitation with a careful combination of find, xargs and rsync to copy a file at a time, or something like that, to get your directory structure initially mirrored; after that it should work fine going forward if you’re just reading/writing/updating files.
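
As a starting point for that workaround, here is an untested-against-S3 sketch that mirrors the directory structure first and then copies one file per rsync invocation.  It uses find -exec rather than xargs to keep it simple; copy_one_at_a_time and the COPY_CMD override are made-up names for illustration, and I can’t promise it avoids the hang, only that it keeps each operation small.

```shell
# copy_one_at_a_time SRC DST
# Hypothetical helper: recreate SRC's directory tree under DST, then copy
# files one at a time.  DST must be an absolute path because the function
# changes into SRC.  COPY_CMD (default "rsync -W") is an assumed override
# so another copy tool can be swapped in.
copy_one_at_a_time() {
    src=$1
    dst=$2
    copy_cmd=${COPY_CMD:-rsync -W}
    (
        cd "$src" || exit 1
        # Create the directories first so every file has a place to land
        find . -type d -exec mkdir -p "$dst/{}" \; || exit 1
        # One copy command per file; $copy_cmd is left unquoted on purpose
        # so that "rsync -W" splits into command and option
        find . -type f -exec $copy_cmd {} "$dst/{}" \;
    )
}

# Usage:
# copy_one_at_a_time /data/archive /var/www/html/s3
```

Adding a short sleep between files, or splitting the work by subdirectory, might help further, but that is a guess rather than something I measured.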
