Keeping Amazon EC2 crap off your website

You’ve probably happened across this page because your website or server is continually being hammered with a barrage of bullshit requests from random IPs within Amazon’s EC2 address space.  They always seem to come in massive deluges and include hundreds of IPs across multiple network ranges.  You’re probably sick of it.  Maybe you’ve tried to contact Amazon and realized they could not give a shit about abuse from their network, so you started blocking one IP at a time, or adding CIDR ranges to a .htaccess file, and so on.  It all gets very trying on the nerves.

Well, fortunately Amazon themselves are able to help you block their customers, and I’ve written some simple scripts to automate keeping the block list up to date.  So here we go.

Getting the Amazon IPs

You’ll find that they’re nice enough to have their IP ranges publicly available, and even in JSON format.  This page discusses how to get them and even how to have them notify you of changes; or you can just poll for them of course:

http://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html

In any case, the current download link for the JSON file is:

https://ip-ranges.amazonaws.com/ip-ranges.json

Now, the first issue for me is that this includes all their IP ranges: Amazon proper, EC2 (the cesspool), CloudFront, Route 53, etc.  It also categorizes them by AWS region.  I only wanted to block EC2, but did want to block from all regions, so that’s what this script does.  You can easily modify it to narrow that down.

Here’s the perl script; it writes the output to /var/ec2block/ec2-ips.dat, so change that path if you keep things somewhere else:

#!/usr/bin/perl

use strict;
use warnings;

use JSON;
use LWP;
use HTTP::Request::Common;
use HTTP::Status qw( :constants );

# amazon IP JSON URL
# This comes from http://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html#aws-ip-download
my $amazonIPs = 'https://ip-ranges.amazonaws.com/ip-ranges.json';
my $json = JSON->new;
my $ua = LWP::UserAgent->new;

$ua->agent("CanIHazEC2die.com/1.0");

my $request = GET "$amazonIPs";
my $response = $ua->request($request);

if ( $response->is_error ) {
  print STDERR "AWS IP list request failure; check URL and connectivity.\n";
  exit 1;
} else {

  open(my $fh, '>', '/var/ec2block/ec2-ips.dat')
    or die "Cannot write /var/ec2block/ec2-ips.dat: $!\n";

  my %ipjson = %{ $json->decode( $response->content )};

  # prefixes and ipv6_prefixes are arrays of hashes; keep only the EC2 entries
  for my $entry ( @{ $ipjson{prefixes} } ) {
    if ($entry->{service} eq "EC2") {
      print $fh "$entry->{ip_prefix}\n";
    }
  }
  for my $entry ( @{ $ipjson{ipv6_prefixes} } ) {
    if ($entry->{service} eq "EC2") {
      print $fh "$entry->{ipv6_prefix}\n";
    }
  }

  close ($fh);
}
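
If you do want to narrow things down to particular regions, something along these lines should work.  This is only a sketch: select_prefixes is a name I made up, the sample data is invented (documentation addresses, not real Amazon ranges), and it assumes each entry carries a "region" field next to "service", which is how the published file is laid out.  It uses JSON::PP so it runs on a stock perl install.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;   # core module; decode_json is exported by default

# Hypothetical helper: return the prefixes for one service, optionally
# limited to a set of regions (no regions given means "all regions").
sub select_prefixes {
  my ($data, $service, @regions) = @_;
  my %wanted = map { $_ => 1 } @regions;
  my @out;
  for my $pair ( [ 'prefixes', 'ip_prefix' ], [ 'ipv6_prefixes', 'ipv6_prefix' ] ) {
    my ($list, $field) = @$pair;
    for my $entry ( @{ $data->{$list} || [] } ) {
      next unless $entry->{service} eq $service;
      next if @regions && !$wanted{ $entry->{region} };
      push @out, $entry->{$field};
    }
  }
  return @out;
}

# Invented sample in the same shape as ip-ranges.json
my $sample = decode_json(<<'JSON');
{
  "prefixes": [
    { "ip_prefix": "192.0.2.0/24",    "region": "us-east-1", "service": "EC2" },
    { "ip_prefix": "198.51.100.0/24", "region": "eu-west-1", "service": "EC2" },
    { "ip_prefix": "203.0.113.0/24",  "region": "us-east-1", "service": "CLOUDFRONT" }
  ],
  "ipv6_prefixes": [
    { "ipv6_prefix": "2001:db8::/32", "region": "us-east-1", "service": "EC2" }
  ]
}
JSON

# EC2 ranges in us-east-1 only
print "$_\n" for select_prefixes($sample, 'EC2', 'us-east-1');
```

Calling it without any regions gives you every EC2 range, which is what the script above does.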

Apache Blocking

This is the part where I had high hopes of this being an easy task but was thwarted.  I didn’t want to use the traditional Allow/Deny statements, which support CIDR blocks, because I needed to only block certain content from Amazon, which doesn’t fall on URL boundaries but rather is based on what may or may not be in the query string.  So, if all you need is outright blocking, for a whole site or subdirectories, you can use the above script to grab the CIDR blocks from amazon, and then just do a traditional allow,deny in your config:

order allow,deny
deny from 54.72.0.0/15
.....
allow from all

You can even alter the above perl script to spit it out in the appropriate format, possibly for automation.
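
For what it’s worth, here is roughly what that could look like as a separate script that reads the .dat file and prints the allow,deny block.  Treat it as a sketch: deny_lines, make-deny.pl and ec2-deny.conf are names I made up for illustration, and you would still need to include the generated file in your config and reload apache yourself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a list of CIDR strings into an allow,deny block
sub deny_lines {
  my @cidrs = @_;
  my @lines = ('order allow,deny');
  for my $cidr (@cidrs) {
    $cidr =~ s/^\s+|\s+$//g;      # strip whitespace and the trailing newline
    next unless length $cidr;     # skip blank lines
    push @lines, "deny from $cidr";
  }
  push @lines, 'allow from all';
  return @lines;
}

# Usage: perl make-deny.pl /var/ec2block/ec2-ips.dat > ec2-deny.conf
if (@ARGV) {
  open(my $in, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
  print "$_\n" for deny_lines(<$in>);
  close($in);
}
```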

Now, back to my problem.  (Reminder: I want to block very selectively, based on query string AND IP address.)  My next idea was to use the expression support in Apache 2.4+.  Expressions are CIDR-aware, so you can write rewrite conditions like this:

RewriteCond expr "%{REMOTE_ADDR} -ipmatch '192.0.2.0/24'"

or, more efficiently (based on the docs at https://httpd.apache.org/docs/2.4/expr.html), with the unary -R operator, which tests the remote address implicitly:

RewriteCond expr "-R '192.0.2.0/24'"

This code is able to determine if the given visitor IP address is contained within the CIDR-defined network range.  So this gets us a lot closer to what I wanted; I could have a RewriteCond that looks at the query string, and then also network matching to block the request.  I didn’t really want to do that though because Amazon has more than 250 network ranges currently, so I’d end up with a nightmare of a rewrite script that does 250+ comparisons with every request to the site.

Next idea was a hash-based rewrite map to hold the networks.  Unfortunately, there’s no functionality that allows the expression features to match against a rewrite map instead of just simple text; or perhaps there is, but I couldn’t find it.  I envisioned it looking something like this:

RewriteEngine On
RewriteMap ec2addys dbm:/var/ec2.dbm
RewriteCond %{QUERY_STRING} protectedString
RewriteCond expr "%{REMOTE_ADDR} -R '$ec2addys:%{REMOTE_ADDR}'"
RewriteRule .* - [F,L]

 

I just made that up, but you get the idea; didn’t work either way.  So, now I’m at the point where I don’t want a huge config file to block all the networks, can’t use expressions with a rewritemap, so the next idea I had was a simple perl script running as a rewritemap program.

Now my config looks like:

RewriteMap ec2block prg:/var/ec2block/ec2-ips.pl
RewriteCond %{QUERY_STRING} protectedString
RewriteCond ${ec2block:%{REMOTE_ADDR}} ^ec2block$
RewriteRule .* - [F,L]

So, what happens is that only for requests where the protected string is in the query string sent by the browser, the remote address is sent to the already-running perl script (rewritemap programs start with apache), which has already read in the Amazon file, and it runs through a quick loop to see if the address matches any Amazon network.  If it does, it spits out ‘ec2block’, which tells apache to deny the request; otherwise the request is allowed.

Here’s the perl script:

#!/usr/bin/perl

use strict;
use warnings;

use NetAddr::IP;

my $network;
my $remoteIP;
my $v6 = 0;

# Set to 1 to log each lookup to STDERR (the apache error log)
my $debug = 0;

# Turn off I/O buffering
$| = 1;

# Read the CIDR list once at startup
open(my $fh, '<', '/var/ec2block/ec2-ips.dat')
  or die "Cannot open /var/ec2block/ec2-ips.dat: $!\n";
chomp(my @iparray = <$fh>);
close($fh);

while (<STDIN>) {
  $v6 = 0;
  my $visitor;
  my $found = 0;
  $remoteIP = $_;
  chomp $remoteIP;

  if ( $remoteIP =~ /:/ ) {
    if ( $debug ) { print STDERR "remote ip is ipv6 $remoteIP\n"; }
    $visitor = NetAddr::IP->new6($remoteIP);
    $v6 = 1;
  } else {
    if ( $debug ) { print STDERR "remote ip is ipv4 $remoteIP\n"; }
    $visitor = NetAddr::IP->new($remoteIP);
  }

  # apache waits for exactly one line per lookup, so always answer
  if ( !defined ($visitor) ) {
    print "NULL\n";
    next;
  }

  foreach my $cidr (@iparray) {
    # skip networks of the other address family
    if ( ( $v6 ) && ( $cidr !~ /:/ ) ) {
      next;
    } elsif ( ( !$v6 ) && ( $cidr =~ /:/ ) ) {
      next;
    }

    if ( $v6 ) {
      $network = NetAddr::IP->new6($cidr);
    } else {
      $network = NetAddr::IP->new($cidr);
    }
    next if !defined ($network);

    if ( $visitor->within($network) ) {
      $found = 1;
      last;
    }
  }

  # answer only after every network has been checked
  print $found ? "ec2block\n" : "NULL\n";
}


The script works for both IPv4 and IPv6 requests without need for modification.  If it looks a little weird, it’s likely because of how rewritemap programs work.  They start with apache and stay running, get whatever you send them on STDIN, one lookup per line, and are expected to spit back one of two things on STDOUT: a newline-terminated response string, or a newline-terminated NULL (the actual word NULL spelled out, four letters, as seen above).  So the script doesn’t have too much overhead other than looping through the 250+ Amazon network ranges.  There’s probably some fancy perl guru way of heavily optimizing this, but given that the address-to-CIDR test needs to be performed, I couldn’t think of a fancy hash-based way to do it.
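
One way a hash-based lookup might work, at least for IPv4: group the networks by prefix length, store each masked network address as a hash key, and for every visitor mask the address once per prefix length and look it up.  That caps the work at 33 hash lookups no matter how many ranges Amazon publishes.  This is a sketch, not what the script above does; ip_to_int, mask_for, build_index and in_index are names invented here, and IPv6 is left out entirely.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: dotted quad to 32-bit integer, undef if malformed
sub ip_to_int {
  my ($ip) = @_;
  my @o = $ip =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/ or return;
  return if grep { $_ > 255 } @o;
  return ( $o[0] << 24 ) | ( $o[1] << 16 ) | ( $o[2] << 8 ) | $o[3];
}

# Hypothetical helper: netmask as an integer for a prefix length of 0..32
sub mask_for {
  my ($len) = @_;
  return $len ? ( ( 0xFFFFFFFF << ( 32 - $len ) ) & 0xFFFFFFFF ) : 0;
}

# Build { prefix length => { masked network address => 1 } } from CIDR strings
sub build_index {
  my %index;
  for my $cidr (@_) {
    my ($addr, $len) = $cidr =~ m{^([\d.]+)/(\d+)$} or next;  # IPv4 only
    next if $len > 32;
    my $num = ip_to_int($addr);
    next unless defined $num;
    $index{$len}{ $num & mask_for($len) } = 1;
  }
  return \%index;
}

# True if the IPv4 address falls inside any indexed network
sub in_index {
  my ($index, $ip) = @_;
  my $num = ip_to_int($ip);
  return 0 unless defined $num;
  for my $len ( keys %$index ) {
    return 1 if $index->{$len}{ $num & mask_for($len) };
  }
  return 0;
}

# 54.72.0.0/15 is the range from the allow,deny example; 192.0.2.0/24 is a
# documentation range
my $index = build_index('54.72.0.0/15', '192.0.2.0/24');
print in_index($index, '54.73.10.1') ? "ec2block\n" : "NULL\n";
```

In the rewritemap program you would build the index once at startup from ec2-ips.dat and call in_index inside the STDIN loop, falling back to the NetAddr::IP loop for IPv6 visitors.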

 

3 Replies to “Keeping Amazon EC2 crap off your website”

  1. Monick

    Hi! I was searching for a similar solution and I’ve tried your code. I like the idea, but I can’t get how it can actually work if you break the loop after the “NULL” print: comparison between IP and network stops after first iteration, doesn’t it? I’m a Perl newbie but I can’t make it works, so I edited last lines of your code this way:

    if ( $visitor->within($network) ) {
      $found = 1;
      print "ec2block\n";
      last;
    }
    }
    if ( !$found ) { print "NULL\n"; }
    undef $visitor;
    undef $remoteIP;
    undef $found;

    Absolutely not sure if it’s strictly correct but it’s working.

    Thanks for the hint!

    • Your Mom Post author

      Hi Monick, that ‘last’ is breaking you out of the foreach() loop, not the overall while() loop, so once either ec2block or NULL is printed, it just goes back to waiting for the next line of input from apache.

