Shell Voodoo, Connected IPs, and Counting Total Connections

I’m posting this mostly as a note to myself, but if you, future visitor, stumble upon this post and have improvements or other things you’d like to share, be my guest. Posts that are overly critical of the methodologies provided by others, or those which otherwise add nothing to the discussion will be removed. This is especially true for those espousing beliefs that PowerShell is superior.

I won’t go into the exact details of why we needed to do this, but the general break down is thus:

  • Get a list of connected IP addresses
  • Sort them
  • Count how many connections were made from a single address

Fortunately, the solution turns out to be quite easy. For FreeBSD:

netstat -anfinet | grep -v 127.0.0.1 | awk '{ print $5 }' | \
grep -E '.*([0-9]{1,4}\.)+.*' | sed 's/\(.*\)\..*/\1/' | \
sort -g -k 1 | uniq -c | sort -n -k 1

And for most derivatives of Linux:

netstat -anW --tcp --udp | grep -v 127.0.0.1 | awk '{ print $5 }' | \
grep --color=never -E '.*[0-9]{1,4}(\.|\:).*' | sed 's/\(.*\)\:.*/\1/' | \
sort -g -k 1 | uniq -c | sort -n -k 1

You may need to modprobe sctp to get the --tcp and --udp netstat flags working. Also, both of these should work with IPv6 addresses, too, which is why I’ve tried to keep the sed regex as simple as possible.

What the Eff is This?!

Okay, I agree. I’ve probably made some kind of mistake somewhere; I don’t know awk or sed quite as well as I should (easily fixed, if I ever wanted to spend a weekend learning). That said, here’s my understanding of how this should work. First, we’ll deal with the FreeBSD derivative, line by line:

FreeBSD

Here is a breakdown for the FreeBSD-specific stuff:

netstat -anfinet | grep -v 127.0.0.1 | awk '{ print $5 }' | \

As with all platforms I’m aware, -an shows all connections by their numerical addresses. netstat prefers to perform a reverse lookup on every address, and this can take some time. However, the FreeBSD-specific option -f inet specifies to only show INET (IPv4/IPv6) addresses and eliminates much of the cruft associated with local Unix domain sockets. Likewise, we trim localhost from the list with grep -v, and we fetch the 5th output column using awk

grep -E '.*([0-9]{1,4}\.)+.*' | sed 's/\(.*\)\..*/\1/' | \

Moving on to the next line, we fetch only those lines that contain something that vaguely resembles an IP address with grep -E (I prefer to use -E here since it gives us the extended regex syntax), and we pass the results into sed to strip off the trailing remote host’s port number. Alternatively, you could use something like 's/^\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\).*/\1/' instead to filter out IPv4 addresses, but since we already know roughly what to expect from the input we can simplify our regex. Furthermore, we also know that the IP address of the remote host in FreeBSD will always have a dot followed by the port number appended, and we can naively remove this.

sort -g -k 1 | uniq -c | sort -n -k 1

Lastly, we sort (generically, with -gunique addresses in our list including their totals, and we sort numerically by the first column (now containing the count).

Linux

Here is a breakdown for the Linux-specific stuff:

netstat -anW --tcp --udp | grep -v 127.0.0.1 | awk '{ print $5 }' | \

Following in the footsteps of FreeBSD, we use -an to display all connected numeric addresses so we don’t waste time running reverse lookups. However, in most Linux distributions, lengthy columns–and especially IPv6 addresses–will be truncated by netstat’s output. To counter this, we use -W to show the wide listing, and we use --tcp and --udp to filter out only those protocols. You may need to modprobe sctp in order to get this to work; if you can’t, this string of commands might still work. Lastly, we filter connections to localhost with grep -v, and we fetch the 5th column using awk Easy enough, right?

grep --color=never -E '.*[0-9]{1,4}(\.|\:).*' | sed 's/\(.*\)\:.*/\1/' | \

In this next line, we use the extended regex feature of grep -E to filter out lines that look somewhat address-y, and we separate the remote host’s address from its port using sed. In this case, Linux appends port numbers using a colon (:), so we have to deviate slightly from the FreeBSD example. Also, since some distros might alias grep with grep --color=auto|always, we use --color=never to eliminate feeding ANSI control characters to sed.

sort -g -k 1 | uniq -c | sort -n -k 1

Lastly, we sort by the IP address using a generic sort (-g), filter out only those addresses that are unique, count them, and then sort by the count column which is now tacked onto the front.

Now we can get a fancy list of IP addresses, how many connections from them are being made to us, and sort them accordingly! Manipulating grep accordingly can re-introduce localhost or remove specific addresses that might not be of interest.

No comments.
***

Fun with find

A friend of mine was asking how to append a string to files contained in a directory structure of unknown depth. I dug around a little bit and found this gem.

Eric has been having several difficult issues building KDE 4-point-something on Funtoo (a Gentoo fork) and it occurred to him that it might be possible to add a specific use flag to every IUSE contained within the build. Unfortunately, due to the nature of the package directory structure, it would prove tiresome attempting to append the same string to each file. Besides, that’s what scripting is all about, isn’t it?

Here’s one possible solution:

find . -type f -name 'IUSE' -exec sh -c 'echo "exceptions" >> {}' \;

For a more secure solution, use -execdir:

find . -type f -name 'IUSE' -execdir sh -c 'echo "exceptions" >> {}' \;

If that syntax frightens you, for loop constructs are also a possibility:

for file in `find . -type f -name 'IUSE'` ; do echo "exceptions" >> $i ; done

If you know of others, share them!

No comments.
***

Quickie: IPython

I love Python. It’s a beautiful language; simple, concise, and it’s a much more “pure” object-oriented language than others. (PHP, I’m looking at you.) It does have its drawbacks, mostly as a consequence of duck typing, but I can’t think of very many situations where that would present a significant drawback.[1]

In this quickie, I’d like to introduce something I recently discovered while working through a TurboGears tutorial. It’s called IPython and is an incredibly useful interactive shell. (Be aware that the IPython site doesn’t appear to be working at the moment.) For a quick overview, take a look at the Wikipedia entry. If you’d rather hear my take, here it is:

  • It supports tabbed auto-completion – You can use this to auto-complete module names (and module paths like twisted.web.http) or to inspect object methods, class methods, and just about anything else
  • It works like most common shells – Even under Windows, you can fire up IPython and obtain a reasonably Unix-like shell.
  • Looking through the Wikipedia entry, it has a few other useful features such as the capability of running blocking functions in a separate thread of execution.

With as long as IPy has been around, I’m surprised I just discovered it. I have heard of it before, but my previous biases of self-hosting language shells being rather poor apparently interfered with my desire to try new things. Try it! You won’t regret it; it’s certainly a lot easier than inspecting everything with dir() when you can’t remember whether that method was has_index or hasindex!

Footnotes

[1]: Some developers feel that duck typing can be met with disastrous consequences particularly in large, poorly documented projects. This school of thought generally holds that strongly typed languages are self-documenting. While I agree, I think that the drawbacks of duck typing can be mitigated by 1) taking care to ensure that published API documentation truly conform to the APIs they document, 2) making liberal use of docstrings within methods and function definitions, and 3) updating the docstrings immediately before or after a change has been made that may break compatibility with outside callers. In a perfect world, the project documentation would be in perfect sync with the source; this isn’t a perfect world, of course, and I feel that keeping docstrings consistent with the documentation at the very least is fundamentally important.
No comments.
***