Up-to-date performance comparison between pf (OpenBSD's packet filter) and iptables?

I can't help you with that unless I happen to google across it, but out of curiosity: What for?

Basically, the difference in performance is small, while the syntax and supported features vary more.
If you want performance benchmarks, you can probably find some claiming each one is the fastest. :D

(In other words: Don't select your firewall on performance numbers.)
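
To show what I mean about syntax being the bigger difference, here's roughly the same home-router NAT rule in both. Treat it as a sketch - the interface names and subnet are made up:

Code:
# pf (in pf.conf, OpenBSD-style)
ext_if="fxp0"
nat on $ext_if from 192.168.1.0/24 to any -> ($ext_if)

# iptables (one shell command on Linux)
iptables -t nat -A POSTROUTING -o eth0 -s 192.168.1.0/24 -j MASQUERADE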
 
HHunt is right... Go with what you like better. 99% of the time you won't notice a performance difference, particularly in a home, small-business, or even medium-sized-business application.
 
Warning - not strictly on-topic post ahead.

I do know of one place where performance does matter, though - the residential network at my school. 4000 users. Before they disabled BitTorrent, Resnet was generating 150k packets per second (pps) all day long - and NAT is on. 100% router CPU load, all the time. They're getting new uber[H]ardware from Cisco that can handle NAT at 500k pps. That sounds like some fun stuff to play with. :D

And then they're re-enabling torrents :D

 
unhappy_mage said:
Warning - not strictly on-topic post ahead.

I do know of one place where performance does matter, though - the residential network at my school. 4000 users. Before they disabled BitTorrent, Resnet was generating 150k packets per second (pps) all day long - and NAT is on. 100% router CPU load, all the time. They're getting new uber[H]ardware from Cisco that can handle NAT at 500k pps. That sounds like some fun stuff to play with. :D

And then they're re-enabling torrents :D


500k pps is immense, when I think about it. :D
I wonder what kind of hardware would be necessary to handle that with a computer. Is PCI fast enough, or would you need PCIe network cards? What about the CPU and memory bandwidth?

I really do wonder what you'd get out of a nice Opteron with two Gbit cards and OpenBSD. I'd be surprised if it wasn't less, but I'd like to know how much.
 
Apparently at one point the new networking stuff in FreeBSD 5.x was able to effectively route 1 Mpps. They said the Linux guys were floored by that number.
 
[H]EMI_426 said:
Apparently at one point the new networking stuff in FreeBSD 5.x was able to effectively route 1 Mpps. They said the Linux guys were floored by that number.

When you mention it, I remember that. :D
In the meantime the code has matured and computers have become yet faster, but how much overhead does pf doing NAT add?

I'm already in bed, so I'm not about to test right now.
 
HHunt said:
500k pps is immense, when I think about it. :D
I wonder what kind of hardware would be necessary to handle that with a computer. Is PCI fast enough, or would you need PCIe network cards? What about the CPU and memory bandwidth?

I really do wonder what you'd get out of a nice Opteron with two Gbit cards and OpenBSD. I'd be surprised if it wasn't less, but I'd like to know how much.
Well, we're only on OC-12 (see netflow chart), so I don't know how to test. However, let's consider it: assume an average of, say, 400 bytes per packet (reference); at 500k packets/sec, that's 200,000,000 bytes/sec, which is well over the speed of GigE. And then, how many operations is a NAT transform? At half a million NATs/sec, with Opteron 254s (2.8 GHz each) it had better take fewer than 11200 clock cycles (I'd guess it's lower than that, but still a concern). There shouldn't be any problems with memory bandwidth, especially with NUMA, but how about PCIe? Each lane is 2.5 Gbit/s full duplex with 8b/10b coding, so a user data rate of 2 Gbit/s = 250 MB/s each way. So even a single lane would be enough to handle this much data. That's awesomely insane. Apparently PCIe will last for a while...
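
If anyone wants to check the arithmetic, it only takes bc:

Code:
$ echo '400 * 500000' | bc               # bytes/sec at 500k pps
200000000
$ echo '400 * 500000 * 8' | bc           # bits/sec - 1.6 Gbit, over GigE
1600000000
$ echo '2 * 2800000000 / 500000' | bc    # cycle budget, two 2.8 GHz CPUs
11200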

 
Hmmm. When you put it like that, it really ought to work. I guess the main thing going for Cisco is maturity and features.

I don't have anywhere near the hardware to do a real test (only one GigE card in the house, for one thing), but I suspect it would be possible to test pf by messing with aliases. I haven't put much thought into it, but it should be possible to estimate what overhead NAT adds with only one computer. We'll see if I get around to it. :D
 
HHunt said:
500k pps is immense, when I think about it. :D
I wonder what kind of hardware would be necessary to handle that with a computer. Is PCI fast enough, or would you need PCIe network cards? What about the CPU and memory bandwidth?

I really do wonder what you'd get out of a nice Opteron with two Gbit cards and OpenBSD. I'd be surprised if it wasn't less, but I'd like to know how much.

I imagine PCI-X would work (64-bit/133 MHz is good for about 1 GB/s), and the network cards are a lot easier to find than PCIe ones :)

Rob
 
HHunt said:
I don't have anywhere near the hardware to do a real test (only one GigE card in the house, for one thing), but I suspect it would be possible to test pf by messing with aliases. I haven't put much thought into it, but it should be possible to estimate what overhead NAT adds with only one computer. We'll see if I get around to it. :D
Now there's an idea - and it gets rid of any potential network-card-based slowdowns, too. What do you recommend for a packet generator? I'll do Linux if you'll do BSD ;)

 
unhappy_mage said:
Now there's an idea - and it gets rid of any potential network-card-based slowdowns, too. What do you recommend for a packet generator? I'll do Linux if you'll do BSD ;)



Well, that's one of the things I haven't looked into yet; I've never needed one until now. I'll take recommendations. :D
(I'll mess around with it now and come back when I've got a general idea of what I'm getting into.)
 
Hm, strange.
Just to get an idea of what performance I get with nothing unusual running, I did a simple time ping -fc 100000 10.0.0.1.
That took about 3 seconds of wall time, for a rate of roughly 30k pps. I'm not impressed.
(And yes, I disabled the icmp rate limiting.)

As for why: most of that time (roughly 70%) is used by ping itself, so it might just be that packet generation is surprisingly expensive.
It's still faster than isic, which topped out at 15k pps.

edit: Adding a pf rule that takes all traffic to 10.0.0.1 (an alias on my network card) and redirects it to lo0 increases the wall time from 3.1s to 3.7s (for 100000 packets). Not too bad, but it's not all that much traffic to begin with.
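
The rule was along these lines (a sketch, not copied straight from my pf.conf - substitute your own interface for fxp0):

Code:
# pf.conf: grab everything aimed at the alias and hand it to loopback
rdr on fxp0 inet from any to 10.0.0.1 -> 127.0.0.1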
 
Here are the results for my two machines.

2.4 GHz with 512 MB RAM, kernel 2.6.11.5:
Code:
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
--- 127.0.0.1 ping statistics ---
100000 packets transmitted, 100000 received, 0% packet loss, time 8505ms
rtt min/avg/max/mdev = 0.015/0.027/3.202/0.016 ms, pipe 2, ipg/ewma 0.085/0.027 ms
real 0m8.538s
user 0m0.796s
sys 0m5.023s

Dual 933 MHz with 256 MB RAM, kernel 2.6.14:
Code:
PING 127.0.0.1 (127.0.0.1): 56 data bytes
--- 127.0.0.1 ping statistics ---
100000 packets transmitted, 100000 packets received, 0% packet loss
round-trip min/avg/max = 0.0/0.0/5.4 ms
real 0m20.852s
user 0m0.363s
sys 0m7.430s

So either BSD is better at handling packets by a *large* margin, or I need to get a better test.

 
Just for comparison's sake, the specs of my testing computer:
P4 Xeon 2.8 GHz
1 GB PC2700
FreeBSD 7-CURRENT from 1 Aug. (Reminds me: it's time for an update.)

8.5 seconds for 100000 packets to loopback does sound a bit high, yes.
My ping prints a . for each packet sent and a backspace for each answer, and I've noticed that the terminal I use makes a lot of difference. Over ssh it's much slower, while a text-mode tty is the fastest. Redirecting the output to /dev/null helps, at the cost of hiding the statistics.

Over SSH:
Code:
xeon# /usr/bin/time ping -fc 100000 127.0.0.1 > /dev/null
        2.91 real         0.22 user         1.75 sys
xeon# /usr/bin/time ping -fc 100000 127.0.0.1
PING 127.0.0.1 (127.0.0.1): 56 data bytes
--- 127.0.0.1 ping statistics ---
100000 packets transmitted, 100000 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.012/0.013/0.665/0.005 ms
        5.87 real         0.26 user         2.29 sys

Just for good measure, I tried on the other computer (a slightly underclocked Athlon XP 2000+):
Code:
nord# /usr/bin/time ping -fc 100000 127.0.0.1 > /dev/null
        3.47 real         0.21 user         1.78 sys
 
HHunt said:
My ping prints a . for each packet sent and a backspace for each answer, and I've noticed that the terminal I use makes a lot of difference.
That's why you should always run tests whose results can be affected by terminal conditions inside screen. screen pretty much returns instantly instead of waiting for terminal updates, etc. It doesn't completely nullify the terminal's effects, but it negates a large part of them.
 
I tried screen, and in this particular case it took almost exactly twice the time compared to a tty.
 
Aha! With output >/dev/null, here are the new times: 0m7.896s and 0m5.548s. Better, but still lacking. I dunno.

 
unhappy_mage said:
Aha! With output >/dev/null, here are the new times: 0m7.896s and 0m5.548s. Better, but still lacking. I dunno.


I guess FreeBSD might just be faster at handling a lot of small packets, then.
Still, 30 kpps is a whole lot less than 1 Mpps, so I guess generating the traffic is rather expensive. I need another fast computer with a GigE card to test, I think. :D

edit: Just for the curiosity value, I booted the O2. (Posting from it now, in fact. Mozilla is a hog on this poor thing.)
Code:
MIPS 20# time ping -fc 100000 localhost
PING localhost (127.0.0.1): 56 data bytes
----localhost PING Statistics----
100000 packets transmitted, 99906 packets received, 0.1% packet loss

round-trip min/avg/max = 0.146/0.275/8.702 ms
  3209.9 packets/sec sent, 3207.0 packets/sec received
2.796u 9.134s 0:31.25 38.1% 0+0k 0+0io 0pf+0w

MIPS 21# hinv -c processor ; hinv -c memory
CPU: MIPS R5000 Processor Chip Revision: 2.1
FPU: MIPS R5000 Floating Point Coprocessor Revision: 1.0
1 200 MHZ IP32 Processor
Main memory size: 128 Mbytes
Secondary unified instruction/data cache size: 1 Mbyte on Processor 0
Instruction cache size: 32 Kbytes
Data cache size: 32 Kbytes
FLASH PROM version 4.17
MIPS 22# uname -a
IRIX MIPS 6.5 05190004 IP32
If this scaled linearly with MHz, it would use about 3.1s at 2 GHz. Since the internal bandwidth of this thing is far out of proportion with its age and CPU speed, I think our numbers look rather good.
 
Okay, I decided to do the same tests as you guys and got some interesting results.

Code:
test# /usr/bin/time ping -fc 100000 localhost > /dev/null
        1.87 real         0.16 user         1.21 sys

Now, let's do it with a million packets...

Code:
test# /usr/bin/time ping -fc 1000000 localhost > /dev/null
       18.78 real         1.75 user        11.78 sys

Looks like the first batch was not a fluke, because the million-packet run was right in line.

I'm seeing around 53k packets per second with the default packet size (56 bytes). Setting the packet size to zero drops the times a bit, but isn't very useful since there's no payload to process.
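
(By packet size I mean ping's -s flag, so the zero-payload run was something like this:)

Code:
test# /usr/bin/time ping -fc 100000 -s 0 localhost > /dev/null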

test is an Athlon XP 1800+ with 512 MB RAM on a KT600 chipset (IIRC), running 5.4-RELEASE-p6. I've done some network tuning beforehand (changed a few sysctl values, mostly), so that might explain why my throughput numbers are noticeably higher than what HHunt was seeing.
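
I won't list the exact knobs from memory, but it was garden-variety stuff roughly along these lines (illustrative values, not necessarily what I used):

Code:
# /etc/sysctl.conf - typical FreeBSD network tweaks of this sort
kern.ipc.maxsockbuf=2097152           # bigger socket buffers
net.inet.ip.intr_queue_maxlen=512     # deeper IP input queue before drops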

ram is a dual PIII 733E machine with 512 MB of ECC/registered RAM on a Tyan Tiger LE (ServerWorks ServerSet III LE chipset). Numbers from ram are less than stellar:

Code:
ram# /usr/bin/time ping -fc 100000 localhost > /dev/null
        8.08 real         0.46 user         5.49 sys

How about a million?

Code:
ram# /usr/bin/time ping -fc 1000000 localhost > /dev/null
       80.93 real         4.51 user        55.04 sys

Pretty much in-line.

I've done similar tuning to ram as I'd done to test, since ram is my home NFS server.

I decided to try pinging the real interfaces, just to get the networking hardware involved, and didn't notice a dramatic performance difference. Both machines have Intel GigE cards, though ram's is a 64-bit card living in a 66 MHz PCI slot. For test the difference was very minimal, and for ram there was no discernible difference between pinging the machine's visible IP (I could see activity on the switch, a GigE switch) and pinging localhost. Interesting.
 
I suspect that creating the packets is a large part of the load here, but given that, I would have expected a dual-proc computer to scale well. The numbers from ram don't look like much more than you'd get from a single 733 (unfounded extrapolation, yeah), so... hm.
 
Just to chip in:

AMD Duron 700 MHz, 384 MB PC133 RAM, OpenBSD 3.5

Code:
/usr/bin/time ping -fc 1000000 localhost > /dev/null
       36.50 real         2.36 user        34.06 sys

I'll try it out with iperf later - I've heard good things about using it to test network speeds.
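
The usual drill with iperf, for anyone who wants to follow along (server address made up):

Code:
server$ iperf -s                      # listen on the default port, 5001
client$ iperf -c 192.168.1.10 -t 30   # push TCP at the server for 30 seconds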
 
Just out of curiosity, I decided to run the same tests on test after upgrading to 6.0-RELEASE.

Code:
test# /usr/bin/time ping -fc 100000 localhost > /dev/null
        1.79 real         0.19 user         1.09 sys
test# /usr/bin/time ping -fc 1000000 localhost > /dev/null
       17.91 real         1.56 user        11.20 sys

Interesting. I'm still running the 4BSD scheduler. Up to nearly 56k pps now.
 