I warn you in advance that the following post describes a situation that does not make technical sense to me. But it’s the truth, and so I figured “hey, this is a geek website, and maybe some of the people visiting here are geeks, so maybe they’ll appreciate the absurdity.”

As I mentioned in some of my previous postings, I’ve been working at reconfiguring my network. My objective is relatively simple: get my Linux server behind my hardware firewall so that it’s a bit less exposed. I’ve had problems with my ISP not supporting certain features I needed for one-to-one network address translation. But I figured I could probably manage without that: my new firewall/router supports a ton of port forwarding features, so I should be able to make a one-to-many NAT work.

But when I moved my Linux server behind the firewall, it was as stable as a drunk walking on a skating rink wearing leather-soled shoes. The server’s connection kept failing, usually after less than 10 minutes of connectivity. The box had been stable for months, and when I put it back on the other side of the firewall, it worked fine. So…some sort of strange interaction between the Linksys RV016 router/switch and the server’s network configuration?

I tried everything, up to and including going out and buying a new Linksys network interface card for the server. I reset the firewall to factory defaults, tried different cables, reconfigured drivers until I was blue in the face. It still failed as soon as I brought it behind the firewall. But I decided to give it a final go: I’m tenacious, and having sunk probably 10 hours into figuring this out so far, I wasn’t ready to declare defeat.

I decided to make sure I was keeping things as simple as possible. So, I reset the RV016 to factory defaults and set up a test plan that consistently demonstrated failure. I put the server behind the firewall and made zero changes to the RV016; it was still at factory defaults. Then I opened up a command prompt in Windows and started a continuous ping of the Linux box. I also opened two SSH (secure shell) sessions to the box, and finally opened up SCP and started copying over about four gigabytes of files. In less than five minutes the connection would fail: ping would stop getting responses, the secure shell sessions would “freeze”, and SCP would stop and time out. Every time, the connection was “dead” for about 20 seconds before coming back up…beautiful.
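For anyone who wants to repeat this kind of test without staring at a ping window, here is a minimal Python sketch of the same idea: ping once a second, record whether each ping succeeded, and report the stretches where replies stopped. The function names, the one second interval, and the five second threshold are illustrative assumptions, not part of the original test. The exit code of ping is also treated as a plain pass/fail signal, which may not be reliable on every platform.

```python
import subprocess
import sys
import time


def find_outages(samples, min_gap=5.0):
    """Return (start, end) pairs for runs of failed pings lasting at least min_gap seconds.

    samples is a list of (timestamp_seconds, ok) tuples in time order.
    A failure run that is still in progress at the end of the list is not reported.
    """
    outages = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                      # first lost reply of a possible outage
        elif ok and start is not None:
            if ts - start >= min_gap:       # ignore short blips
                outages.append((start, ts))
            start = None
    return outages


def ping_once(host):
    """Send a single echo request and report whether the ping command exited with 0."""
    count_flag = "-n" if sys.platform.startswith("win") else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def monitor(host, seconds):
    """Ping host roughly once per second for the given duration, then return the outages."""
    samples = []
    deadline = time.time() + seconds
    while time.time() < deadline:
        samples.append((time.time(), ping_once(host)))
        time.sleep(1)
    return find_outages(samples)
```

Calling something like `monitor("192.168.1.10", 600)` (the address is a placeholder) while an SCP copy runs in another window should return one entry for each dropout of the kind described above.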

I then started making careful, step by step changes. Nothing seemed to be making an improvement, and then I received a response to my support email to Linksys. Here is what they said:

Unstable connection with the router may be caused of different sources. As of this writing, we do not have any reported issues related to SSH connection to our router. We suggest you try the following options:

1. Disable Block Internet / WAN Requests on the router
2. Enable MTU option to maximize data transmission. You can set the MTU size to 1400.
3. Disable any form of firewall installed in all computers
4. Reset and reconfigure the router.
5. Re-flash firmware using the same version, 2.0.6 that you can download from our site.

If you have any further questions, feel free to visit our Knowledge Base at / or send us an e-mail at [email protected] so that we can assist you.

I decided to start from scratch yet again, but with my precious repeatable failure at hand. First I reflashed the firmware and reset to factory defaults (5 and 4 above). That made no apparent difference. I had no software firewalls active on my computers other than the Windows XP firewall on my desktop, and I decided to leave that for the time being. Disabling “block Internet/WAN requests” on the RV016 was something I’d already tried without success. But I disabled it anyway since the Linksys support folks suggested it.

So, I was down to changing the MTU (Maximum Transmission Unit) size…and guess what? That worked. My connection was suddenly rock solid. But here’s the thing: everything I’ve read indicates that an overly large MTU setting will, at worst, cause packet fragmentation and slow down the connection. It *should not* cause a connection to fail completely for a duration measured in tens of seconds.
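To make the fragmentation point concrete, here is a small Python sketch of the arithmetic for splitting an IPv4 packet across a link with a smaller MTU. It assumes a 20 byte IP header with no options, and the function name is made up for illustration. It only shows the "extra packets" cost of fragmentation; it says nothing about why the RV016 dropped connections for 20 seconds at a time.

```python
def fragment_sizes(packet_len, mtu, header_len=20):
    """Return the total length of each IPv4 fragment needed to carry one packet.

    packet_len and mtu are in bytes and include the IP header.
    header_len=20 assumes an IPv4 header without options.
    """
    if packet_len <= mtu:
        return [packet_len]                 # fits as-is, no fragmentation
    payload = packet_len - header_len
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - header_len) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("MTU too small to carry any payload")
    sizes = []
    while payload > 0:
        chunk = min(per_fragment, payload)
        sizes.append(chunk + header_len)    # each fragment gets its own header
        payload -= chunk
    return sizes


print(fragment_sizes(1500, 1500))  # [1500]
print(fragment_sizes(1500, 1400))  # [1396, 124]
```

So a full-size 1500 byte packet crossing a 1400 byte link becomes two packets, one of them tiny. That is wasteful, but on its own it is a slowdown, not an outage.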

I don’t understand why MTU should have this effect; time to do some more research. But I can rest better now knowing that my server is safely behind my firewall 🙂
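One possible starting point for that research is to measure the largest packet that actually crosses the router without being fragmented. The Python sketch below binary-searches for it. The search function works with any pass/fail probe. The sample probe assumes a Linux machine with the iputils `ping` (its `-M do` option sets the don't-fragment bit), and the size range is a guess to adjust as needed.

```python
import subprocess


def largest_passing(probe, low, high):
    """Binary-search the largest size in [low, high] for which probe(size) is True.

    Assumes probe is monotonic: every size up to some limit passes and every
    larger size fails. Returns None if even `low` fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop always makes progress
        if probe(mid):
            low = mid                 # mid got through, look higher
        else:
            high = mid - 1            # mid was too big, look lower
    return low


def linux_df_ping(host):
    """Build a probe that sends one don't-fragment ping of a given payload size.

    Assumes the Linux iputils ping command is on the PATH.
    """
    def probe(payload_size):
        result = subprocess.run(
            ["ping", "-M", "do", "-c", "1", "-W", "2", "-s", str(payload_size), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0
    return probe
```

A call such as `largest_passing(linux_df_ping("192.168.1.10"), 500, 1472)` (placeholder address) returns an ICMP payload size. Adding 28 bytes for the IP and ICMP headers gives the corresponding MTU, so a result of 1472 means a full 1500 byte MTU is getting through.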

This Post Has 5 Comments

  1. Daren

    This is because Linksys has a bug in the way they monitor and calculate their MTU. The RV016 has a feature that allows you to assign up to
    7 ports as WAN ports. This feature does not have an option to only have 1 WAN port (the minimum is 2). Because of this, each packet is
    analysed when it hits the RV016, and the MTU monitoring function checks the average MTU size and its historical effect on latency.
    I assume you had the same issue we did. When you reset the router to factory settings, did the first few SSH connections stay connected longer
    than the ones that followed? Ours did! This tells me that it is possible that the higher the perceived load, the worse the problem gets.

    By setting the MTU manually you caused the router to stop messing with the MTU. That is why SSH now works. Linksys will never admit
    to it but I am convinced they have an error in their Automatic MTU setting.

    Thanks for posting this, I am glad to know I am not the only one that had this issue.

  2. Bob Hahn

    Thanks for leaving this around. A year and a half later, it’s still relevant (unfortunately).

  3. Kelly Adams

    You are most welcome, Bob! Mind you, I’m a knowledge packrat; I don’t throw away historical posts unless I have to 🙂

  4. anonymous

    2008 version 2.0.18 firmware and it is still relevant. Was getting 400KB/sec downloads on a cable modem. Changed the automatic MTU to MANUAL 1500 and it jumped to over 1MB/sec.
