Posted: 2/27/2010 9:23:08 PM EDT
|
Need networking gurus in here.
Scenario one: I have two Linux servers. Each server has two network ports, and the servers are connected to each other by two Ethernet cables. The interfaces are bridged. If one cable is unplugged or loses connectivity entirely, they switch to the other link. However, if there is only packet loss, even 50% or more, they keep using the bad link even if the other is good. Is there a routing daemon or something similar I can use to get them to fail over to the other link?
Scenario two: I have two multipathed Linux servers in different datacenters. Again, each can talk to the other over two different paths. One path starts experiencing extreme packet loss but not total failure. I need something that automatically measures packet loss through each route and then changes the route to the more reliable link. -Foxxz |
|
Quagga
Has BGP, OSPF, and RIP
––––––––––––––––––––––––––––––––––––––––
Here is how I load balance between two ISPs (gateways):
1. We must delete all default routes that are created upon bootup
   $ /sbin/route del default gw 192.168.6.1
   $ /sbin/route del default gw 10.85.0.1
2. Set up iptables to allow packets from the internal network to the internet:
   a. Route packets to the Internet connected to the first ISP (ISP1)
      $ /sbin/iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE
   b. Route packets to the Internet connected to the second ISP (ISP2)
      $ /sbin/iptables -t nat -A POSTROUTING -o eth2 -j MASQUERADE
   c. Forwarding from internal LAN
      $ /sbin/iptables -A FORWARD -s 10.0.0.0/24 -j ACCEPT
      $ /sbin/iptables -A FORWARD -d 10.0.0.0/24 -j ACCEPT
      $ /sbin/iptables -A FORWARD ! -s 10.0.0.0/24 -j DROP
   d. Save the configuration
      $ /sbin/iptables-save > /etc/sysconfig/iptables
3. We MUST enable IP forwarding. There are two ways to do this:
   a. $ echo 1 > /proc/sys/net/ipv4/ip_forward ***this will be lost upon the next reboot***
   b. edit /etc/sysctl.conf: net.ipv4.ip_forward = 1
4. Load balance the routes
   $ /sbin/ip route add default equalize nexthop via 192.168.6.1 dev eth1 nexthop via 10.85.0.1 dev eth2
––––––––––––––––––––––––––––––––––––––––
I would suspect that this, coupled with "tc", would work: http://linux-ip.net/articles/Traffic-Control-HOWTO/ |
|
Unfortunately this is not a load issue or something where load balancing will help. But it's a good suggestion. This is simply a case where one link can go bad with packet loss.
The daemons are configured with a list of IPs to connect to. Depending on which IP they choose to connect to, routing will send the traffic out a different path. I.e., there are two IPs the daemon can connect to. One is routed out one path and the other is routed down the alternate path. It tries the first IP and uses it unless it can't connect. Then it tries the second one. It only cares that it got a connection. But there is no logic anywhere that causes it to flip over to the second if there is packet loss, congestion, slowness, etc. In fact the daemon wouldn't even see packet loss, because the OS handles the TCP session and retransmits. If it gets disconnected it will simply reconnect to the same IP over the same link. Only if its retries time out will it try the second IP and link.
I need to find something that pings down the separate links, or makes a separate connection down each, and then updates the routing table depending on which has the lowest packet loss. I would think there is a routing protocol that handles this situation, but my searches have come up dry. -Foxxz |
|
Understood.
Why couldn't you write a bash script that uses tc, ping, and cron to monitor the network segments, say every 5 seconds (cron only fires once a minute, so the script would need its own loop for that), and then adds or deletes a route entry as needed? Maybe even incorporate iperf to actively obtain the bandwidth and jitter specs for that link at that particular time. |
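A minimal sketch of that idea, assuming each probe address can only be reached through its own link. Every address, interface name, and threshold here is a hypothetical placeholder, and the helper names (`parse_loss`, `measure_loss`, `choose_link`, `apply_link`, `monitor`) are made up for illustration. It pings out of each interface, compares the loss figures, and moves the route only when the other link is clearly better:

```shell
#!/bin/sh
# Sketch only. Every address, interface name and threshold below is a
# hypothetical placeholder, not a value taken from this thread.
PEER_NET="198.51.100.0/24"   # network the daemon connects to (assumed)
LINK_A_DEV="eth1"; LINK_A_GW="192.0.2.1";   LINK_A_PROBE="192.0.2.10"
LINK_B_DEV="eth2"; LINK_B_GW="203.0.113.1"; LINK_B_PROBE="203.0.113.10"
PROBES=20    # pings per measurement
MARGIN=20    # switch only if the other link is this many points better

# Read ping output on stdin, print the loss percentage as an integer.
parse_loss() {
    sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p' | cut -d. -f1
}

# measure_loss DEV PROBE_IP: ping out of one interface.
# Reports 100 if ping produced no summary line at all.
measure_loss() {
    loss=$(ping -q -c "$PROBES" -W 1 -I "$1" "$2" 2>/dev/null | parse_loss)
    echo "${loss:-100}"
}

# choose_link CURRENT LOSS_A LOSS_B MARGIN: print "a" or "b".
# The margin adds hysteresis so the route does not flap on small differences.
choose_link() {
    if [ "$1" = a ] && [ $(( $2 - $3 )) -ge "$4" ]; then echo b
    elif [ "$1" = b ] && [ $(( $3 - $2 )) -ge "$4" ]; then echo a
    else echo "$1"
    fi
}

# Point the route for PEER_NET at the chosen link (needs root).
apply_link() {
    if [ "$1" = a ]; then
        ip route replace "$PEER_NET" via "$LINK_A_GW" dev "$LINK_A_DEV"
    else
        ip route replace "$PEER_NET" via "$LINK_B_GW" dev "$LINK_B_DEV"
    fi
}

# Measure both links every 5 seconds and move the route when needed.
monitor() {
    current=a
    apply_link "$current"
    while :; do
        la=$(measure_loss "$LINK_A_DEV" "$LINK_A_PROBE")
        lb=$(measure_loss "$LINK_B_DEV" "$LINK_B_PROBE")
        next=$(choose_link "$current" "$la" "$lb" "$MARGIN")
        if [ "$next" != "$current" ]; then
            apply_link "$next"
            current=$next
        fi
        sleep 5
    done
}

# Start the loop only when invoked with the argument "run".
if [ "${1:-}" = run ]; then monitor; fi
```

Running it as a long-lived loop rather than from cron is what makes the 5-second interval possible. Existing TCP sessions keep their source address, so whether they survive a route change depends on how the two paths are addressed.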
|
I see others have pointed out some technical workarounds, but I thought I'd just ask this:
WHY are you having links see ~50% packet loss? If you control these links, you need to find the root problem. If you are paying someone else for these links, check your SLAs and call them on the carpet to fix the problem at the source. Now, admittedly, you have found a weakness in your overall reliability scheme in that your servers SHOULD switch when a link becomes unstable, BUT that situation should be a rare failure, not a regular occurrence. I would be tracking down both issues, how to work around any link 'degradation', as well as preventing that degradation from being a day-to-day occurrence. FluxPrism |
|
the two guys above are right:
- first, fix the problem: you should not be losing that many packets, start with your physical layer and check the cabling and NICs
- second, create a technical workaround in case this happens again: you can check the output of the ifconfig command at random intervals, switch to the good interface/route, and trigger an alert |
|
Fixing packet loss would be great. I'm dealing with systems that will be DDoS'd as a fact of life. Packet loss is unavoidable due to link saturation. Mitigation is in place. This is just another part of said mitigation. It's working well aside from not having an automated way to deal with the packet loss or routing changeover. -Foxxz |
|
link saturation should be handled at a firewall and IDS/IPS, not on the actual server itself. set up a cron job to run a bash script that runs netstat -i to show statistics for the interface, grabs the field you want (awk or cut) for each interface (for example, TX and RX DROP), and then does a simple check to see if that exceeds some threshold (if [ "$eth0_loss" -gt "$threshold" ]; then force_failover; fi). you can even get fancy and log the values so you can re-enable the interface if the attack has subsided (eth0_loss_then=$(cat "$eth0_loss_log"); delta=$((eth0_loss_now - eth0_loss_then)); if [ "$delta" -gt "$threshold" ]; then force_failover; fi). or something like that |
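The counter check described above could be sketched roughly as follows. As a swapped-in technique, this reads the per-interface drop counters that Linux exposes under /sys/class/net instead of parsing netstat -i output. The threshold, the state directory, and the function names are hypothetical, and the failover action itself is left as a placeholder. It can only see drops on the local interface:

```shell
#!/bin/sh
# Sketch only: the threshold, state directory and function names are
# hypothetical placeholders.
STAT_ROOT="${STAT_ROOT:-/sys/class/net}"      # where Linux exposes counters
STATE_DIR="${STATE_DIR:-/var/tmp/linkdrops}"  # assumed place for saved totals
THRESHOLD=100                                  # drops per run treated as "bad"

# read_drops IFACE: print rx_dropped + tx_dropped for the interface.
read_drops() {
    rx=$(cat "$STAT_ROOT/$1/statistics/rx_dropped")
    tx=$(cat "$STAT_ROOT/$1/statistics/tx_dropped")
    echo $(( rx + tx ))
}

# drops_since_last IFACE: print the drops since the previous run and
# save the current total for the next run. The first run reports 0.
drops_since_last() {
    mkdir -p "$STATE_DIR"
    now=$(read_drops "$1")
    prev=$(cat "$STATE_DIR/$1" 2>/dev/null || echo "$now")
    echo "$now" > "$STATE_DIR/$1"
    delta=$(( now - prev ))
    # Counters restart from zero after a reboot; treat a negative delta as 0.
    if [ "$delta" -lt 0 ]; then delta=0; fi
    echo "$delta"
}

# check_iface IFACE: print "failover" or "ok".
check_iface() {
    if [ "$(drops_since_last "$1")" -gt "$THRESHOLD" ]; then
        echo failover   # a real script would trigger its failover here
    else
        echo ok
    fi
}
```

Working from the difference between runs matters because the counters are cumulative since boot, so an absolute threshold would eventually trip on any long-lived interface.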
|
If you have a 10 Mbps link and your provider starts getting 12 Mbps of traffic destined to your IPs, it starts dropping 2 Mbps of random packets. The drops are upstream and you will not see any link problems on your interfaces. -Foxxz |
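To put a number on that example: if traffic arrives at a steady rate and the drops fall evenly on every packet (both are simplifying assumptions), the fraction lost is (offered - capacity) / offered. A small helper, with a hypothetical name, works it out:

```shell
#!/bin/sh
# expected_loss_pct OFFERED_MBPS CAPACITY_MBPS
# Assumes a steady offered rate and drops spread evenly over all packets.
expected_loss_pct() {
    awk -v o="$1" -v c="$2" 'BEGIN {
        if (o <= c) print "0.0"
        else printf "%.1f\n", (o - c) / o * 100
    }'
}

expected_loss_pct 12 10   # prints 16.7: 2 of every 12 Mbps is dropped
```

So the 10/12 Mbps case corresponds to roughly one packet in six being lost on that path, legitimate traffic included.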
|
well that's a whole different ballgame, your provider should be tracking those statistics and notifying you somehow if your connection is that critical |
|
That's how the internet deals with congestion. If there's more traffic than bandwidth, it either routes it down an alternate route or drops it if that's not possible. TCP notices the dropped packets, retransmits them, and slows down until there are no drops. -Foxxz |
|
yea i know that... in your original post though, you made it seem like the packet loss was local to the boxes... and needed a way to measure that loss... now you're saying there is no way to know about the packet loss b/c it's happening upstream...
|
|
Yeah, you actually need almost the exact opposite of what you originally asked for.
Instead of a packet-loss-triggered mechanism for switching links, you are now describing a need for congestion-management-triggered switching of links. Define your problem better and someone might be better able to assist you. FluxPrism |
|
I used the 10 Mbps link as an example. The link is getting saturated upstream from my equipment, where I won't be able to detect it. My only clue that something is amiss is that my connection goes crappy and I start seeing packet loss and a lot of TCP retransmits. Sorry to confuse. -Foxxz |
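Since TCP retransmits are the one symptom visible from the server, one way to watch for them is the kernel's own counters. This is a sketch only, assuming a Linux host where /proc/net/snmp carries the usual Tcp OutSegs and RetransSegs fields. The function names are hypothetical. The counters are host-wide, so they show that something is wrong but not which path is at fault:

```shell
#!/bin/sh
# tcp_counters [FILE]: print "OutSegs RetransSegs" from a file in
# /proc/net/snmp format (defaults to /proc/net/snmp itself).
tcp_counters() {
    awk '$1 == "Tcp:" && !seen { for (i = 2; i <= NF; i++) col[$i] = i
                                 seen = 1; next }
         $1 == "Tcp:"          { print $col["OutSegs"], $col["RetransSegs"] }' \
        "${1:-/proc/net/snmp}"
}

# retrans_pct OUT_BEFORE RETRANS_BEFORE OUT_AFTER RETRANS_AFTER
# Percentage of segments sent in the interval that were retransmissions.
retrans_pct() {
    awk -v o="$(( $3 - $1 ))" -v r="$(( $4 - $2 ))" 'BEGIN {
        if (o > 0) printf "%.1f\n", r / o * 100
        else print "0.0"
    }'
}

# sample_retrans_pct SECONDS: take two samples and print the ratio.
sample_retrans_pct() {
    before=$(tcp_counters)
    sleep "$1"
    after=$(tcp_counters)
    # Unquoted on purpose: each sample expands to two numbers.
    retrans_pct $before $after
}
```

Calling `sample_retrans_pct 10` prints the retransmit percentage over a 10-second window. A sustained jump in that figure could serve as the trigger for probing each path separately.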