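Networking: Diagnosing packet loss / high latency in Ubuntu
We have a Linux box (Ubuntu 12.04) running Nginx (1.5.2), which acts as a reverse proxy/load balancer for some Tornado and Apache hosts. The upstream servers are physically and logically close (same DC, sometimes the same rack) and show sub-millisecond latency between them: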
PING appserver (10.xx.xx.112) 56(84) bytes of data.
64 bytes from appserver (10.xx.xx.112): icmp_req=1 ttl=64 time=0.180 ms
64 bytes from appserver (10.xx.xx.112): icmp_req=2 ttl=64 time=0.165 ms
64 bytes from appserver (10.xx.xx.112): icmp_req=3 ttl=64 time=0.153 ms
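We receive a sustained load of about 500 requests per second, and are currently seeing regular packet loss / latency spikes from the Internet, even with basic pings: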
sam@AM-KEEN ~> ping -c 1000 loadbalancer
PING 50.xx.xx.16 (50.xx.xx.16): 56 data bytes
64 bytes from loadbalancer: icmp_seq=0 ttl=56 time=11.624 ms
64 bytes from loadbalancer: icmp_seq=1 ttl=56 time=10.494 ms
... many packets later ...
request timeout for icmp_seq 2
64 bytes from loadbalancer: icmp_seq=2 ttl=56 time=1536.516 ms
64 bytes from loadbalancer: icmp_seq=3 ttl=56 time=536.907 ms
64 bytes from loadbalancer: icmp_seq=4 ttl=56 time=9.389 ms
... many packets later ...
request timeout for icmp_seq 919
64 bytes from loadbalancer: icmp_seq=918 ttl=56 time=2932.571 ms
64 bytes from loadbalancer: icmp_seq=919 ttl=56 time=1932.174 ms
64 bytes from loadbalancer: icmp_seq=920 ttl=56 time=932.018 ms
64 bytes from loadbalancer: icmp_seq=921 ttl=56 time=6.157 ms
--- 50.xx.xx.16 ping statistics ---
1000 packets transmitted, 997 packets received, 0.3% packet loss
round-trip min/avg/max/stddev = 5.119/52.712/2932.571/224.629 ms
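A capture of 1000 pings is tedious to scan by eye. The following is a minimal sketch, not part of the original post, that flags timeouts and slow replies in saved ping output. The function name `find_anomalies` and the 100 ms threshold are assumptions chosen for illustration; the sample lines are copied from the capture above.

```python
import re

# Reply lines: Linux ping prints "icmp_req=N", BSD/macOS ping prints "icmp_seq=N".
REPLY = re.compile(r"icmp_(?:seq|req)=(\d+).*?time=([\d.]+) ms")
# Timeout lines in the BSD/macOS format shown in the capture above.
TIMEOUT = re.compile(r"request timeout for icmp_seq (\d+)", re.IGNORECASE)


def find_anomalies(lines, spike_ms=100.0):
    """Return (kind, seq, rtt_ms) tuples for timeouts and slow replies.

    spike_ms is an arbitrary illustrative threshold, not a recommended value.
    """
    events = []
    for line in lines:
        m = TIMEOUT.search(line)
        if m:
            events.append(("timeout", int(m.group(1)), None))
            continue
        m = REPLY.search(line)
        if m and float(m.group(2)) >= spike_ms:
            events.append(("spike", int(m.group(1)), float(m.group(2))))
    return events


# Sample lines taken from the ping capture in the question.
sample = """\
64 bytes from loadbalancer: icmp_seq=1 ttl=56 time=10.494 ms
request timeout for icmp_seq 2
64 bytes from loadbalancer: icmp_seq=2 ttl=56 time=1536.516 ms
64 bytes from loadbalancer: icmp_seq=3 ttl=56 time=536.907 ms
64 bytes from loadbalancer: icmp_seq=4 ttl=56 time=9.389 ms
""".splitlines()

for event in find_anomalies(sample):
    print(event)
```

Run against a full log, the sequence numbers of the flagged events show how often the episodes recur and whether they are evenly spaced.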
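The pattern is always the same: things run well for a while (<20 ms), then a ping drops completely, then three or four high-latency pings follow (>1000 ms), then it settles down again.
Traffic comes in through a bonded public interface (we will call it bond0), configured as follows: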
bond0 Link encap:Ethernet HWaddr 00:xx:xx:xx:xx:5d
inet addr:50.xx.xx.16 Bcast:50.xx.xx.31 Mask:255.255.255.224
inet6 addr: (redacted) Scope:Global
inet6 addr: (redacted) Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:527181270 errors:1 dropped:4 overruns:0 frame:1
TX packets:413335045 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:240016223540 (240.0 GB) TX bytes:104301759647 (104.3 GB)
The requests are then submitted via HTTP to upstream servers on the private network (we can call it bond1), which is configured like so:
bond1 Link encap:Ethernet HWaddr 00:xx:xx:xx:xx:5c
inet addr:10.xx.xx.70 Bcast:10.xx.xx.127 Mask:255.255.255.192
inet6 addr: (redacted) Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:430293342 errors:1 dropped:2 overruns:0 frame:1
TX packets:466983986 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:77714410892 (77.7 GB) TX bytes:227349392334 (227.3 GB)
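For a sense of scale, the RX error and drop counters above can be turned into a ratio. The following is a minimal sketch, not part of the original post; the helper name `rx_loss_ratio` is hypothetical and it only understands the older ifconfig line format shown here.

```python
import re

# Matches the older net-tools ifconfig format: "RX packets:N errors:N dropped:N ..."
RX_LINE = re.compile(r"RX packets:(\d+) errors:(\d+) dropped:(\d+)")


def rx_loss_ratio(ifconfig_text):
    """Return (errors + dropped) / packets for the first RX counter line found."""
    m = RX_LINE.search(ifconfig_text)
    if m is None:
        raise ValueError("no RX counter line found")
    packets, errors, dropped = map(int, m.groups())
    return (errors + dropped) / packets if packets else 0.0


# Counter lines copied from the bond0 and bond1 output in the question.
bond0 = "RX packets:527181270 errors:1 dropped:4 overruns:0 frame:1"
bond1 = "RX packets:430293342 errors:1 dropped:2 overruns:0 frame:1"

for name, text in (("bond0", bond0), ("bond1", bond1)):
    print(name, f"{rx_loss_ratio(text):.2e}")
```

Both ratios come out around one in a hundred million, far below the 0.3% loss seen by ping. That suggests, without proving it, that the loss is not being recorded in these interface counters.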
Output of uname -a:
Linux <hostname> 3.5.0-42-generic #65~precise1-Ubuntu SMP Wed Oct 2 20:57:18 UTC 2013 x86_64 GNU/Linux
We have customized sysctl.conf in an attempt to fix the problem, without success. Contents of /etc/sysctl.conf (irrelevant settings omitted):
# net: core
net.core.netdev_max_backlog = 10000
# net: ipv4 stack
net.ipv4.tcp_ecn = 2
net.ipv4.tcp_sack = 1
net.ipv4.tcp_fack = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_max_syn_backlog = 10000
net.ipv4.tcp_congestion_control = cubic
net.ipv4.ip_local_port_range = 8000 65535
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_thin_dupack = 1
net.ipv4.tcp_thin_linear_timeouts = 1
net.netfilter.nf_conntrack_max = 99999999
net.netfilter.nf_conntrack_tcp_timeout_established = 300
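Editing /etc/sysctl.conf does not change the running kernel by itself; the file is applied at boot or with `sysctl -p`. It can be worth confirming that the live values match the file. The following is a minimal sketch, not part of the original post; the helper names are hypothetical, and it assumes the usual mapping of a sysctl key to a path under /proc/sys by replacing dots with slashes.

```python
from pathlib import Path


def parse_sysctl_conf(text):
    """Parse 'key = value' lines, skipping blanks and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def proc_path(key, root="/proc/sys"):
    """Map a sysctl key such as net.ipv4.tcp_sack to its /proc/sys path."""
    return Path(root) / key.replace(".", "/")


def mismatches(settings, root="/proc/sys"):
    """Return {key: (wanted, live)} for settings whose live value differs.

    live is None when the file cannot be read, for example when the
    relevant kernel module is not loaded.
    """
    out = {}
    for key, wanted in settings.items():
        try:
            live = proc_path(key, root).read_text().strip()
        except OSError:
            live = None
        # Compare token lists so "8000 65535" matches a tab-separated live value.
        if live is None or live.split() != wanted.split():
            out[key] = (wanted, live)
    return out
```

On the load balancer itself this could be used as `mismatches(parse_sysctl_conf(Path("/etc/sysctl.conf").read_text()))`; an empty result means every listed setting is in effect.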
Output of dmesg -d, with non-ICMP UFW messages suppressed:
[508315.349295 < 19.852453>] [UFW BLOCK] IN=bond1 OUT= ...
[... < 0.443127>] Peer 190.xx.xx.131:59705/80 unexpectedly shrunk window 1155488866:1155489425 (repaired)
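How can I go about diagnosing the cause of this problem on a Debian-family Linux box?
If you have TCP traffic, you can get a handle on this, because the kernel keeps counters that track the recovery steps TCP takes to deal with packets lost in a stream. Take a look at the -s (stats) option of netstat. The values shown are counters, so you need to watch them for a while to learn what is normal and what is anomalous, but the data is there. The retransmit and data loss counters are particularly useful.
[sadadmin@busted ~]$ netstat -s | egrep -i 'loss|retran'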
2058 segments retransmited
526 times recovered from packet loss due to SACK data
193 TCP data loss events
TCPLostRetransmit: 7
2 timeouts after reno fast retransmit
1 timeouts in loss state
731 fast retransmits
18 forward retransmits
97 retransmits in slow start
4 sack retransmits failed
There are tools that will sample these values and trend them for you, so you can easily see when something goes wrong. I use munin.
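Without a trending tool, the same counters can be sampled by hand. The following is a minimal sketch, not part of the original answer. It reads /proc/net/snmp and /proc/net/netstat, which on Linux expose these statistics as pairs of header and value lines, and prints how much selected counters grew in each interval. The function names, the 10 second interval and the two default counter names are assumptions chosen for illustration.

```python
import time


def parse_proc_counters(text):
    """Parse the header-line/value-line pairs used by /proc/net/snmp and /proc/net/netstat."""
    counters = {}
    lines = text.splitlines()
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, _, names = header.partition(":")
        _, _, nums = values.partition(":")
        for name, num in zip(names.split(), nums.split()):
            counters[f"{prefix}.{name}"] = int(num)
    return counters


def delta(before, after):
    """Return only the counters that changed, with the amount of change."""
    return {k: after[k] - before[k]
            for k in after if k in before and after[k] != before[k]}


def read_counters(paths=("/proc/net/snmp", "/proc/net/netstat")):
    """Read and merge the counters from the given proc files."""
    merged = {}
    for path in paths:
        with open(path) as f:
            merged.update(parse_proc_counters(f.read()))
    return merged


def watch(interval=10.0, keys=("Tcp.RetransSegs", "TcpExt.TCPLostRetransmit")):
    """Print the per-interval growth of the chosen counters until interrupted."""
    prev = read_counters()
    while True:
        time.sleep(interval)
        cur = read_counters()
        changes = delta(prev, cur)
        print(time.strftime("%H:%M:%S"), {k: changes.get(k, 0) for k in keys})
        prev = cur
```

Calling `watch()` on the affected host and lining its timestamps up against the ping timeouts should show whether retransmissions jump at the same moments.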