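This post covers ping (ExPing) and exp_analyzer as tools for measuring outage time and verifying routes during failure testing.
Suppose a failure test runs as follows.
- Following the communication requirements, run ping continuously between PCs, then trigger a failure.
- ping fails (NG) for a while, then succeeds (OK) again once the network converges.
- From the ping log, calculate how long ping was NG and judge whether the network converged within the expected time.
- Capture a traceroute and confirm that traffic takes the expected route.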
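One caveat when judging convergence from a ping log: the log only shows how many probes were lost, so the outage time is known only to within about one ping interval on either side. The sketch below is not part of ExPing or exp_analyzer. `outage_bounds` and `judge` are hypothetical helpers, and the arithmetic assumes that pings are sent at a fixed interval and that a ping is lost only if it is sent while the path is down.

```python
from typing import Tuple


def outage_bounds(lost_count: int, interval_s: float) -> Tuple[float, float]:
    """Bound the real outage length from the number of consecutive lost pings.

    Assumption: pings leave every `interval_s` seconds. If N pings in a row
    were lost, the outage covered at least the span between the first and
    last lost ping, (N - 1) * interval, and ended before the next successful
    ping, so it is shorter than (N + 1) * interval.
    """
    lower = max(0, lost_count - 1) * interval_s
    upper = (lost_count + 1) * interval_s
    return (lower, upper)


def judge(lost_count: int, interval_s: float, expected_s: float) -> str:
    """Compare the bounds with the expected convergence time."""
    lower, upper = outage_bounds(lost_count, interval_s)
    if upper <= expected_s:
        return "pass"          # even the worst case is within the target
    if lower > expected_s:
        return "fail"          # even the best case exceeds the target
    return "inconclusive"      # the ping interval is too coarse to decide


print(outage_bounds(3, 1.0))
print(judge(3, 1.0, 5.0))
```

If the verdict is "inconclusive", a shorter ping interval narrows the bounds.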
Running ping/tracert for this work from CMD is not realistic, and I suspect most people use ExPing instead. I have relied on it for years myself.
http://www.woodybells.com/exping.html
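To make ExPing logs easier to read for network testing, I wrote a tool called exp_analyzer.
Reading outage time from a ping log
With a simple setup like the one in the figure above, where ping runs 1:1, reading the outage time from the log is easy.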
"結果","日時","対象","IPアドレス","ステータス","備考"
"OK","2014/08/21 17:07:28","192.168.0.1","192.168.0.1","Time: 5ms",""
"OK","2014/08/21 17:07:28","192.168.0.1","192.168.0.1","Time: 5ms",""
"OK","2014/08/21 17:07:28","192.168.0.1","192.168.0.1","Time: 6ms",""
"OK","2014/08/21 17:07:28","192.168.0.1","192.168.0.1","Time: 5ms",""
"NG","2014/08/21 17:07:29","192.168.0.1","","Request timed out",""
"NG","2014/08/21 17:07:30","192.168.0.1","","Request timed out",""
"NG","2014/08/21 17:07:31","192.168.0.1","","Request timed out",""
"OK","2014/08/21 17:07:32","192.168.0.1","192.168.0.1","Time: 5ms",""
"OK","2014/08/21 17:07:32","192.168.0.1","192.168.0.1","Time: 5ms",""
"OK","2014/08/21 17:07:32","192.168.0.1","192.168.0.1","Time: 6ms",""
The log above is easy to read. But what about the one below?
"結果","日時","対象","IPアドレス","ステータス","備考"
"OK","2010/11/05 11:44:27",1.1.1.1,,"Time: 5ms",OK-OK-OK-OK-OK
"OK","2010/11/05 11:44:27",2.2.2.2,,"Time: 5ms",OK-NG-NG-NG-OK
"OK","2010/11/05 11:44:27",3.3.3.3,,"Time: 5ms",OK-NG-OK-NG-OK
"OK","2010/11/05 11:44:27",4.4.4.4,,"Time: 5ms",OK-OK-NG-OK-OK
"NG","2010/11/05 11:44:27",5.5.5.5,,"Request timed out",NG-NG-NG-NG-NG
"NG","2010/11/05 11:44:27",6.6.6.6,,"Request timed out",NG-OK-OK-OK-NG
"NG","2010/11/05 11:44:27",7.7.7.7,,"Request timed out",NG-OK-NG-OK-NG
"NG","2010/11/05 11:44:27",8.8.8.8,,"Request timed out",NG-NG-OK-NG-NG
"NG","2010/11/05 11:44:27",90.90.90.90,,"Request timed out",NG-NG-OK-OK-OK
"OK","2010/11/05 11:44:27",100.100.100.100,,"Time: 5ms",OK-OK-OK-NG-NG
"OK","2010/11/05 11:44:28",1.1.1.1,,"Time: 5ms",OK-OK-OK-OK-OK
"NG","2010/11/05 11:44:28",2.2.2.2,,"Request timed out",OK-NG-NG-NG-OK
"NG","2010/11/05 11:44:28",3.3.3.3,,"Request timed out",OK-NG-OK-NG-OK
"OK","2010/11/05 11:44:28",4.4.4.4,,"Time: 5ms",OK-OK-NG-OK-OK
"NG","2010/11/05 11:44:28",5.5.5.5,,"Request timed out",NG-NG-NG-NG-NG
"OK","2010/11/05 11:44:28",6.6.6.6,,"Time: 5ms",NG-OK-OK-OK-NG
"OK","2010/11/05 11:44:28",7.7.7.7,,"Time: 5ms",NG-OK-NG-OK-NG
"NG","2010/11/05 11:44:28",8.8.8.8,,"Request timed out",NG-NG-OK-NG-NG
"NG","2010/11/05 11:44:28",90.90.90.90,,"Request timed out",NG-NG-OK-OK-OK
"OK","2010/11/05 11:44:28",100.100.100.100,,"Time: 5ms",OK-OK-OK-NG-NG
"OK","2010/11/05 11:44:29",1.1.1.1,,"Time: 5ms",OK-OK-OK-OK-OK
"NG","2010/11/05 11:44:29",2.2.2.2,,"Request timed out",OK-NG-NG-NG-OK
"OK","2010/11/05 11:44:29",3.3.3.3,,"Time: 5ms",OK-NG-OK-NG-OK
"NG","2010/11/05 11:44:29",4.4.4.4,,"Request timed out",OK-OK-NG-OK-OK
"NG","2010/11/05 11:44:29",5.5.5.5,,"Request timed out",NG-NG-NG-NG-NG
"OK","2010/11/05 11:44:29",6.6.6.6,,"Time: 5ms",NG-OK-OK-OK-NG
"NG","2010/11/05 11:44:29",7.7.7.7,,"Request timed out",NG-OK-NG-OK-NG
"OK","2010/11/05 11:44:29",8.8.8.8,,"Time: 5ms",NG-NG-OK-NG-NG
"OK","2010/11/05 11:44:29",90.90.90.90,,"Time: 5ms",NG-NG-OK-OK-OK
"OK","2010/11/05 11:44:29",100.100.100.100,,"Time: 5ms",OK-OK-OK-NG-NG
"OK","2010/11/05 11:44:30",1.1.1.1,,"Time: 5ms",OK-OK-OK-OK-OK
"NG","2010/11/05 11:44:30",2.2.2.2,,"Request timed out",OK-NG-NG-NG-OK
"NG","2010/11/05 11:44:30",3.3.3.3,,"Request timed out",OK-NG-OK-NG-OK
"OK","2010/11/05 11:44:30",4.4.4.4,,"Time: 5ms",OK-OK-NG-OK-OK
"NG","2010/11/05 11:44:30",5.5.5.5,,"Request timed out",NG-NG-NG-NG-NG
"OK","2010/11/05 11:44:30",6.6.6.6,,"Time: 5ms",NG-OK-OK-OK-NG
"OK","2010/11/05 11:44:30",7.7.7.7,,"Time: 5ms",NG-OK-NG-OK-NG
"NG","2010/11/05 11:44:30",8.8.8.8,,"Request timed out",NG-NG-OK-NG-NG
"OK","2010/11/05 11:44:30",90.90.90.90,,"Time: 5ms",NG-NG-OK-OK-OK
"NG","2010/11/05 11:44:30",100.100.100.100,,"Request timed out",OK-OK-OK-NG-NG
"OK","2010/11/05 11:44:31",1.1.1.1,,"Time: 5ms",OK-OK-OK-OK-OK
"OK","2010/11/05 11:44:31",2.2.2.2,,"Time: 5ms",OK-NG-NG-NG-OK
"OK","2010/11/05 11:44:31",3.3.3.3,,"Time: 5ms",OK-NG-OK-NG-OK
"OK","2010/11/05 11:44:31",4.4.4.4,,"Time: 5ms",OK-OK-NG-OK-OK
"NG","2010/11/05 11:44:31",5.5.5.5,,"Request timed out",NG-NG-NG-NG-NG
"NG","2010/11/05 11:44:31",6.6.6.6,,"Request timed out",NG-OK-OK-OK-NG
"NG","2010/11/05 11:44:31",7.7.7.7,,"Request timed out",NG-OK-NG-OK-NG
"NG","2010/11/05 11:44:31",8.8.8.8,,"Request timed out",NG-NG-OK-NG-NG
"OK","2010/11/05 11:44:31",90.90.90.90,,"Time: 5ms",NG-NG-OK-OK-OK
"NG","2010/11/05 11:44:31",100.100.100.100,,"Request timed out",OK-OK-OK-NG-NG
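ExPing can ping multiple destinations repeatedly from a single PC, so configuring several destinations produces a log like the one above. In addition, the ping results do not always follow a simple OK -> NG -> OK pattern. Working out the outage time per destination from a log this tangled is tedious.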
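To make the bookkeeping concrete, here is a minimal sketch of grouping such a log by destination and extracting each run of consecutive NG results. It is not exp_analyzer's implementation. `outages_by_target` is a hypothetical helper, and it assumes the column order shown in the log above (result, timestamp, target, ...) with rows already in chronological order. The embedded sample is an excerpt of three destinations from that log.

```python
import csv
from collections import defaultdict
from datetime import datetime

SAMPLE = """\
"結果","日時","対象","IPアドレス","ステータス","備考"
"OK","2010/11/05 11:44:27",1.1.1.1,,"Time: 5ms",OK-OK-OK-OK-OK
"OK","2010/11/05 11:44:27",2.2.2.2,,"Time: 5ms",OK-NG-NG-NG-OK
"OK","2010/11/05 11:44:27",3.3.3.3,,"Time: 5ms",OK-NG-OK-NG-OK
"OK","2010/11/05 11:44:28",1.1.1.1,,"Time: 5ms",OK-OK-OK-OK-OK
"NG","2010/11/05 11:44:28",2.2.2.2,,"Request timed out",OK-NG-NG-NG-OK
"NG","2010/11/05 11:44:28",3.3.3.3,,"Request timed out",OK-NG-OK-NG-OK
"OK","2010/11/05 11:44:29",1.1.1.1,,"Time: 5ms",OK-OK-OK-OK-OK
"NG","2010/11/05 11:44:29",2.2.2.2,,"Request timed out",OK-NG-NG-NG-OK
"OK","2010/11/05 11:44:29",3.3.3.3,,"Time: 5ms",OK-NG-OK-NG-OK
"OK","2010/11/05 11:44:30",1.1.1.1,,"Time: 5ms",OK-OK-OK-OK-OK
"NG","2010/11/05 11:44:30",2.2.2.2,,"Request timed out",OK-NG-NG-NG-OK
"NG","2010/11/05 11:44:30",3.3.3.3,,"Request timed out",OK-NG-OK-NG-OK
"OK","2010/11/05 11:44:31",1.1.1.1,,"Time: 5ms",OK-OK-OK-OK-OK
"OK","2010/11/05 11:44:31",2.2.2.2,,"Time: 5ms",OK-NG-NG-NG-OK
"OK","2010/11/05 11:44:31",3.3.3.3,,"Time: 5ms",OK-NG-OK-NG-OK
"""


def outages_by_target(lines):
    """Return {target: [(first_ng_time, last_ng_time, lost_count), ...]}."""
    history = defaultdict(list)
    for row in csv.reader(lines):
        # Skip the header and anything that is not a ping result row.
        if len(row) < 3 or row[0] not in ("OK", "NG"):
            continue
        when = datetime.strptime(row[1], "%Y/%m/%d %H:%M:%S")
        history[row[2]].append((when, row[0]))

    outages = {}
    for target, events in history.items():
        runs, first_ng, last_ng, lost = [], None, None, 0
        for when, result in events:
            if result == "NG":
                if lost == 0:
                    first_ng = when      # a new run of failures starts here
                last_ng = when
                lost += 1
            elif lost:
                runs.append((first_ng, last_ng, lost))  # run closed by an OK
                lost = 0
        if lost:
            runs.append((first_ng, last_ng, lost))      # still NG at end of log
        if runs:
            outages[target] = runs
    return outages


for target, runs in outages_by_target(SAMPLE.splitlines()).items():
    for first_ng, last_ng, lost in runs:
        print(target, first_ng.time(), last_ng.time(), lost)
```

For this excerpt, 2.2.2.2 comes out as one run of three lost pings, 3.3.3.3 as two separate single-ping runs, and 1.1.1.1 is not reported at all. Treating every NG run separately is what keeps patterns other than OK -> NG -> OK from being merged into one outage.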
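Feeding the log above to exp_analyzer produces output like the following.
Comparing traceroutes
exp_analyzer can compare traceroute logs generated by ExPing. Comparing the normal state with the failure state makes the route easier to understand.
Normal state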
TraceRoute
30.60.1.1
#001 0ms 192.168.134.141
#002 0ms 20.129.16.251
#003 0ms 20.128.5.254
#004 0ms 20.128.6.253
#005 0ms 1.1.1.1
#006 0ms 30.60.1.1
30.70.1.1
#001 0ms 192.168.134.141
#002 0ms 20.129.16.251
#003 0ms 20.128.5.253
#004 0ms 20.128.7.253
#005 0ms 1.1.1.2
#006 0ms 30.70.1.1
30.80.1.1
#001 0ms 192.168.134.141
#002 0ms 20.129.16.251
#003 0ms 20.129.27.253
#004 0ms 20.128.8.2
#005 1ms 3.3.3.1
#006 1ms 30.80.1.1
30.90.1.1
#001 0ms 192.168.134.141
#002 0ms 20.129.16.251
#003 0ms 20.129.27.254
#004 0ms 20.128.9.2
#005 1ms 3.3.3.2
#006 1ms 30.90.1.1
30.10.1.1
#001 0ms 192.168.134.141
#002 0ms 20.129.16.252
#003 0ms 20.129.23.241
#004 1ms 20.244.0.5
#005 1ms 2.2.2.1
#006 1ms 30.10.1.1
Failure state
TraceRoute
30.60.1.1
#001 0ms 192.168.134.141
#002 0ms 20.129.16.251
#003 0ms 20.128.5.254
#004 0ms 20.128.6.253
#005 1ms 1.1.1.1
#006 1ms 30.60.1.1
30.70.1.1
#001 0ms 192.168.134.141
#002 0ms 20.129.16.251
#003 0ms 20.128.5.252
#004 1ms 20.128.7.252
#005 1ms 1.1.1.2
#006 1ms 30.70.1.1
30.80.1.1
#001 0ms 192.168.134.141
#002 0ms 20.129.16.251
#003 0ms 20.129.27.253
#004 0ms 20.128.8.2
#005 0ms 3.3.3.1
#006 1ms 30.80.1.1
30.90.1.1
#001 0ms 192.168.134.141
#002 0ms 20.129.16.251
#003 0ms 20.129.27.254
#004 1ms 20.128.9.2
#005 1ms 3.3.3.2
#006 1ms 30.90.1.1
30.10.1.1
#001 0ms 192.168.134.141
#002 0ms 20.129.16.252
#003 0ms 20.129.23.241
#004 1ms 20.244.0.5
#005 1ms 2.2.2.1
#006 1ms 30.10.1.1
Can you spot the differences between the two at a glance? exp_analyzer compares the two logs so the differences are easy to see.
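The idea behind such a comparison can be sketched in a few lines. This is only an illustration, not how exp_analyzer works internally. `parse_traceroute` and `changed_hops` are hypothetical helpers, and the parser assumes the layout shown above: a `TraceRoute` marker, a line with the destination, then hop lines of the form `#NNN <rtt> <address>`. Round-trip times are ignored, and a hop that timed out would need extra handling. The sample holds the first two destinations from the logs above.

```python
from itertools import zip_longest

NORMAL = """\
TraceRoute
30.60.1.1
#001 0ms 192.168.134.141
#002 0ms 20.129.16.251
#003 0ms 20.128.5.254
#004 0ms 20.128.6.253
#005 0ms 1.1.1.1
#006 0ms 30.60.1.1
30.70.1.1
#001 0ms 192.168.134.141
#002 0ms 20.129.16.251
#003 0ms 20.128.5.253
#004 0ms 20.128.7.253
#005 0ms 1.1.1.2
#006 0ms 30.70.1.1
"""

FAILURE = """\
TraceRoute
30.60.1.1
#001 0ms 192.168.134.141
#002 0ms 20.129.16.251
#003 0ms 20.128.5.254
#004 0ms 20.128.6.253
#005 1ms 1.1.1.1
#006 1ms 30.60.1.1
30.70.1.1
#001 0ms 192.168.134.141
#002 0ms 20.129.16.251
#003 0ms 20.128.5.252
#004 1ms 20.128.7.252
#005 1ms 1.1.1.2
#006 1ms 30.70.1.1
"""


def parse_traceroute(text):
    """Return {destination: [hop address, ...]} for one traceroute log."""
    routes, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "TraceRoute":
            continue
        if line.startswith("#"):
            if current is not None:
                routes[current].append(line.split()[-1])  # keep the address only
        else:
            current = line           # a non-hop line names the next destination
            routes[current] = []
    return routes


def changed_hops(normal, failure):
    """Return {destination: [(hop_number, normal_addr, failure_addr), ...]}."""
    changes = {}
    for target in sorted(set(normal) | set(failure)):
        pairs = zip_longest(normal.get(target, []), failure.get(target, []))
        hops = [(n, a, b) for n, (a, b) in enumerate(pairs, start=1) if a != b]
        if hops:
            changes[target] = hops
    return changes


diff = changed_hops(parse_traceroute(NORMAL), parse_traceroute(FAILURE))
for target, hops in diff.items():
    for number, before, after in hops:
        print(f"{target} hop {number}: {before} -> {after}")
```

In the two logs above, the only destination whose path changes is 30.70.1.1, at hops 3 and 4. `zip_longest` is used so that a route that becomes longer or shorter still shows up as a difference.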
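Below is the result of comparing the two files above with exp_analyzer.
Handling many PCs
As a test environment grows, it contains many PCs and many communication endpoints. For that reason, exp_analyzer can process multiple files in one go.
To process them in one go, the ExPing ping/traceroute logs need to be gathered in one place. Giving each PC a second NIC and building a dedicated network for log collection makes gathering the logs easy. If you can build it over wireless LAN, that works well too.
It helps to run an FTP server on each PC so that a log-collection PC can pull the logs with a batch script.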
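As a rough idea of what such a collection script could look like, here is a minimal sketch using Python's standard `ftplib`. The function name `collect_logs`, the addresses, the file name, and the anonymous login are all assumptions for illustration, and the `connect` parameter exists only so the function can be exercised without a real server.

```python
import ftplib
from pathlib import Path, PurePosixPath


def collect_logs(hosts, remote_files, dest_dir,
                 user="anonymous", password="", connect=ftplib.FTP):
    """Download each remote file from each host into dest_dir/<host>/.

    Returns the list of local paths that were written.
    """
    saved = []
    for host in hosts:
        host_dir = Path(dest_dir) / host        # one folder per PC
        host_dir.mkdir(parents=True, exist_ok=True)
        with connect(host, timeout=10) as ftp:  # closes the session on exit
            ftp.login(user, password)
            for name in remote_files:
                local_path = host_dir / PurePosixPath(name).name
                with open(local_path, "wb") as fh:
                    ftp.retrbinary(f"RETR {name}", fh.write)
                saved.append(local_path)
    return saved


# Example call (hypothetical addresses and file name):
# collect_logs(["192.168.100.11", "192.168.100.12"], ["ping_log.csv"], "collected")
```

Storing the files under a per-host folder keeps logs with identical names from overwriting each other.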
For those who find writing a batch script tedious, I wrote a tool that collects files from multiple FTP servers.
http://www.ne.jp/asahi/yam/tools/ftp_multi_get.zip
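Building the setup out this far takes some effort, but it is worth considering when there are many PCs. Logs other than ping/traceroute can be collected just as easily, and the collected logs also serve as a backup.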