Step 3: Making ping / traceroute checks easier (exp_analyzer)

This post covers ping (ExPing) as a tool for measuring outage time and checking routes during failure testing, and exp_analyzer.


Suppose a failure test follows the flow below.

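  1. Following the communication requirements, run ping continuously between PCs, then trigger the failure.
  2. ping goes NG for a while, then returns to OK once the network converges.
  3. From the ping log, calculate how long ping was NG and judge whether the network converged within the expected time.
  4. Capture a traceroute and confirm that traffic takes the expected route.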
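
Step 3 above can be sketched in a few lines of Python. This is only an illustration under assumptions, not how exp_analyzer is implemented: the outage is taken to run from the first NG to the next OK, the sample timestamps are illustrative, and `limit` is a hypothetical convergence requirement.

```python
from datetime import datetime


def outage_seconds(samples):
    """Return the seconds from the first NG to the next OK, or None if there is no such outage."""
    fmt = "%Y/%m/%d %H:%M:%S"
    first_ng = None
    for stamp, result in samples:
        when = datetime.strptime(stamp, fmt)
        if result == "NG" and first_ng is None:
            first_ng = when  # the outage starts at the first failed ping
        elif result == "OK" and first_ng is not None:
            return (when - first_ng).total_seconds()  # the outage ends at the next reply
    return None


# Illustrative samples for a single destination: (timestamp, result).
samples = [
    ("2014/08/21 17:07:28", "OK"),
    ("2014/08/21 17:07:29", "NG"),
    ("2014/08/21 17:07:30", "NG"),
    ("2014/08/21 17:07:31", "NG"),
    ("2014/08/21 17:07:32", "OK"),
]

limit = 5.0  # hypothetical requirement: the network must converge within 5 seconds
downtime = outage_seconds(samples)
print(downtime, downtime is not None and downtime <= limit)
```

Because the timestamps have one-second resolution, the result is only accurate to about a second.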


Running ping/tracert for this work from CMD is not realistic, and I suspect many people use ExPing instead. I have relied on it for years myself.

http://www.woodybells.com/exping.html


I built exp_analyzer as a tool that makes ExPing logs easier to read for network testing.

[Figure: exp_analyzer]




Reading outage time from a ping log

With a simple setup like the one in the figure above and a 1:1 ping, reading the outage time from the log is easy.

"結果","日時","対象","IPアドレス","ステータス","備考"
"OK","2014/08/21 17:07:28","192.168.0.1","192.168.0.1","Time:     5ms",""
"OK","2014/08/21 17:07:28","192.168.0.1","192.168.0.1","Time:     5ms",""
"OK","2014/08/21 17:07:28","192.168.0.1","192.168.0.1","Time:     6ms",""
"OK","2014/08/21 17:07:28","192.168.0.1","192.168.0.1","Time:     5ms",""
"NG","2014/08/21 17:07:29","192.168.0.1","","Request timed out",""
"NG","2014/08/21 17:07:30","192.168.0.1","","Request timed out",""
"NG","2014/08/21 17:07:31","192.168.0.1","","Request timed out",""
"OK","2014/08/21 17:07:32","192.168.0.1","192.168.0.1","Time:     5ms",""
"OK","2014/08/21 17:07:32","192.168.0.1","192.168.0.1","Time:     5ms",""
"OK","2014/08/21 17:07:32","192.168.0.1","192.168.0.1","Time:     6ms",""

A log like the one above can be understood at a glance. But what about the log below?

"結果","日時","対象","IPアドレス","ステータス","備考"
"OK","2010/11/05 11:44:27",1.1.1.1,,"Time:     5ms",OK-OK-OK-OK-OK
"OK","2010/11/05 11:44:27",2.2.2.2,,"Time:     5ms",OK-NG-NG-NG-OK
"OK","2010/11/05 11:44:27",3.3.3.3,,"Time:     5ms",OK-NG-OK-NG-OK
"OK","2010/11/05 11:44:27",4.4.4.4,,"Time:     5ms",OK-OK-NG-OK-OK
"NG","2010/11/05 11:44:27",5.5.5.5,,"Request timed out",NG-NG-NG-NG-NG
"NG","2010/11/05 11:44:27",6.6.6.6,,"Request timed out",NG-OK-OK-OK-NG
"NG","2010/11/05 11:44:27",7.7.7.7,,"Request timed out",NG-OK-NG-OK-NG
"NG","2010/11/05 11:44:27",8.8.8.8,,"Request timed out",NG-NG-OK-NG-NG
"NG","2010/11/05 11:44:27",90.90.90.90,,"Request timed out",NG-NG-OK-OK-OK
"OK","2010/11/05 11:44:27",100.100.100.100,,"Time:     5ms",OK-OK-OK-NG-NG
"OK","2010/11/05 11:44:28",1.1.1.1,,"Time:     5ms",OK-OK-OK-OK-OK
"NG","2010/11/05 11:44:28",2.2.2.2,,"Request timed out",OK-NG-NG-NG-OK
"NG","2010/11/05 11:44:28",3.3.3.3,,"Request timed out",OK-NG-OK-NG-OK
"OK","2010/11/05 11:44:28",4.4.4.4,,"Time:     5ms",OK-OK-NG-OK-OK
"NG","2010/11/05 11:44:28",5.5.5.5,,"Request timed out",NG-NG-NG-NG-NG
"OK","2010/11/05 11:44:28",6.6.6.6,,"Time:     5ms",NG-OK-OK-OK-NG
"OK","2010/11/05 11:44:28",7.7.7.7,,"Time:     5ms",NG-OK-NG-OK-NG
"NG","2010/11/05 11:44:28",8.8.8.8,,"Request timed out",NG-NG-OK-NG-NG
"NG","2010/11/05 11:44:28",90.90.90.90,,"Request timed out",NG-NG-OK-OK-OK
"OK","2010/11/05 11:44:28",100.100.100.100,,"Time:     5ms",OK-OK-OK-NG-NG
"OK","2010/11/05 11:44:29",1.1.1.1,,"Time:     5ms",OK-OK-OK-OK-OK
"NG","2010/11/05 11:44:29",2.2.2.2,,"Request timed out",OK-NG-NG-NG-OK
"OK","2010/11/05 11:44:29",3.3.3.3,,"Time:     5ms",OK-NG-OK-NG-OK
"NG","2010/11/05 11:44:29",4.4.4.4,,"Request timed out",OK-OK-NG-OK-OK
"NG","2010/11/05 11:44:29",5.5.5.5,,"Request timed out",NG-NG-NG-NG-NG
"OK","2010/11/05 11:44:29",6.6.6.6,,"Time:     5ms",NG-OK-OK-OK-NG
"NG","2010/11/05 11:44:29",7.7.7.7,,"Request timed out",NG-OK-NG-OK-NG
"OK","2010/11/05 11:44:29",8.8.8.8,,"Time:     5ms",NG-NG-OK-NG-NG
"OK","2010/11/05 11:44:29",90.90.90.90,,"Time:     5ms",NG-NG-OK-OK-OK
"OK","2010/11/05 11:44:29",100.100.100.100,,"Time:     5ms",OK-OK-OK-NG-NG
"OK","2010/11/05 11:44:30",1.1.1.1,,"Time:     5ms",OK-OK-OK-OK-OK
"NG","2010/11/05 11:44:30",2.2.2.2,,"Request timed out",OK-NG-NG-NG-OK
"NG","2010/11/05 11:44:30",3.3.3.3,,"Request timed out",OK-NG-OK-NG-OK
"OK","2010/11/05 11:44:30",4.4.4.4,,"Time:     5ms",OK-OK-NG-OK-OK
"NG","2010/11/05 11:44:30",5.5.5.5,,"Request timed out",NG-NG-NG-NG-NG
"OK","2010/11/05 11:44:30",6.6.6.6,,"Time:     5ms",NG-OK-OK-OK-NG
"OK","2010/11/05 11:44:30",7.7.7.7,,"Time:     5ms",NG-OK-NG-OK-NG
"NG","2010/11/05 11:44:30",8.8.8.8,,"Request timed out",NG-NG-OK-NG-NG
"OK","2010/11/05 11:44:30",90.90.90.90,,"Time:     5ms",NG-NG-OK-OK-OK
"NG","2010/11/05 11:44:30",100.100.100.100,,"Request timed out",OK-OK-OK-NG-NG
"OK","2010/11/05 11:44:31",1.1.1.1,,"Time:     5ms",OK-OK-OK-OK-OK
"OK","2010/11/05 11:44:31",2.2.2.2,,"Time:     5ms",OK-NG-NG-NG-OK
"OK","2010/11/05 11:44:31",3.3.3.3,,"Time:     5ms",OK-NG-OK-NG-OK
"OK","2010/11/05 11:44:31",4.4.4.4,,"Time:     5ms",OK-OK-NG-OK-OK
"NG","2010/11/05 11:44:31",5.5.5.5,,"Request timed out",NG-NG-NG-NG-NG
"NG","2010/11/05 11:44:31",6.6.6.6,,"Request timed out",NG-OK-OK-OK-NG
"NG","2010/11/05 11:44:31",7.7.7.7,,"Request timed out",NG-OK-NG-OK-NG
"NG","2010/11/05 11:44:31",8.8.8.8,,"Request timed out",NG-NG-OK-NG-NG
"OK","2010/11/05 11:44:31",90.90.90.90,,"Time:     5ms",NG-NG-OK-OK-OK
"NG","2010/11/05 11:44:31",100.100.100.100,,"Request timed out",OK-OK-OK-NG-NG

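ExPing can ping multiple destinations repeatedly from a single PC, so configuring multiple destinations produces a log like the one above. Also, the ping pattern is not always OK -> NG -> OK. Working out the outage time for each destination from such a complicated log is tedious.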

Feeding the log above to exp_analyzer produces the output below.

[Figure: result.html]
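
To give a rough idea of the per-destination grouping involved, here is a minimal Python sketch. It is not exp_analyzer's actual implementation. It assumes the column order shown in the log header (result, date/time, target, IP address, status, remarks) and treats each run from an NG to the next OK as one outage. The sample rows are the 2.2.2.2 and 3.3.3.3 entries from the log above.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

SAMPLE_LOG = '''"結果","日時","対象","IPアドレス","ステータス","備考"
"OK","2010/11/05 11:44:27",2.2.2.2,,"Time:     5ms",OK-NG-NG-NG-OK
"OK","2010/11/05 11:44:27",3.3.3.3,,"Time:     5ms",OK-NG-OK-NG-OK
"NG","2010/11/05 11:44:28",2.2.2.2,,"Request timed out",OK-NG-NG-NG-OK
"NG","2010/11/05 11:44:28",3.3.3.3,,"Request timed out",OK-NG-OK-NG-OK
"NG","2010/11/05 11:44:29",2.2.2.2,,"Request timed out",OK-NG-NG-NG-OK
"OK","2010/11/05 11:44:29",3.3.3.3,,"Time:     5ms",OK-NG-OK-NG-OK
"NG","2010/11/05 11:44:30",2.2.2.2,,"Request timed out",OK-NG-NG-NG-OK
"NG","2010/11/05 11:44:30",3.3.3.3,,"Request timed out",OK-NG-OK-NG-OK
"OK","2010/11/05 11:44:31",2.2.2.2,,"Time:     5ms",OK-NG-NG-NG-OK
"OK","2010/11/05 11:44:31",3.3.3.3,,"Time:     5ms",OK-NG-OK-NG-OK
'''


def outages_by_target(log_text):
    """Group rows by target and return {target: [(outage_start, outage_end), ...]}."""
    samples = defaultdict(list)
    for row in csv.reader(io.StringIO(log_text)):
        if len(row) < 3 or row[0] not in ("OK", "NG"):
            continue  # skip the header row and blank lines
        when = datetime.strptime(row[1], "%Y/%m/%d %H:%M:%S")
        samples[row[2]].append((when, row[0]))

    outages = {}
    for target, rows in samples.items():
        rows.sort(key=lambda sample: sample[0])  # stable sort keeps same-second order
        found, start = [], None
        for when, result in rows:
            if result == "NG" and start is None:
                start = when  # an outage begins
            elif result == "OK" and start is not None:
                found.append((start, when))  # the outage ends at the next OK
                start = None
        if start is not None:
            found.append((start, None))  # still NG when the log ends
        outages[target] = found
    return outages


for target, found in outages_by_target(SAMPLE_LOG).items():
    for start, end in found:
        length = "still down" if end is None else f"{(end - start).total_seconds():.0f}s"
        print(target, start.time(), "->", end.time() if end else "end of log", length)
```

Keeping a separate sample list per target is what lets patterns such as OK-NG-OK-NG-OK show up as two distinct outages rather than one.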




Comparing traceroutes

exp_analyzer can compare traceroute logs generated by ExPing. Comparing the normal state with the failure state makes the routes easier to understand.

Normal state

TraceRoute
    30.60.1.1
        #001       0ms  192.168.134.141
        #002       0ms  20.129.16.251
        #003       0ms  20.128.5.254
        #004       0ms  20.128.6.253
        #005       0ms  1.1.1.1
        #006       0ms  30.60.1.1
    30.70.1.1
        #001       0ms  192.168.134.141
        #002       0ms  20.129.16.251
        #003       0ms  20.128.5.253
        #004       0ms  20.128.7.253
        #005       0ms  1.1.1.2
        #006       0ms  30.70.1.1
    30.80.1.1
        #001       0ms  192.168.134.141
        #002       0ms  20.129.16.251
        #003       0ms  20.129.27.253
        #004       0ms  20.128.8.2
        #005       1ms  3.3.3.1
        #006       1ms  30.80.1.1
    30.90.1.1
        #001       0ms  192.168.134.141
        #002       0ms  20.129.16.251
        #003       0ms  20.129.27.254
        #004       0ms  20.128.9.2
        #005       1ms  3.3.3.2
        #006       1ms  30.90.1.1
    30.10.1.1
        #001       0ms  192.168.134.141
        #002       0ms  20.129.16.252
        #003       0ms  20.129.23.241
        #004       1ms  20.244.0.5
        #005       1ms  2.2.2.1
        #006       1ms  30.10.1.1

Failure state

TraceRoute
    30.60.1.1
        #001       0ms  192.168.134.141
        #002       0ms  20.129.16.251
        #003       0ms  20.128.5.254
        #004       0ms  20.128.6.253
        #005       1ms  1.1.1.1
        #006       1ms  30.60.1.1
    30.70.1.1
        #001       0ms  192.168.134.141
        #002       0ms  20.129.16.251
        #003       0ms  20.128.5.252
        #004       1ms  20.128.7.252
        #005       1ms  1.1.1.2
        #006       1ms  30.70.1.1
    30.80.1.1
        #001       0ms  192.168.134.141
        #002       0ms  20.129.16.251
        #003       0ms  20.129.27.253
        #004       0ms  20.128.8.2
        #005       0ms  3.3.3.1
        #006       1ms  30.80.1.1
    30.90.1.1
        #001       0ms  192.168.134.141
        #002       0ms  20.129.16.251
        #003       0ms  20.129.27.254
        #004       1ms  20.128.9.2
        #005       1ms  3.3.3.2
        #006       1ms  30.90.1.1
    30.10.1.1
        #001       0ms  192.168.134.141
        #002       0ms  20.129.16.252
        #003       0ms  20.129.23.241
        #004       1ms  20.244.0.5
        #005       1ms  2.2.2.1
        #006       1ms  30.10.1.1

Can you spot the differences between the two at a glance? exp_analyzer compares the two logs above and makes the differences easy to see.

Below is the result of comparing the two files above with exp_analyzer.

[Figure: trace_result.html]
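
The same kind of comparison can be sketched in Python. This is an illustration, not exp_analyzer's code. It assumes the log layout shown above (a destination line followed by `#NNN` hop lines) and takes the last token of each hop line as the hop address. How ExPing records an unreachable hop is not covered here. The sample data is the 30.60.1.1 and 30.70.1.1 entries from the two logs.

```python
NORMAL = """TraceRoute
    30.60.1.1
        #001       0ms  192.168.134.141
        #002       0ms  20.129.16.251
        #003       0ms  20.128.5.254
        #004       0ms  20.128.6.253
        #005       0ms  1.1.1.1
        #006       0ms  30.60.1.1
    30.70.1.1
        #001       0ms  192.168.134.141
        #002       0ms  20.129.16.251
        #003       0ms  20.128.5.253
        #004       0ms  20.128.7.253
        #005       0ms  1.1.1.2
        #006       0ms  30.70.1.1
"""

FAILURE = """TraceRoute
    30.60.1.1
        #001       0ms  192.168.134.141
        #002       0ms  20.129.16.251
        #003       0ms  20.128.5.254
        #004       0ms  20.128.6.253
        #005       1ms  1.1.1.1
        #006       1ms  30.60.1.1
    30.70.1.1
        #001       0ms  192.168.134.141
        #002       0ms  20.129.16.251
        #003       0ms  20.128.5.252
        #004       1ms  20.128.7.252
        #005       1ms  1.1.1.2
        #006       1ms  30.70.1.1
"""


def parse_traceroute(text):
    """Return {destination: [hop address, ...]} from an ExPing-style traceroute log."""
    routes, current = {}, None
    for line in text.splitlines():
        token = line.strip()
        if not token or token == "TraceRoute":
            continue
        if token.startswith("#"):
            if current is not None:
                routes[current].append(token.split()[-1])  # last field is the hop address
        else:
            current = token  # a non-hop line names the next destination
            routes[current] = []
    return routes


def diff_routes(normal, failure):
    """Return {destination: [(hop number, address before, address after), ...]} for changed hops."""
    changes = {}
    for dest, before in normal.items():
        after = failure.get(dest, [])
        changed = []
        for i in range(max(len(before), len(after))):
            old = before[i] if i < len(before) else None
            new = after[i] if i < len(after) else None
            if old != new:
                changed.append((i + 1, old, new))
        if changed:
            changes[dest] = changed
    return changes


for dest, hops in diff_routes(parse_traceroute(NORMAL), parse_traceroute(FAILURE)).items():
    for number, old, new in hops:
        print(f"{dest} hop {number}: {old} -> {new}")
```

Response times are deliberately ignored, so only hops whose address changed are reported.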




Handling many PCs

As the test environment grows, it contains many PCs and many communication endpoints. For that reason, exp_analyzer can process multiple files in one go.

[Figure: base_nw]

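Processing them in one go requires gathering the ExPing ping/traceroute logs in one place. Adding a second NIC to each PC and building a dedicated log-collection network makes gathering the logs easy. If you can build that network with wireless LAN, that works well too.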

[Figure: base_nw_diagram_w_B2B]

It helps to run an FTP server on each PC so that the log-collection PC can gather the logs with a batch script.
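
A collection batch of that kind could be sketched with Python's standard `ftplib`, for example. The anonymous login, the one-folder-per-host layout, and the idea that each server's login directory holds only log files are all assumptions for illustration. The `connect` parameter is just a hook for swapping in a stand-in when no real server is available.

```python
from ftplib import FTP
from pathlib import Path


def collect_logs(hosts, dest_dir, user="anonymous", password="", connect=FTP):
    """Download every file in each host's FTP login directory into dest_dir/<host>/."""
    saved = []
    for host in hosts:
        target = Path(dest_dir) / host  # one local folder per PC
        target.mkdir(parents=True, exist_ok=True)
        ftp = connect(host)  # FTP(host) opens the control connection
        try:
            ftp.login(user, password)
            for name in ftp.nlst():  # assumes the listing contains only files
                local = target / Path(name).name
                with open(local, "wb") as fh:
                    ftp.retrbinary("RETR " + name, fh.write)
                saved.append(local)
        finally:
            ftp.close()
    return saved
```

Calling `collect_logs(["192.168.100.11", "192.168.100.12"], "collected_logs")` from the log-collection PC would then pull the files from both PCs. The addresses are placeholders.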

For those who find writing a batch script tedious, I made a tool that collects files from multiple FTP servers.

http://www.ne.jp/asahi/yam/tools/ftp_multi_get.zip

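Building the setup out this far takes some effort, but consider it when you have many PCs. Logs other than ping/traceroute can be collected just as easily, and the collected logs also serve as a backup.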
