November 29, 2010 / A network engineer's diary..

Configuring SLA monitors and using them for route tracking and graphing WAN conditions.

 

In this article, let's take a look at Cisco IP SLA. IP SLA is a toolkit for monitoring and measuring network statistics such as packet loss, delay and jitter (variation in delay) in near real time. Should you choose to act on the measurements, for instance by changing a static route or switching to a backup interface when the jitter on the tracked route or tracked interface rises above a set threshold, IP SLA works with a set of tracking commands to do that too. One can even write EEM (Embedded Event Manager) processes on the device to handle slightly more complex, non-core functions, such as sending an email to a technician when the jitter increases; we will keep that for the next section.

IP SLA (or SAA in some older IOS versions), when configured between two routers, sends a steady stream of generated test traffic, the type of which depends on the configuration. The router at the far end measures how far this traffic deviates from expected behavior and reports any such variation to the originating router. Should a deviation exceed a certain threshold, an alarm is generated on the originating router. To better understand the process, consider an example in which an SLA operation is configured to measure jitter between a pair of end-to-end routers. The originating router generates a steady stream of packets, timestamps each packet and dispatches it to the far end. The far-end router compares the timestamp on the packet with its own reception time, subtracts its own processing time for the packet from the total, and sends the resulting value back to the originating router. Should the recorded deviation exceed the threshold value, the originating router generates an alarm.

To better understand this, let's take a look at a live scenario. We have two routers: one at the origination side that generates the test traffic, and another at the termination side that receives it and reports back the results. We will configure the originating router to send two streams of data, one for measuring path loss and another for measuring jitter along the path. We will use the path loss stream to track connectivity, and the data collected from the jitter stream to analyze network conditions.

The configured network looks like the one shown above. Let's proceed with the configuration now. On the originating router we select the type of traffic to be measured; in the example below we have selected ICMP traffic to measure path loss and a steady stream of traffic on port 10000 to measure the jitter characteristics. The configuration on the originating router is as shown below.
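As a rough sketch of what such a configuration could look like, assuming the IOS 12.4 `ip sla monitor` syntax (newer releases drop the `monitor` keyword) and assuming the far-end router answers on 10.1.1.2. The frequency values are illustrative assumptions, not taken from the lab.

```
! Operation 1: ICMP echo, used to detect path loss
! (10.1.1.2 as the target is an assumption)
ip sla monitor 1
 type echo protocol ipIcmpEcho 10.1.1.2
 frequency 10
ip sla monitor schedule 1 life forever start-time now
!
! Operation 2: UDP jitter probe towards port 10000 on the far end
ip sla monitor 2
 type jitter dest-ipaddr 10.1.1.2 dest-port 10000
 frequency 10
ip sla monitor schedule 2 life forever start-time now
```

Each operation carries its own `schedule` line; without it the probe is defined but never runs.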

 

Note that each type of test needs a monitor statement of its own. Also note that once an SLA monitor is defined, a schedule command must be configured to specify when the SLA measurements are to be taken.

On the terminating end, we enable the IP SLA responder so that the router listens for and answers the probes, as shown below.
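A minimal sketch of the responder side, again assuming the IOS 12.4 `ip sla monitor` syntax; on newer releases the equivalent command is `ip sla responder`.

```
! Far-end router: answer IP SLA probes (needed for the jitter operation)
ip sla monitor responder
```

The ICMP echo operation does not need the responder, since any IP host replies to pings; the jitter operation does.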

Now, let’s take a look at the IP SLA monitoring statistics on the originating router.
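The statistics can be pulled up with a show command along these lines (12.4 syntax assumed; newer releases use `show ip sla statistics`).

```
! Run from privileged EXEC mode on the originating router
show ip sla monitor statistics
```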

Notice that the tests are listed as Index 1 and Index 2; also note that the 'latest operation return code' is 'OK' on both streams. Should the path loss or the jitter cross a set threshold, the return code changes to a value other than 'OK', triggering an alarm.

Using these SLA measurements, let's track a static route towards the far endpoint, so that if and when SLA detects a timeout due to path loss, a preconfigured floating static route kicks in to take its place.

Let’s take a look at the tracking configuration now.

Note that (in the figure above) the tracking statement must point to the IP SLA index number it follows. Also note that the tracking statement is then attached to the static route through 10.1.1.2; we also have a floating static route through 172.16.17.2, which comes into play should the SLA take the tracked route down.
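A sketch of how this tracking setup might be written, assuming the ICMP echo operation is index 1 and the 12.4-era `rtr` keyword (newer releases use `track 1 ip sla 1 reachability`). The /24 mask and the administrative distance of 250 are assumptions chosen for illustration.

```
! Track object 1 follows the reachability result of IP SLA operation 1
track 1 rtr 1 reachability
!
! Primary static route, installed only while track 1 is up
ip route 200.1.1.0 255.255.255.0 10.1.1.2 track 1
!
! Floating static route: the higher administrative distance (250, assumed)
! keeps it out of the table until the primary route is withdrawn
ip route 200.1.1.0 255.255.255.0 172.16.17.2 250
```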

This is how the reachability looks for 200.1.1.0 subnet when SLA for path loss works fine (when there is no path loss).

Now, to simulate a loss of connectivity, I have removed the frame relay map. Note that the serial interface still stays up, but the route is lost. Below is the debug output at the originating router. Notice that the state of rtr 1 changes from up to down.
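State changes of this kind can be watched with the object tracking debug, assuming it is available on the IOS release in use.

```
! Privileged EXEC mode; turn it off afterwards with "undebug all"
debug track
```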

Now let’s verify the routing table again.
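The check can be done with commands such as the following, assuming the track object is numbered 1.

```
! Which next hop currently serves the 200.1.1.0 subnet
show ip route 200.1.1.0
! Current state of the track object and the operation it follows
show track 1
```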

As planned, our floating static route has taken the place of the tracked route and the subnet 200.1.1.0 is reachable again.

That's the end of SLA and tracking. Now let's take a look at how the jitter characteristics that we measured on stream 2 of the originating router can be displayed in graphical format using Cacti. (Refer to the links mentioned below for configuring this tool; it can be run on any available Linux or BSD distribution.)

Let's configure the originating router to honor the SNMP walk requests that it gets from Cacti.
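A minimal sketch of read-only SNMP access for the poller. The community string `public` and the Cacti host address 192.0.2.10 are placeholders, not values from this lab.

```
! Allow only the Cacti server (address assumed) to poll the router
access-list 10 permit host 192.0.2.10
! Read-only community, restricted by ACL 10 ("public" is a placeholder)
snmp-server community public RO 10
```

Restricting the community with an ACL is optional, but it keeps other hosts from walking the router.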

Now we do the corresponding configuration on Cacti as well (refer to the links mentioned below to install and configure Cacti for this purpose).

Here is a graph that charts the round trip time (RTT) on the two-router network (can you believe a 40 ms delay when all I have is a frame relay switch sitting between two routers?).

Graph Template: Cisco – SAA Basic Statistics

Here is another graph that charts the standard deviation and mean of jitter on the network.

Graph Template: Cisco – SAA Jitter Dispersion


Cacti can store and display the logs for years, making it really easy to diagnose and resolve the time-sensitive problems that normally plague WAN links. Thanks for reading, and stay tuned for more articles.

For more information on installing Cacti and configuring it for SNMP walks, refer to http://bit.ly/i3l2Jz

For further information on templates that can be used for SLA monitoring refer to http://bit.ly/3P6K5g


One Comment

  1. haowu / Dec 5 2010 2:00 am

    really useful to see the SLA/SNMP way to pull up stats result into charts and trig further actions after some certain factor reaches the threshold. A full delicate explaination. cool
