In several previous blog posts I wrote about the Flowmon IPFIX Extensions with a focus on application-layer visibility. Today, I will take you on a tour through the lower layers - network and transport - and write about Network Performance Monitoring (NPM). I will explain which NPM metrics Flowmon monitors, how it measures them, and why you should be interested in them.
Fig 1: Example of top 10 statistic ordered by Server Response Time.
Network Performance Monitoring Metrics
NPM metrics extend the traditional IPFIX monitoring carried out by Flowmon Probes and can be measured even on 100Gb links. The Probe measures NPM metrics and exports them to the Flowmon Collector in a flow record, together with IP addresses, ports, timestamps and all the other data fields you are used to seeing in flows. You can work with NPM metrics like with any other flow field – use them for filtering flows, creating top N statistics (e.g. top 10 hosts with the highest network delay), reports (e.g. performance of a data storage server) and alerts (e.g. send me an e-mail when the server has a high response time). Thanks to these analytical capabilities of the Collector, we have an overview of both the current and the long-term state of network performance.
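To make the idea of "working with NPM metrics like with any other flow field" more concrete, here is a minimal Python sketch. It is not the Flowmon Collector API: the flow records are plain dictionaries, and the field names (`rtt_ms`, `srt_ms`), the helper functions and the sample values are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical flow records; field names and values are illustrative only.
flows = [
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.1.10", "rtt_ms": 2.0, "srt_ms": 100.0},
    {"src_ip": "10.0.0.6", "dst_ip": "10.0.1.10", "rtt_ms": 3.0, "srt_ms": 200.0},
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.1.20", "rtt_ms": 40.0, "srt_ms": 10.0},
    {"src_ip": "10.0.0.7", "dst_ip": "10.0.1.30", "rtt_ms": 1.0, "srt_ms": 20.0},
]


def filter_flows(flows, metric, minimum):
    """Filtering: keep only flows whose metric is at least `minimum`."""
    return [f for f in flows if f[metric] >= minimum]


def top_n(flows, group_by, metric, n=10):
    """Top N statistic: average `metric` per `group_by` value, highest first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for f in flows:
        totals[f[group_by]] += f[metric]
        counts[f[group_by]] += 1
    averages = {key: totals[key] / counts[key] for key in totals}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n]


def servers_to_alert(flows, metric, threshold):
    """Alerting: servers with at least one flow above the threshold."""
    return sorted({f["dst_ip"] for f in flows if f[metric] > threshold})


if __name__ == "__main__":
    print(top_n(flows, "dst_ip", "srt_ms", n=2))
    print(servers_to_alert(flows, "srt_ms", 150.0))
```

In a real deployment the Collector does this work for you; the sketch only shows that an NPM metric behaves like any other column you can filter, group and threshold on.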
Fig 2: Example of list of flows with NPM metrics.
How are metrics measured?
Flowmon Probes measure the following NPM metrics:
- Round-Trip Time (RTT) – delay introduced by the network while establishing a TCP connection (TCP handshake). RTT is measured as the delay between the SYN and ACK packets, i.e. between the first and second packets sent by the client.
- Server Response Time (SRT) – delay introduced by the server. SRT is measured as the delay between the server's confirmation of the client's first request and the first packet of the response.
- Delay – delay between individual packets of the server response. Delay is measured as the difference between the timestamps of the current packet and the previous one, starting with the second packet. The same applies to all subsequent packets.
- Jitter – variance in delay, measured for flows with three or more packets. Jitter is calculated by measuring the delay between the first and second packets, then between the second and third packets, and taking the difference between the two delays. The same applies to all subsequent packets.
- TCP Retransmissions – number of packets that had to be sent again because of packet damage or loss. A retransmission is identified when two packets have the same Sequence number in the TCP header while the Identification number in the IP header differs.
- Out-of-order packets – number of packets received in the wrong order, measured by monitoring the Sequence number in the TCP header.
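The definitions above can be sketched in a few lines of Python. This is only an illustration of the measurement logic as described in the list, not the Probe's actual implementation. The `Packet` class, its fields and the function names are hypothetical. The sketch also makes some simplifying assumptions of its own: timestamps are integer microseconds, jitter is taken as the absolute difference of consecutive delays, only packets with payload are considered for retransmissions and out-of-order counting, and sequence-number wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    ts: int            # capture timestamp in microseconds (assumption)
    from_client: bool  # True if the client sent the packet
    flags: str = ""    # simplified TCP flags, e.g. "S", "SA", "A", "PA"
    seq: int = 0       # TCP Sequence number
    ip_id: int = 0     # IP Identification field
    payload: int = 0   # payload length in bytes


def round_trip_time(pkts):
    """Delay between the client's SYN and the client's ACK of the handshake."""
    syn = next(p for p in pkts if p.from_client and p.flags == "S")
    ack = next(p for p in pkts
               if p.from_client and p.flags == "A" and p.ts > syn.ts)
    return ack.ts - syn.ts


def server_response_time(pkts):
    """Delay between the server confirming the first request and its first data packet."""
    request = next(p for p in pkts if p.from_client and p.payload > 0)
    server = [p for p in pkts if not p.from_client and p.ts >= request.ts]
    confirm = server[0]                               # server ACKs the request
    response = next(p for p in server if p.payload > 0)
    return response.ts - confirm.ts


def packet_delays(pkts):
    """Delays between consecutive packets of the server response."""
    ts = [p.ts for p in pkts if not p.from_client and p.payload > 0]
    return [b - a for a, b in zip(ts, ts[1:])]


def jitter(delays):
    """Differences between consecutive delays (absolute value is an assumption)."""
    return [abs(b - a) for a, b in zip(delays, delays[1:])]


def count_retransmissions(pkts):
    """Same Sequence number seen again with a different IP Identification."""
    seen, count = {}, 0
    for p in pkts:
        if p.payload == 0:
            continue  # pure ACKs legitimately repeat the sequence number
        if p.seq in seen and seen[p.seq] != p.ip_id:
            count += 1
        else:
            seen[p.seq] = p.ip_id
    return count


def count_out_of_order(pkts):
    """New Sequence numbers that arrive below the highest one seen so far."""
    seen, highest, count = set(), None, 0
    for p in pkts:
        if p.payload == 0:
            continue
        if highest is not None and p.seq < highest and p.seq not in seen:
            count += 1
        seen.add(p.seq)
        highest = p.seq if highest is None else max(highest, p.seq)
    return count
```

The retransmission and out-of-order counters expect the packets of one direction of a flow; the timing functions take the packets of both directions in capture order.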
Fig 3: How Flowmon Probe measures NPM metrics.
What can NPM metrics indicate?
NPM metrics can significantly help with network performance troubleshooting. Using the SRT and RTT metrics, we can distinguish delays in the network infrastructure (e.g. a malfunctioning access point) from delays in the server (e.g. not enough HW resources). This kind of information is crucial for fast network troubleshooting. The delay and jitter metrics should interest us especially when we use VoIP calls or videoconferences, as they can indicate bad audio and video quality. When we are talking about transferring large volumes of data, we are interested mainly in the number of TCP retransmissions (see previous blog post), which can indicate problems on the physical layer (e.g. interference, a faulty port) and result in a lower bit rate, and in out-of-order packets, which can indicate failures in communication links.
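The reasoning of separating network delay from server delay can be written down as a tiny decision rule. The sketch below is only an illustration: the function name and the threshold values are assumptions, and sensible limits depend entirely on your own network and applications.

```python
def locate_delay(rtt_ms, srt_ms, rtt_limit_ms=50.0, srt_limit_ms=200.0):
    """Say where the delay comes from; both limits are hypothetical defaults."""
    slow_network = rtt_ms > rtt_limit_ms   # high RTT points to the infrastructure
    slow_server = srt_ms > srt_limit_ms    # high SRT points to the server
    if slow_network and slow_server:
        return "network and server"
    if slow_network:
        return "network"
    if slow_server:
        return "server"
    return "ok"


if __name__ == "__main__":
    # Low RTT with high SRT: the network is fine, the server is slow.
    print(locate_delay(rtt_ms=2.0, srt_ms=850.0))
```

A flow with a low RTT and a high SRT is the "it's not the network" case, while a high RTT with a normal SRT points to the infrastructure between the client and the server.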
NPM Metrics Visualization
Since Flowmon 8.03, the NPM metrics Round-Trip Time (RTT), Server Response Time (SRT), and jitter are visualized in Flowmon Monitoring Center / Analysis. The visualized metrics give you at-a-glance insight into your network performance without running a query over the flow data. Metrics are visualized for each profile channel and averaged when more than one channel is selected.
Fig 4: NPM metrics visualization.
Do not forget to enable NPM metrics visualization by clicking the “Change displayed channels” button in Analysis and then checking “Display NPM statistics”. Only then will you see the Round-Trip Time (RTT), Server Response Time (SRT), and jitter statistics in a chart.
NPM metrics will help you keep your network in “good shape”, end arguments with application admins (“for the last time, it’s not the network”), and identify and solve performance problems before they affect more users and the business. Try it yourself in our demo or download a free trial. Share your experience with NPM in the comment section below.