Friday, 27 April 2018

VoIP Basics

Voice over IP (VoIP) allows voice services to run over IP networks. This post describes the basic concepts of the technologies and protocols used in a VoIP environment.
Two types of traffic exist in a VoIP platform: signaling and media. Signaling messages are used to establish, control and terminate calls, while media traffic is the actual voice data. First, signaling messages are exchanged between the endpoints to establish a session and negotiate various media parameters. Once this step is completed, media can flow through the session using the negotiated parameters. To modify the session during the call, or to terminate the call, signaling messages must be exchanged again.
The dominant signaling protocol used for VoIP today is the Session Initiation Protocol (SIP). SIP defines a variety of signaling messages (requests and responses) and uses different logical entities to process calls. During signaling setup, the SIP messages carry a Session Description Protocol (SDP) body, which is used to negotiate the media parameters between the endpoints (such as IP addresses, codecs, etc.). Finally, the Real-Time Transport Protocol (RTP) carries the voice packets. More information about the protocols used in VoIP can be found in the corresponding Requests for Comments (RFCs): RFC 3261 for SIP, RFC 4566 for SDP and RFC 3550 for RTP.
Overall, voice data can travel over different kinds of networks: most commonly traditional TDM-based PSTN networks and, in the case of VoIP, IP networks such as the public Internet. The end-user devices can be either IP-based phones (such as softphones or IP phones) or regular PSTN telephones. This means that a single call may have to traverse different networks, for example PSTN-to-IP or IP-to-PSTN, which requires translation between the different platforms.
To summarize, the list below presents the network components that a VoIP solution can consist of:
  • End-user devices: the endpoints of the calls, which can be either IP- or PSTN-based.
  • Proxy servers: they process the signaling messages on behalf of endpoints and control the calls.
  • Session Border Controllers (SBCs): they are placed at the border of a VoIP network and provide topology hiding, security, session management, etc.
  • Gateways: they are responsible for the interconnection and translation between VoIP and PSTN platforms.
In later posts, I will describe in more detail the different protocols and procedures used in VoIP.

A Basic SIP Call Flow

There are many different SIP scenarios and call flows in a VoIP environment. This post describes a very basic SIP call flow where A is the caller and B is the recipient. Users A and B typically each have a SIP proxy server handling the signaling on their behalf.
When A wants to initiate a new call, it sends an initial INVITE to B. This INVITE contains various headers with signaling information such as A’s and B’s addresses/phone numbers, SIP path information, etc. The INVITE also carries a Session Description Protocol (SDP) body with information about the media settings that A supports/prefers, e.g. codecs and media addresses. When B’s SIP proxy receives the INVITE, it sends back a “100 Trying” SIP response, which means that it has received the INVITE and is processing it. When B’s phone starts ringing, a “180 Ringing” is sent back to notify A.
When B answers the phone, a “200 OK” SIP message is sent back to A. This message usually contains an SDP body with the media settings that B supports/prefers. The point here is that A and B negotiate, via SDP, the media parameters that will be used during the call. Finally, A replies with an ACK to confirm that the “200 OK” has been received. From that moment, the two parties can start speaking with each other using the negotiated media parameters. When one of the parties releases the call, a BYE message is sent to the other party, who in turn sends back a “200 OK” to confirm the call release.
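To make the INVITE described above more concrete, here is a minimal sketch in Python that assembles an INVITE with an SDP body as plain text. The addresses, tag, branch value and Call-ID are made up for illustration, and a real user agent would add more headers (authentication, routing, etc.), so treat this as a rough picture of the message layout rather than a working SIP stack.

```python
def build_sdp(ip: str, port: int) -> str:
    """Return a minimal SDP offer advertising PCMU (0) and PCMA (8) audio."""
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {ip}",
        "s=-",
        f"c=IN IP4 {ip}",              # where A expects to receive media
        "t=0 0",
        f"m=audio {port} RTP/AVP 0 8",  # media type, port and offered codecs
        "a=rtpmap:0 PCMU/8000",
        "a=rtpmap:8 PCMA/8000",
    ]
    return "\r\n".join(lines) + "\r\n"


def build_invite(caller: str, callee: str, call_id: str, sdp: str) -> str:
    """Assemble an INVITE from caller to callee (both given as user@host)."""
    caller_host = caller.split("@")[1]
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        # The branch value is hypothetical; real stacks generate a unique one.
        f"Via: SIP/2.0/UDP {caller_host};branch=z9hG4bK-example",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag=1",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode('utf-8'))}",
    ]
    # An empty line separates the SIP headers from the SDP body.
    return "\r\n".join(headers) + "\r\n\r\n" + sdp


if __name__ == "__main__":
    offer = build_sdp("192.0.2.10", 40000)
    print(build_invite("a@a.example.com", "b@b.example.com", "abc123", offer))
```

The “200 OK” from B would carry a similar SDP body with B's own address, port and codec list, which completes the negotiation.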

Note that other SIP messages may contain an SDP body as well, for example the “180 Ringing” (as the RFCs allow). Note also that there could be other intermediary SIP devices in the path between A and B, e.g. a SIP proxy or a B2BUA.
This example is a very basic scenario of how a successful SIP call can be established and terminated. Later posts will describe in more detail the various SIP call flows as well as signaling and media parameters and messages.

Introduction to SIP
The Session Initiation Protocol (SIP) is the dominant signaling protocol used in VoIP today. It is responsible for the establishment, control and termination of sessions by exchanging ASCII-text-based messages between the endpoints. This post goes through the basic components of SIP: messages and logical entities.
There are two types of SIP messages: requests and responses. The lists below summarize the most important SIP requests and the classes of SIP responses:
SIP Requests
  • INVITE: It initiates the session.
  • REGISTER: It requests the registration of the user.
  • ACK: It confirms receipt of a final response to an INVITE.
  • BYE: It terminates the session.
  • CANCEL: It cancels a pending request.
  • OPTIONS: It queries the capabilities of endpoints.
SIP Responses
  • 1xx Provisional: The request is received and is being processed.
  • 2xx Success: The request was successfully received and accepted.
  • 3xx Redirection: Further action is needed to complete the request.
  • 4xx Client Error: The request has bad syntax or cannot be processed.
  • 5xx Server Error: The server cannot process the request.
  • 6xx Global Failure: The request cannot be processed by any server.
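The response classes above map directly onto the first digit of the status code. A small Python sketch of that mapping follows; the function names are my own and only serve as an illustration.

```python
SIP_RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def response_class(status_code: int) -> str:
    """Return the class name for a SIP status code (100-699)."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a valid SIP status code: {status_code}")
    return SIP_RESPONSE_CLASSES[status_code // 100]


def is_final(status_code: int) -> bool:
    """1xx responses are provisional; everything from 200 upwards is final."""
    response_class(status_code)  # validates the range
    return status_code >= 200


if __name__ == "__main__":
    for code in (100, 180, 200, 407, 503):
        print(code, response_class(code), "final" if is_final(code) else "provisional")
```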
There are various logical entities in a SIP environment that process the SIP messages and serve different roles. The list below provides an overview of the logical entities:
  • User Agent (UA): It is the endpoint of a SIP session and can function as a Client (UAC) or a Server (UAS). Clients send new requests (e.g. to initiate or terminate sessions), while servers process and respond to the requests.
  • Proxy Server: It acts on behalf of the end users, functioning both as UAC and UAS.
  • Registrar Server: It is responsible for registering the contact and location information of the users (processing the REGISTER requests).
  • Redirect Server: It returns information about the location of the called party.
  • Back-To-Back User Agent (B2BUA): It acts as an intermediary in a SIP session. It can process SIP messages, alter their content and regenerate them towards the other party. A typical example of a B2BUA is the Session Border Controller (SBC).
In a call scenario, all these entities work together and exchange SIP messages during the signaling phase. SIP also works alongside other protocols in a VoIP environment, such as the Session Description Protocol (SDP). All the above VoIP entities and protocols, as well as the various call flows and troubleshooting procedures, will be analysed extensively in later posts of this blog.

How to Analyze VoIP SIP Calls in Wireshark

Analyzing SIP packets is one of the most common ways to troubleshoot VoIP issues in the network or systems, so it is useful to be able to monitor the VoIP traffic and capture the SIP packets. Wireshark is a very helpful tool here: it can detect the VoIP calls in a trace and lets you analyze them.
To detect the VoIP calls in a Wireshark trace, select from the top menu: Telephony -> VoIP Calls. A new window appears which lists all the calls that were found in the trace, with information such as the start/end time, From/To headers, etc. Select a call from the list, and press “Flow”. You can then see the call flow in a graphical view. If you want to check the content of a particular message, click on that message in the flow and it will be shown in the main window.
If the trace contains RTP packets, you can see the RTP streams in the flow as well. To check the RTP packets, select them in the flow and view them in the main window. Moreover, if you would like to listen to the RTP streams (for example to check the voice quality), click on the “Player” button in the window with the VoIP calls and the decoded RTP streams will appear. Further information about analyzing RTP streams in Wireshark will be provided in a different post.
Note: You can identify specific calls in Wireshark based on their Call-ID header which is unique for each call. This is helpful especially when a trace contains a lot of calls, and you need to keep track of them.
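The same idea works outside Wireshark: if you have the raw SIP messages as text, you can group them per call using the Call-ID header. Below is a minimal Python sketch; it assumes each message is a complete SIP message with CRLF line endings, ignores folded (multi-line) headers, and uses made-up sample messages.

```python
from collections import defaultdict


def parse_headers(message: str) -> dict:
    """Split a SIP message into its start line and a dict of headers."""
    head = message.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    headers = {"start-line": lines[0]}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            # Header names are case-insensitive, so normalise them.
            headers[name.strip().lower()] = value.strip()
    return headers


def group_by_call_id(messages) -> dict:
    """Map each Call-ID to the start lines of its messages, in order."""
    calls = defaultdict(list)
    for message in messages:
        headers = parse_headers(message)
        # "i" is the compact form of the Call-ID header.
        call_id = headers.get("call-id") or headers.get("i")
        calls[call_id].append(headers["start-line"])
    return dict(calls)


if __name__ == "__main__":
    trace = [
        "INVITE sip:b@example.com SIP/2.0\r\nCall-ID: call-1\r\n\r\n",
        "INVITE sip:c@example.com SIP/2.0\r\nCall-ID: call-2\r\n\r\n",
        "SIP/2.0 180 Ringing\r\nCall-ID: call-1\r\n\r\n",
    ]
    for call_id, start_lines in group_by_call_id(trace).items():
        print(call_id, start_lines)
```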
You can always filter on different values in Wireshark to identify the packets/calls of interest. To do that, click on the “Filter:” button, and then in the new window, click on “Expression…”. There you can find and expand the SIP protocol and choose the value/header to filter on.
This post described some basics of using Wireshark for analyzing VoIP calls. Later posts will go deeper into analyzing and troubleshooting SIP and RTP packets.

Vendors would have you believe you need to spend thousands of dollars on protocol analyzer software to troubleshoot Voice over IP (VoIP) networks. After all, voice quality suffers if packets are late, missing or out of order. Fortunately, Wireshark is an open-source package that can do everything these tools can and more.
Wireshark, formerly known as Ethereal, is a network protocol analyzer. Its job is to listen to network traffic, display it in a format that makes sense and then help you find problems. VoIP involves a complex set of protocols that Wireshark can decode and relate to each other. For example, the procedure to set up a call involves a different protocol than the voice traffic itself. Wireshark uses the information from the call setup to better understand the voice flow. With this data in your hands, you can isolate the cause of VoIP problems.
To find the cause of VoIP problems, you must be able to follow the flow of calls from start to finish and ensure that the correct events are happening. This requires an understanding of both the underlying protocols and some telephony concepts. Fortunately, Wireshark provides some excellent tools to help interpret the data. The rest of this article focuses on using Wireshark's tools to solve three common VoIP problems.
The first example is of a phone that's not working—every time a number is dialed, the phone idles, and no ringing is heard in the earpiece. Wireshark is used here to look at the traffic between the phone and the PBX.
Launch Wireshark as root, and select Capture -> Options to bring up the Capture Options dialog, as shown in Figure 1. Ensure that the correct interface is being used; otherwise, you won't see the traffic. In this example, I am capturing traffic on the PBX's only interface, eth0.
https://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/093/9398/9398f1.inline.jpg
Figure 1. Wireshark Capture Options Dialog Configured for VoIP Analysis
Before you capture, you have the option of specifying a capture filter to limit the number of packets you have to deal with. Because both the signaling traffic (SIP) and voice traffic (RTP) are UDP-based, I specify udp as a capture filter. As a convenience, I also check the Update list of packets in real time option so I can verify I'm getting the proper traffic. Click the Start button to start capturing, and soon the upper pane will show several packets (Figure 2).
https://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/093/9398/9398f2.inline.jpg
Figure 2. Wireshark Capture of an IP Phone's Traffic
Figure 2 shows the packet list. The first four columns show the packet number, time and the addresses of the hosts involved. Because Wireshark understands SIP, it is able to identify the packets as SIP in the Protocol column and give detailed information in the Info column.
A cursory examination of the information in Figure 2 shows that the same three packets repeat themselves four times. First, the ipphone host sends a SIP INVITE message to pbxhost, which is used to open a conversation. Next, the PBX responds with “407 Proxy Authentication Required” to which the phone responds with an ACK, and then tries again in packet 6. Because this process repeats itself continuously, it's a good guess that the phone is trying to authenticate with the wrong user name and password.
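The pattern described above can also be spotted programmatically. The following Python sketch counts INVITE -> 407 -> ACK cycles in a list of SIP start lines. A single challenge is normal, because the first INVITE usually carries no credentials, so the sketch only flags a loop after several repeats; the threshold of 3 is an arbitrary assumption of mine, and the sample start lines are illustrative.

```python
def count_rejected_invites(start_lines) -> int:
    """Count consecutive INVITE / 407 / ACK triples in a message sequence."""
    cycles = 0
    triples = zip(start_lines, start_lines[1:], start_lines[2:])
    for first, second, third in triples:
        if (
            first.startswith("INVITE ")
            and second.startswith("SIP/2.0 407")
            and third.startswith("ACK ")
        ):
            cycles += 1
    return cycles


def looks_like_auth_loop(start_lines, threshold: int = 3) -> bool:
    """Guess whether the phone keeps failing authentication."""
    return count_rejected_invites(start_lines) >= threshold


if __name__ == "__main__":
    failing = [
        "INVITE sip:613@pbxhost SIP/2.0",
        "SIP/2.0 407 Proxy Authentication Required",
        "ACK sip:613@pbxhost SIP/2.0",
    ] * 4
    print(count_rejected_invites(failing), looks_like_auth_loop(failing))
```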
Monitoring: It's All About Location
If some of the packets you are looking for don't reach the network interface of the machine running Wireshark, you've got a problem. Switched networks, by design, try to limit the traffic only to the ports involved with the communication in order to improve performance. Spanning trees and IP routing only complicate the matter of trying to find out where to put your probe.
Your best bet is to capture traffic directly on one of the machines involved. If you can't get Wireshark on that machine (it runs on Windows too), you always can capture traffic locally with tcpdump and ship the trace to another machine for analysis. If that isn't possible, or if the traffic spans multiple machines, it's time to get help from your network hardware.
Most managed switches offer a feature that will mirror all the frames from one interface to another. Thus, you could plug your Wireshark station into a switch port with your Internet connection, mirror all the traffic from the Internet router to your workstation, and you'd be able to analyze all Internet traffic. Cisco calls this Switched Port Analyzer or SPAN, HP calls it a Monitor Port, and 3Com calls it a Roving Analysis Port.
The choice of which port to mirror depends on your network, but generally, ports that connect to other switches or routers are a good bet. This is also where capture filters help reduce the background noise you have to deal with.
The first example made use of Wireshark's high-level interpretation of individual packets to help highlight a simple problem. The next example requires a more holistic view of a VoIP call. Here, Wireshark is used to view the SIP call setup and then identify the critical packets. In this example, a phone is experiencing one-way audio. Outbound phone calls ring and are picked up, but only one party hears audio. The conversation was captured in the same manner as in the previous example, resulting in 140 frames. This is too many to look at without some help, so some of Wireshark's advanced analysis tools are used.
With the capture loaded, select Statistics -> VoIP Calls. Wireshark analyzes the capture for any VoIP-related packets and provides a summary on the screen. Click on the call (if there are multiple calls shown, hold down the Ctrl key to select additional calls), and click on the Graph button. You will see a summary of the SIP call, as shown in Figure 3.
https://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/093/9398/9398f3.inline.jpg
Figure 3. Wireshark's Graphical Interpretation of a SIP Conversation
The Graph Analysis dialog shows the SIP messages sent by the various parties. The first message in Figure 3 is an INVITE message, which is the first step in setting up a call. In this case, the IP phone is asking the PBX to place a call to a SIP address of sip:613@fwd.pulver.com, which is an echo service provided by Free World Dialup for testing SIP calls. The response to this is a request for authentication, which is acknowledged. The phone tries again and is given a status of “Trying” by the PBX. The PBX then proceeds to INVITE the remote endpoint by contacting fwd.pulver.com. Several more messages are exchanged before the call is set up properly.
The problem at hand is one-way audio. Indeed, the Graph Analysis window shows that the IP phone sent Real-time Transport Protocol (RTP) voice data to a SIP endpoint on the Internet, but it does not show a stream in the reverse direction. To determine why no audio was sent back, it is necessary to read deeper into the SIP packets.
One of SIP's jobs is to set up the RTP stream between two endpoints. It does this through the Session Description Protocol (SDP), which carries the information about codecs, IP addresses and port numbers that is necessary for VoIP to work. Note that the endpoints speaking SDP need not be the ones talking to each other! In Figure 3, pbxhost and fwd.pulver.com are negotiating with each other over the Internet, but they each specify a different endpoint to terminate the voice call. The RTP streams are unidirectional, so a full duplex conversation requires a separate RTP stream to be set up in each direction, using two separate SDP messages.
With that in mind, it seems prudent to find out the SDP information sent from pbxhost to fwd.pulver.com. This SDP message will contain the IP address, UDP port and codec that the remote end is to use to talk to the local IP phone. SDP messages are tagged with SDP in the Graph Analysis window to help you spot them. This packet is at offset 0.490 in Figure 3. When you click on the message in the Graph Analysis window, the corresponding packet is highlighted in the packet list pane of the main Wireshark window.
Wireshark also decodes the protocols contained within the currently selected packet in the packet detail pane, in addition to providing a summary of the packets in the packet list pane. Decodes are separated for each layer, such as Ethernet, IP, UDP and SIP. You can dig into the fields by clicking the arrows on the left side of the window. Figure 4 shows the SDP message from above with some of the relevant fields expanded.
https://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/093/9398/9398f4.inline.jpg
Figure 4. Wireshark's Display of a SIP Packet Containing an SDP Message
Each line in an SDP message describes a particular attribute of the session to be created and follows a simple attribute=value format, where the attribute is a single letter and the value is a text string. Wireshark uses its understanding of the protocol to add some extra text, such as descriptions of the attributes. For VoIP sessions, the important attributes are:
  • a—attributes, such as codecs and silence suppression.
  • c—connection information, including the IP address expecting the RTP stream.
  • m—media description, including the port number on which the RTP endpoint will be listening.
From the Connection Information (c) line, you can determine that the advertised host is 192.168.1.151. The m attribute specifies that the RTP stream is expected to be on port 34008. Looking through the attributes, the PCM u-law and a-law codecs are offered, along with the compressed GSM format and touch tones via the telephone-event media format. The problem with this offer is that the IP address 192.168.1.151 is a private address, unreachable from the Internet. When the remote host tries to send packets to 192.168.1.151, the packets are lost because no route to that address exists on the Internet.
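The check performed by eye above (read the c and m lines, then notice the private address) can be sketched in a few lines of Python. The SDP text below is an illustrative reconstruction using the address, port and codecs mentioned in the text; the remaining fields (o, s, t and the payload type numbers) are assumptions. The sketch keeps only the last c line it sees, which is a simplification for sessions with per-media connection lines.

```python
import ipaddress

# Illustrative SDP body, not the exact packet from the capture.
EXAMPLE_SDP = """v=0
o=- 0 0 IN IP4 192.168.1.151
s=-
c=IN IP4 192.168.1.151
t=0 0
m=audio 34008 RTP/AVP 0 8 3 101
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:3 GSM/8000
a=rtpmap:101 telephone-event/8000
"""


def parse_sdp(sdp: str) -> dict:
    """Extract the connection address, media lines and attributes."""
    info = {"address": None, "media": [], "attributes": []}
    for line in sdp.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue
        if key == "c":
            # c=<nettype> <addrtype> <connection-address>[/ttl]
            info["address"] = value.split()[2].split("/")[0]
        elif key == "m":
            # m=<media> <port> <proto> <fmt> ...
            media, port, proto, *formats = value.split()
            info["media"].append(
                {"media": media, "port": int(port), "proto": proto, "formats": formats}
            )
        elif key == "a":
            info["attributes"].append(value)
    return info


def is_unroutable(address: str) -> bool:
    """True for private and other non-public addresses (e.g. 192.168.0.0/16)."""
    return ipaddress.ip_address(address).is_private


if __name__ == "__main__":
    offer = parse_sdp(EXAMPLE_SDP)
    print(offer["address"], offer["media"][0]["port"])
    if is_unroutable(offer["address"]):
        print("Warning: media address is not reachable from the Internet")
```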
For the sake of completeness, the SDP information for the other side of the conversation is at time 7.554. After the offer is acknowledged, the two phones begin sending voice packets to the address and port specified in the earlier SDP messages. Because pbxhost offered an invalid address to fwd.pulver.com, the other half of the voice call is never seen.
This illustrates a common problem when using VoIP through a NAT gateway. The problem can be solved several ways depending on one's needs, but that's another article!
So far, the troubleshooting has focused on the call signaling aspect of VoIP. Once the endpoints begin sending RTP streams, the effects of network quality can be heard in the audio.
Network conditions that affect VoIP are latency, jitter and packet loss. Latency is the time it takes for a packet to travel from point A to point B. Jitter is the variation in latency over a series of packets. Loss is the number of packets sent from point A that never make it to point B. Because VoIP audio is sent over real-time UDP, a packet that arrives out of order has to be discarded if the packet ahead of it already has been played.
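As a rough illustration of the jitter definition above, the Python sketch below computes the running interarrival jitter estimate described in RFC 3550: for each pair of consecutive packets it takes the difference between the receive spacing and the send spacing, and smooths the absolute value with a gain of 1/16. The input format (two lists of timestamps in milliseconds) is my own assumption; a real tool derives the send times from the RTP timestamp field and the codec clock rate.

```python
def interarrival_jitter(send_times, recv_times) -> float:
    """Return the smoothed interarrival jitter, in the unit of the inputs."""
    if len(send_times) != len(recv_times):
        raise ValueError("send_times and recv_times must have the same length")
    jitter = 0.0
    for i in range(1, len(send_times)):
        # Difference in transit time between this packet and the previous one.
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        # Exponential smoothing with gain 1/16, as in RFC 3550.
        jitter += (abs(d) - jitter) / 16.0
    return jitter


if __name__ == "__main__":
    sent = [0, 20, 40, 60, 80]        # one packet every 20 ms
    received = [50, 70, 95, 110, 130]  # third packet arrives 5 ms late
    print(interarrival_jitter(sent, received))
```

Note that jitter can be computed without synchronized clocks, because only differences in spacing are used. One-way latency, in contrast, needs clocks at both ends that agree with each other.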
For the final example, I made a call and captured the conversation including the call setup. Wireshark uses the SIP information to get more details about the RTP packet stream, which enables the RTP analysis tools to be used.
Select Statistics -> RTP -> Show All Streams. Wireshark uses the decoded packets to provide a list of all the audio conversations and some basic statistics, as shown in Figure 5.
https://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/093/9398/9398f5.inline.jpg
Figure 5. Wireshark's List of Found RTP Streams
Figure 5 shows four streams, because each audio stream comprises two streams in opposite directions, and the PBX was bridging the connection between the two phones. The first indication of a problem is the final field, which shows an X when there are VoIP-related problems. The Max Delta (latency) was 84ms for that stream, which is good. The Max Jitter was also good at 11ms (150ms of one-way latency and 20ms of jitter are the limits of what's considered acceptable). However, nearly 20% packet loss was encountered, which is extraordinary!
Select the problematic stream for further analysis by clicking on it. After that, click on the Find Reverse button to select the other half of the conversation. Finally, click on Analyze to provide a packet-by-packet look at the stream. Lost packets will show up as having the wrong sequence number. This screen, shown in Figure 6, also displays helpful statistics, such as the current bandwidth, latency and jitter.
https://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/093/9398/9398f6.inline.jpg
Figure 6. RTP Stream Analysis of a Stream with Excessive Packet Drops
Figure 6 clearly shows some network problems, because the packet loss is not normal. (The steady bandwidth and regularly spaced packet drops are strong indicators of priority queue overruns or policing in network equipment.) This loss will not be seen in the reverse direction, because the capture was taken close to the source of the conversation. You can save the contents of the window with the Save CSV button.
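Packet loss of the kind shown here is derived from gaps in the RTP sequence numbers. The following Python sketch shows one way to do that; it is my own simplified illustration, not Wireshark's implementation. It takes the 16-bit sequence numbers in arrival order, unwraps them so that the counter rolling over from 65535 to 0 is not mistaken for a huge gap, and compares the number of packets expected with the number actually seen.

```python
def unwrap(sequence_numbers):
    """Convert 16-bit RTP sequence numbers into a monotonic extended form."""
    extended = []
    previous = None
    for seq in sequence_numbers:
        if previous is None:
            current = seq
        else:
            # Signed 16-bit distance from the previous packet.
            delta = (seq - previous) & 0xFFFF
            if delta >= 0x8000:
                delta -= 0x10000
            current = previous + delta
        extended.append(current)
        previous = current
    return extended


def packet_loss(sequence_numbers):
    """Return (expected, lost, loss_fraction) for one RTP stream."""
    if not sequence_numbers:
        return (0, 0, 0.0)
    extended = unwrap(sequence_numbers)
    expected = max(extended) - min(extended) + 1
    received = len(set(extended))  # duplicates count once
    lost = expected - received
    return (expected, lost, lost / expected)


if __name__ == "__main__":
    # 65535 and 2 never arrive; the counter wraps in between.
    print(packet_loss([65533, 65534, 0, 1, 3]))
```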
Network problems are more difficult to solve, because they require interaction with network equipment and potentially with other parties. If your VoIP environment doesn't use the Internet, you can capture packets at various points on the network to find where the poor conditions are introduced. If the call flows over the Internet, you must investigate your connection to the Internet and possibly work with your carrier.
The RTP analysis can be used with proprietary systems with some extra configuration. If Wireshark doesn't understand the signaling, it won't be able to decode all of the RTP information. Select Edit -> Preferences -> Protocols -> RTP, and check the Try to decode RTP outside of conversations box. Analysis will be a bit slower, but Wireshark will try to determine whether every UDP packet belongs to an RTP stream, allowing you to use the RTP tools to look at the call.
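To give an idea of what "trying to determine whether a UDP packet is RTP" can involve, here is a small Python sketch that parses the fixed 12-byte RTP header from RFC 3550 and rejects payloads that are too short or do not carry version 2. I don't know the exact heuristic Wireshark applies, so this is only an illustration of the header layout, and a real heuristic would need more checks because random data passes this test fairly often.

```python
import struct


def parse_rtp_header(payload: bytes):
    """Return the fixed RTP header fields, or None if this is not RTP v2."""
    if len(payload) < 12:
        return None
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", payload[:12])
    if first >> 6 != 2:  # the top two bits hold the RTP version
        return None
    return {
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "csrc_count": first & 0x0F,
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,  # e.g. 0 = PCMU, 8 = PCMA
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


if __name__ == "__main__":
    # A hand-made packet: version 2, payload type 0, 160 bytes of audio.
    packet = bytes([0x80, 0x00]) + struct.pack("!HII", 1234, 160, 0xDEADBEEF) + b"\xff" * 160
    print(parse_rtp_header(packet))
```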
The final feature we examine here is the ability to listen to the contents of the voice call. With the Stream Analysis window still open, select the Save Payload button, select the .au file format, and provide a filename. After pressing the OK button, the voice call itself is saved to your hard drive. The resulting file can be played in XMMS, soxplay or some other audio program.
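For the curious, the .au format is simple enough to write by hand. The sketch below wraps raw G.711 u-law payload bytes (as carried in RTP with payload type 0) in a 24-byte .au header. It assumes 8000 Hz mono u-law audio and that the payloads have already been extracted and put in order; the function name and the output filename are hypothetical, and this is not what Wireshark does internally.

```python
import struct

AU_MAGIC = 0x2E736E64        # the ASCII bytes ".snd"
AU_HEADER_SIZE = 24          # six 32-bit big-endian fields
AU_ENCODING_MULAW_8 = 1      # 8-bit G.711 u-law


def build_au(mulaw_payload: bytes, sample_rate: int = 8000, channels: int = 1) -> bytes:
    """Return the bytes of an .au file containing the given u-law audio."""
    header = struct.pack(
        ">6I",
        AU_MAGIC,
        AU_HEADER_SIZE,       # offset of the audio data
        len(mulaw_payload),   # size of the audio data in bytes
        AU_ENCODING_MULAW_8,
        sample_rate,
        channels,
    )
    return header + mulaw_payload


if __name__ == "__main__":
    # 0xFF is u-law silence; 8000 bytes is one second at 8000 Hz.
    with open("example-call.au", "wb") as handle:
        handle.write(build_au(b"\xff" * 8000))
```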
Troubleshooting VoIP is unlike most other network troubleshooting, because of the separate signaling and voice protocols, and the real-time nature of voice traffic. These three examples have shown the important features of Wireshark that deal specifically with analyzing the SIP call setup and RTP voice streams. These features rival those of commercial products and enable you to find the root of VoIP problems.

