If the previous chapter taught you how a number reaches your service domain, this chapter teaches you how a call actually becomes a live session.
You do not need to become a standards lawyer.
You do need to understand:
- how a voice session is set up
- how media flows
- where policy gets enforced
- where things break in the real world
The simplest useful call model
Think of a modern business call in four stages:
1. a public number is dialed
2. routing logic decides which provider/service domain should receive it
3. signaling establishes a call session
4. media flows between endpoints
Stages 1 and 2 are about numbering and routing truth.
Stages 3 and 4 are what this chapter is about.
Routing gets the session to SpeakOps. Only then does your application logic start to matter.
Signaling versus media
This distinction is non-negotiable.
Signaling
Signaling is the control conversation about the call.
Examples:
- who is calling whom
- is the destination ringing
- was the call answered
- should the call be redirected
- what codecs will be used
Media
Media is the actual audio stream once the call is live.
Examples:
- RTP audio packets
- secure RTP variants
- transcoded or recorded streams
A call can have successful signaling and broken media.
That single sentence explains a huge amount of telecom pain:
- call connects but no audio
- one-way audio
- delayed media
- recording failures
A call can be set up correctly in signaling while the audio path still breaks, which is why support needs both traces.
Why SIP matters so much
For most modern programmable voice systems, SIP is the application-control language of call setup.
Do not think of SIP as "just another protocol to memorize."
Think of it as the grammar by which voice systems negotiate:
- session creation
- responses
- transfer and redirection
- endpoint capabilities
- identity and routing hints
If SpeakOps touches modern telephony, SIP is one of the highest-ROI technical literacies you can acquire.
The minimum SIP flow you must understand
At a simplified level:
- INVITE says "I want to create a call session."
- provisional responses say "I got it" or "it is ringing."
- a success response says "the call is accepted."
- acknowledgment confirms the setup.
- the session eventually terminates.
In rough form:
Caller side  Callee side
| ----- INVITE ------> |
| <--- 100 Trying ---- |
| <--- 180 Ringing --- |
| <----- 200 OK ------ |
| ------- ACK -------> |
| <==== RTP audio ===> |
| ------- BYE -------> |
| <----- 200 OK ------ |
That is not the full universe, but it is the minimum useful skeleton.
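The skeleton above can be read as a small state machine. The following is a minimal sketch, assuming only the messages shown in the diagram; the `CallState` names, the `TRANSITIONS` table, and the `replay` helper are hypothetical and exist only for illustration. A real SIP stack handles retransmissions, timers, and many more messages.

```python
from enum import Enum


class CallState(Enum):
    IDLE = "idle"
    INVITED = "invited"        # INVITE sent, nothing heard back yet
    PROCEEDING = "proceeding"  # 100 Trying received
    RINGING = "ringing"        # 180 Ringing received
    ANSWERED = "answered"      # 200 OK to the INVITE received
    CONFIRMED = "confirmed"    # ACK sent, media may flow
    ENDING = "ending"          # BYE sent
    TERMINATED = "terminated"  # 200 OK to the BYE received


# (current state, message) -> next state.
# Hypothetical table covering only the simplified flow in this chapter.
TRANSITIONS = {
    (CallState.IDLE, "INVITE"): CallState.INVITED,
    (CallState.INVITED, "100"): CallState.PROCEEDING,
    (CallState.INVITED, "180"): CallState.RINGING,
    (CallState.INVITED, "200"): CallState.ANSWERED,
    (CallState.PROCEEDING, "180"): CallState.RINGING,
    (CallState.PROCEEDING, "200"): CallState.ANSWERED,
    (CallState.RINGING, "200"): CallState.ANSWERED,
    (CallState.ANSWERED, "ACK"): CallState.CONFIRMED,
    (CallState.CONFIRMED, "BYE"): CallState.ENDING,
    (CallState.ENDING, "200"): CallState.TERMINATED,
}


def replay(messages):
    """Walk a list of message names and return the resulting call state."""
    state = CallState.IDLE
    for msg in messages:
        key = (state, msg)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {msg!r} in state {state.name}")
        state = TRANSITIONS[key]
    return state
```

Replaying `["INVITE", "100", "180", "200", "ACK"]` ends in `CallState.CONFIRMED`, which is the point where audio can start. An out-of-order message raises an error, which is roughly the kind of signal a support tool would want to surface.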
The response codes that matter most for product people
You do not need every code. You do need a practical instinct for the big ones.
100 Trying
The request has been received and processing has started.
180 Ringing
The destination is being alerted.
200 OK
The request succeeded.
3xx
Redirection or alternate path logic is involved.
4xx
Client-side or request-specific failure.
Useful intuition:
- 486 Busy Here means the endpoint is busy
- 403 Forbidden often means policy or authorization rejection
5xx
Server-side or infrastructure failure.
6xx
Global failure: the call cannot be completed at any destination, not just this one.
For example:
- 603 Decline means the call was explicitly rejected
Why does this matter for founders?
Because product behavior, support tooling, and analytics depend on classifying call failures correctly.
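To make that classification concrete, here is a minimal sketch of a mapper from SIP status codes to the buckets described above. The function name `classify_sip_status` and the bucket labels are assumptions chosen for this example; the numeric classes and the three specific codes are the ones named in this chapter.

```python
def classify_sip_status(code: int) -> str:
    """Map a SIP status code to a coarse, product-friendly label."""
    if not 100 <= code <= 699:
        raise ValueError(f"not a SIP status code: {code}")

    # Specific codes called out in this chapter get their own label.
    specific = {
        486: "busy",
        403: "forbidden",
        603: "declined",
    }
    if code in specific:
        return specific[code]

    # Everything else falls back to its response class (first digit).
    classes = {
        1: "provisional",
        2: "success",
        3: "redirection",
        4: "client_failure",
        5: "server_failure",
        6: "global_failure",
    }
    return classes[code // 100]
```

With this shape, analytics can group on the coarse label while support tooling still stores the raw code for escalation to a carrier.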
SIP is not the whole story
Even if your product lives at the SIP layer, the real ecosystem may involve:
- PSTN gateways
- SBCs
- topology hiding
- policy engines
- legacy interop
- trust and authentication layers
So when a call "looks simple" in your app, that often masks a chain of separate session boundaries behind the scenes.
SBCs: the real edge of telecom control
If you remember one infrastructure term from this chapter, remember SBC.
A session border controller is where voice networks often enforce:
- security boundaries
- normalization rules
- codec policy
- rate limits
- topology hiding
- interconnect rules
Why is this so important?
Because founders often imagine a call as a direct logical pipe from caller to callee.
In reality, the call often crosses administrative trust boundaries, and each boundary may terminate and recreate the session in a controlled way.
That means:
- headers may be rewritten
- identity assertions may change
- codec offers may be constrained
- routing decisions may get overridden
This is one reason "arbitrary logic" has real limits.
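As an illustration of what a boundary can do to a session passing through it, the sketch below applies two of the policies listed above: constraining the codec offer and removing headers that would expose internal topology. `EdgePolicy`, `apply_edge_policy`, and the `X-Internal-Route` header are hypothetical names invented for this example; real SBCs express this through vendor-specific configuration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EdgePolicy:
    # Codecs this boundary is willing to pass through (assumed values).
    allowed_codecs: Tuple[str, ...] = ("PCMU", "PCMA")
    # Headers removed before the session leaves our network (hypothetical).
    stripped_headers: Tuple[str, ...] = ("X-Internal-Route",)


def apply_edge_policy(
    headers: Dict[str, str],
    offered_codecs: List[str],
    policy: EdgePolicy,
) -> Tuple[Dict[str, str], List[str]]:
    """Return the headers and codec offer as they look after the boundary."""
    strip = {name.lower() for name in policy.stripped_headers}
    allowed = {codec.upper() for codec in policy.allowed_codecs}

    # Header names are compared case-insensitively.
    kept_headers = {k: v for k, v in headers.items() if k.lower() not in strip}
    # Keep the caller's preference order, drop anything not allowed.
    kept_codecs = [c for c in offered_codecs if c.upper() in allowed]
    return kept_headers, kept_codecs
```

If the caller offers `["opus", "PCMU"]` and the policy allows only `PCMU` and `PCMA`, the far side sees an offer of `["PCMU"]`. Your application never gets the chance to negotiate the codec that was dropped, which is the practical meaning of "routing decisions may get overridden."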
B2BUA thinking
Many voice systems behave as back-to-back user agents rather than transparent forwarders.
The practical intuition is:
- the system accepts one leg of the call
- it creates another leg toward the next destination
- it can apply policy between those legs
From a startup point of view, this is excellent news. It means products like SpeakOps can build useful control surfaces:
- screening
- branching
- forwarding
- fallback
- recording
- analytics
But it also means you should think in terms of call legs, not just "the call."
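One way to make "think in call legs" concrete is to model it in your data structures. The sketch below is a minimal, assumed data model; `CallLeg`, `BridgedCall`, and their fields are hypothetical names, not any provider's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CallLeg:
    leg_id: str
    direction: str                      # "inbound" or "outbound"
    remote: str                         # who is on the far end of this leg
    final_status: Optional[int] = None  # final SIP status, if known yet


@dataclass
class BridgedCall:
    call_id: str
    legs: List[CallLeg] = field(default_factory=list)

    def add_leg(self, leg: CallLeg) -> None:
        self.legs.append(leg)

    def is_bridged(self) -> bool:
        """True only when at least two legs exist and all were answered."""
        return len(self.legs) >= 2 and all(
            leg.final_status == 200 for leg in self.legs
        )

    def failed_legs(self) -> List[CallLeg]:
        """Legs whose final status is a 4xx, 5xx, or 6xx response."""
        return [
            leg for leg in self.legs
            if leg.final_status is not None and leg.final_status >= 400
        ]
```

With this model, an inbound leg answered with 200 and an outbound leg rejected with 486 is visibly one success and one failure, rather than a single ambiguous "failed call."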
Media reality: why calls can connect but still fail
Once signaling succeeds, audio still has to travel.
This is where RTP matters.
You do not need to master packet captures on day one. You do need the conceptual model:
- signaling negotiates the session
- media transports the conversation
- the media path may be separate from the signaling path
Common failure patterns:
- one-way audio
- silence in both directions
- clipping or jitter
- bad transcoding
- recording path failures
The startup lesson is simple:
A "connected call" metric is not enough.
You also need media-quality observability.
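A first step toward that observability can be very small. The sketch below assumes you can obtain per-direction RTP packet counts for a call from your media layer; the function name `diagnose_media`, its parameters, and the returned labels are hypothetical. It only separates the coarse cases and says nothing about jitter, clipping, or transcoding quality.

```python
def diagnose_media(answered: bool, rx_packets: int, tx_packets: int) -> str:
    """Classify a call's media outcome from per-direction RTP packet counts.

    rx_packets: audio packets received from the far end.
    tx_packets: audio packets sent toward the far end.
    """
    if rx_packets < 0 or tx_packets < 0:
        raise ValueError("packet counts cannot be negative")
    if not answered:
        # Signaling never succeeded, so media is not the problem to chase.
        return "not_answered"
    if rx_packets == 0 and tx_packets == 0:
        return "no_audio"
    if rx_packets == 0 or tx_packets == 0:
        return "one_way_audio"
    return "two_way_audio"
```

A call that was answered but reports `rx_packets == 0` counts as "connected" in a signaling-only dashboard and as `one_way_audio` here, which is exactly the gap the connected-call metric hides.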
Why old PSTN concepts still matter
You might wonder:
"If modern voice is SIP and RTP, why should I care about SS7 or ISUP at all?"
Because the telecom world is still a federation of old and new infrastructure.
A call may originate or terminate through systems whose operational assumptions come from PSTN-era signaling and numbering design. Even if you never touch raw SS7 messages, legacy behavior still shapes:
- portability
- interconnect
- call completion patterns
- supplementary services
- trust boundaries
So the correct stance is:
- go deep on SIP
- understand SS7 conceptually
- do not spend months memorizing legacy message internals unless your business model truly needs it
One complete inbound business-call walkthrough
Let us run the whole thing once.
Step 1: a caller dials the business number
The number is a public address. It is not yet the actual service target.
Step 2: originating-side routing determines current destination domain
Portability-aware routing logic resolves the current serving environment.
Step 3: the call enters the destination provider or application edge
This may be through:
- SIP interconnect
- gateway conversion
- SBC policy edge
Step 4: your service logic decides what should happen
Examples:
- ring a user
- branch to an IVR
- forward to another PSTN number
- route to a SIP endpoint
Step 5: one or more new call legs may be created
Your app may accept the inbound leg and create an outbound leg toward the real destination.
Step 6: signaling completes and media begins
Only now is there an actual conversation.
Step 7: events, recordings, analytics, and policy continue around the session
In other words:
A business call is not one thing. It is an orchestration of addressing, routing, session setup, policy, and media.
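Step 4 of the walkthrough is where your own code runs. As a rough sketch of what "service logic decides" could look like, the example below evaluates an ordered list of rules and returns the first match. `InboundCall`, `Rule`, `decide`, and the action labels are all hypothetical names for illustration; they do not describe how SpeakOps or any provider actually implements routing.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class InboundCall:
    dialed_number: str
    caller_number: str
    hour: int  # local hour of day, 0-23


@dataclass
class Rule:
    name: str
    matches: Callable[[InboundCall], bool]
    action: str  # e.g. "ring_user", "ivr", "forward_pstn", "route_sip"


def decide(
    call: InboundCall, rules: List[Rule], default: str = "ivr"
) -> Tuple[str, str]:
    """Return (rule name, action) for the first rule that matches."""
    for rule in rules:
        if rule.matches(call):
            return rule.name, rule.action
    # No rule matched: fall back to a safe default action.
    return "default", default


# Example rule set (assumed business hours of 09:00 to 17:00).
RULES = [
    Rule("after_hours", lambda c: c.hour < 9 or c.hour >= 17, "forward_pstn"),
    Rule("business_hours", lambda c: 9 <= c.hour < 17, "ring_user"),
]
```

Returning the rule name alongside the action matters: it is what lets a rule-evaluation log later explain why a given call was forwarded instead of rung.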
What founders should build into observability
If SpeakOps becomes a serious voice product, you will want:
- signaling event timelines
- final disposition codes
- per-leg tracing
- rule-evaluation logs
- media health indicators
- upstream-provider attribution
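Several of the items above fit into one small record per call. The sketch below is a minimal, assumed shape for such a trace; `SignalingEvent`, `CallTrace`, and their fields are hypothetical names. For simplicity it treats the first final response (200 or higher) seen on a leg as that leg's disposition, which matches the simplified flow in this chapter but not every real-world dialog.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SignalingEvent:
    at_ms: int     # milliseconds since the trace started
    leg_id: str    # which call leg this event belongs to
    message: str   # e.g. "INVITE", "180", "200", "BYE"


@dataclass
class CallTrace:
    call_id: str
    upstream_provider: str  # attribution for carrier escalations
    events: List[SignalingEvent] = field(default_factory=list)

    def record(self, at_ms: int, leg_id: str, message: str) -> None:
        self.events.append(SignalingEvent(at_ms, leg_id, message))

    def timeline(self, leg_id: str) -> List[SignalingEvent]:
        """Events for one leg, ordered by time."""
        leg_events = [e for e in self.events if e.leg_id == leg_id]
        return sorted(leg_events, key=lambda e: e.at_ms)

    def final_disposition(self, leg_id: str) -> Optional[int]:
        """First final response (>= 200) on the leg, or None if absent."""
        for event in self.timeline(leg_id):
            if event.message.isdigit() and int(event.message) >= 200:
                return int(event.message)
        return None
```

A trace like this gives support a per-leg timeline and a disposition code to quote, along with the upstream provider to contact.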
Without that, your support team will spend its life saying:
"The call should have worked. We are checking with our carrier."
What you should be able to explain after this chapter
By the end of this chapter, you should be able to answer:
- What is the difference between signaling and media?
- Why can a call be "answered" but still fail as a conversation?
- Why do SBCs and call-leg boundaries matter?
- Where would SpeakOps' routing logic actually sit in the end-to-end path?
The next chapter turns that understanding into product control:
Given that calls arrive as sessions, what arbitrary routing and behavior can SpeakOps truly program?