S1 Signaling Success Rate issue in a few cells only

Hello Experts.

I have an S1 Signaling Success Rate issue.

It affects only a few cells on a site, not all of them, so it doesn’t look like a Transport issue.

To be sure, we also checked Transport hygiene in parallel, and no issue was found on the Transport side.

Any explanation or ideas for solving this issue?

Vendor is Huawei.

Can you tell which transmission counters and alarms you have checked?

On what basis are you assuming that the S1 links are fine?

Have you checked the IP configuration on the eNodeB and also the radio connection?

ping eNB <> MME no packet loss
ping router <> MME no packet loss
ping router <> SGW no packet loss

Already checked. The ping test shows no issue.

Please check for packet loss or frame loss in the control plane.

Ping with a larger packet size, such as 1500 bytes.

Did you check these counters?
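To make the large-packet test concrete, here is a minimal sketch of the size arithmetic. It assumes an IPv4 path with a 1500-byte MTU and a Linux test host with the iputils `ping` (`-s` sets the ICMP payload, `-M do` forbids fragmentation, `-c` sets the count). The target address 192.0.2.10 is a placeholder, not a real MME, and the eNodeB's own ping command will have different syntax.

```python
import shlex

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for(packet_size: int) -> int:
    """Return the ICMP payload size that yields an IP packet of packet_size bytes."""
    payload = packet_size - IPV4_HEADER - ICMP_HEADER
    if payload < 0:
        raise ValueError("packet size is smaller than the IP + ICMP headers")
    return payload


def build_ping_command(target: str, packet_size: int = 1500, count: int = 100) -> list[str]:
    """Build a Linux iputils ping command that sends unfragmented packets."""
    payload = icmp_payload_for(packet_size)
    return ["ping", "-c", str(count), "-M", "do", "-s", str(payload), target]


# Placeholder target from the documentation address range (hypothetical).
cmd = build_ping_command("192.0.2.10")
print(shlex.join(cmd))
```

If the small-packet ping is clean but this one loses packets or reports that fragmentation is needed, an MTU mismatch somewhere on the path is worth investigating.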

| Counter | Description |
| --- | --- |
| L.Cell.Unavail.Dur.Sys.S1Fail | Duration of cell unavailability due to S1 interface faults |
| L.Cell.PLMN.Unavail.Dur.Sys.S1Fail | Duration of cell unavailability due to S1 interface faults for a specific operator |
| L.NB.Cell.Unavail.Dur.Sys.S1Fail | Duration of NB-IoT cell unavailability due to S1 interface faults |
| L.E-RAB.FailEst.MME.S1AP | Number of E-RAB setup failures because of conflicts with S1AP-related procedures |
| L.IRATHO.E2G.SRVCC.FailOut.S1WaitingTimerOut | Number of SRVCC-based outgoing handover execution failures from E-UTRAN to GERAN due to the expiry of the S1 interface response message timer |
| L.HHO.Prep.FailIn.FlowCtrl | Number of times that the target eNodeB sends a handover preparation failure message for an intra-duplex-mode handover over the S1 or X2 interface to the source eNodeB because of flow control |

In Huawei you can check only the radio part per sector…

There must be some alarms on main S1 links, can you confirm?

Is it interval-based or a continuous trend?

Yes, those are important counters for checking availability.

If it was working before, why did it suddenly start failing?

Were there any changes made on the network interface nodes?

SW, HW, Config changes?

S1 failures are common, but in that case the cell should be locked automatically by the system, because no calls can be made on it.

I’m not sure it is caused by the transport side.

That is because it happened on only a few cells of the site.

In my experience, when S1 Sig SR degradation is caused by Transport, all cells on the site are affected.
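The "few cells versus whole site" reasoning can be sketched as a quick per-cell check. This is only an illustration: it assumes the success rate is successes divided by attempts, that per-cell attempt and success counts have been exported, and the cell names, sample values and 99% threshold are all hypothetical.

```python
def s1_sig_success_rate(attempts: int, successes: int):
    """Success rate in percent, or None when there were no attempts."""
    if attempts == 0:
        return None
    return 100.0 * successes / attempts


def classify_site(cells: dict, threshold: float = 99.0):
    """Return (scope, degraded_cells) for one site.

    cells maps a cell name to an (attempts, successes) tuple.
    scope is "healthy", "cell-specific" or "site-wide".
    """
    measured = []
    degraded = []
    for name, (attempts, successes) in cells.items():
        rate = s1_sig_success_rate(attempts, successes)
        if rate is None:
            continue  # no traffic, so nothing can be concluded for this cell
        measured.append(name)
        if rate < threshold:
            degraded.append(name)

    if not degraded:
        scope = "healthy"
    elif len(degraded) == len(measured):
        scope = "site-wide"
    else:
        scope = "cell-specific"
    return scope, sorted(degraded)


# Hypothetical sample: one bad cell out of three on the same site.
site = {
    "cell_1": (1000, 998),
    "cell_2": (1000, 900),
    "cell_3": (1000, 997),
}
print(classify_site(site))
```

A "site-wide" result would point toward the shared S1 link or transport, while "cell-specific" suggests looking at what differs between the affected and unaffected cells.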

L.E-RAB.FailEst.MME.S1AP did increment, but it is not in line with the S1 Sig SR trend.

OK. Check these too, to see whether they correlate with the S1 Sig SR trend:

L.E-RAB.Rel.S1Reset.eNodeB.QCI.1
L.E-RAB.Rel.S1Reset.eNodeB.QCI.2
L.E-RAB.Rel.S1Reset.eNodeB.QCI.3
L.E-RAB.Rel.S1Reset.eNodeB.QCI.4
L.E-RAB.Rel.S1Reset.eNodeB.QCI.5
L.E-RAB.Rel.S1Reset.eNodeB.QCI.6
L.E-RAB.Rel.S1Reset.eNodeB.QCI.7
L.E-RAB.Rel.S1Reset.eNodeB.QCI.8
L.E-RAB.Rel.S1Reset.eNodeB.QCI.9
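One way to judge whether a counter is "correlated with the trend" is to compute a Pearson correlation between the S1 signaling failures per interval and the counter value per interval. The sketch below assumes both series have been exported for the same intervals of the same cell; all sample numbers are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no trend to correlate with
    return cov / sqrt(var_x * var_y)


# Hypothetical hourly values for one affected cell.
s1_sig_failures = [2, 3, 40, 55, 4]   # attempts minus successes per interval
reset_releases = [0, 0, 12, 18, 1]    # e.g. summed S1 reset release counters
flat_counter = [5, 5, 5, 5, 5]        # a counter that does not move at all

print(round(pearson(s1_sig_failures, reset_releases), 3))
print(pearson(s1_sig_failures, flat_counter))
```

A value close to 1 means the counter rises and falls with the failures; it still only shows that they move together, not which one is the cause.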

Also check those two counters:

| Counter | Description |
| --- | --- |
| L.DLSctpCong.Num | Number of times of downlink SCTP congestion control |
| L.UECNTXRel.S1SCTPFault.Num | Number of UE context releases because of S1 SCTP link disconnection |

I found that this counter is elevated and in line with the S1 Sig SR trend:

L.UECNTX.Rel.S1Reset.MME

OK, it is expected to see this one: it is the number of abnormal UE context releases initiated by the MME due to S1 RESET.

But it still doesn’t tell you the cause of the S1 failure…

Did you check with Transmission team for their counters too?

Because it is clear you had S1 resets.