gRPC: binary protocols and performance
TL;DR
gRPC = HTTP/2 + Protobuf + strict contracts + streaming. It delivers low latency, efficient traffic, and stable contracts between services. It is ideal for internal north-south/east-west calls, realtime channels (server/client/bidi streaming), and mobile frontends via gRPC-Web. Success comes from: small proto contracts, deadlines and cancellation, exponential retries with idempotency, connection pooling, Envoy at the edge, mTLS, encryption of keys, and full observability.
1) When to choose gRPC and when not
Suitable for:
- Internal APIs between microservices (balances, limits, calculations, anti-fraud).
- High-frequency calls with strict p95/p99 SLOs.
- Long-lived streams (tables/tournaments, live events, payout statuses).
- Mobile clients (via gRPC-Web or a BFF).
Less suitable for:
- Public integrations and webhooks, payment flows with strict idempotency and CDN caching (REST fits better here).
- Admin UIs with rich aggregated queries (better served by a GraphQL-BFF over gRPC).
2) Contracts and evolution (Protobuf)
Schema principles: only add fields and never reuse field numbers; enforce requiredness through validation, not proto2 `required`.
Versioning: packages/namespaces (`payments.v1`, `payments.v2`); deprecate via `deprecated = true` plus migration windows.
Semantics: keep messages "thin", without arrays of hundreds of KB; serve large result sets through streaming or pagination.
```proto
syntax = "proto3";

package payments.v1;

service Payouts {
  rpc Create (CreatePayoutRequest) returns (CreatePayoutResponse) {}
  rpc GetStatus (GetStatusRequest) returns (GetStatusResponse) {}
  rpc StreamStatuses (StreamStatusesRequest) returns (stream StatusEvent) {}
}

message CreatePayoutRequest {
  string idempotency_key = 1;
  string user_id = 2;
  string currency = 3;
  int64 amount_minor = 4; // minor units, e.g. cents
}

message CreatePayoutResponse { string payout_id = 1; }
message GetStatusRequest { string payout_id = 1; }
message GetStatusResponse { string state = 1; string reason = 2; }
message StreamStatusesRequest { repeated string payout_ids = 1; }
message StatusEvent { string payout_id = 1; string state = 2; int64 ts_ms = 3; }
```
3) Transport and connections
HTTP/2 multiplexes many RPCs over one TCP connection: keep channels long-lived and pool them (on the client, 2-4 channels per upstream target is usually enough).
Keepalive: ping often enough to outlive the balancer's idle timeout (for example, every 30 seconds), but cap `max_pings_without_data` so servers do not reject the channel for excessive pings; see the sketch below.
Flow control/backpressure: HTTP/2 window settings plus bounded client/server queues.
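A minimal client-channel sketch in Go, assuming `google.golang.org/grpc`; the target name and the plaintext credentials are placeholders (use mTLS in production):

```go
import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/keepalive"
)

// dialChannel opens one long-lived channel; keep a small pool of these
// per upstream target instead of dialing per request.
func dialChannel(target string) (*grpc.ClientConn, error) {
	return grpc.Dial(target,
		grpc.WithTransportCredentials(insecure.NewCredentials()), // placeholder; use mTLS
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:    30 * time.Second, // ping an idle connection every 30s
			Timeout: 10 * time.Second, // drop the connection if the ack is late
		}),
	)
}
```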
4) Performance: what really affects
Message sizes: target ≤ 64-128 KB; enable compression (gzip/brotli) for large responses, and stream huge payloads (see the per-call compression sketch after this list).
Serialization: Protobuf is 5-10× more compact than JSON; avoid `string` for numbers and `map<string, string>` where possible.
CPU/allocations: profile the codec and resolvers; use zero-copy buffers and pre-allocation.
Threading: gRPC servers are sensitive to lock contention; move I/O to async paths and set deadlines on calls to external databases.
Nagle/delayed ACK: usually leave the defaults; experiment carefully.
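A per-call gzip sketch in Go; `PayoutsClient` and the request/response types stand in for the protoc-generated code of the contract above:

```go
import (
	"context"

	"google.golang.org/grpc"
	"google.golang.org/grpc/encoding/gzip" // importing registers the "gzip" compressor
)

// getStatusCompressed asks the peer to gzip this (potentially large)
// response; leave small hot-path messages uncompressed, since gzip costs CPU.
func getStatusCompressed(ctx context.Context, cli PayoutsClient, req *GetStatusRequest) (*GetStatusResponse, error) {
	return cli.GetStatus(ctx, req, grpc.UseCompressor(gzip.Name))
}
```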
5) Deadlines, cancellation, retries, idempotency
Always set a deadline on the client (for example, 2× the upstream p95) and propagate the context into services and the database.
If the client cancels, the server must stop work and free resources.
Retries: only for idempotent operations (GET analogues, status reads, stream reads). For mutating calls, use an `idempotency_key` and store the result; see the sketch after this list.
Backoff policy: exponential with jitter; cap the number of attempts and the client-side retry buffer.
gRPC status codes: use `DEADLINE_EXCEEDED`, `UNAVAILABLE` (retryable), `FAILED_PRECONDITION`, `ALREADY_EXISTS`, `ABORTED`, etc. Precise semantics saves nerves.
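A Go sketch of deadline setting plus a conservative manual retry decision; `PayoutsClient`, the message types, and `backoffWithJitter` are hypothetical:

```go
import (
	"context"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

func getStatusWithRetry(ctx context.Context, cli PayoutsClient, req *GetStatusRequest) (*GetStatusResponse, error) {
	// Deadline roughly 2x the upstream p95, per the guidance above.
	ctx, cancel := context.WithTimeout(ctx, 300*time.Millisecond)
	defer cancel()

	resp, err := cli.GetStatus(ctx, req)
	if err == nil {
		return resp, nil
	}
	switch status.Code(err) {
	case codes.Unavailable:
		// Retryable for an idempotent read like GetStatus; back off first.
		backoffWithJitter()
		return cli.GetStatus(ctx, req)
	case codes.DeadlineExceeded:
		// Only retryable with a fresh deadline and remaining overall budget.
		return nil, err
	default:
		return nil, err // non-retryable: surface the precise status to the caller
	}
}
```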
6) Streams: server, client, bidi
Server streaming for long responses and feeds (watch memory when the client is slow).
Client streaming for uploads and batches.
Bidirectional for interactive traffic (live tables, internal events).
Add a sequence number/offset to messages for ordering and application-level resume: gRPC alone does not replay messages after a reconnect.
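A server-streaming sketch in Go that honors cancellation and backpressure; the types mimic protoc-generated code for the contract above, and `events` is a hypothetical fan-in channel. Since `StatusEvent` carries `ts_ms` but no sequence field, a client here would resume by re-calling the RPC with the last timestamp it processed:

```go
type payoutServer struct {
	events chan *StatusEvent // hypothetical source of status updates
}

func (s *payoutServer) StreamStatuses(req *StreamStatusesRequest, stream Payouts_StreamStatusesServer) error {
	ctx := stream.Context()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err() // client cancelled or deadline hit: free resources
		case ev, ok := <-s.events:
			if !ok {
				return nil // source closed: end the stream cleanly
			}
			// Send blocks when the client is slow (HTTP/2 flow control),
			// so a stuck consumer cannot force unbounded server buffering.
			if err := stream.Send(ev); err != nil {
				return err
			}
		}
	}
}
```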
7) Balancing and topology
xDS/Envoy as the data plane: L7 balancing, circuit breaking, outlier ejection.
Consistent hashing (by `user_id`/`table_id`) keeps hot keys on one upstream and reduces cross-node locking.
Hedging/mirroring: use carefully; it helps p99 tails but increases load.
Multi-region: local endpoints with geo-routing; pin a session to its "home" region.
```yaml
load_assignment:
  endpoints:
    - lb_endpoints:
        - endpoint: { address: { socket_address: { address: svc-a-1, port_value: 8080 } } }
        - endpoint: { address: { socket_address: { address: svc-a-2, port_value: 8080 } } }
outlier_detection:
  consecutive_5xx: 5
  interval: 5s
  base_ejection_time: 30s
circuit_breakers:
  thresholds:
    - max_connections: 1024
      max_requests: 10000
```
8) Security
mTLS between all hops (gateway ↔ services); short-TTL certificates with automatic rotation (ACME/mesh).
AuthZ: JWT/OIDC at the edge, with claims propagated to services; ABAC/RBAC at the gateway/mesh level.
PII/PCI: filter fields and never log sensitive data; encrypt tokens in transit and at rest.
gRPC-Web: the same auth principles, but traffic is proxied over HTTP/1.1 (via Envoy).
9) Observability
Metrics: RPS, p50/p95/p99 latency per method, error rate per status code, active streams, message sizes, thread/pool saturation.
Tracing: W3C `traceparent` in metadata; spans on client and server, with context propagated to databases/caches.
Logs: correlation by `trace_id`, sampling, strict masking of sensitive fields.
Health checks: the standard `grpc.health.v1.Health` service (`Check`), plus `Watch` for streaming health; a registration sketch follows.
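Registering the stock health service in Go (`google.golang.org/grpc/health` ships an implementation; the service name below is illustrative):

```go
import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func newServerWithHealth() *grpc.Server {
	s := grpc.NewServer()
	h := health.NewServer()
	healthpb.RegisterHealthServer(s, h)
	// Flip to NOT_SERVING during drains so balancers eject the instance.
	h.SetServingStatus("payments.v1.Payouts", healthpb.HealthCheckResponse_SERVING)
	return s
}
```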
10) Compression, limits and protection
Enable per-call message compression and limit `max_receive_message_length`/`max_send_message_length` (see the sketch after this list).
Rate limits/quotas at the gateway; circuit breakers on error rate and latency.
Deadline budget: do not pass effectively infinite deadlines between hops; each link consumes part of the budget.
Protection against "expensive" requests: cap message size and element counts, cut off overlong streams.
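A server-limits sketch in Go (grpc-go's defaults are 4 MB receive and an effectively unlimited send; the 1 MB ceilings here are illustrative):

```go
import "google.golang.org/grpc"

// newLimitedServer caps message sizes well above the 64-128 KB soft target,
// so a single oversized payload cannot exhaust memory.
func newLimitedServer() *grpc.Server {
	return grpc.NewServer(
		grpc.MaxRecvMsgSize(1<<20), // 1 MB inbound ceiling
		grpc.MaxSendMsgSize(1<<20), // 1 MB outbound ceiling
	)
}
```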
11) Gateways and interoperability
gRPC-Gateway/Transcoding: export part of methods as REST (for partners/admins).
gRPC-Web: front directly to Envoy, which is transcoded.
GraphQL-BFF: resolvers can walk in gRPC; for payment domain mutations, REST with idempotency is preferred.
12) Idempotency in modifying operations
Template:
- The client generates an `idempotency_key`.
- The server stores the result under the key with a TTL (for example, 24 hours).
- A repeated `Create` with the same key returns the same `payout_id`/status.
```go
// Return the stored result for a repeated idempotency key.
if exists(key) {
	return storedResult(key)
}
res := doBusiness()
store(key, res) // persist under the key with a TTL
return res
```
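Note that the check and the store must act atomically (for example, a put-if-absent with TTL, or a unique key constraint in the database): two concurrent retries with the same key must not both reach `doBusiness()`.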
13) Errors and status mapping
Map local domain errors via `status.WithDetails` (`google.rpc.ErrorInfo`) to codes (see the sketch below):
- `INVALID_ARGUMENT` (validation), `NOT_FOUND`, `ALREADY_EXISTS`,
- `FAILED_PRECONDITION`, `ABORTED`,
- `UNAUTHENTICATED`/`PERMISSION_DENIED`,
- `RESOURCE_EXHAUSTED` (quotas/limits),
- `UNAVAILABLE` (network/upstream), `DEADLINE_EXCEEDED`.
On the client, retry only `UNAVAILABLE`, `DEADLINE_EXCEEDED`, and calls explicitly marked idempotent.
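A Go sketch attaching machine-readable details to a status; `errdetails.ErrorInfo` is the real `google.rpc.ErrorInfo` binding, while the reason/domain values are hypothetical:

```go
import (
	"google.golang.org/genproto/googleapis/rpc/errdetails"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

func errUnsupportedCurrency(currency string) error {
	st := status.New(codes.InvalidArgument, "unsupported currency")
	st, err := st.WithDetails(&errdetails.ErrorInfo{
		Reason:   "CURRENCY_UNSUPPORTED", // hypothetical reason code
		Domain:   "payments.example.com", // hypothetical domain
		Metadata: map[string]string{"currency": currency},
	})
	if err != nil {
		return status.Error(codes.Internal, "failed to attach error details")
	}
	return st.Err()
}
```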
14) Testing and UAT
Contract tests against the `.proto` files (golden files); see the in-process test sketch after this list.
Load: p50/p95/p99 latency, throughput, CPU, memory, GC.
Streams: backpressure tests, interruptions, resume.
Network: loss/jitter emulation; timeout/hedging tests.
Security: mutation of tokens/certificates, key rotation at runtime.
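An in-process smoke test over `bufconn` (a real grpc-go test utility), here exercising the stock health endpoint; the same harness fits generated clients:

```go
import (
	"context"
	"net"
	"testing"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
	"google.golang.org/grpc/test/bufconn"
)

func TestHealthCheck(t *testing.T) {
	lis := bufconn.Listen(1 << 20)
	s := grpc.NewServer()
	healthpb.RegisterHealthServer(s, health.NewServer())
	go s.Serve(lis)
	defer s.Stop()

	conn, err := grpc.DialContext(context.Background(), "bufnet",
		grpc.WithContextDialer(func(ctx context.Context, _ string) (net.Conn, error) {
			return lis.Dial()
		}),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		t.Fatal(err)
	}
	defer conn.Close()

	resp, err := healthpb.NewHealthClient(conn).Check(context.Background(), &healthpb.HealthCheckRequest{})
	if err != nil || resp.Status != healthpb.HealthCheckResponse_SERVING {
		t.Fatalf("health check failed: resp=%v err=%v", resp, err)
	}
}
```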
Go-live checklist:
- A deadline on every client call.
- Retries only where idempotent.
- Message size limits.
- Health/Watch and alerts on p95/p99.
- mTLS and certificate rotation.
- End-to-end tracing.
- Envoy circuit breaking and outlier ejection.
- gRPC-Web e2e for the browser (if needed).
15) Anti-patterns
- Giant messages instead of streams.
- Infinite deadlines and no cancellation.
- Retrying unsafe mutations, which creates duplicates.
- No connection pooling, causing connection storms.
- No Health/Watch: "blind" failures.
- PII leaking into traces/logs.
- One monolithic endpoint pool for the whole world, with no regional proximity.
16) NFR/SLO (targets)
Edge→service overhead: ≤ 10-30 ms p95 within a region.
Method latency: p95 ≤ 150-250 ms (business operations), p99 ≤ 500 ms.
Error rate (5xx/`UNAVAILABLE`): ≤ 0.1% of RPS.
Uptime: ≥ 99.95% for critical services.
Streams: connection retention ≥ 24 hours, drop rate < 0.01%/hour.
17) Mini-specs and sample configurations
Client deadline/retry (Go):
```go
ctx, cancel := context.WithTimeout(ctx, 300*time.Millisecond)
defer cancel()
resp, err := cli.GetStatus(ctx, req, grpc.WaitForReady(true))
```
Retry policy (YAML profile, e.g. for a Java client):
```yaml
methodConfig:
  - name: [{ service: payments.v1.Payouts, method: GetStatus }]
    retryPolicy:
      maxAttempts: 4
      initialBackoff: 100ms
      maxBackoff: 1s
      backoffMultiplier: 2.0
      retryableStatusCodes: [UNAVAILABLE, DEADLINE_EXCEEDED]
```
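The same policy can be supplied client-side in Go as default service config (a sketch; grpc-go accepts this JSON via `WithDefaultServiceConfig`, with backoffs written as duration strings):

```go
conn, err := grpc.Dial(target,
	grpc.WithTransportCredentials(insecure.NewCredentials()), // placeholder; use mTLS
	grpc.WithDefaultServiceConfig(`{
	  "methodConfig": [{
	    "name": [{"service": "payments.v1.Payouts", "method": "GetStatus"}],
	    "retryPolicy": {
	      "maxAttempts": 4,
	      "initialBackoff": "0.1s",
	      "maxBackoff": "1s",
	      "backoffMultiplier": 2.0,
	      "retryableStatusCodes": ["UNAVAILABLE", "DEADLINE_EXCEEDED"]
	    }
	  }]
	}`),
)
```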
gRPC-Gateway (OpenAPI fragment for transcoding):
```yaml
paths:
  /v1/payouts/{id}:
    get:
      x-grpc-service: payments.v1.Payouts
      x-grpc-method: GetStatus
```
Summary
gRPC is a solid end-to-end bus for iGaming microservices: compact binary encoding, strict contracts, and powerful streaming. To get real benefit from it, keep contracts small and stable, implement deadlines/cancellation/retries with idempotency, run Envoy/xDS and mTLS, measure p95/p99, and teach the system to live under backpressure. Combined with REST webhooks and a GraphQL-BFF, you get a fast, economical, and secure API layer that scales with the product.