Distributed Tracing: OpenTelemetry

1) Why OTel and what it provides

OpenTelemetry (OTel) is an open standard and a set of SDKs, agents, and collectors for telemetry (traces, metrics, logs), with OTLP as its wire protocol. Goals:
  • End-to-end visibility along request paths (gateway → services → DB/cache/queues).
  • Fast RCA and debugging of incidents and releases (canary/blue-green).
  • Tie-in with SLOs and auto-rollback (data-driven operational decisions).
  • Vendor-agnostic: export to any backend without being locked into a single APM.

Core principles: standardize, sample smart, secure by default, correlate everything.

2) Basics: context, spans, attributes

Trace - a tree/graph of calls; Span - a single operation (RPC, SQL query, queue call).
Span Kind: `SERVER`, `CLIENT`, `PRODUCER`, `CONSUMER`, `INTERNAL`.
W3C Trace Context: the `traceparent` and `tracestate` headers; the context is propagated from service to service.
Attributes - key-value pairs (keep cardinality low!), Events - timestamped annotations, Status - a code plus an error description.
Links - relationships between spans outside the strict hierarchy (important for async/fan-out/fan-in).
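
To make the `traceparent` header concrete, here is a minimal sketch of parsing it and building the header for an outgoing call. It assumes the version `00` layout (version, 32-hex trace id, 16-hex parent span id, 2-hex flags); `SpanContext`, `parse_traceparent`, and `child_traceparent` are illustrative names, not part of any OTel SDK, which handles this for you through its propagators.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version "00" layout: 2 hex version, 32 hex trace-id, 16 hex parent-id, 2 hex flags.
_TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class SpanContext(NamedTuple):
    trace_id: str   # 32 lowercase hex characters
    span_id: str    # 16 lowercase hex characters
    sampled: bool   # lowest bit of the trace-flags field


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Return the parsed context, or None when the header is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero trace/span ids are invalid.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_traceparent(parent: SpanContext) -> str:
    """Header for an outgoing call: same trace id, fresh span id, same sampled flag."""
    new_span_id = secrets.token_hex(8)
    flags = "01" if parent.sampled else "00"
    return f"00-{parent.trace_id}-{new_span_id}-{flags}"
```

The trace id stays constant across services while every hop gets its own span id; that is what lets a backend stitch spans into one trace.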

Span naming:
  • HTTP: `HTTP {METHOD}` (the concrete route, e.g. `GET /withdraw`, goes into attributes)
  • DB: `DB SELECT` / `DB INSERT`
  • Queue: `QUEUE publish topic=X` / `QUEUE consume topic=X`

3) Semantic conventions (semconv)

Use stable attribute schemas:
  • HTTP/gRPC: `http.method`, `http.route`, `http.status_code`, `url.full`.
  • DB: `db.system=postgresql`, `db.statement` (sanitized and truncated only!), `db.name`.
  • Messaging: `messaging.system=kafka`, `messaging.operation=receive`, `messaging.destination`.
  • Cloud/K8s/Host: `cloud.region`, `k8s.pod.name`, `container.id`.
  • Resource attributes (required): `service.name`, `service.version`, `deployment.environment`.

Pin the schema version through `schemaUrl` in the SDK/Collector resources.

4) Sampling: head, tail, adaptive

Head-based (in the SDK): decides up front and cheaply; good for high QPS, but can miss the "interesting" traces.
Tail-based (in the Collector): decides after the trace has completed; allows rules based on status, latency, and attributes.
Adaptive/dynamic: raises the sampling share when the error rate or p95 latency grows.

Production-grade recipe: head sampling at 1-5% globally + tail sampling of "important" traces: `status = ERROR`, `latency > p95`, "money paths", PSP/KYC errors.
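
The difference between the two decisions can be sketched in a few lines. This is an illustration under stated assumptions, not the SDK or Collector implementation: `head_sample` imitates a trace-id-ratio sampler, `tail_keep` imitates an "any policy matches" tail sampler, and the `Span` record, the 800 ms threshold, and the route list are hypothetical. It also checks per-span duration, whereas a real tail sampler looks at the duration of the whole trace.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Span:
    status: str = "OK"              # "OK" or "ERROR"
    duration_ms: float = 0.0
    attributes: Dict[str, str] = field(default_factory=dict)


def head_sample(trace_id_hex: str, ratio: float) -> bool:
    """Head decision: a pure function of the trace id, so every service agrees."""
    low_bits = int(trace_id_hex[-16:], 16)       # low 64 bits of the trace id
    return low_bits < int(ratio * (1 << 64))


def tail_keep(spans: List[Span], trace_id_hex: str, *,
              slow_ms: float = 800.0,
              important_routes: Sequence[str] = ("/withdraw", "/deposit"),
              baseline: float = 0.05) -> bool:
    """Tail decision: made once all spans of the trace are known."""
    if any(s.status == "ERROR" for s in spans):
        return True                               # always keep errors
    if any(s.duration_ms >= slow_ms for s in spans):
        return True                               # always keep slow traces
    if any(s.attributes.get("http.route") in important_routes for s in spans):
        return True                               # always keep "money paths"
    return head_sample(trace_id_hex, baseline)    # otherwise a small baseline share
```

The head decision never sees the outcome of the request, which is why the recipe pairs it with tail rules for errors and slow traces.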

5) Correlation: metrics, logs, traces

Exemplars: `trace_id` annotations on metric histograms (a quick jump to the trace).
Logs: add `trace_id`/`span_id` and jump from logs to the trace.
SpanMetrics (processor): aggregates RED metrics (`requests, errors, duration`) from traces for SLOs/alerts.
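
For log correlation, a minimal standard-library sketch: a `logging.Filter` stamps every record with the active `trace_id`/`span_id`. The `current_trace` context variable is a hypothetical stand-in for the real trace context; with an OTel SDK the ids would be read from the current span instead.

```python
import contextvars
import logging

# Hypothetical holder of the active (trace_id, span_id) pair.
current_trace = contextvars.ContextVar("current_trace", default=("", ""))


class TraceContextFilter(logging.Filter):
    """Attach the active trace and span ids to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id, record.span_id = current_trace.get()
        return True


def build_logger(stream) -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"))
    handler.addFilter(TraceContextFilter())
    logger = logging.getLogger("payments-api")
    logger.handlers = [handler]      # replace handlers so repeated calls do not duplicate output
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

With the ids in every line, a log search by `trace_id` returns all records of one request, and a log line links straight to its trace.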

6) Deployment architecture

Agent (DaemonSet) on every node collects telemetry from applications (OTLP) and forwards it.
Gateway (cluster/region) - a central Collector with routing/sampling/enrichment pipelines.
OTLP: gRPC `4317`, HTTP `4318`; enable TLS/mTLS.

Advantages of "agent + gateway": isolation, buffering, local backpressure, simpler networking.

7) OpenTelemetry Collector - base template (gateway)

```yaml
receivers:
  otlp:
    protocols:
      grpc: { endpoint: 0.0.0.0:4317 }
      http: { endpoint: 0.0.0.0:4318 }

processors:
  memory_limiter: { check_interval: 5s, limit_percentage: 75 }
  batch: { timeout: 2s, send_batch_size: 8192 }
  attributes:
    actions:
      - { key: deployment.environment, action: upsert, value: prod }
  resource:
    attributes:
      - { key: service.namespace, action: upsert, value: core }
  tail_sampling:
    decision_wait: 5s
    policies:
      - { name: errors, type: status_code, status_code: { status_codes: [ERROR] } }
      - { name: slow_traces, type: latency, latency: { threshold_ms: 800 } }
      - name: important_routes
        type: string_attribute
        string_attribute: { key: http.route, values: ["/withdraw", "/deposit"] }
      - { name: baseline_prob, type: probabilistic, probabilistic: { sampling_percentage: 5 } }

exporters:
  otlp/apm:
    endpoint: apm-backend:4317
    tls: { insecure: true }   # plaintext, for a lab setup only; configure TLS/mTLS in production
  prometheus:
    endpoint: 0.0.0.0:9464

extensions:
  health_check: {}
  pprof: { endpoint: 0.0.0.0:1777 }
  zpages: { endpoint: 0.0.0.0:55679 }

service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    # tail_sampling runs before batch so the sampler sees spans before they are regrouped
    traces:  { receivers: [otlp], processors: [memory_limiter, attributes, resource, tail_sampling, batch], exporters: [otlp/apm] }
    metrics: { receivers: [otlp], processors: [batch], exporters: [prometheus] }
    # a pipeline needs at least one exporter; otlp/apm here assumes the backend accepts logs
    logs:    { receivers: [otlp], processors: [batch], exporters: [otlp/apm] }
```

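8) SpanMetrics and RED for SLOs

Add the processor:
```yaml
processors:
  spanmetrics:
    metrics_exporter: prometheus
    # Key names differ between the spanmetrics processor and the newer spanmetrics
    # connector; check them against the Collector version you run.
    histogram:
      explicit:
        buckets: [50ms, 100ms, 200ms, 400ms, 800ms, 1600ms, 3200ms]
service:
  pipelines:
    # spanmetrics comes first so metrics are computed from all spans, before sampling
    traces: { receivers: [otlp], processors: [spanmetrics, tail_sampling, batch], exporters: [otlp/apm] }
    metrics: { receivers: [otlp], processors: [batch], exporters: [prometheus] }
```

This gives you `traces_spanmetrics_calls{service, route, code}` and `duration_bucket` for SLOs/alerts.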
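
Conceptually, deriving RED metrics from spans is a group-and-count. The sketch below shows the idea only and is not the Collector's code: the span records are plain dictionaries with hypothetical keys, and the bucket counts are per-bucket rather than cumulative as in Prometheus.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Upper bounds in milliseconds, matching the bucket list used in this section.
BUCKETS_MS = [50, 100, 200, 400, 800, 1600, 3200]


def red_metrics(spans: Iterable[dict]) -> Dict[Tuple[str, str], dict]:
    """Aggregate Rate/Errors/Duration per (service, route).

    Each span is a dict with the keys service, route, status, duration_ms.
    """
    out = defaultdict(lambda: {"calls": 0, "errors": 0,
                               "buckets": [0] * (len(BUCKETS_MS) + 1)})
    for span in spans:
        metric = out[(span["service"], span["route"])]
        metric["calls"] += 1
        if span["status"] == "ERROR":
            metric["errors"] += 1
        # Index of the first bound >= duration; the extra last slot is +Inf.
        metric["buckets"][bisect_left(BUCKETS_MS, span["duration_ms"])] += 1
    return dict(out)
```

Grouping by route rather than by raw URL is what keeps label cardinality low.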

9) K8s: Collector (DaemonSet + Deployment)

Agent (DaemonSet) fragment:
```yaml
# Fragment: the selector, pod labels, and config volume are omitted.
apiVersion: apps/v1
kind: DaemonSet
metadata: { name: otel-agent, namespace: observability }
spec:
  template:
    spec:
      containers:
        - name: otelcol
          image: otel/opentelemetry-collector:latest   # pin a specific version in production
          args: ["--config=/conf/agent.yaml"]
          ports:
            - { containerPort: 4317, name: otlp-grpc }
            - { containerPort: 4318, name: otlp-http }
```

Gateway (Deployment) - several replicas, a ClusterIP Service/Ingress, HPA on CPU/QPS.

10) Security and privacy

TLS/mTLS between SDK → Agent → Gateway → Backend.
Authentication at the Gateway ingress (Basic/OAuth/headers); restrict allowed origins.
PII redaction: filter/mask attributes (`user.email`, keys under `card.`) in a Collector processor.
Limits: cap event size and attribute count in the SDK (protection against cardinality blow-up).
RBAC in the backend + separate tokens per project/tenant.

Example filter in the Collector:
```yaml
processors:
  attributes/redact:
    actions:
      - { key: user.email, action: delete }
      - { key: payment.card, action: delete }
```

11) Instrumentation: quick start

Node.js

```js
import { NodeSDK } from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-grpc";
import { Resource } from "@opentelemetry/resources";
import { SemanticResourceAttributes as R } from "@opentelemetry/semantic-conventions";

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({ url: "http://otel-agent.observability:4317" }),
  resource: new Resource({
    [R.SERVICE_NAME]: "payments-api",
    [R.SERVICE_VERSION]: "1.14.2",
    [R.DEPLOYMENT_ENVIRONMENT]: "prod"
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
```

Java (Spring)

```yaml
# Gradle: io.opentelemetry.instrumentation:opentelemetry-spring-boot-starter
# application.yml
otel:
  service:
    name: orders-api
  exporter:
    otlp:
      endpoint: http://otel-agent.observability:4317
  traces:
    sampler: parentbased_traceidratio
    sampler-arg: 0.05
```

Python (FastAPI)

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider(resource=Resource.create({"service.name": "fraud-scoring", "deployment.environment": "prod"}))
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-agent.observability:4317", insecure=True)))
trace.set_tracer_provider(provider)
```

Go

```go
// Assumes ctx (context.Context) and the otel, otlptracegrpc, resource, sdktrace, semconv, and log imports.
exp, err := otlptracegrpc.New(ctx, otlptracegrpc.WithEndpoint("otel-agent.observability:4317"), otlptracegrpc.WithInsecure())
if err != nil {
	log.Fatalf("create OTLP exporter: %v", err)
}
res := resource.NewWithAttributes(semconv.SchemaURL, semconv.ServiceNameKey.String("gateway"), semconv.DeploymentEnvironmentKey.String("prod"))
tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp), sdktrace.WithResource(res), sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.05))))
otel.SetTracerProvider(tp)
```

12) Asynchrony: queues, jobs, cron

Use PRODUCER/CONSUMER spans connected through `links` (messages have a lifetime of their own).
Propagate the context in message headers (`traceparent`/`baggage`).
For batch consumption, create a span per message or aggregate with the `messaging.batch.size` attribute.
For cron/jobs: start a new trace per run, plus links to the triggering events (if any).
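
The producer/consumer pattern with links can be sketched without any SDK. Everything here is illustrative: the `Span` record, `publish`, and `consume_batch` are hypothetical helpers, and a message is a plain dictionary with a `headers` map.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    name: str
    kind: str                                    # "PRODUCER" or "CONSUMER"
    trace_id: str
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))
    links: List[Tuple[str, str]] = field(default_factory=list)   # (trace_id, span_id)
    attributes: Dict[str, int] = field(default_factory=dict)


def publish(topic: str, payload: dict, parent: Optional[Span] = None):
    """Create a PRODUCER span and inject its context into the message headers."""
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    span = Span(f"QUEUE publish topic={topic}", "PRODUCER", trace_id)
    headers = {"traceparent": f"00-{span.trace_id}-{span.span_id}-01"}
    return span, {"topic": topic, "payload": payload, "headers": headers}


def consume_batch(topic: str, messages: List[dict]) -> Span:
    """One CONSUMER span per batch, in its own trace, linked to every producer."""
    span = Span(f"QUEUE consume topic={topic}", "CONSUMER", secrets.token_hex(16))
    span.attributes["messaging.batch.size"] = len(messages)
    for message in messages:
        parts = message["headers"].get("traceparent", "").split("-")
        if len(parts) == 4:                      # skip messages without a usable context
            span.links.append((parts[1], parts[2]))
    return span
```

A batch can contain messages from many traces, so a single parent would be wrong for most of them; links keep every producer reachable from the consumer span.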

13) Baggage and targeting

Keep a minimal set of stable keys (`tenant_id`, `region`, `vip_tier`) in baggage; PII is forbidden.
Pass them through the gateway/gateway logger to segment metrics.
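
An allowlist is the simplest way to keep baggage small and free of PII. The sketch below encodes and parses the W3C `baggage` header with the standard library; `ALLOWED_KEYS` mirrors the keys named above, and the function names are hypothetical.

```python
from typing import Dict
from urllib.parse import quote, unquote

# Only these keys may travel in baggage; everything else is dropped.
ALLOWED_KEYS = ("tenant_id", "region", "vip_tier")


def encode_baggage(entries: Dict[str, str]) -> str:
    """Build a baggage header value from the allowed entries only."""
    kept = {k: v for k, v in entries.items() if k in ALLOWED_KEYS}
    return ",".join(f"{k}={quote(str(v), safe='')}" for k, v in kept.items())


def parse_baggage(header: str) -> Dict[str, str]:
    """Parse a baggage header, ignoring properties and non-allowed keys."""
    result = {}
    for member in header.split(","):
        pair = member.split(";", 1)[0]           # drop optional ;properties
        if "=" not in pair:
            continue
        key, value = pair.split("=", 1)
        key = key.strip()
        if key in ALLOWED_KEYS:
            result[key] = unquote(value.strip())
    return result
```

Filtering on both the write and the read side means that a key injected by an upstream caller never reaches metric labels.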

14) Integration with releases and SLO gating

Canary steps → check the `traces_spanmetrics_` metrics per route/traffic segment.
On degradation (5xx/p95) - auto-stop and rollback (Argo Rollouts AnalysisTemplate + PromQL).
Exemplars on the metrics lead straight to the "bad" traces of the release window.
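
The gating decision itself is small once the RED numbers for the canary and the baseline are available. The sketch below is a hypothetical stand-in for what an analysis step evaluates; the thresholds (1% errors, p95 at most 1.2x baseline, at least 100 calls) are assumptions, not recommendations from this article.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    calls: int        # requests observed in the analysis window
    errors: int       # requests that ended in an error (e.g. 5xx)
    p95_ms: float     # p95 latency in milliseconds


def gate(canary: WindowStats, baseline: WindowStats, *,
         max_error_rate: float = 0.01,
         max_p95_ratio: float = 1.2,
         min_calls: int = 100) -> str:
    """Return "wait", "rollback", or "promote" for one canary step."""
    if canary.calls < min_calls:
        return "wait"                             # not enough traffic to judge
    if canary.errors / canary.calls > max_error_rate:
        return "rollback"                         # error budget burned
    if baseline.p95_ms > 0 and canary.p95_ms > baseline.p95_ms * max_p95_ratio:
        return "rollback"                         # latency regression against baseline
    return "promote"
```

Comparing the canary against the live baseline rather than a fixed number keeps the gate meaningful when overall traffic is slow for unrelated reasons.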

15) Limits and performance

Set limits: `OTEL_SPAN_ATTRIBUTE_COUNT_LIMIT`, `OTEL_SPAN_EVENT_COUNT_LIMIT`, `OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT`.
Sample exceptions/stack traces by probability/frequency.
Use the Batch processor in both the SDK and the Collector; keep queues so that traces are not lost during bursts.
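
What these limits do to a span can be shown with a short sketch. It reads two of the variables named above and applies them to an attribute dictionary; `load_limits` and `apply_limits` are illustrative helpers, the SDKs enforce the limits internally, and event limits are not covered here.

```python
import os
from typing import Any, Dict, Mapping, Optional


def load_limits(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[int]]:
    """Read span limits; 128 attributes and unlimited value length are the defaults assumed here."""
    raw_length = env.get("OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT")
    return {
        "count": int(env.get("OTEL_SPAN_ATTRIBUTE_COUNT_LIMIT", "128")),
        "value_length": int(raw_length) if raw_length else None,
    }


def apply_limits(attributes: Dict[str, Any], limits: Dict[str, Optional[int]]) -> Dict[str, Any]:
    """Drop attributes beyond the count limit and truncate long string values."""
    out: Dict[str, Any] = {}
    for key, value in attributes.items():
        if len(out) >= limits["count"]:
            break                                 # extra attributes are dropped
        if isinstance(value, str) and limits["value_length"] is not None:
            value = value[: limits["value_length"]]
        out[key] = value
    return out
```

A value length limit is also a cheap safety net for oversized SQL statements or payload fragments that end up in attributes.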

16) Compatibility and migration

Propagators: use W3C; during migration, keep support for reading B3/X-Ray (dual propagation).
Export: OTLP → APM (Jaeger/Tempo/Elastic/X-Ray, etc.).
Stable semconv versions: set `schemaUrl` and plan upgrades.
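
Dual propagation on the read side can be sketched as "prefer W3C, fall back to B3". The function below is a simplified illustration with a hypothetical name: it handles the B3 multi-header and single-header forms, pads 64-bit B3 trace ids to 128 bits, skips hex validation, and does not cover X-Ray.

```python
from typing import Mapping, Optional


def extract_traceparent(headers: Mapping[str, str]) -> Optional[str]:
    """Return a W3C traceparent value from W3C or B3 request headers, or None."""
    h = {k.lower(): v for k, v in headers.items()}
    if "traceparent" in h:
        return h["traceparent"]                   # W3C wins when both are present

    trace_id = h.get("x-b3-traceid")
    span_id = h.get("x-b3-spanid")
    sampled = h.get("x-b3-sampled", "")
    if "b3" in h:                                 # single header: traceid-spanid[-sampled[-parent]]
        parts = h["b3"].split("-")
        if len(parts) >= 2:
            trace_id, span_id = parts[0], parts[1]
            sampled = parts[2] if len(parts) > 2 else ""
    if not trace_id or not span_id:
        return None

    trace_id = trace_id.lower().rjust(32, "0")    # pad 64-bit ids to 128 bits
    flags = "01" if sampled in ("1", "d", "true") else "00"
    return f"00-{trace_id}-{span_id.lower()}-{flags}"
```

During a migration, services would typically read both formats and write W3C, so that traces do not break at the boundary between old and new services.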

17) Anti-patterns

High attribute cardinality (`user_id` in a label, dynamic keys).
Logs without `trace_id` → no correlation.
Exporting straight from applications to an internet-facing APM (no gateway, no TLS/mTLS).
Collecting 100% "just in case" in production is expensive and pointless.
Raw SQL query dumps in `db.statement`.
Inconsistent service name/version - metrics "fall apart".

18) Rollout checklist (0-45 days)

Days 0-10

Enable the SDK/auto-instrumentation on 2-3 critical services.
Deploy Agent (DaemonSet) + Gateway (Deployment) with TLS, OTLP 4317/4318.
Add `service.name`, `service.version`, `deployment.environment` everywhere.

Days 11-25

Tail sampling by errors/latency/"money" routes.
SpanMetrics → Prometheus; enable Exemplars and RED/SLO dashboards.
Propagate W3C context through the API gateway/NGINX/mesh; correlate logs.

Days 26-45

Cover queues/DB/cache; links for async.
PII redaction policy in the Collector; attribute limits in the SDK.
Wire in SLO gating of releases and auto-rollback.

19) Maturity metrics

Trace coverage of incoming requests ≥ 95% (taking head/tail sampling into account).
Share of metrics with Exemplars ≥ 80%.
RCA time "from metric to trace" ≤ 2 min (p50).
PII leaks in attributes/events: 0 (verified by a scanner).
All services have `service.name/version/environment` and consistent semantics.

20) Appendices: useful snippets

NGINX propagation:
```nginx
proxy_set_header traceparent $http_traceparent;
proxy_set_header tracestate $http_tracestate;
proxy_set_header baggage    $http_baggage;
```
Prometheus with Exemplars (Grafana):

```
histogram_quantile(0.95, sum(rate(traces_spanmetrics_duration_bucket{route="/withdraw"}[5m])) by (le))
```

Policy: forbid PII attributes (pseudo-linter)

```yaml
forbid_attributes:
  - user.email
  - payment.card
  - personal.
```
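
One way such a policy could be enforced in CI is a small check over the attribute keys a service emits. This is a sketch with a hypothetical function name; it assumes that a rule ending in a dot (such as `personal.`) is meant as a prefix, which the policy above does not state explicitly.

```python
from typing import Iterable, List, Sequence

FORBIDDEN = ("user.email", "payment.card", "personal.")


def find_forbidden(attribute_keys: Iterable[str],
                   rules: Sequence[str] = FORBIDDEN) -> List[str]:
    """Return the attribute keys that violate the policy, in input order."""
    violations = []
    for key in attribute_keys:
        for rule in rules:
            # Assumption: a rule ending with "." matches every key under that prefix.
            if key == rule or (rule.endswith(".") and key.startswith(rule)):
                violations.append(key)
                break
    return violations
```

A non-empty result would fail the build, for example when run against the attribute keys collected from integration-test spans.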

21) Summary

OpenTelemetry turns observability into a standardized, manageable loop: unified semantics, secure propagation, smart sampling, and strong correlation with metrics and logs. Set up agent + gateway, add tail sampling, spanmetrics, and Exemplars, and keep PII and cardinality under control. Tracing then becomes not only a debugging tool but also an input for automated SRE/release decisions, lowering MTTR and the risk of every release.
