Privacy by Design (GDPR)
1) What is it about and why
Privacy by Design (PbD) is the principle that privacy is embedded in a product from the very beginning: in business requirements, architecture, code, processes and operations. In GDPR terms this appears as "data protection by design and by default" (data minimization, default settings that are as private as possible, transparency and accountability).
PbD objectives:
- Minimize the collection and processing of personal data (PD).
- Ensure lawfulness, fairness, transparency, purpose limitation and storage limitation.
- Reduce risks (technical and legal), simplify audits, and make compliance demonstrable.
2) GDPR roles, legal frameworks and principles
2.1 Roles
Controller: defines the purposes and means of processing.
Processor: processes personal data on behalf of the controller under a DPA contract.
Data Subject: the individual to whom the personal data relates.
DPO (Data Protection Officer): where required, provides independent monitoring and advice.
2.2 Legal grounds (select and document)
Consent, contract, legitimate interest, legal obligation, vital interests, public task. For each: the purpose, the data, the retention period, and (for consent) how it can be withdrawn.
2.3 Processing principles (Art. 5)
Lawfulness, fairness, transparency
Purpose limitation
Data minimization
Accuracy
Storage limitation
Integrity and confidentiality
Accountability: the ability to demonstrate compliance.
3) PbD process in SDLC (reference framework)
1. Initiation: define processing purposes and legal grounds; assign data owners and a DPO contact point.
2. Data Mapping: sources → fields → sensitivity model → where data flows → who reads it → where it is stored → retention period.
3. Risk Assessment/DPIA: LINDDUN privacy threat modeling, impact assessment, mitigation measures.
4. Architectural decisions: choice of minimization, pseudonymization, encryption and access-separation schemes.
5. Requirements for UX/consents/notifications: clear texts, granular consent, private default settings.
6. Implementation: private defaults, secure telemetry, logging free of secrets and PII.
7. Verification: privacy tests, static analysis, privacy unit tests, DPIA records.
8. Operation: DSAR processes, retention and disposal, incident monitoring, vendor reviews.
9. Regular review: re-run the DPIA when purposes or technologies change.
4) Engineering PbD patterns
4.1 Minimization and decomposition
Collect only the required fields; apply progressive profiling.
Separate ID and content: store the link key separately (token/reference).
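The ID/content separation above can be sketched as follows; `TokenVault`, its method names, and the in-memory dicts are illustrative stand-ins (in practice the vault would be a separate, access-controlled service):

```python
import secrets

class TokenVault:
    """Maps opaque tokens to real identifiers; kept in a restricted store."""
    def __init__(self):
        self._token_to_id = {}

    def tokenize(self, real_id: str) -> str:
        token = secrets.token_urlsafe(16)
        self._token_to_id[token] = real_id
        return token

    def resolve(self, token: str) -> str:
        # Access here should be audited and tightly permissioned.
        return self._token_to_id[token]

# The operational layer stores business data against the token only.
vault = TokenVault()
token = vault.tokenize("user@example.com")
orders = {token: [{"sku": "A-100", "qty": 2}]}  # no direct identifier present
```

A leak of the operational store then exposes no direct identifiers; only the vault can link tokens back to people.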
4. 2 Aliasing and anonymization
Aliasing - Store the real ID separately; the working layer sees the token.
Anonymization: use k-anonymity, l-diversity, t-closure; for analytics - differential privacy (ε -budget).
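A minimal k-anonymity check can illustrate the first of these techniques; the `is_k_anonymous` helper and the sample records are hypothetical:

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs >= k times."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

# Sample records with already-generalized quasi-identifiers
records = [
    {"zip": "101*", "age_band": "30-39", "diagnosis": "flu"},
    {"zip": "101*", "age_band": "30-39", "diagnosis": "cold"},
    {"zip": "102*", "age_band": "40-49", "diagnosis": "flu"},
]
# The third record is a group of size 1, so the dataset is not 2-anonymous.
```

Real anonymization pipelines generalize or suppress values until such checks pass, and also evaluate l-diversity/t-closeness for sensitive attributes.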
4. 3 Access control and role separation
PoLP, ABAC/RBAC, segregation of duties, separate contours for admins and analysts.
Those. measures: mTLS, SSO/OIDC, scoped tokens, temporary accounts for access to personal data.
4. 4 Encryption and isolation
In Transit: TLS 1. 3/mTLS; At Rest: AEAD/Envelope + KMS/HSM.
Separate keys for the tenant/dataset; crypto deletion for "right to be forgotten."
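Crypto-deletion can be sketched as follows. This is a toy illustration only: the keystream cipher here is NOT real AEAD (use AES-GCM or similar via a KMS in production), and all names are hypothetical; the point is just that dropping a per-subject key renders its ciphertext, including copies in backups, unreadable:

```python
import hashlib
import secrets

class CryptoShredStore:
    """Illustrative crypto-deletion: one key per subject; deleting the key
    makes that subject's ciphertext permanently unrecoverable."""
    def __init__(self):
        self._keys = {}   # subject_id -> key (in practice: a KMS/HSM)
        self._data = {}   # subject_id -> ciphertext

    def _keystream(self, key: bytes, length: int) -> bytes:
        # Toy SHA-256 counter keystream, for demonstration only.
        out = b""
        counter = 0
        while len(out) < length:
            out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
            counter += 1
        return out[:length]

    def put(self, subject_id: str, plaintext: bytes) -> None:
        key = self._keys.setdefault(subject_id, secrets.token_bytes(32))
        ks = self._keystream(key, len(plaintext))
        self._data[subject_id] = bytes(a ^ b for a, b in zip(plaintext, ks))

    def get(self, subject_id: str) -> bytes:
        key = self._keys[subject_id]  # raises KeyError once shredded
        ct = self._data[subject_id]
        ks = self._keystream(key, len(ct))
        return bytes(a ^ b for a, b in zip(ct, ks))

    def forget(self, subject_id: str) -> None:
        del self._keys[subject_id]  # ciphertext and its backups become useless
```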
4.5 Retention and deletion
Explicit TTL policies per field/purpose; auto-purge in pipelines; "two-phase" deletion (logical → physical).
For backups: separate keys and short retention windows for snapshots containing personal data.
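The two-phase (logical → physical) deletion pattern might look like this sketch, with explicit timestamps passed in for testability; `TwoPhaseDeleter` and its fields are illustrative:

```python
import time

class TwoPhaseDeleter:
    """Sketch of 'logical then physical' deletion: records are first flagged
    (hidden from the application) and physically purged after a grace window."""
    def __init__(self, grace_seconds: float):
        self.grace = grace_seconds
        self.rows = {}        # id -> record
        self.tombstones = {}  # id -> deletion timestamp

    def delete(self, row_id, now=None):
        now = time.time() if now is None else now
        if row_id in self.rows:
            self.tombstones[row_id] = now  # phase 1: logical delete

    def visible(self, row_id):
        return row_id in self.rows and row_id not in self.tombstones

    def purge(self, now=None):
        now = time.time() if now is None else now
        for row_id, ts in list(self.tombstones.items()):
            if now - ts >= self.grace:  # phase 2: physical delete
                self.rows.pop(row_id, None)
                del self.tombstones[row_id]
```

The grace window lets mistaken deletions be reverted and gives downstream consumers time to react before data is gone for good.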
4.6 Private telemetry and logging
No PII by default; use tokens or salted hashes.
Mask/tokenize sensitive fields at the producer.
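Producer-side masking with a salted hash could look like this sketch; `LOG_SALT`, the field list, and `mask_event` are assumptions. The truncated HMAC keeps events joinable for debugging without exposing raw values:

```python
import hashlib
import hmac

LOG_SALT = b"rotate-me-per-environment"  # hypothetical; keep out of source control
SENSITIVE = {"email", "phone", "ip"}

def mask_event(event: dict) -> dict:
    """Replace sensitive fields with a salted, truncated HMAC token."""
    masked = {}
    for key, value in event.items():
        if key in SENSITIVE:
            digest = hmac.new(LOG_SALT, str(value).encode(), hashlib.sha256)
            masked[key] = "tok_" + digest.hexdigest()[:12]
        else:
            masked[key] = value
    return masked
```

Because the same input maps to the same token within one salt period, events can still be correlated; rotating the salt severs old correlations.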
4.7 UX Privacy & Consent
Granular consent by category (analytics, marketing, personalization).
"Private defaults": everything non-essential is off until consent is given.
A clear "Withdraw consent" option and just-in-time notification for new uses.
5) DPIA and LINDDUN (short)
DPIA (Data Protection Impact Assessment): required at high risk (large-scale monitoring, scoring/evaluation, new technologies). It consists of a description of the processing, necessity/proportionality, risk assessment, and mitigation measures.
LINDDUN threats: Linkability, Identifiability, Non-repudiation, Detectability, Disclosure of information, Unawareness, Non-compliance. For each threat there are countermeasures (minimization, pseudonymization, DP, transparency, consent management, audit).
6) Cross-border transfers
Identify vendor storage/access locations.
Use SCCs (Standard Contractual Clauses) and carry out a Transfer Impact Assessment.
Technical measures: end-to-end encryption, client cryptography for especially sensitive data, restriction of remote access.
7) Vendors and processors (Vendor Management)
A DPA with each processor; technical and organizational measures; sub-processors kept under control.
Regular reviews and audits; right to inspection; data export plan.
8) Data Subject Rights (DSAR)
Access, rectification, erasure, restriction, portability, objection, and the right not to be subject to automated decision-making (profiling).
SLA and automation: request tracking, identification proof, response log.
Technical hooks in the product: quick search and export by ID; cascade removal by retention.
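The "quick search and export by ID" hook might be sketched like this, with each data store registering a `find(subject_id)` method; all names here are illustrative:

```python
import json

def export_subject(subject_id, stores):
    """Sketch of a DSAR export hook: each registered store exposes
    find(subject_id) returning its records for that person."""
    bundle = {name: store.find(subject_id) for name, store in stores.items()}
    return json.dumps(bundle, indent=2, default=str)

class DictStore:
    """Toy store backed by a list of dicts."""
    def __init__(self, rows):
        self.rows = rows

    def find(self, sid):
        return [r for r in self.rows if r.get("subject_id") == sid]

stores = {
    "profile": DictStore([{"subject_id": "u1", "email": "u1@example.com"}]),
    "consents": DictStore([{"subject_id": "u1", "category": "analytics", "granted": False}]),
}
```

The same registry of stores can drive cascade deletion: iterate the stores and call a `delete(subject_id)` hook instead of `find`.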
9) Automated solutions and profiling (Art. 22)
If decisions with a "significant effect" on individuals are made automatically, ensure the right to human intervention, explainability, and transparency of the features used.
Log path and model versioning; appeal mechanism.
10) Processing security (Art. 32) and incidents (Art. 33/34)
Risk-oriented measures: encryption, integrity, resilience, recovery plans (RTO/RPO).
PD incidents: detection → triage → risk assessment → notification of the regulator within 72 hours (where required) and of the data subjects (if the risk is high).
Separate playbook, DPO/lawyers contact list, notification templates.
11) Privacy and ML/Analytics
Data Governance: data lineage, licenses/legal grounds, consents.
Techniques: differential privacy, federated learning, secure aggregation, minimizing features.
Protection against attacks (membership inference, model inversion): regular leakage assessments, ε tuning, noise/clipping.
Synthetic data - only with verification of the absence of restoration of persons.
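As one concrete technique from the list above, the Laplace mechanism for a counting query (sensitivity 1) can be sketched as follows; `dp_count` is a hypothetical helper, and a Laplace sample is drawn as the difference of two exponential samples:

```python
import random

def dp_count(true_count, epsilon, rng=random):
    """Laplace mechanism for a counting query (sensitivity 1):
    adding Laplace(0, 1/epsilon) noise yields epsilon-differential privacy."""
    b = 1.0 / epsilon
    # Difference of two Exp(1) samples, scaled by b, is Laplace(0, b).
    noise = b * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return true_count + noise
```

Smaller ε means more noise and stronger privacy; in practice ε is tracked as a budget across all queries over the same data.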
12) Architectural diagrams (patterns)
12.1 "Dual-Zone" ID Architecture
Zone A (PDS, Personal Data Store): real identifying data (RID); access strictly limited; keys, encryption, audit.
Zone B (Operational): business data with tokens; communication through a token broker with limits and auditing.
12.2 Consent Service
A centralized service that stores consent versions and history.
SDK: can_use(category, purpose) decides on the fly; every check is logged.
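A minimal in-memory version of such a consent service might look like this; the class, its defaults format, and the audit log shape are assumptions:

```python
class ConsentService:
    """Minimal consent-check sketch: stores the latest decision per
    (subject, category) plus an append-only history for audit."""
    def __init__(self, defaults=None):
        self.defaults = defaults or {}  # category -> "allow"/"deny"
        self.decisions = {}             # (subject, category) -> bool
        self.history = []               # append-only audit log

    def record(self, subject, category, granted, version="v1"):
        self.decisions[(subject, category)] = granted
        self.history.append((subject, category, granted, version))

    def can_use(self, subject, category):
        key = (subject, category)
        if key in self.decisions:
            return self.decisions[key]
        # No explicit decision: fall back to the (private) default.
        return self.defaults.get(category, "deny") == "allow"
```

Note the private-by-default fallback: an unknown category denies use, matching the "private defaults" principle above.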
12.3 Retention policies as code
YAML configuration: entity → field → TTL → expiration action (anonymize/delete/coarsen).
A scheduler runs the jobs; reporting is available to the DPO.
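The evaluation side of retention-as-code might be sketched like this, assuming an already-parsed policy dict with the ttl/on_expire shape used in this document, and approximating ISO-8601 months/years as 30/365 days; all names are hypothetical:

```python
import re
from datetime import datetime, timedelta

# Hypothetical parsed policy (field -> rule), mirroring the ttl/on_expire shape
POLICY = {
    "email":        {"ttl": "P18M", "on_expire": "pseudonymize"},
    "session_logs": {"ttl": "P3M",  "on_expire": "aggregate"},
}

def parse_ttl(ttl):
    """Parse the subset of ISO-8601 durations used here (PnY / PnM / PnD),
    approximating months as 30 days and years as 365."""
    m = re.fullmatch(r"P(\d+)([YMD])", ttl)
    n, unit = int(m.group(1)), m.group(2)
    return timedelta(days={"Y": 365, "M": 30, "D": 1}[unit] * n)

def expired_fields(created_at, now, policy=POLICY):
    """Return (field, action) pairs whose TTL has elapsed."""
    age = now - created_at
    return [(field, rule["on_expire"]) for field, rule in policy.items()
            if age >= parse_ttl(rule["ttl"])]
```

A scheduler would run `expired_fields` per record and dispatch each action (pseudonymize, delete, aggregate) to the owning store, logging results for the DPO report.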
13) Mini recipes
Default minimization pseudocode:
def collect(field, purpose):
    if not is_required(field, purpose):
        return None  # do not collect
    v = read_input(field)
    return truncate(v, policy.max_length(field))
Retention policy (YAML example):
dataset: users
fields:
  email: { ttl: P18M, on_expire: pseudonymize }
  phone: { ttl: P12M, on_expire: delete }
  session_logs: { ttl: P3M, on_expire: aggregate }
  consent: { ttl: P7Y, on_expire: archive }
Granular consents (semantics):
analytics:
  default: deny
  legal_basis: consent
  scope: anonymous_metrics
marketing:
  default: deny
  legal_basis: consent
  scope: email, push
DSAR export (skeleton):
GET /privacy/export?subject_id=... -> zip:
- profile.json (metadata, legal basis)
- activity.ndjson (events, aggregates)
- consents.json (consent history)
- processors.json (list of processors and transfers)
14) Documentation and Accountability
ROPA (Records of Processing Activities) - register of operations: purpose, legal basis, categories of data/subjects, transfers, retention periods, measures.
Policies: privacy, cookies, in-product information notices (in plain language).
Staff training and annual reviews.
15) Frequent errors
Collection "just in case" and storage "forever."
Consent as the sole ground, where contract/legitimate interest would be appropriate.
"Empty" cookie banners without real toggles.
No data mapping; not ready for DSARs.
Logs with PII, unprotected backups, mixing RID and operational data.
No control over suppliers and cross-border transfers.
16) Checklists
Before launching a feature/product:
- The purpose of processing and the legal basis are defined; ROPA is updated.
- Data mapping and DPIA performed (if required).
- Minimization, pseudonymization and encryption (in transit/at rest) implemented.
- Consents are granular, with clear UX; defaults are private.
- Retention policies as code configured; deletion/anonymization verified.
- Logs/Telemetry - no PII; masking is enabled.
- DSAR hooks and exports prepared.
- Team training and DPO approval completed.
In operation:
- Quarterly review of retention periods and legal grounds.
- Periodic processor/sub-processor audits.
- Monitoring incidents and readiness for notification ≤ 72 hours.
- Revision of DPIA with process/technology changes.
- Storage of compliance artifacts (DPIA, ROPA, test reports).
17) FAQ
Q: Can you avoid relying on consent entirely?
A: Sometimes yes (contract/legal obligation/legitimate interest), but only when strictly necessary and with an assessment of the balance of interests. Marketing and non-essential analytics most often require consent.
Q: Is pseudonymization enough?
A: No, pseudonymized data is still personal data. To fall outside the scope of the GDPR you need robust anonymization (verified for the impossibility of re-identification).
Q: What about ML and personalisation?
A: Minimize features, use DP/federated approaches, log decisions, and ensure the right to human intervention and to object to profiling.
Q: What to do when business and privacy conflict?
A: Redesign the collection (progressive profiling), switch to aggregates/synthetics, reassess the legal basis, offer an option without tracking.
Related topics:
- "Secret Management"
- "At Rest Encryption"
- "In Transit Encryption"
- "Audit and immutable logs"
- "Sign and Verify Requests"
- "Key Management and Rotation"