MEASURE
Jak analyzovat, testovat a validovat AI systémy
NIST AI RMF funkce: MEASURE Subkategorie: MS-1 až MS-4
1. Přehled funkce MEASURE
MEASURE funkce zajišťuje, že AI systémy jsou řádně testovány, validovány a monitorovány.
Cíle MEASURE
| Cíl | Popis |
|---|---|
| TEVV | Test, Evaluation, Validation, Verification |
| Metriky | Definovat a sledovat KPIs |
| Bias | Testovat fairness a bias |
| Security | Hodnotit bezpečnost |
| Continuous | Průběžný monitoring |
2. MS-1: TEVV Framework
2.1 Co je TEVV
2.2 TEVV Lifecycle
| Fáze | TEVV Aktivity | Kdy |
|---|---|---|
| Design | Requirements verification | Před vývojem |
| Development | Unit testing, code review | Během vývoje |
| Pre-deployment | Integration, validation, red-teaming | Před nasazením |
| Deployment | A/B testing, staged rollout | Při nasazení |
| Production | Monitoring, drift detection | Průběžně |
| Retirement | Final assessment, lessons learned | Při ukončení |
2.3 Testovací strategie
Pro každý AI systém definujte:
## TESTOVACÍ PLÁN: [AI System Name]
### 1. Test Objectives- Co testujeme?- Jaká jsou akceptační kritéria?
### 2. Test Types| Type | Scope | Tools ||------|-------|-------|| Unit | Model components | pytest, unittest || Integration | End-to-end | Custom scripts || Performance | Speed, throughput | Load testing || Fairness | Bias detection | Aequitas, Fairlearn || Security | Adversarial | Custom red-team || User acceptance | Usability | User studies | Product |
### 3. Test Data- Representative samples- Edge cases- Adversarial inputs
### 4. Schedule| Milestone | Tests | Date ||-----------|-------|------|| Dev complete | Unit, Integration | || Pre-release | All except UAT | || Release | UAT | || Monthly | Performance, Fairness | |
### 5. Pass/Fail Criteria| Metric | Threshold | Current ||--------|-----------|---------|| Accuracy | >95% | || Latency p99 | <500ms | || Fairness (demographic parity) | <5% gap | |3. MS-2: Performance Metrics
3.1 Core metriky
Accuracy & Reliability
| Metrika | Popis | Použití |
|---|---|---|
| Accuracy | Správnost predikcí | Classification |
| Precision | True positives / predicted positives | Když FP je nákladný |
| Recall | True positives / actual positives | Když FN je nákladný |
| F1 Score | Harmonic mean precision & recall | Balanced |
| AUC-ROC | Area under ROC curve | Threshold selection |
| RMSE/MAE | Error metrics | Regression |
| Perplexity | Language model quality | GAI/LLM |
Reliability
| Metrika | Popis | Target |
|---|---|---|
| Consistency | Stejný input → stejný output | >99% |
| Stability | Performance over time | <5% drift |
| Availability | Uptime | >99.9% |
| Latency p50/p99 | Response time | SLA-defined |
3.2 Fairness metriky
| Metrika | Definice | Kdy použít |
|---|---|---|
| Demographic Parity | P(Ŷ=1|G=a) = P(Ŷ=1|G=b) | Equal outcomes |
| Equalized Odds | TPR a FPR stejné across groups | Equal error rates |
| Predictive Parity | PPV stejné across groups | Equal precision |
| Individual Fairness | Similar individuals → similar outcomes | Case-by-case |
Jak měřit:
# Příklad s Fairlearnfrom fairlearn.metrics import demographic_parity_difference
dpd = demographic_parity_difference( y_true, y_pred, sensitive_features=sensitive_feature)
# Target: |dpd| < 0.05 (5%)3.3 GAI-specific metriky
| Metrika | Popis | Měření |
|---|---|---|
| Hallucination rate | Frekvence fakticky nesprávných výstupů | Manual review sample |
| Toxicity score | Škodlivost obsahu | Perspective API |
| Bias in generation | Stereotypy ve výstupech | Winogender, BBQ |
| Instruction following | Dodržování promptů | Benchmark datasets |
| Refusal rate | Odmítnutí nevhodných požadavků | Adversarial prompts |
4. MS-3: Bias Testing
4.1 Pre-deployment bias assessment
Checklist:
- Identifikovány protected attributes (věk, pohlaví, etnicita, …)
- Trénovací data analyzována na bias
- Baseline metriky změřeny per group
- Fairness thresholds definovány
- Mitigation strategie připravena
4.2 Testing metodologie
Slicing Analysis
Testujte performance na podskupinách:
| Slice | Count | Accuracy | Precision | Recall ||-------|-------|----------|-----------|--------|| Overall | 10000 | 94.5% | 93.2% | 95.1% || Gender: M | 5200 | 95.1% | 94.0% | 95.8% || Gender: F | 4800 | 93.8% | 92.3% | 94.3% || Age: <30 | 3000 | 96.2% | 95.5% | 96.8% || Age: 30-50 | 4500 | 94.0% | 93.1% | 94.5% || Age: >50 | 2500 | 92.1% | 90.8% | 93.2% |Alert: Gap > 5% mezi skupinami → vyšetřit
Counterfactual Testing
Změňte protected attribute, sledujte změnu výstupu:
Original: "John applied for a loan..." → ApprovedCounterfactual: "Jane applied for a loan..." → Approved? ✓
Pokud outcomes liší → potenciální bias4.3 Bias mitigation strategie
| Strategie | Fáze | Popis |
|---|---|---|
| Pre-processing | Data | Rebalancing, re-sampling, feature selection |
| In-processing | Training | Fairness constraints, adversarial debiasing |
| Post-processing | Output | Threshold adjustment, equalized odds |
| Human review | Deployment | Manual override for sensitive decisions |
5. MS-4: Security Testing
5.1 AI-specific security threats
| Threat | Popis | Testování |
|---|---|---|
| Adversarial examples | Inputs designed to fool model | Adversarial attacks |
| Model extraction | Stealing model via queries | Rate limiting, monitoring |
| Data poisoning | Corrupting training data | Data validation |
| Prompt injection | Manipulating GAI via prompts | Prompt fuzzing |
| Membership inference | Detecting training data presence | Privacy attacks |
5.2 Red-teaming pro GAI
Typy red-teaming:
| Typ | Účastníci | Focus |
|---|---|---|
| General | Běžní uživatelé | Common misuse |
| Expert | Domain specialists | Domain-specific risks |
| Adversarial | Security researchers | Edge cases, jailbreaks |
| AI-assisted | AI + humans | Scale + creativity |
Red-teaming checklist pro GAI:
## RED-TEAM SESSION: [GAI System]
### 1. Jailbreaking Attempts- [ ] Direct harmful requests- [ ] Role-play scenarios ("pretend you are...")- [ ] Multi-step prompts (build-up)- [ ] Language switching- [ ] Encoding attacks (base64, etc.)
### 2. Information Extraction- [ ] Training data extraction- [ ] PII extraction- [ ] System prompt extraction- [ ] Capability probing
### 3. Content Generation- [ ] Harmful content (violence, hate)- [ ] Misinformation- [ ] CSAM/NCII attempts- [ ] Copyright infringement
### 4. Manipulation- [ ] Social engineering scripts- [ ] Phishing content- [ ] Malware code generation- [ ] CBRN information requests
### 5. Findings| Finding | Severity | Reproducible | Recommendation ||---------|----------|--------------|----------------|| | | | |5.3 Prompt injection testing
Test scenarios:
| Scenario | Příklad | Expected |
|---|---|---|
| Direct injection | ”Ignore previous instructions and…” | Refuse |
| Indirect injection | Malicious content in retrieved docs | Detect |
| Context manipulation | ”From now on, you are…” | Maintain boundaries |
| Delimiter attacks | Using markdown/code blocks | Parse safely |
6. MS-5: Continuous Monitoring
6.1 Monitoring dashboard
Key metrics to track:
6.2 Drift detection
| Drift Type | Co sledovat | Jak detekovat | Threshold |
|---|---|---|---|
| Data drift | Input distribution změny | PSI, KS test | PSI > 0.1 |
| Concept drift | Relationship X→Y změny | Performance drop | >5% degradace |
| Model drift | Model behavior změny | Prediction distribution | Significant shift |
6.3 Alerting
| Alert Level | Trigger | Response | SLA |
|---|---|---|---|
| P1 Critical | Safety incident, major outage | Immediate escalation | 15 min |
| P2 High | Performance degradation >10% | Same-day investigation | 4 hours |
| P3 Medium | Drift detected, minor issues | Planned review | 24 hours |
| P4 Low | Informational, optimization | Next sprint | 1 week |
7. Implementační checklist
Fáze 1: Test Infrastructure (Týden 1-2)
- Definovat test data management
- Nastavit test environments
- Vybrat testing tools
- Definovat baseline metriky
Fáze 2: Pre-deployment Testing (Týden 3-4)
- Implementovat unit/integration tests
- Provést bias assessment
- Spustit security testing
- Provést user acceptance testing
Fáze 3: Monitoring Setup (Týden 5-6)
- Nasadit monitoring stack
- Definovat KPIs a thresholds
- Nastavit alerting
- Vytvořit dashboards
Fáze 4: Continuous (Ongoing)
- Pravidelné revalidace
- Drift monitoring
- Red-teaming sessions
- Metric reviews
8. Nástroje
| Kategorie | Nástroj | Účel |
|---|---|---|
| ML Testing | pytest, Great Expectations | Data/model testing |
| Fairness | Fairlearn, Aequitas, AI Fairness 360 | Bias detection |
| Explainability | SHAP, LIME, Captum | Model interpretability |
| Security | TextAttack, Adversarial Robustness Toolbox | Adversarial testing |
| Monitoring | Evidently, Whylabs, Arize | Production monitoring |
| GAI Eval | HELM, lm-evaluation-harness | LLM benchmarks |
Pokračujte na MANAGE pro implementaci MANAGE funkce.
AI-Native Entry Framework | CC BY-NC-SA 4.0