# A-FRAUD: Agentic Framework for Risk Analysis & Unified Detection
**Project Name:** A-FRAUD Agentic Framework
**Vision:** Building a self-adapting "Digital Immune System" that solves complex fraud problems based on behavior and network analysis.
---
## Table of Contents
- [Overview](#overview)
- [System Architecture](#system-architecture)
- [Agent Clusters](#agent-clusters)
- [Technical Stack](#technical-stack)
- [Security & Compliance](#security--compliance)
- [Expected Performance](#expected-performance)
---
## Overview
A-FRAUD is a multi-agent system (MAS) designed for TPBank to combat fraud through specialized agent clusters that coordinate via Apache Kafka. The Adjudicator combines the agents' reports with a weighted consensus mechanism, aiming for both high accuracy and low latency in transaction processing.
### Key Features
- **Multi-Agent Architecture:** Specialized agents for identity, behavior, network, linguistic, and inter-bank analysis
- **Self-Learning:** Reinforcement learning from human feedback (RLHF) for continuous improvement
- **Privacy-Preserving:** Compliance with Vietnamese regulations (Decree 13/2023/NĐ-CP, Decision 2345/QĐ-NHNN)
- **High Performance:** Supports 1000+ TPS with sub-2-second latency for complex transactions
- **Explainable AI:** Transparent reasoning reports for compliance and customer service
---
## System Architecture
### High-Level Architecture
```mermaid
graph TB
    subgraph Gateway[API Gateway]
        API[Transaction API]
    end
    subgraph Core[Core Banking]
        CB[Core Banking System]
    end
    API -->|Transaction Request| Adjudicator[Adjudicator Agent]
    subgraph Agents[Specialized Agent Clusters]
        Sentinel["Sentinel Agent<br/>Identity & Anti-Fraud"]
        Biometric["Biometric Agent<br/>Behavioral Analytics"]
        GraphWeaver["Graph Weaver Agent<br/>Network Analysis"]
        Linguistic["Linguistic Agent<br/>Semantic Intent"]
        Diplomat["Diplomat Agent<br/>Inter-bank Collaboration"]
    end
    Adjudicator -->|Kafka Messages| Sentinel
    Adjudicator -->|Kafka Messages| Biometric
    Adjudicator -->|Kafka Messages| GraphWeaver
    Adjudicator -->|Kafka Messages| Linguistic
    Adjudicator -->|gRPC/mTLS| Diplomat
    Sentinel -->|Risk Reports| Adjudicator
    Biometric -->|Risk Reports| Adjudicator
    GraphWeaver -->|Risk Reports| Adjudicator
    Linguistic -->|Risk Reports| Adjudicator
    Diplomat -->|Risk Reports| Adjudicator
    Adjudicator -->|Decision| Explicator["Explicator Agent<br/>XAI & Reporting"]
    Adjudicator -->|Action| CB
    subgraph Data[Data Layer]
        Kafka["Apache Kafka<br/>Message Bus"]
        Neo4j["Neo4j<br/>Graph Database"]
        Milvus["Milvus<br/>Vector Database"]
        Redis["Redis<br/>Cache"]
        TimescaleDB["TimescaleDB<br/>Time-Series"]
    end
    Agents --> Kafka
    Adjudicator --> Kafka
    GraphWeaver --> Neo4j
    Sentinel --> Milvus
    Linguistic --> Milvus
    Adjudicator --> Redis
    Adjudicator --> TimescaleDB
```

### Data Flow

```mermaid
sequenceDiagram
    participant CB as Core Banking
    participant Adj as Adjudicator
    participant S as Sentinel
    participant B as Biometric
    participant G as Graph Weaver
    participant L as Linguistic
    participant D as Diplomat
    participant E as Explicator
    CB->>Adj: Transaction Request
    Adj->>Adj: Triage Decision (Fast-path vs Deep-dive)
    alt Fast-path (<50ms)
        Adj->>CB: ALLOW/BLOCK (Rule-based)
    else Deep-dive (500ms-2s)
        Adj->>S: Identity Verification Request
        Adj->>B: Behavioral Analysis Request
        Adj->>G: Network Analysis Request
        Adj->>L: Intent Analysis Request
        Adj->>D: Inter-bank Query Request
        S->>Adj: Risk Score + Confidence
        B->>Adj: Risk Score + Confidence
        G->>Adj: Risk Score + Confidence
        L->>Adj: Risk Score + Confidence
        D->>Adj: Risk Score + Confidence
        Adj->>Adj: Weighted Consensus Calculation
        Adj->>E: Decision + Reasoning Trace
        Adj->>CB: Final Decision (ALLOW/CHALLENGE/BLOCK)
        E->>E: Generate Explanation Report
    end
```

---

## Agent Clusters

### 1. Adjudicator Agent (Orchestration Cluster)
**Code:** A-FRAUD-ADJUDICATOR-00
**Technology:** Golang (Gin/Echo, Drools/Grule)
**Role:** System orchestrator and final decision maker

The Adjudicator acts as the "Prefrontal Cortex" of the system, orchestrating information flow and weighing evidence from all specialized agents.
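The deep-dive branch of the Data Flow diagram is a concurrent fan-out: the Adjudicator queries every agent at once and collects whatever risk reports come back in time. The following is a minimal sketch of that pattern only, written in Python for consistency with the agent code even though the Adjudicator itself is specified in Golang. `RiskReport`, `query_agent`, `deep_dive`, the simulated delays, and the timeout value are all hypothetical stand-ins; the real transport is Kafka and gRPC, which this sketch does not model.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class RiskReport:
    agent: str
    risk: float        # 0.0 (safe) to 1.0 (fraud)
    confidence: float  # 0.0 to 1.0


async def query_agent(name: str, delay_s: float, risk: float, confidence: float) -> RiskReport:
    """Hypothetical stand-in for one agent round trip; the sleep simulates processing time."""
    await asyncio.sleep(delay_s)
    return RiskReport(name, risk, confidence)


async def deep_dive(agent_calls, timeout_s: float) -> list:
    """Query all agents concurrently and keep only the reports that arrive within timeout_s."""
    guarded = [asyncio.wait_for(call, timeout_s) for call in agent_calls]
    # return_exceptions=True turns a timed-out agent into an exception object
    # in the result list instead of aborting the whole fan-out.
    results = await asyncio.gather(*guarded, return_exceptions=True)
    return [r for r in results if isinstance(r, RiskReport)]


async def demo() -> list:
    calls = [
        query_agent("sentinel", 0.01, 0.2, 0.9),
        query_agent("biometric", 0.02, 0.6, 0.8),
        query_agent("graph_weaver", 1.0, 0.9, 0.7),  # slower than the timeout, so dropped
    ]
    return await deep_dive(calls, timeout_s=0.1)


reports = asyncio.run(demo())
for report in reports:
    print(report)
```

Dropping a slow agent rather than waiting for it keeps the overall decision inside the latency budget; how the missing report is compensated for is a separate design choice.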
#### Decision Workflow
- **Fast-path (<50ms):** Rule-based engine for low-risk transactions or hard-stop blacklists
- **Deep-dive (500ms-2s):** Activates all specialized agents for grey-zone or high-value transactions (>20 million VND)

#### Weighted Consensus Formula
The final risk score (R_final) is calculated using:

```
R_final = σ(Σ(v_i × c_i × w_i))
```

Where:
- `v_i`: Risk score from agent i (0: Safe, 1: Fraud)
- `c_i`: Confidence level of agent i (0-1)
- `w_i`: Weight of agent i (adjusted by Auditor Agent over time)
- `σ`: Sigmoid activation function for decision amplification

Because every `v_i` and `c_i` lies in [0, 1], a raw sum with non-negative weights gives σ ≥ 0.5. Reaching the Green band and the lower Yellow band therefore requires centering the sum first (for example with a bias term), which this document does not specify.

#### Action Matrix

| Threshold | Status | Action |
|-----------|--------|--------|
| 0.0 - 0.3 | Green (Safe) | `ALLOW`: Approve transaction immediately |
| 0.3 - 0.7 | Yellow (Suspicious) | `CHALLENGE`: Require FaceID or shuffled keypad (Step-up Auth) |
| 0.7 - 1.0 | Red (High Risk) | `BLOCK`: Block transaction, lock account, notify Fraud team |

#### Resilience Mechanisms
- **Circuit Breaker:** Automatically disconnects non-responsive agents after 500ms
- **Shadow Reasoning:** Uses historical statistical averages from TimescaleDB when agent data is unavailable
- **Dynamic Thresholding:** Automatically raises approval threshold from 0.7 to 0.85 during infrastructure issues

---

### 2. Sentinel Agent (Identity Cluster)
**Code:** A-FRAUD-SENTINEL-01
**Technology:** Python (PyTorch, InsightFace, Tesseract/OpenCV)
**Vector Database:** Milvus
**Role:** Entity authentication and identity verification

#### Core Modules
1. **OCR & NFC Module:** Extracts information from CCCD chip (MRZ - Machine Readable Zone), verifies digital signature
2. **Biometric Liveness Module:** Active liveness detection (blink, turn head, smile) to detect deepfakes
3. **Face Matching Module:** ArcFace/InsightFace for feature extraction, Cosine Similarity for vector comparison

#### Input Parameters
- `Face_Embedding_Vector` (512-dim)
- `NFC_Chip_Data`
- `OCR_CCCD_Result`

#### Processing Flow
1. Extract selfie and chip photo → 2 embeddings (512-dim vectors)
2. Calculate Cosine Similarity → Matching_Score (0.0-1.0)
3. Cross-check with blacklist → Is_Blacklisted (Boolean)
4. Synthetic identity check → Identity_Anomaly_Score

#### Security & Compliance
- **No-raw-data Policy:** Only stores feature vectors, not original images (compliant with Decree 13)
- **Salted Hashing:** CCCD numbers hashed with SHA-256 + internal Salt
- **mTLS Communication:** All inter-agent communication via gRPC over mTLS

---

### 3. Biometric Agent (Behavioral Cluster)
**Code:** A-FRAUD-OBSERVER-02
**Technology:** Python (TensorFlow/Keras, RNN/LSTM), Apache Flink/Spark Streaming
**Role:** Continuous authentication and behavioral anomaly detection

#### Data Inputs
- **Keystroke Dynamics:** Dwell time, flight time, typing speed
- **Touch & Gesture Analysis:** Pressure, contact area, swipe patterns
- **Sensor Fusion:** Accelerometer, gyroscope data

#### Core Analysis
**Behavioral Entropy Calculation:**

```
H = -Σ(p_i × log₂(p_i))
```

- If `H ≈ 0`: Deterministic pattern → BOT detected → **BLOCK immediately**
- If `H` within normal user range: **ALLOW**

**Remote Access Detection:**
- Analyzes the correlation between network latency and UI interaction latency
- Unusual correlation indicates remote control (TeamViewer, AnyDesk, malware)

#### Active Probing
When risk is in "Yellow Zone" (0.4-0.7):
- **Shuffled Keypad:** Randomly repositions numbers on PIN/OTP keyboard
- **Hesitation Index:** If response time increases >300% → Social engineering suspicion

#### Privacy by Design
- **No Keylogging:** Only records timing, never key values
- **Local Feature Extraction:** Processing happens on TPBank app SDK
- **Anonymization:** Behavioral data linked to encrypted Session_ID only
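The behavioral entropy formula under Core Analysis can be illustrated with a small worked example. The document does not say how the probabilities `p_i` are estimated, so this sketch assumes they come from bucketing keystroke flight times into fixed-width bins; the function name `behavioral_entropy`, the 20 ms bin width, and the sample timings are all hypothetical.

```python
import math
from collections import Counter


def behavioral_entropy(intervals_ms, bin_width_ms=20):
    """Shannon entropy in bits of timing intervals, after bucketing into fixed-width bins."""
    if not intervals_ms:
        raise ValueError("need at least one interval")
    bins = Counter(int(t // bin_width_ms) for t in intervals_ms)
    total = len(intervals_ms)
    # p * log2(1/p) is the same as -p * log2(p), summed over the occupied bins.
    return sum((n / total) * math.log2(total / n) for n in bins.values())


# Perfectly regular timing, as a scripted bot might produce: a single bin, so H = 0.
bot_intervals = [100, 100, 100, 100, 100, 100, 100, 100]

# Irregular timing: the eight samples fall into seven distinct bins.
human_intervals = [85, 140, 62, 210, 118, 95, 173, 131]

print(behavioral_entropy(bot_intervals))    # 0.0
print(behavioral_entropy(human_intervals))  # 2.75
```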

---

### 4. Graph Weaver Agent (Network Cluster)
**Code:** A-FRAUD-WEAVER-03
**Technology:** Python (Neo4j, Cypher, Graph Data Science library)
**Role:** Network and link analysis for money laundering detection

#### Graph Schema
**Nodes:**
- `Account`: Bank account number
- `Customer`: Identity information (hashed)
- `Device`: Hardware ID, UUID
- `IP`: Network address

**Relationships:**
- `TRANSFERRED_TO`: Money flow (amount, timestamp)
- `OWNED_BY`: Account ownership
- `ACCESSED_VIA`: Account-device/IP relationship
- `SHARED_WITH`: Accounts sharing phone/email/device

#### Detection Patterns
1. **Layering Detection:** Fast money transfer chains (<5 minutes) with preserved amounts
2. **Star/Hub Pattern:** Multiple small accounts (smurfs) → central account → large withdrawal
3. **Community Detection:** Louvain/Label Propagation algorithms to find account farms

#### Algorithms
- **PageRank:** Determines account "importance" (risk) based on connections
- **Weakly Connected Components (WCC):** Identifies fraud network scale
- **Path Finding:** Checks indirect relationships between source and destination accounts

#### Scalability
- **TTL:** Keeps transactions in Neo4j RAM for 30-60 days only
- **Graph Partitioning:** Divides graph by geography or bank prefix
- **Near Real-time Batch:** Scans entire graph every 15 minutes for new networks

---

### 5. Linguistic Agent (Semantic Cluster)
**Code:** A-FRAUD-INTERPRETER-04
**Technology:** Python (LangChain, LangGraph, Local LLM - Llama-3/Mistral, vLLM/Ollama)
**Vector Database:** Milvus
**Role:** Context analysis and intent recognition

#### Data Preprocessing
**PII Scrubbing (NER-based):**
- Original: "Transfer money to Nguyen Van A 0901234567 to pay tax fine"
- After: "Transfer money to [NAME] [PHONE] to pay tax fine"

#### Reasoning Engine
1. **Intent Classification:** Probability model P(Intent|Context) using embeddings
2. **Sentiment & Urgency Analysis:** Detects urgency keywords ("urgent", "immediate", "arrest warrant")
3. **Semantic Embedding Search:** Compares with known fraud playbooks in Milvus vector DB

#### Detection Scenarios

| Scenario | Keywords | Risk Score |
|----------|----------|------------|
| Authority Impersonation | "Pay fine", "Police", "Arrest warrant", "Prosecutor" | 0.95 (High) |
| Investment/Job Scam | "Recruit agent", "Commission", "Deposit", "Shopee seller" | 0.85 (High) |
| P2P/Marketplace Scam | "Car deposit", "Room deposit", "Goods payment" | 0.70 (Medium) |
| Romance Scam | "Gift", "Flight ticket", "Send money home" | 0.80 (High) |

#### Deployment
- **On-premise LLM:** GPU servers (Nvidia A100/H100) at TPBank
- **Quantization:** 4-bit or 8-bit (GGUF/EXL2) for <500ms response time
- **Prompt Guard:** Prevents prompt injection attacks

---

### 6. Diplomat Agent (Collaboration Cluster)
**Code:** A-FRAUD-DIPLOMAT-05
**Technology:** Python (gRPC, PySyft, mTLS)
**Security:** HashiCorp Vault, CloudHSM
**Role:** Inter-bank collaboration and risk intelligence sharing

#### ECDH-PSI Protocol (Private Set Intersection)
Allows two banks to find common blacklisted accounts without revealing full lists:
1. TPBank has suspect set S₁, partner bank has S₂. Each element x is first hashed to a curve point H(x).
2. TPBank chooses a secret random scalar a and sends a·H(x) for every x in S₁ to the partner
3. Partner chooses a secret random scalar b and sends b·H(y) for every y in S₂ to TPBank
4. Each side applies its own secret to the values it received: TPBank → a·(b·H(y)), Partner → b·(a·H(x))
5. Since a·b·H(x) = b·a·H(x), identical doubly masked values mark the elements in the intersection

#### Data Privacy Protocols
- **Dynamic Salted Hashing:** HMAC-SHA256 with session-based Salt
- **HSM Integration:** Encryption keys in Hardware Security Module (FIPS 140-2 Level 3)
- **Anonymized Risk Scoring:** Sends normalized risk score (0.0-1.0) + fraud category code, not account numbers

#### Workflow
1. **Probe:** Send PSI query to receiving bank
2. **Hash-Exchange:** Execute ECDH-PSI protocol
3. **Risk-Feedback:** Receive risk score from partner
4. **Consensus:** Aggregate results from multiple banks

#### Compliance
- **Decree 13 (DPA):** No raw PII transferred, only non-reversible hashes and risk scores
- **Decision 2345:** Additional authentication layer for suspicious inter-bank transactions
- **Audit Trail:** Encrypted logs for State Bank of Vietnam review

---

### 7. Explicator Agent (Reporting Cluster)
**Code:** A-FRAUD-EXPLICATOR-06
**Technology:** Python (FastAPI, LangChain, RAG, Local LLM)
**Role:** Explainable AI (XAI) and transparency reporting

#### Core Technology
- **Model:** Local LLM (Mistral-7B or Llama-3-8B) fine-tuned for banking
- **Technique:** RAG (Retrieval-Augmented Generation) for evidence retrieval
- **Temperature:** Set to 0 for deterministic output, to limit hallucinations

#### Traceback Workflow
1. **Trigger:** Receives decision + Audit_ID from Adjudicator
2. **Evidence Retrieval:** Accesses reasoning logs from all agents
3. **Causal Mapping:** Identifies high-weight contributing factors
4. **Narrative Synthesis:** Converts structured JSON → natural language report

#### Anti-Hallucination Guardrails
- **Grounding:** LLM only uses facts from Evidence_Pool
- **Template-based Validation:** Fixed report structure with slots
- **Verification Loop:** Script verifies LLM numbers against raw Kafka data

#### Sample Report Output

```
[A-FRAUD EXPLANATION REPORT]
- Transaction ID: TXN-2026-8888
- Decision: Challenge (Step-up Auth required)
- Root Cause Analysis:
  1. Behavioral Anomaly (40%): Detected "Paste" operation from clipboard, input speed 2.5x slower than average (Hesitation detected).
  2. Semantic Analysis (35%): Transaction remark contains "Security deposit" - 92% match with known investment scam pattern.
  3. Network Risk (25%): Recipient account reactivated after 6 months, received 3 small transactions consecutively.
- Conclusion: Data indicates customer may be under coercion. Dynamic biometric authentication required for safety.
```

#### Security
- **PII Redaction:** Masks sensitive info before display (e.g., "Nguyen Van A" → "N*** V** A")
- **Digital Signing:** All reports digitally signed and stored in Immutable Log
- **Role-Based Reporting:** Detail level varies by staff role (Customer Service, CSKH, vs Fraud Specialist)

---

## Technical Stack

### Core Infrastructure

| Layer | Technology | Purpose |
|-------|-----------|---------|
| **Messaging** | Apache Kafka | Asynchronous agent communication |
| **Orchestration** | Golang (Gin/Echo) | High-concurrency adjudicator |
| **AI Framework** | Python (PyTorch, LangChain, LangGraph, TensorFlow) | Agent reasoning |
| **Vector Search** | Milvus | Identity/behavioral vector storage |
| **Graph Database** | Neo4j (GDS library) | Network relationship analysis |
| **Cache** | Redis | Session state management |
| **Time-Series** | TimescaleDB | Historical statistical data |
| **Containerization** | Docker/Kubernetes | On-premise deployment |

### AWS Integration (Optional)
- **EKS:** Kubernetes orchestration
- **Lambda:** Fast-path serverless triggers
- **Step Functions:** Complex deep-dive workflows
- **MSK:** Managed Kafka
- **SageMaker:** Model training and deployment
- **Bedrock:** Managed LLM (with PrivateLink)
- **Neptune:** Managed graph database alternative
- **CloudHSM:** Hardware security module

---

## Security & Compliance

### Transport Layer
- **gRPC over mTLS:** Encrypted communication between agents
- **Kafka SSL:** Secure message bus

### Data Privacy
- **Salted Hashing:** HMAC-SHA256 for PII
- **Tokenization:** No raw PII in agent logs
- **Feature Vectors Only:** Original biometric data discarded after processing

### AI Integrity
- **Temperature = 0:** Deterministic decoding to limit hallucinations in the Explicator Agent
- **Grounding:** Evidence-based reasoning only
- **Template Validation:** Structured report generation

### Compliance
- **Zero-Knowledge Proofs (ZKP):** Prove suspicious transactions without revealing account numbers
- **Decree 13/2023/NĐ-CP:** Personal data protection compliance
- **Decision 2345/QĐ-NHNN:** Online payment security requirements
- **Immutable Logs:** Audit trail for regulatory review

---

## Expected Performance

### Key Performance Indicators (KPIs)
1. **Fraud Detection:** 35-40% improvement over rule-based systems
2. **False Positive Reduction:** 20% decrease through multi-dimensional reasoning
3. **Latency:** <2 seconds for most complex transactions
4. **Throughput:** 1000+ TPS support
5. **Compliance:** 100% adherence to biometric security and data protection requirements

### Resilience Metrics
- **Circuit Breaker Response:** <500ms timeout
- **Shadow Reasoning Fallback:** <100ms for historical data lookup
- **Dynamic Thresholding:** Automatic adjustment during infrastructure issues

---

## System Status
**Status:** Ready for Proof of Concept (PoC) deployment

A-FRAUD is not just a fraud prevention tool: it is an agentic architecture capable of **SELF-UNDERSTANDING** context, **SELF-COORDINATING** inter-bank collaboration, and **SELF-LEARNING** from its own mistakes.

---

**Designed by:** System Design Master & AI Tutor
**For:** TPBank
**Version:** 1.0.4