1) Architecture & Data Flow
The system is a React dashboard talking to a Flask ML API over /api/ml/. Models are trained on demand, persisted as .pkl files, and their training graphs are written under reports/.
React SPA with a tabbed dashboard in src/App.js. Each tab renders one of the four component folders under src/components/.
Flask ML API in backend/api/ml_api.py. Models live under backend/ml_models/, with persisted .pkl models and generated graphs in reports/.
All frontend network calls go through src/api/mlApi.js which fetches JSON from /api/ml/... on the same origin (Nginx reverse proxy).
CI/CD evidence is stored in backend/data/cicd_state.json. The Reporting Portal builds its timeline + custody logs from that stored evidence and from the anomaly-detection latest results.
Request lifecycle (from a button click)
UI → mlApi.js → Flask → Model → UI
| Step | What happens | Where in code |
|---|---|---|
| 1 | User clicks a UI button (or changes a tab/filter) | src/components/** |
| 2 | Component calls a function from the API client | src/api/mlApi.js |
| 3 | Browser sends HTTP request to /api/ml/... | fetch() |
| 4 | Nginx proxies to Flask on port 5000 | /etc/nginx/sites-enabled/* |
| 5 | Flask route validates input, loads the model, trains/infers, persists state | backend/api/ml_api.py |
| 6 | JSON response returned; UI updates state and re-renders | React useState/useEffect |
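To make the lifecycle concrete, here is a minimal Flask sketch of step 5 (illustrative only; the real routes live in backend/api/ml_api.py and do the actual model loading and persistence):

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/ml/health", methods=["GET"])
def health():
    # Polled by the dashboard header to show Connected/Disconnected.
    return jsonify({"success": True})

@app.route("/api/ml/anomaly-detection/infer", methods=["POST"])
def infer_anomalies():
    body = request.get_json(silent=True) or {}
    model_key = body.get("model", "commit_pattern")  # validate input
    limit = int(body.get("limit", 500))
    anomalies = []  # placeholder: the real route loads the .pkl model and runs inference
    return jsonify({"success": True, "anomalies": anomalies[:limit]})

if __name__ == "__main__":
    app.run(port=5000)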
1.1) Project File Structure
Physical layout of the monorepo, mapping code to components.
/var/www/forensic-portal/
├── backend/
│ ├── api/ml_api.py # Main Flask entry point (All Routes)
│ ├── data/cicd_state.json # JSON Database for Evidence & Logs
│ ├── ecosystem.config.js # PM2 process config
│ └── ml_models/ # Logic per domain
│ ├── anomaly_detection/ # Commit patterns, dependency checks, pipeline tampering
│ ├── cicd_agents/ # HDFS/Hadoop log analysis models
│ ├── integrity_verification/ # Malware classifier, Ed25519, Merkle, hashing
│ └── reporting_portal/ # Attack/Breach classifiers & report generation
└── src/
├── api/mlApi.js # Frontend-Backend bridge (fetch wrapper)
├── App.js # Main React layout & Tab Controller
└── components/ # UI Views matching backend domains
├── AnomalyDetection/ # Graphs, Timelines, Training UI
├── ForensicReadyCICDAgents/ # Agent Status, Evidence Collection UI
├── ForensicReportingPortal/ # Final Reports, Chain of Custody UI
└── IntegrityVerification/ # Crypto tools & Malware Verification UI
2) Flow Charts
These diagrams are drawn as inline SVG shapes so they render everywhere without external assets. Use them as a mental model for how each button maps to API calls, persisted artifacts, and UI rendering.
Overall system flow
Model lifecycle (generic train → infer → reports)
3) Dashboard Layout (Tabs)
The top-level tabs are implemented in src/App.js. Each tab renders a component:
| Dashboard Tab | Component File | What it shows |
|---|---|---|
| CI/CD Agents | src/components/ForensicReadyCICDAgents/ForensicReadyCICDAgents.js | Agent status, evidence collection, agent configuration, HDFS anomaly model, Hadoop failure model |
| Anomaly Detection | src/components/AnomalyDetection/AnomalyDetection.js | ML detection for commit/dependency anomalies, timeline reconstruction, training reports/graphs |
| Integrity Verification | src/components/IntegrityVerification/IntegrityVerification.js | SBOM hashing, hash verification, Ed25519 signatures, Merkle root, malware model train/infer |
| Reporting Portal | src/components/ForensicReportingPortal/ForensicReportingPortal.js | Evidence visualization, integrity proofs, chain-of-custody logs, report generator, reporting ML models |
The header status indicator (“Connected/Disconnected”) calls GET /api/ml/health.
Connected/Disconnected logic
health check
GET /api/ml/health
Response:
{
"success": true
}
4) Component 1 — Forensic-Ready CI/CD Agents
Frontend folder: src/components/ForensicReadyCICDAgents/
This tab shows connected CI/CD agents, collects evidence from datasets/models, lets you configure evidence capture, and runs two ML models (HDFS anomaly detection, Hadoop failure classification).
UI sections
| Section | File | What it contains |
|---|---|---|
| Agent Status | AgentStatus.js | List of agents + metrics (total/active/performance overhead) |
| Evidence Collection | EvidenceCollector.js | Filters evidence items by type and triggers collection |
| Agent Configuration | AgentConfig.js | Checkboxes and numeric settings; save/reset |
| HDFS Log Anomaly Detection | ForensicReadyCICDAgents.js | Train / Run Detection / Refresh Results; list anomalies |
| Hadoop Log Failure Classification | ForensicReadyCICDAgents.js | Train / Run Detection / Refresh Results; list detections; shows Accuracy/F1/Precision/Recall |
Buttons and what they do
| Button / Control | Where | Backend endpoint(s) | Behavior |
|---|---|---|---|
| Start Collection | EvidenceCollector.js → onStartCollection | POST /api/ml/cicd-agents/collect | Triggers dataset-based evidence creation on the backend (uses trained HDFS/Hadoop models if present). Then refreshes evidence list. |
| Save Configuration | AgentConfig.js | POST /api/ml/cicd-agents/config | Persists config to backend state. |
| Reset to Defaults | AgentConfig.js | POST /api/ml/cicd-agents/config/reset | Resets config on backend to defaults. |
| Train Model (HDFS) | ForensicReadyCICDAgents.js | POST /api/ml/cicd-agents/hdfs/train | Trains the HDFS TF-IDF + IsolationForest model and persists hdfs_log_anomaly_model.pkl. |
| Run Detection (HDFS) | ForensicReadyCICDAgents.js | POST /api/ml/cicd-agents/hdfs/infer-dataset | Runs detection on log dataset and stores latest anomalies retrievable via /latest. |
| Refresh Results (HDFS) | ForensicReadyCICDAgents.js | GET /api/ml/cicd-agents/hdfs/latest | Loads the latest stored anomalies. |
| Train Model (Hadoop) | ForensicReadyCICDAgents.js | POST /api/ml/cicd-agents/hadoop/train | Trains the supervised Hadoop log classifier and persists hadoop_log_failure_model.pkl. |
| Run Detection (Hadoop) | ForensicReadyCICDAgents.js | POST /api/ml/cicd-agents/hadoop/infer-dataset | Runs classification on dataset and stores latest “failures” as detections. |
| Refresh Results (Hadoop) | ForensicReadyCICDAgents.js | GET /api/ml/cicd-agents/hadoop/latest | Loads the latest stored detections. |
Request/Response details
CI/CD Agents APIs
Evidence collection
POST /api/ml/cicd-agents/collect
Body:
{
"mode": "dataset"
}
Response:
{
"success": true,
"added": 2
}
The new evidence items are stored in backend/data/cicd_state.json and later consumed by the Reporting Portal timeline.
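A minimal sketch of how an evidence item of this shape could be appended to backend/data/cicd_state.json (the top-level "evidence" key is an assumption; the real file may hold additional state):

import json
import pathlib
from datetime import datetime, timezone

STATE_FILE = pathlib.Path("backend/data/cicd_state.json")

def append_evidence(item: dict) -> None:
    # Load existing state (assumed to contain an "evidence" list), append, write back.
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {"evidence": []}
    state.setdefault("evidence", []).append(item)
    STATE_FILE.write_text(json.dumps(state, indent=2))

append_evidence({
    "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    "type": "build-logs",
    "description": "Hadoop job failure predicted",
    "metadata": {"app_id": "application_...", "confidence": 80.32, "label": 1},
})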
Evidence list
GET /api/ml/cicd-agents/evidence?limit=200
Response:
{
"success": true,
"evidence": [
{
"timestamp": "2025-12-13T02:32:20Z",
"type": "build-logs",
"description": "Hadoop job failure predicted",
"metadata": {
"app_id": "application_...",
"confidence": 80.32,
"label": 1
}
}
]
}
Model persistence and reports
| Model | Saved file | Reports directory |
|---|---|---|
| HDFS Log Anomaly | backend/ml_models/cicd_agents/hdfs_log_anomaly_model.pkl | backend/ml_models/cicd_agents/reports/hdfs_log_anomaly/ |
| Hadoop Log Failure | backend/ml_models/cicd_agents/hadoop_log_failure_model.pkl | backend/ml_models/cicd_agents/reports/hadoop_log_failure/ |
Agent Configuration (the “tick” checkboxes) explained
what each toggle does
These checkboxes are not placeholders: they map to a real config object stored on the backend and control which evidence categories can be collected/recorded.
GET /api/ml/cicd-agents/config
Response:
{
"success": true,
"config": {
"capture_git_diffs": true,
"capture_build_logs": true,
"capture_env_vars": true,
"capture_secrets_access": true,
"capture_artifacts": true,
"capture_pipeline_configs": true,
"auto_collect": false,
"collection_interval_seconds": 30,
"max_evidence_size_mb": 100,
"encrypt_evidence": false
}
}
POST /api/ml/cicd-agents/config
Body:
{
"config": {
"auto_collect": true,
"collection_interval_seconds": 30
}
}
Response:
{
"success": true,
"config": { "...merged_config" : true }
}
| UI control | Config field | Effect |
|---|---|---|
| Capture Git Commit Diffs | capture_git_diffs | Allows evidence items of type git-diffs to be collected/stored. |
| Capture Build Logs | capture_build_logs | Allows evidence items of type build-logs to be collected/stored. |
| Capture Environment Variables | capture_env_vars | Allows evidence items of type env-vars to be collected/stored. |
| Capture Secrets Access Events | capture_secrets_access | Allows evidence items of type secrets-access to be collected/stored. |
| Capture Build Artifacts | capture_artifacts | Allows evidence items of type artifacts to be collected/stored. |
| Capture Pipeline Config Files | capture_pipeline_configs | Allows evidence items of type pipeline-config to be collected/stored. |
| Enable Automatic Collection | auto_collect | Backend will treat this as an enable flag for scheduled collection behavior. |
| Collection Interval (seconds) | collection_interval_seconds | Interval target for automatic collection scheduling. |
| Max Evidence Size (MB) | max_evidence_size_mb | Safety limit used to cap evidence payload sizes when storing. |
| Enable Encryption for Stored Evidence | encrypt_evidence | Controls whether stored evidence should be encrypted at rest. |
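A minimal sketch of the merge-and-gate behavior described above (the exact merge and gating logic in ml_api.py may differ; the type-to-flag mapping follows the table):

DEFAULT_CONFIG = {
    "capture_git_diffs": True, "capture_build_logs": True, "capture_env_vars": True,
    "capture_secrets_access": True, "capture_artifacts": True, "capture_pipeline_configs": True,
    "auto_collect": False, "collection_interval_seconds": 30,
    "max_evidence_size_mb": 100, "encrypt_evidence": False,
}

# Evidence type (as stored) -> config flag that must be enabled for it to be recorded.
TYPE_TO_FLAG = {
    "git-diffs": "capture_git_diffs", "build-logs": "capture_build_logs",
    "env-vars": "capture_env_vars", "secrets-access": "capture_secrets_access",
    "artifacts": "capture_artifacts", "pipeline-config": "capture_pipeline_configs",
}

def merge_config(partial: dict) -> dict:
    # POST /config semantics: posted fields override defaults, everything else is kept.
    return {**DEFAULT_CONFIG, **partial}

def is_type_allowed(evidence_type: str, config: dict) -> bool:
    return bool(config.get(TYPE_TO_FLAG.get(evidence_type, ""), False))

cfg = merge_config({"auto_collect": True, "collection_interval_seconds": 30})
print(is_type_allowed("build-logs", cfg))  # True with the defaults above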
How Training Works (Step-by-Step)
HDFS Log Anomaly (Unsupervised)
IsolationForest
Implementation: backend/ml_models/cicd_agents/hdfs_log_anomaly_detector.py
- Data Loading: Reads hdfs_log/hdfs.log/sorted.log.
- Parsing: Extracts event templates and groups logs by Block ID.
- Vectorization: Converts event sequences into numerical vectors using TF-IDF.
- Model Fitting: Trains an IsolationForest model (unsupervised) to identify outliers in the vector space.
- Persistence: Saves the model to hdfs_log_anomaly_model.pkl.
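A minimal sketch of that TF-IDF + IsolationForest flow, with toy event sequences standing in for the parsed HDFS blocks (not the actual code in hdfs_log_anomaly_detector.py):

import joblib
from sklearn.ensemble import IsolationForest
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# One "document" per Block ID: the space-joined sequence of event templates.
block_sequences = [
    "E5 E22 E11 E11 E9",       # typical block
    "E5 E22 E7 E3 E3 E3 E3",   # unusual repetition
    "E5 E22 E11 E9",
]

model = Pipeline([
    ("tfidf", TfidfVectorizer(token_pattern=r"\S+")),                   # vectorization step
    ("iforest", IsolationForest(contamination=0.1, random_state=42)),   # unsupervised fit
])
model.fit(block_sequences)
joblib.dump(model, "hdfs_log_anomaly_model.pkl")                        # persistence step

print(model.predict(block_sequences))  # -1 marks blocks flagged as anomalous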
Hadoop Log Failure (Supervised)
LogisticRegression
Implementation: backend/ml_models/cicd_agents/hadoop_log_failure_classifier.py
- Data Loading: Reads log files and the ground-truth labels from abnormal_label.txt.
- Feature Extraction: Maps application log patterns to known normal/failure sequences.
- Vectorization: Applies TF-IDF to the sequence of log events.
- Model Fitting: Trains a LogisticRegression classifier to distinguish between "Normal" and "Anomaly" classes.
- Persistence: Saves the model to hadoop_log_failure_model.pkl and generates classification metrics (F1, Precision, Recall).
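A compact sketch of the supervised counterpart, using toy sequences and labels (the real classifier reads the Hadoop logs and abnormal_label.txt):

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.pipeline import Pipeline

# One event sequence per Hadoop application; 0 = Normal, 1 = Anomaly.
sequences = ["start map reduce commit success", "start map error retry error kill"] * 10
labels = [0, 1] * 10

clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("logreg", LogisticRegression(max_iter=1000)),
])
clf.fit(sequences, labels)
joblib.dump(clf, "hadoop_log_failure_model.pkl")

# Precision / Recall / F1 feed the metrics shown in the UI and report graphs.
print(classification_report(labels, clf.predict(sequences)))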
Data sources
| Dataset / File | Used by | Purpose |
|---|---|---|
| backend/ml_models/cicd_agents/HDFS Log Anomaly Detection/hdfs_log/hdfs.log/sorted.log | HDFS model | Unsupervised anomaly detection on HDFS logs. |
| backend/ml_models/cicd_agents/HDFS Log Anomaly Detection/Hadoop_log/Hadoop_log/abnormal_label.txt | Hadoop model | Supervised labels for Hadoop application IDs. |
Advanced / Internal Endpoints
| Endpoint | Purpose |
|---|---|
| POST /api/ml/cicd-agents/detect | Runs pattern-based detection on raw evidence payloads (internal helper). |
5) Component 2 — Automated Forensic Anomaly Detection Engine
Frontend folder: src/components/AnomalyDetection/
Tabs
| Tab | File | Purpose |
|---|---|---|
| Anomaly Detector | AnomalyDetection.js + AnomalyDetector.js | Run ML inference; filter anomalies by category/severity; show totals and accuracy. |
| Timeline Reconstruction | TimelineReconstruction.js | Groups anomalies by incident and builds a timeline view. |
| Training Reports | TrainingReports.js | Shows training metrics and graphs served from backend reports directory. |
Buttons and what they do
| Button / Control | Where | Backend endpoint(s) | Behavior |
|---|---|---|---|
| Train Models | AnomalyDetection.js | POST /api/ml/anomaly-detection/train | Trains commit pattern + dependency anomaly models; generates report images; persists .pkl models. |
| Run Detection | AnomalyDetector.js (button), handled in AnomalyDetection.js | POST /api/ml/anomaly-detection/infer | Runs inference on dataset and returns anomalies. UI maps confidence → severity. |
| Refresh Results | AnomalyDetection.js | GET /api/ml/anomaly-detection/latest | Loads stored “latest” anomalies for selected model. |
| Commit Patterns / Dependency Anomalies (model switch) | AnomalyDetection.js | Affects which model key is used in requests | Switches model parameter: commit_pattern or dependency_anomaly. |
| Reconstruct Timelines | TimelineReconstruction.js | No backend call | Pure UI grouping of already-loaded anomalies into incident timelines. |
Filters (Category / Severity) explained
UI-only filtering
Category and Severity filters do not call the backend. They filter the already-loaded anomalies list.
| Filter | What it matches | Where it is applied |
|---|---|---|
| Category | Values like commit-patterns, pipeline-tampering, dependency-anomalies | AnomalyDetector.js |
| Severity | Derived from confidence: critical/high/medium/low | AnomalyDetection.js |
How “Run Detection” works (end-to-end)
mapping + severity
The backend returns anomalies with confidence. The UI maps confidence to severity: critical ≥ 90, high ≥ 80, medium ≥ 70, else low.
POST /api/ml/anomaly-detection/infer
Body:
{
"model": "commit_pattern",
"limit": 500
}
Response:
{
"success": true,
"anomalies": [
{
"id": "commit_0",
"category": "commit-patterns",
"confidence": 94.15,
"details": { "is_off_hours": 1, "files_changed": 0 }
}
]
}
Stored results can be fetched via GET /api/ml/anomaly-detection/latest?model=commit_pattern.
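The severity mapping itself happens in the React components, but the rule is simple enough to state as a sketch (thresholds as described above):

def severity_from_confidence(confidence: float) -> str:
    # UI rule: critical >= 90, high >= 80, medium >= 70, else low.
    if confidence >= 90:
        return "critical"
    if confidence >= 80:
        return "high"
    if confidence >= 70:
        return "medium"
    return "low"

print(severity_from_confidence(94.15))  # "critical" for the example anomaly above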
How Training Works (Step-by-Step)
Commit Patterns (Unsupervised)
IsolationForest
Implementation: backend/ml_models/anomaly_detection/commit_pattern_analyzer.py
- Data Loading: Reads the NSL-KDD dataset (mapped to commit features like files_changed, lines_added, is_off_hours).
- Preprocessing: Scales features using StandardScaler to normalize ranges.
- Model Fitting: Trains an IsolationForest to learn the baseline of "normal" commit behavior.
- Persistence: Saves the model to commit_pattern_model.pkl and generates a confusion matrix using a holdout test set.
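A minimal sketch of that scale-then-isolate approach on toy commit features (the real analyzer maps NSL-KDD rows to these features; feature names follow the list above):

import joblib
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import confusion_matrix
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Columns: files_changed, lines_added, is_off_hours (toy rows, not the real dataset).
X = np.array([[3, 40, 0], [5, 120, 0], [2, 15, 0], [40, 5000, 1], [4, 60, 0], [1, 10, 0]])
y_true = np.array([0, 0, 0, 1, 0, 0])  # known labels used only to evaluate, not to fit

model = Pipeline([
    ("scaler", StandardScaler()),
    ("iforest", IsolationForest(contamination=0.2, random_state=42)),
])
model.fit(X)                                   # unsupervised: labels are not used here
joblib.dump(model, "commit_pattern_model.pkl")

y_pred = (model.predict(X) == -1).astype(int)  # IsolationForest returns -1 for anomalies
print(confusion_matrix(y_true, y_pred))        # basis for the confusion-matrix report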
Dependency Anomalies (Unsupervised)
IsolationForest
Implementation: backend/ml_models/anomaly_detection/dependency_anomaly_detector.py
- Data Loading: Reads the UNSW-NB15 dataset (mapped to dependency graph features like depth, dev_dependency_count).
- Preprocessing: Scales features using StandardScaler.
- Model Fitting: Trains an IsolationForest to detect unusual dependency structures (e.g., extremely deep trees or massive dev-dependency bloat).
- Persistence: Saves the model to dependency_anomaly_model.pkl and generates a correlation heatmap.
Reports/graphs displayed
Training graphs are requested as images from the backend. Example: GET /api/ml/anomaly-detection/reports/commit_confusion_matrix.png
Data sources
| Dataset | Used by | Purpose |
|---|---|---|
| NSL-KDD (Network Intrusion) | Commit Pattern Analyzer | Provides baseline patterns for anomaly detection (mapped to commit features). |
| UNSW-NB15 (Network Anomaly) | Dependency Anomaly Detector | Training set for detecting anomalous dependency structures. |
Advanced / Internal Endpoints
| Endpoint | Purpose |
|---|---|
| POST /api/ml/anomaly-detection/commit-patterns | Internal inference for commit anomalies (used by general infer route). |
| POST /api/ml/anomaly-detection/pipeline-tampering | Detects unauthorized modifications to pipeline configurations (YAML/JSON). |
| POST /api/ml/anomaly-detection/dependency-anomalies | Internal inference for dependency structure anomalies. |
6) Component 3 — Integrity Verification (SBOM / Hash / Sign / Merkle / Malware)
Frontend folder: src/components/IntegrityVerification/
Tabs
| Tab | File | What it does |
|---|---|---|
| SBOM Generator | SBOMGenerator.js | Computes SHA-256 for SBOM JSON using backend. |
| Hash Verification | HashVerification.js | Compute SHA-256; verify expected hash. |
| Digital Signatures | DigitalSignatures.js | Ed25519 keypair generation, signing, signature verification. |
| Merkle Tree | MerkleTree.js | Computes Merkle root from provided leaf hashes. |
| Malware Classification | MicrosoftMalware.js | Trains and infers a malware family classifier from dataset; shows metrics and predictions. |
Buttons and what they do
| Button | Where | Backend endpoint(s) | Behavior |
|---|---|---|---|
| Compute Hash (SBOM) | SBOMGenerator.js | POST /api/ml/integrity-verification/sbom/hash | Returns SBOM SHA-256 and dependency count (when derivable). |
| Compute Hash (content) | HashVerification.js | POST /api/ml/integrity-verification/hash | Returns SHA-256 hash for the provided content. |
| Verify Hash | HashVerification.js | POST /api/ml/integrity-verification/hash/verify | Returns match/actual hash result. |
| Generate Keypair | DigitalSignatures.js | POST /api/ml/integrity-verification/keys/ed25519 | Returns Ed25519 public/private PEM strings. |
| Sign | DigitalSignatures.js | POST /api/ml/integrity-verification/sign/ed25519 | Returns signature (base64) for the payload using the private key. |
| Verify (signature) | DigitalSignatures.js | POST /api/ml/integrity-verification/verify/ed25519 | Returns valid true/false for payload+signature under public key. |
| Compute Root | MerkleTree.js | POST /api/ml/integrity-verification/merkle/root | Computes Merkle root and leaf count. |
| Train Model (malware) | MicrosoftMalware.js | POST /api/ml/integrity-verification/microsoft-malware/train | Trains malware classifier; persists microsoft_malware_model.pkl and writes reports/graphs. |
| Run Inference (malware) | MicrosoftMalware.js | POST /api/ml/integrity-verification/microsoft-malware/infer-dataset | Runs inference on dataset subset and stores latest predictions. |
| Refresh (malware) | MicrosoftMalware.js | GET /api/ml/integrity-verification/microsoft-malware/latest | Reloads latest stored predictions. |
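For reference, a minimal sketch of the Merkle-root computation behind Compute Root (the pairing and encoding conventions here are assumptions; the backend's exact scheme may differ):

import hashlib

def merkle_root(leaf_hashes: list[str]) -> str:
    # Pair adjacent leaves, hash the concatenation, repeat until one hash remains.
    # Assumption: an odd leftover leaf is paired with itself.
    level = [h.lower() for h in leaf_hashes]
    if not level:
        return ""
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
                 for i in range(0, len(level), 2)]
    return level[0]

print(merkle_root(["a3f1", "b2c9", "c0d1"]))  # same leaves as the curl example in section 10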
Ed25519 signature workflow (exact sequence)
keys → sign → verify
POST /api/ml/integrity-verification/keys/ed25519
Response:
{
"success": true,
"private_key_pem": "-----BEGIN PRIVATE KEY----- ...",
"public_key_pem": "-----BEGIN PUBLIC KEY----- ..."
}
POST /api/ml/integrity-verification/sign/ed25519
Body:
{
"payload": "hello",
"private_key_pem": "-----BEGIN PRIVATE KEY----- ..."
}
Response:
{
"success": true,
"signature_b64": "..."
}
POST /api/ml/integrity-verification/verify/ed25519
Body:
{
"payload": "hello",
"public_key_pem": "-----BEGIN PUBLIC KEY----- ...",
"signature_b64": "..."
}
Response:
{
"success": true,
"valid": true
}
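The same keygen → sign → verify sequence can be reproduced locally with Python's cryptography package (a sketch of the primitive, not the backend code; whether ml_api.py uses this exact library is an assumption):

import base64

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# PEM strings, matching the *_pem fields in the responses above.
private_pem = private_key.private_bytes(
    serialization.Encoding.PEM,
    serialization.PrivateFormat.PKCS8,
    serialization.NoEncryption(),
).decode()
public_pem = public_key.public_bytes(
    serialization.Encoding.PEM,
    serialization.PublicFormat.SubjectPublicKeyInfo,
).decode()

payload = b"hello"
signature_b64 = base64.b64encode(private_key.sign(payload)).decode()

try:
    public_key.verify(base64.b64decode(signature_b64), payload)
    print({"success": True, "valid": True})
except InvalidSignature:
    print({"success": True, "valid": False})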
How Training Works (Step-by-Step)
Microsoft Malware Classification (Supervised)
SGDClassifier
Implementation: backend/ml_models/integrity_verification/microsoft_malware_classifier.py
- Data Loading: Reads Microsoft Malware Classification Challenge/data.csv.
- Feature Selection: Selects relevant numerical columns (e.g., resource sizes, section counts) and the class label.
- Preprocessing: Scales features using StandardScaler.
- Model Fitting: Trains an SGDClassifier (Stochastic Gradient Descent) optimized for large datasets to classify malware families.
- Persistence: Saves the model to microsoft_malware_model.pkl and generates accuracy curves and confusion matrices.
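A minimal sketch of that scale-then-SGD pipeline on synthetic features (the real column selection from data.csv is not reproduced here):

import joblib
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic numeric features standing in for the selected columns
# (resource sizes, section counts, ...); y is the malware family label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = rng.integers(1, 5, size=200)

model = Pipeline([
    ("scaler", StandardScaler()),
    ("sgd", SGDClassifier(random_state=42)),  # SGD scales well to large datasets
])
model.fit(X, y)
joblib.dump(model, "microsoft_malware_model.pkl")

print(model.score(X, y))  # accuracy of the kind plotted in the accuracy-curve report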
Advanced / Internal Endpoints
| Endpoint | Purpose |
|---|---|
| POST /api/ml/integrity-verification/detect-tampering | Checks artifacts for integrity violations against known signatures/hashes. |
7) Component 4 — Forensic Chain-of-Custody & Reporting Portal
Frontend folder: src/components/ForensicReportingPortal/
This component consumes real evidence from the backend (CI/CD evidence state + anomaly latest results), constructs a timeline, provides custody logs, verifies integrity proofs via the hash API, generates reports, and includes an ML Models sub-tab using the Reporting Portal datasets.
Tabs
| Tab | Backend data source | How it works |
|---|---|---|
| Evidence Visualization | GET /api/ml/reporting-portal/forensic-data | Shows evidence timeline with filters (type + timeframe) and metadata. |
| Integrity Proofs | GET /api/ml/reporting-portal/forensic-data | Shows proof list derived from timeline; verify button calls hash verify API. |
| Chain of Custody | GET /api/ml/reporting-portal/custody | Displays custody logs with expandable history + checksum. |
| Report Generator | POST /api/ml/reporting-portal/report/generate | Generates a report payload; export buttons download JSON to disk. |
| ML Models | /api/ml/reporting-portal/* | Train/infer the Reporting Portal classifiers and display their report graphs. |
Buttons and what they do
| Button / Control | Where | Backend endpoint(s) | Behavior |
|---|---|---|---|
| Verify (integrity proof) | IntegrityProofs.js | POST /api/ml/integrity-verification/hash/verify | Verifies that the backend recomputed SHA-256 of the stored proof payload matches the expected hash. |
| Generate Report | ReportGenerator.js | POST /api/ml/reporting-portal/report/generate | Creates a report object containing sections, evidence count, timeline, integrity proofs, custody logs. |
| Export as PDF/JSON/XML | ReportGenerator.js | No backend call | Downloads the generated report as a JSON blob (filename extension varies). |
| Train Model (Reporting ML) | ReportingPortalModels.js | POST /api/ml/reporting-portal/train | Trains either attack_type or breach_type model; updates training_state and reports list. |
| Infer Dataset (Reporting ML) | ReportingPortalModels.js | POST /api/ml/reporting-portal/infer-dataset | Runs inference on a dataset subset and stores latest predictions for that model. |
| Infer Sample (Reporting ML) | ReportingPortalModels.js | POST /api/ml/reporting-portal/infer-sample | Predicts a single record provided as JSON in the UI textarea. |
Reporting ML: “Infer Sample” JSON format
required fields
The UI textarea sends your JSON object directly to the backend. Use the dataset column names as keys.
Attack Type (from Attack_Dataset.csv)
POST /api/ml/reporting-portal/infer-sample
Body:
{
"model": "attack_type",
"sample": {
"Attack Name": "Phishing",
"Description": "Credential harvesting email campaign targeting employees"
}
}
Response:
{
"success": true,
"prediction": "Phishing",
"confidence": 0.81
}
Breach Type (from Cyber Security Breaches.csv)
POST /api/ml/reporting-portal/infer-sample
Body:
{
"model": "breach_type",
"sample": {
"Organization": "Example Corp",
"Type_of_Breach": "Hacking",
"Summary": "Unauthorized access and data exfiltration detected",
"Records_Lost": 120000
}
}
For breach_type inference, Type_of_Breach is the training label; for real-time prediction, you can omit it (backend will treat it as unknown and predict it).
Where the Reporting Portal data comes from
real evidence
GET /api/ml/reporting-portal/forensic-data composes:
| Source | Backend storage | Used to render |
|---|---|---|
| CI/CD evidence items | backend/data/cicd_state.json | Timeline events + custody logs |
| Anomaly Detection “latest” results | backend memory (latest_results) | Additional timeline anomalies |
| Integrity proofs | Derived per timeline item | Integrity Proofs tab |
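A sketch of how that composition could look on the backend (function and key names here are illustrative, not the actual ml_api.py code):

import json
import pathlib

def build_forensic_timeline(state_path="backend/data/cicd_state.json",
                            latest_results=None):
    # 1) CI/CD evidence items persisted on disk.
    state = json.loads(pathlib.Path(state_path).read_text())
    timeline = [{"source": "cicd", **item} for item in state.get("evidence", [])]

    # 2) In-memory "latest" anomaly-detection results, keyed by model.
    for model_key, anomalies in (latest_results or {}).items():
        timeline += [{"source": f"anomaly:{model_key}", **a} for a in anomalies]

    # 3) Newest-first ordering so the UI can render the timeline directly.
    timeline.sort(key=lambda e: e.get("timestamp", ""), reverse=True)
    return timeline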
How Training Works (Step-by-Step)
Attack Type Classification (Supervised)
LogisticRegression + TF-IDF
Implementation: backend/ml_models/reporting_portal/attack_type_classifier.py
- Data Loading: Reads Attack_Dataset.csv.
- Text Construction: Concatenates text columns (Title, Description, Impact, etc.) into a single "feature text" string per row.
- Vectorization: Applies TF-IDF (up to 12,000 features, 1-2 n-grams) to convert text to vectors.
- Model Fitting: Trains a LogisticRegression model to classify the specific Attack Type.
- Persistence: Saves the model pipeline to attack_type_model.pkl.
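A minimal sketch of that text-concatenation + TF-IDF + LogisticRegression pipeline (toy rows; the real column set of Attack_Dataset.csv may differ):

import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

df = pd.DataFrame({
    "Attack Name": ["Phishing", "Malware"] * 5,
    "Description": ["credential harvesting email campaign", "ransomware spreading laterally"] * 5,
    "Attack Type": ["Phishing", "Malware"] * 5,
})

# "Feature text": concatenate the text columns into one string per row.
feature_text = (df["Attack Name"] + " " + df["Description"]).tolist()

model = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=12000, ngram_range=(1, 2))),
    ("logreg", LogisticRegression(max_iter=1000)),
])
model.fit(feature_text, df["Attack Type"])
joblib.dump(model, "attack_type_model.pkl")

print(model.predict(["phishing email harvesting employee credentials"]))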
Breach Type Classification (Supervised)
ColumnTransformer (Mixed Data)
Implementation: backend/ml_models/reporting_portal/breach_type_classifier.py
- Data Loading: Reads Cyber Security Breaches.csv.
- Feature Engineering: Uses a ColumnTransformer to handle mixed data types:
  - Text: TF-IDF on the Summary column.
  - Categorical: One-Hot Encoding on State, etc.
  - Numerical: Median imputation on Individuals_Affected, year, etc.
- Model Fitting: Trains a LogisticRegression classifier on the combined feature set to predict Type_of_Breach.
- Persistence: Saves the complex pipeline to breach_type_model.pkl.
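A minimal sketch of that mixed-type ColumnTransformer pipeline (toy rows; column names follow the steps above, the real CSV has more fields):

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    "Summary": ["laptop stolen from office", "server hacked, data exfiltrated"] * 5,
    "State": ["CA", "TX"] * 5,
    "Individuals_Affected": [500, 120000] * 5,
    "Type_of_Breach": ["Theft", "Hacking"] * 5,
})

preprocess = ColumnTransformer([
    ("text", TfidfVectorizer(), "Summary"),                               # free text
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["State"]),           # categorical
    ("num", SimpleImputer(strategy="median"), ["Individuals_Affected"]),  # numeric
])

model = Pipeline([("prep", preprocess), ("logreg", LogisticRegression(max_iter=1000))])
model.fit(df.drop(columns=["Type_of_Breach"]), df["Type_of_Breach"])
joblib.dump(model, "breach_type_model.pkl")

print(model.predict(df.drop(columns=["Type_of_Breach"]).head(1)))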
Reporting Portal ML datasets
| Dataset file | Model | Label column | Reports directory |
|---|---|---|---|
| backend/ml_models/reporting_portal/Attack_Dataset.csv | attack_type | Attack Type | backend/ml_models/reporting_portal/reports/attack_type/ |
| backend/ml_models/reporting_portal/Cyber Security Breaches.csv | breach_type | Type_of_Breach | backend/ml_models/reporting_portal/reports/breach_type/ |
Advanced / Internal Endpoints
| Endpoint | Purpose |
|---|---|
| POST /api/ml/reporting-portal/correlate | Correlates discrete evidence items into a unified attack timeline. |
8) API Index (Endpoints)
All endpoints are served by Flask in backend/api/ml_api.py.
GET /api/ml/health
GET /api/ml/cicd-agents/status
GET /api/ml/cicd-agents/evidence?limit=200
POST /api/ml/cicd-agents/collect
GET /api/ml/cicd-agents/config
POST /api/ml/cicd-agents/config
POST /api/ml/cicd-agents/config/reset
POST /api/ml/cicd-agents/register
GET /api/ml/cicd-agents/hdfs/status
POST /api/ml/cicd-agents/hdfs/train
POST /api/ml/cicd-agents/hdfs/infer-dataset
GET /api/ml/cicd-agents/hdfs/latest
GET /api/ml/cicd-agents/hdfs/reports/<filename>
GET /api/ml/cicd-agents/hadoop/status
POST /api/ml/cicd-agents/hadoop/train
POST /api/ml/cicd-agents/hadoop/infer-dataset
GET /api/ml/cicd-agents/hadoop/latest
GET /api/ml/cicd-agents/hadoop/reports/<filename>
GET /api/ml/anomaly-detection/status
POST /api/ml/anomaly-detection/train
POST /api/ml/anomaly-detection/infer
POST /api/ml/anomaly-detection/infer-sample
GET /api/ml/anomaly-detection/latest?model=commit_pattern
GET /api/ml/anomaly-detection/reports/<filename>
POST /api/ml/integrity-verification/hash
POST /api/ml/integrity-verification/hash/verify
POST /api/ml/integrity-verification/sbom/hash
POST /api/ml/integrity-verification/merkle/root
POST /api/ml/integrity-verification/keys/ed25519
POST /api/ml/integrity-verification/sign/ed25519
POST /api/ml/integrity-verification/verify/ed25519
GET /api/ml/integrity-verification/microsoft-malware/status
POST /api/ml/integrity-verification/microsoft-malware/train
POST /api/ml/integrity-verification/microsoft-malware/infer-dataset
GET /api/ml/integrity-verification/microsoft-malware/latest
GET /api/ml/integrity-verification/microsoft-malware/reports/<filename>
GET /api/ml/reporting-portal/status
POST /api/ml/reporting-portal/train
POST /api/ml/reporting-portal/infer-sample
POST /api/ml/reporting-portal/infer-dataset
GET /api/ml/reporting-portal/latest?model=attack_type
GET /api/ml/reporting-portal/reports/<model>/<filename>
GET /api/ml/reporting-portal/forensic-data
GET /api/ml/reporting-portal/custody
POST /api/ml/reporting-portal/report/generate
9) PM2 Runtime
PM2 config is backend/ecosystem.config.js. The ML API process forensic-ml-api runs backend/api/ml_api.py with the venv interpreter backend/venv/bin/python3 and listens on port 5000.
cd /var/www/forensic-portal/backend
pm2 start ecosystem.config.js
pm2 restart ecosystem.config.js --update-env
pm2 logs forensic-ml-api
pm2 save
10) curl Test Commands (Full Catalog)
Set BASE once and run any command. These are copy-pasteable smoke tests for every model and non-ML integrity function. All API routes are under /api/ml/.
BASE=https://monitoringsystem.space
HJSON='Content-Type: application/json'
Health
# Check if ML API is reachable; returns {"success": true}
curl -sS $BASE/api/ml/health
CI/CD Agents
# Get total agents and their statuses
curl -sS $BASE/api/ml/cicd-agents/status
# List recently collected evidence items (e.g., git-diffs, logs)
curl -sS $BASE/api/ml/cicd-agents/evidence?limit=50
# View current agent configuration (what to capture)
curl -sS $BASE/api/ml/cicd-agents/config
# Register a new agent (simulated registration)
curl -sS -X POST $BASE/api/ml/cicd-agents/register -H "$HJSON" -d '{"name":"curl-agent-1","type":"Jenkins"}'
# Update configuration (disable auto_collect, set interval)
curl -sS -X POST $BASE/api/ml/cicd-agents/config -H "$HJSON" -d '{"config":{"auto_collect":false,"collection_interval_seconds":30}}'
# Reset configuration to defaults
curl -sS -X POST $BASE/api/ml/cicd-agents/config/reset -H "$HJSON" -d '{}'
# Trigger manual evidence collection from dataset (creates evidence items)
curl -sS -X POST $BASE/api/ml/cicd-agents/collect -H "$HJSON" -d '{"mode":"dataset"}'
# Verify new evidence was added
curl -sS $BASE/api/ml/cicd-agents/evidence?limit=20
CI/CD Agents — HDFS Log Anomaly Detection
# Check training status of HDFS model
curl -sS $BASE/api/ml/cicd-agents/hdfs/status
# Train HDFS model (unsupervised) on 20k log lines
curl -sS -X POST $BASE/api/ml/cicd-agents/hdfs/train -H "$HJSON" -d '{"limit":20000}'
# Verify status updates to "trained"
curl -sS $BASE/api/ml/cicd-agents/hdfs/status
# Run inference on dataset (limit 5000 lines) and store results
curl -sS -X POST $BASE/api/ml/cicd-agents/hdfs/infer-dataset -H "$HJSON" -d '{"limit":5000,"store":true}'
# Retrieve the stored inference results (anomalies)
curl -sS $BASE/api/ml/cicd-agents/hdfs/latest
# Download generated graphs and training summary
curl -sS -o hdfs_dataset_matrix.png $BASE/api/ml/cicd-agents/hdfs/reports/hdfs_dataset_matrix.png
curl -sS -o hdfs_score_distribution.png $BASE/api/ml/cicd-agents/hdfs/reports/hdfs_score_distribution.png
curl -sS -o hdfs_training_summary.json $BASE/api/ml/cicd-agents/hdfs/reports/hdfs_training_summary.json
CI/CD Agents — Hadoop Log Failure Classification
# Check training status of Hadoop model
curl -sS $BASE/api/ml/cicd-agents/hadoop/status
# Train Hadoop supervised model on 12k log entries
curl -sS -X POST $BASE/api/ml/cicd-agents/hadoop/train -H "$HJSON" -d '{"limit":12000}'
# Verify status is "trained"
curl -sS $BASE/api/ml/cicd-agents/hadoop/status
# Run classification on dataset and store failures
curl -sS -X POST $BASE/api/ml/cicd-agents/hadoop/infer-dataset -H "$HJSON" -d '{"limit":3000,"store":true}'
# Retrieve latest detected failures
curl -sS $BASE/api/ml/cicd-agents/hadoop/latest
# Download classification metrics and confusion matrix
curl -sS -o hadoop_confusion_matrix.png $BASE/api/ml/cicd-agents/hadoop/reports/hadoop_confusion_matrix.png
curl -sS -o hadoop_metrics.png $BASE/api/ml/cicd-agents/hadoop/reports/hadoop_metrics.png
curl -sS -o hadoop_training_summary.json $BASE/api/ml/cicd-agents/hadoop/reports/hadoop_training_summary.json
Anomaly Detection Engine
# Check status of both anomaly models
curl -sS $BASE/api/ml/anomaly-detection/status
# Train both Commit Pattern and Dependency Anomaly models
curl -sS -X POST $BASE/api/ml/anomaly-detection/train -H "$HJSON" -d '{"models":["commit_pattern","dependency_anomaly"],"limit":2000}'
# Verify both are trained
curl -sS $BASE/api/ml/anomaly-detection/status
# Run inference for Commit Patterns and store results
curl -sS -X POST $BASE/api/ml/anomaly-detection/infer -H "$HJSON" -d '{"model":"commit_pattern","limit":500,"store":true}'
curl -sS $BASE/api/ml/anomaly-detection/latest?model=commit_pattern
# Run inference for Dependency Anomalies and store results
curl -sS -X POST $BASE/api/ml/anomaly-detection/infer -H "$HJSON" -d '{"model":"dependency_anomaly","limit":500,"store":true}'
curl -sS $BASE/api/ml/anomaly-detection/latest?model=dependency_anomaly
# Infer a single sample (manual JSON input)
curl -sS -X POST $BASE/api/ml/anomaly-detection/infer-sample -H "$HJSON" -d '{"model":"commit_pattern","sample":{"commit_message_length":240,"files_changed":12,"lines_added":900,"lines_removed":20,"is_off_hours":1,"is_weekend":0,"timestamp_hour":2,"day_of_week":6,"time_since_last_commit":3600,"author_commit_count":1}}'
# Download generated anomaly reports
curl -sS -o commit_confusion_matrix.png $BASE/api/ml/anomaly-detection/reports/commit_confusion_matrix.png
curl -sS -o commit_metrics.png $BASE/api/ml/anomaly-detection/reports/commit_metrics.png
curl -sS -o dependency_heatmap.png $BASE/api/ml/anomaly-detection/reports/dependency_correlation_heatmap.png
curl -sS -o training_summary.json $BASE/api/ml/anomaly-detection/reports/anomaly_training_summary.json
Integrity Verification (Hash / SBOM / Merkle / Ed25519)
# Compute SHA-256 hash of a string
curl -sS -X POST $BASE/api/ml/integrity-verification/hash -H "$HJSON" -d '{"content":"hello"}'
# Verify content matches an expected hash
curl -sS -X POST $BASE/api/ml/integrity-verification/hash/verify -H "$HJSON" -d '{"content":"hello","expected_hash":"deadbeef"}'
# Compute hash of an SBOM JSON structure
curl -sS -X POST $BASE/api/ml/integrity-verification/sbom/hash -H "$HJSON" -d '{"sbom_json":"{\"name\":\"demo\",\"dependencies\":[{\"name\":\"a\",\"version\":\"1.0.0\"}]}"}'
# Compute Merkle Root for a list of leaf hashes
curl -sS -X POST $BASE/api/ml/integrity-verification/merkle/root -H "$HJSON" -d '{"hashes":["a3f1","b2c9","c0d1"]}'
# Ed25519: Generate Keypair, Sign, and Verify
KEYS=$(curl -sS -X POST $BASE/api/ml/integrity-verification/keys/ed25519 -H "$HJSON")
PRIV=$(printf '%s' "$KEYS" | python3 -c 'import sys,json;print(json.load(sys.stdin)["private_key_pem"])')
PUB=$(printf '%s' "$KEYS" | python3 -c 'import sys,json;print(json.load(sys.stdin)["public_key_pem"])')
SIG=$(curl -sS -X POST $BASE/api/ml/integrity-verification/sign/ed25519 -H "$HJSON" -d "{\"payload\":\"hello\",\"private_key_pem\":$(python3 -c 'import json,sys;print(json.dumps(sys.stdin.read()))' <<<"$PRIV")}" | python3 -c 'import sys,json;print(json.load(sys.stdin)["signature_b64"])')
curl -sS -X POST $BASE/api/ml/integrity-verification/verify/ed25519 -H "$HJSON" -d "{\"payload\":\"hello\",\"public_key_pem\":$(python3 -c 'import json,sys;print(json.dumps(sys.stdin.read()))' <<<"$PUB"),\"signature_b64\":\"$SIG\"}"
Integrity Verification — Microsoft Malware Classification
# Check status of malware model
curl -sS $BASE/api/ml/integrity-verification/microsoft-malware/status
# Train malware classifier (supervised SGD)
curl -sS -X POST $BASE/api/ml/integrity-verification/microsoft-malware/train -H "$HJSON" -d '{"limit":30000}'
curl -sS $BASE/api/ml/integrity-verification/microsoft-malware/status
# Run inference on dataset and store results
curl -sS -X POST $BASE/api/ml/integrity-verification/microsoft-malware/infer-dataset -H "$HJSON" -d '{"limit":500,"store":true}'
curl -sS $BASE/api/ml/integrity-verification/microsoft-malware/latest
# Download malware classification reports
curl -sS -o microsoft_malware_confusion_matrix.png $BASE/api/ml/integrity-verification/microsoft-malware/reports/microsoft_malware_confusion_matrix.png
curl -sS -o microsoft_malware_metrics.png $BASE/api/ml/integrity-verification/microsoft-malware/reports/microsoft_malware_metrics.png
curl -sS -o microsoft_malware_accuracy_curve.png $BASE/api/ml/integrity-verification/microsoft-malware/reports/microsoft_malware_accuracy_curve.png
curl -sS -o microsoft_malware_training_summary.json $BASE/api/ml/integrity-verification/microsoft-malware/reports/microsoft_malware_training_summary.json
Reporting Portal (Evidence/Custody/Report)
# Fetch timeline evidence (from CI/CD + Anomalies)
curl -sS $BASE/api/ml/reporting-portal/forensic-data
# Fetch chain of custody logs
curl -sS $BASE/api/ml/reporting-portal/custody
# Generate a full forensic report JSON
curl -sS -X POST $BASE/api/ml/reporting-portal/report/generate -H "$HJSON" -d '{"format":"json"}'
Reporting Portal — ML Models (Attack Type / Breach Type)
# Check status of reporting models
curl -sS $BASE/api/ml/reporting-portal/status
# Train Attack Type model
curl -sS -X POST $BASE/api/ml/reporting-portal/train -H "$HJSON" -d '{"models":["attack_type"],"limit_attack":6000}'
# Train Breach Type model
curl -sS -X POST $BASE/api/ml/reporting-portal/train -H "$HJSON" -d '{"models":["breach_type"]}'
# Infer Attack Type on dataset and store
curl -sS -X POST $BASE/api/ml/reporting-portal/infer-dataset -H "$HJSON" -d '{"model":"attack_type","limit":200,"store":true}'
curl -sS $BASE/api/ml/reporting-portal/latest?model=attack_type
# Infer Breach Type on dataset and store
curl -sS -X POST $BASE/api/ml/reporting-portal/infer-dataset -H "$HJSON" -d '{"model":"breach_type","limit":200,"store":true}'
curl -sS $BASE/api/ml/reporting-portal/latest?model=breach_type
# Infer single sample: Attack Type
curl -sS -X POST $BASE/api/ml/reporting-portal/infer-sample -H "$HJSON" -d '{"model":"attack_type","sample":{"Attack Name":"Malware","Description":"Ransomware infection spreading laterally in network"}}'
# Infer single sample: Breach Type
curl -sS -X POST $BASE/api/ml/reporting-portal/infer-sample -H "$HJSON" -d '{"model":"breach_type","sample":{"Organization":"Example Corp","Summary":"Unauthorized access and data exfiltration detected","Records_Lost":120000}}'
# Download reporting model graphs
curl -sS -o attack_type_metrics.png $BASE/api/ml/reporting-portal/reports/attack_type/attack_type_metrics.png
curl -sS -o attack_type_confusion_matrix.png $BASE/api/ml/reporting-portal/reports/attack_type/attack_type_confusion_matrix.png
curl -sS -o attack_type_learning_curve.png $BASE/api/ml/reporting-portal/reports/attack_type/attack_type_learning_curve.png
curl -sS -o breach_type_metrics.png $BASE/api/ml/reporting-portal/reports/breach_type/breach_type_metrics.png
curl -sS -o breach_type_confusion_matrix.png $BASE/api/ml/reporting-portal/reports/breach_type/breach_type_confusion_matrix.png
curl -sS -o breach_type_learning_curve.png $BASE/api/ml/reporting-portal/reports/breach_type/breach_type_learning_curve.png
11) Troubleshooting
- Requests to /api/ml/ never reach Flask: usually means Nginx isn't proxying /api/ml/ to the Flask backend. Check with:
curl -i https://your-host/api/ml/health
- Nginx responds but the API does not: backend process not running, wrong port, or PM2 stopped. Check PM2:
cd /var/www/forensic-portal/backend
pm2 status
pm2 logs forensic-ml-api
- Training requests time out: training took longer than the proxy timeout. Increase proxy timeouts for /api/ml/ or use lower limits.
- Inference returns an error instead of results: inference endpoints require a trained model. Train first, then infer, then fetch /latest.