1) Architecture & Data Flow
The system is a React dashboard talking to a Flask ML API over /api/ml/. Models are trained on demand, persisted as .pkl files, and their training graphs are written under reports/.
React SPA with a tabbed dashboard in src/App.js. Each tab renders one of the four component folders under src/components/.
Flask ML API in backend/api/ml_api.py. Models live under backend/ml_models/, with persisted .pkl models and generated graphs in reports/.
All frontend network calls go through src/api/mlApi.js which fetches JSON from /api/ml/... on the same origin (Nginx reverse proxy).
CI/CD evidence is stored in backend/data/cicd_state.json. The Reporting Portal builds its timeline + custody logs from that stored evidence and from the anomaly-detection latest results.
Request lifecycle (from a button click)
UI → mlApi.js → Flask → Model → UI
| Step | What happens | Where in code |
|---|---|---|
| 1 | User clicks a UI button (or changes a tab/filter) | src/components/** |
| 2 | Component calls a function from the API client | src/api/mlApi.js |
| 3 | Browser sends HTTP request to /api/ml/... | fetch() |
| 4 | Nginx proxies to Flask on port 5000 | /etc/nginx/sites-enabled/* |
| 5 | Flask route validates input, loads the model, trains/infers, persists state | backend/api/ml_api.py |
| 6 | JSON response returned; UI updates state and re-renders | React useState/useEffect |
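To make the lifecycle concrete, here is a minimal Flask sketch of step 5 (illustrative only; the real routes live in backend/api/ml_api.py and do the actual model loading and persistence):

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/ml/health", methods=["GET"])
def health():
    # Polled by the dashboard header to show Connected/Disconnected.
    return jsonify({"success": True})

@app.route("/api/ml/anomaly-detection/infer", methods=["POST"])
def infer_anomalies():
    body = request.get_json(silent=True) or {}
    model_key = body.get("model", "commit_pattern")  # validate input
    limit = int(body.get("limit", 500))
    anomalies = []  # placeholder: the real route loads the .pkl model and runs inference
    return jsonify({"success": True, "anomalies": anomalies[:limit]})

if __name__ == "__main__":
    app.run(port=5000)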
1.1) Project File Structure
Physical layout of the monorepo, mapping code to components.
/var/www/forensic-portal/
├── backend/
│ ├── api/ml_api.py # Main Flask entry point (All Routes)
│ ├── data/cicd_state.json # JSON Database for Evidence & Logs
│ ├── ecosystem.config.js # PM2 process config
│ └── ml_models/ # Logic per domain
│ ├── anomaly_detection/ # Commit patterns, dependency checks, pipeline tampering
│ ├── cicd_agents/ # HDFS/Hadoop log analysis models
│ ├── integrity_verification/ # Malware classifier, Ed25519, Merkle, hashing
│ └── reporting_portal/ # Attack/Breach classifiers & report generation
└── src/
├── api/mlApi.js # Frontend-Backend bridge (fetch wrapper)
├── App.js # Main React layout & Tab Controller
└── components/ # UI Views matching backend domains
├── AnomalyDetection/ # Graphs, Timelines, Training UI
├── ForensicReadyCICDAgents/ # Agent Status, Evidence Collection UI
├── ForensicReportingPortal/ # Final Reports, Chain of Custody UI
└── IntegrityVerification/ # Crypto tools & Malware Verification UI
2) Flow Charts
These diagrams are drawn as inline SVG shapes so they render everywhere without external assets. Use them as a mental model for how each button maps to API calls, persisted artifacts, and UI rendering.
Overall system flow
Model lifecycle (generic train → infer → reports)
3) Dashboard Layout (Tabs)
The top-level tabs are implemented in src/App.js. Each tab renders a component:
| Dashboard Tab | Component File | What it shows |
|---|---|---|
| CI/CD Agents | src/components/ForensicReadyCICDAgents/ForensicReadyCICDAgents.js | Agent status, evidence collection, agent configuration, HDFS anomaly model, Hadoop failure model |
| Anomaly Detection | src/components/AnomalyDetection/AnomalyDetection.js | ML detection for commit/dependency anomalies, timeline reconstruction, training reports/graphs |
| Integrity Verification | src/components/IntegrityVerification/IntegrityVerification.js | SBOM hashing, hash verification, Ed25519 signatures, Merkle root, malware model train/infer |
| Reporting Portal | src/components/ForensicReportingPortal/ForensicReportingPortal.js | Evidence visualization, integrity proofs, chain-of-custody logs, report generator, reporting ML models |
The header status indicator (“Connected/Disconnected”) calls GET /api/ml/health.
Connected/Disconnected logic
health check
GET /api/ml/health
Response:
{
"success": true
}
4) Component 1 — Forensic-Ready CI/CD Agents
Frontend folder: src/components/ForensicReadyCICDAgents/
This tab shows connected CI/CD agents, collects evidence from datasets/models, lets you configure evidence capture, and runs two ML models (HDFS anomaly detection, Hadoop failure classification).
UI sections
| Section | File | What it contains |
|---|---|---|
| Agent Status | AgentStatus.js | List of agents + metrics (total/active/performance overhead) |
| Evidence Collection | EvidenceCollector.js | Filters evidence items by type and triggers collection |
| Agent Configuration | AgentConfig.js | Checkboxes and numeric settings; save/reset |
| HDFS Log Anomaly Detection | ForensicReadyCICDAgents.js | Train / Run Detection / Refresh Results; list anomalies |
| Hadoop Log Failure Classification | ForensicReadyCICDAgents.js | Train / Run Detection / Refresh Results; list detections; shows Accuracy/F1/Precision/Recall |
Buttons and what they do
| Button / Control | Where | Backend endpoint(s) | Behavior |
|---|---|---|---|
| Start Collection | EvidenceCollector.js → onStartCollection | POST /api/ml/cicd-agents/collect | Triggers dataset-based evidence creation on the backend (uses trained HDFS/Hadoop models if present). Then refreshes evidence list. |
| Save Configuration | AgentConfig.js | POST /api/ml/cicd-agents/config | Persists config to backend state. |
| Reset to Defaults | AgentConfig.js | POST /api/ml/cicd-agents/config/reset | Resets config on backend to defaults. |
| Train Model (HDFS) | ForensicReadyCICDAgents.js | POST /api/ml/cicd-agents/hdfs/train | Trains the HDFS TF-IDF + IsolationForest model and persists hdfs_log_anomaly_model.pkl. |
| Run Detection (HDFS) | ForensicReadyCICDAgents.js | POST /api/ml/cicd-agents/hdfs/infer-dataset | Runs detection on log dataset and stores latest anomalies retrievable via /latest. |
| Refresh Results (HDFS) | ForensicReadyCICDAgents.js | GET /api/ml/cicd-agents/hdfs/latest | Loads the latest stored anomalies. |
| Train Model (Hadoop) | ForensicReadyCICDAgents.js | POST /api/ml/cicd-agents/hadoop/train | Trains the supervised Hadoop log classifier and persists hadoop_log_failure_model.pkl. |
| Run Detection (Hadoop) | ForensicReadyCICDAgents.js | POST /api/ml/cicd-agents/hadoop/infer-dataset | Runs classification on dataset and stores latest “failures” as detections. |
| Refresh Results (Hadoop) | ForensicReadyCICDAgents.js | GET /api/ml/cicd-agents/hadoop/latest | Loads the latest stored detections. |
Request/Response details
CI/CD Agents APIs
Evidence collection
POST /api/ml/cicd-agents/collect
Body:
{
"mode": "dataset"
}
Response:
{
"success": true,
"added": 2
}
The new evidence items are stored in backend/data/cicd_state.json and later consumed by the Reporting Portal timeline.
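A minimal sketch of how an evidence item of this shape could be appended to backend/data/cicd_state.json (the top-level "evidence" key is an assumption; the real file may hold additional state):

import json
import pathlib
from datetime import datetime, timezone

STATE_FILE = pathlib.Path("backend/data/cicd_state.json")

def append_evidence(item: dict) -> None:
    # Load existing state (assumed to contain an "evidence" list), append, write back.
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {"evidence": []}
    state.setdefault("evidence", []).append(item)
    STATE_FILE.write_text(json.dumps(state, indent=2))

append_evidence({
    "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    "type": "build-logs",
    "description": "Hadoop job failure predicted",
    "metadata": {"app_id": "application_...", "confidence": 80.32, "label": 1},
})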
Evidence list
GET /api/ml/cicd-agents/evidence?limit=200
Response:
{
"success": true,
"evidence": [
{
"timestamp": "2025-12-13T02:32:20Z",
"type": "build-logs",
"description": "Hadoop job failure predicted",
"metadata": {
"app_id": "application_...",
"confidence": 80.32,
"label": 1
}
}
]
}
Model persistence and reports
| Model | Saved file | Reports directory |
|---|---|---|
| HDFS Log Anomaly | backend/ml_models/cicd_agents/hdfs_log_anomaly_model.pkl | backend/ml_models/cicd_agents/reports/hdfs_log_anomaly/ |
| Hadoop Log Failure | backend/ml_models/cicd_agents/hadoop_log_failure_model.pkl | backend/ml_models/cicd_agents/reports/hadoop_log_failure/ |
Agent Configuration (the “tick” checkboxes) explained
what each toggle does
These checkboxes are not placeholders: they map to a real config object stored on the backend and control which evidence categories can be collected/recorded.
GET /api/ml/cicd-agents/config
Response:
{
"success": true,
"config": {
"capture_git_diffs": true,
"capture_build_logs": true,
"capture_env_vars": true,
"capture_secrets_access": true,
"capture_artifacts": true,
"capture_pipeline_configs": true,
"auto_collect": false,
"collection_interval_seconds": 30,
"max_evidence_size_mb": 100,
"encrypt_evidence": false
}
}
POST /api/ml/cicd-agents/config
Body:
{
"config": {
"auto_collect": true,
"collection_interval_seconds": 30
}
}
Response:
{
"success": true,
"config": { "...merged_config" : true }
}
| UI control | Config field | Effect |
|---|---|---|
| Capture Git Commit Diffs | capture_git_diffs | Allows evidence items of type git-diffs to be collected/stored. |
| Capture Build Logs | capture_build_logs | Allows evidence items of type build-logs to be collected/stored. |
| Capture Environment Variables | capture_env_vars | Allows evidence items of type env-vars to be collected/stored. |
| Capture Secrets Access Events | capture_secrets_access | Allows evidence items of type secrets-access to be collected/stored. |
| Capture Build Artifacts | capture_artifacts | Allows evidence items of type artifacts to be collected/stored. |
| Capture Pipeline Config Files | capture_pipeline_configs | Allows evidence items of type pipeline-config to be collected/stored. |
| Enable Automatic Collection | auto_collect | Backend will treat this as an enable flag for scheduled collection behavior. |
| Collection Interval (seconds) | collection_interval_seconds | Interval target for automatic collection scheduling. |
| Max Evidence Size (MB) | max_evidence_size_mb | Safety limit used to cap evidence payload sizes when storing. |
| Enable Encryption for Stored Evidence | encrypt_evidence | Controls whether stored evidence should be encrypted at rest. |
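A minimal sketch of the merge-and-gate behavior described above (the exact merge and gating logic in ml_api.py may differ; the type-to-flag mapping follows the table):

DEFAULT_CONFIG = {
    "capture_git_diffs": True, "capture_build_logs": True, "capture_env_vars": True,
    "capture_secrets_access": True, "capture_artifacts": True, "capture_pipeline_configs": True,
    "auto_collect": False, "collection_interval_seconds": 30,
    "max_evidence_size_mb": 100, "encrypt_evidence": False,
}

# Evidence type (as stored) -> config flag that must be enabled for it to be recorded.
TYPE_TO_FLAG = {
    "git-diffs": "capture_git_diffs", "build-logs": "capture_build_logs",
    "env-vars": "capture_env_vars", "secrets-access": "capture_secrets_access",
    "artifacts": "capture_artifacts", "pipeline-config": "capture_pipeline_configs",
}

def merge_config(partial: dict) -> dict:
    # POST /config semantics: posted fields override defaults, everything else is kept.
    return {**DEFAULT_CONFIG, **partial}

def is_type_allowed(evidence_type: str, config: dict) -> bool:
    return bool(config.get(TYPE_TO_FLAG.get(evidence_type, ""), False))

cfg = merge_config({"auto_collect": True, "collection_interval_seconds": 30})
print(is_type_allowed("build-logs", cfg))  # True with the defaults above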
How Training Works (Step-by-Step)
HDFS Log Anomaly (Unsupervised)
IsolationForest
Implementation: backend/ml_models/cicd_agents/hdfs_log_anomaly_detector.py
- Data Loading: Reads hdfs_log/hdfs.log/sorted.log.
- Parsing: Extracts event templates and groups logs by Block ID.
- Vectorization: Converts event sequences into numerical vectors using TF-IDF.
- Model Fitting: Trains an IsolationForest model (unsupervised) to identify outliers in the vector space.
- Persistence: Saves the model to hdfs_log_anomaly_model.pkl.
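A minimal sketch of that TF-IDF + IsolationForest flow, with toy event sequences standing in for the parsed HDFS blocks (not the actual code in hdfs_log_anomaly_detector.py):

import joblib
from sklearn.ensemble import IsolationForest
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# One "document" per Block ID: the space-joined sequence of event templates.
block_sequences = [
    "E5 E22 E11 E11 E9",       # typical block
    "E5 E22 E7 E3 E3 E3 E3",   # unusual repetition
    "E5 E22 E11 E9",
]

model = Pipeline([
    ("tfidf", TfidfVectorizer(token_pattern=r"\S+")),                   # vectorization step
    ("iforest", IsolationForest(contamination=0.1, random_state=42)),   # unsupervised fit
])
model.fit(block_sequences)
joblib.dump(model, "hdfs_log_anomaly_model.pkl")                        # persistence step

print(model.predict(block_sequences))  # -1 marks blocks flagged as anomalous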
Hadoop Log Failure (Supervised)
LogisticRegression
Implementation: backend/ml_models/cicd_agents/hadoop_log_failure_classifier.py
- Data Loading: Reads log files and the ground-truth labels from abnormal_label.txt.
- Feature Extraction: Maps application log patterns to known normal/failure sequences.
- Vectorization: Applies TF-IDF to the sequence of log events.
- Model Fitting: Trains a LogisticRegression classifier to distinguish between "Normal" and "Anomaly" classes.
- Persistence: Saves the model to hadoop_log_failure_model.pkl and generates classification metrics (F1, Precision, Recall).
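A compact sketch of the supervised counterpart, using toy sequences and labels (the real classifier reads the Hadoop logs and abnormal_label.txt):

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.pipeline import Pipeline

# One event sequence per Hadoop application; 0 = Normal, 1 = Anomaly.
sequences = ["start map reduce commit success", "start map error retry error kill"] * 10
labels = [0, 1] * 10

clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("logreg", LogisticRegression(max_iter=1000)),
])
clf.fit(sequences, labels)
joblib.dump(clf, "hadoop_log_failure_model.pkl")

# Precision / Recall / F1 feed the metrics shown in the UI and report graphs.
print(classification_report(labels, clf.predict(sequences)))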
Data sources
| Dataset / File | Used by | Purpose |
|---|---|---|
| backend/ml_models/cicd_agents/HDFS Log Anomaly Detection/hdfs_log/hdfs.log/sorted.log | HDFS model | Unsupervised anomaly detection on HDFS logs. |
| backend/ml_models/cicd_agents/HDFS Log Anomaly Detection/Hadoop_log/Hadoop_log/abnormal_label.txt | Hadoop model | Supervised labels for Hadoop application IDs. |
Advanced / Internal Endpoints
| Endpoint | Purpose |
|---|---|
| POST /api/ml/cicd-agents/detect | Runs pattern-based detection on raw evidence payloads (internal helper). |
5) Component 2 — Automated Forensic Anomaly Detection Engine
Frontend folder: src/components/AnomalyDetection/
Tabs
| Tab | File | Purpose |
|---|---|---|
| Anomaly Detector | AnomalyDetection.js + AnomalyDetector.js | Run ML inference; filter anomalies by category/severity; show totals and accuracy. |
| Timeline Reconstruction | TimelineReconstruction.js | Groups anomalies by incident and builds a timeline view. |
| Training Reports | TrainingReports.js | Shows training metrics and graphs served from backend reports directory. |
Buttons and what they do
| Button / Control | Where | Backend endpoint(s) | Behavior |
|---|---|---|---|
| Train Models | AnomalyDetection.js | POST /api/ml/anomaly-detection/train | Trains commit pattern + dependency anomaly models; generates report images; persists .pkl models. |
| Run Detection | AnomalyDetector.js (button), handled in AnomalyDetection.js | POST /api/ml/anomaly-detection/infer | Runs inference on dataset and returns anomalies. UI maps confidence → severity. |
| Refresh Results | AnomalyDetection.js | GET /api/ml/anomaly-detection/latest | Loads stored “latest” anomalies for selected model. |
| Commit Patterns / Dependency Anomalies (model switch) | AnomalyDetection.js | Affects which model key is used in requests | Switches model parameter: commit_pattern or dependency_anomaly. |
| Reconstruct Timelines | TimelineReconstruction.js | No backend call | Pure UI grouping of already-loaded anomalies into incident timelines. |
Filters (Category / Severity) explained
UI-only filtering
Category and Severity filters do not call the backend. They filter the already-loaded anomalies list.
| Filter | What it matches | Where it is applied |
|---|---|---|
| Category | Values like commit-patterns, pipeline-tampering, dependency-anomalies | AnomalyDetector.js |
| Severity | Derived from confidence: critical/high/medium/low | AnomalyDetection.js |
How “Run Detection” works (end-to-end)
mapping + severity
The backend returns anomalies with confidence. The UI maps confidence to severity: critical ≥ 90, high ≥ 80, medium ≥ 70, else low.
POST /api/ml/anomaly-detection/infer
Body:
{
"model": "commit_pattern",
"limit": 500
}
Response:
{
"success": true,
"anomalies": [
{
"id": "commit_0",
"category": "commit-patterns",
"confidence": 94.15,
"details": { "is_off_hours": 1, "files_changed": 0 }
}
]
}
Stored results can be fetched via GET /api/ml/anomaly-detection/latest?model=commit_pattern.
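The severity mapping itself happens in the React components, but the rule is simple enough to state as a sketch (thresholds as described above):

def severity_from_confidence(confidence: float) -> str:
    # UI rule: critical >= 90, high >= 80, medium >= 70, else low.
    if confidence >= 90:
        return "critical"
    if confidence >= 80:
        return "high"
    if confidence >= 70:
        return "medium"
    return "low"

print(severity_from_confidence(94.15))  # "critical" for the example anomaly above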
How Training Works (Step-by-Step)
Commit Patterns (Unsupervised)
IsolationForest
Implementation: backend/ml_models/anomaly_detection/commit_pattern_analyzer.py
- Data Loading: Reads the NSL-KDD dataset (mapped to commit features like files_changed, lines_added, is_off_hours).
- Preprocessing: Scales features using StandardScaler to normalize ranges.
- Model Fitting: Trains an IsolationForest to learn the baseline of "normal" commit behavior.
- Persistence: Saves the model to commit_pattern_model.pkl and generates a confusion matrix using a holdout test set.
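A minimal sketch of that scale-then-isolate approach on toy commit features (the real analyzer maps NSL-KDD rows to these features; feature names follow the list above):

import joblib
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import confusion_matrix
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Columns: files_changed, lines_added, is_off_hours (toy rows, not the real dataset).
X = np.array([[3, 40, 0], [5, 120, 0], [2, 15, 0], [40, 5000, 1], [4, 60, 0], [1, 10, 0]])
y_true = np.array([0, 0, 0, 1, 0, 0])  # known labels used only to evaluate, not to fit

model = Pipeline([
    ("scaler", StandardScaler()),
    ("iforest", IsolationForest(contamination=0.2, random_state=42)),
])
model.fit(X)                                   # unsupervised: labels are not used here
joblib.dump(model, "commit_pattern_model.pkl")

y_pred = (model.predict(X) == -1).astype(int)  # IsolationForest returns -1 for anomalies
print(confusion_matrix(y_true, y_pred))        # basis for the confusion-matrix report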
Dependency Anomalies (Unsupervised)
IsolationForest
Implementation: backend/ml_models/anomaly_detection/dependency_anomaly_detector.py
- Data Loading: Reads the UNSW-NB15 dataset (mapped to dependency graph features like depth, dev_dependency_count).
- Preprocessing: Scales features using StandardScaler.
- Model Fitting: Trains an IsolationForest to detect unusual dependency structures (e.g., extremely deep trees or massive dev-dependency bloat).
- Persistence: Saves the model to dependency_anomaly_model.pkl and generates a correlation heatmap.
Reports/graphs displayed
Training graphs are requested as images from the backend. Example: GET /api/ml/anomaly-detection/reports/commit_confusion_matrix.png
Data sources
| Dataset | Used by | Purpose |
|---|---|---|
| NSL-KDD (Network Intrusion) | Commit Pattern Analyzer | Provides baseline patterns for anomaly detection (mapped to commit features). |
| UNSW-NB15 (Network Anomaly) | Dependency Anomaly Detector | Training set for detecting anomalous dependency structures. |
Advanced / Internal Endpoints
| Endpoint | Purpose |
|---|---|
| POST /api/ml/anomaly-detection/commit-patterns | Internal inference for commit anomalies (used by general infer route). |
| POST /api/ml/anomaly-detection/pipeline-tampering | Detects unauthorized modifications to pipeline configurations (YAML/JSON). |
| POST /api/ml/anomaly-detection/dependency-anomalies | Internal inference for dependency structure anomalies. |
6) Component 3 — Integrity Verification (SBOM / Hash / Sign / Merkle / Malware)
Frontend folder: src/components/IntegrityVerification/
Tabs
| Tab | File | What it does |
|---|---|---|
| SBOM Generator | SBOMGenerator.js | Computes SHA-256 for SBOM JSON using backend. |
| Hash Verification | HashVerification.js | Compute SHA-256; verify expected hash. |
| Digital Signatures | DigitalSignatures.js | Ed25519 keypair generation, signing, signature verification. |
| Merkle Tree | MerkleTree.js | Computes Merkle root from provided leaf hashes. |
| Malware Classification | MicrosoftMalware.js | Trains and infers a malware family classifier from dataset; shows metrics and predictions. |
Buttons and what they do
| Button | Where | Backend endpoint(s) | Behavior |
|---|---|---|---|
| Compute Hash (SBOM) | SBOMGenerator.js | POST /api/ml/integrity-verification/sbom/hash | Returns SBOM SHA-256 and dependency count (when derivable). |
| Compute Hash (content) | HashVerification.js | POST /api/ml/integrity-verification/hash | Returns SHA-256 hash for the provided content. |
| Verify Hash | HashVerification.js | POST /api/ml/integrity-verification/hash/verify | Returns match/actual hash result. |
| Generate Keypair | DigitalSignatures.js | POST /api/ml/integrity-verification/keys/ed25519 | Returns Ed25519 public/private PEM strings. |
| Sign | DigitalSignatures.js | POST /api/ml/integrity-verification/sign/ed25519 | Returns signature (base64) for the payload using the private key. |
| Verify (signature) | DigitalSignatures.js | POST /api/ml/integrity-verification/verify/ed25519 | Returns valid true/false for payload+signature under public key. |
| Compute Root | MerkleTree.js | POST /api/ml/integrity-verification/merkle/root | Computes Merkle root and leaf count. |
| Train Model (malware) | MicrosoftMalware.js | POST /api/ml/integrity-verification/microsoft-malware/train | Trains malware classifier; persists microsoft_malware_model.pkl and writes reports/graphs. |
| Run Inference (malware) | MicrosoftMalware.js | POST /api/ml/integrity-verification/microsoft-malware/infer-dataset | Runs inference on dataset subset and stores latest predictions. |
| Refresh (malware) | MicrosoftMalware.js | GET /api/ml/integrity-verification/microsoft-malware/latest | Reloads latest stored predictions. |
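For reference, a minimal sketch of the Merkle-root computation behind Compute Root (the pairing and encoding conventions here are assumptions; the backend's exact scheme may differ):

import hashlib

def merkle_root(leaf_hashes: list[str]) -> str:
    # Pair adjacent leaves, hash the concatenation, repeat until one hash remains.
    # Assumption: an odd leftover leaf is paired with itself.
    level = [h.lower() for h in leaf_hashes]
    if not level:
        return ""
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
                 for i in range(0, len(level), 2)]
    return level[0]

print(merkle_root(["a3f1", "b2c9", "c0d1"]))  # same leaves as the curl example in section 10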
Ed25519 signature workflow (exact sequence)
keys → sign → verify
POST /api/ml/integrity-verification/keys/ed25519
Response:
{
"success": true,
"private_key_pem": "-----BEGIN PRIVATE KEY----- ...",
"public_key_pem": "-----BEGIN PUBLIC KEY----- ..."
}
POST /api/ml/integrity-verification/sign/ed25519
Body:
{
"payload": "hello",
"private_key_pem": "-----BEGIN PRIVATE KEY----- ..."
}
Response:
{
"success": true,
"signature_b64": "..."
}
POST /api/ml/integrity-verification/verify/ed25519
Body:
{
"payload": "hello",
"public_key_pem": "-----BEGIN PUBLIC KEY----- ...",
"signature_b64": "..."
}
Response:
{
"success": true,
"valid": true
}
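The same keygen → sign → verify sequence can be reproduced locally with Python's cryptography package (a sketch of the primitive, not the backend code; whether ml_api.py uses this exact library is an assumption):

import base64

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# PEM strings, matching the *_pem fields in the responses above.
private_pem = private_key.private_bytes(
    serialization.Encoding.PEM,
    serialization.PrivateFormat.PKCS8,
    serialization.NoEncryption(),
).decode()
public_pem = public_key.public_bytes(
    serialization.Encoding.PEM,
    serialization.PublicFormat.SubjectPublicKeyInfo,
).decode()

payload = b"hello"
signature_b64 = base64.b64encode(private_key.sign(payload)).decode()

try:
    public_key.verify(base64.b64decode(signature_b64), payload)
    print({"success": True, "valid": True})
except InvalidSignature:
    print({"success": True, "valid": False})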
How Training Works (Step-by-Step)
Microsoft Malware Classification (Supervised)
SGDClassifier
Implementation: backend/ml_models/integrity_verification/microsoft_malware_classifier.py
- Data Loading: Reads Microsoft Malware Classification Challenge/data.csv.
- Feature Selection: Selects relevant numerical columns (e.g., resource sizes, section counts) and the class label.
- Preprocessing: Scales features using StandardScaler.
- Model Fitting: Trains an SGDClassifier (Stochastic Gradient Descent) optimized for large datasets to classify malware families.
- Persistence: Saves the model to microsoft_malware_model.pkl and generates accuracy curves and confusion matrices.
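A minimal sketch of that scale-then-SGD pipeline on synthetic features (the real column selection from data.csv is not reproduced here):

import joblib
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic numeric features standing in for the selected columns
# (resource sizes, section counts, ...); y is the malware family label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = rng.integers(1, 5, size=200)

model = Pipeline([
    ("scaler", StandardScaler()),
    ("sgd", SGDClassifier(random_state=42)),  # SGD scales well to large datasets
])
model.fit(X, y)
joblib.dump(model, "microsoft_malware_model.pkl")

print(model.score(X, y))  # accuracy of the kind plotted in the accuracy-curve report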
Advanced / Internal Endpoints
| Endpoint | Purpose |
|---|---|
| POST /api/ml/integrity-verification/detect-tampering | Checks artifacts for integrity violations against known signatures/hashes. |
7) Component 4 — Forensic Chain-of-Custody & Reporting Portal
Frontend folder: src/components/ForensicReportingPortal/
This component consumes real evidence from the backend (CI/CD evidence state + anomaly latest results), constructs a timeline, provides custody logs, verifies integrity proofs via the hash API, generates reports, and includes an ML Models sub-tab using the Reporting Portal datasets.
Tabs
| Tab | Backend data source | How it works |
|---|---|---|
| Evidence Visualization | GET /api/ml/reporting-portal/forensic-data | Shows evidence timeline with filters (type + timeframe) and metadata. |
| Integrity Proofs | GET /api/ml/reporting-portal/forensic-data | Shows proof list derived from timeline; verify button calls hash verify API. |
| Chain of Custody | GET /api/ml/reporting-portal/custody | Displays custody logs with expandable history + checksum. |
| Report Generator | POST /api/ml/reporting-portal/report/generate | Generates a report payload; export buttons download JSON to disk. |
| ML Models | /api/ml/reporting-portal/* | Train/infer the Reporting Portal classifiers and display their report graphs. |
Buttons and what they do
| Button / Control | Where | Backend endpoint(s) | Behavior |
|---|---|---|---|
| Verify (integrity proof) | IntegrityProofs.js | POST /api/ml/integrity-verification/hash/verify | Verifies that the backend recomputed SHA-256 of the stored proof payload matches the expected hash. |
| Generate Report | ReportGenerator.js | POST /api/ml/reporting-portal/report/generate | Creates a report object containing sections, evidence count, timeline, integrity proofs, custody logs. |
| Export as PDF/JSON/XML | ReportGenerator.js | No backend call | Downloads the generated report as a JSON blob (filename extension varies). |
| Train Model (Reporting ML) | ReportingPortalModels.js | POST /api/ml/reporting-portal/train | Trains either attack_type or breach_type model; updates training_state and reports list. |
| Infer Dataset (Reporting ML) | ReportingPortalModels.js | POST /api/ml/reporting-portal/infer-dataset | Runs inference on a dataset subset and stores latest predictions for that model. |
| Infer Sample (Reporting ML) | ReportingPortalModels.js | POST /api/ml/reporting-portal/infer-sample | Predicts a single record provided as JSON in the UI textarea. |
Reporting ML: “Infer Sample” JSON format
required fields
The UI textarea sends your JSON object directly to the backend. Use the dataset column names as keys.
Attack Type (from Attack_Dataset.csv)
POST /api/ml/reporting-portal/infer-sample
Body:
{
"model": "attack_type",
"sample": {
"Attack Name": "Phishing",
"Description": "Credential harvesting email campaign targeting employees"
}
}
Response:
{
"success": true,
"prediction": "Phishing",
"confidence": 0.81
}
Breach Type (from Cyber Security Breaches.csv)
POST /api/ml/reporting-portal/infer-sample
Body:
{
"model": "breach_type",
"sample": {
"Organization": "Example Corp",
"Type_of_Breach": "Hacking",
"Summary": "Unauthorized access and data exfiltration detected",
"Records_Lost": 120000
}
}
For breach_type inference, Type_of_Breach is the training label; for real-time prediction, you can omit it (backend will treat it as unknown and predict it).
Where the Reporting Portal data comes from
real evidence
GET /api/ml/reporting-portal/forensic-data composes:
| Source | Backend storage | Used to render |
|---|---|---|
| CI/CD evidence items | backend/data/cicd_state.json | Timeline events + custody logs |
| Anomaly Detection “latest” results | backend memory (latest_results) | Additional timeline anomalies |
| Integrity proofs | Derived per timeline item | Integrity Proofs tab |
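A sketch of how that composition could look on the backend (function and key names here are illustrative, not the actual ml_api.py code):

import json
import pathlib

def build_forensic_timeline(state_path="backend/data/cicd_state.json",
                            latest_results=None):
    # 1) CI/CD evidence items persisted on disk.
    state = json.loads(pathlib.Path(state_path).read_text())
    timeline = [{"source": "cicd", **item} for item in state.get("evidence", [])]

    # 2) In-memory "latest" anomaly-detection results, keyed by model.
    for model_key, anomalies in (latest_results or {}).items():
        timeline += [{"source": f"anomaly:{model_key}", **a} for a in anomalies]

    # 3) Newest-first ordering so the UI can render the timeline directly.
    timeline.sort(key=lambda e: e.get("timestamp", ""), reverse=True)
    return timeline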
How Training Works (Step-by-Step)
Attack Type Classification (Supervised)
LogisticRegression + TF-IDF
Implementation: backend/ml_models/reporting_portal/attack_type_classifier.py
- Data Loading: Reads Attack_Dataset.csv.
- Text Construction: Concatenates text columns (Title, Description, Impact, etc.) into a single "feature text" string per row.
- Vectorization: Applies TF-IDF (up to 12,000 features, 1-2 n-grams) to convert text to vectors.
- Model Fitting: Trains a LogisticRegression model to classify the specific Attack Type.
- Persistence: Saves the model pipeline to attack_type_model.pkl.
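A minimal sketch of that text-concatenation + TF-IDF + LogisticRegression pipeline (toy rows; the real column set of Attack_Dataset.csv may differ):

import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

df = pd.DataFrame({
    "Attack Name": ["Phishing", "Malware"] * 5,
    "Description": ["credential harvesting email campaign", "ransomware spreading laterally"] * 5,
    "Attack Type": ["Phishing", "Malware"] * 5,
})

# "Feature text": concatenate the text columns into one string per row.
feature_text = (df["Attack Name"] + " " + df["Description"]).tolist()

model = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=12000, ngram_range=(1, 2))),
    ("logreg", LogisticRegression(max_iter=1000)),
])
model.fit(feature_text, df["Attack Type"])
joblib.dump(model, "attack_type_model.pkl")

print(model.predict(["phishing email harvesting employee credentials"]))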
Breach Type Classification (Supervised)
ColumnTransformer (Mixed Data)
Implementation: backend/ml_models/reporting_portal/breach_type_classifier.py
- Data Loading: Reads Cyber Security Breaches.csv.
- Feature Engineering: Uses a ColumnTransformer to handle mixed data types:
  - Text: TF-IDF on the Summary column.
  - Categorical: One-Hot Encoding on State, etc.
  - Numerical: Median imputation on Individuals_Affected, year, etc.
- Model Fitting: Trains a LogisticRegression classifier on the combined feature set to predict Type_of_Breach.
- Persistence: Saves the complex pipeline to breach_type_model.pkl.
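A minimal sketch of that mixed-type ColumnTransformer pipeline (toy rows; column names follow the steps above, the real CSV has more fields):

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    "Summary": ["laptop stolen from office", "server hacked, data exfiltrated"] * 5,
    "State": ["CA", "TX"] * 5,
    "Individuals_Affected": [500, 120000] * 5,
    "Type_of_Breach": ["Theft", "Hacking"] * 5,
})

preprocess = ColumnTransformer([
    ("text", TfidfVectorizer(), "Summary"),                               # free text
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["State"]),           # categorical
    ("num", SimpleImputer(strategy="median"), ["Individuals_Affected"]),  # numeric
])

model = Pipeline([("prep", preprocess), ("logreg", LogisticRegression(max_iter=1000))])
model.fit(df.drop(columns=["Type_of_Breach"]), df["Type_of_Breach"])
joblib.dump(model, "breach_type_model.pkl")

print(model.predict(df.drop(columns=["Type_of_Breach"]).head(1)))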
Reporting Portal ML datasets
| Dataset file | Model | Label column | Reports directory |
|---|---|---|---|
| backend/ml_models/reporting_portal/Attack_Dataset.csv | attack_type | Attack Type | backend/ml_models/reporting_portal/reports/attack_type/ |
| backend/ml_models/reporting_portal/Cyber Security Breaches.csv | breach_type | Type_of_Breach | backend/ml_models/reporting_portal/reports/breach_type/ |
Advanced / Internal Endpoints
| Endpoint | Purpose |
|---|---|
| POST /api/ml/reporting-portal/correlate | Correlates discrete evidence items into a unified attack timeline. |
8) API Index (Endpoints)
All endpoints are served by Flask in backend/api/ml_api.py.
GET /api/ml/health
GET /api/ml/cicd-agents/status
GET /api/ml/cicd-agents/evidence?limit=200
POST /api/ml/cicd-agents/collect
GET /api/ml/cicd-agents/config
POST /api/ml/cicd-agents/config
POST /api/ml/cicd-agents/config/reset
POST /api/ml/cicd-agents/register
GET /api/ml/cicd-agents/hdfs/status
POST /api/ml/cicd-agents/hdfs/train
POST /api/ml/cicd-agents/hdfs/infer-dataset
GET /api/ml/cicd-agents/hdfs/latest
GET /api/ml/cicd-agents/hdfs/reports/<filename>
GET /api/ml/cicd-agents/hadoop/status
POST /api/ml/cicd-agents/hadoop/train
POST /api/ml/cicd-agents/hadoop/infer-dataset
GET /api/ml/cicd-agents/hadoop/latest
GET /api/ml/cicd-agents/hadoop/reports/<filename>
GET /api/ml/anomaly-detection/status
POST /api/ml/anomaly-detection/train
POST /api/ml/anomaly-detection/infer
POST /api/ml/anomaly-detection/infer-sample
GET /api/ml/anomaly-detection/latest?model=commit_pattern
GET /api/ml/anomaly-detection/reports/<filename>
POST /api/ml/integrity-verification/hash
POST /api/ml/integrity-verification/hash/verify
POST /api/ml/integrity-verification/sbom/hash
POST /api/ml/integrity-verification/merkle/root
POST /api/ml/integrity-verification/keys/ed25519
POST /api/ml/integrity-verification/sign/ed25519
POST /api/ml/integrity-verification/verify/ed25519
GET /api/ml/integrity-verification/microsoft-malware/status
POST /api/ml/integrity-verification/microsoft-malware/train
POST /api/ml/integrity-verification/microsoft-malware/infer-dataset
GET /api/ml/integrity-verification/microsoft-malware/latest
GET /api/ml/integrity-verification/microsoft-malware/reports/<filename>
GET /api/ml/reporting-portal/status
POST /api/ml/reporting-portal/train
POST /api/ml/reporting-portal/infer-sample
POST /api/ml/reporting-portal/infer-dataset
GET /api/ml/reporting-portal/latest?model=attack_type
GET /api/ml/reporting-portal/reports/<model>/<filename>
GET /api/ml/reporting-portal/forensic-data
GET /api/ml/reporting-portal/custody
POST /api/ml/reporting-portal/report/generate
9) PM2 Runtime
PM2 config is backend/ecosystem.config.js. The ML API process forensic-ml-api runs backend/api/ml_api.py with the venv interpreter backend/venv/bin/python3 and listens on port 5000.
cd /var/www/forensic-portal/backend
pm2 start ecosystem.config.js
pm2 restart ecosystem.config.js --update-env
pm2 logs forensic-ml-api
pm2 save
10) curl Test Commands (Full Catalog)
Set BASE once and run any command. These are copy-pasteable smoke tests for every model and non-ML integrity function. All API routes are under /api/ml/.
BASE=https://monitoringsystem.space
HJSON='Content-Type: application/json'
Health
# Check if ML API is reachable; returns {"success": true}
curl -sS $BASE/api/ml/health
CI/CD Agents
# Get total agents and their statuses
curl -sS $BASE/api/ml/cicd-agents/status
# List recently collected evidence items (e.g., git-diffs, logs)
curl -sS $BASE/api/ml/cicd-agents/evidence?limit=50
# View current agent configuration (what to capture)
curl -sS $BASE/api/ml/cicd-agents/config
# Register a new agent (simulated registration)
curl -sS -X POST $BASE/api/ml/cicd-agents/register -H "$HJSON" -d '{"name":"curl-agent-1","type":"Jenkins"}'
# Update configuration (disable auto_collect, set interval)
curl -sS -X POST $BASE/api/ml/cicd-agents/config -H "$HJSON" -d '{"config":{"auto_collect":false,"collection_interval_seconds":30}}'
# Reset configuration to defaults
curl -sS -X POST $BASE/api/ml/cicd-agents/config/reset -H "$HJSON" -d '{}'
# Trigger manual evidence collection from dataset (creates evidence items)
curl -sS -X POST $BASE/api/ml/cicd-agents/collect -H "$HJSON" -d '{"mode":"dataset"}'
# Verify new evidence was added
curl -sS $BASE/api/ml/cicd-agents/evidence?limit=20
CI/CD Agents — HDFS Log Anomaly Detection
# Check training status of HDFS model
curl -sS $BASE/api/ml/cicd-agents/hdfs/status
# Train HDFS model (unsupervised) on 20k log lines
curl -sS -X POST $BASE/api/ml/cicd-agents/hdfs/train -H "$HJSON" -d '{"limit":20000}'
# Verify status updates to "trained"
curl -sS $BASE/api/ml/cicd-agents/hdfs/status
# Run inference on dataset (limit 5000 lines) and store results
curl -sS -X POST $BASE/api/ml/cicd-agents/hdfs/infer-dataset -H "$HJSON" -d '{"limit":5000,"store":true}'
# Retrieve the stored inference results (anomalies)
curl -sS $BASE/api/ml/cicd-agents/hdfs/latest
# Download generated graphs and training summary
curl -sS -o hdfs_dataset_matrix.png $BASE/api/ml/cicd-agents/hdfs/reports/hdfs_dataset_matrix.png
curl -sS -o hdfs_score_distribution.png $BASE/api/ml/cicd-agents/hdfs/reports/hdfs_score_distribution.png
curl -sS -o hdfs_training_summary.json $BASE/api/ml/cicd-agents/hdfs/reports/hdfs_training_summary.json
CI/CD Agents — Hadoop Log Failure Classification
# Check training status of Hadoop model
curl -sS $BASE/api/ml/cicd-agents/hadoop/status
# Train Hadoop supervised model on 12k log entries
curl -sS -X POST $BASE/api/ml/cicd-agents/hadoop/train -H "$HJSON" -d '{"limit":12000}'
# Verify status is "trained"
curl -sS $BASE/api/ml/cicd-agents/hadoop/status
# Run classification on dataset and store failures
curl -sS -X POST $BASE/api/ml/cicd-agents/hadoop/infer-dataset -H "$HJSON" -d '{"limit":3000,"store":true}'
# Retrieve latest detected failures
curl -sS $BASE/api/ml/cicd-agents/hadoop/latest
# Download classification metrics and confusion matrix
curl -sS -o hadoop_confusion_matrix.png $BASE/api/ml/cicd-agents/hadoop/reports/hadoop_confusion_matrix.png
curl -sS -o hadoop_metrics.png $BASE/api/ml/cicd-agents/hadoop/reports/hadoop_metrics.png
curl -sS -o hadoop_training_summary.json $BASE/api/ml/cicd-agents/hadoop/reports/hadoop_training_summary.json
Anomaly Detection Engine
# Check status of both anomaly models
curl -sS $BASE/api/ml/anomaly-detection/status
# Train both Commit Pattern and Dependency Anomaly models
curl -sS -X POST $BASE/api/ml/anomaly-detection/train -H "$HJSON" -d '{"models":["commit_pattern","dependency_anomaly"],"limit":2000}'
# Verify both are trained
curl -sS $BASE/api/ml/anomaly-detection/status
# Run inference for Commit Patterns and store results
curl -sS -X POST $BASE/api/ml/anomaly-detection/infer -H "$HJSON" -d '{"model":"commit_pattern","limit":500,"store":true}'
curl -sS $BASE/api/ml/anomaly-detection/latest?model=commit_pattern
# Run inference for Dependency Anomalies and store results
curl -sS -X POST $BASE/api/ml/anomaly-detection/infer -H "$HJSON" -d '{"model":"dependency_anomaly","limit":500,"store":true}'
curl -sS $BASE/api/ml/anomaly-detection/latest?model=dependency_anomaly
# Infer a single sample (manual JSON input)
curl -sS -X POST $BASE/api/ml/anomaly-detection/infer-sample -H "$HJSON" -d '{"model":"commit_pattern","sample":{"commit_message_length":240,"files_changed":12,"lines_added":900,"lines_removed":20,"is_off_hours":1,"is_weekend":0,"timestamp_hour":2,"day_of_week":6,"time_since_last_commit":3600,"author_commit_count":1}}'
# Download generated anomaly reports
curl -sS -o commit_confusion_matrix.png $BASE/api/ml/anomaly-detection/reports/commit_confusion_matrix.png
curl -sS -o commit_metrics.png $BASE/api/ml/anomaly-detection/reports/commit_metrics.png
curl -sS -o dependency_heatmap.png $BASE/api/ml/anomaly-detection/reports/dependency_correlation_heatmap.png
curl -sS -o training_summary.json $BASE/api/ml/anomaly-detection/reports/anomaly_training_summary.json
Integrity Verification (Hash / SBOM / Merkle / Ed25519)
# Compute SHA-256 hash of a string
curl -sS -X POST $BASE/api/ml/integrity-verification/hash -H "$HJSON" -d '{"content":"hello"}'
# Verify content matches an expected hash
curl -sS -X POST $BASE/api/ml/integrity-verification/hash/verify -H "$HJSON" -d '{"content":"hello","expected_hash":"deadbeef"}'
# Compute hash of an SBOM JSON structure
curl -sS -X POST $BASE/api/ml/integrity-verification/sbom/hash -H "$HJSON" -d '{"sbom_json":"{\"name\":\"demo\",\"dependencies\":[{\"name\":\"a\",\"version\":\"1.0.0\"}]}"}'
# Compute Merkle Root for a list of leaf hashes
curl -sS -X POST $BASE/api/ml/integrity-verification/merkle/root -H "$HJSON" -d '{"hashes":["a3f1","b2c9","c0d1"]}'
# Ed25519: Generate Keypair, Sign, and Verify
KEYS=$(curl -sS -X POST $BASE/api/ml/integrity-verification/keys/ed25519 -H "$HJSON")
PRIV=$(printf '%s' "$KEYS" | python3 -c 'import sys,json;print(json.load(sys.stdin)["private_key_pem"])')
PUB=$(printf '%s' "$KEYS" | python3 -c 'import sys,json;print(json.load(sys.stdin)["public_key_pem"])')
SIG=$(curl -sS -X POST $BASE/api/ml/integrity-verification/sign/ed25519 -H "$HJSON" -d "{\"payload\":\"hello\",\"private_key_pem\":$(python3 -c 'import json,sys;print(json.dumps(sys.stdin.read()))' <<<"$PRIV")}" | python3 -c 'import sys,json;print(json.load(sys.stdin)["signature_b64"])')
curl -sS -X POST $BASE/api/ml/integrity-verification/verify/ed25519 -H "$HJSON" -d "{\"payload\":\"hello\",\"public_key_pem\":$(python3 -c 'import json,sys;print(json.dumps(sys.stdin.read()))' <<<"$PUB"),\"signature_b64\":\"$SIG\"}"
Integrity Verification — Microsoft Malware Classification
# Check status of malware model
curl -sS $BASE/api/ml/integrity-verification/microsoft-malware/status
# Train malware classifier (supervised SGD)
curl -sS -X POST $BASE/api/ml/integrity-verification/microsoft-malware/train -H "$HJSON" -d '{"limit":30000}'
curl -sS $BASE/api/ml/integrity-verification/microsoft-malware/status
# Run inference on dataset and store results
curl -sS -X POST $BASE/api/ml/integrity-verification/microsoft-malware/infer-dataset -H "$HJSON" -d '{"limit":500,"store":true}'
curl -sS $BASE/api/ml/integrity-verification/microsoft-malware/latest
# Download malware classification reports
curl -sS -o microsoft_malware_confusion_matrix.png $BASE/api/ml/integrity-verification/microsoft-malware/reports/microsoft_malware_confusion_matrix.png
curl -sS -o microsoft_malware_metrics.png $BASE/api/ml/integrity-verification/microsoft-malware/reports/microsoft_malware_metrics.png
curl -sS -o microsoft_malware_accuracy_curve.png $BASE/api/ml/integrity-verification/microsoft-malware/reports/microsoft_malware_accuracy_curve.png
curl -sS -o microsoft_malware_training_summary.json $BASE/api/ml/integrity-verification/microsoft-malware/reports/microsoft_malware_training_summary.json
Reporting Portal (Evidence/Custody/Report)
# Fetch timeline evidence (from CI/CD + Anomalies)
curl -sS $BASE/api/ml/reporting-portal/forensic-data
# Fetch chain of custody logs
curl -sS $BASE/api/ml/reporting-portal/custody
# Generate a full forensic report JSON
curl -sS -X POST $BASE/api/ml/reporting-portal/report/generate -H "$HJSON" -d '{"format":"json"}'
Reporting Portal — ML Models (Attack Type / Breach Type)
# Check status of reporting models
curl -sS $BASE/api/ml/reporting-portal/status
# Train Attack Type model
curl -sS -X POST $BASE/api/ml/reporting-portal/train -H "$HJSON" -d '{"models":["attack_type"],"limit_attack":6000}'
# Train Breach Type model
curl -sS -X POST $BASE/api/ml/reporting-portal/train -H "$HJSON" -d '{"models":["breach_type"]}'
# Infer Attack Type on dataset and store
curl -sS -X POST $BASE/api/ml/reporting-portal/infer-dataset -H "$HJSON" -d '{"model":"attack_type","limit":200,"store":true}'
curl -sS $BASE/api/ml/reporting-portal/latest?model=attack_type
# Infer Breach Type on dataset and store
curl -sS -X POST $BASE/api/ml/reporting-portal/infer-dataset -H "$HJSON" -d '{"model":"breach_type","limit":200,"store":true}'
curl -sS $BASE/api/ml/reporting-portal/latest?model=breach_type
# Infer single sample: Attack Type
curl -sS -X POST $BASE/api/ml/reporting-portal/infer-sample -H "$HJSON" -d '{"model":"attack_type","sample":{"Attack Name":"Malware","Description":"Ransomware infection spreading laterally in network"}}'
# Infer single sample: Breach Type
curl -sS -X POST $BASE/api/ml/reporting-portal/infer-sample -H "$HJSON" -d '{"model":"breach_type","sample":{"Organization":"Example Corp","Summary":"Unauthorized access and data exfiltration detected","Records_Lost":120000}}'
# Download reporting model graphs
curl -sS -o attack_type_metrics.png $BASE/api/ml/reporting-portal/reports/attack_type/attack_type_metrics.png
curl -sS -o attack_type_confusion_matrix.png $BASE/api/ml/reporting-portal/reports/attack_type/attack_type_confusion_matrix.png
curl -sS -o attack_type_learning_curve.png $BASE/api/ml/reporting-portal/reports/attack_type/attack_type_learning_curve.png
curl -sS -o breach_type_metrics.png $BASE/api/ml/reporting-portal/reports/breach_type/breach_type_metrics.png
curl -sS -o breach_type_confusion_matrix.png $BASE/api/ml/reporting-portal/reports/breach_type/breach_type_confusion_matrix.png
curl -sS -o breach_type_learning_curve.png $BASE/api/ml/reporting-portal/reports/breach_type/breach_type_learning_curve.png
11) Troubleshooting
- Requests to /api/ml/ never reach Flask: usually means Nginx isn't proxying /api/ml/ to the Flask backend. Check with:
curl -i https://your-host/api/ml/health
- Nginx responds but the API does not: backend process not running, wrong port, or PM2 stopped. Check PM2:
cd /var/www/forensic-portal/backend
pm2 status
pm2 logs forensic-ml-api
- Training requests time out: training took longer than the proxy timeout. Increase proxy timeouts for /api/ml/ or use lower limits.
- Inference returns an error instead of results: inference endpoints require a trained model. Train first, then infer, then fetch /latest.