Forensic Portal

Operator Handbook — 4 Components (Frontend + Backend + ML)

This document describes the UI layout, every tab/button/input behavior, backend API wiring, model training/inference lifecycle, report/graph outputs, and runtime setup (PM2).

RP 41 · SLIIT · Single API: /api/ml/* · PM2-managed backend process

Workspace: /var/www/forensic-portal

1) Architecture & Data Flow

The system is a React dashboard talking to a Flask ML API over /api/ml/. Models train, persist to .pkl, and generate graphs under reports/.

Frontend
/var/www/forensic-portal/src

React SPA with a tabbed dashboard in src/App.js. Each tab renders one of the four component folders under src/components/.

Backend
/var/www/forensic-portal/backend

Flask ML API in backend/api/ml_api.py. Models live under backend/ml_models/, with persisted .pkl models and generated graphs in reports/.

Frontend → Backend connection

All frontend network calls go through src/api/mlApi.js which fetches JSON from /api/ml/... on the same origin (Nginx reverse proxy).

Evidence pipeline

CI/CD evidence is stored in backend/data/cicd_state.json. The Reporting Portal builds its timeline + custody logs from that stored evidence and from the anomaly-detection latest results.

Request lifecycle (from a button click) UI → mlApi.js → Flask → Model → UI
Step What happens Where in code
1 User clicks a UI button (or changes a tab/filter) src/components/**
2 Component calls a function from the API client src/api/mlApi.js
3 Browser sends HTTP request to /api/ml/... fetch()
4 Nginx proxies to Flask on port 5000 /etc/nginx/sites-enabled/*
5 Flask route validates input, loads model, trains or infers, persists state backend/api/ml_api.py
6 JSON response returned; UI updates state and re-renders React useState/useEffect
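Step 5 follows a common Flask pattern. A minimal sketch of the validate → load → infer → respond shape (the route name, payload, and helper logic here are illustrative assumptions; the real routes live in backend/api/ml_api.py):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Hypothetical route illustrating the pattern; not the portal's actual code.
@app.route("/api/ml/example/infer", methods=["POST"])
def example_infer():
    body = request.get_json(silent=True) or {}
    limit = body.get("limit", 100)
    # Validate input before touching any model.
    if not isinstance(limit, int) or limit <= 0:
        return jsonify({"success": False, "error": "limit must be a positive int"}), 400
    # ...load the persisted .pkl model and run inference here...
    results = [{"id": f"item_{i}", "confidence": 90.0} for i in range(min(limit, 2))]
    return jsonify({"success": True, "results": results})
```

The JSON envelope ({"success": ..., ...}) mirrors the responses shown throughout this handbook.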

1.1) Project File Structure

Physical layout of the monorepo, mapping code files to components.

/var/www/forensic-portal/
├── backend/
│   ├── api/ml_api.py              # Main Flask entry point (All Routes)
│   ├── data/cicd_state.json       # JSON Database for Evidence & Logs
│   ├── ecosystem.config.js        # PM2 process config
│   └── ml_models/                 # Logic per domain
│       ├── anomaly_detection/     # Commit patterns, dependency checks, pipeline tampering
│       ├── cicd_agents/           # HDFS/Hadoop log analysis models
│       ├── integrity_verification/# Malware classifier, Ed25519, Merkle, hashing
│       └── reporting_portal/      # Attack/Breach classifiers & report generation
└── src/
    ├── api/mlApi.js               # Frontend-Backend bridge (fetch wrapper)
    ├── App.js                     # Main React layout & Tab Controller
    └── components/                # UI Views matching backend domains
        ├── AnomalyDetection/      # Graphs, Timelines, Training UI
        ├── ForensicReadyCICDAgents/ # Agent Status, Evidence Collection UI
        ├── ForensicReportingPortal/ # Final Reports, Chain of Custody UI
        └── IntegrityVerification/ # Crypto tools & Malware Verification UI
            

2) Flow Charts

These diagrams are rendered as inline SVG “shapes” (no external assets); here they are summarized as text flows. Use them as a mental model for how every button maps to API calls, persisted artifacts, and UI rendering.

Overall system flow

User Browser → React Dashboard (src/App.js + src/components/*) → Nginx (proxies /api/ml/* to Flask :5000) → Flask ML API (backend/api/ml_api.py; routes for train/infer/latest/reports) → Artifacts (.pkl models, reports/*.png graphs, data/*.json state). The UI fetches status/latest and renders results/graphs.

Button click pattern (all components): UI Button → mlApi.js fetch('/api/ml/...') → Flask route → model train/infer → persist + return JSON → UI renders.

Model lifecycle (generic train → infer → reports)

Train button → POST /train → training pipeline (fit() + metrics) → saved artifacts (model.pkl, training_summary.json, graphs *.png). Infer button → POST /infer* → results stored as “latest” → GET /latest to reload.

3) Dashboard Layout (Tabs)

The top-level tabs are implemented in src/App.js. Each tab renders a component:

Dashboard Tab Component File What it shows
CI/CD Agents src/components/ForensicReadyCICDAgents/ForensicReadyCICDAgents.js Agent status, evidence collection, agent configuration, HDFS anomaly model, Hadoop failure model
Anomaly Detection src/components/AnomalyDetection/AnomalyDetection.js ML detection for commit/dependency anomalies, timeline reconstruction, training reports/graphs
Integrity Verification src/components/IntegrityVerification/IntegrityVerification.js SBOM hashing, hash verification, Ed25519 signatures, Merkle root, malware model train/infer
Reporting Portal src/components/ForensicReportingPortal/ForensicReportingPortal.js Evidence visualization, integrity proofs, chain-of-custody logs, report generator, reporting ML models

The header status indicator (“Connected/Disconnected”) calls GET /api/ml/health.

Connected/Disconnected logic (health check)

Source: src/App.js calls getMlHealth() once on load. If the request succeeds and returns {"success": true}, the UI shows Connected; otherwise it shows Disconnected.
GET /api/ml/health
Response:
{
  "success": true
}
                

4) Component 1 — Forensic-Ready CI/CD Agents

Frontend folder: src/components/ForensicReadyCICDAgents/

This tab is designed to: show connected CI/CD agents, collect evidence from datasets/models, let you set evidence capture configuration, and run two ML models (HDFS anomaly, Hadoop failure classification).

UI sections

Section File What it contains
Agent Status AgentStatus.js List of agents + metrics (total/active/performance overhead)
Evidence Collection EvidenceCollector.js Filters evidence items by type and triggers collection
Agent Configuration AgentConfig.js Checkboxes and numeric settings; save/reset
HDFS Log Anomaly Detection ForensicReadyCICDAgents.js Train / Run Detection / Refresh Results; list anomalies
Hadoop Log Failure Classification ForensicReadyCICDAgents.js Train / Run Detection / Refresh Results; list detections; shows Accuracy/F1/Precision/Recall

Buttons and what they do

Button / Control Where Backend endpoint(s) Behavior
Start Collection EvidenceCollector.js → onStartCollection POST /api/ml/cicd-agents/collect Triggers dataset-based evidence creation on the backend (uses trained HDFS/Hadoop models if present). Then refreshes evidence list.
Save Configuration AgentConfig.js POST /api/ml/cicd-agents/config Persists config to backend state.
Reset to Defaults AgentConfig.js POST /api/ml/cicd-agents/config/reset Resets config on backend to defaults.
Train Model (HDFS) ForensicReadyCICDAgents.js POST /api/ml/cicd-agents/hdfs/train Trains the HDFS TF-IDF + IsolationForest model and persists hdfs_log_anomaly_model.pkl.
Run Detection (HDFS) ForensicReadyCICDAgents.js POST /api/ml/cicd-agents/hdfs/infer-dataset Runs detection on log dataset and stores latest anomalies retrievable via /latest.
Refresh Results (HDFS) ForensicReadyCICDAgents.js GET /api/ml/cicd-agents/hdfs/latest Loads the latest stored anomalies.
Train Model (Hadoop) ForensicReadyCICDAgents.js POST /api/ml/cicd-agents/hadoop/train Trains the supervised Hadoop log classifier and persists hadoop_log_failure_model.pkl.
Run Detection (Hadoop) ForensicReadyCICDAgents.js POST /api/ml/cicd-agents/hadoop/infer-dataset Runs classification on dataset and stores latest “failures” as detections.
Refresh Results (Hadoop) ForensicReadyCICDAgents.js GET /api/ml/cicd-agents/hadoop/latest Loads the latest stored detections.
Request/Response details (CI/CD Agents APIs)

Evidence collection

POST /api/ml/cicd-agents/collect
Body:
{
  "mode": "dataset"
}
Response:
{
  "success": true,
  "added": 2
}
                

The new evidence items are stored in backend/data/cicd_state.json and later consumed by the Reporting Portal timeline.

Evidence list

GET /api/ml/cicd-agents/evidence?limit=200
Response:
{
  "success": true,
  "evidence": [
    {
      "timestamp": "2025-12-13T02:32:20Z",
      "type": "build-logs",
      "description": "Hadoop job failure predicted",
      "metadata": {
        "app_id": "application_...",
        "confidence": 80.32,
        "label": 1
      }
    }
  ]
}
                

Model persistence and reports

Model Saved file Reports directory
HDFS Log Anomaly backend/ml_models/cicd_agents/hdfs_log_anomaly_model.pkl backend/ml_models/cicd_agents/reports/hdfs_log_anomaly/
Hadoop Log Failure backend/ml_models/cicd_agents/hadoop_log_failure_model.pkl backend/ml_models/cicd_agents/reports/hadoop_log_failure/
Agent Configuration checkboxes explained (what each toggle does)

These checkboxes are not placeholders. They are a real config object stored on the backend and used to control what evidence categories can be collected/recorded.

GET /api/ml/cicd-agents/config
Response:
{
  "success": true,
  "config": {
    "capture_git_diffs": true,
    "capture_build_logs": true,
    "capture_env_vars": true,
    "capture_secrets_access": true,
    "capture_artifacts": true,
    "capture_pipeline_configs": true,
    "auto_collect": false,
    "collection_interval_seconds": 30,
    "max_evidence_size_mb": 100,
    "encrypt_evidence": false
  }
}

POST /api/ml/cicd-agents/config
Body:
{
  "config": {
    "auto_collect": true,
    "collection_interval_seconds": 30
  }
}
Response:
{
  "success": true,
  "config": { "...merged_config" : true }
}
                
UI control Config field Effect
Capture Git Commit Diffs capture_git_diffs Allows evidence items of type git-diffs to be collected/stored.
Capture Build Logs capture_build_logs Allows evidence items of type build-logs to be collected/stored.
Capture Environment Variables capture_env_vars Allows evidence items of type env-vars to be collected/stored.
Capture Secrets Access Events capture_secrets_access Allows evidence items of type secrets-access to be collected/stored.
Capture Build Artifacts capture_artifacts Allows evidence items of type artifacts to be collected/stored.
Capture Pipeline Config Files capture_pipeline_configs Allows evidence items of type pipeline-config to be collected/stored.
Enable Automatic Collection auto_collect Backend will treat this as an enable flag for scheduled collection behavior.
Collection Interval (seconds) collection_interval_seconds Interval target for automatic collection scheduling.
Max Evidence Size (MB) max_evidence_size_mb Safety limit used to cap evidence payload sizes when storing.
Enable Encryption for Stored Evidence encrypt_evidence Controls whether stored evidence should be encrypted at rest.
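The POST /config endpoint above merges a partial update into the stored config. A minimal sketch of that merge semantics (field names come from the GET response above; the merge logic and the ignore-unknown-keys behavior are assumptions, not confirmed backend behavior):

```python
# Subset of the default config shown in the GET response above (illustrative).
DEFAULT_CONFIG = {
    "capture_git_diffs": True,
    "capture_build_logs": True,
    "auto_collect": False,
    "collection_interval_seconds": 30,
    "max_evidence_size_mb": 100,
}

def merge_config(stored: dict, update: dict) -> dict:
    """Overlay only known keys from a partial update onto the stored config."""
    merged = dict(stored)
    for key, value in update.items():
        if key in merged:  # assumption: unknown keys are dropped, not stored
            merged[key] = value
    return merged

merged = merge_config(DEFAULT_CONFIG, {"auto_collect": True, "collection_interval_seconds": 60})
```

This matches the observed behavior that a POST body containing only two fields still returns the full merged config.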

How Training Works (Step-by-Step)

HDFS Log Anomaly (Unsupervised)
IsolationForest

Implementation: backend/ml_models/cicd_agents/hdfs_log_anomaly_detector.py

  1. Data Loading: Reads hdfs_log/hdfs.log/sorted.log.
  2. Parsing: Extracts event templates and groups logs by Block ID.
  3. Vectorization: Converts event sequences into numerical vectors using TF-IDF.
  4. Model Fitting: Trains an IsolationForest model (unsupervised) to identify outliers in the vector space.
  5. Persistence: Saves the model to hdfs_log_anomaly_model.pkl.
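The steps above can be sketched with scikit-learn. The toy event sequences below stand in for the parsed per-block HDFS templates; the real pipeline extracts them from sorted.log, and the exact hyperparameters are assumptions:

```python
import pickle

from sklearn.ensemble import IsolationForest
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy per-block event sequences standing in for parsed HDFS log templates.
sequences = [
    "receive store replicate verify",
    "receive store replicate verify",
    "receive store replicate verify",
    "exception retry timeout corrupt",  # deliberately unusual sequence
]

# Step 3: TF-IDF vectorization of event sequences.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(sequences)

# Step 4: unsupervised outlier model over the vector space.
model = IsolationForest(contamination=0.25, random_state=42)
model.fit(X.toarray())

preds = model.predict(X.toarray())  # scikit-learn convention: -1 = anomaly, 1 = normal

# Step 5: persist vectorizer + model together, mirroring the .pkl convention.
blob = pickle.dumps({"vectorizer": vectorizer, "model": model})
```

Persisting the vectorizer alongside the model matters: inference must vectorize new logs with the same vocabulary the model was trained on.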
Hadoop Log Failure (Supervised)
LogisticRegression

Implementation: backend/ml_models/cicd_agents/hadoop_log_failure_classifier.py

  1. Data Loading: Reads log files and the ground truth labels from abnormal_label.txt.
  2. Feature Extraction: Maps application log patterns to known normal/failure sequences.
  3. Vectorization: Applies TF-IDF to the sequence of log events.
  4. Model Fitting: Trains a LogisticRegression classifier to distinguish between "Normal" and "Anomaly" classes.
  5. Persistence: Saves the model to hadoop_log_failure_model.pkl and generates classification metrics (F1, Precision, Recall).
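The metrics generated in step 5 reduce to the standard precision/recall/F1 formulas. A self-contained sketch with toy labels (not the classifier's real output):

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for a binary label set (1 = failure, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy ground truth vs. predictions.
p, r, f1 = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

These are the same Accuracy/F1/Precision/Recall numbers surfaced in the Hadoop panel of the UI.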

Data sources

Dataset / File Used by Purpose
backend/ml_models/cicd_agents/HDFS Log Anomaly Detection/hdfs_log/hdfs.log/sorted.log HDFS model Unsupervised anomaly detection on HDFS logs.
backend/ml_models/cicd_agents/HDFS Log Anomaly Detection/Hadoop_log/Hadoop_log/abnormal_label.txt Hadoop model Supervised labels for Hadoop application IDs.

Advanced / Internal Endpoints

Endpoint Purpose
POST /api/ml/cicd-agents/detect Runs pattern-based detection on raw evidence payloads (internal helper).

5) Component 2 — Automated Forensic Anomaly Detection Engine

Frontend folder: src/components/AnomalyDetection/

Tabs

Tab File Purpose
Anomaly Detector AnomalyDetection.js + AnomalyDetector.js Run ML inference; filter anomalies by category/severity; show totals and accuracy.
Timeline Reconstruction TimelineReconstruction.js Groups anomalies by incident and builds a timeline view.
Training Reports TrainingReports.js Shows training metrics and graphs served from backend reports directory.

Buttons and what they do

Button / Control Where Backend endpoint(s) Behavior
Train Models AnomalyDetection.js POST /api/ml/anomaly-detection/train Trains commit pattern + dependency anomaly models; generates report images; persists .pkl models.
Run Detection AnomalyDetector.js (button), handled in AnomalyDetection.js POST /api/ml/anomaly-detection/infer Runs inference on dataset and returns anomalies. UI maps confidence → severity.
Refresh Results AnomalyDetection.js GET /api/ml/anomaly-detection/latest Loads stored “latest” anomalies for selected model.
Commit Patterns / Dependency Anomalies (model switch) AnomalyDetection.js Affects which model key is used in requests Switches model parameter: commit_pattern or dependency_anomaly.
Reconstruct Timelines TimelineReconstruction.js No backend call Pure UI grouping of already-loaded anomalies into incident timelines.
Filters (Category / Severity) explained: UI-only filtering

Category and Severity filters do not call the backend. They filter the already-loaded anomalies list.

Filter What it matches Where it is applied
Category Values like commit-patterns, pipeline-tampering, dependency-anomalies AnomalyDetector.js
Severity Derived from confidence: critical/high/medium/low AnomalyDetection.js
How “Run Detection” works end-to-end (confidence → severity mapping)

The backend returns anomalies with confidence. The UI maps confidence to severity: critical ≥ 90, high ≥ 80, medium ≥ 70, else low.

POST /api/ml/anomaly-detection/infer
Body:
{
  "model": "commit_pattern",
  "limit": 500
}
Response:
{
  "success": true,
  "anomalies": [
    {
      "id": "commit_0",
      "category": "commit-patterns",
      "confidence": 94.15,
      "details": { "is_off_hours": 1, "files_changed": 0 }
    }
  ]
}
                

Stored results can be fetched via GET /api/ml/anomaly-detection/latest?model=commit_pattern.
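The confidence → severity mapping is applied client-side. A minimal sketch using the thresholds stated above (the function name is ours; the real code lives in AnomalyDetection.js):

```python
def severity_from_confidence(confidence: float) -> str:
    """Map a model confidence (0-100) to a UI severity bucket:
    critical >= 90, high >= 80, medium >= 70, else low."""
    if confidence >= 90:
        return "critical"
    if confidence >= 80:
        return "high"
    if confidence >= 70:
        return "medium"
    return "low"
```

For the example response above, a confidence of 94.15 renders as critical.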

How Training Works (Step-by-Step)

Commit Patterns (Unsupervised)
IsolationForest

Implementation: backend/ml_models/anomaly_detection/commit_pattern_analyzer.py

  1. Data Loading: Reads the NSL-KDD dataset (mapped to commit features like files_changed, lines_added, is_off_hours).
  2. Preprocessing: Scales features using StandardScaler to normalize ranges.
  3. Model Fitting: Trains an IsolationForest to learn the baseline of "normal" commit behavior.
  4. Persistence: Saves the model to commit_pattern_model.pkl and generates a confusion matrix using a holdout test set.
Dependency Anomalies (Unsupervised)
IsolationForest

Implementation: backend/ml_models/anomaly_detection/dependency_anomaly_detector.py

  1. Data Loading: Reads the UNSW-NB15 dataset (mapped to dependency graph features like depth, dev_dependency_count).
  2. Preprocessing: Scales features using StandardScaler.
  3. Model Fitting: Trains an IsolationForest to detect unusual dependency structures (e.g., extremely deep trees or massive dev-dependency bloat).
  4. Persistence: Saves the model to dependency_anomaly_model.pkl and generates a correlation heatmap.

Reports/graphs displayed

Training graphs are requested as images from the backend. Example: GET /api/ml/anomaly-detection/reports/commit_confusion_matrix.png

Data sources

Dataset Used by Purpose
NSL-KDD (Network Intrusion) Commit Pattern Analyzer Provides baseline patterns for anomaly detection (mapped to commit features).
UNSW-NB15 (Network Anomaly) Dependency Anomaly Detector Training set for detecting anomalous dependency structures.

Advanced / Internal Endpoints

Endpoint Purpose
POST /api/ml/anomaly-detection/commit-patterns Internal inference for commit anomalies (used by general infer route).
POST /api/ml/anomaly-detection/pipeline-tampering Detects unauthorized modifications to pipeline configurations (YAML/JSON).
POST /api/ml/anomaly-detection/dependency-anomalies Internal inference for dependency structure anomalies.

6) Component 3 — Integrity Verification (SBOM / Hash / Sign / Merkle / Malware)

Frontend folder: src/components/IntegrityVerification/

Tabs

Tab File What it does
SBOM Generator SBOMGenerator.js Computes SHA-256 for SBOM JSON using backend.
Hash Verification HashVerification.js Compute SHA-256; verify expected hash.
Digital Signatures DigitalSignatures.js Ed25519 keypair generation, signing, signature verification.
Merkle Tree MerkleTree.js Computes Merkle root from provided leaf hashes.
Malware Classification MicrosoftMalware.js Trains and infers a malware family classifier from dataset; shows metrics and predictions.

Buttons and what they do

Button Where Backend endpoint(s) Behavior
Compute Hash (SBOM) SBOMGenerator.js POST /api/ml/integrity-verification/sbom/hash Returns SBOM SHA-256 and dependency count (when derivable).
Compute Hash (content) HashVerification.js POST /api/ml/integrity-verification/hash Returns SHA-256 hash for the provided content.
Verify Hash HashVerification.js POST /api/ml/integrity-verification/hash/verify Returns match/actual hash result.
Generate Keypair DigitalSignatures.js POST /api/ml/integrity-verification/keys/ed25519 Returns Ed25519 public/private PEM strings.
Sign DigitalSignatures.js POST /api/ml/integrity-verification/sign/ed25519 Returns signature (base64) for the payload using the private key.
Verify (signature) DigitalSignatures.js POST /api/ml/integrity-verification/verify/ed25519 Returns valid true/false for payload+signature under public key.
Compute Root MerkleTree.js POST /api/ml/integrity-verification/merkle/root Computes Merkle root and leaf count.
Train Model (malware) MicrosoftMalware.js POST /api/ml/integrity-verification/microsoft-malware/train Trains malware classifier; persists microsoft_malware_model.pkl and writes reports/graphs.
Run Inference (malware) MicrosoftMalware.js POST /api/ml/integrity-verification/microsoft-malware/infer-dataset Runs inference on dataset subset and stores latest predictions.
Refresh (malware) MicrosoftMalware.js GET /api/ml/integrity-verification/microsoft-malware/latest Reloads latest stored predictions.
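The Compute Hash and Verify Hash buttons are thin wrappers over SHA-256. A minimal sketch of the server-side logic (function names and the exact response shape are illustrative):

```python
import hashlib

def sha256_hex(content: str) -> str:
    """SHA-256 of a UTF-8 string, as lowercase hex."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def verify_hash(content: str, expected_hash: str) -> dict:
    """Recompute the hash and compare it to the expected value."""
    actual = sha256_hex(content)
    return {"match": actual == expected_hash.lower(), "actual_hash": actual}

# The well-known SHA-256 digest of "hello".
result = verify_hash("hello", "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824")
```

The Integrity Proofs tab of the Reporting Portal reuses this same verify endpoint.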
Ed25519 signature workflow, exact sequence (keys → sign → verify)
POST /api/ml/integrity-verification/keys/ed25519
Response:
{
  "success": true,
  "private_key_pem": "-----BEGIN PRIVATE KEY----- ...",
  "public_key_pem": "-----BEGIN PUBLIC KEY----- ..."
}

POST /api/ml/integrity-verification/sign/ed25519
Body:
{
  "payload": "hello",
  "private_key_pem": "-----BEGIN PRIVATE KEY----- ..."
}
Response:
{
  "success": true,
  "signature_b64": "..."
}

POST /api/ml/integrity-verification/verify/ed25519
Body:
{
  "payload": "hello",
  "public_key_pem": "-----BEGIN PUBLIC KEY----- ...",
  "signature_b64": "..."
}
Response:
{
  "success": true,
  "valid": true
}
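The Compute Root button sends leaf hashes to the backend. One common Merkle construction is sketched below; the pairing rules (hex concatenation, duplicating the last leaf on odd levels) are our assumptions and may differ from the portal's implementation:

```python
import hashlib

def merkle_root(leaf_hashes):
    """Fold a list of hex-encoded leaf hashes level by level until one root remains."""
    if not leaf_hashes:
        return ""
    level = [h.lower() for h in leaf_hashes]
    while len(level) > 1:
        if len(level) % 2:            # duplicate the last leaf on odd-sized levels
            level.append(level[-1])
        level = [
            hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]

# Same leaves as the curl example in the catalog below.
root = merkle_root(["a3f1", "b2c9", "c0d1"])
```

A single leaf is its own root; an empty list yields an empty root.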
                

How Training Works (Step-by-Step)

Microsoft Malware Classification (Supervised)
SGDClassifier

Implementation: backend/ml_models/integrity_verification/microsoft_malware_classifier.py

  1. Data Loading: Reads Microsoft Malware Classification Challenge/data.csv.
  2. Feature Selection: Selects relevant numerical columns (e.g., resource sizes, section counts) and the class label.
  3. Preprocessing: Scales features using StandardScaler.
  4. Model Fitting: Trains an SGDClassifier (Stochastic Gradient Descent) optimized for large datasets to classify malware families.
  5. Persistence: Saves the model to microsoft_malware_model.pkl and generates accuracy curves and confusion matrices.

Advanced / Internal Endpoints

Endpoint Purpose
POST /api/ml/integrity-verification/detect-tampering Checks artifacts for integrity violations against known signatures/hashes.

7) Component 4 — Forensic Chain-of-Custody & Reporting Portal

Frontend folder: src/components/ForensicReportingPortal/

This component consumes real evidence from the backend (CI/CD evidence state + anomaly latest results), constructs a timeline, provides custody logs, verifies integrity proofs via the hash API, generates reports, and includes an ML Models sub-tab using the Reporting Portal datasets.

Tabs

Tab Backend data source How it works
Evidence Visualization GET /api/ml/reporting-portal/forensic-data Shows evidence timeline with filters (type + timeframe) and metadata.
Integrity Proofs GET /api/ml/reporting-portal/forensic-data Shows proof list derived from timeline; verify button calls hash verify API.
Chain of Custody GET /api/ml/reporting-portal/custody Displays custody logs with expandable history + checksum.
Report Generator POST /api/ml/reporting-portal/report/generate Generates a report payload; export buttons download JSON to disk.
ML Models /api/ml/reporting-portal/* Train/infer the Reporting Portal classifiers and display their report graphs.

Buttons and what they do

Button / Control Where Backend endpoint(s) Behavior
Verify (integrity proof) IntegrityProofs.js POST /api/ml/integrity-verification/hash/verify Verifies that the backend recomputed SHA-256 of the stored proof payload matches the expected hash.
Generate Report ReportGenerator.js POST /api/ml/reporting-portal/report/generate Creates a report object containing sections, evidence count, timeline, integrity proofs, custody logs.
Export as PDF/JSON/XML ReportGenerator.js No backend call Downloads the generated report as a JSON blob (filename extension varies).
Train Model (Reporting ML) ReportingPortalModels.js POST /api/ml/reporting-portal/train Trains either attack_type or breach_type model; updates training_state and reports list.
Infer Dataset (Reporting ML) ReportingPortalModels.js POST /api/ml/reporting-portal/infer-dataset Runs inference on a dataset subset and stores latest predictions for that model.
Infer Sample (Reporting ML) ReportingPortalModels.js POST /api/ml/reporting-portal/infer-sample Predicts a single record provided as JSON in the UI textarea.
Reporting ML “Infer Sample” JSON format (required fields)

The UI textarea sends your JSON object directly to the backend. Use the dataset column names as keys.

Attack Type (from Attack_Dataset.csv)

POST /api/ml/reporting-portal/infer-sample
Body:
{
  "model": "attack_type",
  "sample": {
    "Attack Name": "Phishing",
    "Description": "Credential harvesting email campaign targeting employees"
  }
}
Response:
{
  "success": true,
  "prediction": "Phishing",
  "confidence": 0.81
}
                

Breach Type (from Cyber Security Breaches.csv)

POST /api/ml/reporting-portal/infer-sample
Body:
{
  "model": "breach_type",
  "sample": {
    "Organization": "Example Corp",
    "Type_of_Breach": "Hacking",
    "Summary": "Unauthorized access and data exfiltration detected",
    "Records_Lost": 120000
  }
}
                

For breach_type inference, Type_of_Breach is the training label; for real-time prediction you can omit it (the backend treats it as unknown and predicts it).

Where the Reporting Portal data comes from (real evidence)

GET /api/ml/reporting-portal/forensic-data composes:

Source Backend storage Used to render
CI/CD evidence items backend/data/cicd_state.json Timeline events + custody logs
Anomaly Detection “latest” results backend memory (latest_results) Additional timeline anomalies
Integrity proofs Derived per timeline item Integrity Proofs tab

How Training Works (Step-by-Step)

Attack Type Classification (Supervised)
LogisticRegression + TF-IDF

Implementation: backend/ml_models/reporting_portal/attack_type_classifier.py

  1. Data Loading: Reads Attack_Dataset.csv.
  2. Text Construction: Concatenates text columns (Title, Description, Impact, etc.) into a single "feature text" string per row.
  3. Vectorization: Applies TF-IDF (up to 12,000 features, 1-2 n-grams) to convert text to vectors.
  4. Model Fitting: Trains a LogisticRegression model to classify the specific Attack Type.
  5. Persistence: Saves the model pipeline to attack_type_model.pkl.
Breach Type Classification (Supervised)
ColumnTransformer (Mixed Data)

Implementation: backend/ml_models/reporting_portal/breach_type_classifier.py

  1. Data Loading: Reads Cyber Security Breaches.csv.
  2. Feature Engineering: Uses a ColumnTransformer to handle mixed data types:
    • Text: TF-IDF on the Summary column.
    • Categorical: One-Hot Encoding on State, etc.
    • Numerical: Median imputation on Individuals_Affected, year, etc.
  3. Model Fitting: Trains a LogisticRegression classifier on the combined feature set to predict Type_of_Breach.
  4. Persistence: Saves the complex pipeline to breach_type_model.pkl.
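The mixed-type pipeline above can be sketched with scikit-learn. The toy rows and the exact transformer settings are illustrative; column names follow the dataset described above:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy breach records standing in for Cyber Security Breaches.csv rows.
df = pd.DataFrame({
    "Summary": ["unauthorized access detected", "laptop stolen from office",
                "unauthorized access and exfiltration", "device theft reported"],
    "State": ["CA", "TX", "CA", "NY"],
    "Individuals_Affected": [120000, None, 5000, 300],
    "Type_of_Breach": ["Hacking", "Theft", "Hacking", "Theft"],
})

# One branch per data type: text -> TF-IDF, categorical -> one-hot, numeric -> median impute.
features = ColumnTransformer([
    ("text", TfidfVectorizer(), "Summary"),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["State"]),
    ("num", SimpleImputer(strategy="median"), ["Individuals_Affected"]),
])

pipeline = Pipeline([("features", features), ("clf", LogisticRegression(max_iter=1000))])
pipeline.fit(df.drop(columns=["Type_of_Breach"]), df["Type_of_Breach"])

pred = pipeline.predict(df.drop(columns=["Type_of_Breach"]))[0]
```

Saving the whole Pipeline object (step 4) keeps preprocessing and classifier in one .pkl, so inference on a raw sample needs no separate feature code.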

Reporting Portal ML datasets

Dataset file Model Label column Reports directory
backend/ml_models/reporting_portal/Attack_Dataset.csv attack_type Attack Type backend/ml_models/reporting_portal/reports/attack_type/
backend/ml_models/reporting_portal/Cyber Security Breaches.csv breach_type Type_of_Breach backend/ml_models/reporting_portal/reports/breach_type/

Advanced / Internal Endpoints

Endpoint Purpose
POST /api/ml/reporting-portal/correlate Correlates discrete evidence items into a unified attack timeline.

8) API Index (Endpoints)

All endpoints are served by Flask in backend/api/ml_api.py.

GET  /api/ml/health

GET  /api/ml/cicd-agents/status
GET  /api/ml/cicd-agents/evidence?limit=200
POST /api/ml/cicd-agents/collect
GET  /api/ml/cicd-agents/config
POST /api/ml/cicd-agents/config
POST /api/ml/cicd-agents/config/reset
POST /api/ml/cicd-agents/register

GET  /api/ml/cicd-agents/hdfs/status
POST /api/ml/cicd-agents/hdfs/train
POST /api/ml/cicd-agents/hdfs/infer-dataset
GET  /api/ml/cicd-agents/hdfs/latest
GET  /api/ml/cicd-agents/hdfs/reports/<filename>

GET  /api/ml/cicd-agents/hadoop/status
POST /api/ml/cicd-agents/hadoop/train
POST /api/ml/cicd-agents/hadoop/infer-dataset
GET  /api/ml/cicd-agents/hadoop/latest
GET  /api/ml/cicd-agents/hadoop/reports/<filename>

GET  /api/ml/anomaly-detection/status
POST /api/ml/anomaly-detection/train
POST /api/ml/anomaly-detection/infer
POST /api/ml/anomaly-detection/infer-sample
GET  /api/ml/anomaly-detection/latest?model=commit_pattern
GET  /api/ml/anomaly-detection/reports/<filename>

POST /api/ml/integrity-verification/hash
POST /api/ml/integrity-verification/hash/verify
POST /api/ml/integrity-verification/sbom/hash
POST /api/ml/integrity-verification/merkle/root
POST /api/ml/integrity-verification/keys/ed25519
POST /api/ml/integrity-verification/sign/ed25519
POST /api/ml/integrity-verification/verify/ed25519

GET  /api/ml/integrity-verification/microsoft-malware/status
POST /api/ml/integrity-verification/microsoft-malware/train
POST /api/ml/integrity-verification/microsoft-malware/infer-dataset
GET  /api/ml/integrity-verification/microsoft-malware/latest
GET  /api/ml/integrity-verification/microsoft-malware/reports/<filename>

GET  /api/ml/reporting-portal/status
POST /api/ml/reporting-portal/train
POST /api/ml/reporting-portal/infer-sample
POST /api/ml/reporting-portal/infer-dataset
GET  /api/ml/reporting-portal/latest?model=attack_type
GET  /api/ml/reporting-portal/reports/<model>/<filename>
GET  /api/ml/reporting-portal/forensic-data
GET  /api/ml/reporting-portal/custody
POST /api/ml/reporting-portal/report/generate
            

9) PM2 Runtime

PM2 config is backend/ecosystem.config.js. The ML API process forensic-ml-api runs backend/api/ml_api.py with the venv interpreter backend/venv/bin/python3 and listens on port 5000.
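A minimal ecosystem.config.js matching that description (a sketch under stated assumptions: the real file may set additional env vars, args, or log paths):

```javascript
// backend/ecosystem.config.js (illustrative sketch, not the deployed file)
module.exports = {
  apps: [
    {
      name: "forensic-ml-api",
      script: "api/ml_api.py",
      interpreter: "/var/www/forensic-portal/backend/venv/bin/python3",
      cwd: "/var/www/forensic-portal/backend",
      env: { PORT: "5000" },
      autorestart: true,
    },
  ],
};
```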

cd /var/www/forensic-portal/backend
pm2 start ecosystem.config.js
pm2 restart ecosystem.config.js --update-env
pm2 logs forensic-ml-api
pm2 save
            

10) curl Test Commands (Full Catalog)

Set BASE once and run any command. These are copy-pasteable smoke tests for every model and non-ML integrity function. All API routes are under /api/ml/.

BASE=https://monitoringsystem.space
HJSON='Content-Type: application/json'
            

Health

# Check if ML API is reachable; returns {"success": true}
curl -sS $BASE/api/ml/health
            

CI/CD Agents

# Get total agents and their statuses
curl -sS $BASE/api/ml/cicd-agents/status

# List recently collected evidence items (e.g., git-diffs, logs)
curl -sS $BASE/api/ml/cicd-agents/evidence?limit=50

# View current agent configuration (what to capture)
curl -sS $BASE/api/ml/cicd-agents/config

# Register a new agent (simulated registration)
curl -sS -X POST $BASE/api/ml/cicd-agents/register -H "$HJSON" -d '{"name":"curl-agent-1","type":"Jenkins"}'

# Update configuration (disable auto_collect, set interval)
curl -sS -X POST $BASE/api/ml/cicd-agents/config -H "$HJSON" -d '{"config":{"auto_collect":false,"collection_interval_seconds":30}}'

# Reset configuration to defaults
curl -sS -X POST $BASE/api/ml/cicd-agents/config/reset -H "$HJSON" -d '{}'

# Trigger manual evidence collection from dataset (creates evidence items)
curl -sS -X POST $BASE/api/ml/cicd-agents/collect -H "$HJSON" -d '{"mode":"dataset"}'

# Verify new evidence was added
curl -sS $BASE/api/ml/cicd-agents/evidence?limit=20
            

CI/CD Agents — HDFS Log Anomaly Detection

# Check training status of HDFS model
curl -sS $BASE/api/ml/cicd-agents/hdfs/status

# Train HDFS model (unsupervised) on 20k log lines
curl -sS -X POST $BASE/api/ml/cicd-agents/hdfs/train -H "$HJSON" -d '{"limit":20000}'

# Verify status updates to "trained"
curl -sS $BASE/api/ml/cicd-agents/hdfs/status

# Run inference on dataset (limit 5000 lines) and store results
curl -sS -X POST $BASE/api/ml/cicd-agents/hdfs/infer-dataset -H "$HJSON" -d '{"limit":5000,"store":true}'

# Retrieve the stored inference results (anomalies)
curl -sS $BASE/api/ml/cicd-agents/hdfs/latest

# Download generated graphs and training summary
curl -sS -o hdfs_dataset_matrix.png $BASE/api/ml/cicd-agents/hdfs/reports/hdfs_dataset_matrix.png
curl -sS -o hdfs_score_distribution.png $BASE/api/ml/cicd-agents/hdfs/reports/hdfs_score_distribution.png
curl -sS -o hdfs_training_summary.json $BASE/api/ml/cicd-agents/hdfs/reports/hdfs_training_summary.json
            

CI/CD Agents — Hadoop Log Failure Classification

# Check training status of Hadoop model
curl -sS $BASE/api/ml/cicd-agents/hadoop/status

# Train Hadoop supervised model on 12k log entries
curl -sS -X POST $BASE/api/ml/cicd-agents/hadoop/train -H "$HJSON" -d '{"limit":12000}'

# Verify status is "trained"
curl -sS $BASE/api/ml/cicd-agents/hadoop/status

# Run classification on dataset and store failures
curl -sS -X POST $BASE/api/ml/cicd-agents/hadoop/infer-dataset -H "$HJSON" -d '{"limit":3000,"store":true}'

# Retrieve latest detected failures
curl -sS $BASE/api/ml/cicd-agents/hadoop/latest

# Download classification metrics and confusion matrix
curl -sS -o hadoop_confusion_matrix.png $BASE/api/ml/cicd-agents/hadoop/reports/hadoop_confusion_matrix.png
curl -sS -o hadoop_metrics.png $BASE/api/ml/cicd-agents/hadoop/reports/hadoop_metrics.png
curl -sS -o hadoop_training_summary.json $BASE/api/ml/cicd-agents/hadoop/reports/hadoop_training_summary.json
            

Anomaly Detection Engine

# Check status of both anomaly models
curl -sS $BASE/api/ml/anomaly-detection/status

# Train both Commit Pattern and Dependency Anomaly models
curl -sS -X POST $BASE/api/ml/anomaly-detection/train -H "$HJSON" -d '{"models":["commit_pattern","dependency_anomaly"],"limit":2000}'

# Verify both are trained
curl -sS $BASE/api/ml/anomaly-detection/status

# Run inference for Commit Patterns and store results
curl -sS -X POST $BASE/api/ml/anomaly-detection/infer -H "$HJSON" -d '{"model":"commit_pattern","limit":500,"store":true}'
curl -sS $BASE/api/ml/anomaly-detection/latest?model=commit_pattern

# Run inference for Dependency Anomalies and store results
curl -sS -X POST $BASE/api/ml/anomaly-detection/infer -H "$HJSON" -d '{"model":"dependency_anomaly","limit":500,"store":true}'
curl -sS "$BASE/api/ml/anomaly-detection/latest?model=dependency_anomaly"

# Infer a single sample (manual JSON input)
curl -sS -X POST $BASE/api/ml/anomaly-detection/infer-sample -H "$HJSON" -d '{"model":"commit_pattern","sample":{"commit_message_length":240,"files_changed":12,"lines_added":900,"lines_removed":20,"is_off_hours":1,"is_weekend":0,"timestamp_hour":2,"day_of_week":6,"time_since_last_commit":3600,"author_commit_count":1}}'

# Download generated anomaly reports
curl -sS -o commit_confusion_matrix.png $BASE/api/ml/anomaly-detection/reports/commit_confusion_matrix.png
curl -sS -o commit_metrics.png $BASE/api/ml/anomaly-detection/reports/commit_metrics.png
curl -sS -o dependency_heatmap.png $BASE/api/ml/anomaly-detection/reports/dependency_correlation_heatmap.png
curl -sS -o training_summary.json $BASE/api/ml/anomaly-detection/reports/anomaly_training_summary.json

Integrity Verification (Hash / SBOM / Merkle / Ed25519)

# Compute SHA-256 hash of a string
curl -sS -X POST $BASE/api/ml/integrity-verification/hash -H "$HJSON" -d '{"content":"hello"}'

# Verify content matches an expected hash
curl -sS -X POST $BASE/api/ml/integrity-verification/hash/verify -H "$HJSON" -d '{"content":"hello","expected_hash":"deadbeef"}'
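The digest the API returns can be cross-checked locally; sha256sum ships with GNU coreutils (on macOS use shasum -a 256 instead). For the string "hello":

```shell
# Local SHA-256 of "hello" — should match the API's hash endpoint output
printf '%s' hello | sha256sum | awk '{print $1}'
# → 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```

Note the printf '%s' form: echo would append a newline and produce a different digest.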

# Compute hash of an SBOM JSON structure
curl -sS -X POST $BASE/api/ml/integrity-verification/sbom/hash -H "$HJSON" -d '{"sbom_json":"{\"name\":\"demo\",\"dependencies\":[{\"name\":\"a\",\"version\":\"1.0.0\"}]}"}'

# Compute Merkle Root for a list of leaf hashes
curl -sS -X POST $BASE/api/ml/integrity-verification/merkle/root -H "$HJSON" -d '{"hashes":["a3f1","b2c9","c0d1"]}'
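For sanity-checking the Merkle endpoint offline, here is a bash sketch of one common construction: concatenate sibling hex digests, SHA-256 each pair, and duplicate the last leaf when a level has an odd count. This pairing scheme is an assumption, not taken from the backend — confirm it against the implementation under backend/ml_models/ before treating a mismatch as an error.

```shell
# Merkle root over hex-digest leaves (assumed scheme: concat hex strings,
# SHA-256 each pair, duplicate the last leaf on odd counts).
merkle_root() {
  local -a level=("$@") next
  local left right i
  while [ "${#level[@]}" -gt 1 ]; do
    next=()
    for ((i = 0; i < ${#level[@]}; i += 2)); do
      left=${level[i]}
      right=${level[i + 1]:-$left}   # odd count: pair the last leaf with itself
      next+=("$(printf '%s%s' "$left" "$right" | sha256sum | awk '{print $1}')")
    done
    level=("${next[@]}")
  done
  printf '%s\n' "${level[0]}"
}
merkle_root a3f1 b2c9 c0d1
```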

# Ed25519: Generate Keypair, Sign, and Verify
KEYS=$(curl -sS -X POST $BASE/api/ml/integrity-verification/keys/ed25519 -H "$HJSON")
PRIV=$(printf '%s' "$KEYS" | python3 -c 'import sys,json;print(json.load(sys.stdin)["private_key_pem"])')
PUB=$(printf '%s' "$KEYS" | python3 -c 'import sys,json;print(json.load(sys.stdin)["public_key_pem"])')
# Build the request bodies with python3 so the multi-line PEMs are JSON-escaped safely
SIGN_BODY=$(PRIV="$PRIV" python3 -c 'import json,os;print(json.dumps({"payload":"hello","private_key_pem":os.environ["PRIV"]}))')
SIG=$(curl -sS -X POST $BASE/api/ml/integrity-verification/sign/ed25519 -H "$HJSON" -d "$SIGN_BODY" | python3 -c 'import sys,json;print(json.load(sys.stdin)["signature_b64"])')

# Verify the signature against the public key
VERIFY_BODY=$(PUB="$PUB" SIG="$SIG" python3 -c 'import json,os;print(json.dumps({"payload":"hello","public_key_pem":os.environ["PUB"],"signature_b64":os.environ["SIG"]}))')
curl -sS -X POST $BASE/api/ml/integrity-verification/verify/ed25519 -H "$HJSON" -d "$VERIFY_BODY"

Integrity Verification — Microsoft Malware Classification

# Check status of malware model
curl -sS $BASE/api/ml/integrity-verification/microsoft-malware/status

# Train malware classifier (supervised SGD)
curl -sS -X POST $BASE/api/ml/integrity-verification/microsoft-malware/train -H "$HJSON" -d '{"limit":30000}'
curl -sS $BASE/api/ml/integrity-verification/microsoft-malware/status

# Run inference on dataset and store results
curl -sS -X POST $BASE/api/ml/integrity-verification/microsoft-malware/infer-dataset -H "$HJSON" -d '{"limit":500,"store":true}'
curl -sS $BASE/api/ml/integrity-verification/microsoft-malware/latest

# Download malware classification reports
curl -sS -o microsoft_malware_confusion_matrix.png $BASE/api/ml/integrity-verification/microsoft-malware/reports/microsoft_malware_confusion_matrix.png
curl -sS -o microsoft_malware_metrics.png $BASE/api/ml/integrity-verification/microsoft-malware/reports/microsoft_malware_metrics.png
curl -sS -o microsoft_malware_accuracy_curve.png $BASE/api/ml/integrity-verification/microsoft-malware/reports/microsoft_malware_accuracy_curve.png
curl -sS -o microsoft_malware_training_summary.json $BASE/api/ml/integrity-verification/microsoft-malware/reports/microsoft_malware_training_summary.json

Reporting Portal (Evidence/Custody/Report)

# Fetch timeline evidence (from CI/CD + Anomalies)
curl -sS $BASE/api/ml/reporting-portal/forensic-data

# Fetch chain of custody logs
curl -sS $BASE/api/ml/reporting-portal/custody

# Generate a full forensic report JSON
curl -sS -X POST $BASE/api/ml/reporting-portal/report/generate -H "$HJSON" -d '{"format":"json"}'
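For case files it helps to save each generated report under a timestamped name rather than overwriting one file. The helper name and filename pattern below are ours, not part of the API:

```shell
# Save the generated report to a timestamped JSON file and print its name.
generate_report() {
  local out="forensic_report_$(date +%Y%m%d_%H%M%S).json"
  curl -sS -X POST "$BASE/api/ml/reporting-portal/report/generate" \
    -H "$HJSON" -d '{"format":"json"}' -o "$out" && printf '%s\n' "$out"
}
# generate_report   # prints e.g. forensic_report_20250101_120000.json
```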

Reporting Portal — ML Models (Attack Type / Breach Type)

# Check status of reporting models
curl -sS $BASE/api/ml/reporting-portal/status

# Train Attack Type model
curl -sS -X POST $BASE/api/ml/reporting-portal/train -H "$HJSON" -d '{"models":["attack_type"],"limit_attack":6000}'

# Train Breach Type model
curl -sS -X POST $BASE/api/ml/reporting-portal/train -H "$HJSON" -d '{"models":["breach_type"]}'

# Infer Attack Type on dataset and store
curl -sS -X POST $BASE/api/ml/reporting-portal/infer-dataset -H "$HJSON" -d '{"model":"attack_type","limit":200,"store":true}'
curl -sS "$BASE/api/ml/reporting-portal/latest?model=attack_type"

# Infer Breach Type on dataset and store
curl -sS -X POST $BASE/api/ml/reporting-portal/infer-dataset -H "$HJSON" -d '{"model":"breach_type","limit":200,"store":true}'
curl -sS "$BASE/api/ml/reporting-portal/latest?model=breach_type"

# Infer single sample: Attack Type
curl -sS -X POST $BASE/api/ml/reporting-portal/infer-sample -H "$HJSON" -d '{"model":"attack_type","sample":{"Attack Name":"Malware","Description":"Ransomware infection spreading laterally in network"}}'

# Infer single sample: Breach Type
curl -sS -X POST $BASE/api/ml/reporting-portal/infer-sample -H "$HJSON" -d '{"model":"breach_type","sample":{"Organization":"Example Corp","Summary":"Unauthorized access and data exfiltration detected","Records_Lost":120000}}'

# Download reporting model graphs
curl -sS -o attack_type_metrics.png $BASE/api/ml/reporting-portal/reports/attack_type/attack_type_metrics.png
curl -sS -o attack_type_confusion_matrix.png $BASE/api/ml/reporting-portal/reports/attack_type/attack_type_confusion_matrix.png
curl -sS -o attack_type_learning_curve.png $BASE/api/ml/reporting-portal/reports/attack_type/attack_type_learning_curve.png
curl -sS -o breach_type_metrics.png $BASE/api/ml/reporting-portal/reports/breach_type/breach_type_metrics.png
curl -sS -o breach_type_confusion_matrix.png $BASE/api/ml/reporting-portal/reports/breach_type/breach_type_confusion_matrix.png
curl -sS -o breach_type_learning_curve.png $BASE/api/ml/reporting-portal/reports/breach_type/breach_type_learning_curve.png
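The per-file downloads above can be collapsed into a loop; fetch_reports is a local convenience helper, not an API feature:

```shell
# Download a list of report artifacts from one component's reports/ path.
fetch_reports() {
  local base_url=$1 f
  shift
  for f in "$@"; do
    curl -sS -o "$f" "$base_url/$f" || return 1
  done
}
# fetch_reports "$BASE/api/ml/reporting-portal/reports/attack_type" \
#   attack_type_metrics.png attack_type_confusion_matrix.png attack_type_learning_curve.png
```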

11) Troubleshooting

404 Not Found

Usually means Nginx isn’t proxying /api/ml/ to the Flask backend, or the request path is misspelled. Check what the proxy returns for the health route:

curl -i https://your-host/api/ml/health
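To tell a routing problem from a dead backend, query Flask directly on port 5000 (per the PM2 setup), bypassing Nginx. Whether Flask itself serves the /api/ml prefix is an assumption here — adjust the path if Nginx rewrites it before proxying:

```shell
# Hit Flask directly, bypassing Nginx, and print only the HTTP status code.
check_backend() {
  curl -sS -o /dev/null -w '%{http_code}\n' http://127.0.0.1:5000/api/ml/health
}
# check_backend   # 200 → backend is fine, fix the Nginx location; connection refused → backend is down
```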
502 Bad Gateway

The proxy reached Nginx but nothing answered behind it: the backend process isn’t running, is listening on the wrong port, or PM2 has stopped it. Check PM2 status and logs:

cd /var/www/forensic-portal/backend
pm2 status
pm2 logs forensic-ml-api
504 Gateway Timeout

Training ran longer than the proxy’s timeout. Increase the Nginx proxy timeouts for /api/ml/, or pass smaller limit values in the training request body.
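One way to raise the timeouts is in the Nginx location block that proxies the API. The directive values below are illustrative, not the deployed configuration — merge them into the existing server block under /etc/nginx/sites-enabled/ and size the timeouts to the longest expected training run:

```nginx
location /api/ml/ {
    proxy_pass http://127.0.0.1:5000;
    proxy_read_timeout 600s;      # wait up to 10 min for a training response
    proxy_send_timeout 600s;
    proxy_connect_timeout 60s;
}
```

After editing, run nginx -t to validate the configuration, then reload Nginx.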

Model not trained

Inference endpoints require a trained model. Train first, then infer, then fetch /latest.