# compliance-scan - Detailed Guide

LLM-optimized developer reference for compliance scanning.

**Quick Reference**: Entry points, architecture, workflows, compliance logic, and extension guide.

## Entry Points

| Function                        | Module                        | Purpose                     |
| ------------------------------- | ----------------------------- | --------------------------- |
| `scan_tls(hostname, port)`      | `sslysze_scan.scanner`        | TLS/SSL scan                |
| `scan_ssh(hostname, port)`      | `sslysze_scan.ssh_scanner`    | SSH scan                    |
| `write_scan_results(...)`       | `sslysze_scan.db.writer`      | Persist scan data (core)    |
| `fetch_scan_data(db_path, id)`  | `sslysze_scan.reporter.query` | Retrieve scan for reporting |
| `fetch_scans(db_path)`          | `sslysze_scan.reporter.query` | List all scans              |
| `fetch_scan_metadata(db, id)`   | `sslysze_scan.reporter.query` | Get scan metadata only      |
| `check_compliance(db_path, id)` | `sslysze_scan.db.compliance`  | Validate against BSI/IANA   |
| `generate_csv_reports(...)`     | `sslysze_scan.reporter`       | Generate CSV reports        |
| `generate_markdown_report(...)` | `sslysze_scan.reporter`       | Generate Markdown report    |

## Architecture

```
CLI (commands/) → Scanner (scanner.py, ssh_scanner.py)
                        ↓
                  Database (db/writer.py, db/compliance.py)
                        ↓
                  Reporter (reporter/query.py, reporter/*.py)
```

**Key Modules:**

- `commands/` - CLI command handlers (scan, report, update-iana)
- `scanner.py` - TLS/SSL scanning via SSLyze
- `ssh_scanner.py` - SSH scanning via ssh-audit
- `db/writer.py` - Core database write operations (scan records, host info)
- `db/tls_writer.py` - TLS-specific database operations
- `db/compliance.py` - BSI/IANA compliance validation
- `db/compliance_config.py` - Compliance check configurations
- `db/generic_compliance.py` - Generic compliance logic
- `reporter/query.py` - Database queries for reporting
- `reporter/*.py` - Report generation (CSV, Markdown, reStructuredText)

## CLI Commands

### Scan

```bash
compliance-scan scan :[,...] [--print] [-db ]
```

Examples:

```bash
compliance-scan scan example.com:443,636
compliance-scan scan [2001:db8::1]:22 --print
```

### Report

```bash
compliance-scan report [scan_id] -t [-o ] [--output-dir ]
```

Examples:

```bash
compliance-scan report -t md -o report.md
compliance-scan report 5 -t csv --output-dir ./reports
compliance-scan report --list
```

### Update IANA

```bash
compliance-scan update-iana [-db ]
```

## Database

### Scan Data Tables

| Table                            | Columns                                                 | Purpose                   |
| -------------------------------- | ------------------------------------------------------- | ------------------------- |
| `scans`                          | scan_id, hostname, ports, timestamp, duration           | Scan metadata             |
| `scanned_hosts`                  | scan_id, fqdn, ipv4, ipv6                               | Resolved addresses        |
| `scan_cipher_suites`             | scan_id, port, tls_version, cipher_suite_name, accepted | TLS cipher results        |
| `scan_supported_groups`          | scan_id, port, group_name                               | Elliptic curves/DH groups |
| `scan_certificates`              | scan_id, port, position, subject, key_type, key_bits    | Certificate chain         |
| `scan_ssh_kex_methods`           | scan_id, port, kex_method_name                          | SSH key exchange          |
| `scan_ssh_encryption_algorithms` | scan_id, port, encryption_algorithm_name                | SSH encryption            |
| `scan_ssh_mac_algorithms`        | scan_id, port, mac_algorithm_name                       | SSH MAC                   |
| `scan_ssh_host_keys`             | scan_id, port, host_key_algorithm, key_type, key_bits   | SSH host keys             |
| `scan_compliance_status`         | scan_id, port, check_type, item_name, passed            | Compliance results        |

### Reference Data Tables

| Table                             | Source          | Purpose                      |
| --------------------------------- | --------------- | ---------------------------- |
| `iana_tls_cipher_suites`          | IANA TLS        | Cipher suite recommendations |
| `iana_tls_supported_groups`       | IANA TLS        | Group recommendations        |
| `iana_ssh_kex_methods`            | IANA SSH        | SSH KEX recommendations      |
| `bsi_compliance_rules`            | BSI TR-02102-\* | Unified compliance rules     |
| `bsi_tr_02102_1_key_requirements` | BSI TR-02102-1  | Certificate key sizes        |

### Unified BSI Schema

The `bsi_compliance_rules` table consolidates all BSI TR-02102-2 and TR-02102-4 compliance data.

| Column             | Type    | Description                           |
| ------------------ | ------- | ------------------------------------- |
| `standard`         | TEXT    | TR-02102-2, TR-02102-4                |
| `category`         | TEXT    | cipher_suite, dh_group, ssh_kex, etc. |
| `algorithm_name`   | TEXT    | Algorithm/cipher/method name          |
| `additional_param` | TEXT    | Optional context (e.g., TLS version)  |
| `valid_from`       | INTEGER | Start year                            |
| `valid_until`      | INTEGER | End year (NULL = no expiration)       |
| `specification`    | TEXT    | Reference (RFC, etc.)                 |
| `notes`            | TEXT    | Additional remarks                    |

### Views

| View                                     | Purpose                              |
| ---------------------------------------- | ------------------------------------ |
| `v_compliance_tls_cipher_suites`         | TLS cipher suites + compliance flags |
| `v_compliance_tls_supported_groups`      | TLS groups + compliance flags        |
| `v_compliance_tls_certificates`          | Certificates + key size compliance   |
| `v_compliance_ssh_kex_methods`           | SSH KEX + compliance flags           |
| `v_compliance_ssh_encryption_algorithms` | SSH encryption + compliance flags    |
| `v_compliance_ssh_mac_algorithms`        | SSH MAC + compliance flags           |
| `v_compliance_ssh_host_keys`             | SSH host keys + compliance flags     |
| `v_summary_port_compliance`              | Aggregated compliance per port       |
| `v_summary_missing_bsi_groups`           | Missing BSI-approved groups          |
| `v_summary_missing_iana_groups`          | Missing IANA-recommended groups      |

## Workflows

### Scan

```python
from datetime import UTC, datetime

from sslysze_scan.db.compliance import check_compliance
from sslysze_scan.db.writer import write_scan_results
from sslysze_scan.scanner import scan_tls
from sslysze_scan.ssh_scanner import scan_ssh

db_path = "compliance_status.db"

# 1. Parse CLI (parse_host_ports is the CLI helper; its module is not listed in this guide)
hostname, ports = parse_host_ports("example.com:443,22")

# 2. Perform scans
tls_result, tls_duration = scan_tls(hostname, 443)
ssh_result, ssh_duration = scan_ssh(hostname, 22)
scan_results = {443: tls_result, 22: ssh_result}

# 3. Write to database
scan_id = write_scan_results(
    db_path=db_path,
    hostname=hostname,
    ports=[443, 22],
    scan_results=scan_results,
    scan_start_time=datetime.now(UTC),
    scan_duration=tls_duration + ssh_duration,
)

# 4. Check compliance
check_compliance(db_path, scan_id)
```

### Report

```python
# 1. Fetch data (uses views internally)
data = fetch_scan_data(db_path, scan_id)

# 2. Generate report
generate_csv_reports(db_path, scan_id, output_dir="./reports")
generate_markdown_report(db_path, scan_id, output_file="report.md")
```

## Compliance

### Configuration

Compliance checks are defined in `db/compliance_config.py` via `COMPLIANCE_CONFIGS`:

- `cipher_suites` - TLS cipher suite validation (only accepted cipher suites, unique per port)
- `supported_groups` - TLS group validation (all discovered groups)
- `ssh_kex` - SSH key exchange validation
- `ssh_encryption` - SSH encryption validation
- `ssh_mac` - SSH MAC validation
- `ssh_host_keys` - SSH host key validation

Each config maps scan tables to IANA/BSI reference tables.

**Filtering**: Cipher suites use `scan_filter_column: "accepted"` with `scan_filter_value: 1` to check only accepted cipher suites. Other checks evaluate all discovered items.

**Duplicate Prevention**: The compliance query uses `DISTINCT` on `(port, algorithm_name)` to count unique algorithms per port. Cipher suites tested across multiple TLS versions are counted once.

### Validation Logic

Function `check_compliance_generic()` in `db/generic_compliance.py`:

1. Query unique items from scan tables using `DISTINCT`
2. Join with IANA and BSI reference tables
3. Check BSI approval first (higher priority)
4. Verify validity period if BSI-approved
5. Fall back to IANA recommendation if not BSI-approved
6. Assign severity: `info` (passed), `warning` (deprecated), `critical` (failed)
7. Store one result per unique item in `scan_compliance_status` table

**SSH Duplicate Prevention**: SSH scanner (`ssh_scanner.py`) uses `set()` to track unique encryption and MAC algorithms. Only `client_to_server` lists are populated and stored. Database writer (`scan_data_types.py`) no longer concatenates `client_to_server` and `server_to_client` lists to avoid duplicates.
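The decision order in the validation steps above can be sketched as a small pure function. This is a minimal illustration, not the code in `db/generic_compliance.py`: the name `classify_item`, the `Verdict` type, and the choices to treat an expired BSI approval, a deprecated IANA entry, and a missing IANA entry as not passed are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verdict:
    passed: bool
    severity: str  # "info", "warning", or "critical"


def classify_item(
    bsi_approved: bool,
    bsi_valid_until: Optional[int],
    iana_recommended: Optional[str],
    year: int,
) -> Verdict:
    """Hypothetical helper mirroring the documented priority: BSI first, then IANA."""
    if bsi_approved:
        # A valid_until of None (NULL in the table) means no expiration.
        if bsi_valid_until is None or year <= bsi_valid_until:
            return Verdict(passed=True, severity="info")
        # Assumption: an expired BSI approval counts as a failure.
        return Verdict(passed=False, severity="critical")
    if iana_recommended == "Y":
        return Verdict(passed=True, severity="info")
    if iana_recommended == "D":
        # IANA marks the item as deprecated (assumption: reported as not passed).
        return Verdict(passed=False, severity="warning")
    # "N" or no IANA entry at all (assumption: both fail).
    return Verdict(passed=False, severity="critical")
```

For instance, `classify_item(True, 2029, "N", 2030)` yields a critical failure under these assumptions, because the BSI validity window has closed and the IANA value is never consulted for a BSI-listed item.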
### Certificate Validation

Function `check_certificate_compliance()` validates against BSI TR-02102-1:

1. Extract key type (RSA, ECDSA, DSA)
2. Query `bsi_tr_02102_1_key_requirements` for minimum key size
3. Verify key size and algorithm validity period
4. Validate signature hash algorithm
5. Store result in `scan_compliance_status` table

## Standards

### BSI TR-02102-1 - Certificates

| Algorithm | Min Bits | Valid Until       |
| --------- | -------- | ----------------- |
| RSA       | 3000     | -                 |
| ECDSA     | 250      | -                 |
| DSA       | 3072     | 2029 (deprecated) |

### BSI TR-02102-2 - TLS

Stored in `bsi_compliance_rules`:

- `category='cipher_suite'` - Approved TLS cipher suites
- `category='dh_group'` - Approved elliptic curves and DH groups
- `valid_until` - End year (NULL = no expiration)

### BSI TR-02102-4 - SSH

Stored in `bsi_compliance_rules`:

- `category='ssh_kex'` - Approved key exchange methods
- `category='ssh_encryption'` - Approved encryption algorithms
- `category='ssh_mac'` - Approved MAC algorithms
- `category='ssh_host_key'` - Approved host key types

### IANA Recommendations

Column `recommended` values:

- `Y` - Recommended
- `N` - Not recommended
- `D` - Deprecated

## Testing

### Test Structure

```
tests/
├── cli/           # CLI parsing
├── scanner/       # TLS/SSH scan functions
├── db/            # Database queries
├── compliance/    # BSI/IANA validation, duplicate detection
├── iana/          # IANA import/validation
├── reporter/      # CSV/MD/reST export
├── fixtures/      # Test data
└── conftest.py    # Shared fixtures
```

**Compliance Tests**:

- `test_no_duplicates.py` - Verifies no duplicate compliance checks
- `test_compliance_with_realistic_data.py` - Realistic scan scenarios
- `test_plausible_compliance.py` - Plausibility checks
- `test_summary_ssh_duplicates.py` - Verifies SSH algorithms counted once (no duplicates)

### Run Tests

```bash
# All tests
poetry run pytest tests/ -v

# Specific category
poetry run pytest tests/scanner/ -v

# With coverage
poetry run pytest tests/ --cov=src/sslysze_scan
```

## Development

### Code Style

```bash
poetry run ruff check src/ tests/
poetry run ruff format src/ tests/
```

### Requirements

- Python 3.13+
- SSLyze 6.0.0+
- ssh-audit (external tool)
- Poetry
- Ruff

### Module Sizes

| File                       | Lines | Purpose                  |
| -------------------------- | ----- | ------------------------ |
| `scanner.py`               | ~225  | TLS scanning logic       |
| `ssh_scanner.py`           | ~240  | SSH scanning logic       |
| `db/writer.py`             | ~172  | Core database operations |
| `db/tls_writer.py`         | ~700  | TLS-specific write ops   |
| `reporter/query.py`        | ~850  | Database read operations |
| `db/compliance.py`         | ~205  | Compliance checking      |
| `db/compliance_config.py`  | ~80   | Compliance configuration |
| `db/generic_compliance.py` | ~236  | Generic compliance logic |

## Extending

### Add Compliance Standard

1. Insert data into `bsi_compliance_rules` with new category
2. Add entry to `COMPLIANCE_CONFIGS` in `db/compliance_config.py`
3. Create test in `tests/compliance/`
4. Create database view for reporting if needed

### Add Report Format

1. Create `reporter/format_export.py` with `generate_format_report()` function
2. Use `fetch_scan_data()` for data retrieval
3. Register in `reporter/generate.py` dispatcher

### Add Scanner Feature

1. Extend `scanner.py` or `ssh_scanner.py`
2. Update `db/writer.py` to persist new data
3. Create database table and view
4. Add compliance rules to `bsi_compliance_rules` if applicable

**SSH Parser Notes**:

- SSH host key bits are parsed from ssh-audit output using regex `(\d+)-?bit`
- SSH encryption/MAC algorithms use `set()` for duplicate detection
- Only `client_to_server` lists are populated (not both directions)

## Database Management

### Template Database

File: `src/sslysze_scan/data/crypto_standards.db`

Contains reference data (IANA, BSI), schema definitions, and views. Schema changes are applied directly to template. New scan databases are created by copying template.
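Creating a scan database from the template, as described above, amounts to a file copy. The following is a minimal sketch under that reading; the function name `create_scan_database` and the choice to leave an existing target untouched are assumptions for illustration, not the project's actual code.

```python
import shutil
from pathlib import Path


def create_scan_database(template: Path, target: Path) -> Path:
    """Copy the template database to `target` unless it already exists (hypothetical helper)."""
    if not template.is_file():
        raise FileNotFoundError(f"template database not found: {template}")
    if not target.exists():
        # Create missing parent directories before copying.
        target.parent.mkdir(parents=True, exist_ok=True)
        # copyfile copies the file contents; the template is a single SQLite file.
        shutil.copyfile(template, target)
    return target
```

Because the reference tables and views live in the template, a database created this way starts with the IANA and BSI data already in place.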
### CSV Headers

File: `data/csv_headers.json`

Defines CSV column headers for all compliance views. Headers are stored in `csv_export_metadata` table in the database.

**SSH Host Keys CSV**: Columns are Algorithm, Type, Bits, BSI Approved, BSI Valid Until, Compliant (no fingerprint column).

## Summary Statistics

Summary calculation in `reporter/query.py` via `_calculate_summary()`:

**TLS Metrics**:

- `ports_with_tls` - Ports with TLS/SSL support
- `total_cipher_suites` - Accepted cipher suites checked
- `compliant_cipher_suites` - Cipher suites passing BSI/IANA validation
- `total_groups` - Supported groups discovered
- `compliant_groups` - Groups passing BSI/IANA validation

**SSH Metrics**:

- `ports_with_ssh` - Ports with SSH support
- `total_ssh_kex` - KEX methods discovered (unique per port)
- `compliant_ssh_kex` - KEX methods passing BSI/IANA validation
- `total_ssh_encryption` - Encryption algorithms discovered (unique, no duplicates)
- `compliant_ssh_encryption` - Encryption algorithms passing validation
- `total_ssh_mac` - MAC algorithms discovered (unique, no duplicates)
- `compliant_ssh_mac` - MAC algorithms passing validation
- `total_ssh_host_keys` - Host keys discovered
- `compliant_ssh_host_keys` - Host keys passing validation
- `total_ssh_items` - Sum of all SSH items
- `ssh_overall_percentage` - Overall SSH compliance rate

**Summary CSV Output** (`csv_export.py`):

- Includes both TLS and SSH metrics
- Shows counts and percentages for each category
- Reports critical vulnerabilities count

## Query Optimization

**Modular Design** (`reporter/query.py`):

`fetch_scan_data()` delegates to 12 focused helper functions:

- `_fetch_tls_cipher_suites()` - TLS cipher suites with version detection
- `_fetch_tls_supported_groups()` - TLS elliptic curves and DH groups
- `_fetch_tls_certificates()` - Certificate chain with compliance
- `_fetch_vulnerabilities()` - Known vulnerabilities
- `_fetch_protocol_features()` - Protocol-level features
- `_fetch_session_features()` - Session resumption data
- `_fetch_http_headers()` - HTTP security headers
- `_fetch_compliance_summary()` - Per-port compliance stats
- `_fetch_ssh_kex_methods()` - SSH key exchange algorithms
- `_fetch_ssh_encryption()` - SSH encryption algorithms
- `_fetch_ssh_mac()` - SSH MAC algorithms
- `_fetch_ssh_host_keys()` - SSH host keys with compliance

Each helper function:

- Has single responsibility
- Returns structured data (dict/list)
- Uses database views for compliance joins
- Minimal coupling to main function

**Benefits**:

- Main function reduced from 387 to ~35 lines
- Easy to test individual data fetchers
- Clear separation between TLS and SSH queries
- Consistent error handling per data type

## Writer Modularization

**Separation of Concerns** (`db/writer.py` → `db/writer.py` + `db/tls_writer.py`):

Original `writer.py` (929 lines) split into:

**`db/writer.py`** (172 lines) - Core operations:

- `write_scan_results()` - Main entry point
- `_insert_scan_record()` - Scan metadata
- `_resolve_hostname()` - DNS resolution
- `_save_host_info()` - Host information
- `_save_ssh_scan_results()` - SSH wrapper (delegates to generic_writer)

**`db/tls_writer.py`** (700 lines) - TLS-specific operations:

- `save_cipher_suites()` - TLS cipher suite persistence
- `save_supported_groups()` - Elliptic curves and DH groups
- `save_dhe_groups_from_cipher_suites()` - DHE group extraction
- `save_certificates()` - Certificate chain storage
- `save_vulnerabilities()` - Heartbleed, ROBOT, CCS injection
- `save_protocol_features()` - Compression, early data, fallback SCSV
- `save_session_features()` - Renegotiation and resumption
- `save_http_headers()` - Security headers (HSTS, HPKP, Expect-CT)
- FFDHE helper functions (group name/IANA mapping)

**Integration**:

- `generic_writer.py` imports from `tls_writer` instead of `writer`
- Clean module boundaries: Core vs Protocol-specific
- TLS functions now reusable across modules

**Benefits**:

- 81% reduction in writer.py size (929 → 172 lines)
- Clear separation: Core logic vs TLS logic vs SSH logic
- Easier navigation and maintenance
- Independent TLS module can be tested/modified separately