compliance-scan/docs/detailed-guide.md
Heiko f60de7c2da Add SSH scan support with BSI TR-02102-4 compliance
- SSH scanning via ssh-audit (KEX, encryption, MAC, host keys)
- BSI TR-02102-4 and IANA compliance validation for SSH
- CSV/Markdown/reST reports for SSH results
- Unified compliance schema and database views
- Code optimization: modular query/writer architecture
2026-01-23 11:05:01 +01:00


# compliance-scan - Detailed Guide
LLM-optimized developer reference for compliance scanning.
**Quick Reference**: Entry points, architecture, workflows, compliance logic, and extension guide.
## Entry Points
| Function | Module | Purpose |
| ------------------------------- | ----------------------------- | --------------------------- |
| `scan_tls(hostname, port)` | `sslysze_scan.scanner` | TLS/SSL scan |
| `scan_ssh(hostname, port)` | `sslysze_scan.ssh_scanner` | SSH scan |
| `write_scan_results(...)` | `sslysze_scan.db.writer` | Persist scan data (core) |
| `fetch_scan_data(db_path, id)` | `sslysze_scan.reporter.query` | Retrieve scan for reporting |
| `fetch_scans(db_path)` | `sslysze_scan.reporter.query` | List all scans |
| `fetch_scan_metadata(db, id)` | `sslysze_scan.reporter.query` | Get scan metadata only |
| `check_compliance(db_path, id)` | `sslysze_scan.db.compliance` | Validate against BSI/IANA |
| `generate_csv_reports(...)` | `sslysze_scan.reporter` | Generate CSV reports |
| `generate_markdown_report(...)` | `sslysze_scan.reporter` | Generate Markdown report |
## Architecture
```
CLI (commands/) → Scanner (scanner.py, ssh_scanner.py)
                       ↓
                  Database (db/writer.py, db/compliance.py)
                       ↓
                  Reporter (reporter/query.py, reporter/*.py)
```
**Key Modules:**
- `commands/` - CLI command handlers (scan, report, update-iana)
- `scanner.py` - TLS/SSL scanning via SSLyze
- `ssh_scanner.py` - SSH scanning via ssh-audit
- `db/writer.py` - Core database write operations (scan records, host info)
- `db/tls_writer.py` - TLS-specific database operations
- `db/compliance.py` - BSI/IANA compliance validation
- `db/compliance_config.py` - Compliance check configurations
- `db/generic_compliance.py` - Generic compliance logic
- `reporter/query.py` - Database queries for reporting
- `reporter/*.py` - Report generation (CSV, Markdown, reStructuredText)
## CLI Commands
### Scan
```bash
compliance-scan scan <hostname>:<port>[,<port>...] [--print] [-db <path>]
```
Examples:
```bash
compliance-scan scan example.com:443,636
compliance-scan scan [2001:db8::1]:22 --print
```
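The target syntax above (`<hostname>:<port>[,<port>...]`, with IPv6 literals in brackets) can be parsed roughly as follows. This is a minimal sketch, not the project's parser: the name `parse_host_ports_sketch` and its error handling are assumptions made for illustration.

```python
def parse_host_ports_sketch(target: str) -> tuple[str, list[int]]:
    """Split 'host:port[,port...]' into (host, [ports]).

    Hypothetical helper: IPv6 literals must be written as [addr]:port.
    """
    if target.startswith("["):
        # Bracketed IPv6 literal, e.g. [2001:db8::1]:22
        host, sep, port_part = target[1:].partition("]:")
    else:
        # Hostname or IPv4 address; the last colon separates the ports
        host, sep, port_part = target.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected <hostname>:<port>[,<port>...], got {target!r}")

    ports = []
    for item in port_part.split(","):
        port = int(item)  # raises ValueError for non-numeric input
        if not 1 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
        ports.append(port)
    return host, ports
```

Bracket handling comes first so that the colons inside an IPv6 address are never mistaken for the port separator.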
### Report
```bash
compliance-scan report [scan_id] -t <csv|md|rest> [-o <file>] [--output-dir <dir>]
```
Examples:
```bash
compliance-scan report -t md -o report.md
compliance-scan report 5 -t csv --output-dir ./reports
compliance-scan report --list
```
### Update IANA
```bash
compliance-scan update-iana [-db <path>]
```
## Database
### Scan Data Tables
| Table | Columns | Purpose |
| -------------------------------- | ------------------------------------------------------- | ------------------------- |
| `scans` | scan_id, hostname, ports, timestamp, duration | Scan metadata |
| `scanned_hosts` | scan_id, fqdn, ipv4, ipv6 | Resolved addresses |
| `scan_cipher_suites` | scan_id, port, tls_version, cipher_suite_name, accepted | TLS cipher results |
| `scan_supported_groups` | scan_id, port, group_name | Elliptic curves/DH groups |
| `scan_certificates` | scan_id, port, position, subject, key_type, key_bits | Certificate chain |
| `scan_ssh_kex_methods` | scan_id, port, kex_method_name | SSH key exchange |
| `scan_ssh_encryption_algorithms` | scan_id, port, encryption_algorithm_name | SSH encryption |
| `scan_ssh_mac_algorithms` | scan_id, port, mac_algorithm_name | SSH MAC |
| `scan_ssh_host_keys` | scan_id, port, host_key_algorithm, key_type, key_bits | SSH host keys |
| `scan_compliance_status` | scan_id, port, check_type, item_name, passed | Compliance results |
### Reference Data Tables
| Table | Source | Purpose |
| --------------------------------- | --------------- | ---------------------------- |
| `iana_tls_cipher_suites` | IANA TLS | Cipher suite recommendations |
| `iana_tls_supported_groups` | IANA TLS | Group recommendations |
| `iana_ssh_kex_methods` | IANA SSH | SSH KEX recommendations |
| `bsi_compliance_rules` | BSI TR-02102-\* | Unified compliance rules |
| `bsi_tr_02102_1_key_requirements` | BSI TR-02102-1 | Certificate key sizes |
### Unified BSI Schema
The `bsi_compliance_rules` table consolidates all BSI TR-02102-2 and TR-02102-4 compliance data.
| Column | Type | Description |
| ------------------ | ------- | ------------------------------------- |
| `standard` | TEXT | TR-02102-2, TR-02102-4 |
| `category` | TEXT | cipher_suite, dh_group, ssh_kex, etc. |
| `algorithm_name` | TEXT | Algorithm/cipher/method name |
| `additional_param` | TEXT | Optional context (e.g., TLS version) |
| `valid_from` | INTEGER | Start year |
| `valid_until` | INTEGER | End year (NULL = no expiration) |
| `specification` | TEXT | Reference (RFC, etc.) |
| `notes` | TEXT | Additional remarks |
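As an illustration of how the validity columns might be used, the sketch below builds the table in an in-memory SQLite database and selects the rules that apply in a given year. The algorithm names are placeholders, not real BSI entries, and treating `valid_until` as an inclusive end year is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE bsi_compliance_rules (
        standard TEXT, category TEXT, algorithm_name TEXT,
        additional_param TEXT, valid_from INTEGER, valid_until INTEGER,
        specification TEXT, notes TEXT
    )
    """
)
# Placeholder rows: the names are illustrative, not taken from the BSI guideline.
conn.executemany(
    "INSERT INTO bsi_compliance_rules VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("TR-02102-4", "ssh_kex", "example-kex-current", None, 2020, None, None, None),
        ("TR-02102-4", "ssh_kex", "example-kex-expired", None, 2015, 2022, None, None),
    ],
)


def approved_in(year: int, category: str) -> list[str]:
    """Return algorithm names whose validity period covers `year`."""
    cursor = conn.execute(
        """
        SELECT algorithm_name FROM bsi_compliance_rules
        WHERE category = ?
          AND valid_from <= ?
          AND (valid_until IS NULL OR valid_until >= ?)  -- NULL = no expiration
        ORDER BY algorithm_name
        """,
        (category, year, year),
    )
    return [row[0] for row in cursor]
```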
### Views
| View | Purpose |
| ---------------------------------------- | ------------------------------------ |
| `v_compliance_tls_cipher_suites` | TLS cipher suites + compliance flags |
| `v_compliance_tls_supported_groups` | TLS groups + compliance flags |
| `v_compliance_tls_certificates` | Certificates + key size compliance |
| `v_compliance_ssh_kex_methods` | SSH KEX + compliance flags |
| `v_compliance_ssh_encryption_algorithms` | SSH encryption + compliance flags |
| `v_compliance_ssh_mac_algorithms` | SSH MAC + compliance flags |
| `v_compliance_ssh_host_keys` | SSH host keys + compliance flags |
| `v_summary_port_compliance` | Aggregated compliance per port |
| `v_summary_missing_bsi_groups` | Missing BSI-approved groups |
| `v_summary_missing_iana_groups` | Missing IANA-recommended groups |
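To show what a "scan data + compliance flags" view could look like, here is a hedged sketch that joins `scan_ssh_kex_methods` against `bsi_compliance_rules`. The view name `v_sketch_ssh_kex`, its output columns, and the sample algorithm names are assumptions; the shipped views may be defined differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE scan_ssh_kex_methods (scan_id INTEGER, port INTEGER, kex_method_name TEXT);
    CREATE TABLE bsi_compliance_rules (
        standard TEXT, category TEXT, algorithm_name TEXT,
        valid_from INTEGER, valid_until INTEGER
    );

    -- Hypothetical view: one row per scanned KEX method plus a BSI flag.
    CREATE VIEW v_sketch_ssh_kex AS
    SELECT s.scan_id, s.port, s.kex_method_name,
           r.algorithm_name IS NOT NULL AS bsi_approved,
           r.valid_until AS bsi_valid_until
    FROM scan_ssh_kex_methods AS s
    LEFT JOIN bsi_compliance_rules AS r
           ON r.category = 'ssh_kex' AND r.algorithm_name = s.kex_method_name;

    -- Placeholder data, not real scan output or BSI rules.
    INSERT INTO scan_ssh_kex_methods VALUES (1, 22, 'example-kex-approved');
    INSERT INTO scan_ssh_kex_methods VALUES (1, 22, 'example-kex-unlisted');
    INSERT INTO bsi_compliance_rules
        VALUES ('TR-02102-4', 'ssh_kex', 'example-kex-approved', 2020, NULL);
    """
)

rows = conn.execute(
    "SELECT port, kex_method_name, bsi_approved, bsi_valid_until "
    "FROM v_sketch_ssh_kex WHERE scan_id = 1 ORDER BY kex_method_name"
).fetchall()
```

The `LEFT JOIN` keeps methods that have no matching rule, so reporting code can list non-approved items alongside approved ones.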
## Workflows
### Scan
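```python
from datetime import UTC, datetime

from sslysze_scan.db.compliance import check_compliance
from sslysze_scan.db.writer import write_scan_results
from sslysze_scan.scanner import scan_tls
from sslysze_scan.ssh_scanner import scan_ssh

# parse_host_ports is the CLI target parser; its module is not listed in this guide.
db_path = "compliance_status.db"

# 1. Parse CLI
hostname, ports = parse_host_ports("example.com:443,22")

# 2. Perform scans (record the start time before scanning)
scan_start_time = datetime.now(UTC)
tls_result, tls_duration = scan_tls(hostname, 443)
ssh_result, ssh_duration = scan_ssh(hostname, 22)
scan_results = {443: tls_result, 22: ssh_result}

# 3. Write to database
scan_id = write_scan_results(
    db_path=db_path,
    hostname=hostname,
    ports=[443, 22],
    scan_results=scan_results,
    scan_start_time=scan_start_time,
    scan_duration=tls_duration + ssh_duration,
)

# 4. Check compliance
check_compliance(db_path, scan_id)
```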
### Report
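```python
from sslysze_scan.reporter import generate_csv_reports, generate_markdown_report
from sslysze_scan.reporter.query import fetch_scan_data

db_path = "compliance_status.db"
scan_id = 5  # ID of an existing scan

# 1. Fetch data (uses views internally)
data = fetch_scan_data(db_path, scan_id)
# 2. Generate reports
generate_csv_reports(db_path, scan_id, output_dir="./reports")
generate_markdown_report(db_path, scan_id, output_file="report.md")
```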
## Compliance
### Configuration
Compliance checks are defined in `db/compliance_config.py` via `COMPLIANCE_CONFIGS`:
- `cipher_suites` - TLS cipher suite validation (only accepted cipher suites, unique per port)
- `supported_groups` - TLS group validation (all discovered groups)
- `ssh_kex` - SSH key exchange validation
- `ssh_encryption` - SSH encryption validation
- `ssh_mac` - SSH MAC validation
- `ssh_host_keys` - SSH host key validation
Each config maps scan tables to IANA/BSI reference tables.
**Filtering**: Cipher suites use `scan_filter_column: "accepted"` with `scan_filter_value: 1` to check only accepted cipher suites. Other checks evaluate all discovered items.
**Duplicate Prevention**: The compliance query uses `DISTINCT` on `(port, algorithm_name)` to count unique algorithms per port. Cipher suites tested across multiple TLS versions are counted once.
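The effect of the `accepted` filter combined with `DISTINCT` can be reproduced in isolation. The sketch below uses the `scan_cipher_suites` columns from the table listing above; the sample rows and the `tls_version` string format are illustrative assumptions, and the real query in `db/generic_compliance.py` may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scan_cipher_suites ("
    "scan_id INTEGER, port INTEGER, tls_version TEXT, "
    "cipher_suite_name TEXT, accepted INTEGER)"
)
conn.executemany(
    "INSERT INTO scan_cipher_suites VALUES (?, ?, ?, ?, ?)",
    [
        # The same suite accepted under three TLS versions
        (1, 443, "TLS 1.0", "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA", 1),
        (1, 443, "TLS 1.1", "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA", 1),
        (1, 443, "TLS 1.2", "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA", 1),
        (1, 443, "TLS 1.2", "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384", 1),
        # A rejected suite must not be checked at all
        (1, 443, "TLS 1.2", "TLS_RSA_WITH_RC4_128_SHA", 0),
    ],
)


def unique_accepted_suites(scan_id: int) -> list[tuple[int, str]]:
    """Return each accepted (port, cipher suite) pair exactly once."""
    return conn.execute(
        "SELECT DISTINCT port, cipher_suite_name FROM scan_cipher_suites "
        "WHERE scan_id = ? AND accepted = 1 "
        "ORDER BY port, cipher_suite_name",
        (scan_id,),
    ).fetchall()
```

Five stored rows collapse to two items to check: the rejected suite is filtered out and the suite seen under three TLS versions is counted once.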
### Validation Logic
Function `check_compliance_generic()` in `db/generic_compliance.py`:
1. Query unique items from scan tables using `DISTINCT`
2. Join with IANA and BSI reference tables
3. Check BSI approval first (higher priority)
4. Verify validity period if BSI-approved
5. Fall back to IANA recommendation if not BSI-approved
6. Assign severity: `info` (passed), `warning` (deprecated), `critical` (failed)
7. Store one result per unique item in `scan_compliance_status` table
**SSH Duplicate Prevention**: The SSH scanner (`ssh_scanner.py`) tracks unique encryption and MAC algorithms with `set()`, and only the `client_to_server` lists are populated and stored. The database writer (`scan_data_types.py`) does not concatenate the `client_to_server` and `server_to_client` lists, so each algorithm is stored once.
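A minimal sketch of set-based de-duplication follows, assuming the scanner receives algorithm names as plain lists. The function name `unique_algorithms` and the sample lists are illustrative and are not taken from `ssh_scanner.py`.

```python
def unique_algorithms(names: list[str]) -> list[str]:
    """Drop repeated names while keeping first-seen order."""
    seen: set[str] = set()
    unique = []
    for name in names:
        if name not in seen:
            seen.add(name)
            unique.append(name)
    return unique


# Illustrative input: both directions usually advertise the same algorithms.
client_to_server = ["aes256-gcm@openssh.com", "aes128-ctr", "aes256-gcm@openssh.com"]
server_to_client = ["aes256-gcm@openssh.com", "aes128-ctr"]

stored = unique_algorithms(client_to_server)
# Concatenating both directions without de-duplication would inflate the count.
naive_count = len(client_to_server + server_to_client)
```

Here `stored` holds two entries, while the naive concatenation would have produced five rows for the same two algorithms.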
### Certificate Validation
Function `check_certificate_compliance()` validates against BSI TR-02102-1:
1. Extract key type (RSA, ECDSA, DSA)
2. Query `bsi_tr_02102_1_key_requirements` for minimum key size
3. Verify key size and algorithm validity period
4. Validate signature hash algorithm
5. Store result in `scan_compliance_status` table
## Standards
### BSI TR-02102-1 - Certificates
| Algorithm | Min Bits | Valid Until |
| --------- | -------- | ----------------- |
| RSA | 3000 | - |
| ECDSA | 250 | - |
| DSA | 3072 | 2029 (deprecated) |
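The key-size part of certificate validation can be sketched with the values from the table above. This is an assumption-laden illustration, not `check_certificate_compliance()`: the real function reads its limits from `bsi_tr_02102_1_key_requirements` and also validates the signature hash, which is omitted here. Treating unknown key types as non-compliant is likewise an assumption.

```python
# Minimum key sizes and end years copied from the table above.
MIN_KEY_BITS = {"RSA": 3000, "ECDSA": 250, "DSA": 3072}
VALID_UNTIL = {"DSA": 2029}  # algorithms without an entry have no end year


def key_is_compliant(key_type: str, key_bits: int, year: int) -> bool:
    """Check key size and validity period for one certificate key."""
    algorithm = key_type.upper()
    minimum = MIN_KEY_BITS.get(algorithm)
    if minimum is None:
        return False  # assumption: unlisted key types fail the check
    if key_bits < minimum:
        return False
    last_year = VALID_UNTIL.get(algorithm)
    return last_year is None or year <= last_year
```

For example, a 2048-bit RSA key fails on size alone, while a 3072-bit DSA key passes through 2029 and fails afterwards.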
### BSI TR-02102-2 - TLS
Stored in `bsi_compliance_rules`:
- `category='cipher_suite'` - Approved TLS cipher suites
- `category='dh_group'` - Approved elliptic curves and DH groups
- `valid_until` - End year (NULL = no expiration)
### BSI TR-02102-4 - SSH
Stored in `bsi_compliance_rules`:
- `category='ssh_kex'` - Approved key exchange methods
- `category='ssh_encryption'` - Approved encryption algorithms
- `category='ssh_mac'` - Approved MAC algorithms
- `category='ssh_host_key'` - Approved host key types
### IANA Recommendations
Column `recommended` values:
- `Y` - Recommended
- `N` - Not recommended
- `D` - Deprecated
## Testing
### Test Structure
```
tests/
├── cli/ # CLI parsing
├── scanner/ # TLS/SSH scan functions
├── db/ # Database queries
├── compliance/ # BSI/IANA validation, duplicate detection
├── iana/ # IANA import/validation
├── reporter/ # CSV/MD/reST export
├── fixtures/ # Test data
└── conftest.py # Shared fixtures
```
**Compliance Tests**:
- `test_no_duplicates.py` - Verifies no duplicate compliance checks
- `test_compliance_with_realistic_data.py` - Realistic scan scenarios
- `test_plausible_compliance.py` - Plausibility checks
- `test_summary_ssh_duplicates.py` - Verifies SSH algorithms counted once (no duplicates)
### Run Tests
```bash
# All tests
poetry run pytest tests/ -v
# Specific category
poetry run pytest tests/scanner/ -v
# With coverage
poetry run pytest tests/ --cov=src/sslysze_scan
```
## Development
### Code Style
```bash
poetry run ruff check src/ tests/
poetry run ruff format src/ tests/
```
### Requirements
- Python 3.13+
- SSLyze 6.0.0+
- ssh-audit (external tool)
- Poetry
- Ruff
### Module Sizes
| File | Lines | Purpose |
| -------------------------- | ----- | ------------------------ |
| `scanner.py` | ~225 | TLS scanning logic |
| `ssh_scanner.py` | ~240 | SSH scanning logic |
| `db/writer.py` | ~172 | Core database operations |
| `db/tls_writer.py` | ~700 | TLS-specific write ops |
| `reporter/query.py` | ~850 | Database read operations |
| `db/compliance.py` | ~205 | Compliance checking |
| `db/compliance_config.py` | ~80 | Compliance configuration |
| `db/generic_compliance.py` | ~236 | Generic compliance logic |
## Extending
### Add Compliance Standard
1. Insert data into `bsi_compliance_rules` with new category
2. Add entry to `COMPLIANCE_CONFIGS` in `db/compliance_config.py`
3. Create test in `tests/compliance/`
4. Create database view for reporting if needed
### Add Report Format
1. Create `reporter/format_export.py` with `generate_format_report()` function
2. Use `fetch_scan_data()` for data retrieval
3. Register in `reporter/generate.py` dispatcher
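One way such a dispatcher could be organised is a registry keyed by format name. Everything in this sketch is hypothetical: the registry, the decorator, the `txt` format, and the shape of the `data` dictionary are assumptions, and the actual `reporter/generate.py` may work differently.

```python
from typing import Callable

# Hypothetical registry mapping a format name to a render function.
REPORT_RENDERERS: dict[str, Callable[[dict], str]] = {}


def register_format(name: str):
    """Decorator that adds a render function to the registry."""
    def decorator(func: Callable[[dict], str]) -> Callable[[dict], str]:
        REPORT_RENDERERS[name] = func
        return func
    return decorator


@register_format("txt")
def render_txt_report(data: dict) -> str:
    # Assumed keys; the real fetch_scan_data() result may be shaped differently.
    return f"Scan {data['scan_id']}: {data['hostname']}"


def generate_report(fmt: str, data: dict) -> str:
    """Dispatch to the renderer registered for `fmt`."""
    try:
        renderer = REPORT_RENDERERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported report format: {fmt!r}") from None
    return renderer(data)
```

With a registry like this, adding a format means writing one function and decorating it; the dispatcher itself stays unchanged.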
### Add Scanner Feature
1. Extend `scanner.py` or `ssh_scanner.py`
2. Update `db/writer.py` to persist new data
3. Create database table and view
4. Add compliance rules to `bsi_compliance_rules` if applicable
**SSH Parser Notes**:
- SSH host key bits are parsed from ssh-audit output using regex `(\d+)-?bit`
- SSH encryption/MAC algorithms use `set()` for duplicate detection
- Only `client_to_server` lists are populated (not both directions)
## Database Management
### Template Database
File: `src/sslysze_scan/data/crypto_standards.db`
Contains reference data (IANA, BSI), schema definitions, and views. Schema changes are applied directly to the template, and new scan databases are created by copying it.
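Creating a scan database from the template amounts to a file copy. The sketch below shows one possible shape, assuming that an existing target database is left untouched; `create_scan_db` is a hypothetical helper and the demo uses a throwaway stand-in for `crypto_standards.db`.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def create_scan_db(template: Path, target: Path) -> Path:
    """Copy the template to `target` unless a database already exists there."""
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(template, target)
    return target


# Demonstration with a temporary stand-in for the template database.
with tempfile.TemporaryDirectory() as tmp:
    template = Path(tmp) / "crypto_standards.db"
    conn = sqlite3.connect(template)
    conn.execute("CREATE TABLE scans (scan_id INTEGER PRIMARY KEY)")
    conn.commit()
    conn.close()

    scan_db = create_scan_db(template, Path(tmp) / "out" / "compliance_status.db")
    conn = sqlite3.connect(scan_db)
    copied_tables = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    conn.close()
```

Because schema and views live in the template, the copy starts with the full structure and reference data already in place.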
### CSV Headers
File: `data/csv_headers.json`
Defines CSV column headers for all compliance views. Headers are stored in `csv_export_metadata` table in the database.
**SSH Host Keys CSV**: Columns are Algorithm, Type, Bits, BSI Approved, BSI Valid Until, Compliant (no fingerprint column).
## Summary Statistics
Summary calculation in `reporter/query.py` via `_calculate_summary()`:
**TLS Metrics**:
- `ports_with_tls` - Ports with TLS/SSL support
- `total_cipher_suites` - Accepted cipher suites checked
- `compliant_cipher_suites` - Cipher suites passing BSI/IANA validation
- `total_groups` - Supported groups discovered
- `compliant_groups` - Groups passing BSI/IANA validation
**SSH Metrics**:
- `ports_with_ssh` - Ports with SSH support
- `total_ssh_kex` - KEX methods discovered (unique per port)
- `compliant_ssh_kex` - KEX methods passing BSI/IANA validation
- `total_ssh_encryption` - Encryption algorithms discovered (unique, no duplicates)
- `compliant_ssh_encryption` - Encryption algorithms passing validation
- `total_ssh_mac` - MAC algorithms discovered (unique, no duplicates)
- `compliant_ssh_mac` - MAC algorithms passing validation
- `total_ssh_host_keys` - Host keys discovered
- `compliant_ssh_host_keys` - Host keys passing validation
- `total_ssh_items` - Sum of all SSH items
- `ssh_overall_percentage` - Overall SSH compliance rate
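The two aggregate SSH metrics can be derived from the per-category counts. The sketch below assumes that `ssh_overall_percentage` is the share of compliant items among all SSH items; the function name, the `compliant_ssh_items` key, and the sample counts are illustrative and not taken from `_calculate_summary()`.

```python
def summarize_ssh(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Aggregate (total, compliant) pairs per SSH category."""
    total = sum(total for total, _ in counts.values())
    compliant = sum(ok for _, ok in counts.values())
    # Guard against division by zero when no SSH port was scanned.
    percentage = round(100 * compliant / total, 1) if total else 0.0
    return {
        "total_ssh_items": total,
        "compliant_ssh_items": compliant,  # hypothetical key
        "ssh_overall_percentage": percentage,
    }


# Illustrative counts: (discovered, compliant) per category.
summary = summarize_ssh(
    {"kex": (10, 6), "encryption": (6, 4), "mac": (6, 3), "host_keys": (3, 2)}
)
```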
**Summary CSV Output** (`csv_export.py`):
- Includes both TLS and SSH metrics
- Shows counts and percentages for each category
- Reports critical vulnerabilities count
## Query Optimization
**Modular Design** (`reporter/query.py`):
`fetch_scan_data()` delegates to 12 focused helper functions:
- `_fetch_tls_cipher_suites()` - TLS cipher suites with version detection
- `_fetch_tls_supported_groups()` - TLS elliptic curves and DH groups
- `_fetch_tls_certificates()` - Certificate chain with compliance
- `_fetch_vulnerabilities()` - Known vulnerabilities
- `_fetch_protocol_features()` - Protocol-level features
- `_fetch_session_features()` - Session resumption data
- `_fetch_http_headers()` - HTTP security headers
- `_fetch_compliance_summary()` - Per-port compliance stats
- `_fetch_ssh_kex_methods()` - SSH key exchange algorithms
- `_fetch_ssh_encryption()` - SSH encryption algorithms
- `_fetch_ssh_mac()` - SSH MAC algorithms
- `_fetch_ssh_host_keys()` - SSH host keys with compliance
Each helper function:
- Has a single responsibility
- Returns structured data (dict/list)
- Uses database views for compliance joins
- Is minimally coupled to the main function
**Benefits**:
- Main function reduced from 387 to ~35 lines
- Easy to test individual data fetchers
- Clear separation between TLS and SSH queries
- Consistent error handling per data type
## Writer Modularization
**Separation of Concerns** (`db/writer.py` → `db/writer.py` + `db/tls_writer.py`):
The original `writer.py` (929 lines) was split into:
**`db/writer.py`** (172 lines) - Core operations:
- `write_scan_results()` - Main entry point
- `_insert_scan_record()` - Scan metadata
- `_resolve_hostname()` - DNS resolution
- `_save_host_info()` - Host information
- `_save_ssh_scan_results()` - SSH wrapper (delegates to generic_writer)
**`db/tls_writer.py`** (700 lines) - TLS-specific operations:
- `save_cipher_suites()` - TLS cipher suite persistence
- `save_supported_groups()` - Elliptic curves and DH groups
- `save_dhe_groups_from_cipher_suites()` - DHE group extraction
- `save_certificates()` - Certificate chain storage
- `save_vulnerabilities()` - Heartbleed, ROBOT, CCS injection
- `save_protocol_features()` - Compression, early data, fallback SCSV
- `save_session_features()` - Renegotiation and resumption
- `save_http_headers()` - Security headers (HSTS, HPKP, Expect-CT)
- FFDHE helper functions (group name/IANA mapping)
**Integration**:
- `generic_writer.py` imports from `tls_writer` instead of `writer`
- Clean module boundaries: Core vs Protocol-specific
- TLS functions now reusable across modules
**Benefits**:
- 81% reduction in writer.py size (929 → 172 lines)
- Clear separation: Core logic vs TLS logic vs SSH logic
- Easier navigation and maintenance
- Independent TLS module can be tested/modified separately