perf-compare

import { Aside } from ‘@astrojs/starlight/components’;

perf-compare compares two sets of performance test results and determines whether a statistically significant regression has occurred. It integrates with perf-results-db to pull run data automatically.

Tier	Method	Price
Community	`--method simple` (percentage threshold)	Free
Professional	`--method statistical` (Mann-Whitney U + Cohen’s d)	£99/year

Install:

npm install -g @martkos-it/perf-compare
# or without global install:
npx @martkos-it/perf-compare --help

Simple Comparison (Free)

Compares two runs by percentage difference against a threshold:

perf-compare \
  --url http://localhost:4000 \
  --project your-project-uuid \
  --method simple \
  --threshold 0.10   # 10% degradation = regression

Output:

Comparison: simple (threshold: 10.0%)
─────────────────────────────────────
Metric         Baseline   Current    Delta    Status
p50_ms         142        149        +5.0%    OK
p95_ms         387        431        +11.4%   REGRESSION
p99_ms         612        598        -2.3%    OK
error_rate     0.1%       0.2%       +100%    REGRESSION
throughput     312        308        -1.3%    OK

Result: FAIL (2 regressions detected)

Exit codes: 0 pass, 1 regression, 2 tool error.

Statistical Comparison (Pro)

Uses Mann-Whitney U test (non-parametric) and Cohen’s d (effect size) for robust regression detection that accounts for variance in the data:

perf-compare \
  --url http://localhost:4000 \
  --project your-project-uuid \
  --method statistical \
  --baseline 10 \   # use last 10 runs as baseline
  --current 5 \     # compare against last 5 runs
  --alpha 0.05      # significance level (default: 0.10)

Output:

Comparison: statistical (α=0.05)
─────────────────────────────────────────────────────────────────
Metric      Baseline μ  Current μ  Δ%      p-value  Effect  Status
p50_ms      142.3       149.1      +4.8%   0.031    small   REGRESSION
p95_ms      387.4       430.8      +11.2%  0.004    medium  REGRESSION
p99_ms      611.7       597.2      -2.4%   0.412    —       OK
error_rate  0.10%       0.22%      +120%   0.001    large   REGRESSION

Result: FAIL (3 regressions at α=0.05)

The p-value tells you whether the difference is statistically significant. Cohen’s d tells you whether it’s practically significant (small < 0.5, medium < 0.8, large ≥ 0.8).

License Activation

export PERF_COMPARE_LICENSE_KEY=your-license-key
# or
perf-compare --license your-key --method statistical ...

License is validated against updates.martkos-it.co.uk with a 24-hour file cache and 7-day offline grace period.

Configuration

perf-ecosystem.yml (auto-discovered by walking up to .git):

services:
  perf_results_db:
    url: "http://localhost:4000"
    api_key: "${PERF_RESULTS_DB_API_KEY}"
    project_id: "${PERF_RESULTS_DB_PROJECT_ID}"

When present, --url and --project can be omitted.

Alternatively, environment variables:

export PERF_RESULTS_DB_API_KEY=prdb_your_key
export PERF_RESULTS_DB_URL=http://localhost:4000

JSON Output

perf-compare --method statistical --json

{
  "result": "fail",
  "method": "statistical",
  "alpha": 0.05,
  "regressions": [
    {
      "metric": "p95_ms",
      "baselineMean": 387.4,
      "currentMean": 430.8,
      "deltaPercent": 11.2,
      "pValue": 0.004,
      "effectSize": "medium"
    }
  ]
}

CI/CD Integration

GitHub Actions

- name: Check for regressions
  env:
    PERF_RESULTS_DB_API_KEY: ${{ secrets.PERF_RESULTS_DB_API_KEY }}
    PERF_COMPARE_LICENSE_KEY: ${{ secrets.PERF_COMPARE_LICENSE_KEY }}
    PERF_COMPARE_CONFIG_DIR: /tmp/perf-compare-cache
  run: |
    npx @martkos-it/perf-compare \
      --url ${{ vars.PERF_RESULTS_DB_URL }} \
      --project ${{ vars.PERF_RESULTS_DB_PROJECT_ID }} \
      --method statistical \
      --baseline 10 --current 3

The PERF_COMPARE_CONFIG_DIR env var sets the license cache directory (useful in ephemeral CI environments).

GitLab CI

regression-check:
  script:
    - npx @martkos-it/perf-compare
        --url $PERF_RESULTS_DB_URL
        --project $PERF_RESULTS_DB_PROJECT_ID
        --method statistical
        --baseline 10 --current 3 --alpha 0.05

All Options

Flag	Default	Description
`--url <url>`	from config	perf-results-db base URL
`--project <uuid>`	from config	Project UUID
`--method <simple\|statistical>`	required	Comparison method
`--threshold <float>`	`0.1`	Simple: max % degradation
`--baseline <n>`	`10`	Statistical: number of baseline runs
`--current <n>`	`5`	Statistical: number of current runs
`--alpha <float>`	`0.10`	Statistical: significance level
`--json` / `-j`	false	JSON output
`--license <key>`	env var	License key override