Missing policy
Core idea
In credit risk, missing values can carry their own signal. A missing field may
reflect data source, channel, operational policy, or a change in capture. For
that reason, silently applying fillna(...) before binning can hide a relevant
decision.
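
As a rough illustration of what a silent fill can hide, the sketch below uses only pandas and the toy score and target values from the pandas example on this page. The median fill is an assumption chosen for illustration; it is not something RiskBands does.

```python
import numpy as np
import pandas as pd

# Toy data: every row with a missing score is an event.
df = pd.DataFrame(
    {
        "score": [410.0, 450.0, np.nan, 620.0, 710.0] * 6,
        "target": [0, 0, 1, 1, 1] * 6,
    }
)

is_missing = df["score"].isna()

# Event rate of the missing group versus the observed group.
missing_event_rate = df.loc[is_missing, "target"].mean()
observed_event_rate = df.loc[~is_missing, "target"].mean()

# Illustrative silent fill: the missing rows now look like ordinary scores.
filled = df["score"].fillna(df["score"].median())
missing_after_fill = int(filled.isna().sum())

print(missing_event_rate, observed_event_rate, missing_after_fill)
```

After the fill, the missing rows sit inside an ordinary score range, so their distinct event rate is no longer visible as a group.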
missing_policy makes that decision explicit:

    from riskbands import RiskBands

    binner = RiskBands(missing_policy="separate_bin")

Policies

| Policy | What it does | When to use |
|---|---|---|
| standard | Preserves the current compatible behavior. | Reproducing existing flows and compatibility. |
| separate_bin | Creates an explicit Missing bin for missing values in selected features. | Auditable analysis of missing values as their own group. |
| forbid | Fails during fit or transform when missing values are found. | Governance that requires upstream treatment before binning. |
| merge | Learns a regular destination bin for the missing group during fit. | When missing should remain audited but be routed to the closest bin. |

legacy may appear in old metadata for compatibility, but it is not recommended
for new configurations. The current canonical name is standard.
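
The upstream treatment that forbid expects can be checked before data ever reaches the binner. The sketch below is a hypothetical helper written with plain pandas; the function name and the error message are assumptions, and it is not how RiskBands implements forbid internally.

```python
import numpy as np
import pandas as pd


def require_no_missing(df: pd.DataFrame, columns: list) -> None:
    """Hypothetical upstream check: raise if any selected column has missing values."""
    counts = df[columns].isna().sum()
    offenders = {col: int(n) for col, n in counts.items() if n > 0}
    if offenders:
        raise ValueError(f"Missing values found before binning: {offenders}")


df = pd.DataFrame(
    {
        "score": [410.0, 450.0, np.nan, 620.0, 710.0] * 6,
        "rating": ["A", "B", None, "C", "D"] * 6,
    }
)

try:
    require_no_missing(df, ["score", "rating"])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

A check like this keeps the failure close to the data source, while forbid remains the last line of defense inside fit and transform.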
Auditable merge
missing_policy="merge" is opt-in and requires an explicit criterion:

    binner = RiskBands(
        missing_policy="merge",
        missing_merge_criterion="nearest_event_rate",
        missing_merge_fallback="separate_bin",
    )

    binner = RiskBands(
        missing_policy="merge",
        missing_merge_criterion="nearest_woe",
        missing_merge_fallback="raise",
    )

nearest_event_rate selects the regular bin with the smallest absolute event-rate
distance from the missing group during fit. nearest_woe uses the smallest
absolute WoE distance from the same fit profile.
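
The two criteria above can be worked through by hand. The sketch below uses made-up fit-time counts and plain Python. The bin names, the counts, and the WoE convention (log of non-event share over event share, with totals that include the missing group) are all assumptions for illustration, and RiskBands may compute its fit profile differently.

```python
import math

# Hypothetical fit-time counts per regular bin: (rows, events).
regular_bins = {"A": (200, 10), "B": (300, 45), "C": (250, 75)}
missing_group = (50, 10)


def event_rate(rows, events):
    return events / rows


def woe(rows, events, total_events, total_non_events):
    # Assumed convention: ln(share of non-events / share of events).
    # Zero counts would need smoothing; the toy counts avoid that case.
    non_events = rows - events
    return math.log((non_events / total_non_events) / (events / total_events))


all_groups = list(regular_bins.values()) + [missing_group]
total_events = sum(events for _, events in all_groups)
total_non_events = sum(rows - events for rows, events in all_groups)

missing_rate = event_rate(*missing_group)
missing_woe = woe(*missing_group, total_events, total_non_events)

# Absolute distance from the missing group to each regular bin.
rate_distance = {
    name: abs(event_rate(*counts) - missing_rate)
    for name, counts in regular_bins.items()
}
woe_distance = {
    name: abs(woe(*counts, total_events, total_non_events) - missing_woe)
    for name, counts in regular_bins.items()
}

nearest_by_rate = min(rate_distance, key=rate_distance.get)
nearest_by_woe = min(woe_distance, key=woe_distance.get)

print(nearest_by_rate, nearest_by_woe)
```

With these counts both criteria pick bin B. Because event-rate distance is measured on the probability scale and WoE distance on the log-odds scale, the two can disagree on other data.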
Merge is not opaque imputation. The decision is stored in
missing_decision_log_, candidates are stored in missing_merge_candidates_,
and the learned routing map is stored in missing_merge_map_. transform(...)
uses only the fit-time decision; it does not learn a new rule from application
data.
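
The fit-time-only behavior can be pictured as a frozen lookup. The sketch below is a hypothetical illustration in plain pandas: the bin labels, the "Missing" label, the helper name, and the shape of the map are assumptions, not the actual structure of missing_merge_map_.

```python
import pandas as pd

# Hypothetical fit-time artifact: feature -> destination bin for missing values.
learned_merge_map = {"score": "(450.0, 620.0]"}


def apply_missing_route(binned, feature, merge_map, missing_label="Missing"):
    """Route the missing label to the bin chosen at fit time; learn nothing new."""
    if feature not in merge_map:
        return binned
    return binned.where(binned != missing_label, merge_map[feature])


# Application data: no target is needed, because nothing is re-estimated.
application_bins = pd.Series(["(-inf, 450.0]", "Missing", "(620.0, inf]", "Missing"])
routed = apply_missing_route(application_bins, "score", learned_merge_map)

print(routed.tolist())
```

Because the map is fixed after fit, the same missing value is routed to the same bin in every scoring run, which is what makes the decision reviewable.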
Compare merge against separate_bin before accepting the decision. The
missing_policy_comparison_demo.py
script builds a table with IV, number of bins, missing event rate, action,
selected bin, distance, candidates, fallback, and a simple period metric. The
credit_risk_missing_merge_demo.py
script shows the same flow on synthetic credit data with bureau_score,
income, internal_rating, channel, product, vintage, and target.
pandas example
The complete script is available at
examples/missing_policy/missing_policy_pandas_demo.py.

    import numpy as np
    import pandas as pd

    from riskbands import RiskBands

    df = pd.DataFrame(
        {
            "score": [410.0, 450.0, np.nan, 620.0, 710.0] * 6,
            "rating": ["A", "B", None, "C", "D"] * 6,
            "target": [0, 0, 1, 1, 1] * 6,
        }
    )

    binner = RiskBands(
        max_bins=4,
        min_event_rate_diff=0.0,
        force_categorical=["rating"],
        missing_policy="separate_bin",
    )

    binner.fit(df, y="target", columns=["score", "rating"], validate=True)
    df_binned = binner.transform(df[["score", "rating"]], validate=True)

    print(df_binned.head())
    print(binner.missing_profile_)
    print(binner.missing_decision_log_)

Run locally:

    python examples/missing_policy/missing_policy_pandas_demo.py

PySpark example
PySpark is an optional extra. The base installation does not install Spark.

    pip install "riskbands[spark]"

The complete script is available at
examples/missing_policy/missing_policy_pyspark_demo.py.
It uses:
- a small local SparkSession with local[2];
- spark.sql.shuffle.partitions=2;
- a small synthetic dataset;
- missing_policy="separate_bin";
- transform(validate=True);
- missing_policy="forbid" producing a clear error;
- an explicit boundary for missing_policy="merge" in PySpark;
- no UDF.

    python examples/missing_policy/missing_policy_pyspark_demo.py

If PySpark is not installed, the example prints how to install the extra and exits without making Spark a base dependency.
What to inspect
After fit(...), look at:
- missing_policy_
- effective_missing_policy_
- missing_profile_
- missing_decision_log_
- fit_profile_, reference_profile_, and application_profile_ when validation is enabled.

missing_profile_ shows volume, share, events, event rate, backend, context,
and whether the row represents a missing bin. missing_decision_log_ records
the action taken per variable.
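
To make the profile fields concrete, the sketch below computes volume, share, events, and event rate of the missing group per feature with plain pandas. The helper name and column names are hypothetical, and the real missing_profile_ also carries backend and context fields that are not reproduced here.

```python
import numpy as np
import pandas as pd


def sketch_missing_profile(df, target, columns):
    """Hypothetical helper: summarize the missing group of each feature."""
    rows = []
    for col in columns:
        mask = df[col].isna()
        n_missing = int(mask.sum())
        events = int(df.loc[mask, target].sum())
        rows.append(
            {
                "feature": col,
                "missing_count": n_missing,
                "missing_share": n_missing / len(df),
                "missing_events": events,
                # Undefined when the feature has no missing rows.
                "missing_event_rate": events / n_missing if n_missing else float("nan"),
            }
        )
    return pd.DataFrame(rows)


df = pd.DataFrame(
    {
        "score": [410.0, 450.0, np.nan, 620.0, 710.0] * 6,
        "rating": ["A", "B", None, "C", "D"] * 6,
        "target": [0, 0, 1, 1, 1] * 6,
    }
)

profile = sketch_missing_profile(df, "target", ["score", "rating"])
print(profile)
```

Comparing a hand-built summary like this against missing_profile_ is a quick sanity check that the selected features and the target were passed as intended.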
Bundle and reporting
export_bundle(...) persists the missing-values trail:
- missing_policy
- effective_missing_policy
- missing_profile
- missing_decision_log
- missing_merge_criterion
- missing_merge_fallback
- missing_merge_candidates
- missing_merge_map

    from riskbands.reporting import load_bundle

    binner.export_bundle("riskbands_bundle")
    bundle = load_bundle("riskbands_bundle")

    print(bundle["missing_policy"])
    print(bundle["missing_profile"])

Old bundles without these fields continue to load as standard.
What is not implemented
This page documents the current contract. The following are not yet available:
- temporal_stable as a merge criterion;
- monotonic_neighbor as a merge criterion;
- merge criteria beyond nearest_event_rate and nearest_woe;
- complete PySpark merge support;
- intelligent imputation inside RiskBands;
- fully distributed statistical fitting in Spark.

RiskBands helps make the decision defensible and auditable, but it does not automatically guarantee regulatory compliance.