How are samples drawn from conformal prediction model / quantiles? #2830

@JuanCruzC97

Hi, I'm using regression models together with a ConformalNaive model to generate probabilistic forecasts. I'm experimenting with two approaches:

  1. Generating historical quantile forecasts with predict_likelihood_parameters=True.
  2. Generating historical forecasts as sampled distributions with num_samples=1000.

However, I'm confused about how the samples are being generated. For the same prediction date, I get the following quantiles:

q0.05 = 359.11175744  
q0.25 = 380.32542817  
q0.50 = 407.34136384  
q0.75 = 434.35729951  
q0.95 = 455.57097025

But when I inspect the sampled distribution, I get the following stats:

count    1000.000000  
mean      409.045300  
std        30.632939  
min       359.111757  
25%       382.522803  
50%       409.312385  
75%       436.266100  
max       455.570970

The 0.05 and 0.95 quantiles match the minimum and maximum of the sampled values, and their frequencies are unusually high:

455.570970    59  
359.111757    43  
(other values) ~1 each

This leads to a histogram with spikes at the ends of the distribution.

[Image: histogram of the sampled values, with spikes at both ends of the distribution]
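
As a quick arithmetic check on the frequencies reported above, the sketch below converts the two tie counts into shares of the 1000 draws. The helper name `endpoint_shares` is made up for illustration; only the counts (43, 59, 1000) come from the output above.

```python
def endpoint_shares(n_total, n_at_min, n_at_max):
    """Return (share at the minimum, share at the maximum, share strictly inside)."""
    share_min = n_at_min / n_total
    share_max = n_at_max / n_total
    share_inside = 1.0 - share_min - share_max
    return share_min, share_max, share_inside


# Counts taken from the value frequencies listed above.
share_min, share_max, share_inside = endpoint_shares(1000, 43, 59)
print(f"at min: {share_min:.1%}, at max: {share_max:.1%}, inside: {share_inside:.1%}")
```

Each endpoint holds roughly 5% of the draws and about 90% fall strictly between them, which suggests the tail mass below q0.05 and above q0.95 is being placed on the extreme quantiles rather than spread beyond them.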

My questions:

  • How are the samples generated from the quantiles?
  • Is it expected that the extreme quantiles are overrepresented like this in the sample?
  • Shouldn't the samples more closely reflect a smooth distribution? I would expect the interval between the 0.05 and 0.95 quantiles to cover about 90% of the sampled distribution, but here it always covers 100%.
  • Am I doing something wrong here?
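
One mechanism that would produce exactly this pattern is inverse-CDF sampling with linear interpolation between the given quantiles, clamped at the lowest and highest quantile. This is only a guess about what happens internally, not something confirmed from the darts source. The sketch below uses plain NumPy and the quantile values listed above; the seed and variable names are arbitrary.

```python
import numpy as np

# Quantile levels and values copied from the forecast above.
q_levels = np.array([0.05, 0.25, 0.50, 0.75, 0.95])
q_values = np.array(
    [359.11175744, 380.32542817, 407.34136384, 434.35729951, 455.57097025]
)

# ASSUMPTION: samples come from uniform draws mapped through a piecewise-linear
# quantile function. np.interp returns the first/last value for inputs outside
# [0.05, 0.95], so those draws collapse onto the extreme quantiles.
rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=1000)
samples = np.interp(u, q_levels, q_values)

n_at_min = int(np.sum(samples == q_values[0]))
n_at_max = int(np.sum(samples == q_values[-1]))
coverage = float(np.mean((samples >= q_values[0]) & (samples <= q_values[-1])))

print("min/max of samples:", samples.min(), samples.max())
print("draws at q0.05:", n_at_min, "| draws at q0.95:", n_at_max)
print("share of samples inside [q0.05, q0.95]:", coverage)
```

Under this assumption, about 5% of the draws land on each extreme quantile, and the sample minimum and maximum equal q0.05 and q0.95, so the interval covers 100% of the samples by construction. That matches what I observe, but I would appreciate confirmation of whether this is the actual behavior.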

Reproducible Example

import pandas as pd

from darts import concatenate
from darts.datasets import AirPassengersDataset
from darts.models import ConformalNaiveModel, LinearRegressionModel

series = AirPassengersDataset().load()

train_start = pd.Timestamp("1949-01-01")
cal_start = pd.Timestamp("1957-01-01")
test_start = pd.Timestamp("1959-01-01")
test_end = pd.Timestamp("1960-12-01")

train = series[train_start : cal_start - series.freq]
cal = series[cal_start : test_start - series.freq]
test = series[test_start:test_end]
cal_test = concatenate([cal, test])

multi_horizon = 3
quantiles = [0.05, 0.25, 0.50, 0.75, 0.95]
input_length = 10

model = LinearRegressionModel(
    lags=input_length, 
    output_chunk_length=multi_horizon, 
    use_static_covariates=False
)

model.fit(train)

cp_model = ConformalNaiveModel(
    model=model, 
    quantiles=quantiles
)

# Approach 2: sampled forecasts (1000 samples per time step)
conformal_samples = cp_model.historical_forecasts(
    series=cal_test,
    start=test_start,
    forecast_horizon=multi_horizon,
    retrain=False,
    num_samples=1000,
    predict_likelihood_parameters=False,
    last_points_only=True,
)

# Approach 1: quantile forecasts returned directly as likelihood parameters
conformal_quantiles = cp_model.historical_forecasts(
    series=cal_test,
    start=test_start,
    forecast_horizon=multi_horizon,
    retrain=False,
    num_samples=1,
    predict_likelihood_parameters=True,
    last_points_only=True,
)

Thanks!
