Description
Hi, I'm using regression models together with a ConformalNaiveModel to generate probabilistic forecasts. I'm experimenting with two approaches:
- Generating historical quantile forecasts using predict_likelihood_parameters=True.
- Generating historical forecasts by drawing a sampled distribution with num_samples=1000.
However, I'm confused about how the samples are being generated. For the same prediction date, I get the following quantiles:
q0.05 = 359.11175744
q0.25 = 380.32542817
q0.50 = 407.34136384
q0.75 = 434.35729951
q0.95 = 455.57097025
But when I inspect the sampled distribution, I get the following stats:
count 1000.000000
mean 409.045300
std 30.632939
min 359.111757
25% 382.522803
50% 409.312385
75% 436.266100
max 455.570970
The 0.05 and 0.95 quantiles match the minimum and maximum of the sampled values, and their frequencies are unusually high:
455.570970 59
359.111757 43
(other values) ~1 each
This leads to a histogram with spikes at the ends of the distribution.
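The spikes can be put into numbers by measuring what share of the samples sits exactly on the lowest and highest values versus strictly between them. The sketch below is only an illustration: `endpoint_shares` is a hypothetical helper, and `toy_samples` is a made-up list rather than the forecast samples above.

```python
import math

def endpoint_shares(samples, lo, hi):
    """Return the share of samples at lo, at hi, and strictly between them."""
    n = len(samples)
    at_lo = sum(math.isclose(s, lo) for s in samples)
    at_hi = sum(math.isclose(s, hi) for s in samples)
    inside = sum(
        lo < s < hi and not math.isclose(s, lo) and not math.isclose(s, hi)
        for s in samples
    )
    return at_lo / n, at_hi / n, inside / n

# Toy data (hypothetical), chosen so the shares are easy to check by hand.
toy_samples = [1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 5.0]
shares = endpoint_shares(toy_samples, lo=1.0, hi=5.0)
print(shares)  # (0.25, 0.375, 0.375)
```

Applied to the real samples with lo and hi set to the 0.05 and 0.95 quantile values, the counts reported above (43 and 59 out of 1000) would correspond to shares of 0.043 and 0.059.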
My questions:
- How are the samples generated from the quantiles?
- Is it expected that the extreme quantiles are overrepresented like this in the sample?
- Shouldn’t the samples more closely reflect a smooth distribution? I expect the 0.05 and 0.95 quantiles to cover 90% of the sampled distribution, but with these samples I always get 100% coverage.
- Am I doing something wrong here?
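One mechanism that would reproduce this pattern is sampling by linear interpolation between the predicted quantiles, with any draw outside the [0.05, 0.95] range clamped to the nearest quantile value. The sketch below is an assumption for illustration, not confirmed against the darts source; `interp_clamped` and `sample_by_interpolation` are hypothetical names, and only the quantile levels and values come from the output above.

```python
import random
from collections import Counter

# Quantile levels and values copied from the forecast shown above.
levels = [0.05, 0.25, 0.50, 0.75, 0.95]
values = [359.11175744, 380.32542817, 407.34136384, 434.35729951, 455.57097025]

def interp_clamped(u, levels, values):
    """Interpolate linearly at probability u; clamp outside the known levels."""
    if u <= levels[0]:
        return values[0]
    if u >= levels[-1]:
        return values[-1]
    for i in range(1, len(levels)):
        if u <= levels[i]:
            w = (u - levels[i - 1]) / (levels[i] - levels[i - 1])
            return values[i - 1] + w * (values[i] - values[i - 1])
    return values[-1]

def sample_by_interpolation(levels, values, num_samples, seed=0):
    """Draw uniform probabilities and map each one through interp_clamped."""
    rng = random.Random(seed)
    return [interp_clamped(rng.random(), levels, values) for _ in range(num_samples)]

samples = sample_by_interpolation(levels, values, num_samples=1000)
counts = Counter(samples)

print("min/max:", min(samples), max(samples))
print("count at lowest quantile:", counts[values[0]])
print("count at highest quantile:", counts[values[-1]])
```

Under this assumption, roughly 5% of the draws land exactly on each outer quantile, so the minimum and maximum of the samples coincide with q0.05 and q0.95 and the inclusive [q0.05, q0.95] interval covers every sample. That matches the counts of 43 and 59 out of 1000 reported above, but whether darts actually samples this way is the open question.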
Reproducible Example
import pandas as pd

from darts import concatenate, metrics, TimeSeries
from darts.datasets import AirPassengersDataset
from darts.models import ConformalNaiveModel, LinearRegressionModel

series = AirPassengersDataset().load()

# Boundaries of the train / calibration / test splits
train_start = pd.Timestamp("1949-01-01")
cal_start = pd.Timestamp("1957-01-01")
test_start = pd.Timestamp("1959-01-01")
test_end = pd.Timestamp("1960-12-01")

train = series[train_start : cal_start - series.freq]
cal = series[cal_start : test_start - series.freq]
test = series[test_start:test_end]
cal_test = concatenate([cal, test])

multi_horizon = 3
quantiles = [0.05, 0.25, 0.50, 0.75, 0.95]
input_length = 10

# Base forecaster, fitted on the training split only
model = LinearRegressionModel(
    lags=input_length,
    output_chunk_length=multi_horizon,
    use_static_covariates=False,
)
model.fit(train)

# Conformal wrapper around the pre-trained forecaster
cp_model = ConformalNaiveModel(
    model=model,
    quantiles=quantiles,
)

# Approach 2: sampled forecasts (1000 samples per time step)
comformal_samples = cp_model.historical_forecasts(
    series=cal_test,
    start=test_start,
    forecast_horizon=multi_horizon,
    retrain=False,
    num_samples=1000,
    predict_likelihood_parameters=False,
    last_points_only=True,
)

# Approach 1: direct quantile forecasts
comformal_quantiles = cp_model.historical_forecasts(
    series=cal_test,
    start=test_start,
    forecast_horizon=multi_horizon,
    retrain=False,
    num_samples=1,
    predict_likelihood_parameters=True,
    last_points_only=True,
)
Thanks!