Forecasting

A PassengerSim forecast is not just an attempt to predict the future – it is a very specific format for a set of predictions of some very specific things. The default implementation in PassengerSim is a ForecastGroup, but it can be replaced by any other code that exposes the same API. The forecast is generated for one or more booking classes (hence, “group”) on a common simulation element. This simulation element can be a Leg representing a particular flight, which also has a collection of Bucket objects corresponding to various booking classes. Alternatively, the simulation element can be a Path representing a sequence of one or more flights sold to customers as a non-stop itinerary (for a single leg path) or a connecting itinerary (for multi-leg paths), which also has a collection of PathClass objects corresponding to various booking classes.

As an example, let’s consider a single Leg from an active simulation.

leg
<passengersim.core.Leg 42: ZA:42 LGA-ORD>

This leg has 6 booking classes, each associated with a Bucket.

list(leg.buckets)
[<passengersim.core.Bucket: Y0 in Y>,
 <passengersim.core.Bucket: Y1 in Y>,
 <passengersim.core.Bucket: Y2 in Y>,
 <passengersim.core.Bucket: Y3 in Y>,
 <passengersim.core.Bucket: Y4 in Y>,
 <passengersim.core.Bucket: Y5 in Y>]

We can index into the buckets via regular positional indexing,

leg.buckets[0]
<passengersim.core.Bucket: Y0 in Y>

Or by selecting for the specific booking class we want.

leg.buckets.select(booking_class="Y4")
<passengersim.core.Bucket: Y4 in Y>

Histories

Each bucket has a history, which contains the sales and closure data associated with the most recent 26 sample days. (1)

  1. The length of the stored history is configurable, but 26 is the default

Each History is updated dynamically as the simulation runs. At the very beginning of a simulation trial, there won’t be any data in the history, which is part of why we include a “burn” period. For the first samples, the history gets populated by adding new rows for each sample, up until the history is “full” with 26 sample days. Thereafter, for every new sample the oldest record in the history is discarded to make room for the new one.
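The fill-then-roll behaviour described above can be illustrated with a fixed-length queue. This is only a sketch using the Python standard library; the row layout and the names `HISTORY_LENGTH` and `history` are hypothetical stand-ins, not PassengerSim's internal storage.

```python
from collections import deque

HISTORY_LENGTH = 26  # default number of sample days kept, per the text above

# Hypothetical stand-in for a History: each entry is one sample day's row.
history = deque(maxlen=HISTORY_LENGTH)

for sample in range(40):
    row = {"sample": sample, "sold": [0.0] * 16}  # placeholder per-timeframe data
    history.append(row)  # once the deque is full, the oldest row is dropped

oldest = history[0]["sample"]
newest = history[-1]["sample"]
print(len(history), oldest, newest)
```

After 40 samples, only the most recent 26 remain, which is why the early "burn" samples never reach a forecast once the history is full.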

leg.buckets.select(booking_class="Y4").history
<passengersim.core.History at 0x7fa3c4e18, n_dep=26 n_tf=16 len=26>

The data in each history can be retrieved using the History.as_arrays method. The leg we are inspecting is in a state as if it were within a simulation, so it already has some history defined.

h_data = leg.buckets.select(booking_class="Y4").history.as_arrays()

print(h_data.keys())
print("sold:", h_data["sold"].shape, "\n", h_data["sold"][:3], "\n  ...")
print("sold_priceable:", h_data["sold_priceable"].shape, "\n", h_data["sold_priceable"][:3], "\n  ...")
print("closed_flags:", h_data["closed_flags"].shape, "\n", h_data["closed_flags"][:3], "\n  ...")
dict_keys(['sold', 'sold_priceable', 'closed_flags'])
sold: (26, 16) 
 [[ 8.  6.  6.  3.  2.  0.  4.  2.  4.  4.  0.  0.  0.  0.  0.  0.]
 [ 9.  0.  8.  4.  3.  3.  6.  3.  3.  3.  0.  0.  0.  0.  0.  0.]
 [11.  1.  4.  4.  5.  0.  3.  2.  4.  1.  0.  0.  0.  0.  0.  0.]] 
  ...
sold_priceable: (26, 16) 
 [[0. 0. 0. 0. 0. 0. 0. 0. 4. 4. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 3. 3. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 4. 1. 0. 0. 0. 0. 0. 0.]] 
  ...
closed_flags: (26, 16) 
 [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 2 3 1 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]] 
  ...

We can see the history’s stored data consists of several arrays. Each array is the same size, with rows for historical sample days and columns for timeframes. The arrays available include:

  • sold, which contains the total number of bookings by sample day by timeframe,

  • sold_priceable, which identifies how many of those bookings were made at moments when the booking class was the lowest price option available at that moment, and

  • closed_flags, which indicates whether the booking class was not available for sale at the beginning of the timeframe (1), at the end of the timeframe (2), or both (3).
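The closed_flags values can be read as two bits combined, since 3 = 1 + 2. The helper below is a small sketch of that reading; the function name and the constants are hypothetical and not part of PassengerSim. It uses the second row of the closed_flags output shown above.

```python
CLOSED_AT_START = 1  # class unavailable at the beginning of the timeframe
CLOSED_AT_END = 2    # class unavailable at the end of the timeframe


def describe_closure(flag):
    """Translate one closed_flags value into a readable label."""
    at_start = bool(flag & CLOSED_AT_START)
    at_end = bool(flag & CLOSED_AT_END)
    if at_start and at_end:
        return "closed at start and end"
    if at_start:
        return "closed at start"
    if at_end:
        return "closed at end"
    return "open"


# Second sample day from the closed_flags output above.
flags = [0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 1, 0, 0, 0, 0]

# Keep only the timeframes where sales may have been censored.
affected = {tf: describe_closure(f) for tf, f in enumerate(flags) if f}
print(affected)
```

Any timeframe with a non-zero flag is a candidate for detruncation, because observed sales there may understate demand.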

When served as “raw” data like this, all of the values in sold and sold_priceable are integers, but they are stored as floating point values to facilitate compatibility with downstream algorithms, such as detruncation, which may create or use fractional values in an otherwise identically structured array.

For simple forecasting approaches, it is possible to work with each booking class as an independent entity without any linkages to demand levels in the other booking classes. However, many modern approaches relax this independence assumption in one way or another, so the primary forecasting interface in PassengerSim is built one level of aggregation up, at the Leg or Path level.

Each Leg or Path has a forecast accessor tied directly to the object. Each can also have one or more separate ForecastGroup objects associated with it, which share linkages to the same history but can provide alternative forecasts. In most situations there will be only one active forecast for each, unless the user is intentionally studying, comparing, or mixing different forecasts. Various RM optimization algorithms require either a leg-based or a path-based forecast to operate, although in theory it is possible to work with both, or with other variations.

Note

The Leg.forecast and Path.forecast are technically implemented as ForecastAccessors that link directly to the forecast(s) stored within that leg or path, but provide a nearly identical API to the ForecastGroup. Most of the time, users can interact with the forecast through the accessor without needing to know about the underlying structure of the forecast group. The accessor is designed to be a convenient interface for users to access the forecasts and their outputs, while still allowing for flexibility in how the forecasts are structured and implemented under the hood.

leg.forecast.get_advpurch_max_tf_indexes()
{'Y0': 999, 'Y1': 999, 'Y2': 14, 'Y3': 12, 'Y4': 10, 'Y5': 8}

ForecastGroup

A ForecastGroup represents the forecasting interface for PassengerSim. It is populated with some input data, the foundation of which is the set of History objects stored on each Bucket or PathClass.

leg.forecast.get_history()
{'Y0': <passengersim.core.History at 0x7fa3c4618, n_dep=26 n_tf=16 len=26>,
 'Y1': <passengersim.core.History at 0x7fa3c4818, n_dep=26 n_tf=16 len=26>,
 'Y2': <passengersim.core.History at 0x7fa3c4a18, n_dep=26 n_tf=16 len=26>,
 'Y3': <passengersim.core.History at 0x7fa3c4c18, n_dep=26 n_tf=16 len=26>,
 'Y4': <passengersim.core.History at 0x7fa3c4e18, n_dep=26 n_tf=16 len=26>,
 'Y5': <passengersim.core.History at 0x7fa3c5018, n_dep=26 n_tf=16 len=26>}

The leg.forecast is the primary forecast bound to the leg, but we can also create additional separate ForecastGroup instances for other forecasts for the same leg:

f = leg.new_forecast()
f
<passengersim.core.ForecastGroup at 0x117edc120>

The additional forecast shares the same input histories as the primary forecast, as you can observe the histories point to the exact same memory addresses as those above.

f.get_history()
{'Y0': <passengersim.core.History at 0x7fa3c4618, n_dep=26 n_tf=16 len=26>,
 'Y1': <passengersim.core.History at 0x7fa3c4818, n_dep=26 n_tf=16 len=26>,
 'Y2': <passengersim.core.History at 0x7fa3c4a18, n_dep=26 n_tf=16 len=26>,
 'Y3': <passengersim.core.History at 0x7fa3c4c18, n_dep=26 n_tf=16 len=26>,
 'Y4': <passengersim.core.History at 0x7fa3c4e18, n_dep=26 n_tf=16 len=26>,
 'Y5': <passengersim.core.History at 0x7fa3c5018, n_dep=26 n_tf=16 len=26>}

Additional Inputs By Booking Class

In addition to the History for each booking class, a selection of other relevant attributes of each booking class are also available as inputs to forecast algorithms:

  • advpurch_max_tf_index, the maximum timeframe position where each booking class remains available for customers to purchase. After this timeframe, the booking class is closed to all further purchases by an advance purchase restriction. For booking classes with no advance purchase restriction, this should be set to a sufficiently large value (by default, 9999; the example leg shown here uses 999).

leg.forecast.get_advpurch_max_tf_indexes()
{'Y0': 999, 'Y1': 999, 'Y2': 14, 'Y3': 12, 'Y4': 10, 'Y5': 8}
  • customer_price, the price that a customer pays to book this booking class. This input may not be needed for all forecasting algorithms.

leg.forecast.get_customer_prices()
{'Y0': 300.0, 'Y1': 240.0, 'Y2': 190.0, 'Y3': 140.0, 'Y4': 115.0, 'Y5': 100.0}
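To see how an advance purchase limit maps onto timeframes, here is a minimal sketch. It assumes that a class is open in every timeframe whose zero-based index is below its advpurch_max_tf_index, which appears consistent with the forecast output later on this page (for example, Y2 has non-zero demand in its first 14 timeframes only). The helper `open_timeframes` is hypothetical and not a PassengerSim function.

```python
N_TIMEFRAMES = 16

# Values copied from the get_advpurch_max_tf_indexes() output above.
advpurch_max_tf_index = {"Y0": 999, "Y1": 999, "Y2": 14, "Y3": 12, "Y4": 10, "Y5": 8}


def open_timeframes(max_tf_index, n_timeframes=N_TIMEFRAMES):
    """Return one boolean per timeframe: True where the class can still be sold.

    Assumption: zero-based timeframe indexes below max_tf_index are open.
    """
    return [tf < max_tf_index for tf in range(n_timeframes)]


open_counts = {
    booking_class: sum(open_timeframes(limit))
    for booking_class, limit in advpurch_max_tf_index.items()
}
print(open_counts)
```

Classes with a large sentinel value, such as 999 here, are simply open in every timeframe.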

Additional Inputs in Aggregate

In addition to the mapping of data by booking class described above, there is also some input data that is generic across all booking classes:

  • dcp_days_prior, a strictly monotonically decreasing vector of int, giving the number of days prior to departure at each DCP. This vector should not include 0 as the final value, as there is no forecasting at the moment of departure. Since the 0 is not included, the size of this vector will be exactly equal to the total number of timeframes. This vector can be used as the index for the timeframe dimension of the history input arrays, as well as the output vectors of the forecast.

leg.forecast.dcp_days_prior
(63, 56, 49, 42, 35, 31, 28, 24, 21, 17, 14, 10, 7, 5, 3, 1)
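The rules for dcp_days_prior stated above (one entry per timeframe, strictly decreasing, no trailing 0) are easy to check in plain Python. The function below is a hypothetical helper written for illustration; it is not part of the PassengerSim API.

```python
def check_dcp_days_prior(dcp_days_prior, n_timeframes):
    """Raise ValueError if the vector breaks any of the rules described above."""
    if len(dcp_days_prior) != n_timeframes:
        raise ValueError("need exactly one entry per timeframe")
    pairs = zip(dcp_days_prior, dcp_days_prior[1:])
    if any(later >= earlier for earlier, later in pairs):
        raise ValueError("values must be strictly decreasing")
    if dcp_days_prior[-1] <= 0:
        raise ValueError("the departure day (0) must not be included")
    return True


dcp_days_prior = (63, 56, 49, 42, 35, 31, 28, 24, 21, 17, 14, 10, 7, 5, 3, 1)
is_valid = check_dcp_days_prior(dcp_days_prior, n_timeframes=16)

# The same vector works as an index: look up the timeframe that starts 21 days out.
tf_index_for_day = {days: tf for tf, days in enumerate(dcp_days_prior)}
print(is_valid, tf_index_for_day[21])
```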

Detruncation

Many forecasts start with “detruncation”. This process estimates the quantity of demand for each booking class that is unobserved in each timeframe due to that booking class being closed at the time. In essence, we want to swap out our array of observed sales for an estimate of potential demand, so our later algorithms know what to expect if they open booking classes that were (sometimes) closed in our historical data.

PassengerSim implements the expectation-maximization (“EM”) algorithm for detruncation. It can be called on a forecast using the detruncate_demand method.

leg.forecast.detruncate_demand(dcp_index=0, algorithm="EM", which_data="total")
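To give a feel for what an EM detruncation does, here is a self-contained sketch for a single timeframe of a single booking class. It assumes demand is normally distributed and that a censored observation is a lower bound on true demand. This is a textbook-style illustration, not PassengerSim's implementation, which may differ in its distributional assumptions and details; the function name `em_detruncate` and the sample data are hypothetical.

```python
import math


def _pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def em_detruncate(observed, censored, max_iter=100, tol=1e-6):
    """Return (detruncated values, mean, stdev) under a normal demand assumption."""
    n = len(observed)
    mu = sum(observed) / n
    var = sum((x - mu) ** 2 for x in observed) / n
    sigma = math.sqrt(var) if var > 0 else 1.0
    first = list(observed)
    for _ in range(max_iter):
        first, second = [], []
        # E-step: expected demand (and squared demand) given the observation.
        for x, is_censored in zip(observed, censored):
            if is_censored:
                z = (x - mu) / sigma
                lam = _pdf(z) / max(_upper_tail(z), 1e-12)
                ex = max(mu + sigma * lam, x)  # guard against round-off below x
                ex2 = mu * mu + 2 * mu * sigma * lam + sigma * sigma * (1 + z * lam)
            else:
                ex, ex2 = x, x * x
            first.append(ex)
            second.append(ex2)
        # M-step: re-estimate the mean and standard deviation.
        new_mu = sum(first) / n
        new_sigma = math.sqrt(max(sum(second) / n - new_mu**2, 1e-12))
        done = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if done:
            break
    return first, mu, sigma


observed = [4.0, 3.0, 4.0, 1.0, 5.0, 3.0, 2.0, 6.0]
censored = [False, False, False, True, False, True, False, False]
detruncated, mu, sigma = em_detruncate(observed, censored)
print([round(v, 2) for v in detruncated], round(mu, 2), round(sigma, 2))
```

Uncensored observations pass through unchanged, while censored ones are raised to their conditional expectation, so the estimated mean is never below the naive mean of observed sales.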

When running detruncation, you can do so for the “total” sales, or just for the “yieldable” portion of demand. The latter is useful if you are handling the forecasting of priceable demand separately. Once you’ve run the detruncation step, the detruncated version of history is also available on the forecast.

hd = leg.forecast.get_history(detruncated="total")

print(hd.keys())
print("Y0:", hd["Y0"].shape, "\n", hd["Y0"][:4], "\n  ...")
print("Y1:", hd["Y1"].shape, "\n", hd["Y1"][:4], "\n  ...")
print("... Y2, Y3, ...")
print("Y4:", hd["Y4"].shape, "\n", hd["Y4"][:4], "\n  ...")
print("Y5:", hd["Y5"].shape, "\n", hd["Y5"][:4], "\n  ...")
dict_keys(['Y0', 'Y1', 'Y2', 'Y3', 'Y4', 'Y5'])
Y0: (26, 16) 
 [[1.         1.         2.         1.         0.         1.
  3.         1.         1.         0.         1.         2.
  0.         2.         7.         1.        ]
 [1.         0.         1.         1.         0.         0.
  4.         0.         2.         3.         3.         1.
  1.         2.         4.         2.        ]
 [2.         1.         1.         3.         1.         0.
  0.         2.         0.         1.         2.         3.
  0.         4.         3.         1.        ]
 [1.         0.         2.         0.         2.         0.
  0.         1.         0.         1.         1.         0.
  0.         6.         8.03678481 1.52786928]] 
  ...
Y1: (26, 16) 
 [[0.         0.         2.         2.         1.         1.
  0.         1.         2.         1.         1.         3.
  0.         0.         4.         3.        ]
 [0.         0.         0.         1.         1.         0.
  0.         0.         2.         0.         0.         1.
  2.         1.         3.         1.        ]
 [2.         2.         1.         1.         0.         0.
  0.         0.         0.         1.         1.         5.
  0.         1.         2.         0.        ]
 [1.         0.         0.         2.         0.         0.
  0.         1.         3.         0.         2.         1.
  4.         3.         3.98337339 1.99186367]] 
  ...
... Y2, Y3, ...
Y4: (26, 16) 
 [[ 8.          6.          6.          3.          2.          0.
   4.          2.          4.          4.          0.          0.
   0.          0.          0.          0.        ]
 [ 9.          0.          8.          4.          3.          3.
   6.          3.          3.          4.15033985  0.          0.
   0.          0.          0.          0.        ]
 [11.          1.          4.          4.          5.          0.
   3.          2.          4.          1.          0.          0.
   0.          0.          0.          0.        ]
 [ 5.          2.          6.          5.          2.          5.
   4.          4.          4.          3.          0.          0.
   0.          0.          0.          0.        ]] 
  ...
Y5: (26, 16) 
 [[5. 3. 0. 1. 2. 0. 1. 3. 0. 0. 0. 0. 0. 0. 0. 0.]
 [6. 2. 5. 2. 1. 0. 3. 3. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 2. 0. 2. 1. 0. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [3. 2. 2. 2. 1. 0. 3. 6. 0. 0. 0. 0. 0. 0. 0. 0.]] 
  ...

Note that when retrieving this detruncated data, you don’t get all the other arrays from each history; you get only the array created by detruncation. Further, this data is linked only to the one forecast where it was created. Other forecasts that share the same underlying raw history will not share this detruncated data.

f.get_history(detruncated="total")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[18], line 1
----> 1 f.get_history(detruncated="total")

ValueError: missing fcst for booking class Y0

Custom Algorithms

If you prefer to implement your own detruncation process, you can take the history arrays and do so, using whatever algorithm you choose. Let’s make up an algorithm where we detruncate all censored values to be 99. (1)

  1. I got 99 problems, and a bad detruncation algorithm ain’t one. Because it’s all 99. Like, seriously, don’t do this. It’s just a demo, ok?

import numpy as np

bad_detruncation = {}
for booking_class, history in f.get_history().items():
    arrays = history.as_arrays()
    bad_detruncation[booking_class] = np.where(arrays["closed_flags"], 99, arrays["sold"])

This exogenously created detruncated demand array can be fed to your own custom forecasting algorithm, or fed back into PassengerSim’s workflow like this:

f.detruncate_demand(
  dcp_index=0, algorithm="external", which_data="total", external=bad_detruncation
)

Creating Forecasts

Just like detruncation, the processing of all these inputs into a forecast can be more or less any algorithm you care to implement, but PassengerSim includes some default implementations, including additive pickup and exponential smoothing. In addition to the inputs discussed above, individual Forecast methods may define additional parameters that serve as inputs to the forecasting process. The simplest algorithm is the “additive pickup” algorithm, which computes a forecast by considering the average of how much more demand is expected to be “picked up” between any given point on the booking curve and the departure.

leg.forecast.compute_forecasts(
  dcp_index=0, algorithm="additive_pickup", which_data="total", recompute=True
)

The compute_forecasts method doesn’t return anything directly. But it does the computational work internally, and caches various computed values in the object, so that the required dimensions of the forecast can be accessed later.
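For intuition, the additive pickup idea can be sketched in a few lines: for each historical sample, add up the demand from a given timeframe through departure, then average across samples. The function below is a hypothetical illustration using only the standard library, not PassengerSim's code. It uses the sample standard deviation, which appears consistent with the output shown later on this page, but treat that as an assumption.

```python
from statistics import mean, stdev


def additive_pickup(history_rows):
    """history_rows: one list per sample day, holding demand by timeframe."""
    n_tf = len(history_rows[0])
    # Demand still to come from the start of timeframe t through departure.
    pickup = [[sum(row[t:]) for t in range(n_tf)] for row in history_rows]
    mean_to_departure = [mean([p[t] for p in pickup]) for t in range(n_tf)]
    stdev_to_departure = [stdev([p[t] for p in pickup]) for t in range(n_tf)]
    return mean_to_departure, stdev_to_departure


# Three made-up sample days with three timeframes each.
rows = [
    [2.0, 1.0, 0.0],
    [4.0, 1.0, 2.0],
    [3.0, 4.0, 1.0],
]
mean_to_dep, stdev_to_dep = additive_pickup(rows)
print(mean_to_dep, [round(s, 4) for s in stdev_to_dep])
```

The rows here would normally be detruncated demand rather than raw sales, so that closed booking classes do not bias the forecast downward.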

Outputs

As noted above, most of the methods that do the computational work of creating a forecast don’t return any specific output. Instead, that output is cached internally in the forecast object, and various aspects of the output can be accessed via specific output methods.

Expected Future Demand Vectors

A primary output for a forecast is the expected demand. A forecast should ultimately provide a set of vectors for each booking class:

  • mean_to_departure, accessed via get_mean_to_departure, which gives the expected demand from the beginning of each timeframe through departure, by timeframe.

leg.forecast.get_mean_to_departure()
{'Y0': array([24.12187956, 22.50649495, 21.1603411 , 20.04495649, 19.08341803,
        18.54495649, 18.19880264, 17.27572572, 16.50649495, 15.62187956,
        14.6603411 , 12.08341803,  9.35264879,  7.70808042,  5.42153486,
         1.46245527]),
 'Y1': array([17.20195224, 16.35579839, 15.58656762, 14.77887532, 13.74041378,
        13.47118301, 13.1634907 , 12.97118301, 12.1634907 , 11.58656762,
        10.93272147,  9.39425993,  7.19499126,  5.70111262,  4.12529062,
         1.76704022]),
 'Y2': array([10.4904192 ,  9.9904192 ,  9.72118843,  9.33657305,  8.87503459,
         8.72118843,  8.56734228,  8.14426536,  7.95195766,  7.60580382,
         6.83657305,  5.34278054,  3.69494324,  2.14350449,  0.        ,
         0.        ]),
 'Y3': array([5.25970843, 5.22124689, 5.10586228, 5.0289392 , 4.99047766,
        4.95201612, 4.79816997, 4.6707204 , 4.34707885, 3.86527634,
        3.46692847, 1.24236375, 0.        , 0.        , 0.        ,
        0.        ]),
 'Y4': array([41.01328067, 32.35943452, 28.35943452, 23.05174221, 18.62866529,
        15.51276866, 13.7085169 , 10.14221079,  6.44631692,  3.00751721,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ]),
 'Y5': array([15.75812322, 12.14273861, 10.31948144,  8.24849588,  6.16011418,
         4.99227006,  4.10417194,  2.00952949,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ])}
  • mean_in_timeframe, accessed via get_mean_in_timeframe, which gives the expected demand within each individual timeframe.

leg.forecast.get_mean_in_timeframe()
{'Y0': array([1.61538462, 1.34615385, 1.11538462, 0.96153846, 0.53846154,
        0.34615385, 0.92307692, 0.76923077, 0.88461538, 0.96153846,
        2.57692308, 2.73076923, 1.64456838, 2.28654556, 3.95907959,
        1.46245527]),
 'Y1': array([0.84615385, 0.76923077, 0.80769231, 1.03846154, 0.26923077,
        0.30769231, 0.19230769, 0.80769231, 0.57692308, 0.65384615,
        1.53846154, 2.19926867, 1.49387864, 1.575822  , 2.3582504 ,
        1.76704022]),
 'Y2': array([0.5       , 0.26923077, 0.38461538, 0.46153846, 0.15384615,
        0.15384615, 0.42307692, 0.19230769, 0.34615385, 0.76923077,
        1.49379251, 1.6478373 , 1.55143875, 2.14350449, 0.        ,
        0.        ]),
 'Y3': array([0.03846154, 0.11538462, 0.07692308, 0.03846154, 0.03846154,
        0.15384615, 0.12744956, 0.32364155, 0.48180252, 0.39834786,
        2.22456472, 1.24236375, 0.        , 0.        , 0.        ,
        0.        ]),
 'Y4': array([8.65384615, 4.        , 5.30769231, 4.42307692, 3.11589662,
        1.80425176, 3.56630611, 3.69589387, 3.43879971, 3.00751721,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        ]),
 'Y5': array([3.61538462, 1.82325717, 2.07098556, 2.0883817 , 1.16784412,
        0.88809812, 2.09464245, 2.00952949, 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        ])}
  • stdev_to_departure, accessed via get_stdev_to_departure, which gives the standard deviation of demand from the beginning of each timeframe through departure, by timeframe.

leg.forecast.get_stdev_to_departure()
{'Y0': array([7.44062911, 6.78130674, 6.28989281, 6.08234586, 5.63502121,
        5.54259414, 5.4788825 , 5.18696904, 4.97923982, 4.72512255,
        4.49887288, 3.63204519, 3.34785012, 2.79486068, 2.12221765,
        0.82085782]),
 'Y1': array([5.64494026, 5.40310451, 5.28480539, 5.04320267, 4.84345453,
        4.8032929 , 4.69048291, 4.60440688, 4.30313508, 4.25749614,
        4.0504397 , 3.70504088, 3.51091939, 2.90806135, 2.34782323,
        1.30521166]),
 'Y2': array([3.96490901, 4.05618891, 3.64742546, 3.64972185, 3.63860536,
        3.60380889, 3.68000002, 3.45911273, 3.38798398, 3.48852008,
        3.26175709, 2.90016067, 1.72166882, 1.3357445 , 0.        ,
        0.        ]),
 'Y3': array([3.95342292, 3.9488252 , 4.00904896, 4.0302925 , 4.04524837,
        4.03010348, 3.6850913 , 3.6223341 , 2.94159006, 2.26245962,
        2.06786991, 1.16193513, 0.        , 0.        , 0.        ,
        0.        ]),
 'Y4': array([7.70536217, 6.28907802, 6.20348273, 4.78259276, 3.7331001 ,
        3.19622964, 2.77903279, 2.44674203, 2.07856978, 1.43887122,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        ]),
 'Y5': array([6.84575403, 5.68757694, 5.42256447, 4.53106541, 2.90309693,
        2.54283714, 2.19270859, 1.53861426, 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        ])}
  • stdev_in_timeframe, accessed via get_stdev_in_timeframe, which gives the standard deviation of demand within each individual timeframe.

leg.forecast.get_stdev_in_timeframe()
{'Y0': array([1.62670029, 1.29436649, 0.99305279, 1.07631851, 0.70601809,
        0.56159115, 1.35419576, 0.95111271, 0.81618248, 1.03849003,
        1.92193812, 1.48479473, 1.50101609, 1.50937561, 2.01501455,
        0.82085782]),
 'Y1': array([0.96715284, 0.90808336, 0.98058068, 0.91567545, 0.53349357,
        0.47067872, 0.40191848, 0.80096096, 0.80860754, 0.79711017,
        1.13950057, 1.70537868, 1.37578431, 0.95737008, 1.50905976,
        1.30521166]),
 'Y2': array([0.64807407, 0.66679486, 0.57109881, 0.90468864, 0.46409548,
        0.36794648, 0.64330875, 0.49146563, 0.56159115, 0.99227788,
        1.12927138, 1.80707639, 1.05421071, 1.3357445 , 0.        ,
        0.        ]),
 'Y3': array([0.19611614, 0.32581259, 0.27174649, 0.19611614, 0.19611614,
        0.61268639, 0.32717411, 0.99564218, 1.0600049 , 0.67804035,
        1.35969911, 1.16193513, 0.        , 0.        , 0.        ,
        0.        ]),
 'Y4': array([3.30989193, 2.24499443, 2.88123905, 1.83680324, 1.94659516,
        1.45968209, 1.67636431, 1.73748616, 1.69618456, 1.43887122,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        ]),
 'Y5': array([2.15549388, 1.30156187, 1.72078301, 2.31333422, 1.01704627,
        0.97757443, 0.98672964, 1.53861426, 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        ])}

Each vector should be of length equal to the number of timeframes. If a forecast is generated part way through the booking process, this does not change the shape of the vector; both past and future timeframes remain a part of the vector. It is generally not necessary or useful to update the portion of the vector that pertains to the past, but the allocated size of the vector remains the same.
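As a quick worked check on how these vectors relate, the mean to departure at each timeframe should equal the sum of the in-timeframe means from that timeframe onward. The sketch below verifies this with the Y5 values printed above, copied to eight decimal places, so small rounding differences are expected.

```python
from itertools import accumulate

# Y5 values copied from the outputs above (trailing timeframes are all zero).
mean_in_timeframe_y5 = [
    3.61538462, 1.82325717, 2.07098556, 2.0883817,
    1.16784412, 0.88809812, 2.09464245, 2.00952949,
] + [0.0] * 8
mean_to_departure_y5 = [
    15.75812322, 12.14273861, 10.31948144, 8.24849588,
    6.16011418, 4.99227006, 4.10417194, 2.00952949,
] + [0.0] * 8

# Reverse cumulative sum: accumulate from departure backwards, then flip.
rebuilt = list(accumulate(reversed(mean_in_timeframe_y5)))[::-1]
max_gap = max(abs(a - b) for a, b in zip(rebuilt, mean_to_departure_y5))
print(max_gap < 1e-6)
```

The standard deviations do not add up in this simple way, which is one reason the forecast stores stdev_to_departure and stdev_in_timeframe as separate vectors.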

Rather than accessing each component of the forecast element by element, there are also tools to extract the entire forecast and format it as a single pandas DataFrame.

leg.forecast.get_forecast_data().to_dataframe()
               mean_to_departure                                           stdev_to_departure                      ...  mean_in_timeframe                       stdev_in_timeframe
booking_class        Y0        Y1        Y2        Y3        Y4        Y5        Y0        Y1        Y2        Y3  ...        Y2        Y3        Y4        Y5        Y0        Y1        Y2        Y3        Y4        Y5
days_prior
63            24.121880 17.201952 10.490419  5.259708 41.013281 15.758123  7.440629  5.644940  3.964909  3.953423  ...  0.500000  0.038462  8.653846  3.615385  1.626700  0.967153  0.648074  0.196116  3.309892  2.155494
56            22.506495 16.355798  9.990419  5.221247 32.359435 12.142739  6.781307  5.403105  4.056189  3.948825  ...  0.269231  0.115385  4.000000  1.823257  1.294366  0.908083  0.666795  0.325813  2.244994  1.301562
49            21.160341 15.586568  9.721188  5.105862 28.359435 10.319481  6.289893  5.284805  3.647425  4.009049  ...  0.384615  0.076923  5.307692  2.070986  0.993053  0.980581  0.571099  0.271746  2.881239  1.720783
42            20.044956 14.778875  9.336573  5.028939 23.051742  8.248496  6.082346  5.043203  3.649722  4.030293  ...  0.461538  0.038462  4.423077  2.088382  1.076319  0.915675  0.904689  0.196116  1.836803  2.313334
35            19.083418 13.740414  8.875035  4.990478 18.628665  6.160114  5.635021  4.843455  3.638605  4.045248  ...  0.153846  0.038462  3.115897  1.167844  0.706018  0.533494  0.464095  0.196116  1.946595  1.017046
31            18.544956 13.471183  8.721188  4.952016 15.512769  4.992270  5.542594  4.803293  3.603809  4.030103  ...  0.153846  0.153846  1.804252  0.888098  0.561591  0.470679  0.367946  0.612686  1.459682  0.977574
28            18.198803 13.163491  8.567342  4.798170 13.708517  4.104172  5.478883  4.690483  3.680000  3.685091  ...  0.423077  0.127450  3.566306  2.094642  1.354196  0.401918  0.643309  0.327174  1.676364  0.986730
24            17.275726 12.971183  8.144265  4.670720 10.142211  2.009529  5.186969  4.604407  3.459113  3.622334  ...  0.192308  0.323642  3.695894  2.009529  0.951113  0.800961  0.491466  0.995642  1.737486  1.538614
21            16.506495 12.163491  7.951958  4.347079  6.446317  0.000000  4.979240  4.303135  3.387984  2.941590  ...  0.346154  0.481803  3.438800  0.000000  0.816182  0.808608  0.561591  1.060005  1.696185  0.000000
17            15.621880 11.586568  7.605804  3.865276  3.007517  0.000000  4.725123  4.257496  3.488520  2.262460  ...  0.769231  0.398348  3.007517  0.000000  1.038490  0.797110  0.992278  0.678040  1.438871  0.000000
14            14.660341 10.932721  6.836573  3.466928  0.000000  0.000000  4.498873  4.050440  3.261757  2.067870  ...  1.493793  2.224565  0.000000  0.000000  1.921938  1.139501  1.129271  1.359699  0.000000  0.000000
10            12.083418  9.394260  5.342781  1.242364  0.000000  0.000000  3.632045  3.705041  2.900161  1.161935  ...  1.647837  1.242364  0.000000  0.000000  1.484795  1.705379  1.807076  1.161935  0.000000  0.000000
7              9.352649  7.194991  3.694943  0.000000  0.000000  0.000000  3.347850  3.510919  1.721669  0.000000  ...  1.551439  0.000000  0.000000  0.000000  1.501016  1.375784  1.054211  0.000000  0.000000  0.000000
5              7.708080  5.701113  2.143504  0.000000  0.000000  0.000000  2.794861  2.908061  1.335745  0.000000  ...  2.143504  0.000000  0.000000  0.000000  1.509376  0.957370  1.335745  0.000000  0.000000  0.000000
3              5.421535  4.125291  0.000000  0.000000  0.000000  0.000000  2.122218  2.347823  0.000000  0.000000  ...  0.000000  0.000000  0.000000  0.000000  2.015015  1.509060  0.000000  0.000000  0.000000  0.000000
1              1.462455  1.767040  0.000000  0.000000  0.000000  0.000000  0.820858  1.305212  0.000000  0.000000  ...  0.000000  0.000000  0.000000  0.000000  0.820858  1.305212  0.000000  0.000000  0.000000  0.000000

16 rows × 24 columns

Or to generate a visualization of the data.

leg.forecast.get_forecast_data(summarize_history=True).dashboard()

Expected Future Demand Now

From a computational perspective, it is generally most efficient to generate demand forecasts all at once for the entire vector of timeframes in the simulation. But many optimization algorithms don’t care about the whole vector of future demand by timeframe; they just want to know the mean (and standard deviation) of the demand from “now” through departure. The timing of “now” might be at one of the DCPs, or it might be at some point between DCPs. To satisfy this, in addition to the vector forecasts outlined above, the ForecastGroup also offers “now” forecasts for the mean and standard deviation through departure.

leg.forecast.get_current_mean()
{'Y0': 24.121879563516945,
 'Y1': 17.201952239483767,
 'Y2': 10.490419202989923,
 'Y3': 5.259708430074137,
 'Y4': 41.01328067185432,
 'Y5': 15.758123224279785}
leg.forecast.get_current_stdev()
{'Y0': 7.440629106355628,
 'Y1': 5.6449402566812195,
 'Y2': 3.9649090058263154,
 'Y3': 3.9534229178236657,
 'Y4': 7.705362172856466,
 'Y5': 6.845754031216025}

When we used the compute_forecasts method above, we set the dcp_index to zero, which told that method two things: to compute the vector of forecasts from the first DCP all the way through departure, and to set “now” to be that first DCP. For subsequent DCPs, if we don’t want to do any recomputation but just move the “now” forward along the vector, we can use move_forecast_pointers.

leg.forecast.move_forecast_pointers(dcp_index=1)
leg.forecast.get_current_mean()
{'Y0': 22.506494948132328,
 'Y1': 16.35579839332992,
 'Y2': 9.990419202989923,
 'Y3': 5.221246891612599,
 'Y4': 32.35943451800817,
 'Y5': 12.14273860889517}

To get the “now” values to be interpolated forecasts in between the DCP points along the vector of forecasts, we can set days_prior instead of dcp_index, which allows us to give a point in between DCPs.

leg.forecast.move_forecast_pointers(days_prior=62)
leg.forecast.get_current_mean()
{'Y0': 23.891110332747715,
 'Y1': 17.081073118604646,
 'Y2': 10.418990631561352,
 'Y3': 5.2542139245796315,
 'Y4': 39.77701693559058,
 'Y5': 15.241639707796269}

This works even for fractional values for days_prior.

leg.forecast.move_forecast_pointers(days_prior=62.5)
leg.forecast.get_current_mean()
{'Y0': 24.006494948132328,
 'Y1': 17.141512679044205,
 'Y2': 10.454704917275636,
 'Y3': 5.256961177326884,
 'Y4': 40.39514880372245,
 'Y5': 15.499881466038028}
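The interpolated values above are what you would get from straight-line interpolation between the two neighbouring DCPs. The sketch below reproduces the Y0 numbers for days_prior=62 and days_prior=62.5 from the first few entries of the mean_to_departure vector. The function is a hypothetical helper, limited to points between the first and last DCP, and linear interpolation is an inference from the printed values rather than a documented guarantee.

```python
def interpolate_to_departure(days_prior, dcp_days_prior, values):
    """Linearly interpolate a to-departure vector at an arbitrary days_prior."""
    if not dcp_days_prior[-1] <= days_prior <= dcp_days_prior[0]:
        raise ValueError("days_prior must lie between the first and last DCP")
    for i in range(len(dcp_days_prior) - 1):
        earlier, later = dcp_days_prior[i], dcp_days_prior[i + 1]
        if later <= days_prior <= earlier:
            # Fraction of the way from the earlier DCP to the later one.
            weight = (earlier - days_prior) / (earlier - later)
            return values[i] + weight * (values[i + 1] - values[i])
    return values[-1]


# First three DCPs and the matching Y0 mean_to_departure values from above.
dcps = (63, 56, 49)
y0_mean = (24.12187956, 22.50649495, 21.1603411)

at_62 = interpolate_to_departure(62, dcps, y0_mean)
at_62_5 = interpolate_to_departure(62.5, dcps, y0_mean)
print(round(at_62, 6), round(at_62_5, 6))
```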

Marginal Revenue

In addition to a forecast of demand, the forecast may also provide a measure of marginal revenue per unit of forecasted demand in each booking class. In simple standard forecasts in fully restricted markets, this marginal revenue requires no adjustments and is equal simply to the customer price. In this case it is not generally calculated in the forecast. If you try to access the adjusted marginal revenue when it’s not set, you’ll just get empty arrays.

leg.forecast.get_adjusted_marginal_revenue()
{'Y0': array([], dtype=float64),
 'Y1': array([], dtype=float64),
 'Y2': array([], dtype=float64),
 'Y3': array([], dtype=float64),
 'Y4': array([], dtype=float64),
 'Y5': array([], dtype=float64)}

However, for more sophisticated forecasts that incorporate elasticity, this marginal revenue may be different, as the forecast may include some probability that some customers will buy up to higher classes if their preferred class is not available. We will want to adjust the marginal revenue to account for this. Moreover, since the sellup rates will vary over timeframes, the adjusted_marginal_revenue will vary as well, so this output is a vector for each booking class (with length equal to the number of timeframes), not just a single value.

Here, we’ll construct a hybrid-conditional forecast as an example.

from passengersim import core

f5 = core.Frat5(
    name="demo",
    values={
        63: 1.2,
        56: 1.2,
        49: 1.3,
        42: 1.3,
        35: 1.4,
        31: 1.5,
        28: 1.5,
        24: 1.6,
        21: 2.0,
        17: 2.3,
        14: 2.7,
        10: 2.8,
        7: 2.9,
        5: 2.9,
        3: 3.0,
        1: 3.0,
    },
)
leg.forecast.detruncate_demand(dcp_index=0, algorithm="EM", which_data="yieldable")
leg.forecast.compute_simple_fare_adjustments(algorithm="mr", frat5=f5, scale_factor=0.5)
leg.forecast.compute_conditional_q_forecast(dcp_index=0, frat5=f5)
leg.forecast.allocate_q_demand(f5, 0, allocation_algorithm="tf")
leg.forecast.compute_forecasts(dcp_index=0, algorithm="additive_pickup", recompute=True, which_data="yieldable")
leg.forecast.compute_fare_adjustments("mr", f5, weighted_by_ratio=True, scale_factor=0.5)
leg.forecast.combine_forecasts(0, rollup_algorithm="tf")

Now we will have meaningful dynamic adjusted marginal revenues in the output.

leg.forecast.get_adjusted_marginal_revenue()
{'Y0': array([300., 300., 300., 300., 300., 300., 300., 300., 300., 300., 300.,
        300., 300., 300., 300., 300.]),
 'Y1': array([237.23555007, 237.09874184, 235.42724404, 235.22539869,
        233.18982771, 231.45061659, 231.37291206, 229.95155125,
        223.01924828, 220.22476162, 216.75057595, 214.34341875,
        204.24020592, 191.31886509,  95.73049591,  95.73049591]),
 'Y2': array([184.67662544, 184.50496602, 181.5919185 , 181.46293851,
        178.24491155, 175.42607326, 175.45009463, 172.67364764,
        162.84623515, 160.05444723, 154.08275402, 138.85026287,
         52.94397112,  52.94397112,          nan,          nan]),
 'Y3': array([127.35190083, 127.49510441, 121.03496114, 121.03165209,
        114.88229528, 108.70328617, 108.32519265, 101.36216264,
         77.50399202,  59.79323384,  17.37092152,  10.15744632,
                 nan,          nan,          nan,          nan]),
 'Y4': array([113.04452027, 112.99714146, 111.96909976, 111.72806257,
        110.23478744, 108.32948408, 107.56516824, 103.14379759,
         42.86524796,  21.22482234,          nan,          nan,
                 nan,          nan,          nan,          nan]),
 'Y5': array([85.57304959, 85.57304959, 78.35957439, 78.35957439, 71.14609918,
        63.93262398, 63.93262398, 56.71914877,         nan,         nan,
                nan,         nan,         nan,         nan,         nan,
                nan])}
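
The NaN entries mark timeframes for which no adjusted marginal revenue is reported for a class. A small NumPy sketch shows one way to summarize them; it uses a hand-copied, rounded subset of the values above rather than a live simulation, and the variable names are illustrative.

```python
import numpy as np

nan = np.nan
# Rounded copies of the Y4 and Y5 outputs shown above.
amr = {
    "Y4": np.array(
        [113.04, 113.00, 111.97, 111.73, 110.23, 108.33, 107.57, 103.14, 42.87, 21.22]
        + [nan] * 6
    ),
    "Y5": np.array([85.57, 85.57, 78.36, 78.36, 71.15, 63.93, 63.93, 56.72] + [nan] * 8),
}

# Number of timeframes that carry a usable value, per booking class.
valid_counts = {cls: int(np.count_nonzero(~np.isnan(v))) for cls, v in amr.items()}

# nanmin ignores the missing entries when looking for the lowest reported value.
lowest_valid = {cls: float(np.nanmin(v)) for cls, v in amr.items()}

print(valid_counts)
print(lowest_valid)
```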

Modifiers

Sometimes, we want to leave a forecasting algorithm basically untouched, but apply some modifications to the forecast. Currently, PassengerSim offers one “baked in” modification: a forecast multiplier. The multiplier is applied to the ForecastGroup separately from generating the forecast, and it’s “sticky”: it stays in place until cleared.

Let’s consider an example. Before we begin adding multipliers, we can check what a piece of the existing forecast looks like:

leg.forecast.get_mean_to_departure()['Y0'][:5]
array([32.48563051, 30.86671524, 29.51883363, 28.38391808, 27.40503438])

Now we’ll add a forecast multiplier, using adjust_forecast_means.

leg.forecast.adjust_forecast_means(ratio=1.1)
1

If we check again what that piece of the forecast looks like, we’ll see it’s gone up by the ratio we set.

leg.forecast.get_mean_to_departure()['Y0'][:5]
array([35.73419356, 33.95338677, 32.470717  , 31.22230989, 30.14553782])
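
As a quick sanity check, the two printed outputs can be compared directly. This sketch simply copies the values shown above into NumPy arrays and does not touch the simulation.

```python
import numpy as np

# Values copied from the two outputs above.
before = np.array([32.48563051, 30.86671524, 29.51883363, 28.38391808, 27.40503438])
after = np.array([35.73419356, 33.95338677, 32.470717, 31.22230989, 30.14553782])

# Element-wise ratio of the adjusted forecast to the original forecast.
ratios = after / before
print(np.round(ratios, 6))
```

Every element of the ratio should come out at the 1.1 multiplier we set, up to the rounding of the printed values.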

We can now make changes to the forecast, setting or overriding values as desired.

leg.forecast.set_mean_to_departure(
    {'Y0': np.array([10, 9.5, 9, 8.5, 8, 7.5, 7, 6.5, 6, 5.5, 5, 4.5, 4, 3.5, 3, 2.5])},
    partial_ok=True
)

Whatever the forecast is, whether set by an internal PassengerSim algorithm or manually by the user, the output of the forecast is scaled by the assigned multiplier. So with the multiplier set, what we get out is not the same as what we put in: it is scaled up.

leg.forecast.get_mean_to_departure()['Y0'][:5]
array([11.  , 10.45,  9.9 ,  9.35,  8.8 ])

This also applies to the “now” values given by get_current_mean and get_current_stdev. You’ll recall we had pinned the current time on this forecast to 62.5 days prior to departure, and that timing pin still remains, so we get the scaled version of the interpolated value when we access the current mean:

leg.forecast.get_current_mean()['Y0']
10.960714285714285
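
The value above appears consistent with a linear interpolation between the two nearest timeframe boundaries, followed by scaling. The sketch below works under two assumptions that are not confirmed by this section: that the timeframe boundaries fall on the same days before departure as the Frat5 keys (63, 56, ..., 1), and that the interpolation is linear. The function name is hypothetical.

```python
import numpy as np

# Assumed timeframe boundaries, in days before departure.
dcps = np.array([63, 56, 49, 42, 35, 31, 28, 24, 21, 17, 14, 10, 7, 5, 3, 1])
# The unscaled Y0 values we injected with set_mean_to_departure.
mean_to_departure = np.array(
    [10, 9.5, 9, 8.5, 8, 7.5, 7, 6.5, 6, 5.5, 5, 4.5, 4, 3.5, 3, 2.5]
)


def interpolated_mean(days_prior, multiplier=1.0):
    """Linearly interpolate the mean to departure, then apply the multiplier."""
    # np.interp needs increasing x values, so both arrays are reversed.
    unscaled = np.interp(days_prior, dcps[::-1], mean_to_departure[::-1])
    return multiplier * unscaled


current = interpolated_mean(62.5, multiplier=1.1)
print(current)
```

Under these assumptions the result is approximately 10.96, in line with the output of get_current_mean.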

If we want to observe what the current forecast multipliers are, separately from the forecast values, we can do so.

leg.forecast.get_forecast_means_adjustment()
{'Y0': 1.1, 'Y1': 1.1, 'Y2': 1.1, 'Y3': 1.1, 'Y4': 1.1, 'Y5': 1.1}

By default, adjust_forecast_means stacks multipliers, so if we run the same adjustment again we get an even larger multiplier.

leg.forecast.adjust_forecast_means(ratio=1.1)
leg.forecast.get_forecast_means_adjustment()
{'Y0': 1.2100000000000002,
 'Y1': 1.2100000000000002,
 'Y2': 1.2100000000000002,
 'Y3': 1.2100000000000002,
 'Y4': 1.2100000000000002,
 'Y5': 1.2100000000000002}

We can also prevent the stacking if desired, and simply assign a specific value to the multiplier.

leg.forecast.adjust_forecast_means(ratio=1.618, stack=False)
leg.forecast.get_forecast_means_adjustment()
{'Y0': 1.618, 'Y1': 1.618, 'Y2': 1.618, 'Y3': 1.618, 'Y4': 1.618, 'Y5': 1.618}
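
The stacking rule amounts to a small piece of bookkeeping. The toy class below is a hypothetical stand-in that mimics the behavior shown above for two booking classes; it is not PassengerSim code, and its names are made up for illustration.

```python
class MultiplierSketch:
    """Toy model of a sticky per-class forecast multiplier."""

    def __init__(self, booking_classes):
        self.booking_classes = list(booking_classes)
        self.adjustments = {}  # starts empty: no adjustment has been set

    def adjust(self, ratio, stack=True):
        for cls in self.booking_classes:
            # When stacking, build on the existing multiplier; otherwise start from 1.
            base = self.adjustments.get(cls, 1.0) if stack else 1.0
            self.adjustments[cls] = base * ratio


sketch = MultiplierSketch(["Y0", "Y1"])
sketch.adjust(1.1)
sketch.adjust(1.1)  # stacks on top of the first adjustment
stacked = dict(sketch.adjustments)

sketch.adjust(1.618, stack=False)  # replaces the stacked value outright
replaced = dict(sketch.adjustments)

print(stacked)
print(replaced)
```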

When a forecast multiplier is set, it scales both the mean and the standard deviation of the forecast outputs.

leg.forecast.get_mean_to_departure()['Y0']
array([16.18 , 15.371, 14.562, 13.753, 12.944, 12.135, 11.326, 10.517,
        9.708,  8.899,  8.09 ,  7.281,  6.472,  5.663,  4.854,  4.045])

By passing use_multiplier=False to the getter, we can access the original unscaled forecasts.

leg.forecast.get_mean_to_departure(use_multiplier=False)['Y0']
array([10. ,  9.5,  9. ,  8.5,  8. ,  7.5,  7. ,  6.5,  6. ,  5.5,  5. ,
        4.5,  4. ,  3.5,  3. ,  2.5])

If we want to wipe out the forecast means adjustment, we can clear it like this:

leg.forecast.clear_forecast_means_adjustment()

Then, as you might expect, there won’t be any “adjustments” on this forecast.

leg.forecast.get_forecast_means_adjustment()
{}

If you are manipulating forecast multipliers within a sample day based on partial sales data from that day (for example, see the booked load factor heuristic), then be sure to call clear_forecast_means_adjustment at the beginning of each new sample. If, on the other hand, you are making adjustments because you want to introduce a persistent bias up or down on the forecasts, then you can choose not to reset between sample days.
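
The difference between the two patterns can be sketched without a simulation. This toy loop assumes a hypothetical heuristic that applies one stacked 5% uplift per sample day; run_samples and its arguments are illustrative names, not PassengerSim functions.

```python
def run_samples(n_samples, ratio, clear_each_sample):
    """Return the multiplier in effect at the end of each sample day."""
    multiplier = 1.0
    trace = []
    for _ in range(n_samples):
        if clear_each_sample:
            # Plays the role of clear_forecast_means_adjustment at the start of a sample.
            multiplier = 1.0
        # Plays the role of one stacked adjust_forecast_means call during the sample.
        multiplier *= ratio
        trace.append(multiplier)
    return trace


cleared = run_samples(10, 1.05, clear_each_sample=True)
persistent = run_samples(10, 1.05, clear_each_sample=False)
print(cleared[-1], persistent[-1])
```

With clearing, the multiplier stays at the single-day uplift; without it, the uplift compounds across sample days, which is only appropriate when a growing or persistent bias is what you intend.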

Custom Forecast

Just as with custom detruncation, a user can implement any custom forecast desired. Simply build the forecast using any algorithm you choose, and any available input you choose – the history on the Leg or Path, on other similar legs or paths, or other collected simulation data. Once constructed, you’ll need to have a set of forecast “outputs” in exactly the same format as the outputs that a regular PassengerSim forecast provides. You can inject those outputs into the relevant ForecastGroup object, so they can be used by downstream optimization engines that expect to have forecast outputs available. For each get_* output method, there is an equivalent set_* method that allows you to inject these values.
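
As a rough illustration of what such a custom algorithm might look like, the sketch below builds forecast outputs from a made-up sales history using plain NumPy. The history arrays, their shape (sample days by timeframes), and the function name are assumptions for illustration only. The result is a dict keyed by booking class with one value per timeframe, matching the shape of the outputs shown on this page.

```python
import numpy as np


def simple_average_forecast(history_by_class):
    """Average sales per timeframe, accumulated into a mean-to-departure vector."""
    outputs = {}
    for cls, sales in history_by_class.items():
        sales = np.asarray(sales, dtype=float)  # shape: (sample days, timeframes)
        mean_in_timeframe = sales.mean(axis=0)
        # Demand still to come at the start of each timeframe: a reverse cumulative sum.
        outputs[cls] = mean_in_timeframe[::-1].cumsum()[::-1]
    return outputs


# Synthetic stand-in for a 26-sample history over 16 timeframes.
rng = np.random.default_rng(0)
history = {
    "Y0": rng.poisson(0.5, size=(26, 16)),
    "Y5": rng.poisson(2.0, size=(26, 16)),
}
custom = simple_average_forecast(history)
print({cls: values.shape for cls, values in custom.items()})
```

In a live session, a dict like custom could then be handed to the matching set_* method.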

For example, we can inject some crazy mean_to_departure values into our leg forecast.

leg.forecast.set_mean_to_departure(
    {
        "Y0": np.log(np.arange(18, 2, -1)),
        "Y1": np.log(np.arange(18, 2, -1) + 2),
        "Y2": [8.1] * 2 + [0.8] * 7 + [6.1] * 7,
        "Y3": (np.cos(np.arange(16) / 4 * np.pi) + 2.2) * 4,
        "Y4": np.maximum(np.linspace(12, -7, 16), 0),
        "Y5": np.maximum(np.arange(16, -16, -2), 0),
    }
)

Then if we subsequently go to access these values through the usual output getters, they are returned to us.

leg.forecast.get_mean_to_departure()
{'Y0': array([2.89037176, 2.83321334, 2.77258872, 2.7080502 , 2.63905733,
        2.56494936, 2.48490665, 2.39789527, 2.30258509, 2.19722458,
        2.07944154, 1.94591015, 1.79175947, 1.60943791, 1.38629436,
        1.09861229]),
 'Y1': array([2.99573227, 2.94443898, 2.89037176, 2.83321334, 2.77258872,
        2.7080502 , 2.63905733, 2.56494936, 2.48490665, 2.39789527,
        2.30258509, 2.19722458, 2.07944154, 1.94591015, 1.79175947,
        1.60943791]),
 'Y2': array([8.1, 8.1, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 6.1, 6.1, 6.1, 6.1,
        6.1, 6.1, 6.1]),
 'Y3': array([12.8       , 11.62842712,  8.8       ,  5.97157288,  4.8       ,
         5.97157288,  8.8       , 11.62842712, 12.8       , 11.62842712,
         8.8       ,  5.97157288,  4.8       ,  5.97157288,  8.8       ,
        11.62842712]),
 'Y4': array([12.        , 10.73333333,  9.46666667,  8.2       ,  6.93333333,
         5.66666667,  4.4       ,  3.13333333,  1.86666667,  0.6       ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ]),
 'Y5': array([16., 14., 12., 10.,  8.,  6.,  4.,  2.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.])}
leg.forecast.get_forecast_data(summarize_history=True).dashboard()

Just as for the other forecast outputs, we could also construct our own estimates of adjusted marginal revenue and inject them for use by downstream RM steps.

leg.forecast.set_adjusted_marginal_revenue(
    {
        "Y0": np.array([300.0] * 16),
        "Y1": np.array([237.0] * 16),
        "Y2": np.array([184.0] * 14 + [np.nan] * 2),
        "Y3": np.array([123.0] * 12 + [np.nan] * 4),
        "Y4": np.array([113.0] * 10 + [np.nan] * 6),
        "Y5": np.array([85.0] * 8 + [np.nan] * 8),
    }
)
leg.forecast.get_adjusted_marginal_revenue()
{'Y0': array([300., 300., 300., 300., 300., 300., 300., 300., 300., 300., 300.,
        300., 300., 300., 300., 300.]),
 'Y1': array([237., 237., 237., 237., 237., 237., 237., 237., 237., 237., 237.,
        237., 237., 237., 237., 237.]),
 'Y2': array([184., 184., 184., 184., 184., 184., 184., 184., 184., 184., 184.,
        184., 184., 184.,  nan,  nan]),
 'Y3': array([123., 123., 123., 123., 123., 123., 123., 123., 123., 123., 123.,
        123.,  nan,  nan,  nan,  nan]),
 'Y4': array([113., 113., 113., 113., 113., 113., 113., 113., 113., 113.,  nan,
         nan,  nan,  nan,  nan,  nan]),
 'Y5': array([85., 85., 85., 85., 85., 85., 85., 85., nan, nan, nan, nan, nan,
        nan, nan, nan])}

You might notice that the values we put in for mean_to_departure are really crazy: some of them don’t decrease monotonically as departure approaches, as we would normally expect. They are also entirely inconsistent with the mean_in_timeframe results, which we didn’t change. This flexibility is by design – the onus is on the user to ensure consistent and appropriate values, or to decide for whatever reason to violate consistency or appropriateness. Plenty of revenue management tools have been built to implement algorithms that are intentionally broken, simply because they ultimately work. The joy of PassengerSim is that you can do exactly that, and then dissect the results to try to figure out why.
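
If you would rather catch this kind of inconsistency before injecting values, a small check is easy to write. The helper below is a hypothetical convenience, not part of PassengerSim; it rebuilds the injected mean_to_departure values and flags which booking classes fail to decrease toward departure.

```python
import numpy as np


def is_non_increasing(values):
    """True if the values never rise from one timeframe to the next."""
    return bool(np.all(np.diff(np.asarray(values, dtype=float)) <= 0))


# The same values that were injected with set_mean_to_departure.
injected = {
    "Y0": np.log(np.arange(18, 2, -1)),
    "Y1": np.log(np.arange(18, 2, -1) + 2),
    "Y2": [8.1] * 2 + [0.8] * 7 + [6.1] * 7,
    "Y3": (np.cos(np.arange(16) / 4 * np.pi) + 2.2) * 4,
    "Y4": np.maximum(np.linspace(12, -7, 16), 0),
    "Y5": np.maximum(np.arange(16, -16, -2), 0),
}

monotone = {cls: is_non_increasing(values) for cls, values in injected.items()}
print(monotone)
```

Here only Y2 and Y3 are flagged, since their values rise again partway through the booking horizon.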