Missing data imputation

The following functions are provided to impute missing time series data:

torchtime.impute.replace_missing(input, fill, select=None)

Replace missing data with a fixed value by channel.

Replaces every NaN in a channel with that channel's fixed fill value, as given by the fill argument. All channels are imputed by default; to impute only a subset, pass the channel indices to select.

A common choice of fill is the mean of each channel in the training data. Because these values are fixed in advance, imputing a value at time i requires no knowledge of the time series at times t > i. This is essential if you are developing a model that will make online predictions.
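As a minimal sketch of that choice, the per-channel training mean can be computed with plain PyTorch. The tensor train and its values are hypothetical, and the layout (batch, time, channels) with NaN marking missing values is an assumption made for illustration.

```python
import torch

nan = float("nan")

# Hypothetical training data with shape (batch, time, channels).
train = torch.tensor([
    [[1.0, 10.0], [nan, 20.0], [3.0, nan]],
    [[5.0, nan], [7.0, 40.0], [nan, 50.0]],
])

# Collapse batch and time so each row is one observation, then take the
# mean of each channel while ignoring NaNs.
fill = torch.nanmean(train.reshape(-1, train.shape[-1]), dim=0)
```

The resulting fill has one entry per channel, in the same order as the data, which is the shape the fill argument expects when all channels are imputed.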

Parameters:
  • input (Tensor) – The tensor to impute. The final dimension must hold channel data.

  • fill (Tensor) – Fill values for each channel, in the same order as the data. fill must hold one value per imputed channel, i.e. its length must equal the number of channels in the data, or the length of select if only a subset is imputed.

  • select (Optional[Tensor]) – Indices for the channels to be imputed (by default all channels are imputed).

Return type:

Tensor

Returns:

Imputed time series.
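
The behaviour described above can be sketched in plain PyTorch. This is an illustrative stand-in under stated assumptions, not the library's implementation: the function name replace_missing_sketch and the example tensors are hypothetical, and the only layout assumption is that the final dimension holds channels.

```python
import torch


def replace_missing_sketch(x, fill, select=None):
    """Illustrative stand-in for fixed-value imputation by channel."""
    if select is None:
        # Default: impute every channel.
        select = torch.arange(x.shape[-1])
    if fill.numel() != select.numel():
        raise ValueError("fill must hold one value per imputed channel")
    out = x.clone()  # leave the caller's tensor untouched
    chosen = out[..., select]  # indexing with a tensor returns a copy
    # fill broadcasts across all leading dimensions.
    chosen = torch.where(torch.isnan(chosen), fill.to(chosen.dtype), chosen)
    out[..., select] = chosen
    return out


nan = float("nan")
x = torch.tensor([[[1.0, nan, 3.0], [nan, 5.0, nan]]])  # (batch, time, channels)

# Impute channels 0 and 2 only; NaNs in channel 1 are left as they are.
y = replace_missing_sketch(x, torch.tensor([0.0, -1.0]), select=torch.tensor([0, 2]))
```

Here fill has length 2 because two channels are selected, matching the rule given for the fill parameter.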

torchtime.impute.forward_impute(input, fill=None, select=None)

Replace missing data with last observation carried forward.

Missing data (NaNs) are replaced by the previous observation in the channel.

If a channel begins with one or more NaNs, these are replaced with the corresponding value in fill (fill is only required in this case). All channels are imputed by default; to impute only a subset, pass the channel indices to select.

A common choice of fill is the mean of each channel in the training data. Because these values are fixed in advance and each gap is filled from an earlier observation, imputing a value at time i requires no knowledge of the time series at times t > i. This is essential if you are developing a model that will make online predictions.

Note

Only input tensors with 3 or fewer dimensions are currently supported. The final dimension must hold channel data.

Parameters:
  • input (Tensor) – The tensor to impute. The final dimension must hold channel data.

  • fill (Optional[Tensor]) – Fill values for each channel, in the same order as the data. fill must hold one value per imputed channel, i.e. its length must equal the number of channels in the data, or the length of select if only a subset is imputed.

  • select (Optional[Tensor]) – Indices for the channels to be imputed (by default all channels are imputed).

Return type:

Tensor

Returns:

Imputed time series.
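
Last observation carried forward can be sketched in plain PyTorch as follows. This is a simplified illustration rather than the library's implementation: the name forward_impute_sketch is hypothetical, the input is assumed to be a 3-dimensional tensor laid out as (batch, time, channels), and channel selection is omitted for brevity.

```python
import torch


def forward_impute_sketch(x, fill=None):
    """Illustrative LOCF for a (batch, time, channels) tensor."""
    out = x.clone()  # leave the caller's tensor untouched
    if fill is not None:
        # Seed a missing first time step with the per-channel fill value.
        first = out[:, 0, :]
        out[:, 0, :] = torch.where(torch.isnan(first), fill.to(out.dtype), first)
    # Walk forward in time, copying the previous step into each NaN.
    for t in range(1, out.shape[1]):
        current = out[:, t, :]
        out[:, t, :] = torch.where(torch.isnan(current), out[:, t - 1, :], current)
    return out


nan = float("nan")
x = torch.tensor([[[nan, 1.0], [2.0, nan], [nan, nan], [4.0, 3.0]]])

y = forward_impute_sketch(x, fill=torch.tensor([0.0, 9.0]))
no_fill = forward_impute_sketch(x)  # the leading NaN in channel 0 stays NaN
```

Each step reads only the current and previous time steps, so the sketch uses no information from later times. Without fill, a channel that starts with NaN keeps those leading NaNs, which is why fill is needed in that case.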