BaseITS package

Submodules

BaseITS.custom_transform module

class BaseITS.custom_transform.CustomTransform(columns: list, seasonally_adjusted: bool = True, var_name: str = 'month', nfreq: int = 2, period: int = 12, fit_intercept: bool = False)[source]

Bases: BaseEstimator, TransformerMixin

Class to transform a dataframe for Poisson regression by adding harmonic terms.

Parameters
  • BaseEstimator (Sklearn) – Base class for all estimators in scikit-learn.

  • TransformerMixin (Sklearn) – Mixin class for all transformers in scikit-learn.
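
The harmonic terms themselves are not spelled out in this reference. A common construction for seasonal regression, assumed here, adds one sine and one cosine column per frequency k = 1..nfreq, evaluated at 2πk·t/period. The sketch below uses only the standard library; `harmonic_terms` is a hypothetical helper and not part of BaseITS.

```python
import math


def harmonic_terms(t: int, nfreq: int = 2, period: int = 12) -> list:
    """Return [sin_1, cos_1, ..., sin_nfreq, cos_nfreq] for time index t.

    Assumption: one sine/cosine pair per frequency k, with angle
    2*pi*k*t/period. The package may order or scale its columns differently.
    """
    terms = []
    for k in range(1, nfreq + 1):
        angle = 2 * math.pi * k * t / period
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms


# Harmonic columns for months 1..12 using the documented defaults
# (nfreq=2, period=12), giving four extra columns per row.
rows = {month: harmonic_terms(month) for month in range(1, 13)}
```

With the defaults, each month gains four columns, and the values repeat every 12 months, which is what lets a regression capture yearly seasonality.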

fit(X: DataFrame, y: Optional[Series] = None)[source]

Function to fit the model

Parameters
  • X (_type_) – _description_

  • y (_type_, optional) – _description_. Defaults to None.

Returns

_description_

Return type

_type_

transform(X: DataFrame, y: Optional[Series] = None)[source]

Function to transform the variables

Parameters
  • X (pd.DataFrame) – Dataframe with the harmonic inputs

  • y (pd.Series, optional) – Series of the outcome variable. Defaults to None.

Returns

_description_

Return type

pd.DataFrame

BaseITS.metrics module

class BaseITS.metrics.Metrics[source]

Bases: object

Class to generate metrics from the outputs of the models.

Returns
  • mape_before – Mean absolute percentage error before the intervention
  • mape_after – Mean absolute percentage error after the intervention

  • actual_mean_before – Actual mean before the intervention
  • predicted_mean_before – Predicted mean before the intervention
  • actual_mean_after – Actual mean after the intervention
  • predicted_mean_after – Predicted mean after the intervention

  • actual_median_before – Actual median before the intervention
  • predicted_median_before – Predicted median before the intervention
  • actual_median_after – Actual median after the intervention
  • predicted_median_after – Predicted median after the intervention

  • mean_change_before – Mean change before the intervention
  • wilcoxon_change_before – Wilcoxon change before the intervention
  • mean_change_after – Mean change after the intervention
  • wilcoxon_change_after – Wilcoxon change after the intervention

  • change_conf_int_before – Confidence interval change before the intervention
  • change_conf_int_after – Confidence interval change after the intervention

  • mean_percent_change_before – Mean percentage change before the intervention
  • wilcoxon_percent_change_before – Wilcoxon percentage change before the intervention
  • mean_percent_change_after – Mean percentage change after the intervention
  • wilcoxon_percent_change_after – Wilcoxon percentage change after the intervention

  • percent_change_conf_int_before – Confidence interval percentage change before the intervention
  • percent_change_conf_int_after – Confidence interval percentage change after the intervention
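
To make two of these metrics concrete, here is a minimal sketch of a mean absolute percentage error and a mean percentage change. The exact formulas used by `Metrics` are not given in this reference, so the definitions below are assumptions: values are expressed in percent, the change is measured relative to the predicted value, and zero denominators are skipped. The function names and data are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumption: pairs with a zero actual value are skipped to avoid
    division by zero; the package may handle them differently.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values")
    return 100 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


def mean_percent_change(actual, predicted):
    """Mean of (actual - predicted) / predicted, in percent."""
    changes = [100 * (a - p) / p for a, p in zip(actual, predicted) if p != 0]
    if not changes:
        raise ValueError("no non-zero predicted values")
    return sum(changes) / len(changes)


# Hypothetical monthly counts after an intervention.
actual_after = [80, 90, 100]
predicted_after = [100, 100, 100]

mape_after = mape(actual_after, predicted_after)
change_after = mean_percent_change(actual_after, predicted_after)
```

In this toy series the observed counts sit below the forecast, so the mean percentage change is negative, which is the kind of signal an interrupted time series analysis looks for after the intervention date.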

get_forecast_metrics(location: str, outcome: str, forecast: {}, prediction_start_date: str, prediction_end_date: str)[source]

Function to get the metrics from the forecast.

Parameters
  • location (str) – Geographical unit

  • outcome (str) – Outcome measure

  • forecast (dict) – dictionary with the prophet forecast output for the geographical unit and outcome

  • prediction_start_date (datetime) – Prediction start date

  • prediction_end_date (datetime) – Prediction end date

Returns

Dictionary with generated metrics

Return type

dict

BaseITS.model_tuning module

class BaseITS.model_tuning.ModelTuning(cutoff_start: str = '2019-02-28', cutoff_end: str = '2019-10-31', param_grid: dict = {'changepoint_prior_scale': [0.001, 0.05], 'seasonality_mode': ['additive', 'multiplicative'], 'seasonality_prior_scale': [0.1, 10.0]})[source]

Bases: object

Class for tuning the hyperparameters of the Prophet model. Tuning is not implemented for Poisson regression because it is a basic regression model.

Parameters
  • cutoff_start (str, optional) – Start date for the tuning data. Defaults to “2019-02-28”.

  • cutoff_end (str, optional) – end date for tuning data. Defaults to “2019-10-31”.

  • param_grid (dict, optional) – Dictionary with the parameters to be tuned. Defaults to {“changepoint_prior_scale”: [0.001, 0.05], “seasonality_prior_scale”: [0.1, 10.0], “seasonality_mode”: [“additive”, “multiplicative”]}.
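
How the grid is searched is not documented here. A typical grid search, assumed in this sketch, evaluates every combination of the listed values, i.e. the Cartesian product of the grid. The snippet only enumerates the candidates; fitting and scoring each one is left out.

```python
from itertools import product

# Default grid copied from the ModelTuning signature above.
param_grid = {
    "changepoint_prior_scale": [0.001, 0.05],
    "seasonality_prior_scale": [0.1, 10.0],
    "seasonality_mode": ["additive", "multiplicative"],
}

# One dict per combination of values: the Cartesian product of the grid.
names = list(param_grid)
candidates = [
    dict(zip(names, values)) for values in product(*param_grid.values())
]
```

The default grid therefore yields 2 × 2 × 2 = 8 candidate parameter sets, so adding values to any list multiplies the number of model fits.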

tune_hyperparameters(df: DataFrame, param_grid: Optional[dict] = None)[source]

Function to tune the hyperparameters

Parameters
  • df (pd.DataFrame) – Dataframe with the data to be tuned

  • param_grid (dict, optional) – Parameters to be tuned. If None, the grid provided in __init__() is used. Defaults to None.

Returns

Dataframe with the optimal parameters.

Return type

pd.DataFrame

BaseITS.plotting module

class BaseITS.plotting.Plotting(intervention_end_date: str, forecast: {}, data: DataFrame, outcome_labels: {} = {}, file_path: str = 'plots/')[source]

Bases: object

Class to handle plotting, given a fixed expectation of inputs. This class is an add-on; users can implement more plotting functions to visualize their data and outputs.

plot_sphaghetti(...)[source]

Line plot with all the provided outcomes and locations.

plot_cumulative(...)[source]

Cumulative plots.

plot_count_its(...)[source]

Counts plots.

plot_percent_change(...)[source]

Percentage plots.

plot_diff(...)[source]

Difference of values plots.

plot_boxplots(...)[source]

Boxplots

plots_metrics_distribution(...)[source]

Kernel Density Estimate plot

plot_boxplots(x: str, y: str, title: str)[source]

Function to plot boxplots

Parameters
  • x (str) – date column

  • y (str) – data value column to be plotted

  • title (str) – title of plot

plot_count_its()[source]

Function to plot the forecast counts.

Parameters
  • unit (str) – Geographical unit

  • outcome (str) – Outcome measure

  • forecast (pd.DataFrame) – Dataframe with the prophet forecast output for the geographical unit and outcome

  • prediction_start_date (datetime) – Prediction start date

  • prediction_end_date (datetime) – Prediction end date

  • normalise (bool) – Data normalised

plot_cumulative()[source]

Function to plot the cumulative forecast.

Parameters
  • unit (str) – Geographical/Organizational unit

  • outcome (str) – Outcome measure

  • forecast (pd.DataFrame) – Dataframe with the prophet forecast output for the geographical/organizational unit and outcome

  • prediction_start_date (datetime) – Prediction start date

  • prediction_end_date (datetime) – Prediction end date

  • normalise (bool) – Data normalised

plot_cv_metric(location: str, outcome: str, df_cv: DataFrame, prediction_start_date: str, outcome_labels: dict)[source]

Function to plot the cross-validation metrics.

Parameters
  • location (str) – Geographical unit

  • outcome (str) – Outcome measure

  • df_cv (pd.DataFrame) – Dataframe with date and the outcome value for the geographical unit and outcome.

  • seasonality_mode (str) – tuned prophet model parameter

  • changepoint_prior_scale (float) – tuned prophet model parameter

  • seasonality_prior_scale (float) – tuned prophet model parameter

  • cutoff_start (datetime) – Prediction start date

plot_diff(dataset: DataFrame, prediction_start_date: str)[source]

Function to plot the difference of the outcomes and the predicted values

Parameters
  • data (pd.DataFrame) – Dataframe with the metrics from the forecast

  • prediction_start_date (str) – Prediction start date

plot_percent_change(dataset: DataFrame, prediction_start_date: str)[source]

Function to plot the percentage change of the outcomes and the predicted values

Parameters
  • data (pd.DataFrame) – Dataframe with the selected metrics from the forecast

  • prediction_start_date (str) – Prediction start date

plot_sphaghetti(id_var: str, x_var: str, y_var: str, title: str)[source]

Function to draw a spaghetti plot.

Parameters
  • data (pd.DataFrame) – data in wide format

  • id_var (str) – column name of the unique ids of individuals, e.g. Regions, Districts

  • x_var (str) – name of the x variable, i.e. the date column name

  • y_var (str) – name of the y variable, i.e. the column name of the outcome measure to be plotted, e.g. Diabetes

plots_metrics_distribution(prediction_start_date: str, prediction_end_date: str)[source]

Function to plot the distribution of the metrics before and after the interruption

Parameters
  • prediction_start_date (str) – Prediction start date

  • prediction_end_date (str) – Prediction end date

BaseITS.poisson_regression module

class BaseITS.poisson_regression.PoissonITS(fit_intercept=True)[source]

Bases: BaseEstimator

A scikit-learn-style wrapper class around statsmodels’ Poisson regression.

Parameters
  • fit_intercept – boolean, default= True

  • fitted – boolean, parameter to indicate if intercept should be fitted.

fit(...)[source]
predict(...)[source]
fit_predict(...)[source]
summary(...)[source]
fit(X: DataFrame, y: Series, offset: Optional[float] = None)[source]

A reference implementation of a fitting function.

Parameters
  • X – {array-like, sparse matrix}, shape (n_samples, n_features) The training input samples.

  • y – array-like, shape (n_samples,) or (n_samples, n_outputs) The target values (class labels in classification, real numbers in regression).

  • offset – _description_. Defaults to None.

fit_predict()[source]
predict(X: DataFrame, prediction_df: DataFrame, factor: int = 1)[source]

A reference implementation of a predicting function.

Parameters
  • X – {array-like, sparse matrix}, shape (n_samples, n_features) The training input samples transformed earlier.

  • prediction_df (pd.DataFrame) – Prediction dataframe before transformation.

  • factor (int) – value to standardize the data

Returns

y – Returns an array of ones.

Return type

ndarray, shape (n_samples,)

summary()[source]

BaseITS.pre_processing module

BaseITS.pre_processing.aggregation_long_df_type(df: DataFrame, location_col_name: str, date_col_name: str, outcome_col_name: str, outcome_value_col_name: str)[source]

Function to aggregate outcome values in a long dataframe type based on the date, outcome and location

Parameters
  • df (pd.DataFrame) – Long dataframe with the data

  • location_col_name (str) – column name of the locations in the dataframe

  • date_col_name (str) – column name of the date in the dataframe

  • outcome_col_name (str) – column name of the outcome in the dataframe

  • outcome_value_col_name (str) – column name of the outcome values in the dataframe

Returns

Dataframe with aggregated counts per location, date and outcome

Return type

pd.DataFrame
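
The aggregation can be pictured as a group-by over the (location, date, outcome) key. The sketch below assumes the values are summed, which matches "aggregated counts" but is not confirmed by this reference; the records and column names are hypothetical, and plain dictionaries stand in for a dataframe so the example runs with the standard library alone.

```python
from collections import defaultdict

# Hypothetical long-format records: one row per reported value.
rows = [
    {"location": "North", "date": "2019-01-31", "outcome": "Diabetes", "value": 4},
    {"location": "North", "date": "2019-01-31", "outcome": "Diabetes", "value": 6},
    {"location": "South", "date": "2019-01-31", "outcome": "Diabetes", "value": 3},
]

# Sum the values that share the same (location, date, outcome) key.
totals = defaultdict(int)
for row in rows:
    key = (row["location"], row["date"], row["outcome"])
    totals[key] += row["value"]

# Rebuild one record per key, sorted for a stable order.
aggregated = [
    {"location": loc, "date": date, "outcome": outcome, "value": value}
    for (loc, date, outcome), value in sorted(totals.items())
]
```

The two "North" rows collapse into a single record, leaving one row per location, date and outcome.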

BaseITS.pre_processing.aggregation_wide_df_type(df: DataFrame, location_col_name: str, date_col_name: str, outcome_cols: list)[source]

Function to aggregate outcome values in a wide dataframe type based on the date, outcome and location

Parameters
  • df (pd.DataFrame) – Wide dataframe type

  • location_col_name (str) – column name of the location in the dataframe

  • date_col_name (str) – date column name in the dataframe

  • outcome_cols (list) – list of the outcome column names

Returns

Dataframe with aggregated counts per location, date and outcome

Return type

pd.DataFrame

BaseITS.pre_processing.align_prophet_naming_convection(df: DataFrame, date_col_name: str, y_col_name: str, verbose=False)[source]

Function to align column names with ones expected by prophet model

Parameters
  • df (pd.DataFrame) – dataframe with the columns

  • date_col_name (str) – date column

  • y_col_name (str) – outcome column

Returns

dataframe with columns renamed to the naming convention expected by prophet

Return type

pd.DataFrame
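
Prophet reads the dates from a column named `ds` and the series to forecast from a column named `y`. The renaming can be sketched as below; the records and the original column names (`period`, `cases`) are hypothetical, and plain dictionaries stand in for a dataframe.

```python
# Hypothetical records with project-specific column names.
records = [
    {"period": "2019-01-31", "cases": 12},
    {"period": "2019-02-28", "cases": 15},
]

# Map the date column to "ds" and the outcome column to "y".
rename = {"period": "ds", "cases": "y"}
prophet_records = [
    {rename.get(name, name): value for name, value in row.items()}
    for row in records
]
```

With pandas, the equivalent rename is `df.rename(columns=rename)`.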

BaseITS.pre_processing.create_log_offset(df: DataFrame, ofset_column: str)[source]

Create the log offset for the Poisson regression forecast model.

Parameters
  • df (pd.DataFrame) – dataframe with the offset column

  • ofset_column (str) – column of the offset

Returns

calculated offset

Return type

pd.Series
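
In Poisson regression an offset is usually the natural log of an exposure, such as the population behind each count, and it enters the linear predictor with a fixed coefficient of 1 so that the model describes rates rather than raw counts. The sketch below illustrates that relationship under this assumption; the numbers are hypothetical and the code is not the package's implementation.

```python
import math

# Hypothetical exposure column, e.g. the population behind each monthly count.
population = [1000, 2000, 4000]

# The offset is the natural log of the exposure.
log_offset = [math.log(p) for p in population]

# With an offset, log(E[y]) = linear_predictor + log(exposure),
# so the expected count equals exposure * rate.
linear_predictor = math.log(0.01)  # a rate of 1 event per 100 people
expected_counts = [math.exp(linear_predictor + o) for o in log_offset]
```

Doubling the exposure doubles the expected count while the underlying rate stays fixed, which is why the offset matters when locations differ in size.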

BaseITS.pre_processing.dates_validation(df: DataFrame, date_col_name: str)[source]

Function to validate dates and convert them to datetime format

Parameters
  • df (pd.DataFrame) – Dataframe with the data

  • date_col_name (str) – column with the dates

Returns

The date_col_name column as a Series with datetime datatype

Return type

pd.Series

BaseITS.pre_processing.str_date_validate(date_text: str)[source]

Function to validate that a string is in the correct datetime format for conversion.

Parameters

date_text (str) – String with the date

Raises

ValueError – Raised if the string is not in the expected date format

Returns

Datetime value converted using the format ‘%Y-%m-%d’

Return type

datetime
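
A validator with this contract can be sketched with `datetime.strptime`, which raises `ValueError` when the text does not match the format. `validate_date_string` is a hypothetical stand-in, and the error message wording is an assumption.

```python
from datetime import datetime


def validate_date_string(date_text: str) -> datetime:
    """Parse a 'YYYY-MM-DD' string, raising ValueError on any other format."""
    try:
        return datetime.strptime(date_text, "%Y-%m-%d")
    except ValueError as exc:
        raise ValueError(
            f"Incorrect date format {date_text!r}, expected YYYY-MM-DD"
        ) from exc


parsed = validate_date_string("2019-02-28")
```

A string such as "28/02/2019" fails the same call with a `ValueError`, so callers can catch malformed dates before any model is fitted.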

BaseITS.prophet_model module

class BaseITS.prophet_model.ProphetITS(df=None)[source]

Bases: object

A wrapper class that uses Prophet (https://pypi.org/project/prophet/) to forecast.

fit(df: DataFrame, seasonality_mode: str = 'additive', seasonality_prior_scale: float = 0.5, changepoint_prior_scale: float = 0.5)[source]

Fit function of the wrapper Prophet class

Parameters
  • df (pd.DataFrame) – DataFrame with the training data.

  • seasonality_mode (str, optional) – Seasonality mode of the data. Defaults to “additive”.

  • seasonality_prior_scale (float, optional) – seasonality prior scale that has been optimised for the dataset. Defaults to 0.5.

  • changepoint_prior_scale (float, optional) – changepoint prior scale that has been optimised for the dataset. Defaults to 0.5.

Returns

instance of the class that is fitted using the prophet model.

Return type

ProphetITS

fit_predict(df: DataFrame, seasonality_mode='additive', seasonality_prior_scale=0.5, changepoint_prior_scale=0.5)[source]
predict(df: DataFrame)[source]

A reference implementation of a predicting function.

Parameters

df (pd.DataFrame) – DataFrame with the testing data. Both X and y.

Returns

forecast – A dataframe with the forecast

Return type

pd.DataFrame

summary()[source]

BaseITS.utils module

BaseITS.utils.check_dataset_format(df: DataFrame, outcomes: list, col_name: str = 'trial')[source]

Function to check the dataset format. Not used as of now. Only long dataframes are accepted in V1.

Parameters
  • df (pd.DataFrame) – DataFrame with the data

  • outcomes (list) – outcomes to be used

  • col_name (str, optional) – column name provided in the init. Defaults to “trial”.

Raises

Exception – Raised if the outcomes are not in the dataframe

Returns

dataset_format (long,wide)

Return type

str

BaseITS.utils.extract_inputs(data: dict)[source]

Function to extract inputs

Parameters

data (dict) – Dictionary with the looped results.

Returns

tuple with the keys of the dictionary

Return type

tuple

BaseITS.utils.save_plots(file_path: str, plot_name: str)[source]

Function to save plots

Parameters
  • file_path (str) – File path

  • plot_name (str) – name of the plot

BaseITS.wrapper_class module

class BaseITS.wrapper_class.BaseITS(outcome: Optional[list] = None, location: Optional[list] = None, interruption_date: Optional[list] = None, model: Optional[list] = None, verbose: bool = False)[source]

Bases: object

This class assumes that the data is already pre-processed to the required format. If it is not, use the functions in pre_processing and custom_transform to pre-process the data. The class supports wide-format datasets only. Refer to the README for how to structure your dataset.

Parameters
  • outcome (list, optional) – List of the outcome labels. Defaults to None.

  • location (list, optional) – List of the location labels. Defaults to None.

  • interruption_date (list, optional) – List of the interruption dates. Defaults to None.

  • model (list, optional) – List of the model labels. Defaults to None.

  • verbose (bool, optional) – Boolean variable to log outputs. Defaults to False.

Methods
  • fit(df: pd.DataFrame, X: pd.Series, y: pd.Series, offset: float = None)

  • predict(df: pd.DataFrame, X: pd.Series, y: pd.Series, offset: float = None)

  • fit_predict(df: pd.DataFrame, X: pd.Series, y: pd.Series)

fit(df: DataFrame, X: Series, y: Series, offset: Optional[float] = None)[source]

Function called by the user to fit their models (prophet or poisson).

Parameters
  • df (pd.DataFrame) – DataFrame with the data

  • offset (float, optional) – _description_. Defaults to None.

Returns

Fitted object of the class.

Return type

BaseITS

fit_predict(df: DataFrame, X: Series, y: Series)[source]

Function to fit and predict in one call using the prophet model. This function does not work for Poisson regression because the data must first be pre-processed with the custom-transform class. TODO: Implement a check for whether the user has already pre-processed the Poisson data.

Parameters

df (pd.DataFrame) – Dataset

Raises
  • NotImplementedError – Raises this error if user tries to use the poisson regression model

  • ValueError – Raised if the user-provided model is not in the list [prophet, poisson]

Returns

DataFrame with the forecasted results.

Return type

pd.DataFrame

pool_fit_predict(df: DataFrame, num_threads: int = 12)[source]

Private function to fit and predict more than one instance of outcomes, locations, models and intervention dates

Parameters
  • num_threads (int) – number of threads to use, e.g. the CPU count. Defaults to 12.

  • df (pd.DataFrame) – Dataset

Returns

DataFrame with the forecast results

Return type

pd.DataFrame
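
The fan-out over several outcomes, locations and models can be sketched as below. How the package parallelises this work is not documented here, so the use of `ThreadPoolExecutor` is an assumption, and `fit_predict_one` is a hypothetical stand-in that returns a label instead of a real forecast.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def fit_predict_one(task):
    """Stand-in for fitting one model; returns a label instead of a forecast."""
    outcome, location, model = task
    return f"{model}:{location}:{outcome}"


# Hypothetical inputs mirroring the BaseITS constructor arguments.
outcomes = ["Diabetes", "Malaria"]
locations = ["North", "South"]
models = ["prophet"]

# One task per (outcome, location, model) combination.
tasks = list(product(outcomes, locations, models))

# map() preserves task order, so results line up with tasks.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fit_predict_one, tasks))
```

Each combination is independent of the others, which is what makes this kind of workload straightforward to spread across workers.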

predict(df: DataFrame, X: Series, y: Series, offset: Optional[float] = None)[source]

Function used to forecast using the previously fitted models

Parameters
  • df (pd.DataFrame) – Dataset with the data to be used

  • offset (float, optional) – _description_. Defaults to None.

summary()[source]

Function to generate the summary of the models

Returns

summary returned by the model

Return type

_type_

Module contents