BaseITS package
Submodules
BaseITS.custom_transform module
- class BaseITS.custom_transform.CustomTransform(columns: list, seasonally_adjusted: bool = True, var_name: str = 'month', nfreq: int = 2, period: int = 12, fit_intercept: bool = False)[source]
Bases:
BaseEstimator, TransformerMixin
Class to transform a dataframe for Poisson regression by adding harmonic terms.
- Parameters
BaseEstimator (Sklearn) – Base class for all estimators in scikit-learn.
TransformerMixin (Sklearn) – Mixin class for all transformers in scikit-learn.
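The harmonic terms can be illustrated with a small standalone sketch. It assumes that adding harmonics means appending one sine and one cosine feature of the `var_name` column for each of `nfreq` frequencies over `period`. The function name `harmonic_terms` and the feature names are hypothetical, and this is not the package's implementation.

```python
import math


def harmonic_terms(month: int, nfreq: int = 2, period: int = 12) -> dict:
    """Return sine/cosine features for one value of the seasonal variable.

    Assumption: harmonic k uses the angle 2 * pi * k * month / period.
    """
    features = {}
    for k in range(1, nfreq + 1):
        angle = 2 * math.pi * k * month / period
        features[f"sin_{k}"] = math.sin(angle)
        features[f"cos_{k}"] = math.cos(angle)
    return features


# Features for March (month=3) with nfreq=2 and period=12.
march = harmonic_terms(3)
```

With two frequencies, each row gains four extra columns, which lets a linear predictor follow a smooth yearly cycle.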
BaseITS.metrics module
- class BaseITS.metrics.Metrics[source]
Bases:
object
Class to generate metrics from the outputs of the models.
- Returns
mape_before: Mean absolute percentage error before the intervention
mape_after: Mean absolute percentage error after the intervention
actual_mean_before: Actual mean before the intervention
predicted_mean_before: Predicted mean before the intervention
actual_mean_after: Actual mean after the intervention
predicted_mean_after: Predicted mean after the intervention
actual_median_before: Actual median before the intervention
predicted_median_before: Predicted median before the intervention
actual_median_after: Actual median after the intervention
predicted_median_after: Predicted median after the intervention
mean_change_before: Mean change before the intervention
wilcoxon_change_before: Wilcoxon change before the intervention
mean_change_after: Mean change after the intervention
wilcoxon_change_after: Wilcoxon change after the intervention
change_conf_int_before: Confidence interval change before the intervention
change_conf_int_after: Confidence interval change after the intervention
mean_percent_change_before: Mean percentage change before the intervention
wilcoxon_percent_change_before: Wilcoxon percentage change before the intervention
mean_percent_change_after: Mean percentage change after the intervention
wilcoxon_percent_change_after: Wilcoxon percentage change after the intervention
percent_change_conf_int_before: Confidence interval percentage change before the intervention
percent_change_conf_int_after: Confidence interval percentage change after the intervention
- get_forecast_metrics(location: str, outcome: str, forecast: {}, prediction_start_date: str, prediction_end_date: str)[source]
Function to get the metrics from the forecast.
- Parameters
location (str) – Geographical unit
outcome (str) – Outcome measure
forecast (dict) – dictionary with the prophet forecast output for the geographical unit and outcome
prediction_start_date (str) – Prediction start date
prediction_end_date (str) – Prediction end date
- Returns
Dictionary with generated metrics
- Return type
dict
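As an illustration of the before/after split behind metrics such as mape_before and mape_after, here is a minimal sketch. The records, the `mape` helper, and the choice to skip zero actual values are assumptions made for the example. The package's own formulas are not shown in this reference.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent, skipping zero actual values."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


# Hypothetical monthly records: (date, actual, predicted).
records = [
    ("2019-11-30", 100, 110),
    ("2019-12-31", 200, 190),
    ("2020-01-31", 100, 150),
    ("2020-02-29", 50, 100),
]
intervention = "2020-01-01"  # ISO date strings compare chronologically

# Split the records at the intervention date, then score each side separately.
before = [r for r in records if r[0] < intervention]
after = [r for r in records if r[0] >= intervention]

mape_before = mape([r[1] for r in before], [r[2] for r in before])
mape_after = mape([r[1] for r in after], [r[2] for r in after])
```

A much larger error after the intervention than before it suggests that the observed values departed from what the model expected.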
BaseITS.model_tuning module
- class BaseITS.model_tuning.ModelTuning(cutoff_start: str = '2019-02-28', cutoff_end: str = '2019-10-31', param_grid: dict = {'changepoint_prior_scale': [0.001, 0.05], 'seasonality_mode': ['additive', 'multiplicative'], 'seasonality_prior_scale': [0.1, 10.0]})[source]
Bases:
object
Class for tuning hyperparameters of the Prophet model. There is no implementation for Poisson regression because it is a basic linear regression model.
- Parameters
cutoff_start (str, optional) – start date for tuning data. Defaults to “2019-02-28”.
cutoff_end (str, optional) – end date for tuning data. Defaults to “2019-10-31”.
param_grid (dict, optional) – Dictionary with the parameters to be tuned. Defaults to { “changepoint_prior_scale”: [0.001, 0.05], “seasonality_prior_scale”: [0.1, 10.0], “seasonality_mode”: [“additive”, “multiplicative”], }.
- tune_hyperparameters(df: DataFrame, param_grid: Optional[dict] = None)[source]
Function to tune the hyperparameters
- Parameters
df (pd.DataFrame) – Dataframe with the data to be tuned
param_grid (dict, optional) – Parameters to be tuned. If None, the grid provided in __init__() is used. Defaults to None.
- Returns
Dataframe with the optimal parameters.
- Return type
pd.DataFrame
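To show what a grid like the default `param_grid` expands to, the sketch below enumerates every combination of values. It assumes an exhaustive grid search, which this reference does not confirm. The variable name `all_params` is hypothetical.

```python
from itertools import product

# The default grid from the ModelTuning signature.
param_grid = {
    "changepoint_prior_scale": [0.001, 0.05],
    "seasonality_prior_scale": [0.1, 10.0],
    "seasonality_mode": ["additive", "multiplicative"],
}

# One dict per combination of values, keyed by parameter name.
all_params = [
    dict(zip(param_grid.keys(), values))
    for values in product(*param_grid.values())
]
```

Each dict in `all_params` could then be passed as keyword arguments to a model and scored, keeping the combination with the lowest error.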
BaseITS.plotting module
- class BaseITS.plotting.Plotting(intervention_end_date: str, forecast: {}, data: DataFrame, outcome_labels: {} = {}, file_path: str = 'plots/')[source]
Bases:
object
Class to handle plotting given a fixed expectation of inputs. This class is an add-on; users can implement more plotting functions to visualize their data and outputs.
- plot_boxplots(x: str, y: str, title: str)[source]
Function to plot boxplots
- Parameters
x (str) – date column
y (str) – data value column to be plotted
title (str) – title of plot
- plot_count_its()[source]
Function to plot the forecast counts.
- Parameters
unit (str) – Geographical unit
outcome (str) – Outcome measure
forecast (pd.DataFrame) – Dataframe with the prophet forecast output for the geographical unit and outcome
prediction_start_date (datetime) – Prediction start date
prediction_end_date (datetime) – Prediction end date
normalise (bool) – Data normalised
- plot_cumulative()[source]
Function to plot the cumulative forecast.
- Parameters
unit (str) – Geographical/Organizational unit
outcome (str) – Outcome measure
forecast (pd.DataFrame) – Dataframe with the prophet forecast output for the geographical/organizational unit and outcome
prediction_start_date (datetime) – Prediction start date
prediction_end_date (datetime) – Prediction end date
normalise (bool) – Data normalised
- plot_cv_metric(location: str, outcome: str, df_cv: DataFrame, prediction_start_date: str, outcome_labels: dict)[source]
Function to plot the cross-validation metrics.
- Parameters
location (str) – Geographical unit
outcome (str) – Outcome measure
df_cv (pd.DataFrame) – Dataframe with date and the outcome value for the geographical unit and outcome
seasonality_mode (str) – tuned prophet model parameter
changepoint_prior_scale (float) – tuned prophet model parameter
seasonality_prior_scale (float) – tuned prophet model parameter
cutoff_start (datetime) – Prediction start date
- plot_diff(dataset: DataFrame, prediction_start_date: str)[source]
Function to plot the difference of the outcomes and the predicted values
- Parameters
data (pd.DataFrame) – Dataframe with the metrics from the forecast
prediction_start_date (str) – Prediction start date
- plot_percent_change(dataset: DataFrame, prediction_start_date: str)[source]
Function to plot the percentage change of the outcomes and the predicted values
- Parameters
data (pd.DataFrame) – Dataframe with the selected metrics from the forecast
prediction_start_date (str) – Prediction start date
- plot_sphaghetti(id_var: str, x_var: str, y_var: str, title: str)[source]
Function to plot a spaghetti plot.
- Parameters
data (pd.DataFrame) – data in wide format
id_var (str) – column name of the unique ids of individuals, e.g. Regions, Districts
x_var (str) – name of the x variable, e.g. the date column name
y_var (str) – name of the y variable, i.e. the column name of the outcome measure to be plotted, e.g. Diabetes
BaseITS.poisson_regression module
- class BaseITS.poisson_regression.PoissonITS(fit_intercept=True)[source]
Bases:
BaseEstimator
A scikit-learn wrapper class for statsmodels’ Poisson regression.
- Parameters
fit_intercept (bool) – Parameter to indicate if the intercept should be fitted. Defaults to True.
- fit(X: DataFrame, y: Series, offset: Optional[float] = None)[source]
A reference implementation of a fitting function.
- Parameters
X – {array-like, sparse matrix}, shape (n_samples, n_features) The training input samples.
y – array-like, shape (n_samples,) or (n_samples, n_outputs) The target values (class labels in classification, real numbers in regression).
offset (float, optional) – Offset for the Poisson regression model. Defaults to None.
- predict(X: DataFrame, prediction_df: DataFrame, factor: int = 1)[source]
A reference implementation of a predicting function.
- Parameters
X – (array-like, sparse matrix), shape (n_samples, n_features) The training input samples transformed earlier.
prediction_df (pd.DataFrame) – Prediction dataframe before transformation.
factor (int) – value to standardize the data
- Returns
y – Returns an array of ones.
- Return type
ndarray, shape (n_samples,)
BaseITS.pre_processing module
- BaseITS.pre_processing.aggregation_long_df_type(df: DataFrame, location_col_name: str, date_col_name: str, outcome_col_name: str, outcome_value_col_name: str)[source]
Function to aggregate outcome values in a long dataframe type based on the date, outcome and location
- Parameters
df (pd.DataFrame) – Long dataframe with the data
location_col_name (str) – column name of the locations in the dataframe
date_col_name (str) – column name of the date in the dataframe
outcome_col_name (str) – column name of the outcome in the dataframe
outcome_value_col_name (str) – column name of the outcome values in the dataframe
- Returns
Dataframe with aggregated counts per location, date and outcome
- Return type
pd.DataFrame
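The kind of aggregation described for a long dataframe can be sketched with a plain pandas group-by. This assumes the aggregation is a sum of the outcome values. The column names and data are hypothetical, and the function's actual internals may differ.

```python
import pandas as pd

# Hypothetical long-format data: one row per reported value.
df = pd.DataFrame(
    {
        "location": ["A", "A", "A", "B"],
        "date": ["2020-01-31", "2020-01-31", "2020-02-29", "2020-01-31"],
        "outcome": ["diabetes", "diabetes", "diabetes", "diabetes"],
        "value": [3, 4, 5, 6],
    }
)

# Sum the outcome values for each (location, date, outcome) combination.
aggregated = df.groupby(["location", "date", "outcome"], as_index=False)["value"].sum()
```

The two rows reported for location A on 2020-01-31 collapse into a single row, so each location, date and outcome appears once.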
- BaseITS.pre_processing.aggregation_wide_df_type(df: DataFrame, location_col_name: str, date_col_name: str, outcome_cols: list)[source]
Function to aggregate outcome values in a wide dataframe type based on the date, outcome and location
- Parameters
df (pd.DataFrame) – Wide dataframe type
location_col_name (str) – column name of the location in the dataframe
date_col_name (str) – date column name in the dataframe
outcome_cols (list) – list of the outcome column names
- Returns
Dataframe with aggregated counts per location, date and outcome
- Return type
pd.DataFrame
- BaseITS.pre_processing.align_prophet_naming_convection(df: DataFrame, date_col_name: str, y_col_name: str, verbose=False)[source]
Function to align column names with the ones expected by the Prophet model
- Parameters
df (pd.DataFrame) – dataframe with the columns
date_col_name (str) – date column
y_col_name (str) – outcome column
- Returns
dataframe with columns renamed to the expected prophet naming convention
- Return type
pd.DataFrame
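Prophet expects the date column to be named `ds` and the value column to be named `y`. The renaming can be sketched as follows. The input column names are hypothetical, and the conversion to datetime is an extra step assumed for the example.

```python
import pandas as pd

# Hypothetical input with a date column and one outcome column.
df = pd.DataFrame({"date": ["2020-01-31", "2020-02-29"], "diabetes": [7, 5]})

# Rename to Prophet's expected column names and parse the dates.
prophet_df = df.rename(columns={"date": "ds", "diabetes": "y"})
prophet_df["ds"] = pd.to_datetime(prophet_df["ds"])
```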
- BaseITS.pre_processing.create_log_offset(df: DataFrame, ofset_column: str)[source]
Create the log offset for the Poisson regression forecast model
- Parameters
df (pd.DataFrame) – dataframe with the offset column
ofset_column (str) – column of the offset
- Returns
calculated offset
- Return type
pd.Series
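A short sketch of why a log offset is used: in a Poisson regression with a log link, the offset is added to the linear predictor, so the expected count scales with the exposure. The population figures and the `expected_count` helper are hypothetical. The sketch also assumes the offset is the natural log of the offset column.

```python
import math

# Hypothetical exposure (e.g. population) for three locations.
population = [1000, 2000, 4000]
offset = [math.log(p) for p in population]


def expected_count(linear_predictor: float, log_offset: float) -> float:
    """Expected count under a log link: exp(linear predictor + offset)."""
    return math.exp(linear_predictor + log_offset)


# The same underlying rate (1 per 100) gives counts proportional to exposure.
log_rate = math.log(0.01)
counts = [expected_count(log_rate, o) for o in offset]
```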
- BaseITS.pre_processing.dates_validation(df: DataFrame, date_col_name: str)[source]
Function to validate dates to datetime format
- Parameters
df (pd.DataFrame) – Dataframe with the data
date_col_name (str) – column with the dates
- Returns
Series with the date_col_name with datetime datatype
- Return type
pd.Series
- BaseITS.pre_processing.str_date_validate(date_text: str)[source]
Function to validate that a string is in the correct datetime format for conversion.
- Parameters
date_text (str) – String with the date
- Raises
ValueError – Raised if a string in the wrong date format is provided
- Returns
Datetime converted value in the format (‘%Y-%m-%d’)
- Return type
datetime
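The validation described here can be sketched with the standard library. The helper name `validate_date_string` and the error message are hypothetical; only the ‘%Y-%m-%d’ format and the ValueError come from the description above.

```python
from datetime import datetime


def validate_date_string(date_text: str) -> datetime:
    """Parse a 'YYYY-MM-DD' string, raising ValueError on any other format."""
    try:
        return datetime.strptime(date_text, "%Y-%m-%d")
    except ValueError as exc:
        raise ValueError(
            f"Incorrect date format for {date_text!r}; expected YYYY-MM-DD"
        ) from exc


parsed = validate_date_string("2019-02-28")
```

A string such as "28/02/2019" would raise ValueError instead of returning a value.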
BaseITS.prophet_model module
- class BaseITS.prophet_model.ProphetITS(df=None)[source]
Bases:
object
A wrapper class that uses Prophet (https://pypi.org/project/prophet/) to forecast.
- fit(df: DataFrame, seasonality_mode: str = 'additive', seasonality_prior_scale: float = 0.5, changepoint_prior_scale: float = 0.5)[source]
Fit function of the wrapper Prophet class
- Parameters
df (pd.DataFrame) – DataFrame with the training data.
seasonality_mode (str, optional) – Seasonality mode experienced by the data. Defaults to “additive”.
seasonality_prior_scale (float, optional) – seasonality prior scale that has been optimised for the dataset. Defaults to 0.5.
changepoint_prior_scale (float, optional) – changepoint prior scale that has been optimised for the dataset. Defaults to 0.5.
- Returns
instance of the class that is fitted using the prophet model.
- Return type
self(ProphetITS)
- fit_predict(df: DataFrame, seasonality_mode='additive', seasonality_prior_scale=0.5, changepoint_prior_scale=0.5)[source]
BaseITS.utils module
- BaseITS.utils.check_dataset_format(df: DataFrame, outcomes: list, col_name: str = 'trial')[source]
Function to check dataset format. Not used as of now. Only long dataframe accepted in V1
- Parameters
df (pd.DataFrame) – DataFrame with the data
outcomes (list) – outcomes to be used
col_name (str, optional) – column name provided in the init. Defaults to “trial”.
- Raises
Exception – Exception raised if the outcomes are not in the dataframe
- Returns
dataset_format (long,wide)
- Return type
str
BaseITS.wrapper_class module
- class BaseITS.wrapper_class.BaseITS(outcome: Optional[list] = None, location: Optional[list] = None, interruption_date: Optional[list] = None, model: Optional[list] = None, verbose: bool = False)[source]
Bases:
object
This class assumes that the data is already pre-processed to the required format. If it is not, the user should use the functions in pre_processing and custom_transform to pre-process their data. The class supports wide-format datasets only. Refer to the readme on how to structure your dataset.
- Parameters
outcome (list, optional) – List of the outcome labels. Defaults to None.
location (list, optional) – List of the location labels. Defaults to None.
interruption_date (list, optional) – List of the interruption dates. Defaults to None.
model (list, optional) – List of the model labels. Defaults to None.
verbose (bool, optional) – Boolean variable to log outputs. Defaults to False.
- Methods
fit(df: pd.DataFrame, X: pd.Series, y: pd.Series, offset: float = None)
predict(df: pd.DataFrame, X: pd.Series, y: pd.Series, offset: float = None)
fit_predict(df: pd.DataFrame, X: pd.Series, y: pd.Series)
- fit(df: DataFrame, X: Series, y: Series, offset: Optional[float] = None)[source]
Function called by the user to fit their models (prophet or poisson).
- Parameters
df (pd.DataFrame) – DataFrame with the data
offset (float, optional) – Offset for the Poisson regression model. Defaults to None.
- Returns
Fitted object of the class.
- Return type
- fit_predict(df: DataFrame, X: Series, y: Series)[source]
Function to fit and predict in a single call using the prophet model. This function does not work for Poisson regression, as the data needs to be pre-processed using the custom_transform class. TODO: Implement a check in this function for whether the user has already pre-processed the Poisson data.
- Parameters
df (pd.DataFrame) – Dataset
- Raises
NotImplementedError – Raises this error if user tries to use the poisson regression model
ValueError – Raised if the model provided by the user is not in this list: [prophet, poisson]
- Returns
DataFrame with the forecasted results.
- Return type
pd.DataFrame
- pool_fit_predict(df: DataFrame, num_threads: int = 12)[source]
Private function to fit and predict more than one instance of outcomes, locations, models, intervention_dates
- Parameters
num_threads (int) – number of threads (CPU count) to use. Defaults to 12.
df (pd.DataFrame) – Dataset
- Returns
DataFrame with the forecast results
- Return type
pd.DataFrame
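Fitting many combinations of outcomes, locations, models and interruption dates in parallel can be sketched with the standard library. Everything here is an assumption for illustration: the labels, the placeholder `fit_predict_one`, and the use of a thread pool. The reference only states that the method takes `num_threads`.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical labels for each dimension.
outcomes = ["diabetes", "hypertension"]
locations = ["A", "B"]
models = ["prophet"]
interruption_dates = ["2020-03-31"]


def fit_predict_one(combo):
    """Placeholder for fitting one model; returns a record describing the run."""
    outcome, location, model, date = combo
    return {
        "outcome": outcome,
        "location": location,
        "model": model,
        "interruption_date": date,
    }


# Every combination is one independent task.
combos = list(product(outcomes, locations, models, interruption_dates))

# pool.map returns results in the same order as the inputs.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fit_predict_one, combos))
```

In a real run, each task would return a forecast dataframe and the results would be concatenated into one.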