pylift.methods package

Submodules

pylift.methods.base module

class pylift.methods.base.BaseProxyMethod(df, transform_func, untransform_func, col_treatment='Treatment', col_outcome='Outcome', col_transformed_outcome='TransformedOutcome', col_policy=None, continuous_outcome='infer', random_state=2701, test_size=0.2, stratify=None, scoring_cutoff=1, scoring_method='aqini', sklearn_model=XGBRegressor)

Bases: object

Provide common functionalities for all label transformation methods.

Requires an input function transform_func that transforms treatment and outcome into a single transformed_outcome. This is typically the TOT transformation, but can be whatever you want.

Also complete a number of tasks that enable use of the proxy method: save dataframe and important dataframe column names to class object, calculate the transformed outcome, create an untransform method that undoes transform.

Parameters:
  • df (pandas.DataFrame) – DataFrame containing the features, test/control flag, and outcome.
  • transform_func (function) – Function that takes two keyword arguments, treatment and outcome, and outputs a transformed outcome.
  • untransform_func (function) – Function that inverts transform_func.
  • col_treatment (string, optional) – Name of the treatment column. Depends on input dataframe.
  • col_outcome (string, optional) – Name of the original outcome column. Depends on input dataframe.
  • col_transformed_outcome (string, optional) – Name of the new, transformed outcome. Can be whatever you want.
  • col_policy (string or float, optional) – Name of the column that indicates treatment policy (probability of treatment). If a float is given, the treatment policy is assumed to be even across all rows. If not given, it is assumed that application of treatment was randomly assigned with the same probability across the entire population.
  • continuous_outcome (“infer” or Bool, optional) – Flag that indicates whether or not the Outcome column is continuous. Inferred by default.
  • random_state (int) – Random seed for deterministic behavior.
  • test_size (float) – test_size parameter for sklearn.model_selection.train_test_split.
  • stratify (column name or anything that can be passed to the parameter of the same name in train_test_split, optional) – If not None, stratify is used as input into train_test_split.
  • scoring_method (string or list, optional) – Either qini, aqini, cgains or max_ prepended to any of the previous values. Any strings available to the parameter scoring in sklearn.model_selection.RandomizedSearchCV can also be passed.
  • scoring_cutoff (float or dict, optional) – The fraction of observations used to score qini for hyperparam searches. E.g. if 0.4, the 40% of observations with the highest predicted uplift are used to determine the frost score in the randomized search scoring function. If a list of scoring_methods is passed, a dictionary can also be passed here, where the keys are the scoring_method strings and the values are the scoring cutoff for those specific methods.
  • sklearn_model (scikit-learn regressor) – Model used for grid searching and fitting.
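
As an illustration of what a transform_func / untransform_func pair might look like, here is a minimal sketch of the transformed-outcome transformation, Y* = Y (W - p) / (p (1 - p)). The function names and the extra p argument (treatment probability) are assumptions made for this example, not pylift's own implementation. The inverse assumes non-negative outcomes and cannot recover the treatment flag when the outcome is exactly zero.

```python
import numpy as np

def tot_transform(treatment, outcome, p=0.5):
    """Hypothetical transform: Y* = Y * (W - p) / (p * (1 - p))."""
    treatment = np.asarray(treatment, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    p = np.asarray(p, dtype=float)  # scalar or per-row treatment probability
    return outcome * (treatment - p) / (p * (1 - p))

def tot_untransform(transformed_outcome, p=0.5):
    """Hypothetical inverse, valid only for non-negative outcomes."""
    ys = np.asarray(transformed_outcome, dtype=float)
    p = np.asarray(p, dtype=float)
    # A positive value can only come from a treated row, a negative one from control.
    treatment = (ys > 0).astype(int)
    outcome = np.where(ys > 0, ys * p, -ys * (1 - p))
    return treatment, outcome

# Round trip on a tiny example.
ys = tot_transform(treatment=[1, 0, 1, 0], outcome=[1.0, 1.0, 0.0, 3.0], p=0.5)
treatment_back, outcome_back = tot_untransform(ys, p=0.5)
```

The ambiguity at zero is one reason an implementation might carry the treatment flag alongside the float value rather than infer it from the sign.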
NIV(feats_to_use=None, n_bins=10, n_iter=3)

Net information value, calculated for each feature averaged over n_iter bootstrappings of df.

Parameters:
  • feats_to_use (list, optional) – A list of features to use. If no list is specified, all features are used.
  • n_bins (int, optional) – Number of bins to use.
  • n_iter (int, optional) – Number of iterations.
Returns:

ax – The matplotlib axes handle.

Return type:

matplotlib.axes._subplots.AxesSubplot

NWOE(feats_to_use=None, n_bins=10)

Net weight of evidence.

Parameters:
  • feats_to_use (list, optional) – A list of features to use. If no list is specified, all features are used.
  • n_bins (int, optional) – Number of bins to use.
Returns:

ax – The matplotlib axes handle.

Return type:

matplotlib.axes._subplots.AxesSubplot
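
For orientation, net weight of evidence is commonly defined per bin as the weight of evidence in the treatment group minus that in the control group. The sketch below follows that common definition for a single numeric feature and a binary outcome. The function name, the quantile binning, and the smoothing constant eps are assumptions for this example and may differ from what pylift does internally.

```python
import numpy as np
import pandas as pd

def nwoe_by_bin(x, treatment, outcome, n_bins=10, eps=0.5):
    """Hypothetical helper: NWOE per quantile bin of one feature."""
    x = np.asarray(x, dtype=float)
    treatment = np.asarray(treatment)
    outcome = np.asarray(outcome)
    # Integer bin codes from quantile binning (assumes no missing values).
    codes = np.asarray(
        pd.qcut(x, q=n_bins, labels=False, duplicates="drop"), dtype=int
    )
    k = codes.max() + 1

    def woe(mask):
        # Smoothed counts of responders and non-responders per bin.
        pos = np.bincount(codes[mask & (outcome == 1)], minlength=k) + eps
        neg = np.bincount(codes[mask & (outcome == 0)], minlength=k) + eps
        return np.log((pos / pos.sum()) / (neg / neg.sum()))

    # NWOE = WOE(treatment group) - WOE(control group)
    return woe(treatment == 1) - woe(treatment == 0)
```

A feature whose bins all have NWOE near zero carries little information about who responds differently under treatment.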

bayes_search(**kwargs)

Bayesian hyperparameter search using skopt.BayesSearchCV.

Any parameters typically associated with BayesSearchCV (see Scikit-Optimize documentation) can be passed as keyword arguments to this function.

The final dictionary used for the search is saved to self.bayes_search_params. This is updated with any parameters that are passed.

Examples

# Passing kwargs.
self.bayes_search(search_spaces={'max_depth': Integer(4, 6)}, refit=True)

fit(productionize=False, **kwargs)

A fit wrapper around any sklearn Regressor.

Any parameters typically associated with the model can be passed as keyword arguments to this function.

The sklearn model object is saved to self.model, or if productionize=True, self.model_final.

Parameters:
  • productionize (boolean, optional) – If False, fits the model over the train set only. Otherwise, fits to all data available.
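
The difference between the two modes can be sketched with plain scikit-learn. The helper below is a hypothetical stand-in, not pylift's fit: it uses LinearRegression in place of the configured regressor and returns the model instead of storing it on an object.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def fit_sketch(X, y, productionize=False, test_size=0.2, random_state=2701, **kwargs):
    """Hypothetical helper mirroring the productionize switch."""
    model = LinearRegression(**kwargs)  # kwargs are forwarded to the model
    if productionize:
        # Use every available row, e.g. before deploying the model.
        model.fit(X, y)
    else:
        # Hold out a test set and fit on the training rows only.
        X_train, _, y_train, _ = train_test_split(
            X, y, test_size=test_size, random_state=random_state
        )
        model.fit(X_train, y_train)
    return model

X = np.arange(50, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0
model_eval = fit_sketch(X, y)                       # train split only
model_final = fit_sketch(X, y, productionize=True)  # all data
```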

grid_search(**kwargs)

Grid search using sklearn.model_selection.GridSearchCV.

Any parameters typically associated with GridSearchCV (see sklearn documentation) can be passed as keyword arguments to this function.

The final dictionary used for the grid search is saved to self.grid_search_params. This is updated with any parameters that are passed.

Examples

# Passing kwargs.
self.grid_search(param_grid={'max_depth': [2, 3, 5, 10]}, refit=True)
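
The pattern described above, a dictionary of default search parameters that is updated with whatever keyword arguments are passed, can be sketched directly against scikit-learn. The helper name, the default values, and the DecisionTreeRegressor estimator are assumptions for this example only.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

def grid_search_sketch(X, y, **kwargs):
    """Hypothetical helper: merge defaults with kwargs, then run GridSearchCV."""
    params = {
        "estimator": DecisionTreeRegressor(random_state=0),
        "param_grid": {"max_depth": [2, 3]},
        "cv": 3,
    }
    params.update(kwargs)  # caller-supplied values override the defaults
    search = GridSearchCV(**params)
    search.fit(X, y)
    return search, params

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=120)
search, used_params = grid_search_sketch(
    X, y, param_grid={"max_depth": [2, 3, 5, 10]}, refit=True
)
```

Returning the merged dictionary makes it easy to inspect exactly which settings a given search ran with.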

noise_fit(iterations=10, n_bins=10, **kwargs)

Shuffle predictions to get a sense of the range of possible curves you might expect from fitting to noise.

Parameters:
  • iterations (int, optional) – Number of times to shuffle the data and retrain.
  • n_bins (int, optional) – Number of bins to use when calculating the qini curves.
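
To make the idea concrete, the sketch below permutes a vector of predictions several times and recomputes a simplified cumulative-uplift curve for each permutation. Both helper names and the curve definition are assumptions for illustration. Unlike the method documented above, this sketch does not retrain a model on each iteration.

```python
import numpy as np

def cumulative_uplift_curve(pred, treatment, outcome, n_bins=10):
    """Simplified curve: uplift in the top fraction, scaled by that fraction."""
    order = np.argsort(-pred)  # highest predicted uplift first
    t, y = treatment[order], outcome[order]
    curve = []
    for k in range(1, n_bins + 1):
        top = int(round(len(pred) * k / n_bins))
        tt, yy = t[:top], y[:top]
        n_t = tt.sum()
        n_c = len(tt) - n_t
        rate_t = yy[tt == 1].sum() / max(n_t, 1)
        rate_c = yy[tt == 0].sum() / max(n_c, 1)
        curve.append((rate_t - rate_c) * k / n_bins)
    return np.array(curve)

def noise_curves(pred, treatment, outcome, iterations=10, n_bins=10, seed=0):
    """One curve per random permutation of the predictions."""
    rng = np.random.default_rng(seed)
    return np.array([
        cumulative_uplift_curve(rng.permutation(pred), treatment, outcome, n_bins)
        for _ in range(iterations)
    ])

rng = np.random.default_rng(1)
treatment = rng.integers(0, 2, size=200)
outcome = rng.integers(0, 2, size=200)
pred = rng.normal(size=200)
curves = noise_curves(pred, treatment, outcome, iterations=5, n_bins=10)
```

The spread of these curves gives a rough baseline: a real model whose curve sits inside that spread is hard to distinguish from noise.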
plot(plot_type='cgains', ax=None, n_bins=None, show_noise_fits=False, noise_lines_kwargs={}, noise_band_kwargs={}, show_shuffle_fits=False, shuffle_lines_kwargs={}, shuffle_band_kwargs={}, shuffle_avg_line_kwargs={}, *args, **kwargs)

Function to plot all curves.

args and kwargs are passed to the default plot function, inherited from the UpliftEval class.

Parameters:
  • plot_type (string, optional) – Either ‘qini’, ‘aqini’, ‘cgains’, ‘uplift’, ‘cuplift’, or ‘balance’. ‘aqini’ refers to an adjusted qini plot, ‘cuplift’ gives a cumulative uplift plot. ‘balance’ gives the test-control balance for each of the bins. All others are self-explanatory.
  • ax (matplotlib.Axes) – Pass axes to allow for overlaying on top of existing plots.
  • n_bins (int, optional) – Number of bins to use for the main plot. This has no bearing on the noise or shuffle plots, which have to be calculated through their respective methods.
  • show_noise_fits (bool, optional) – Toggle the display of fits to random noise.
  • noise_lines_kwargs (dict, optional) – Kwargs to be passed to the lines that display the different curves for each noise fit iteration.
  • noise_band_kwargs (dict, optional) – Kwargs to be passed to the colored band that displays the standard deviation of the noise fit iterations.
  • show_shuffle_fits (bool, optional) – Toggle the display of fits with different train test split seeds.
  • shuffle_lines_kwargs (dict, optional) – Kwargs to be passed to the lines that display the different curves for each shuffle fit iteration.
  • shuffle_band_kwargs (dict, optional) – Kwargs to be passed to the colored band that displays the standard deviation of the shuffled fit iterations.
  • shuffle_avg_line_kwargs (dict, optional) – Kwargs to be passed to the average line that displays the average value of the shuffled fit iterations.

randomized_search(**kwargs)

Randomized search using sklearn.model_selection.RandomizedSearchCV.

Any parameters typically associated with RandomizedSearchCV (see sklearn documentation) can be passed as keyword arguments to this function.

The final dictionary used for the randomized search is saved to self.randomized_search_params. This is updated with any parameters that are passed.

Examples

# Passing kwargs.
self.randomized_search(param_distributions={'max_depth': [2, 3, 5, 10]}, refit=True)

shuffle_fit(iterations=10, n_bins=20, params=None, transform_train=None, clear=False, plot_type='cgains', stratify=None, starting_seed=0, **kwargs)

Repeat the train-test split iterations times with a different seed each time, fitting a model with params on every split.

Parameters:
  • iterations (int) – Number of shuffle-fit sequences to run.
  • n_bins (int) – Number of bins for the resulting curve to have.
  • params (dict) – Dictionary of parameters to pass to each fit. If not given, will default to self.rand_search_.best_params_.
  • transform_train (func, optional) – A function that will be applied to the training data only. Extended functionality that may be useful if the distribution of the y-variable is heavy-tailed, and a transformation would produce a better model, but you still want to evaluate on the untransformed data.
  • clear (boolean, optional) – Data for the shuffle fits is saved in self.shuffle_fit_. If clear is True, this data is rewritten with each shuffle_fit iteration.
  • plot_type (string, optional) – Type of plot to show. Can be aqini, qini, cgains.
  • stratify (anything that can be passed to parameter of same name in train_test_split, optional) – If not None, stratify is used as input into train_test_split.
  • starting_seed (int, optional) – The random seed used for the first iteration of train_test_split. All subsequent iterations increment from this value.
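
The seed handling described by iterations and starting_seed can be sketched with scikit-learn alone. The helper below is hypothetical: it uses a DecisionTreeRegressor and the regressor's R² score on the held-out split, whereas the method documented above fits the configured model and records uplift curves.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

def shuffle_fit_scores(X, y, iterations=10, starting_seed=0, test_size=0.2, params=None):
    """Hypothetical helper: refit on a fresh split per iteration."""
    scores = []
    for i in range(iterations):
        # Each iteration increments the split seed from starting_seed.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=starting_seed + i
        )
        model = DecisionTreeRegressor(random_state=0, **(params or {}))
        model.fit(X_tr, y_tr)
        scores.append(model.score(X_te, y_te))  # R^2 on the held-out rows
    return np.array(scores)

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 2))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.2, size=150)
scores = shuffle_fit_scores(X, y, iterations=4, starting_seed=7, params={"max_depth": 3})
```

A wide spread in the scores suggests that a single train-test split gives an unreliable picture of model quality.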

pylift.methods.derivatives module

class pylift.methods.derivatives.FlaggedFloat

Bases: float

Float subclass that retains a Treatment flag property.

save(outcome, treatment, p)
class pylift.methods.derivatives.TransformedOutcome(df, col_treatment='Treatment', col_outcome='Outcome', col_transformed_outcome='TransformedOutcome', col_policy=None, continuous_outcome='infer', random_state=2701, test_size=0.2, stratify=None, scoring_cutoff=1, sklearn_model=XGBRegressor, scoring_method='cgains')

Bases: pylift.methods.base.BaseProxyMethod

Implement Transformed Outcome [Trees] method.

Parameters:
  • df (pandas.DataFrame) – DataFrame containing the features, test/control flag, and outcome.
  • col_treatment (string, optional) – Name of the treatment column. Depends on input dataframe.
  • col_outcome (string, optional) – Name of the original outcome column. Depends on input dataframe.
  • col_transformed_outcome (string, optional) – Name of the new, transformed outcome. Can be whatever you want.
  • col_policy (string or float, optional) – Name of the column that indicates treatment policy (probability of treatment). If a float is given, the treatment policy is assumed to be even across all rows. If not given, it is assumed that application of treatment was randomly assigned with the same probability across the entire population.
  • continuous_outcome (“infer” or Bool, optional) – Flag that indicates whether or not the Outcome column is continuous. Inferred by default.
  • random_state (int) – Random seed for deterministic behavior.
  • test_size (float) – test_size parameter for sklearn.model_selection.train_test_split.
  • stratify (string or same format as the parameter of the same name in train_test_split, optional) – If not None, stratify is used as input into train_test_split.
  • scoring_method (string or list, optional) – Either qini, aqini, cgains or max_ prepended to any of the previous values. Any strings available to the parameter scoring in sklearn.model_selection.RandomizedSearchCV can also be passed.
  • scoring_cutoff (float or dict, optional) – The fraction of observations used to score qini for hyperparam searches. E.g. if 0.4, the 40% of observations with the highest predicted uplift are used to determine the frost score in the randomized search scoring function. If a list of scoring_methods is passed, a dictionary can also be passed here, where the keys are the scoring_method strings and the values are the scoring cutoff for those specific methods.
  • sklearn_model (sklearn regressor class, optional) – Sklearn model class to use for all successive operations (pass the class itself, without any parameters).
pylift.methods.derivatives.custom_objective(dtrain, preds)

Module contents