pylift package

Submodules

pylift.eval module

class pylift.eval.UpliftEval(treatment, outcome, prediction, p='infer', n_bins=20)

Bases: object

Calculate qini and uplift curves given some input data.

Requires three input vectors: treatment, outcome, prediction. Generates qini and uplift curves in which the population is ranked by prediction and uplift is calculated from treatment and outcome.

This class can be used independently of the methods in this package, i.e. if you want to evaluate the performance of an externally generated model.

NOTE: the maximal Qini curves do not currently work with continuous outcomes.

Parameters:
  • treatment (array-like) – Array of 1s and 0s indicating whether a treatment was served.
  • outcome (array-like) – Array of nonzero values and zeros indicating whether a response occurred.
  • prediction (array-like) – Predicted value used to rank.
  • p (float, None, or array-like) – The treatment policy, P(treatment==1). Can be a float if uniform across all individuals; or an array if individual-dependent.
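
The two forms of p described above (a single float, or one value per individual) can be illustrated with a short sketch. It assumes that a uniform policy can be estimated as the fraction of rows that received treatment; whether p='infer' does exactly this inside UpliftEval is an assumption, and the helper names infer_uniform_policy and as_policy_vector are hypothetical, not part of pylift.

```python
def infer_uniform_policy(treatment):
    """Estimate a uniform P(treatment==1) as the treated fraction (assumption)."""
    if len(treatment) == 0:
        raise ValueError("treatment must not be empty")
    return sum(treatment) / len(treatment)


def as_policy_vector(p, n_rows):
    """Expand a float policy to one value per row, or validate an array policy."""
    if isinstance(p, (int, float)):
        return [float(p)] * n_rows
    values = [float(v) for v in p]
    if len(values) != n_rows:
        raise ValueError("policy array must have one entry per row")
    return values


# Hypothetical example data: three of four individuals were treated.
treatment = [1, 0, 1, 1]
p_uniform = infer_uniform_policy(treatment)
p_vector = as_policy_vector(p_uniform, len(treatment))
```

An individual-dependent policy would be passed through as_policy_vector unchanged, provided it has one entry per row.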

calc(plot_type, n_bins=20)

Calculate the different curve types.

Parameters:
  • plot_type (string) – Type of curve to calculate. Options: qini, aqini, cgains, cuplift, balance, uplift.
  • n_bins (int, optional) – Number of bins to use.
Returns:

  • percentile (list) – The percentile of the population, calculated from the bins.
  • qini_y (list) – The qini value for each of the percentile points.
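
To make the returned pair concrete, here is a minimal sketch of a binned qini-style curve. It uses one common definition (treated responders minus control responders rescaled to the treated count, within the top-ranked slice) and divides by the total number of treated individuals. This normalization is an assumption and may differ from what calc does internally; the function name qini_curve is hypothetical.

```python
def qini_curve(treatment, outcome, prediction, n_bins=20):
    """Return (percentile, qini_y) for an illustrative qini-style curve."""
    n = len(prediction)
    n_treated_total = sum(treatment)
    if n_treated_total == 0:
        raise ValueError("at least one treated individual is required")
    # Rank individuals from highest to lowest predicted value.
    order = sorted(range(n), key=lambda i: prediction[i], reverse=True)
    percentile, qini_y = [0.0], [0.0]
    for b in range(1, n_bins + 1):
        cutoff = round(b * n / n_bins)
        top = order[:cutoff]
        n_t = sum(treatment[i] for i in top)
        n_c = len(top) - n_t
        resp_t = sum(1 for i in top if treatment[i] == 1 and outcome[i] != 0)
        resp_c = sum(1 for i in top if treatment[i] == 0 and outcome[i] != 0)
        # Rescale control responders to the treated group size in this slice.
        scaled_c = resp_c * n_t / n_c if n_c else 0.0
        percentile.append(cutoff / n)
        qini_y.append((resp_t - scaled_c) / n_treated_total)
    return percentile, qini_y


# Hypothetical toy data, already sorted by prediction for readability.
treatment = [1, 0, 1, 0, 1, 0, 1, 0]
outcome = [1, 0, 1, 0, 0, 0, 0, 1]
prediction = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
percentile, qini_y = qini_curve(treatment, outcome, prediction, n_bins=2)
# percentile -> [0.0, 0.5, 1.0], qini_y -> [0.0, 0.5, 0.25]
```

In this toy data both treated responders fall in the top half of the ranking, so the curve rises to 0.5 before settling at 0.25 for the full population.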

plot(plot_type='cgains', ax=None, show_theoretical_max=False, show_practical_max=False, show_random_selection=True, show_no_dogs=False, **kwargs)

Plots the different kinds of percentage-targeted curves.

Parameters:
  • plot_type (string, optional) – One of ‘qini’, ‘aqini’, ‘cgains’, ‘uplift’, ‘cuplift’, or ‘balance’. ‘aqini’ gives an adjusted qini plot, ‘cgains’ a cumulative gains plot, ‘cuplift’ a cumulative uplift plot, and ‘balance’ the test-control balance for each of the bins.
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – A matplotlib axis referencing where to plot.
  • show_theoretical_max (boolean, optional) – Toggle theoretical maximal qini curve, if overfitting to treatment/control. Only works for Qini-style curves.
  • show_practical_max (boolean, optional) – Toggle practical maximal qini curve, i.e. the maximum attainable without overfitting to treatment/control. Only works for Qini-style curves.
  • show_no_dogs (boolean, optional) – Toggle maximal qini curve under the assumption that there are no sleeping dogs. Only works for Qini-style curves.
  • show_random_selection (boolean, optional) – Toggle straight line indicating a random ordering. Only works for Qini-style curves.
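
The ‘balance’ plot type can be illustrated with a short sketch. It assumes that "test-control balance" means the fraction of treated individuals within each prediction-ranked bin; this reading, and the helper name balance_per_bin, are assumptions rather than a description of pylift internals.

```python
def balance_per_bin(treatment, prediction, n_bins=20):
    """Return the treated fraction in each prediction-ranked bin (illustrative)."""
    n = len(prediction)
    # Rank individuals from highest to lowest predicted value.
    order = sorted(range(n), key=lambda i: prediction[i], reverse=True)
    balance = []
    for b in range(n_bins):
        lo = round(b * n / n_bins)
        hi = round((b + 1) * n / n_bins)
        members = order[lo:hi]
        if members:
            balance.append(sum(treatment[i] for i in members) / len(members))
        else:
            # Empty bin: no balance can be computed.
            balance.append(None)
    return balance


# Hypothetical toy data in which treated rows cluster at high predictions.
treatment = [1, 1, 1, 0, 1, 0, 0, 0]
prediction = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
balance = balance_per_bin(treatment, prediction, n_bins=2)
# balance -> [0.75, 0.25]
```

A balance that drifts far from the overall treated fraction, as in this toy data, suggests the ranking is correlated with treatment assignment.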

pylift.eval.get_scores(treatment, outcome, prediction, p, scoring_range=(0, 1), plot_type='all')

Calculate AUC scoring metrics.

Parameters:
  • treatment (array-like)
  • outcome (array-like)
  • prediction (array-like)
  • p (array-like) – Treatment policy (probability of treatment for each row).
  • scoring_range (2-tuple) – Fractional range over which frost score is calculated. First element must be less than second, and neither may exceed 1.
Returns:

scores – A dictionary containing the following values. Each is also appended with _cgains and _aqini for the corresponding values for the cumulative gains curve and adjusted qini curve, respectively.

q1: Traditional Q score normalized by the theoretical maximal qini. Note the theoretical max here goes up with a slope of 2.

q2: Traditional Q score normalized by the practical maximal qini. This curve increases with a slope of 1.

Q: Area between qini curve and random selection line. This is named after the notation in Radcliffe & Surry 2011, but note that they normalize their curves differently.

Q_max: Maximal possible qini score, which is used for normalization of qini to get frost score. Only obtainable by overfitting.

Q_practical_max: Practical maximal qini score, if you are not overfitting. This assumes that all (outcome, treatment) = (1,1) were persuadables, but that there are also an equal number of persuadables in the control group. This is the best possible scenario, but likely assumes too few “sure things”.

overall_lift: The lift expected from random application of treatment.

Return type:

dict
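
The Q entry above, the area between the curve and the random selection line, can be sketched with the trapezoid rule. This is an illustration under the assumption that the random selection line is the straight segment joining the first and last points of the curve; the function name is hypothetical and the exact integration used by get_scores may differ.

```python
def area_between_curve_and_random(percentile, curve_y):
    """Trapezoid-rule area between a curve and the line joining its endpoints."""
    area_curve = 0.0
    for i in range(1, len(percentile)):
        width = percentile[i] - percentile[i - 1]
        area_curve += width * (curve_y[i] + curve_y[i - 1]) / 2
    # Area under the straight "random selection" segment (a single trapezoid).
    span = percentile[-1] - percentile[0]
    area_random = span * (curve_y[0] + curve_y[-1]) / 2
    return area_curve - area_random


# Hypothetical curve points: population fraction and curve value at each point.
percentile = [0.0, 0.5, 1.0]
qini_y = [0.0, 0.5, 0.25]
Q = area_between_curve_and_random(percentile, qini_y)
# Q -> 0.1875
```

Dividing such an area by the corresponding area for a maximal curve would give a normalized score in the spirit of q1 and q2.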

pylift.generate_data module

pylift.generate_data.dgp(N=1000, n_features=3, beta=[1,-2,3,-0.8], error_std=0.5, tau=3, discrete_outcome=False)

Generates random data with a ground truth data generating process.

Draws random values for features from [0, 1), errors from a 0-centered distribution with std error_std, and creates an outcome y.

Parameters:
  • N (int, optional) – Number of observations.
  • n_features (int, optional) – Number of features.
  • beta (np.array, optional) – Array of beta coefficients to multiply by X to get y.
  • error_std (float, optional) – Standard deviation (scale) of distribution from which errors are drawn.
  • tau (float, optional) – Effect of treatment.
  • tau_std (float, optional) – When not None, draws tau from a normal distribution centered around tau with standard deviation tau_std rather than just using a constant value of tau.
  • discrete_outcome (boolean, optional) – If True, outcomes are 0 or 1; otherwise continuous.
  • seed (int, optional) – Random seed fed to np.random.seed to allow for deterministic behavior.
Returns:

  • df (pd.DataFrame) – A DataFrame containing the generated data.
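
The generating process described above can be sketched with the standard library alone. The sketch assumes that the first element of beta is an intercept (the default beta has four entries for three features), that treatment is assigned with probability 0.5, that errors are normally distributed, and that tau is added to the outcome of treated rows. All of these, along with the function name dgp_sketch and the dictionary keys, are assumptions rather than pylift's actual implementation.

```python
import random


def dgp_sketch(N=1000, n_features=3, beta=(1, -2, 3, -0.8),
               error_std=0.5, tau=3, seed=0):
    """Generate rows with uniform features and a linear outcome (illustrative)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(N):
        # Features are drawn uniformly from [0, 1).
        x = [rng.random() for _ in range(n_features)]
        # Assumed 50/50 treatment assignment.
        treatment = 1 if rng.random() < 0.5 else 0
        # Zero-centered error with standard deviation error_std.
        error = rng.gauss(0, error_std)
        linear = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
        rows.append({"x": x, "treatment": treatment,
                     "outcome": linear + tau * treatment + error})
    return rows


rows = dgp_sketch(N=50, error_std=0, seed=1)
```

With error_std=0 each outcome is an exact function of the features and the treatment flag, which makes the ground-truth effect tau easy to verify.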

pylift.generate_data.sim_pte(N=1000, p=20, rho=0, sigma=np.sqrt(2), beta_den=4)

Numerical simulation for treatment effect heterogeneity estimation as described in Tian et al. (2012). Translated from the R uplift package (Leo Guelman <leo.guelman@gmail.com>).

Parameters:
  • N (int, optional) – Number of observations.
  • n_features (int, optional) – Number of features.
  • beta (np.array, optional) – Array of beta coefficients to multiply by X to get y.
  • rho – Covariance matrix between predictors.
  • sigma – Multiplier of error term.
  • beta_den – Size of main effects relative to interaction effects.
  • discrete_outcome (boolean, optional) – If True, outcomes are 0 or 1; otherwise continuous.
  • seed (int, optional) – Random seed fed to np.random.seed to allow for deterministic behavior.
Returns:

A data frame including the response variable (Y), the treatment (treat=1) and control (treat=-1) assignment, the predictor variables (X) and the “true” treatment effect score (ts).
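
Because the treatment column here is coded as 1 and -1, while UpliftEval expects an array of 1s and 0s, a recoding step is needed before evaluating a model on this data. A minimal sketch follows; the helper name recode_treatment is hypothetical.

```python
def recode_treatment(treat):
    """Map a 1 / -1 treatment coding to the 1 / 0 coding (illustrative)."""
    recoded = []
    for t in treat:
        if t not in (1, -1):
            raise ValueError("expected treatment values of 1 or -1")
        recoded.append(1 if t == 1 else 0)
    return recoded


# Hypothetical treatment column in the 1 / -1 coding.
treat = [1, -1, -1, 1]
treatment01 = recode_treatment(treat)
# treatment01 -> [1, 0, 0, 1]
```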

pylift.style module

Module contents