Rexmex

Score Cards

class CoverageScoreCard(metric_set: Mapping[str, Callable[[numpy.array, numpy.array], float]], all_users: Collection[str], all_items: Collection[str])[source]

A coverage scorecard can be used to aggregate coverage-related metrics, plot them, and generate performance reports.

generate_report(recs_to_evaluate: pandas.core.frame.DataFrame, grouping: Optional[List[str]] = None)pandas.core.frame.DataFrame[source]

A method to calculate (aggregated) coverage metrics based on a dataframe of recommendations. It assumes that the dataframe contains the user and item columns.

Parameters
  • recs_to_evaluate (pd.DataFrame) – A dataframe holding the recommendations; it contains the user and item columns.

  • grouping (list) – A list of performance grouping variable names (e.g., different recommender settings).

Returns

The performance report.

Return type

report (pd.DataFrame)

get_coverage_metrics(recommendations: List[Tuple])pandas.core.frame.DataFrame[source]

Gets all coverage (performance) values using the defined metric_set. It expects a list of tuples of user/item combinations, e.g., [(user_1, item_1), (user_2, item_1), ...]. The space of possible users and items to recommend is defined during initialisation of this class.

Parameters
  • recommendations (List[Tuple]) – Recommendations of items to users, made by the evaluated system. The user has to decide which score or confidence levels to use prior to calling this ScoreCard.

Returns

The coverage (performance) metrics calculated from the recommendations.

Return type

performance_metrics (pd.DataFrame)

metric_set: Mapping[str, Callable[[numpy.array, numpy.array], float]]
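
Example usage, a minimal sketch: the user/item universes and the recommendation frame are made up, and the import paths are assumptions rather than taken from this reference.

import pandas as pd
from rexmex.metricset import CoverageMetricSet   # assumed import path
from rexmex.scorecard import CoverageScoreCard   # assumed import path

# Hypothetical universes of users and items the recommender may serve.
all_users = ["u1", "u2", "u3"]
all_items = ["i1", "i2", "i3", "i4"]

score_card = CoverageScoreCard(CoverageMetricSet(), all_users, all_items)

# Recommendations to evaluate: one row per (user, item) pair.
recs = pd.DataFrame({"user": ["u1", "u1", "u2"], "item": ["i1", "i2", "i1"]})
report = score_card.generate_report(recs)
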
class ScoreCard(metric_set: Mapping[str, Callable[[numpy.array, numpy.array], float]])[source]

A scorecard can be used to aggregate metrics, plot them, and generate performance reports.

filter_scores(scores: pandas.core.frame.DataFrame, training_set: pandas.core.frame.DataFrame, testing_set: pandas.core.frame.DataFrame, validation_set: pandas.core.frame.DataFrame, columns: List[str])pandas.core.frame.DataFrame[source]

A method to filter out those entries which also appear in either the training, testing, or validation sets. The original is here: https://papers.nips.cc/paper/2013/file/1cecc7a77928ca8133fa24680a88d2f9-Paper.pdf

Parameters
  • scores (pd.DataFrame) – A dataframe with the scores.

  • training_set (pd.DataFrame) – A dataframe of training data points.

  • testing_set (pd.DataFrame) – A dataframe of testing data points.

  • validation_set (pd.DataFrame) – A dataframe of validation data points.

  • columns (list) – A list of column names used for cross-referencing.

Returns

The scores for data points which are not in the reference sets.

Return type

scores (pd.DataFrame)
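
The effect of this filter can be illustrated with plain pandas as an anti-join on the cross-referencing columns; this sketch only shows the behaviour, it is not the library's implementation:

import pandas as pd

scores = pd.DataFrame({"user": [1, 1, 2], "item": [10, 11, 10], "y_score": [0.9, 0.4, 0.7]})
reference = pd.concat([
    pd.DataFrame({"user": [1], "item": [10]}),   # training pairs
    pd.DataFrame({"user": [2], "item": [12]}),   # testing pairs
    pd.DataFrame({"user": [3], "item": [13]}),   # validation pairs
])

# Keep only score rows whose (user, item) pair does not occur in any reference set.
merged = scores.merge(reference.drop_duplicates(), on=["user", "item"], how="left", indicator=True)
filtered = merged[merged["_merge"] == "left_only"].drop(columns="_merge")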

generate_report(scores_to_evaluate: pandas.core.frame.DataFrame, grouping: Optional[List[str]] = None)pandas.core.frame.DataFrame[source]

A method to calculate (aggregated) performance metrics based on a dataframe of ground truth and predictions. It assumes that the dataframe contains the y_true and y_score columns.

Parameters
  • scores_to_evaluate (pd.DataFrame) – A dataframe with the scores and ground truth; it contains the y_true and y_score columns.

  • grouping (list) – A list of performance grouping variable names.

Returns

The performance report.

Return type

report (pd.DataFrame)
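
Example usage, a minimal sketch (the import paths and the toy data are assumptions, not taken from this reference):

import pandas as pd
from rexmex.metricset import ClassificationMetricSet   # assumed import path
from rexmex.scorecard import ScoreCard                  # assumed import path

score_card = ScoreCard(ClassificationMetricSet())

# Ground truth, predicted scores, and an optional grouping variable.
scores = pd.DataFrame({
    "model": ["a", "a", "b", "b"],
    "y_true": [1, 0, 1, 0],
    "y_score": [0.9, 0.2, 0.6, 0.7],
})

report = score_card.generate_report(scores, grouping=["model"])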

get_performance_metrics(y_true: numpy.array, y_score: numpy.array)pandas.core.frame.DataFrame[source]

A method to get the performance metrics for a pair of vectors.

Parameters
  • y_true (np.array) – A vector of ground truth values.

  • y_score (np.array) – A vector of model predictions.

Returns

The performance metrics calculated from the vectors.

Return type

performance_metrics (pd.DataFrame)

metric_set: Mapping[str, Callable[[numpy.array, numpy.array], float]]
print_metrics()[source]

Print the names of the metrics.

Metric Sets

class ClassificationMetricSet[source]

A set of classification metrics with the following metrics included:

Area Under the Receiver Operating Characteristic Curve
Area Under the Precision Recall Curve
Average Precision
F-1 Score
Matthew’s Correlation Coefficient
Fowlkes-Mallows Index
Precision
Recall
Specificity
Accuracy
Balanced Accuracy
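
Since a metric set is a dictionary of metric names and functions, it can be trimmed to the metrics of interest before scoring. A sketch, assuming the import path; the metric names used here are illustrative:

from rexmex.metricset import ClassificationMetricSet   # assumed import path

metric_set = ClassificationMetricSet()
metric_set.filter_metrics(["precision", "recall", "f1_score"])   # keep only these (names illustrative)
metric_set.print_metrics()
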
class CoverageMetricSet[source]

A set of coverage metrics with the following metrics included:

Item Coverage
User Coverage

class MetricSet[source]

A metric set is a special dictionary that contains metric name keys and evaluation metric function values.

add_metrics(metrics: List[Tuple])[source]

A method to add metric functions from a list of function names and functions.

Parameters

metrics (List[Tuple]) – A list of metric name and metric function tuples.

Returns

The metric set after the metrics were added.

Return type

self
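
For example, a custom metric can be registered alongside the built-in ones. A sketch; the metric below is made up and the import path is an assumption:

import numpy as np
from rexmex.metricset import MetricSet   # assumed import path

def hit_fraction(y_true: np.array, y_score: np.array) -> float:
    # Illustrative custom metric: fraction of positive instances scored above 0.5.
    return float(np.mean(y_score[y_true == 1] > 0.5))

metric_set = MetricSet()
metric_set.add_metrics([("hit_fraction", hit_fraction)])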

filter_metrics(filter: Collection[str])[source]

A method to keep only the metrics whose names are listed in the filter.

Parameters

filter – A list of metric names to keep.

Returns

The metric set after the other metrics were filtered out.

Return type

self

print_metrics()[source]

Print the names of the metrics.

class RankingMetricSet[source]

A set of ranking metrics with the following metrics included:

class RatingMetricSet[source]

A set of rating metrics with the following metrics included:

Mean Absolute Error
Mean Squared Error
Root Mean Squared Error
Mean Absolute Percentage Error
Symmetric Mean Absolute Percentage Error
Coefficient of Determination
Pearson Correlation Coefficient
normalize_metrics()[source]

A method to normalize a set of metrics.

Returns

The metric set after the metrics were normalized.

Return type

self
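
Example usage, a minimal sketch (the import path is an assumption); normalize_metrics() wraps each metric with the normalization pre-processing step described under the normalize() utility further below:

from rexmex.metricset import RatingMetricSet   # assumed import path

metric_set = RatingMetricSet()
metric_set.normalize_metrics()   # returns self, so calls can be chained
metric_set.print_metrics()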

Ranking Metrics

average_precision_at_k(relevant_items: numpy.array, recommendation: numpy.array, k=10)[source]

Calculate the average precision at k (AP@K) of items in a ranked list.

Parameters
  • relevant_items (array-like) – An N x 1 array of relevant items.

  • recommendation (array-like) – An N x 1 array of ordered items.

  • k (int) – the number of items considered in the predicted list.

Returns

The average precision @ k of a predicted list.

Return type

AP@K (float)

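As an illustration, AP@K can be computed by hand; this reference sketch is independent of the library, but it is consistent with the MAP@K example further below:

import numpy as np

def ap_at_k_sketch(relevant_items, recommendation, k=10):
    # Illustrative AP@K: mean of precision@i over the ranks i at which a relevant item appears.
    relevant = set(relevant_items)
    hits, precisions = 0, []
    for i, item in enumerate(recommendation[:k], start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / i)
    return float(np.mean(precisions)) if precisions else 0.0

ap_at_k_sketch(relevant_items=[1, 2], recommendation=[3, 2, 1])
# 0.58333... (hits at ranks 2 and 3: mean of 1/2 and 2/3)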

average_recall_at_k(relevant_items: List, recommendation: List, k: int = 10)[source]

Calculate the average recall at k (AR@K) of items in a ranked list.

Parameters
  • relevant_items (array-like) – An N x 1 array of relevant items.

  • recommendation (array-like) – An N x 1 array of items.

  • k (int) – the number of items considered in the predicted list.

Returns

The average recall @ k of a predicted list.

Return type

AR@K (float)

discounted_cumulative_gain(y_true: numpy.array, y_score: numpy.array)[source]

Computes the Discounted Cumulative Gain (DCG), a sum of the true scores ordered by the predicted scores, and then penalized by a logarithmic discount based on ordering.

Parameters
  • y_true (array-like) – An N x M array of ground truth values, where M > 1 for multilabel classification problems.

  • y_score (array-like) – An N x M array of predicted values, where M > 1 for multilabel classification problems.

Returns

Discounted Cumulative Gain

Return type

DCG (float)

gmean_rank(relevant_items: Sequence[rexmex.metrics.ranking.X], recommendation: Sequence[rexmex.metrics.ranking.X])float[source]

Calculate the geometric mean rank (GMR) of items in a ranked list.

Parameters
  • relevant_items – An N x 1 sequence of relevant items.

  • recommendation – An N x 1 sequence of ordered items.

Returns

The geometric mean rank of the relevant items in the recommendation.

hits_at_k(relevant_items: numpy.array, recommendation: numpy.array, k=10)[source]

Calculate the number of hits of relevant items in a ranked list (HITS@K).

Parameters
  • relevant_items (array-like) – A 1 x N array of relevant items.

  • recommendation (array-like) – A 1 x N array of predicted items.

  • k (int) – the number of items considered in the predicted list.

Returns

The number of relevant items in the first k items of a prediction.

Return type

HITS@K (float)

intra_list_similarity(recommendations: List[list], items_feature_matrix: numpy.array)[source]

Calculate the intra-list similarity of recommended items. The items are represented by feature vectors, which are compared with cosine similarity. The recommendation lists consist of item indices, which are used to fetch the item features.

Parameters
  • recommendations (List[list]) – An M x N array of recommendation lists, where M is the number of lists and N the number of recommended items

  • items_feature_matrix (matrix-like) – An N x D matrix, where N is the number of items and D the number of features representing one item

Returns

Average intra-list similarity across the recommendation lists

Return type

(float)


kendall_tau(relevant_items: numpy.array, recommendation: numpy.array)[source]

Calculate the Kendall’s tau, measuring the correspondence between two lists.

Parameters
  • relevant_items (array-like) – An 1 x N array of items.

  • recommendation (array-like) – An 1 x N array of items.

Returns

The tau statistic. p-value (float): two-sided p-value for the null hypothesis that there is no association between the two rankings.

Return type

Kendall tau (float)

mean_average_precision_at_k(relevant_items: List[list], recommendations: List[list], k: int = 10)[source]

Calculate the mean average precision at k (MAP@K) across recommendation lists. Each recommendation list should be paired with a list of relevant items. The first recommendation list is evaluated against the first list of relevant items, and so on.

Example usage:

import numpy as np
from rexmex.metrics.ranking import mean_average_precision_at_k

mean_average_precision_at_k(
    relevant_items=np.array(
        [
            [1,2],
            [2,3]
        ]
    ),
    recommendations=np.array([
        [3,2,1],
        [2,1,3]
    ])
)
>>> 0.708333...
Parameters
  • relevant_items (array-like) – An M x R array of relevant items, where R is the number of relevant items per list.

  • recommendations (array-like) – An M x N array of recommendation lists.

  • k (int) – the number of items considered in the predicted list.

Returns

The mean average precision @ k across recommendations.

Return type

MAP@K (float)

mean_average_recall_at_k(relevant_items: List[list], recommendations: List[list], k: int = 10)[source]

Calculate the mean average recall at k (MAR@K) for a list of recommendations. Each recommendation list should be paired with a list of relevant items. The first recommendation list is evaluated against the first list of relevant items, and so on.

Parameters
  • relevant_items (array-like) – An M x R list where M is the number of recommendation lists, and R is the number of relevant items.

  • recommendations (array-like) – An M x N list where M is the number of recommendation lists and N is the number of recommended items.

  • k (int) – the number of items considered in the recommendation.

Returns

The mean average recall @ k across the recommendations.

Return type

MAR@K (float)

mean_rank(relevant_items: Sequence[rexmex.metrics.ranking.X], recommendation: Sequence[rexmex.metrics.ranking.X])float[source]

Calculate the arithmetic mean rank (MR) of items in a ranked list.

Parameters
  • relevant_items – An N x 1 sequence of relevant items.

  • recommendation – An N x 1 sequence of ordered items.

Returns

The mean rank of the relevant items in the recommendation.

mean_reciprocal_rank(relevant_items: List, recommendation: List)[source]

Calculate the mean reciprocal rank (MRR) of items in a ranked list.

Parameters
  • relevant_items (array-like) – An N x 1 array of relevant items.

  • recommendation (array-like) – An N x 1 array of ordered items.

Returns

The mean reciprocal rank of the relevant items in the recommendation.

Return type

MRR (float)
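
As an illustration, the computation can be sketched independently of the library, averaging the reciprocal of each relevant item's rank in the recommendation (whether the library averages exactly this way is an assumption based on the description above):

import numpy as np

def mrr_sketch(relevant_items, recommendation):
    # Illustrative MRR: mean of 1/rank over the relevant items found in the recommendation.
    ranks = [recommendation.index(item) + 1 for item in relevant_items if item in recommendation]
    return float(np.mean([1.0 / rank for rank in ranks])) if ranks else 0.0

mrr_sketch(relevant_items=[1, 2], recommendation=[3, 2, 1])
# (1/3 + 1/2) / 2 = 0.41666...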

normalized_discounted_cumulative_gain(y_true: numpy.array, y_score: numpy.array)[source]

Computes the Normalized Discounted Cumulative Gain (NDCG), a sum of the true scores ordered by the predicted scores, and then penalized by a logarithmic discount based on ordering. The score is normalized to the range [0.0, 1.0].

Parameters
  • y_true (array-like) – An N x M array of ground truth values, where M > 1 for multilabel classification problems.

  • y_score (array-like) – An N x M array of predicted values, where M > 1 for multilabel classification problems.

Returns

Normalized Discounted Cumulative Gain

Return type

NDCG (float)
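
A hand-worked sketch of the DCG/NDCG computation for a single 1-D example, independent of the library's implementation:

import numpy as np

def dcg_sketch(y_true, y_score):
    # Illustrative DCG: true scores ordered by predicted scores, with a logarithmic position discount.
    order = np.argsort(y_score)[::-1]
    gains = np.asarray(y_true, dtype=float)[order]
    discounts = np.log2(np.arange(2, gains.size + 2))
    return float(np.sum(gains / discounts))

def ndcg_sketch(y_true, y_score):
    # Illustrative NDCG: DCG normalized by the DCG of the ideal ordering, giving a value in [0.0, 1.0].
    ideal = dcg_sketch(y_true, y_true)
    return dcg_sketch(y_true, y_score) / ideal if ideal > 0 else 0.0

ndcg_sketch(y_true=[3, 2, 0, 1], y_score=[0.9, 0.8, 0.1, 0.7])
# 1.0, because the predicted scores order the items exactly as the true scores would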

normalized_distance_based_performance_measure(relevant_items: List, recommendation: List)[source]

Calculates the Normalized Distance-based Performance Measure (NDPM) between two ordered lists. Two matching orderings return 0.0, while two completely mismatched orderings return 1.0.

Parameters
  • relevant_items (List) – List of items

  • recommendation (List) – The predicted list of items

Returns

Normalized Distance-based Performance Measure

Return type

NDPM (float)

Metric Definition: Yao, Y. Y. “Measuring retrieval effectiveness based on user preference of documents.” Journal of the American Society for Information science 46.2 (1995): 133-145.

Definition from: Shani, Guy, and Asela Gunawardana. “Evaluating recommendation systems.” Recommender systems handbook. Springer, Boston, MA, 2011. 257-297

novelty(recommendations: List[list], item_popularities: dict, num_users: int, k: int = 10)[source]

Calculates the capacity of the recommender system to generate novel and unexpected results.

Parameters
  • recommendations (List[list]) – An M x N array of items, where M is the number of recommendation lists and N the number of recommended items

  • item_popularities (dict) – A dict mapping each item in the recommendations to a popularity value. Popular items have higher values.

  • num_users (int) – The number of users

  • k (int) – The number of items considered in each recommendation.

Returns

novelty

Return type

(float)

Metric Definition: Zhou, T., Kuscsik, Z., Liu, J. G., Medo, M., Wakeling, J. R., & Zhang, Y. C. (2010). Solving the apparent diversity-accuracy dilemma of recommender systems. Proceedings of the National Academy of Sciences, 107(10), 4511-4515.


personalization(recommendations: List[list])[source]

Calculates personalization, a measure of similarity between recommendations. A high value indicates that the recommendations are dissimilar, or “personalized”.

Parameters

recommendations (List[list]) – An M x N array of recommended items, where M is the number of recommendation lists and N the number of items

Returns

personalization

Return type

(float)


rank(relevant_item: rexmex.metrics.ranking.X, recommendation: Sequence[rexmex.metrics.ranking.X])float[source]

Calculate the rank of an item in a ranked list of items.

Parameters
  • relevant_item – a target item in the predicted list of items.

  • recommendation – An N x 1 sequence of predicted items.

Returns

The rank of the item.

reciprocal_rank(relevant_item: rexmex.metrics.ranking.X, recommendation: Sequence[rexmex.metrics.ranking.X])float[source]

Calculate the reciprocal rank (RR) of an item in a ranked list of items.

Parameters
  • relevant_item – a target item in the predicted list of items.

  • recommendation – An N x 1 sequence of predicted items.

Returns

The reciprocal rank of the item.

Return type

RR (float)

spearmans_rho(relevant_items: numpy.array, recommendation: numpy.array)[source]

Calculate the Spearman’s rank correlation coefficient (Spearman’s rho) between two lists.

Parameters
  • relevant_items (array-like) – An 1 x N array of items.

  • recommendation (array-like) – An 1 x N array of items.

Returns

Spearman’s rho. p-value (float): two-sided p-value for the null hypothesis that the two rankings are uncorrelated.

Return type

(float)

Classification Metrics

accuracy_score(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the accuracy score for a ground-truth prediction vector pair.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The value of the accuracy score.

Return type

(float)

average_precision_score(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the average precision for a ground-truth prediction vector pair.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The value of average precision.

Return type

average_precision (float)

balanced_accuracy_score(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the balanced accuracy for a ground-truth prediction vector pair.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The value of the balanced accuracy score.

Return type

balanced_accuracy (float)

condition_negative(y_true: numpy.array)float[source]

Calculate the number of instances which are negative.

Parameters

y_true (array-like) – An N x 1 array of ground truth values.

Returns

The number of negative instances.

Return type

cn (float)

condition_positive(y_true: numpy.array)float[source]

Calculate the number of instances which are positive.

Parameters

y_true (array-like) – An N x 1 array of ground truth values.

Returns

The number of positive instances.

Return type

cp (float)

critical_success_index(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the critical success index (duplicate of threat_score()).

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The critical success index value.

Return type

ts (float)

diagnostic_odds_ratio(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the diagnostic odds ratio.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The diagnostic odds ratio value.

Return type

dor (float)

f1_score(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the F-1 score for a ground-truth prediction vector pair.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The value of the F-1 score.

Return type

f1 (float)

fall_out(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the fall out (duplicate of false_positive_rate()).

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The fall out value.

Return type

fpr (float)

false_discovery_rate(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the false discovery rate.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The false discovery rate value.

Return type

fdr (float)

false_negative(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the number of false negatives.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The number of false negatives.

Return type

fn (float)

false_negative_rate(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the false negative rate (duplicated in miss_rate()).

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The false negative rate value.

Return type

fnr (float)

false_omission_rate(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the false omission rate.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The false omission rate value.

Return type

fomr (float)

false_positive(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the number of false positives.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The number of false positives.

Return type

fp (float)

false_positive_rate(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the false positive rate (duplicated in fall_out()).

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The false positive rate value.

Return type

fpr (float)

fowlkes_mallows_index(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the Fowlkes-Mallows index.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The Fowlkes-Mallows index value.

Return type

fm (float)

hit_rate(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the hit rate (duplicate of true_positive_rate()).

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The hit rate.

Return type

tpr (float)

informedness(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the informedness.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The informedness value.

Return type

bm (float)

markedness(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the markedness.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The markedness value.

Return type

mk (float)

matthews_correlation_coefficient(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate Matthew’s correlation coefficient for a ground-truth prediction vector pair.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The value of Matthew’s correlation coefficient.

Return type

mat_cor (float)

miss_rate(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the miss rate (duplicate of false_negative_rate()).

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The miss rate value.

Return type

fnr (float)

negative_likelihood_ratio(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the negative likelihood ratio.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The negative likelihood ratio value.

Return type

lr_minus (float)

negative_predictive_value(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the negative predictive value.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The negative predictive value.

Return type

npv (float)

positive_likelihood_ratio(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the positive likelihood ratio.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The positive likelihood ratio value.

Return type

(float)

positive_predictive_value(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the positive predictive value (duplicated in precision_score()).

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The positive predictive value.

Return type

ppv (float)

pr_auc_score(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the precision recall area under the curve (PR AUC) for a ground-truth prediction vector pair.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The value of the precision-recall area under the curve.

Return type

pr_auc (float)

precision_score(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the precision for a ground-truth prediction vector pair.

Duplicate of positive_predictive_value(), but with an alternate implementation using sklearn.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The value of precision.

Return type

precision (float)

prevalence_threshold(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the prevalence threshold score.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The prevalence threshold value.

Return type

pthr (float)

recall_score(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the recall for a ground-truth prediction vector pair.

Duplicate of true_positive_rate(), but with alternate implementation from sklearn.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The value of recall.

Return type

recall (float)

Note

It’s surprising that the sklearn implementation of TPR needs to be binarized but the rexmex implementation does not.

roc_auc_score(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the AUC for a ground-truth prediction vector pair.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The value of the area under the curve.

Return type

auc (float)

selectivity(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the selectivity (duplicate of true_negative_rate()).

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The selectivity score.

Return type

tnr (float)

sensitivity(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the sensitivity (duplicate of true_positive_rate()).

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The sensitivity score.

Return type

tpr (float)

specificity(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the specificity (duplicate of true_negative_rate()).

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The specificity score.

Return type

tnr (float)

threat_score(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the threat score.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The threat score value.

Return type

ts (float)

true_negative(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the number of true negatives.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The number of true negatives.

Return type

tn (float)

true_negative_rate(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the true negative rate (duplicated in specificity() and selectivity()).

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The true negative rate.

Return type

tnr (float)

true_positive(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the number of true positives.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The number of true positives.

Return type

tp (float)

true_positive_rate(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the true positive rate (duplicated in hit_rate(), sensitivity(), and recall_score()).

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The true positive rate.

Return type

tpr (float)
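
Most of the metrics in this section reduce to the four confusion-matrix counts. A self-contained sketch, assuming binary labels and already-binarized predictions:

import numpy as np

y_true = np.array([1, 1, 0, 0, 1, 0])
y_score = np.array([1, 0, 0, 1, 1, 0])  # predictions already binarized

tp = np.sum((y_true == 1) & (y_score == 1))  # true positives = 2
fp = np.sum((y_true == 0) & (y_score == 1))  # false positives = 1
tn = np.sum((y_true == 0) & (y_score == 0))  # true negatives = 2
fn = np.sum((y_true == 1) & (y_score == 0))  # false negatives = 1

tpr = tp / (tp + fn)                 # recall / sensitivity / hit rate = 2/3
tnr = tn / (tn + fp)                 # specificity / selectivity = 2/3
ppv = tp / (tp + fp)                 # precision / positive predictive value = 2/3
accuracy = (tp + tn) / y_true.size   # 4/6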

Coverage Metrics

item_coverage(possible_users_items: Tuple[List[Union[int, str]], List[Union[int, str]]], recommendations: List[Tuple[Union[int, str], Union[int, str]]])float[source]

Calculates the coverage value for items in possible_users_items[1] given the collection of recommendations. Recommendations over users/items not in possible_users_items are discarded.

Parameters
  • possible_users_items (Tuple[List[Union[int, str]], List[Union[int, str]]]) – contains exactly TWO sub-lists, the first with users and the second with items

  • recommendations (List[Tuple[Union[int, str], Union[int, str]]]) – contains user-item recommendation tuples, e.g., [(user_1, item_1), ...]

Returns: item coverage (float): a metric showing the fraction of items which got recommended at least once.

user_coverage(possible_users_items: Tuple[List[Union[int, str]], List[Union[int, str]]], recommendations: List[Tuple[Union[int, str], Union[int, str]]])float[source]

Calculates the coverage value for users in possible_users_items[0] given the collection of recommendations. Recommendations over users/items not in possible_users_items are discarded.

Parameters
  • possible_users_items (Tuple[List[Union[int, str]], List[Union[int, str]]]) – contains exactly TWO sub-lists, the first with users and the second with items

  • recommendations (List[Tuple[Union[int, str], Union[int, str]]]) – contains user-item recommendation tuples, e.g., [(user_1, item_1), ...]

Returns: user coverage (float): a metric showing the fraction of users who got at least one recommendation out of all possible users.
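
A worked illustration of both coverage values, independent of the library's implementation:

# Possible users and items, and the recommendations actually made.
possible_users, possible_items = ["u1", "u2", "u3"], ["i1", "i2", "i3", "i4"]
recommendations = [("u1", "i1"), ("u1", "i2"), ("u2", "i1")]

# Discard recommendations over unknown users or items, then collect what was covered.
valid = [(u, i) for u, i in recommendations if u in possible_users and i in possible_items]
covered_users = {u for u, i in valid}
covered_items = {i for u, i in valid}

item_cov = len(covered_items) / len(possible_items)  # 2 / 4 = 0.5
user_cov = len(covered_users) / len(possible_users)  # 2 / 3 = 0.666...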

Rating Metrics

mean_absolute_error(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the mean absolute error (MAE) for a ground-truth prediction vector pair.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The mean absolute error value.

Return type

mae (float)

mean_absolute_percentage_error(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the mean absolute percentage error (MAPE) for a ground-truth prediction vector pair.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The mean absolute percentage error value.

Return type

mape (float)

mean_squared_error(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the mean squared error (MSE) for a ground-truth prediction vector pair.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The mean squared error value.

Return type

mse (float)

pearson_correlation_coefficient(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the Pearson correlation coefficient for a ground-truth prediction vector pair.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The value of the correlation coefficient.

Return type

rho (float)

r2_score(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the coefficient of determination (R^2) for a ground-truth prediction vector pair.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The coefficient of determination value.

Return type

r2 (float)

root_mean_squared_error(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the root mean squared error (RMSE) for a ground-truth prediction vector pair.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The value of the root mean squared error.

Return type

rmse (float)

symmetric_mean_absolute_percentage_error(y_true: numpy.array, y_score: numpy.array)float[source]

Calculate the symmetric mean absolute percentage error (SMAPE) for a ground-truth prediction vector pair.

Parameters
  • y_true (array-like) – An N x 1 array of ground truth values.

  • y_score (array-like) – An N x 1 array of predicted values.

Returns

The value of the symmetric mean absolute percentage error.

Return type

smape (float)
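
A worked example for the two percentage-error metrics; the SMAPE line uses one common convention, and the library's exact denominator scaling may differ:

import numpy as np

y_true = np.array([100.0, 200.0, 50.0])
y_score = np.array([110.0, 180.0, 40.0])

mape = np.mean(np.abs((y_true - y_score) / y_true)) * 100
# (0.10 + 0.10 + 0.20) / 3 * 100 = 13.33...

smape = np.mean(2 * np.abs(y_true - y_score) / (np.abs(y_true) + np.abs(y_score))) * 100
# values fall in [0, 200] under this convention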

Utility

class Annotator[source]

A class to wrap annotations and generate a registry of the annotated metrics.

annotate(*, lower: float, upper: float, higher_is_better: bool, link: str, description: str, name: Optional[str] = None, lower_inclusive: bool = True, upper_inclusive: bool = True, binarize: bool = False, duplicate_of: Optional[Callable[[numpy.array, numpy.array], float]] = None)[source]

Annotate a function.

duplicate(other, *, name: Optional[str] = None, binarize: Optional[bool] = None)[source]

Annotate a function as a duplicate.

Metric

A function that can be called on y_true and y_score and returns a floating point result

alias of Callable[[numpy.array, numpy.array], float]

binarize(metric)[source]

Binarize the predictions for a ground-truth prediction vector pair.

Parameters

metric (function) – The metric function which needs a binarization pre-processing step.

Returns

The function which wraps the metric and binarizes the probability scores.

Return type

metric_wrapper (function)
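
A sketch of what such a wrapper can look like; the 0.5 threshold is an assumption for illustration, not necessarily the library's choice:

import numpy as np

def binarize_sketch(metric):
    # Illustrative wrapper: threshold the probability scores at 0.5 before applying the metric.
    def metric_wrapper(y_true, y_score):
        return metric(y_true, (np.asarray(y_score) > 0.5).astype(int))
    return metric_wrapper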

normalize(metric)[source]

Normalize the predictions for a ground-truth prediction vector pair.

Parameters

metric (function) – The metric function which needs a normalization pre-processing step.

Returns

The function which wraps the metric and normalizes predictions.

Return type

metric_wrapper (function)

Synthetic Datasets

class DatasetReader[source]

Class to read synthetic test datasets.

read_dataset(dataset: str = 'erdos_renyi_example')[source]

Method to read the dataset.

Parameters

dataset (str) – Dataset of interest, one of: ("erdos_renyi_example"). Default is ‘erdos_renyi_example’.

Returns

The example dataset for testing the library.

Return type

data (pd.DataFrame)
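
Example usage (the import path is an assumption):

from rexmex.dataset import DatasetReader   # assumed import path

reader = DatasetReader()
data = reader.read_dataset("erdos_renyi_example")
print(data.head())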