rougeEvaluationScore
Syntax
score = rougeEvaluationScore(candidate,references)
score = rougeEvaluationScore(candidate,references,Name,Value)
Description
The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scoring algorithm evaluates the similarity between a candidate document and a collection of reference documents. Use the ROUGE score to evaluate the quality of document translation and summarization models.
score = rougeEvaluationScore(candidate,references) returns the ROUGE score between the specified candidate document and the reference documents. By default, the function computes unigram overlaps between candidate and references. This is also known as the ROUGE-N metric with n-gram length 1. For more information, see ROUGE Score.
score = rougeEvaluationScore(candidate,references,Name,Value) specifies additional options using one or more name-value pairs.
Examples
Evaluate Similarity
Specify the candidate document as a tokenizedDocument object.
str = "the fast brown fox jumped over the lazy dog";
candidate = tokenizedDocument(str)
candidate = 
  tokenizedDocument:

   9 tokens: the fast brown fox jumped over the lazy dog
Specify the reference documents as a tokenizedDocument array.
str = [
    "the quick brown animal jumped over the lazy dog"
    "the quick brown fox jumped over the lazy dog"];
references = tokenizedDocument(str)
references = 
  2x1 tokenizedDocument:

    9 tokens: the quick brown animal jumped over the lazy dog
    9 tokens: the quick brown fox jumped over the lazy dog
Calculate the ROUGE score between the candidate document and the reference documents.
score = rougeEvaluationScore(candidate,references)
score = 0.8889
Specify N-Gram Lengths
Specify the candidate document as a tokenizedDocument object.
str = "a simple summary document containing some words";
candidate = tokenizedDocument(str)
candidate = 
  tokenizedDocument:

   7 tokens: a simple summary document containing some words
Specify the reference documents as a tokenizedDocument array.
str = [
    "a simple document"
    "another document with some words"];
references = tokenizedDocument(str)
references = 
  2x1 tokenizedDocument:

    3 tokens: a simple document
    5 tokens: another document with some words
Calculate the ROUGE score between the candidate document and the reference documents using the default options.
score = rougeEvaluationScore(candidate,references)
score = 1
The rougeEvaluationScore function, by default, compares unigram (single-token) overlaps between the candidate document and the reference documents. Because the ROUGE score is a recall-based measure, if one of the reference documents is made up entirely of unigrams that appear in the candidate document, the resulting ROUGE score is one. In this scenario, the output of the rougeEvaluationScore function is uninformative.
For a more meaningful result, calculate the ROUGE score again using bigrams by setting the 'NgramLength' option to 2. The resulting score is less than one, since every reference document contains bigrams that do not appear in the candidate document.
score = rougeEvaluationScore(candidate,references,'NgramLength',2)
score = 0.5000
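As a rough sketch of how the other methods can be tried on the same documents, the loop below evaluates each documented 'ROUGEMethod' value in turn. It assumes Text Analytics Toolbox is installed; the variable names methods and scores are illustrative only, and no particular output values are implied for the non-default methods.

```matlab
% Sketch: evaluate every documented ROUGE method on the example documents.
% Requires Text Analytics Toolbox. Variable names are illustrative.
candidate = tokenizedDocument("a simple summary document containing some words");
references = tokenizedDocument([
    "a simple document"
    "another document with some words"]);

methods = {'n-grams', 'longest-common-subsequences', 'weighted-subsequences', ...
    'skip-bigrams', 'skip-bigrams-and-unigrams'};

scores = zeros(1, numel(methods));
for k = 1:numel(methods)
    scores(k) = rougeEvaluationScore(candidate, references, 'ROUGEMethod', methods{k});
    fprintf('%-30s %.4f\n', methods{k}, scores(k));
end
```

Comparing the methods side by side like this can help show which ones are sensitive to word order for a given pair of documents.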
Input Arguments
candidate
— Candidate document
tokenizedDocument scalar | string array | cell array of character vectors
Candidate document, specified as a tokenizedDocument scalar, a string array, or a cell array of character vectors. If candidate is not a tokenizedDocument scalar, then it must be a row vector representing a single document, where each element is a word.
references
— Reference documents
tokenizedDocument array | string array | cell array of character vectors
Reference documents, specified as a tokenizedDocument array, a string array, or a cell array of character vectors. If references is not a tokenizedDocument array, then it must be a row vector representing a single document, where each element is a word. To evaluate against multiple reference documents, use a tokenizedDocument array.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: score = rougeEvaluationScore(candidate,references,'ROUGEMethod','weighted-subsequences') specifies to use the weighted subsequences ROUGE method.
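The two calling conventions described above can be sketched as follows. This assumes Text Analytics Toolbox and, for the first call, R2021a or later; the documents and the variable names scoreNew and scoreOld are placeholders chosen for illustration.

```matlab
% Sketch: the same options passed with both name-value syntaxes.
candidate = tokenizedDocument("the fast brown fox jumped over the lazy dog");
references = tokenizedDocument("the quick brown fox jumped over the lazy dog");

% R2021a and later: Name=Value syntax.
scoreNew = rougeEvaluationScore(candidate, references, ...
    ROUGEMethod="skip-bigrams", SkipDistance=2);

% Comma-separated, quoted names: also valid before R2021a.
scoreOld = rougeEvaluationScore(candidate, references, ...
    'ROUGEMethod', 'skip-bigrams', 'SkipDistance', 2);
```

Both calls pass identical options, so they are expected to return the same score.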
ROUGEMethod
— ROUGE method
'n-grams' (default) | 'longest-common-subsequences' | 'weighted-subsequences' | 'skip-bigrams' | 'skip-bigrams-and-unigrams'
ROUGE method, specified as the comma-separated pair consisting of 'ROUGEMethod' and one of the following:
- 'n-grams' – Evaluate the ROUGE score using n-gram overlaps between the candidate document and the reference documents. This is also known as the ROUGE-N metric.
- 'longest-common-subsequences' – Evaluate the ROUGE score using Longest Common Subsequence (LCS) statistics. This is also known as the ROUGE-L metric.
- 'weighted-subsequences' – Evaluate the ROUGE score using weighted longest common subsequence statistics. This method favors consecutive LCSs. This is also known as the ROUGE-W metric.
- 'skip-bigrams' – Evaluate the ROUGE score using skip-bigram (any pair of words in sentence order) co-occurrence statistics. This is also known as the ROUGE-S metric.
- 'skip-bigrams-and-unigrams' – Evaluate the ROUGE score using skip-bigram and unigram co-occurrence statistics. This is also known as the ROUGE-SU metric.
NgramLength
— N-gram length
1 (default) | positive integer
N-gram length used for the 'n-grams' ROUGE method (ROUGE-N), specified as the comma-separated pair consisting of 'NgramLength' and a positive integer.
If the 'ROUGEMethod' option is not 'n-grams', then the 'NgramLength' option has no effect.
Tip
If the longest document in references has fewer than NgramLength words, then the resulting ROUGE score is NaN. If candidate has fewer than NgramLength words, then the resulting ROUGE score is zero. To ensure that rougeEvaluationScore returns nonzero scores for very short documents, set NgramLength to a positive integer smaller than the length of candidate and the length of the longest document in references.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
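The short-document behavior described in the Tip for NgramLength can be probed with a sketch like the one below. It assumes Text Analytics Toolbox; the documents and the names scoreNaN and scoreZero are made up for illustration, and the comments restate what the Tip says to expect rather than recorded output.

```matlab
% Sketch: NgramLength larger than the documents allow.
references = tokenizedDocument("a simple document");            % 3 words
candidate = tokenizedDocument("a simple summary document");     % 4 words

% The longest reference has fewer than 4 words, so per the Tip the score is NaN.
scoreNaN = rougeEvaluationScore(candidate, references, 'NgramLength', 4);

% The candidate has fewer than 3 words, so per the Tip the score is zero.
shortCandidate = tokenizedDocument("a simple");                 % 2 words
scoreZero = rougeEvaluationScore(shortCandidate, references, 'NgramLength', 3);
```

Checking for NaN with isnan before aggregating scores over many documents avoids silently propagating these cases into averages.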
SkipDistance
— Skip distance
4 (default) | positive integer
Skip distance used for the 'skip-bigrams' and 'skip-bigrams-and-unigrams' ROUGE methods (ROUGE-S and ROUGE-SU), specified as the comma-separated pair consisting of 'SkipDistance' and a positive integer.
If the 'ROUGEMethod' option is not 'skip-bigrams' or 'skip-bigrams-and-unigrams', then the 'SkipDistance' option has no effect.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Output Arguments
score
— ROUGE score
scalar
ROUGE score, returned as a scalar value in the range [0,1] or NaN.
A ROUGE score close to zero indicates poor similarity between candidate and references. A ROUGE score close to one indicates strong similarity between candidate and references. If candidate is identical to one of the reference documents, then score is 1. If candidate and references are both empty documents, then the resulting ROUGE score is NaN.
Tip
If the longest document in references has fewer than NgramLength words, then the resulting ROUGE score is NaN. If candidate has fewer than NgramLength words, then the resulting ROUGE score is zero. To ensure that rougeEvaluationScore returns nonzero scores for very short documents, set NgramLength to a positive integer smaller than the length of candidate and the length of the longest document in references.
Algorithms
ROUGE Score
The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scoring algorithm [1] calculates the similarity between a candidate document and a collection of reference documents. Use the ROUGE score to evaluate the quality of document translation and summarization models.
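Given an n-gram length n, the ROUGE-N metric between a candidate document and a single reference document is given by

$$\text{ROUGE-N}(\text{candidate},\text{reference}) = \frac{\sum_{r_i \in \text{reference}} \sum_{\text{n-gram} \in r_i} \text{Count}(\text{n-gram},\text{candidate})}{\sum_{r_i \in \text{reference}} \text{numNgrams}(r_i)},$$

where the elements $r_i$ are sentences in the reference document, $\text{Count}(\text{n-gram},\text{candidate})$ is the number of times the specified n-gram occurs in the candidate document, and $\text{numNgrams}(r_i)$ is the number of n-grams in the specified reference sentence $r_i$.
For sets of multiple reference documents, the ROUGE-N metric is given by

$$\text{ROUGE-N}(\text{candidate},\text{references}) = \max_k \left\{ \text{ROUGE-N}(\text{candidate},\text{references}_k) \right\}.$$

To use the ROUGE-N metric, set the 'ROUGEMethod' option to 'n-grams'.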
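The ROUGE-N definition can be illustrated with a small base-MATLAB sketch for single-sentence documents. It is not the toolbox implementation: it assumes clipped matching as in [1] (each reference n-gram is matched at most as often as it occurs in the candidate), and the helper names ngrams, clipped, and rougeN are hypothetical. It reuses the documents from the Specify N-Gram Lengths example.

```matlab
% Sketch of ROUGE-N for single-sentence documents (base MATLAB only).
% Assumption: n-gram matches are clipped, following [1].
n = 2;
candidate = split("a simple summary document containing some words")';
references = {split("a simple document")', ...
    split("another document with some words")'};

% All runs of n consecutive words, each joined into one string.
ngrams = @(w, n) arrayfun(@(i) join(w(i:i+n-1), " "), 1:numel(w)-n+1);

% Clipped number of reference n-grams r that also occur in candidate n-grams c.
clipped = @(c, r) sum(arrayfun(@(g) min(sum(r == g), sum(c == g)), unique(r)));

% Recall of reference n-grams for one reference document.
rougeN = @(c, r, n) clipped(ngrams(c, n), ngrams(r, n)) / numel(ngrams(r, n));

% One score per reference, then the maximum over references.
scores = cellfun(@(r) rougeN(candidate, r, n), references);   % [0.5 0.25]
score = max(scores);                                          % 0.5
```

The maximum over references reproduces the bigram score of 0.5 reported in the example: the first reference shares one of its two bigrams with the candidate, the second only one of four.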
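Given a sentence $d = [w_1,\ldots,w_m]$ and a sentence $s = [s_1,\ldots,s_n]$, where the elements $s_i$ correspond to words, the subsequence $[w_{i_1},\ldots,w_{i_k}]$ is a common subsequence of d and s if $w_{i_j} \in \{s_1,\ldots,s_n\}$ for $j = 1,\ldots,k$ and $i_1 < \cdots < i_k$, with the matched words appearing in the same order in s, where the elements of s are the words of the sentence and k is the length of the subsequence. The subsequence $[w_{i_1},\ldots,w_{i_k}]$ is a longest common subsequence (LCS) if the subsequence length k is maximal.
Given a candidate document and a single reference document, the union of the longest common subsequences is given by

$$\text{LCS}_{\cup}(\text{candidate},\text{reference}) = \bigcup_{r_i \in \text{reference}} \left\{ w \mid w \in \text{LCS}(\text{candidate},r_i) \right\},$$

where $\text{LCS}(\text{candidate},r_i)$ is the set of longest common subsequences in the candidate document and the sentence $r_i$ from a reference document.
The ROUGE-L metric is an F-score measure. To calculate it, first calculate the recall and precision scores given by

$$\text{Recall}_{\text{lcs}}(\text{candidate},\text{reference}) = \frac{\sum_{r_i \in \text{reference}} \left| \text{LCS}_{\cup}(\text{candidate},r_i) \right|}{\text{numWords}(\text{reference})}$$

$$\text{Precision}_{\text{lcs}}(\text{candidate},\text{reference}) = \frac{\sum_{r_i \in \text{reference}} \left| \text{LCS}_{\cup}(\text{candidate},r_i) \right|}{\text{numWords}(\text{candidate})},$$

where $\text{numWords}(d)$ denotes the number of words in the document d.
Then, the ROUGE-L metric between a candidate document and a single reference document is given by the F-score measure

$$\text{ROUGE-L}(\text{candidate},\text{reference}) = \frac{(1+\beta^2)\,\text{Recall}_{\text{lcs}}\,\text{Precision}_{\text{lcs}}}{\text{Recall}_{\text{lcs}} + \beta^2\,\text{Precision}_{\text{lcs}}},$$

where the parameter $\beta$ controls the relative importance of the precision and recall. Because the ROUGE score favors recall, $\beta$ is typically set to a high value.
For sets of multiple reference documents, the ROUGE-L metric is given by

$$\text{ROUGE-L}(\text{candidate},\text{references}) = \max_k \left\{ \text{ROUGE-L}(\text{candidate},\text{references}_k) \right\}.$$

To use the ROUGE-L metric, set the 'ROUGEMethod' option to 'longest-common-subsequences'.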
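For two single-sentence documents, the LCS length behind ROUGE-L can be computed with the standard dynamic-programming table. The sketch below is base MATLAB only and is not the toolbox implementation; the sentences are the example pair from [1], and the choice beta = 1 is an assumption made purely for illustration.

```matlab
% Sketch: LCS length and the ROUGE-L F-score for two single sentences.
reference = split("police killed the gunman")';
candidate = split("police kill the gunman")';

% L(i+1,j+1) holds the LCS length of reference(1:i) and candidate(1:j).
L = zeros(numel(reference) + 1, numel(candidate) + 1);
for i = 1:numel(reference)
    for j = 1:numel(candidate)
        if reference(i) == candidate(j)
            L(i+1, j+1) = L(i, j) + 1;
        else
            L(i+1, j+1) = max(L(i, j+1), L(i+1, j));
        end
    end
end
lcsLength = L(end, end);                      % 3 ("police the gunman")

recall = lcsLength / numel(reference);        % 0.75
precision = lcsLength / numel(candidate);     % 0.75
beta = 1;                                     % assumed value, for illustration
rougeL = (1 + beta^2) * recall * precision / (recall + beta^2 * precision);
```

Because recall and precision are equal here, the F-score is 0.75 for any beta; with documents of different lengths, a larger beta pulls the score toward the recall.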
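Given a weighting function f such that f has the property f(x+y)>f(x)+f(y) for any positive integers x and y, define $\text{WLCS}(\text{candidate},\text{reference})$ to be the length of the longest consecutive matches encountered in the candidate document and a single reference document scored by the weighting function f. For more information about calculating this value, see [1].
The ROUGE-W metric is an F-score measure. To calculate it, first calculate the recall and precision scores given by

$$\text{Recall}_{\text{wlcs}}(\text{candidate},\text{reference}) = f^{-1}\!\left( \frac{\text{WLCS}(\text{candidate},\text{reference})}{f(\text{numWords}(\text{reference}))} \right)$$

$$\text{Precision}_{\text{wlcs}}(\text{candidate},\text{reference}) = f^{-1}\!\left( \frac{\text{WLCS}(\text{candidate},\text{reference})}{f(\text{numWords}(\text{candidate}))} \right),$$

where $\text{numWords}(d)$ denotes the number of words in the document d.
The ROUGE-W metric between a candidate document and a single reference document is given by the F-score measure

$$\text{ROUGE-W}(\text{candidate},\text{reference}) = \frac{(1+\beta^2)\,\text{Recall}_{\text{wlcs}}\,\text{Precision}_{\text{wlcs}}}{\text{Recall}_{\text{wlcs}} + \beta^2\,\text{Precision}_{\text{wlcs}}},$$

where the parameter $\beta$ controls the relative importance of the precision and recall. Because the ROUGE score favors recall, $\beta$ is typically set to a high value.
For multiple reference documents, the ROUGE-W metric is given by

$$\text{ROUGE-W}(\text{candidate},\text{references}) = \max_k \left\{ \text{ROUGE-W}(\text{candidate},\text{references}_k) \right\}.$$

To use the ROUGE-W metric, set the 'ROUGEMethod' option to 'weighted-subsequences'.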
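The weighted LCS value can be sketched with the dynamic program described in [1]. The block below is base MATLAB only and is not the toolbox implementation; the weighting function f(k) = k^2 is an assumed choice that satisfies f(x+y) > f(x)+f(y), and the letter sequences are the illustrative ones used in [1].

```matlab
% Sketch: weighted LCS (WLCS) recall and precision with f(k) = k^2.
f = @(k) k.^2;              % assumed weighting function
fInverse = @(v) sqrt(v);    % its inverse

reference = ["A" "B" "C" "D" "E" "F" "G"];
candidates = {["A" "B" "C" "D" "H" "I" "K"], ...   % four consecutive matches
              ["A" "H" "B" "K" "C" "I" "D"]};      % four scattered matches

recall = zeros(1, numel(candidates));
precision = zeros(1, numel(candidates));
for n = 1:numel(candidates)
    candidate = candidates{n};
    c = zeros(numel(reference) + 1, numel(candidate) + 1);  % WLCS scores
    w = zeros(size(c));        % length of the consecutive run ending at (i,j)
    for i = 1:numel(reference)
        for j = 1:numel(candidate)
            if reference(i) == candidate(j)
                k = w(i, j);                           % run length so far
                c(i+1, j+1) = c(i, j) + f(k + 1) - f(k);
                w(i+1, j+1) = k + 1;
            else
                c(i+1, j+1) = max(c(i, j+1), c(i+1, j));
            end
        end
    end
    wlcs = c(end, end);
    recall(n) = fInverse(wlcs / f(numel(reference)));
    precision(n) = fInverse(wlcs / f(numel(candidate)));
end
```

Both candidates share an LCS of length 4 with the reference, but the consecutive run scores 4/7 while the scattered matches score 2/7, which is the preference for consecutive matches that ROUGE-W is designed to express.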
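A skip-bigram is an ordered pair of words in a sentence allowing for arbitrary gaps between them. That is, given a sentence $c_i = [c_{i1},\ldots,c_{im}]$ from a candidate document, where the elements $c_{ij}$ correspond to the words in the sentence, the pair of words $[c_{ij}, c_{ik}]$ is a skip-bigram if $j < k$.
The ROUGE-S metric is an F-score measure. To calculate it, first calculate the recall and precision scores given by

$$\text{Recall}_{\text{skip}}(\text{candidate},\text{reference}) = \frac{\sum_{r_i \in \text{reference}} \sum_{\text{skip-bigram} \in r_i} \text{Count}(\text{skip-bigram},\text{candidate})}{\sum_{r_i \in \text{reference}} \text{numSkipBigrams}(r_i)}$$

$$\text{Precision}_{\text{skip}}(\text{candidate},\text{reference}) = \frac{\sum_{r_i \in \text{reference}} \sum_{\text{skip-bigram} \in r_i} \text{Count}(\text{skip-bigram},\text{candidate})}{\sum_{c_i \in \text{candidate}} \text{numSkipBigrams}(c_i)},$$

where the elements $r_i$ and $c_i$ are sentences in the reference document and candidate document, respectively, $\text{Count}(\text{skip-bigram},\text{candidate})$ is the number of times the specified skip-bigram occurs in the candidate document, and $\text{numSkipBigrams}(s)$ is the number of skip-bigrams in the sentence s.
Then, the ROUGE-S metric between a candidate document and a single reference document is given by the F-score measure

$$\text{ROUGE-S}(\text{candidate},\text{reference}) = \frac{(1+\beta^2)\,\text{Recall}_{\text{skip}}\,\text{Precision}_{\text{skip}}}{\text{Recall}_{\text{skip}} + \beta^2\,\text{Precision}_{\text{skip}}}.$$

For sets of multiple reference documents, the ROUGE-S metric is given by

$$\text{ROUGE-S}(\text{candidate},\text{references}) = \max_k \left\{ \text{ROUGE-S}(\text{candidate},\text{references}_k) \right\}.$$

To use the ROUGE-S metric, set the 'ROUGEMethod' option to 'skip-bigrams'.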
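Skip-bigram counting for two single sentences can be sketched in base MATLAB as follows. This is not the toolbox implementation: it assumes no limit on the gap between the two words (so the 'SkipDistance' option is not modeled), clipped matching as in [1], and beta = 1. The helper names joinPairs, skipBigrams, and clipped are hypothetical, and the sentences are the example pair from [1].

```matlab
% Sketch: ROUGE-S for two single sentences with unlimited skip distance.
reference = split("police killed the gunman")';
candidate = split("police kill the gunman")';

% Every ordered word pair (j < k), joined into one string per skip-bigram.
joinPairs = @(w, p) w(p(:,1)) + " " + w(p(:,2));
skipBigrams = @(w) joinPairs(w, nchoosek(1:numel(w), 2));

% Clipped number of reference skip-bigrams r that occur in the candidate's c.
clipped = @(c, r) sum(arrayfun(@(g) min(sum(r == g), sum(c == g)), unique(r)));

refGrams = skipBigrams(reference);            % 6 skip-bigrams
candGrams = skipBigrams(candidate);           % 6 skip-bigrams
matches = clipped(candGrams, refGrams);       % 3

recall = matches / numel(refGrams);           % 0.5
precision = matches / numel(candGrams);       % 0.5
beta = 1;                                     % assumed value, for illustration
rougeS = (1 + beta^2) * recall * precision / (recall + beta^2 * precision);
```

The three shared skip-bigrams are "police the", "police gunman", and "the gunman", so half of the reference's six skip-bigrams are recovered.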
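To also include unigram co-occurrence statistics in the ROUGE-S metric, introduce unigram counts into the recall and precision scores for ROUGE-S. This is equivalent to including start tokens in the candidate and reference documents, since

$$\sum_{r_i \in \text{reference}} \sum_{\text{skip-bigram} \in r_i^{SU}} \text{Count}(\text{skip-bigram},\text{candidate}^{SU}) = \sum_{r_i \in \text{reference}} \sum_{\text{skip-bigram} \in r_i} \text{Count}(\text{skip-bigram},\text{candidate}) + \sum_{r_i \in \text{reference}} \sum_{\text{unigram} \in r_i} \text{Count}(\text{unigram},\text{candidate}),$$

where $\text{Count}(\text{unigram},\text{candidate})$ is the number of times the specified unigram appears in the candidate document, and $r_i^{SU}$ and $\text{candidate}^{SU}$ denote the reference sentence and the candidate document augmented with start tokens, respectively.
For sets of multiple reference documents, the ROUGE-SU metric is given by

$$\text{ROUGE-SU}(\text{candidate},\text{references}) = \max_k \left\{ \text{ROUGE-S}(\text{candidate}^{SU},\text{references}_k^{SU}) \right\},$$

where $\text{references}_k^{SU}$ is the reference document with sentences augmented with start tokens.
To use the ROUGE-SU metric, set the 'ROUGEMethod' option to 'skip-bigrams-and-unigrams'.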
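The effect of the start-token augmentation can be sketched in base MATLAB. As before, this is an illustration under stated assumptions rather than the toolbox implementation: the marker "<s>" is a hypothetical start token, the skip distance is unlimited, matches are clipped, and the helper names are made up. The sentences follow the example in [1], where the candidate reverses the word order of the reference.

```matlab
% Sketch: skip-bigram matches with and without a start token.
startToken = "<s>";                                   % hypothetical marker
reference = [startToken split("police killed the gunman")'];
candidate = [startToken split("gunman the killed police")'];

joinPairs = @(w, p) w(p(:,1)) + " " + w(p(:,2));
skipBigrams = @(w) joinPairs(w, nchoosek(1:numel(w), 2));
clipped = @(c, r) sum(arrayfun(@(g) min(sum(r == g), sum(c == g)), unique(r)));

% Without the start token, no ordered word pair is shared.
plainMatches = clipped(skipBigrams(candidate(2:end)), ...
    skipBigrams(reference(2:end)));                   % 0

% With the start token, each shared unigram adds one "<s> word" match.
refGramsSU = skipBigrams(reference);                  % 10 skip-bigrams
suMatches = clipped(skipBigrams(candidate), refGramsSU);   % 4
recallSU = suMatches / numel(refGramsSU);             % 0.4
```

The four extra matches are exactly the four unigrams the sentences share, which is the unigram term on the right-hand side of the identity above.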
References
[1] Lin, Chin-Yew. "Rouge: A package for automatic evaluation of summaries." In Text Summarization Branches Out, pp. 74-81. 2004.
Version History
Introduced in R2020a