
ocrTrainingData

Create training data for OCR from ground truth

Since R2023a

    Description

    [imds,boxds,txtds] = ocrTrainingData(gTruth,labelName,attributeName) creates datastores for loading images, bounding boxes, and image text from ground truth.

    ocrTrainingData creates training data that you can use to train and evaluate an optical character recognition (OCR) model from ground truth data. Use the trainOCR function to train an OCR model and the evaluateOCR function to evaluate the model.


    Examples


    This example shows how to analyze OCR ground truth data to identify its character set and to understand its class distribution.

    Load the ground truth data and then extract text labels.

    ld = load("14SegmentGtruth.mat");
    gTruth = ld.gTruth;
    [~,~,txtds] = ocrTrainingData(gTruth,"Text","Word");

    Read all ground truth text corresponding to each image and combine them.

    allImagesText = txtds.readall;
    allText = strjoin([allImagesText{:}], "");

    Find the unique set of characters in the ground truth text.

    [characterSet, ~, idx] = unique(char(allText));

    Display the ground truth character set.

    disp("Ground Character Set: " + string(characterSet))
    Ground Character Set: +,-./3ABCDEFGHIJKLMNOPQRSTUVWXYZ
    

    The ground truth data contains all 26 uppercase letters of the English alphabet, the digit 3, and five special characters: +,-./.

    To understand the class distribution, count the character occurrences and tabulate the character counts.

    characterSet = cellstr(characterSet');
    characterCount = accumarray(idx,1);
    characterCountTbl = table(characterSet, characterCount, ...
        VariableNames=["Character", "CharacterCount"]);
    characterCountTbl = sortrows(characterCountTbl, ...
        "CharacterCount", "descend");
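The counting step relies on the third output of unique, which maps every character to its position in the sorted character set, and on accumarray, which sums a 1 for each occurrence. The following is a minimal sketch of that pattern on a small, hypothetical set of word labels (sampleWords is illustrative and is not part of the example data set), so that the result can be checked by hand.

```matlab
% Hypothetical word labels, not taken from 14SegmentGtruth.mat.
sampleWords = ["CAB" "ABBA"];

% Join all words into one character vector: 'CABABBA'.
allChars = char(strjoin(sampleWords,""));

% chars holds the sorted unique characters; idx maps each character
% in allChars to its position in chars.
[chars,~,idx] = unique(allChars);

% Add 1 to the bin of each character occurrence.
counts = accumarray(idx(:),1);

% Tabulate the result: A and B occur 3 times each, C occurs once.
tbl = table(cellstr(chars'),counts, ...
    VariableNames=["Character","CharacterCount"]);
disp(tbl)
```

Using idx(:) forces a column vector, which is the subscript shape that accumarray expects.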

    Visualize the character count with a word cloud chart.

    wordcloud(characterCountTbl, "Character", "CharacterCount")

    Figure contains an object of type wordcloud. The chart of type wordcloud has title CharacterCount.

    The characters O, E, T, N, and A have the highest character counts, and the characters -, +, /, ., and 3 have the lowest character counts.

    Visualize the class distribution with a bar graph.

    figure
    numCharacters = numel(characterSet);
    bar(1:numCharacters, characterCountTbl.CharacterCount)
    xticks(1:numCharacters)
    xticklabels(characterCountTbl.Character)
    xlabel("Character")
    ylabel("Number of samples")

    Figure contains an axes object. The axes object with xlabel Character, ylabel Number of samples contains an object of type bar.
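Beyond the visual check, the class distribution can be summarized numerically. The following is a minimal sketch that uses hypothetical counts (the values in counts, the variable names, and the 5% threshold are all illustrative assumptions, not results from the example data) to compute each character's share of the samples and a simple imbalance ratio.

```matlab
% Hypothetical per-character counts (illustrative values only).
characters = ["O" "E" "T" "3" "+"];
counts = [120 95 90 6 4];

% Fraction of all samples that belongs to each character.
fractions = counts / sum(counts);

% Ratio of the most frequent to the least frequent character.
imbalanceRatio = max(counts) / min(counts);

% Characters that make up less than 5% of the samples
% (the threshold is an arbitrary choice for this sketch).
rare = characters(fractions < 0.05);

disp("Imbalance ratio: " + imbalanceRatio)
disp("Underrepresented characters: " + strjoin(rare,", "))
```

With the table from the example, the same calculation could be applied to characterCountTbl.CharacterCount instead of the hypothetical counts vector.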

    This example shows how to prepare data to train an OCR model that can recognize fourteen-segment characters.

    The training data contains word samples of fourteen-segment characters from a page of text. Read the training image and display it.

    I = imread("CVT-DSEG14.jpg");
    imshow(I)

    Figure contains an axes object. The hidden axes object contains an object of type image.

    This image was annotated in the Image Labeler app with bounding boxes around words, and the text of each word was added to its bounding box as an attribute. To learn more about labeling images for OCR training, see Train Custom OCR Model. The labels were exported from the app as a groundTruth object and saved in the file 14SegmentGtruth.mat.

    ld = load("14SegmentGtruth.mat");
    gTruth = ld.gTruth;

    Create datastores that contain images, bounding boxes and text labels from the groundTruth object using the ocrTrainingData function with the label and attribute names used during labeling.

    labelName = "Text";
    attributeName = "Word";
    [imds,boxds,txtds] = ocrTrainingData(gTruth,labelName,attributeName);

    Combine the datastores.

    cds = combine(imds,boxds,txtds);

    The combined datastore can be used for training an OCR model using the trainOCR function.
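Before training, it can help to check what one read from the combined datastore returns. The following is a minimal sketch, assuming that 14SegmentGtruth.mat is on the MATLAB path as in the example above. It is expected, but treated here as an assumption, that each sample is a cell array holding the image, its bounding boxes, and its text labels, so the code reports the class and size of each element instead of relying on a fixed layout.

```matlab
% Recreate the combined datastore from the example.
ld = load("14SegmentGtruth.mat");
gTruth = ld.gTruth;
[imds,boxds,txtds] = ocrTrainingData(gTruth,"Text","Word");
cds = combine(imds,boxds,txtds);

% preview returns the first sample without advancing the datastore.
sample = preview(cds);

% Report what each element of the sample contains.
for k = 1:numel(sample)
    fprintf("Element %d: %s, size %s\n", ...
        k,class(sample{k}),mat2str(size(sample{k})));
end
```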

    Input Arguments


    gTruth: Ground truth data, specified as a groundTruth object or an M-by-1 array of groundTruth objects exported from the Image Labeler app.

    labelName: Name of the rectangular ROI label used for labeling ground truth, specified as a string scalar or character vector. You must use the Rectangle label type for OCR ground truth labeling.

    Use the Image Labeler app to label ground truth data. After loading your images into the app, select Label from the toolbar, then select Rectangle. A dialog box appears that provides the field for entering the label name.

    attributeName: Attribute name that corresponds to the label name, specified as a string scalar or character vector. The attribute identifies what the OCR detects in the ROI specified by labelName, for example, a word. To name an attribute in the Image Labeler app, after creating the ROI, select Attribute from the toolbar. A dialog box appears that provides the field for entering the attribute name.
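If the label and attribute names used during labeling are not known, they can usually be recovered from the groundTruth object itself. The following is a minimal sketch, assuming that 14SegmentGtruth.mat is on the MATLAB path. It displays the LabelDefinitions property, which is a table of the labels defined in the labeling session. Which columns describe the attributes depends on how the session was set up, so the sketch only displays the table and lists the label names.

```matlab
% Load the example ground truth.
ld = load("14SegmentGtruth.mat");
gTruth = ld.gTruth;

% LabelDefinitions is a table that describes the labels in gTruth.
labelDefs = gTruth.LabelDefinitions;
disp(labelDefs)

% List the label names that are candidates for the labelName input.
disp(labelDefs.Name)
```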

    Output Arguments


    imds: Image datastore, returned as an imageDatastore object that contains the images extracted from the specified groundTruth object or objects.

    boxds: Bounding box label datastore associated with the ground truth images, returned as an arrayDatastore object.

    txtds: Text label datastore that corresponds to the attribute name input, returned as an arrayDatastore object.

    Version History

    Introduced in R2023a