Name: MATLAB code for preprocess text image
SKU: P2018F017
Availability: InStock

Description

% This function takes a camera image of a page of Thai text
% in a document layout and processes it into a clean document image.
% The camera image may:
% – be an RGB image
% – contain noise
% – contain regions that are not text (e.g. background that’s not on the page)
% – be rotated
% – have uneven lighting

% First convert the image into a grayscale image

% Use region labelling in 1D to find the number of characters
% and the horizontal locations of each character.
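
One way to read the 1D labelling step is to collapse a binary text line to a column-occupancy vector and label its runs. The sketch below assumes that reading; the tiny matrix `bw` and the variable names are made up for illustration, and touching characters would merge into a single run.

```matlab
% Hypothetical binary image of a single text line (text = 1).
bw = [0 1 1 0 0 1 0 1 1 1 0;
      0 1 0 0 0 1 0 0 1 0 0];

% Collapse to 1D: a column is occupied if it holds any text pixel.
occupied = any(bw, 1);

% Label runs in 1D: a rising edge starts a run, a falling edge ends it.
edges    = diff([0, occupied, 0]);
startCol = find(edges == 1);
endCol   = find(edges == -1) - 1;

% Number of runs and their horizontal extents.
numChars = numel(startCol);
fprintf('%d runs found\n', numChars);
disp([startCol; endCol]);   % first row: start columns, second row: end columns
```

This uses base MATLAB only; `bwlabel` on the 1-by-N vector `occupied` would give equivalent labels if the Image Processing Toolbox is available.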

% threshold the image using locally adaptive thresholding

% invert the binary image so that the text becomes foreground
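
The grayscale, thresholding and inversion steps could look roughly like the sketch below. It assumes the Image Processing Toolbox, a synthetic page in place of a real photo, and a sensitivity of 0.5 for `adaptthresh`; none of these choices come from the original code.

```matlab
% Requires the Image Processing Toolbox.
% Synthetic stand-in for a photographed page: bright paper with a
% left-to-right lighting gradient and three thin dark strokes.
[shade, ~] = meshgrid(linspace(0.5, 1, 200), 1:100);
page = shade;
strokeRows = [20:22, 50:52, 80:82];
page(strokeRows, 30:170) = 0.3 * page(strokeRows, 30:170);
rgb = repmat(page, [1 1 3]);          % in practice this would come from imread

% Step 1: RGB to grayscale.
gray = rgb2gray(rgb);

% Step 2: locally adaptive threshold (sensitivity 0.5 is an assumed value).
T  = adaptthresh(gray, 0.5, 'ForegroundPolarity', 'dark');
bw = imbinarize(gray, T);             % paper maps to 1, ink to 0

% Step 3: invert so that the text becomes the foreground.
textMask = ~bw;
```

A local threshold is used rather than a single global one because the lighting varies across the page.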

% Remove unwanted background that’s not text
% Do this by region labelling. Remove regions with sizes larger
% than a certain threshold (assume they are not text)
% Remove any labels with size smaller than a certain threshold
% (assume these are noise)

% The size thresholds are the mean region area plus and minus the
% standard deviation of the area

% The filtered images are combined with a logical AND to remove unwanted artifacts
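
A minimal sketch of the size filter, assuming the thresholds are the mean area plus and minus one standard deviation and that the two filtered masks are ANDed together. The synthetic image and all variable names are illustrative. Note that with a heavily skewed area distribution the lower bound can go negative, in which case no region is removed as noise.

```matlab
% Requires the Image Processing Toolbox.
% Synthetic binary image: 20 character-sized blobs, one large non-text
% region and three single-pixel specks.
bw = false(60, 120);
for r = [5 20]
    for c = 5:10:95
        bw(r:r+5, c:c+5) = true;      % 6x6 "characters"
    end
end
bw(40:49, 5:19) = true;               % large region, assumed not text
bw(35, 60)  = true;                   % noise specks
bw(55, 100) = true;
bw(32, 110) = true;

% Label connected regions and measure their areas.
L = bwlabel(bw);
stats = regionprops(L, 'Area');
areas = [stats.Area];

% Assumed thresholds: mean area minus/plus one standard deviation.
lowT  = mean(areas) - std(areas);
highT = mean(areas) + std(areas);

% One mask per rule, then AND them so only regions passing both survive.
notTooLarge = ismember(L, find(areas <= highT));
notTooSmall = ismember(L, find(areas >= lowT));
textMask = notTooLarge & notTooSmall;
```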

% rotate the image to the correct orientation
% Use Hough transform to find angle of rotation

% Only keep lines that are long enough to be considered:
% more than half the length of the longest line.
% This removes lines that correspond to small details of
% the character structure of the Thai script and produce
% weird/unwanted angles (e.g. 45 degrees and -45 degrees appear often
% even with a perfectly aligned document).

% Find the mean, mode, and median of the angles for reference.

% Use the mode value for rotation (a choice based on running the script
% on many samples).
% The rotation angle must be adjusted to make sure the image rotates
% correctly.
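
The rotation estimate described above might be sketched as follows. This assumes the Image Processing Toolbox, a synthetic skewed page, and one plausible form of the angle adjustment (mapping the Hough normal angle to a small correction). The peak count and threshold are assumed settings, and the sign convention should be checked against a sample with known skew.

```matlab
% Requires the Image Processing Toolbox.
% Synthetic skewed page: thin horizontal "text lines" rotated by a
% known amount (imrotate treats positive angles as counterclockwise).
page = false(200, 300);
page(40:20:160, 50:250) = true;
bw = imrotate(page, -5, 'nearest', 'crop');

% Hough transform and line-segment extraction (assumed settings).
[H, theta, rho] = hough(bw);
peaks = houghpeaks(H, 20, 'Threshold', 0.3 * max(H(:)));
lines = houghlines(bw, theta, rho, peaks);

% Keep only segments longer than half the longest segment.
len    = arrayfun(@(s) norm(s.point2 - s.point1), lines);
keep   = len > 0.5 * max(len);
angles = [lines(keep).theta];

% Mean, median and mode for reference; the mode drives the rotation.
meanAngle   = mean(angles);
medianAngle = median(angles);
modeAngle   = mode(angles);

% hough reports the angle of each line's normal in [-90, 90), so a
% horizontal text line shows up near -90 or +90 degrees. Map that to
% a small correction angle (assumed convention).
if modeAngle < 0
    rotAngle = modeAngle + 90;
else
    rotAngle = modeAngle - 90;
end
deskewed = imrotate(bw, rotAngle, 'nearest', 'crop');
```

Comparing `rotAngle` with the known -5 degree skew of the synthetic input is a quick sanity check on the sign handling.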

% Find the areas where the sentences are and clean up noise.
% First remove any regions that have both an area more than 1 std above
% the mean and an extent (region area divided by bounding-box area)
% more than 1 std above the mean.
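
A sketch of the combined area and extent rule, assuming the Image Processing Toolbox. The synthetic image uses hollow boxes as low-extent "characters" and one solid block as the region to remove; both are made up for illustration.

```matlab
% Requires the Image Processing Toolbox.
% Synthetic image: 12 hollow 7x7 boxes and one large solid block.
bw = false(80, 140);
ring = true(7);
ring(2:6, 2:6) = false;
for c = 5:10:115
    bw(10:16, c:c+6) = ring;
end
bw(40:59, 20:49) = true;              % large, solid, assumed not text

% Measure area and extent (area / bounding-box area) per region.
L = bwlabel(bw);
stats   = regionprops(L, 'Area', 'Extent');
areas   = [stats.Area];
extents = [stats.Extent];

% A region is removed only if BOTH values exceed mean + 1 std.
tooLarge = areas   > mean(areas)   + std(areas);
tooSolid = extents > mean(extents) + std(extents);
removeIdx = find(tooLarge & tooSolid);

cleaned = bw & ~ismember(L, removeIdx);
```

Requiring both conditions keeps large but sparse regions, which are more likely to be text than a solid block of the same size.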

% Next find the bounding box of the text, assuming the text is written
% in a document style with margins around the text box.
% Use an interpolation technique on the cumulative pixel count
% to find the edges of the bounding box, and remove any noise outside
% the box.
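
The description does not say which interpolation is used, so the sketch below swaps in a simpler stand-in: it finds where the normalized cumulative pixel count first crosses a small fraction and its complement, along rows and columns. The 1% fraction, the synthetic image and the variable names are all assumptions.

```matlab
% Synthetic page: a block of text rows plus two specks in the margins.
bw = false(100, 150);
bw(30:2:70, 40:110) = true;           % text block
bw(5, 5)    = true;                   % noise outside the text box
bw(95, 145) = true;

frac = 0.01;                          % assumed fraction allowed outside each edge

% Normalized cumulative pixel counts along rows and columns.
rowCum = cumsum(sum(bw, 2)) / nnz(bw);
colCum = cumsum(sum(bw, 1)) / nnz(bw);

% Box edges: first crossing of frac and of 1 - frac.
top    = find(rowCum >= frac, 1);
bottom = find(rowCum >= 1 - frac, 1);
left   = find(colCum >= frac, 1);
right  = find(colCum >= 1 - frac, 1);

% Keep only the pixels inside the box; the image size is unchanged.
cropped = false(size(bw));
cropped(top:bottom, left:right) = bw(top:bottom, left:right);
```

Because isolated specks contribute very little to the cumulative count, they fall outside the crossings and are discarded with the margins.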

% Then resize the image to the original size

% Do a final noise clean-up and smoothing of the text by image erosion
% followed by dilation (morphological image processing), i.e. an opening filter.
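
Erosion followed by dilation with the same structuring element is a morphological opening, which `imopen` performs in one call. The sketch assumes the Image Processing Toolbox and a 3x3 structuring element; the element actually used by the code is not stated.

```matlab
% Requires the Image Processing Toolbox.
% Synthetic image: one thick stroke and one isolated noise pixel.
bw = false(40, 60);
bw(10:20, 10:40) = true;              % stroke
bw(30, 50) = true;                    % noise pixel

se = strel('rectangle', [3 3]);       % assumed structuring element

% Opening in one call ...
opened = imopen(bw, se);

% ... and the same thing spelled out as erosion then dilation.
openedManual = imdilate(imerode(bw, se), se);
```

Anything smaller than the structuring element disappears, so its size should stay below the thinnest stroke that must survive.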

% Separate the sentences out (OPTIONAL: this helps for noisy images;
% it is not needed if the image is not noisy)


https://stackoverflow.com/questions/28935983/preprocessing-image-for-tesseract-ocr-with-opencv



