Name: MATLAB code for preprocess text image
SKU: P2018F017
Availability: InStock

Description

% This function takes a camera image of a page of Thai text laid out
% as a document and processes it into a clean document image.
% The camera image may:
% – be an RGB image
% – contain noise
% – contain regions that are not text (e.g. background that is not on the page)
% – be rotated
% – have uneven lighting

% First convert the image into a grayscale image
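
As a minimal sketch of this step, the snippet below converts a three-channel image to grayscale and leaves a single-channel image untouched. The synthetic `img` and the variable names are assumptions for illustration, not part of the original code.

```matlab
% Synthetic 4x4 RGB image (an assumption; a real call would use imread).
img = uint8(cat(3, 200*ones(4), 100*ones(4), 50*ones(4)));

% Convert only when the input really has three colour channels.
if ndims(img) == 3 && size(img, 3) == 3
    gray = rgb2gray(img);   % weighted sum of the R, G and B channels
else
    gray = img;             % already single-channel
end
```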

% Use region labelling in 1D to find the number of characters
% and the horizontal locations of each character.
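
One way to sketch 1D region labelling is to collapse a binary image to a column profile and find its runs of non-empty columns. The version below uses `diff` from base MATLAB rather than a labelling function, and the synthetic image `bw` is an assumption. Characters that touch or overlap horizontally would merge into a single run.

```matlab
% Synthetic binary image (an assumption) with three separate "characters".
bw = false(5, 12);
bw(2:4, 2:3)  = true;
bw(1:5, 6)    = true;
bw(3:4, 9:11) = true;

colHasText = any(bw, 1);              % true where a column contains text pixels
d = diff([0, colHasText, 0]);         % +1 at each run start, -1 just after each run end
startCols = find(d == 1);             % first column of each run
endCols   = find(d == -1) - 1;        % last column of each run
numChars  = numel(startCols);         % number of runs found
```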

% Threshold the image using locally adaptive thresholding

% Invert the binary image so that the text becomes foreground
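
The thresholding and inversion steps could look roughly like the following, assuming the Image Processing Toolbox functions `adaptthresh` and `imbinarize` are available. The sensitivity of 0.5, the neighbourhood size of 31 and the synthetic image are illustrative assumptions, not values taken from the original code.

```matlab
% Synthetic bright page with one dark stroke (an assumption).
gray = uint8(200 * ones(40, 40));
gray(18:22, 10:30) = 30;

% Local threshold computed from the mean of a 31x31 neighbourhood.
T  = adaptthresh(gray, 0.5, 'ForegroundPolarity', 'dark', ...
                 'NeighborhoodSize', 31);
bw = imbinarize(gray, T);   % true where a pixel is brighter than its local threshold

% Invert so that the dark text becomes the foreground (true).
textMask = ~bw;
```

The neighbourhood should be noticeably wider than the stroke thickness; otherwise the inside of a thick stroke is compared only against itself and may be classified as background.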

% Remove unwanted background that’s not text
% Do this by region labelling. Remove regions with sizes larger
% than a certain threshold (assume they are not text)
% Remove any labels with size smaller than a certain threshold
% (assume these are noise)

% The size thresholds are the mean region area plus or minus the
% standard deviation of the region areas
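
The region filtering described above can be sketched with `bwconncomp` and `regionprops` from the Image Processing Toolbox. The synthetic image and the use of exactly one standard deviation on each side of the mean are assumptions made for illustration.

```matlab
% Synthetic mask (an assumption): eight text-sized blobs, one large blob, one speck.
bw = false(40, 60);
for k = 1:8
    c0 = 3 + 7 * (k - 1);
    bw(3:7, c0:c0+5) = true;          % 5x6 = 30 px each
end
bw(20:28, 10:19) = true;              % 90 px, too large to be text
bw(35, 50)       = true;              % 1 px, noise

cc    = bwconncomp(bw);               % label connected regions
stats = regionprops(cc, 'Area');
areas = [stats.Area];

% Keep regions whose area lies within one standard deviation of the mean.
lowArea  = mean(areas) - std(areas);
highArea = mean(areas) + std(areas);
keep     = areas >= lowArea & areas <= highArea;
cleaned  = ismember(labelmatrix(cc), find(keep));
```

If one region is far larger than all the others, the standard deviation can exceed the mean, which makes the lower bound negative and lets small specks through. A fixed minimum area may be needed in that case.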

% The images are combined with a logical AND to remove unwanted artifacts

% rotate the image to the correct orientation
% Use Hough transform to find angle of rotation

% Only keep lines that are long enough to be considered, i.e. more than
% half the length of the longest line.
% This removes lines that correspond to small details of Thai character
% shapes, which produce spurious angles (e.g. 45 degrees and -45 degrees
% appear often even in a perfectly aligned document).

% Find the mean, mode, and median of the angles for reference.

% Use the mode value for the rotation (a choice based on running the
% script on many samples).
% The rotation angle must be modified to make sure the image rotates
% correctly.
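
The rotation steps above might be sketched as follows with the toolbox functions `hough`, `houghpeaks` and `houghlines`. The synthetic page, the limit of 10 peaks and the angle conversion are assumptions. In particular, the conversion shown is only one plausible reading of the angle modification mentioned above: `hough` reports the angle of each line's normal, so it has to be turned into the skew of the line itself.

```matlab
% Synthetic page (an assumption): two horizontal strokes, skewed by 5 degrees.
bw = false(200, 200);
bw(60, 20:180)  = true;
bw(120, 20:180) = true;
bw = imrotate(bw, 5, 'nearest', 'crop');    % positive angle = counterclockwise

[H, theta, rho] = hough(bw);                % theta is the angle of each line's normal
peaks = houghpeaks(H, 10);                  % up to 10 strongest accumulator cells
segs  = houghlines(bw, theta, rho, peaks);  % segments with endpoints and theta

% Keep only segments longer than half the longest one.
lens = arrayfun(@(s) norm(s.point1 - s.point2), segs);
keep = lens > 0.5 * max(lens);
thetaMode = mode([segs(keep).theta]);

% Convert the normal angle to the skew of the text lines, wrapped to [-90, 90).
skewDeg  = mod(90 - thetaMode + 90, 180) - 90;
deskewed = imrotate(bw, -skewDeg, 'nearest', 'crop');
```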

% Find the areas where the sentences are and clean up noise.
% First remove any regions whose area is more than 1 std above the mean
% area and whose extent is more than 1 std above the mean extent.

% Next find the bounding box for the text, assuming the text is written
% in a document style with margins around the text box.
% Interpolate the cumulative count of text pixels along each axis to
% find the edges of the bounding box, then remove any noise outside
% the box.
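
The sketch below shows a simplified variant of this idea. It thresholds the normalised cumulative pixel count directly with `find` instead of interpolating it, so it is not the original technique. The 1% cut-off `frac` and the synthetic image are assumptions.

```matlab
% Synthetic mask (an assumption): one text block plus two noise pixels in the margins.
bw = false(100, 80);
bw(20:70, 15:60) = true;
bw(3, 5)   = true;
bw(95, 75) = true;

frac   = 0.01;                               % fraction of pixels allowed outside each edge
rowCdf = cumsum(sum(bw, 2)) / nnz(bw);       % cumulative share of pixels, top to bottom
colCdf = cumsum(sum(bw, 1)) / nnz(bw);       % cumulative share of pixels, left to right

top    = find(rowCdf >  frac,     1, 'first');
bottom = find(rowCdf >= 1 - frac, 1, 'first');
left   = find(colCdf >  frac,     1, 'first');
right  = find(colCdf >= 1 - frac, 1, 'first');

% Keep only what lies inside the box; everything else is cleared.
cropped = false(size(bw));
cropped(top:bottom, left:right) = bw(top:bottom, left:right);
```

On real pages the cut-off trades noise rejection against clipping the first and last text lines, so it would need tuning per data set.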

% Then resize the image to the original size

% Do a final noise clean-up and smoothing of the text with a morphological
% opening (image erosion followed by dilation).
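
A minimal sketch of the opening step, assuming the Image Processing Toolbox. The 3x3 square structuring element and the synthetic image are illustrative choices. The second call spells out the erosion and dilation explicitly to show that it matches `imopen`.

```matlab
% Synthetic mask (an assumption): one solid blob and one isolated noise pixel.
bw = false(30, 30);
bw(10:20, 10:20) = true;
bw(3, 3)         = true;

se     = strel('square', 3);                  % 3x3 structuring element
opened = imopen(bw, se);                      % opening in one call
manual = imdilate(imerode(bw, se), se);       % erosion followed by dilation
```

Anything smaller than the structuring element disappears, so the element must stay smaller than the thinnest stroke that should survive.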

% Separate out the sentences (OPTIONAL: this helps for noisy images
% and is not needed when the image is clean)


https://stackoverflow.com/questions/28935983/preprocessing-image-for-tesseract-ocr-with-opencv


