NEXYAD


Pattern recognition
(HOW TO BUILD AN EFFICIENT PATTERN RECOGNITION APPLICATION?)

Keywords : how to build an efficient pattern recognition application, tutorial, pattern recognition, vision, image processing, signal processing, data analysis, multisensors signature, signature, statistics, bayesian networks, decision trees, eigen vectors discrimination methods, artificial neural networks, fuzzy logic, fuzzy decision arrays, genetic classifiers, vector learning machines, expert systems, case based reasoning, raw data, preprocessing, descriptor, shape descriptor, fourier descriptor, descriptor extraction, classification, postprocessing, recognized pattern, qualitative variable with N modalities, unrecognized pattern, confidence, methodology, AGENDA, functional description of a pattern recognition system, factors of variation, filtering factors of variations, agenda graph, invariant with, learning data base, tuning data base, test data base, design plans, eigen vectors, reverse engineering of pattern recognition systems
Written by Gerard YAHIAOUI and Pierre DA SILVA DIAS, main founders of the applied maths research company NEXYAD, © NEXYAD,
all rights reserved : for any question, please CONTACT
Reproduction of partial or complete content of this page is authorized ONLY if "source : NEXYAD  http://www.nexyad.com" is clearly mentioned.
This tutorial was written for students and engineers who wish to understand the main hypotheses, ideas, and methodologies behind pattern recognition applications.


Vocabulary
Pattern recognition is a field of application for many applied maths methods such as statistics, Bayesian networks, decision trees, eigenvector-based discrimination methods ("classical data analysis"), artificial neural networks, fuzzy decision arrays, genetic classifiers, vector learning machines, expert systems, case-based reasoning, ...
It may also be applied to "shapes" that are defined in many ways:
 - images: for example, optical character recognition (OCR)
 - signals: for example, speech recognition
 - data: for example, answers to a questionnaire
 - multi-sensor signatures: for example, biometric applications
Because these data/signals/images come from such varied physical sources, the "preprocessings" draw on many applied maths techniques such as vision/image processing, signal processing, data analysis, ...


Because every applied maths domain has its own vocabulary, it is important to choose a vocabulary for pattern recognition that is not tied to any particular technique. We also have to give a clear definition of "pattern recognition". Even if some readers may disagree with our definitions, they will be useful for our purpose: giving a very short description of the main difficulties, methodologies, and traps to avoid.

We will consider that a pattern recognition system can be described with the following general synoptic: raw data / signals / images -> preprocessing -> descriptor extraction -> classification -> postprocessing.


Raw data / raw signals / raw images come directly from sensors or measurement systems.
Preprocessing elements are (a small sketch is given after this list):
 - transforms (e.g., the Fourier transform)
 - normalisations
 - denoising
 - ...
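
As an illustration (a sketch of ours, in Python with NumPy, neither of which is prescribed by the original text), a typical preprocessing could chain a normalisation, a simple denoising, and a transform:

    import numpy as np

    def normalise(signal):
        # zero-mean, unit-variance normalisation of a raw 1-D signal
        return (signal - signal.mean()) / (signal.std() + 1e-12)

    def denoise(signal, width=5):
        # very simple denoising: moving-average smoothing
        kernel = np.ones(width) / width
        return np.convolve(signal, kernel, mode="same")

    def transform(signal):
        # example of a transform: magnitude spectrum (Fourier transform)
        return np.abs(np.fft.rfft(signal))

    raw_signal = np.random.randn(256)                 # hypothetical raw signal from a sensor
    preprocessed = transform(denoise(normalise(raw_signal)))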
The descriptor extraction system builds a "signature" from preprocessed and/or raw data. The "signature" is a set of data that is supposed to contain useful and "easy to use" information for classification. Elements of the signature may be:
 - shape parameters (e.g., Fourier descriptors, for which a minimal sketch is given after this list, or shape descriptors, ...),
 - a selected part of the preprocessed and/or raw data,
 - scores (from scoring methods),
 - ...
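
For instance, classical Fourier descriptors of a closed contour can serve as such a signature. The sketch below is a minimal version, assuming the contour is given as (x, y) points; dropping the DC term, normalising by the first harmonic, and keeping only magnitudes makes the descriptors invariant to translation, scale, and rotation / starting point.

    import numpy as np

    def fourier_descriptors(contour_xy, n_descriptors=10):
        # contour_xy: array of shape (n_points, 2), sampled along a closed contour
        z = contour_xy[:, 0] + 1j * contour_xy[:, 1]   # complex representation of the contour
        spectrum = np.fft.fft(z)
        spectrum = spectrum[1:]                        # drop the DC term -> translation invariance
        spectrum = spectrum / np.abs(spectrum[0])      # normalise by the first harmonic -> scale invariance
        return np.abs(spectrum[:n_descriptors])        # magnitudes -> rotation / starting point invariance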
The classification system takes a set of descriptors and/or raw data and decides which "class" it should belong to.
One could also say that the output of the classification system is a qualitative variable with N modalities. Every modality of this variable is called a "class".
There are many classification techniques, such as:
 - classical statistics based methods,
 - Bayesian networks,
 - decision trees,
 - eigenvector-based discrimination methods,
 - artificial neural networks,
 - fuzzy decision arrays,
 - genetic classifiers,
 - vector learning machines,
 - expert systems,
 - case-based reasoning,
 - ...
Some classification techniques can automatically tune their parameters in order to optimize their performance on a sample of data (we will speak of "supervised learning", even if classical data analysis techniques, for instance, never use such a vocabulary).
Some classification techniques may use the knowledge of experts. We will speak of knowledge-based systems.
The postprocessing system makes it possible:
 - to estimate the confidence of the classification,
 - to re-evaluate the class, taking into account the risk attached to misclassification,
 - to add a class number (N+1) that means "unrecognized".
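
Taken together, the synoptic can be read as a chain of modules. The following minimal Python sketch only illustrates that decomposition; the module functions themselves are placeholders of ours, not part of the original text.

    def recognize(raw_input, preprocess, extract_descriptors, classify, postprocess):
        # functional view of the synoptic: each module can be developed and tested separately
        preprocessed = preprocess(raw_input)
        signature = extract_descriptors(preprocessed)     # descriptors and/or raw data
        class_scores = classify(signature)                # one score per modality (class)
        return postprocess(class_scores)                  # recognized class, confidence, or "unrecognized"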

The most frequently asked question is: "how do we build an efficient pattern recognition system?"
Because the "heart" of such a pattern recognition system is the classification system, one often considers that the first task is to choose a classification technique: "should I use neural networks? Bayesian networks? ...".
This is the wrong way.
The first task is to choose a global architecture: if one uses heavy preprocessing and smart descriptor extraction, classification may become easier. Conversely, classification from raw data may require a powerful classification technique. The balance between preprocessing, descriptor extraction, and classification complexity is the first point to settle. There is no unique solution: many solutions may lead to acceptable results.
There is a methodology that helps to build efficient solutions and to evaluate their viability before development: the AGENDA methodology (see further below and check the references).



How to build an efficient pattern recognition application : the AGENDA methodology
The AGENDA methodology proposes to write a functional analysis of the pattern recognition system (before starting any development). From this functional analysis, one can choose technical solutions, linking every technical choice to a functional need. This "classical" way of working (classical for regular industrial projects ... but not for pattern recognition projects) makes it possible:
 - to share the knowledge of several experts on the raw data,
 - to choose functional options before any technical choice that could disable a functionality,
 - to brainstorm on the choice of algorithms,
 - to guarantee the traceability of every technical choice,
 - to reuse solutions from one project to another when some functionalities are the same,
 - to test every module separately,
 - to help with maintenance and the development of further evolutions,
 - to communicate quickly with other researchers and engineers (every choice has an explicit reason).


AGENDA proposes a special way of describing the functionalities of a pattern recognition system. This way is general and can be applied to a wide range of applications. The idea is to consider the known causes of variation of the input vector.
Example: character recognition. Known causes of variation are:
 - small rotation of the character
 - translation
 - thickness of lines
 - style (font)
 - colour of printing
 - name of the character (1, a, 2, b, c, ...)
 - ...

These causes of variation combine to produce a very large number of different characters. That is why pattern recognition is a complicated task: only a few causes of variation of the input vector should lead to a variation of the output (e.g., the name of the character), while most causes of variation of the input vector should not change the output (e.g., thickness, rotation, translation, ...).
This means that the pattern recognition system can be seen as a "filter" on factors of variation.
This leads to a very simple way of describing the expected functionalities of a pattern recognition system: one just has to decide, for every factor, whether it is to be filtered or not. Example:
 - small rotation of the character: to be filtered
 - translation: to be filtered
 - thickness of lines: to be filtered
 - style (font): to be filtered
 - colour of printing: to be filtered
 - name of the character (1, a, 2, b, c, ...): "useful" factor of variation, not to be filtered
 - ...

And every factor should be linked to a technical solution "invariant with this cause of variation".

Example: an AGENDA graph links every factor of variation to the technical solution that filters it; red links keep a record of the goal of every technical choice.
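
As a purely illustrative sketch (the solutions below are ours, chosen to match the character recognition example, and are not taken from the original graph), such a graph can be summarised as a table linking every factor of variation to the functionality it requires:

    # factor of variation -> (to be filtered?, technical solution "invariant with" this factor)
    agenda_graph = {
        "small rotation":        ("filter", "rotation-invariant Fourier descriptors"),
        "translation":           ("filter", "drop the DC term of the Fourier transform"),
        "thickness of lines":    ("filter", "skeletonisation of the character"),
        "style (font)":          ("filter", "learning classification system"),
        "colour of printing":    ("filter", "binarisation of the image"),
        "name of the character": ("keep",   "output class of the classifier"),
    }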

If one cannot find a technical solution for some factors of variation, then one can still point those factors to a "learning" classification system.


It becomes obvious, looking at such an "AGENDA graph", that there exist many solutions: many technical items may provide the same "invariant with" functionality; other sensors may lead to a simpler solution; ...

But this AGENDA graph can also bring more information:
 - every link has a cost: computing time, ...
 - intuitively, one can understand that a graph with every link pointing to the learning classification system would need more examples to get tuned than the above graph (example: no preprocessing, no descriptor extraction, and raw data fed directly into a neural network ...). The idea is to use experimental design plans in order to build the database: an orthogonal design plan (over the set of factors that point to the learning system) is a simple way of building this database (a minimal sketch is given after this list). This database is needed for:
      - getting a learning database (if the classification is automatically tuned from data) or a tuning database (if the classification is tuned by hand),
      - getting a test database: to evaluate the performance of the classification system on examples that were not used to tune it.
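
A minimal sketch of this idea, assuming a full factorial (orthogonal) design over the factors that point to the learning system; the factor names and modalities below are illustrative, not taken from the original text:

    from itertools import product

    # modalities of the factors that point to the learning classification system
    factors = {
        "character": list("0123456789"),
        "font":      ["serif", "sans-serif", "monospace"],
        "thickness": [1, 2, 3],
    }

    # full factorial design plan: one example specification per combination of modalities
    design_plan = [dict(zip(factors, combination)) for combination in product(*factors.values())]
    print(len(design_plan))  # 10 * 3 * 3 = 90 examples to collect or generate for the database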


Although there is a deterministic and simple way of building a relevant database, one needs to cut this database into two parts in order to get two sub-databases: the learning one and the test one. This is done by using two fractional sub-design plans.
Then, it is important to verify that the two databases are representations of the same statistical phenomenon in the variable space (preprocessed data and descriptors): mean values, standard deviations, and the N first eigenvectors must be the same.
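
A minimal sketch of such a check, assuming both databases are available as NumPy arrays with one example per row (NumPy and the tolerance values are our assumptions); eigenvectors are compared up to sign, which is the usual ambiguity:

    import numpy as np

    def same_statistics(learning_db, test_db, n_eigen=3, tol=0.1):
        # compare mean values and standard deviations of every variable
        ok_mean = np.allclose(learning_db.mean(axis=0), test_db.mean(axis=0), atol=tol)
        ok_std = np.allclose(learning_db.std(axis=0), test_db.std(axis=0), atol=tol)

        # N first eigenvectors of the covariance matrix, sorted by decreasing eigenvalue
        def first_eigenvectors(db):
            eigval, eigvec = np.linalg.eigh(np.cov(db, rowvar=False))
            return eigvec[:, np.argsort(eigval)[::-1][:n_eigen]]

        v_learn, v_test = first_eigenvectors(learning_db), first_eigenvectors(test_db)
        cosines = np.abs(np.sum(v_learn * v_test, axis=0))   # |cosine| between matching eigenvectors
        ok_eigen = np.all(cosines > 1.0 - tol)

        return ok_mean and ok_std and ok_eigen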


At this point, a classification technique is carefully picked. Criteria of choice are:
 - do I have explicit expertise for classifying examples? (if yes, knowledge-based systems such as fuzzy decision arrays may be a good solution: they are usually tuned "by hand"; if not, learning-based methods that are automatically tuned from data, such as neural networks or Bayesian networks, may be a good solution),
 - do I need an explicit explanation of the result? (if yes, "black box" methods are forbidden ...),
 - do I have a software tool that lets me apply classification techniques? (or can I buy one? develop one? ...).


There are often several classification techniques that fit these criteria, and it is a good thing to test several approaches in order to know which part of the result is a "method" effect.
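
As a sketch of such a comparison, assuming the learning and test databases are already available as arrays and that scikit-learn is used (scikit-learn is our assumption, not mentioned in the original text):

    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    def compare_methods(X_learn, y_learn, X_test, y_test):
        # same learning and test databases for both techniques: score differences are a "method" effect
        for name, classifier in [("k nearest neighbours", KNeighborsClassifier()),
                                 ("decision tree", DecisionTreeClassifier())]:
            classifier.fit(X_learn, y_learn)
            print(name, classifier.score(X_test, y_test))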

Once the classification system is tuned, it is time to build the postprocessing system: the aim of this system is to interpret the partial quantitative information produced by the classification system in order to attach a confidence to the final result. A lack of confidence should lead to a new output (never given by the classification system itself): "unknown" or "unrecognized".

Methods for confidence estimation differ for every classification method, but the main ideas are (a small sketch is given after this list):
 - choose a measure of confidence for the selected class,
 - choose a measure of hesitation (comparing the confidence for the selected class to the confidence for the other classes),
 - use a decision method based on these measures.
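
A minimal sketch of such a postprocessing, assuming the classification system outputs one score per class (the thresholds below are illustrative values of ours):

    import numpy as np

    def postprocess(class_scores, confidence_threshold=0.6, hesitation_threshold=0.2):
        # class_scores: one score per modality (class), e.g. posterior probabilities
        ranked = np.argsort(class_scores)[::-1]
        best, second = ranked[0], ranked[1]
        confidence = class_scores[best]                          # confidence for the selected class
        hesitation = class_scores[best] - class_scores[second]   # margin with the next best class

        # decision: reject to the added class (N+1) when confidence or margin is too low
        if confidence < confidence_threshold or hesitation < hesitation_threshold:
            return "unrecognized"
        return int(best)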

Once the pattern recognition system is available, tests often show cases of misclassification. Analysing those cases allows the AGENDA graphs to be upgraded: some factors of variation are added, new preprocessings/descriptors can be picked, and the database can be upgraded.

This incremental way of building the system is a guarantee of maintainability: indeed, industrial applications often need to evolve over time (new functionalities, new classes, ...). The AGENDA graphs can then be considered as a memory of the engineering process, allowing very quick reverse engineering for maintenance and upgrades.




For further questions or applications, please feel free to contact us.