2.1. Statistical Pattern Recognition, Knowledge Discovery, Data Mining and Bioinformatics.
1. A.K.C. Wong and T.S. Liu, "Typicality, diversity, and feature pattern of an ensemble", IEEE Trans. on Computers, 24(2), pp. 158-181, 1975.
(C:68) (C: denotes the citation count; numbers in square brackets refer to papers in this contribution list).
An important early work that uses information measures to reveal the typicality, diversity, attribute interdependency, normalized sum of interdependency, and class characteristics of ensembles of categorical data (later mixed-mode data).
It strongly influenced Dr. Wong's later work on data discretization; class characteristics and statistical knowledge discovery [2, 9];
and attribute/pattern clustering [5,7] in discrete data. When recently applied to proteomic data, it effectively revealed site/residue
conservations and variations as well as unknown class characteristics, resolving mislabeled and imbalanced class problems in unsupervised settings.
2. A.K.C. Wong and D.K.Y. Chiu, "Synthesizing statistical knowledge from incomplete mixed-mode data", IEEE Trans. on Pattern Analysis and Machine Intelligence, (6), pp. 796-805, 1987 (C:164).
The difficulties in analyzing, synthesizing and clustering multivariate data of mixed type (discrete and continuous) in the 1980s were largely due to:
1) non-uniform scaling in different coordinates, 2) lack of order in nominal data, and
3) lack of a suitable similarity measure.
This paper presents a new approach to overcome these difficulties by acquiring statistical knowledge from incomplete mixed-mode data.
It introduces an event-covering approach to cover subsets of statistically relevant outcomes for subsequent probabilistic inference, cluster analysis, and pattern detection.
It later motivated Wong's work on pattern discovery [4-8] and on attribute and pattern clustering [5,7,9] for revealing localized probabilistic knowledge in relational datasets.
3. J.Y. Ching, A.K.C. Wong, and K.C.C. Chan, "Class-dependent discretization for inductive learning from continuous and mixed-mode data", IEEE Trans. on Pattern Analysis and Machine Intelligence, 17(7), pp. 641-651, 1995 (C: 225).
This paper has had great impact in machine learning because it addresses the classification problem for mixed-mode data.
Motivated by this work, a more effective and rigorous algorithm, "A global optimal algorithm for class-dependent discretization of continuous data", was later developed in 2004 (C:41).
It was later extended to unsupervised mixed-mode data analysis by using the mode (the most interdependent attribute in an attribute cluster) to
drive the discretization of continuous data, a crucial step in tackling big data.
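The idea of letting the class labels drive the choice of interval boundaries can be illustrated with a minimal sketch. It is not the algorithm of the paper: it only considers a single cut point, and it scores each candidate by the mutual information between class and interval divided by their joint entropy, which is an assumed normalization. The function names are hypothetical.

```python
import math
from collections import Counter


def _entropy(items):
    """Shannon entropy (natural log) of the empirical distribution of items."""
    n = len(items)
    return -sum((c / n) * math.log(c / n) for c in Counter(items).values())


def best_cut(values, classes):
    """Return the single cut point whose two intervals are most
    interdependent with the class labels (illustrative criterion only).
    Returns None when fewer than two distinct values are present."""
    distinct = sorted(set(values))
    best, best_score = None, -1.0
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2  # candidate boundary between adjacent values
        bins = [v > cut for v in values]
        joint = _entropy(list(zip(bins, classes)))
        mutual = _entropy(bins) + _entropy(classes) - joint
        score = mutual / joint if joint > 0 else 0.0
        if score > best_score:
            best, best_score = cut, score
    return best
```

For example, best_cut([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]) selects the boundary between 3 and 10, where the two intervals coincide with the two classes.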
4. A.K.C. Wong and Y. Wang, "High-order pattern discovery from discrete-valued data", IEEE Transactions on Knowledge and Data Engineering, 9(6), pp. 877-893, 1997 (C:78) (with related patent (C: 97)).
The paper first uses statistical significance and minimal support to discover high-order association patterns, wherever they are,
in relational datasets for knowledge representation and classification. Patterns are then clustered and pruned,
removing up to 90% of redundant patterns while still covering 90% of the data. With unsupervised fuzzy pattern discovery,
over 50% of the pattern loss between attribute clusters could be recovered. In later work [5,7], it furnished a scalable approach to discover
and organize local and correlated patterns from large datasets without relying on prior knowledge and with minimal
information loss [5, 7, 8, 9].
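A rough sketch of how statistical significance and minimal support can be combined to screen a candidate pattern is given below. It is a simplification under stated assumptions, not the procedure of the paper: the expected count assumes independent attributes, the score is a plain standardized residual without any variance adjustment, and the thresholds (min_support=2, z_threshold=1.96) and function names are illustrative choices.

```python
import math


def pattern_residual(records, pattern):
    """records: non-empty list of tuples; pattern: {attribute index: value}.
    Returns (observed count, expected count under independence,
    standardized residual)."""
    n = len(records)
    observed = sum(all(rec[i] == v for i, v in pattern.items()) for rec in records)
    expected = float(n)
    for i, v in pattern.items():
        # multiply by the marginal relative frequency of each value
        expected *= sum(rec[i] == v for rec in records) / n
    z = (observed - expected) / math.sqrt(expected) if expected > 0 else 0.0
    return observed, expected, z


def is_candidate(records, pattern, min_support=2, z_threshold=1.96):
    """Keep a pattern only if it occurs often enough and clearly more
    often than independence would predict."""
    observed, _, z = pattern_residual(records, pattern)
    return observed >= min_support and z > z_threshold
```

With 16 records ("a", "x") and 16 records ("b", "y"), the pattern {0: "a", 1: "x"} is observed 16 times against an expectation of 8 and passes the screen, while {0: "a", 1: "y"} never occurs and is rejected.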
5. W.H. Au, K.C.C. Chan, A.K.C. Wong and Y. Wang, "Attribute clustering for grouping, selection, and classification of gene expression data", IEEE/ACM Trans. on Computational Biology and Bioinformatics, Vol. 2, No. 2, pp. 83-101, 2005 (C:134).
By replacing samples with attributes and distance measures with normalized mutual information, this research turns the k-means algorithm
for clustering samples into a k-mode algorithm for clustering attributes. By treating each gene as an attribute and its discretized expression
level as an outcome, much better gene clustering and classification results are obtained. It partitions the attributes into independent
(orthogonal) and intra-dependent (correlated) groups and selects significant features from each group, unbiased by strong groups.
Together with unsupervised fuzzy pattern discovery, it plays an important role in handling big data.
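The two ingredients named above, a normalized mutual information between attributes and a "mode" attribute for each cluster, can be sketched as follows. This is an illustration under assumptions rather than the published algorithm: mutual information is normalized here by the joint entropy, the mode is taken as the attribute with the largest summed interdependence with the others, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (natural log) of a list of discrete outcomes."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def interdependence(x, y):
    """Mutual information I(X;Y) divided by joint entropy H(X,Y);
    0 means independent, 1 means each attribute determines the other."""
    h_xy = entropy(list(zip(x, y)))
    if h_xy == 0:
        return 0.0
    return (entropy(x) + entropy(y) - h_xy) / h_xy


def cluster_mode(columns):
    """Index of the attribute with the largest summed interdependence
    with all other attributes in the cluster."""
    def total(i):
        return sum(interdependence(columns[i], columns[j])
                   for j in range(len(columns)) if j != i)
    return max(range(len(columns)), key=total)
```

In a k-mode style loop, each attribute would be assigned to the mode it is most interdependent with, and the modes recomputed until the assignment stabilizes, in the same way k-means alternates between assignment and centroid update.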
6. Y.M. Sun, Mohamed S. Kamel, A.K.C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data", Journal of Pattern Recognition, Vol. 40, Issue 12, pp. 3356-3378, 2007 (C: 421).
This is a significant milestone in pattern recognition and machine learning.
It is a meta-technique to deal with imbalanced data so as to improve classification accuracy.
By introducing cost-items into AdaBoost, it treats samples of different classes equally to tally with a
stage-wise additive modeling in statistics to minimize the cost of exponential loss.
Its weighting strategies for different types of samples are evaluated experimentally for their ability to identify rare cases.
Since it applies to a wide class of classifiers and diverse class-imbalance problems, it is widely used in supervised learning.
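How a cost item can enter the AdaBoost weight update is sketched below for a single boosting round. The placement of the cost (multiplying the weight, outside the exponent) is only one possible choice and is an assumption here, not a statement of which variant the paper recommends; the function name and the example costs are hypothetical.

```python
import math


def cost_sensitive_update(weights, costs, labels, preds, alpha):
    """One round of sample re-weighting:
    w_i <- c_i * w_i * exp(-alpha * y_i * h_i), then renormalize.
    labels and preds take values in {-1, +1}; costs are positive."""
    raw = [c * w * math.exp(-alpha * y * h)
           for w, c, y, h in zip(weights, costs, labels, preds)]
    z = sum(raw)  # normalization constant
    return [r / z for r in raw]
```

With weights [0.5, 0.5], costs [2, 1], labels [1, -1], predictions [-1, -1] and alpha = 0.5, the misclassified, higher-cost sample receives about 0.84 of the total weight, compared with about 0.73 when both costs are 1, so the next weak learner concentrates more on the rare class.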
7. A.K.C. Wong and G.C.L. Li, "Simultaneous pattern and data clustering for pattern cluster analysis", IEEE Trans. on Knowledge and Data Engineering, 20(7), pp. 911-923, 2008 (C: 31).
This is the first formal mathematical analysis of the complex relationships among patterns via dual pattern and data spaces.
It clusters patterns and their associated data simultaneously, making the relation between patterns and data explicit
and allowing users to build their knowledge on a statistical basis as well as knowing where the pattern groups reside
- a most effective way to explore knowledge in large datasets. Experiments on real data demonstrated its usefulness.
It introduces a new paradigm that localizes pattern clusters in unsupervised learning to establish data sub-spaces and
pattern sub-spaces, as used effectively in subsequent work.
8. A.K.C. Wong, D. Zhuang, G.C. Li, and E.S. Lee, "Discovery of delta closed patterns and noninduced patterns from sequences", IEEE Trans. on Knowledge and Data Engineering, 24(8), 1408-1421, 2012 (C: 12).
This paper overcomes a key concern of pattern discovery: generating more patterns than can be handled.
It prunes redundant sub-patterns (delta-closed) and super-patterns (statistically non-induced)
using a suffix tree, yielding an extraordinarily effective algorithm (linear in the input data size)
that obtains a smaller set of statistically ranked patterns, making the identification and use of
the discovered patterns feasible and Aligned Pattern Clusters manageable.
It led to the important development of gap pattern discovery.
9. A.K.C. Wong and E.S.A. Lee, "Aligning and clustering patterns to reveal the protein functionality of sequences", IEEE Trans. on Computational Biology and Bioinformatics 11, 1-13, 2014.
This paper presents a new computationally efficient method for discovering conserved non-redundant sequence patterns,
allowing variations in local and distant correlated regions, to render a compact
representation called Aligned Pattern Clusters (APCs) and Co-occurring APCs without relying on prior knowledge.
When applied to protein families, it identifies the binding segments and residues and discovers all binding sites
in the APCs with superior compactness (entropy) and data coverage when compared with other methods.
The identification, ranking, and association of patterns with variations help biologists avoid time-consuming
simulation and experimentation in studying bio-molecular binding and interactions.
2.2. Image Analysis, Structural Pattern Recognition, Computer Vision and Intelligent Robotics.
10. J.N. Kapur, P.K. Sahoo and A.K.C. Wong, "A new method for gray-level picture thresholding using the entropy of the histogram", Computer Graphics and Image Processing, pp. 273-285, 1985 (C: 2134).
By introducing entropy maximization for image segmentation, we produced a paper whose global impact continues today.
It was accompanied by a significant survey paper with an equally high citation count (C: 2650).
In the 1980s, the methods most commonly used to extract objects from an image were known as "thresholding".
They assume that most gray-level histograms are bimodal when an object is clearly distinguishable from the background.
Thus, the threshold for segmentation can be chosen at the bottom of the valley. However, gray-level histograms are not always bimodal.
Entropy maximization proved to be an effective, simple and useful approach for tackling the general case.
It has therefore had global impact ever since; in fact, its citations are still rising.
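The entropy-maximization idea can be sketched in a few lines: for each candidate gray level, split the normalized histogram into two classes, compute the entropy of each class distribution, and keep the level where the sum is largest. The sketch below is a minimal illustration on a plain list of counts; the function name is hypothetical and no image I/O is included.

```python
import math


def entropy_threshold(hist):
    """Return the gray level t that maximizes the summed entropies of
    the two classes {0..t} and {t+1..L-1} of a gray-level histogram.
    hist is a list of non-negative counts, one per gray level."""
    total = sum(hist)
    p = [h / total for h in hist]
    best_t, best_score = None, float("-inf")
    cum = 0.0  # probability mass of the lower class
    for t in range(len(p) - 1):
        cum += p[t]
        if cum <= 0.0 or cum >= 1.0:
            continue  # one of the two classes would be empty
        rest = 1.0 - cum
        h_low = -sum((q / cum) * math.log(q / cum) for q in p[: t + 1] if q > 0)
        h_high = -sum((q / rest) * math.log(q / rest) for q in p[t + 1:] if q > 0)
        if h_low + h_high > best_score:
            best_t, best_score = t, h_low + h_high
    return best_t
```

Because the criterion depends only on the two class distributions, it needs no valley in the histogram: for a flat four-level histogram [1, 1, 1, 1] it returns level 1, the split that leaves two equally spread classes.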
11. H.C. Shen and A.K.C. Wong, "Generalized texture representation and metric", Computer vision, graphics, and image processing, 3 (2), 187-206, 1983 (C:74).
It was a new approach in the 1980s, devised to reflect various perceptual aspects of image texture in line and circular event frequency diagrams
(gray level, gradient and directionality events) at different resolution levels. Based on the spatial properties of the event frequency diagrams,
new metrics known as event-set distances are introduced as an assignment problem to enhance feature dissimilarities in the event space that
traditional approaches using symmetric differences between distributions (such as histograms and FFT) fail to capture.
Hence, its representation and metric for texture analysis are more effective than those of its counterparts.
12. A.K.C. Wong and M.L. You, "Entropy and distance of random graphs with application to structural pattern recognition", IEEE Trans. on Pattern Analysis and Machine Intelligence, 7(5), pp. 599-609, 1985 (C:259).
This paper is the first to propose a direct probabilistic representation of the structural aspects of relational data by synthesizing attributed graphs
(AG) via optimal graph monomorphism into random graphs (RG) for classification, clustering and pattern characterization. It was followed by a book chapter,
Random graphs, 1990 (C:54). It uses the minimum change of entropy when two graphs are merged to direct their synthesis, and thus clusters the AGs/RGs into RGs.
With a well-formed probability structure and an entropy-based distance measure between AGs/RGs, it preserves all the mathematical properties of the
classical PR paradigm while providing the underlying probabilistic and structural aspects of structural patterns. Its citations are still rising today.
13. R.V. Mayorga, F. Janabi-Sharifi and A.K.C. Wong, "A fast approach for the robust planning of redundant robot manipulators", Journal of Robotic Systems, 12(2), pp. 147-161, 1999 (C:31).
This solves the fundamental problem of how an intelligent robot plans an execution path that avoids obstacles reported by the vision system while
achieving its designated tasks. It is based on inverse kinematics problems under an inexact context to permit the avoidance of obstacles
and ensure smooth execution in an appropriate manner. These properties make the proposed approach suitable for redundant
robots operating in real time in a sensor-based environment. This is the theoretical basis in support of vision-guided intelligent robots.
14. A.K.C. Wong, S.W. Lu and M. Rioux, "Recognition and shape synthesis of 3-D objects based on attributed hypergraphs", IEEE Trans. on Pattern Analysis and Machine Intelligence, 2(3), pp. 279-290, 1989 (C: 114).
In graph theory, edges between nodes are binary relationships. This paper introduces n-ary attributed relations among attributed nodes to
provide a much more general representation of structural and relational patterns in the form of attributed hypergraphs (AH).
It thus eases the construction, partition, and representation of various parts and types of relational patterns by their specified or
induced attributed hyperedges (AEs), which could be topological, geometrical, structural, or feature-wise relations for objects and scenes.
It also applies to other relational structures by inducing other relations, such as statistical and functional ones [4, 9].
Breaking down structures into AEs helps reduce the complexity of search, matching and transformation, as reflected by the claims in the related patent.
15. A.K.C. Wong and Li Rong, "Intelligent modeling, transformation and manipulation system", Approved, US Patent Application No.: US-2002-0095276-AI, 2005 (C: 32).
Beginning with our earlier theoretical work, the patent demonstrates very general operations on AHs based on category theory, irregular triangular
meshes, symbolic AH construction, transformation and manipulation. Hence, once a physical and a kinematic world are mapped into the AH category,
all the physical manipulations can be carried out symbolically (and computationally) in the AH world. It provides a generalized framework for modeling,
transformation and manipulation of the 3D and virtual worlds with broad applications. Hence, the integration of 3D modelling/vision with intelligent
robotics, 2D and 3D model manipulation, animation, non-linear transformation and morphing in the cyberworld can all be accomplished.