Page | 703 © 2013, Hindawi. All Rights Reserved. Volume 2013, Journal of Electrical and Computer Engineering. Research Paper. Available online at: www.hindawi.com

Efficient Block Classification of Computer Screen Images for Desktop Sharing using Neural Network
P. S. Jagadeesh Kumar, Department of Computer Science, University of Cambridge, United Kingdom

Abstract— This paper presents a neural network based block classification of compound images for desktop sharing. The objective is to maximize the precision and recall rates of the classification algorithm while minimizing the execution and training time of the neural network. The method segments computer screen images into text/graphics and picture/background blocks, using as input statistical features derived from the DWT coefficients in the sub-bands of each 8×8 block. The proposed algorithm accurately separates text of different fonts, sizes and arrangements from the background image, so that text/graphics blocks can be compressed at higher quality than background blocks. Owing to the adaptive nature of the neural network, the proposed approach is expected to minimize block classification error.

Keywords— Compound image, Neural Network, Block Classification, Segmentation

I. INTRODUCTION
A picture can say more than a thousand words; unfortunately, storing an image can cost more than a million words. This is not always a problem, because many of today's computers can handle large amounts of data, but the available resources often need to be used more efficiently: digital cameras frequently have an unsatisfactory amount of memory, and internet connections can be slow. Images captured with digital cameras or generated on personal computers are routinely transmitted over the internet, and an increasing share of them are compound images. Sending compound images uncompressed consumes considerable space and transmission time, so compound image compression is needed, and this requires rethinking the approach to compression. In this paper, a block-based segmentation approach is adopted because it gives better results. In an object-based approach, complexity is the main drawback, since image segmentation may require very sophisticated segmentation algorithms. In layer-based segmentation, the main drawbacks are the mismatch between the compression method and the data types, and an intrinsic redundancy caused by the same parts of the original image appearing in several layers. Block-based segmentation, in contrast, offers a better match between region boundaries and the compression algorithms, and avoids such redundancy. The proposed block classification algorithm has low computational complexity, which makes it well suited to real-time applications.

II. SEGMENTATION
One compound image compression scheme for real-time computer screen image transmission follows the first pass of a two-pass segmentation procedure and classifies image blocks into picture and text/graphics blocks by thresholding the number of colors in each block. Basic shape primitives are extracted from the text/graphics blocks and are losslessly coded using a combined shape-based and palette-based coding algorithm, while pictorial blocks are coded with lossy JPEG [1, 2].
Numerous coding algorithms are therefore needed in that method, and the basic shape primitives it defines are not adequate for text of different sizes. In this paper, the proposed algorithm first classifies 8 × 8 non-overlapping blocks of pixels into two classes, text/graphics and picture/background, based on a statistical feature computed from the detail sub-band coefficients of each 8 × 8 DWT-transformed image block [3, 6]. Each class is then compressed using an algorithm specifically designed for that class. The proposed one-pass block classification simplifies segmentation by separating the image into two classes of pixels and also minimizes misclassification error irrespective of font color, style, orientation and background complexity.

III. WAVELET TRANSFORM
Unlike the Fourier transform, whose basis functions are sinusoids, wavelet transforms are based on small waves, called wavelets, of varying frequency and limited duration. In 1987, wavelets were first shown to be the foundation of a powerful new approach to signal processing and analysis called multi-resolution theory. Multi-resolution theory incorporates and unifies techniques from a variety of disciplines, including sub-band coding from signal processing, quadrature mirror filtering from digital speech recognition, and pyramidal image processing [4, 5]. Another important imaging technique with ties to multi-resolution analysis is sub-band coding, in which an image is decomposed into a set of band-limited components, called sub-bands, that can be reassembled to reconstruct the original image without error.
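To make the per-block statistical feature described above concrete, the following MATLAB sketch computes a single-level 2-D DWT of each non-overlapping 8×8 block and takes the standard deviation of the detail sub-band coefficients as the classifier input. The choice of the Haar wavelet, the standard-deviation statistic, the variable names and the file name 'ch1.jpg' are illustrative assumptions, not specifics fixed by the paper.

% Sketch: per-block DWT detail-coefficient feature (Haar wavelet assumed).
% Requires the Wavelet Toolbox for dwt2.
img = double(rgb2gray(imread('ch1.jpg')));            % compound screen image (file name illustrative)
nBlkRows = floor(size(img,1)/8);
nBlkCols = floor(size(img,2)/8);
featureMap = zeros(nBlkRows, nBlkCols);
for r = 1:nBlkRows
    for c = 1:nBlkCols
        blk = img((r-1)*8+1:r*8, (c-1)*8+1:c*8);      % 8x8 non-overlapping block
        [cA, cH, cV, cD] = dwt2(blk, 'haar');         % approximation and detail sub-bands
        featureMap(r,c) = std([cH(:); cV(:); cD(:)]); % statistic over detail coefficients
    end
end
% featureMap(r,c) is the input feature for block (r,c); text/graphics blocks tend
% to show larger detail-band activity than smooth background blocks.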
IV. NEURAL NETWORKS
A. Network Design
The workflow for the neural network design process has seven steps: 1. Collect data 2. Create the network 3. Configure the network 4. Initialize the weights and biases 5. Train the network 6. Validate the network 7. Use the network. After a neural network has been created, it needs to be configured and then trained. Configuration involves arranging the network so that it is compatible with the problem to be solved, as defined by sample data. After the network has been configured, the adjustable network parameters (weights and biases) need to be tuned so that the network performance is optimized. This tuning process is referred to as training the network. Both configuration and training require that the network be provided with example data.

B. Neuron Model
The fundamental building block for neural networks is the single-input neuron, such as the one in Fig.1.

Fig.1 Simple neuron

First, the scalar input p is multiplied by the scalar weight w to form the product wp, again a scalar. Second, the weighted input wp is added to the scalar bias b to form the net input n. (In this case, the bias can be viewed as shifting the function f to the left by an amount b; the bias is much like a weight, except that it has a constant input of 1.) Finally, the net input is passed through the transfer function f, which produces the scalar output a. The names given to these three processes are the weight function, the net input function and the transfer function. For many types of neural networks, the weight function is the product of a weight and the input, but other weight functions (e.g., the distance between the weight and the input, |w − p|) are sometimes used. The most common net input function is the summation of the weighted inputs with the bias, but other operations, such as multiplication, can be used. Note that w and b are both adjustable scalar parameters of the neuron.

C. Neuron with Vector Input
The simple neuron can be extended to handle inputs that are vectors. A neuron with a single R-element input vector is shown in Fig.2. Here the individual input elements p1, p2, …, pR are multiplied by weights w1,1, w1,2, …, w1,R and the weighted values are fed to the summing junction. Their sum is simply Wp, the dot product of the (single-row) matrix W and the vector p.

Fig.2 Neuron with vector input

The neuron has a bias b, which is summed with the weighted inputs to form the net input n. (In addition to summation, other net input functions can be used.) The net input n is the argument of the transfer function f.
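A minimal MATLAB sketch of the vector-input neuron just described: the net input is the dot product Wp plus the bias b, and the output is a = f(n). The numeric values and the log-sigmoid transfer function are assumptions chosen purely for illustration.

% Single neuron with an R-element input vector (illustrative values).
p = [2; -1; 3];              % R x 1 input vector (R = 3)
W = [0.5 0.2 -0.1];          % 1 x R single-row weight matrix
b = 0.3;                     % scalar bias
n = W*p + b;                 % net input: dot product Wp plus bias
a = 1/(1 + exp(-n));         % scalar output through the transfer function f (log-sigmoid assumed)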
D. Abbreviated Notation
When you consider networks with many neurons, and perhaps layers of many neurons, there is so much detail that the main ideas tend to be lost. Thus, an abbreviated notation for an individual neuron has been devised. This notation, which is used later for circuits of multiple neurons, is shown in Fig.3.

Fig.3 Abbreviated notation of neuron

The input vector p is represented by the solid dark vertical bar at the left. The dimensions of p are shown below the symbol p in the figure as R × 1. (A capital letter, such as R in the previous sentence, is used when referring to the size of a vector.) Thus, p is a vector of R input elements. These inputs post-multiply the single-row, R-column matrix W. As before, a constant 1 enters the neuron as an input and is multiplied by the scalar bias b. The net input to the transfer function f is n, the sum of the bias b and the product Wp. This sum is passed to the transfer function f to get the neuron's output a, which in this case is a scalar. Note that if there were more than one neuron, the network output would be a vector. A layer of a network is defined in the previous figure. A layer includes the weights, the multiplication and summing operations (here realized as a vector product Wp), the bias b, and the transfer function f. The array of inputs, vector p, is not included in, or called, a layer. As with the simple neuron, three operations take place in the layer: the weight function (matrix multiplication, or dot product, in this case), the net input function (summation, in this case), and the transfer function. Each time this abbreviated network notation is used, the sizes of the matrices are shown just below their matrix variable names. This notation makes it possible to understand the architectures and follow the matrix mathematics associated with them. When a specific transfer function is to be used in a figure, the symbol for that transfer function replaces the f shown above; some examples are given in Fig.4.

Fig.4 Different types of layer transfer functions

E. One Layer of Neurons
A one-layer network with R input elements and S neurons follows.

Fig.5 One-layer neural network

In this network, each element of the input vector p is connected to each neuron input through the weight matrix W. The ith neuron has a summer that gathers its weighted inputs and bias to form its own scalar output n(i). The various n(i) taken together form an S-element net input vector n. Finally, the neuron layer outputs form a column vector a. The expression for a is shown at the bottom of the figure. Note that it is common for the number of inputs to a layer to be different from the number of neurons (i.e., R is not necessarily equal to S); a layer is not constrained to have as many inputs as neurons. The input vector elements enter the network through the weight matrix W. The row indices on the elements of matrix W indicate the destination neuron of the weight, and the column indices indicate which source is the input for that weight. Thus, the indices in w1,2 say that the strength of the signal from the second input element to the first (and only) neuron is w1,2.
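Following the same notation, the sketch below computes the output of a one-layer network with R inputs and S neurons, a = f(Wp + b), where W is S × R and b is S × 1. The sizes, random values and tan-sigmoid transfer function are illustrative assumptions, not the paper's configuration.

% One layer of S neurons driven by an R-element input (illustrative sizes).
R = 4; S = 3;
p = rand(R, 1);              % R x 1 input vector
W = randn(S, R);             % S x R weight matrix; row i holds the weights of neuron i
b = randn(S, 1);             % S x 1 bias vector
n = W*p + b;                 % S x 1 net input vector, element n(i) for neuron i
a = 2./(1 + exp(-2*n)) - 1;  % S x 1 layer output (tan-sigmoid transfer assumed)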
Fig.6 R-input one-layer network

F. Inputs and Layers
To describe networks having multiple layers, the notation must be extended. Specifically, it must distinguish between weight matrices that are connected to inputs and weight matrices that are connected between layers, and it must identify the source and destination of each weight matrix. Weight matrices connected to inputs are called input weights; weight matrices connected to layer outputs are called layer weights. Further, superscripts are used to identify the source (second index) and the destination (first index) for the various weights and other elements of the network.

Fig.7 One layer multiple input network

G. Multiple Layers of Neurons
A network can have several layers. Each layer has a weight matrix W, a bias vector b, and an output vector a. To distinguish between the weight matrices, output vectors, etc., for each of these layers in the figures, the number of the layer is appended as a superscript to the variable of interest. This layer notation is used in the three-layer network shown next, and in the equations at the bottom of the figure.

Fig.8 Three layer neural network

The network shown above has R1 inputs, S1 neurons in the first layer, S2 neurons in the second layer, and so on. It is common for different layers to have different numbers of neurons. A constant input 1 is fed to the bias of each neuron. Note that the outputs of each intermediate layer are the inputs to the following layer. Thus layer 2 can be analyzed as a one-layer network with S1 inputs, S2 neurons, and an S2 × S1 weight matrix W2. The input to layer 2 is a1; the output is a2. Now that all the vectors and matrices of layer 2 have been identified, it can be treated as a single-layer network on its own; this approach can be taken with any layer of the network. The layers of a multilayer network play different roles. A layer that produces the network output is called an output layer; all other layers are called hidden layers. The three-layer network shown earlier has one output layer (layer 3) and two hidden layers (layers 1 and 2). Some refer to the inputs as a fourth layer, but the MATLAB toolbox does not use that designation. The architecture of a multilayer network with a single input vector can be specified with the notation R − S1 − S2 − ... − SM, where the number of elements of the input vector and the number of neurons in each layer are specified.
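To make the multilayer notation concrete, here is a hedged sketch of the forward pass of a three-layer network, a3 = f3(W3 a2 + b3) with a2 = f2(W2 a1 + b2) and a1 = f1(W1 p + b1). The layer sizes and transfer functions (sigmoid hidden layers, linear output) are assumptions for illustration.

% Forward pass of a three-layer feedforward network (illustrative sizes).
R = 8; S1 = 10; S2 = 6; S3 = 3;            % R - S1 - S2 - S3 architecture
p  = rand(R, 1);                           % input vector
W1 = randn(S1, R);  b1 = randn(S1, 1);     % input weights and biases (layer 1)
W2 = randn(S2, S1); b2 = randn(S2, 1);     % layer weights, layer 1 -> layer 2
W3 = randn(S3, S2); b3 = randn(S3, 1);     % layer weights, layer 2 -> layer 3
sig = @(n) 1./(1 + exp(-n));               % log-sigmoid (assumed for hidden layers)
a1 = sig(W1*p  + b1);                      % output of hidden layer 1
a2 = sig(W2*a1 + b2);                      % output of hidden layer 2
a3 = W3*a2 + b3;                           % linear output layer: the network output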
Fig.9 Abbreviated notation of three-layer network

Multiple-layer networks are quite powerful. For instance, a network of two layers, where the first layer is sigmoid and the second layer is linear, can be trained to approximate any function (with a finite number of discontinuities) arbitrarily well.

V. IMPLEMENTATION AND TESTING
The proposed neural network is implemented on an Intel Atom dual-core processor using MATLAB 7.12, and several compound images of various sizes were used to demonstrate the performance of the proposed system. The classified 8×8 blocks of the computer screen image are evaluated using precision rate and recall rate. The recall rate is defined as the ratio of correctly detected background/text/graphics blocks to the sum of correctly detected background/text/graphics blocks plus false negatives; false negatives are those blocks in the image which actually contain text characters but have not been detected by the algorithm. The precision rate is defined as the ratio of correctly detected background/text/graphics blocks to the sum of correctly detected blocks plus false positives; false positives are those blocks in the image which do not actually contain text characters but have been detected by the algorithm as text blocks.

Table.1 Precision and Recall rates for the neural network based block classification algorithm (Mode: T = train, S = simulate)
Image     Mode    Background        Text              Graphics
                  PR       RR       PR       RR       PR       RR
Ch1.jpg   T       100      100      100      100      100      100
Ch1.jpg   S       100      100      100      100      100      100
Ch2.jpg   S       100      100      100      100      100      100
Ch3.jpg   S       100      100      100      80       97.436   100
Ch3.jpg   T       100      100      100      100      100      100
Ch3.jpg   S       100      100      100      100      100      100
Ch4.jpg   S       100      100      100      100      100      100
Ch5.jpg   S       100      100      100      100      100      100
Ch6.jpg   S       100      100      100      93.75    96       100
Ch6.jpg   T       100      100      100      100      100      100
Ch6.jpg   S       100      100      100      100      100      100

A. Functional testing
Functional tests at the system level are used to ensure that the behavior of the system adheres to the requirements specification. Functional tests are black-box in nature; the focus is on the inputs and the proper outputs for each function. Improper and illegal inputs must also be handled by the system, and system behavior under such circumstances must be observed. All functions must be tested.

B. Performance testing
Performance testing is used to determine the speed or effectiveness of a computer, network, software program or device. The training time and the number of iterations during training are used to evaluate the performance of a neural network.

Table.2 Performance of neural networks MS_Net1 and MS_Net2 during training
Image     Iterations (MS_Net1)   Iterations (MS_Net2)   Training Time
Ch1.jpg   68                     16                      32.687927
Ch3.jpg   25                     9                       16.095989
Ch6.jpg   9                      13                      14.910167
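The precision and recall rates reported in Table 1 can be computed from per-class block counts as in the sketch below; the counts shown are placeholders for illustration, not measurements from the paper.

% Precision and recall from per-class block counts (placeholder values).
truePositives  = 96;   % blocks of the class detected correctly
falsePositives = 4;    % blocks wrongly labelled as this class
falseNegatives = 0;    % blocks of this class missed by the algorithm
precisionRate = 100 * truePositives / (truePositives + falsePositives);
recallRate    = 100 * truePositives / (truePositives + falseNegatives);
fprintf('PR = %.3f%%  RR = %.3f%%\n', precisionRate, recallRate);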
VI. CONCLUSION AND FUTURE ENHANCEMENT
A. Experimental Results
The proposed neural network is implemented on an Intel Atom dual-core processor using MATLAB 7.12, and several compound images of various sizes were used to demonstrate the performance of the proposed system. First, it was found that the system worked with 100% accuracy in training mode. For a similar compound image, the system achieved over 95% accuracy in simulation mode. Furthermore, any inaccurate results can be corrected via further training, after which nearly 100% accuracy can be expected in simulation mode.

B. Future Enhancement
For the proposed neural network based algorithm, accuracy for a similar compound image was over 95% in simulation mode. Future work is needed to achieve the same level of accuracy for non-similar compound images as well; this could be achieved by adding more hidden layers to the neural network, as sketched below.

Table.3 Images used for testing Precision and Recall rates: ch1.jpg, ch2.jpg, ch3.jpg, ch4.jpg, ch5.jpg, ch6.jpg
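As a hedged illustration of the enhancement suggested above, a network with additional hidden layers can be created and trained in MATLAB's Neural Network Toolbox by passing a longer hidden-layer size vector to feedforwardnet. The layer sizes, feature dimension, data and training settings below are placeholders, not the paper's actual configuration.

% Sketch: a classifier with additional hidden layers (placeholder sizes and data).
% X holds one feature vector per 8x8 block; T holds one-hot class targets
% (background / text / graphics). Both are random stand-ins here.
X = rand(3, 500);                       % e.g., 3 DWT-based features per block
T = full(ind2vec(randi(3, 1, 500)));    % 3 x 500 one-hot target matrix
net = feedforwardnet([20 10]);          % two hidden layers instead of one
net.trainParam.epochs = 100;            % cap on training iterations
net = train(net, X, T);                 % train with the toolbox default algorithm
Y = net(X);                             % simulate: per-class scores for each block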
Fig.10 Neural network 1 being trained
Fig.11 Neural network 2 being trained
Fig.12 Segmented grey-scale image of 'ch1.jpg'
Fig.13 Blocks classified as Background
Fig.14 Blocks classified as Text
Fig.15 Blocks classified as Graphics
Fig.16 Runtime for project in train mode using 'ch1.jpg'

REFERENCES
[1] Florinabel D. J., Juliet S. E. and Sadasivam V., 'Efficient Coding of Computer Screen Images with Precise Block Classification using Wavelet Transform', vol. 91, May 2010.
[2] Gonzalez R. C., Woods R. E. and Eddins S. L., 'Digital Image Processing using MATLAB', Prentice Hall, Upper Saddle River, NJ, 2004.
[3] Keslassy I., Kalman M., Wang D. and Girod B., 'Classification of Compound Images based on Transform Coefficient Likelihood', Proceedings of the International Conference on Image Processing, vol. 1, October 2001.
[4] Mallat S., 'A Wavelet Tour of Signal Processing', Second Edition, Academic Press, 1999.
[5] Said A. and Drukarev A., 'Simplified segmentation for compound image compression', Proceedings of ICIP, 2009, pp. 229-233.
[6] Cheng H. and Bouman C. A., 'Multiscale Bayesian segmentation using a trainable context model', IEEE Trans. Image Processing, vol. 10, pp. 511-525, April 2001.
