USING THE KERAS LIBRARY FOR NEURAL NETWORK-BASED IMAGE RECOGNITION
10.11.2021 22:12
[1. Information Systems and Technologies]
Автор: Turchyk Ye.L., student, Department of Modeling and Software, Kryvyi Rih National University;
Puzino M.V., student, Department of Modeling and Software, Kryvyi Rih National University;
Rybalchenko O.H., senior lecturer, Department of Modeling and Software, Kryvyi Rih National University
Image recognition is the process of assigning an object, described by a fixed set of features, to one of the classes of a given subject domain. Object recognition is widely used in many areas of human activity, such as technical diagnostics, medical diagnostics, biometrics, security systems, word processing, bioinformatics, forecasting and robotics. One of the modern, progressive approaches to solving the object recognition task is the use of neural networks.
Artificial neural networks are mathematical models of the biological nerve cell networks of a living organism. The main element of an artificial neural network is the neuron. Interconnected neurons form layers, whose number varies depending on how complex the network is and what tasks it solves. The advantage of neural networks over traditional algorithms is that they can be trained [1].
An artificial neural network with several hidden layers is called a deep neural network (DNN). Such networks can model complex nonlinear relationships between elements. In the process of DNN training, the resulting model tries to represent an object as a combination of simple primitives. Additional layers allow abstractions of ever-higher levels to be built, which in turn makes it possible to create models for recognizing complex real-world objects. Most often, DNNs are built as feedforward networks.
Convolutional neural networks (CNNs) are well suited for image classification. Their design is based on a special architecture inspired by data obtained in physiological experiments on the visual cortex. CNNs are usually built as a variety of multilayer perceptron designed to minimize pre-processing of the input data. The convolutional architecture reduces the high computational cost of training and running an application with a large number of artificial neuron layers.
This type of neural network training is an example of supervised learning. In this scheme, the system learns to recognize images using various adaptive features, and the correct class of each training image is known in advance.
The CNN topology proposed by Yann LeCun [2] consists of alternating convolutional layers, subsampling (pooling) layers and fully connected layers at the output. This architecture is based on three main paradigms: local perception, shared weights, and subsampling. An image with three color channels (RGB) is fed to the CNN input, after which several cascaded convolutional and subsampling layers are applied. The output layer contains the probabilities of the image belonging to a particular class.
Let us consider in detail how to implement such a CNN using the Keras library for the Python 3 programming language.
The Keras library is a deep learning tool for neural networks. It is an open-source neural network library that works on top of popular frameworks such as TensorFlow and Theano, and its models are also supported by Deeplearning4j. It is aimed at working quickly with deep learning networks while being compact, modular and extensible. The basic building block of Keras is the model. The library provides two ways to compose models: sequential composition and functional composition.
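For illustration, a minimal sketch of the two composition styles is given below; the layer sizes here are arbitrary and serve only as an example.

from keras.models import Sequential, Model
from keras.layers import Dense, Input

# Sequential composition: a linear stack of layers
seq_model = Sequential()
seq_model.add(Dense(64, activation='relu', input_shape=(100,)))
seq_model.add(Dense(10, activation='softmax'))

# Functional composition: layers are applied as functions to tensors
inputs = Input(shape=(100,))
x = Dense(64, activation='relu')(inputs)
outputs = Dense(10, activation='softmax')(x)
func_model = Model(inputs=inputs, outputs=outputs)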
Let’s define the CNN implementation algorithm using the Keras library and Python.
1. Loading the CIFAR-10 dataset
((X_train, y_train), (X_test, y_test) = cifar10.load_data()).
The CIFAR-10 dataset consists of 60,000 color images of size 32 by 32 pixels with 3 color channels, divided into 10 classes. This dataset is often used to check the efficiency of machine learning algorithms.
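A minimal sketch of this loading step (assuming the standalone keras package, which ships the CIFAR-10 loader):

from keras.datasets import cifar10

# Load CIFAR-10: 50,000 training and 10,000 test images of shape (32, 32, 3)
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
print(X_train.shape)   # (50000, 32, 32, 3)
print(y_train.shape)   # (50000, 1), integer class labels 0..9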
2. Data processing. Let us convert the class labels to categorical (one-hot) form, convert the pixel intensity values to floating-point format and normalize them.
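A possible implementation of this step (a sketch; the variable names continue those of the loading step):

from keras.utils import np_utils

NB_CLASSES = 10  # number of CIFAR-10 classes

# Convert integer class labels to one-hot (categorical) form
Y_train = np_utils.to_categorical(y_train, NB_CLASSES)
Y_test = np_utils.to_categorical(y_test, NB_CLASSES)

# Convert pixel intensities to floating point and normalize to [0, 1]
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0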
3. Creating a network model (model=Sequential()).
We will use sequential composition to build the network model, which is a linear pipeline (stack) of neural network layers.
The first two convolutional layers (Convolution2D) have 32 convolution filters of size 3x3 each. We use ReLU as the activation function, which introduces nonlinearity into the model. They are followed by a subsampling layer (MaxPooling2D) with a 2x2 block size and a dropout regularization rate of 25% (Dropout(0.25)).
The second stage consists of two convolutional layers (Convolution2D) with 64 convolution filters of size 3x3 each and a subsampling layer (MaxPooling2D) with a 2x2 block size and a dropout rate of 25% (Dropout(0.25)). ReLU is again used as the activation function.
The data is then flattened (Flatten()) and passed to a fully connected layer (Dense()) of 512 neurons. The next, output layer (Dense()) consists of 10 neurons. For this layer we use the softmax activation function, which is a generalization of the sigmoid. Softmax maps a k-dimensional vector of arbitrary real numbers to a k-dimensional vector of real numbers in the interval [0, 1]. A dropout rate of 50% (Dropout(0.5)) is applied between these two layers.
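The network described above can be sketched as follows, using the Keras layer classes named in the text:

from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()

# First stage: two convolutional layers with 32 filters of size 3x3 and ReLU,
# followed by 2x2 max pooling and 25% dropout
model.add(Convolution2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(Convolution2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# Second stage: two convolutional layers with 64 filters of size 3x3 and ReLU,
# followed by 2x2 max pooling and 25% dropout
model.add(Convolution2D(64, (3, 3), activation='relu'))
model.add(Convolution2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# Fully connected classifier: 512 neurons, 50% dropout, 10-way softmax output
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))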
4. At the last stage, the resulting deep CNN must be compiled for execution by the backend library (Theano or TensorFlow).
The Keras library implements stochastic gradient descent (SGD) and two other optimization methods, RMSprop and Adam. It supports objective functions such as mean squared error, binary cross-entropy and categorical cross-entropy, and provides the following quality metrics: accuracy, i.e. the ratio of the number of correct predictions to the total number of labels; precision, i.e. the fraction of the model's positive predictions that are correct; and recall, i.e. the fraction of true events that the model detects.
When compiling the model, we use categorical cross-entropy (categorical_crossentropy) as the objective function. Optimization is performed with stochastic gradient descent, whose parameters are varied: SGD(lr = XXX, decay = 1e-6, momentum = 0.9, nesterov = True), where lr is the learning rate of the network. As the quality metric we use accuracy.
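A sketch of the compilation step; the specific learning rate value below is only a placeholder, since the text leaves it as a tunable parameter (lr = XXX):

from keras.optimizers import SGD

# Stochastic gradient descent; lr = 0.01 is an illustrative placeholder value
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)

model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])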
5. Once the deep convolutional neural network model has been trained, it should be evaluated on the test set, which contains images that were not presented during training. This makes it possible to obtain the minimum value achieved by the objective function and the best value of the quality metric.
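Training and evaluation can be sketched as follows; the batch size and validation split are illustrative assumptions, while the 40 epochs correspond to Fig. 1:

BATCH_SIZE = 128   # assumed value, chosen for illustration
NB_EPOCH = 40      # number of training epochs, as in Fig. 1

# Train the model on the training set, holding out part of it for validation
model.fit(X_train, Y_train,
          batch_size=BATCH_SIZE,
          epochs=NB_EPOCH,
          validation_split=0.2,
          verbose=1)

# Evaluate on the previously unseen test set:
# returns the objective function value and the accuracy metric
score = model.evaluate(X_test, Y_test, verbose=1)
print('Test loss:', score[0])
print('Test accuracy:', score[1])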
Let us plot the accuracy of the network as a function of the number of training epochs (Fig. 1).
Fig. 1. Accuracy of the CNN as a function of the number of training epochs (40 epochs)
The graph shows that the network achieves an accuracy of 78.4% on the test set after 40 training epochs.
References:
1. C. D. Manning, "Computational Linguistics and Deep Learning", Computational Linguistics, vol. 41, 2015.
2. Y. LeCun, Y. Bengio, "Convolutional Networks for Images, Speech, and Time-Series", The Handbook of Brain Theory and Neural Networks, vol. 3361, 1995.
3. A. Gulli, S. Pal, The Keras Library: A Deep Learning Tool. Implementation of Neural Networks Using the Theano and TensorFlow Libraries, translated from English by A. A. Slinkin. Moscow: DMK Press, 2018. 294 p.