Coal Classification Method Based on Improved Local Receptive Field-Based Extreme Learning Machine Algorithm and Visible–Infrared Spectroscopy

If the type of coal cannot be accurately determined during use, production efficiency, environmental pollution, and economic loss are all significantly affected. At present, traditional coal classification relies mainly on technicians' experience, which requires a great deal of manpower and time and is difficult to automate. This paper studies the application of visible–infrared spectroscopy and machine learning to coal identification and analysis, with the aim of guiding coal mining and production, and explores a fast and high-precision method for coal identification. Because spectral data are high-dimensional, strongly correlated, and highly redundant, the local receptive field (LRF) is used to extract high-level features of the spectral data and is combined with the extreme learning machine (ELM). We also improve the coyote optimization algorithm (COA); the improved coyote optimization algorithm (I-COA) is used to optimize the structure and training parameters of the local receptive field-based extreme learning machine (ELM-LRF). The experimental results show that the coal classification model based on this network and visible–infrared spectroscopy can effectively identify coal types from spectral data. Compared with convolutional neural networks (CNN) and principal component analysis (PCA), LRF extracts the spectral characteristics of coal more effectively.


INTRODUCTION
Coal is the main source of energy in the world. With the development of society and industry, coal quality analysis plays a decisive role in production efficiency and environmental pollution. In coal mining and production, coal classification provides important guidance for production planning and resource estimation. However, in current mining practice, traditional coal classification relies mainly on manual experience. This requires a great deal of manpower and material resources and is difficult to automate. Therefore, how to distinguish different types of coal quickly, accurately, and in real time is an important research problem in the mining and use of coal.
According to the degree of coalification, coal is mainly divided into three categories: anthracite, bituminous coal, and lignite. Different coal types are used in different ways on different occasions. For example, the type and quality of coal used in thermal power plants is an important basis for boiler design and the production process. Therefore, mining separately by coal type is the conventional production method in coal mining. There are two common traditional coal classification methods. The first is manual classification, which is relatively fast but has low accuracy and is difficult to automate. The second is chemical analysis, which has high recognition accuracy but suffers from high cost and long detection time. Determining the type of coal accurately and efficiently is of great research significance for reducing classification costs and improving classification efficiency. Therefore, how to determine the type of coal quickly and accurately is an urgent problem for modern mineral processing technology.
In previous research on coal classification, Freidina 1 proposed the principle of the faceting method and the mass method and introduced a classification method for fossil coal from the perspective of the useful quality of coal. Dong, Li, et al. 2 classified iron ore based on the BP algorithm and iron ore spectra and achieved good results. Wang 3 adopted the gray fixed weight clustering evaluation model and calculated the objective weight of each index using the projection pursuit method to obtain the clustering coefficient of coal samples belonging to each gray class, namely, anthracite, bituminous coal, and lignite, and thus solve the problem of coal type identification. Wang 4 proposed a rapid coal classification method based on near-infrared spectroscopy. Wang 5 first used the discriminant analysis function of the statistical software SPSS to screen out the main indicators for discriminating coal types and then used these variables to establish a Bayesian stepwise linear discriminant function to identify modeling samples and test samples. Pandit 6 proposed using a classic unsupervised clustering technique, K-means clustering, and a nonlinear clustering technique based on artificial intelligence (AI), the self-organizing map (SOM), for coal classification. Carlsen 7 used fast pyrolysis gas chromatography/mass spectrometry combined with statistical methods such as principal component analysis (PCA) and hierarchical cluster analysis (HCA) to classify coal.
Compared with chemical methods, spectroscopy offers fast detection, low cost, and convenience. Therefore, spectral analysis technology has been used in many fields such as rock identification, composition analysis, and food qualification. Since the factors that affect the spectrum of coal are mainly the oxides of carbon, sulfur, iron, etc., the spectral data generally contain a lot of information that is irrelevant to coal type identification, which gives the spectral data of coal high dimensionality and large redundancy. Therefore, effective feature extraction must be performed before the spectrum is analyzed.
In recent years, machine learning and big data analysis have attracted great attention from researchers in different disciplines. To our knowledge, the success of machine learning depends on three key factors: a powerful computing environment, rich dynamic data, and efficient learning algorithms. More importantly, in big data applications, higher requirements are placed on efficient machine learning algorithms.
Most traditional neural network training methods, such as the BP algorithm, 8 involve a large number of gradient descent search steps and suffer from slow convergence, local minima, and heavy human intervention. The extreme learning machine (ELM) aims to overcome these shortcomings and limitations of traditional learning theories and techniques. ELM provides an efficient and unified learning framework for "generalized" single hidden layer feedforward neural networks (SLFN), including but not limited to sigmoid networks, RBF networks, threshold networks, trigonometric networks, 9 fuzzy inference systems, fully complex neural networks, 10 high-order networks, ridge polynomial networks, wavelet networks, and Fourier series. 11 It provides competitive accuracy with extremely high efficiency in many different applications, including biomedical analysis, 12,13 chemical processes, 14 system modeling, 15,16 power systems, 17 motion recognition, 18 and hyperspectral imaging. 19
In almost all ELM implementations of the past few years, hidden nodes are fully connected to input nodes. Fully connected ELM has good generalization performance in many applications and achieves high efficiency. However, in applications such as spectral analysis and image processing, there may be strong local correlations, so it is reasonable to expect the corresponding neural network to have local connections instead of full connections in order to learn these local correlations. It is well known that local receptive fields exist in the retina module of biological learning systems, where they help to capture the local correlation of the input image. The ELM theory 9,10,20 proves that hidden nodes can be randomly generated according to any continuous probability distribution. Naturally, in some applications, hidden nodes can be generated according to continuous probability distributions that are denser around certain input nodes and sparser farther away. Therefore, the ELM theory is also valid for local receptive fields.
This paper explores a new coal classification model based on spectral technology and deep learning. In view of the high dimensionality, strong correlation, and high redundancy of spectral data, this paper combines the local receptive field with ELM to solve the coal classification problem. To further improve the classification accuracy, the coyote optimization algorithm is studied and an improved coyote algorithm is proposed, which is used to optimize the structure and training parameters of the model. Finally, different experiments verify the effectiveness of the proposed method.
Figure 1. Although ELM-LRF supports a wide range of local receptive fields as long as they are nonlinear piecewise continuous, a random convolution node can be considered a special combined node of ELM-LRF.

EXPERIMENTAL SECTION
2.1. ELM-LRF Algorithm. Although different types of local receptive fields and combined nodes can be used in ELM, for ease of implementation, we use a simple step probability function as the sampling distribution and a square/square-root pooling structure to form the combined node. According to the ELM-LRF theory, 21 the receptive field of each hidden node consists of the input nodes within a predetermined distance from its center. In addition, sharing the input weights among different hidden nodes directly yields a convolution operation, which is easy to implement.
In this way, we construct a specific case of the general ELM-LRF (Figure 1):
1) The random convolution node in the Feature Map in Figure 1 is a locally connected hidden node.
2) The node in the Pooling Map in Figure 1 is an example of a combined node.
In order to obtain a complete representation of the input, K different input weights are used to generate K different feature maps. Figure 2 describes the implementation network used by all K maps in this paper.
In Figure 2, the hidden layer consists of random convolution nodes. The input weights are shared within the same feature map and differ among feature maps. The input weights are randomly generated and then orthogonalized as follows:
1) Randomly generate the initial weight matrix $\hat{A}^{init}$. With input size d × d and receptive field r × r, the feature map has size (d − r + 1) × (d − r + 1).
2) Orthogonalize the initial weight matrix $\hat{A}^{init}$ using singular value decomposition (SVD). The columns of $\hat{A}$, denoted $\hat{a}_k$, form an orthonormal basis of $\hat{A}^{init}$.
Orthogonalization allows the network to extract a more complete feature set than non-orthogonal features, thereby further improving the generalization performance of the network.
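The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the Gaussian initialization, the function name `random_orthogonal_weights`, and the transpose fallback for the case where r² is smaller than K are assumptions made for the example.

```python
import numpy as np

def random_orthogonal_weights(r, K, seed=0):
    """Draw an (r*r, K) random matrix and orthonormalize it with SVD.

    Assumption: entries are drawn from a standard normal distribution.
    """
    rng = np.random.default_rng(seed)
    a_init = rng.standard_normal((r * r, K))
    if r * r >= K:
        # Columns of u are orthonormal and span the column space of a_init.
        u, _, _ = np.linalg.svd(a_init, full_matrices=False)
        return u  # shape (r*r, K)
    # Fewer rows than columns: orthonormalize the transpose, then transpose back
    # (an assumed fallback; in this case the rows are orthonormal instead).
    u, _, _ = np.linalg.svd(a_init.T, full_matrices=False)
    return u.T  # shape (r*r, K)

# Example: K = 2 feature maps with a 4 x 4 receptive field.
A = random_orthogonal_weights(r=4, K=2)
```

Each column of `A` can then be reshaped to r × r and used as the shared input weight of one feature map.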
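The input weight to the kth feature map is $a_k \in \mathbb{R}^{r \times r}$, which corresponds column-wise to $\hat{a}_k \in \mathbb{R}^{r^2}$. The convolution node (i, j) of the kth feature map is calculated as

$$c_{i,j,k}(x) = \sum_{m=1}^{r} \sum_{n=1}^{r} x_{i+m-1,\,j+n-1} \cdot a_{m,n,k}, \quad i, j = 1, \ldots, (d - r + 1) \qquad (3)$$

A square/square-root pooling structure is used to form the combined node. The pooling size e is the distance between the center and the edge of the pooling area, as shown in Figure 2. The pooling map has the same size as the feature map, (d − r + 1) × (d − r + 1). Let $c_{i,j,k}$ and $h_{p,q,k}$ denote node (i, j) in the kth feature map and combined node (p, q) in the kth pooling map, respectively:

$$h_{p,q,k} = \sqrt{\sum_{i=p-e}^{p+e} \sum_{j=q-e}^{q+e} c_{i,j,k}^{2}}, \quad p, q = 1, \ldots, (d - r + 1) \qquad (4)$$

If (i, j) is out of bounds, $c_{i,j,k} = 0$.

For an input sample x, the value $h_{p,q,k}$ of the combined node is obtained by cascading eqs 3 and 4:

$$h_{p,q,k}(x) = \sqrt{\sum_{i=p-e}^{p+e} \sum_{j=q-e}^{q+e} \left( \sum_{m=1}^{r} \sum_{n=1}^{r} x_{i+m-1,\,j+n-1} \cdot a_{m,n,k} \right)^{2}}, \quad p, q = 1, \ldots, (d - r + 1)$$

If (i, j) is out of bounds, the inner sum is taken as 0.

Figure 2. Implementation of the ELM-LRF network with K feature maps.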
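The convolution and square/square-root pooling steps described above can be sketched as follows. This is an illustrative NumPy version written with explicit loops for readability; the function names and the 5 × 5 toy input are assumptions, and indices are zero-based in the code.

```python
import numpy as np

def feature_map(x, a):
    """Valid convolution: c[i, j] = sum over m, n of x[i + m, j + n] * a[m, n]."""
    d, r = x.shape[0], a.shape[0]
    size = d - r + 1
    c = np.empty((size, size))
    for i in range(size):
        for j in range(size):
            c[i, j] = np.sum(x[i:i + r, j:j + r] * a)
    return c

def sqrt_pool(c, e):
    """Square/square-root pooling; nodes outside the map contribute 0."""
    size = c.shape[0]
    padded = np.pad(c ** 2, e)  # zero padding of width e on every side
    h = np.empty_like(c)
    for p in range(size):
        for q in range(size):
            h[p, q] = np.sqrt(padded[p:p + 2 * e + 1, q:q + 2 * e + 1].sum())
    return h

# Toy example (hypothetical data): 5 x 5 input, 2 x 2 receptive field, e = 1.
x = np.arange(25.0).reshape(5, 5)
a = np.ones((2, 2))
c = feature_map(x, a)   # 4 x 4 feature map
h = sqrt_pool(c, e=1)   # 4 x 4 pooling map, same size as the feature map
```

Repeating this for K different input weights and flattening the K pooling maps gives the feature vector of one sample.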

ACS Omega
http://pubs.acs.org/journal/acsodf Article
Concatenating the values of all combined nodes into a row vector and stacking the rows for N input samples gives the combined layer matrix $H \in \mathbb{R}^{N \times K \cdot (d - r + 1)^2}$.
The coyote optimization algorithm (COA) is a population-based algorithm 22 inspired by the behavior of coyotes (Canis latrans), and it is classified as both a swarm intelligence and an evolutionary heuristic. 23,24 Unlike the grey wolf optimizer (GWO), 25 the COA has a different algorithmic structure and does not focus on the social hierarchy and dominance rules of these animals, even though an alpha is still used as the leader of each pack. In addition, the COA focuses on the social structure and exchange of experiences among coyotes, rather than on hunting prey as in GWO.
The social condition soc (set of decision variables) of the cth coyote of the pth pack at time instant t is written as

$$soc_c^{p,t} = \vec{x} = (x_1, x_2, \ldots, x_D)$$

The adaptation of the coyote to the environment (the cost of the objective function) is $fit_c^{p,t} \in \mathbb{R}$. The first step of the COA is to initialize the global coyote population. This is achieved by assigning random values within the search space to the cth coyote of the pth pack in the jth dimension, as follows:

$$soc_{c,j}^{p,t} = lb_j + r_j \cdot (ub_j - lb_j)$$

where $lb_j$ and $ub_j$ respectively denote the lower and upper bounds of the jth decision variable, D is the dimension of the search space, and $r_j$ is a real random number generated in the range [0, 1] with uniform probability. The adaptation of each coyote under its current social condition is then evaluated:

$$fit_c^{p,t} = f(soc_c^{p,t})$$

Initially, coyotes are randomly assigned to packs, but a coyote sometimes leaves its pack; in the original COA, 22 this occurs with probability

$$P_e = 0.005 \cdot N_c^2$$

where $N_c$ is the number of coyotes per pack. Considering a minimization problem, the alpha of the pth pack at time instant t is defined as

$$alpha^{p,t} = \left\{ soc_c^{p,t} \mid \arg\min_{c = 1, \ldots, N_c} f(soc_c^{p,t}) \right\}$$

The COA pools the information from all coyotes of a pack and computes it as the cultural tendency of the pack:

$$cult_j^{p,t} = \begin{cases} O^{p,t}_{\frac{N_c + 1}{2},\, j}, & N_c \text{ is odd} \\ \dfrac{O^{p,t}_{\frac{N_c}{2},\, j} + O^{p,t}_{\frac{N_c}{2} + 1,\, j}}{2}, & \text{otherwise} \end{cases}$$

where $O^{p,t}$ represents the ranked social conditions of all coyotes of the pth pack at time instant t for every j in the range [1, D]. In other words, the cultural tendency of the pack is computed as the median social condition of all coyotes in that pack.
Considering the two main biological events, birth and death, the COA tracks the age (in years) of each coyote, denoted $age_c^{p,t} \in \mathbb{N}$. The birth of a new coyote (pup) is written as a combination of the social conditions of two randomly selected parents plus an environmental influence:

$$pup_j^{p,t} = \begin{cases} soc_{r_1,j}^{p,t}, & rnd_j < P_s \text{ or } j = j_1 \\ soc_{r_2,j}^{p,t}, & rnd_j \ge P_s + P_a \text{ or } j = j_2 \\ R_j, & \text{otherwise} \end{cases}$$

where $r_1$ and $r_2$ are random coyotes from the pth pack, $j_1$ and $j_2$ are two random dimensions of the problem, $P_s$ is the scatter probability, $P_a$ is the association probability, $R_j$ is a random number within the bounds of the jth decision variable, and $rnd_j$ is a random number in [0, 1] generated with uniform probability. The scatter and association probabilities guide the cultural diversity of the coyotes in the pack. In the initial version of the COA, $P_s$ and $P_a$ are defined as

$$P_s = 1/D, \qquad P_a = (1 - P_s)/2$$

where $P_a$ has the same effect on both parents. In order to keep the population size stable, the COA synchronizes the birth and death of coyotes, as described in Algorithm 1, where ω and φ respectively denote the group of coyotes that are less adapted to the environment than the pup (that is, the solutions with a worse objective function cost) and the number of coyotes in this group. Note that two or more coyotes may have the same age (line 4 of Algorithm 1). In this case, the less adapted coyote dies.
In order to represent the cultural interaction within a pack, the COA assumes that the coyotes are subject to an alpha influence ($\delta_1$) and a pack influence ($\delta_2$). The first is the cultural difference between a random coyote of the pack ($cr_1$) and the alpha coyote, while the second is the cultural difference between a random coyote ($cr_2$) and the cultural tendency of the pack. The random coyotes are selected with uniform probability, and $\delta_1$ and $\delta_2$ are written as

$$\delta_1 = alpha^{p,t} - soc_{cr_1}^{p,t}, \qquad \delta_2 = cult^{p,t} - soc_{cr_2}^{p,t}$$

The new social condition of a coyote is then updated using the alpha and pack influences:

$$new\_soc_c^{p,t} = soc_c^{p,t} + r_1 \cdot \delta_1 + r_2 \cdot \delta_2$$

where $r_1$ and $r_2$ are the weights of the alpha and pack influences, respectively. Initially, $r_1$ and $r_2$ are defined as random numbers in the range [0, 1] generated with uniform probability. The new social condition is then evaluated:

$$new\_fit_c^{p,t} = f(new\_soc_c^{p,t})$$

The cognitive capacity of the coyote decides whether the new social condition is better than the old one and should be kept, which means

$$soc_c^{p,t+1} = \begin{cases} new\_soc_c^{p,t}, & new\_fit_c^{p,t} < fit_c^{p,t} \\ soc_c^{p,t}, & \text{otherwise} \end{cases}$$

Finally, the social condition of the coyote best adapted to the environment is selected as the global solution of the problem. The pseudocode of the COA is described in Algorithm 2, where $N_c$ can be set as a first guess in the range [5, 10], and $N_p$ can subsequently be adjusted to define the overall population size.
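The initialization, alpha selection, and cultural tendency steps can be illustrated with a short standard-library sketch. It is not the authors' code: the function names and the `sphere` objective are hypothetical, and the cultural tendency is taken as the per-dimension median of the pack's social conditions, which is how the original COA 22 defines it.

```python
import random
import statistics

def init_pack(n_coyotes, lb, ub, rng):
    """soc[c][j] = lb[j] + r_j * (ub[j] - lb[j]) with r_j drawn uniformly from [0, 1]."""
    return [[l + rng.random() * (u - l) for l, u in zip(lb, ub)]
            for _ in range(n_coyotes)]

def alpha_of(pack, cost):
    """For a minimization problem, the alpha is the coyote with the lowest cost."""
    return min(pack, key=cost)

def cultural_tendency(pack):
    """Per-dimension median of the social conditions of the pack."""
    return [statistics.median(column) for column in zip(*pack)]

def sphere(x):
    """Hypothetical test objective (not from the paper)."""
    return sum(v * v for v in x)

rng = random.Random(0)
pack = init_pack(5, lb=[-1.0, -1.0], ub=[1.0, 1.0], rng=rng)
alpha = alpha_of(pack, sphere)
cult = cultural_tendency(pack)
```

For an even number of coyotes, `statistics.median` averages the two middle values, which matches the two-case definition of the cultural tendency.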
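A minimal sketch of the pup generation rule follows. The probabilities P_s = 1/D and P_a = (1 − P_s)/2 are the values used in the original COA; 22 the function name and the requirement of at least two coyotes and two dimensions are assumptions of this example.

```python
import random

def make_pup(pack, lb, ub, rng):
    """Combine two random parents gene by gene, with random 'environmental' genes.

    Assumes len(pack) >= 2 and len(lb) >= 2.
    """
    D = len(lb)
    p_s = 1.0 / D              # scatter probability
    p_a = (1.0 - p_s) / 2.0    # association probability
    r1, r2 = rng.sample(range(len(pack)), 2)   # two distinct parents
    j1, j2 = rng.sample(range(D), 2)           # two distinct dimensions
    pup = []
    for j in range(D):
        rnd = rng.random()
        if rnd < p_s or j == j1:
            pup.append(pack[r1][j])            # gene from the first parent
        elif rnd >= p_s + p_a or j == j2:
            pup.append(pack[r2][j])            # gene from the second parent
        else:
            pup.append(lb[j] + rng.random() * (ub[j] - lb[j]))  # random gene R_j
    return pup

rng = random.Random(1)
lb, ub = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
pack = [[rng.uniform(l, u) for l, u in zip(lb, ub)] for _ in range(5)]
pup = make_pup(pack, lb, ub, rng)
```

Fixing dimensions j1 and j2 guarantees that each parent contributes at least one gene to the pup.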
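The social condition update and the greedy acceptance step can be sketched as below. The function name, the `sphere` objective, and the toy vectors are hypothetical; bound handling is omitted because the text does not specify it.

```python
import random

def update_coyote(soc, alpha, cult, soc_cr1, soc_cr2, cost, rng):
    """Move a coyote under the alpha and pack influences; keep the move only if it helps."""
    delta1 = [a - s for a, s in zip(alpha, soc_cr1)]   # alpha influence
    delta2 = [c - s for c, s in zip(cult, soc_cr2)]    # pack influence
    r1, r2 = rng.random(), rng.random()                # uniform weights in [0, 1]
    new_soc = [s + r1 * d1 + r2 * d2 for s, d1, d2 in zip(soc, delta1, delta2)]
    # Greedy acceptance: the new condition replaces the old one only if its cost is lower.
    return new_soc if cost(new_soc) < cost(soc) else soc

def sphere(x):
    """Hypothetical test objective (not from the paper)."""
    return sum(v * v for v in x)

rng = random.Random(0)
soc = [1.0, 1.0]
result = update_coyote(soc, alpha=[0.0, 0.0], cult=[0.5, 0.5],
                       soc_cr1=[1.0, 1.0], soc_cr2=[1.0, 1.0],
                       cost=sphere, rng=rng)
```

Because of the acceptance test, the cost of a coyote never increases from one iteration to the next.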
With the rapid development of intelligent algorithms, many optimization problems have become more complicated. Inspired by swarm intelligence theory, the coyote optimization algorithm (COA) has attracted the attention of researchers worldwide and has produced many research results. In most cases, the COA is well suited to regression optimization problems.
The performance of an optimization algorithm is usually measured by two indicators: local search capability and global convergence capability. Local search capability is the ability to approach the optimal solution arbitrarily closely, while global convergence capability is the ability to find the approximate location of the global optimum. Both are indispensable in optimization algorithms.
However, one problem with the COA is its limited global search capability. Although its mechanisms maintain population diversity and widen the search range, they also slow convergence. The basic COA is therefore strong in exploration but weak in exploitation.
In the birth and death step of the COA, the oldest individual in ω dies when φ > 1, but the algorithm gives no strategy for the case where several individuals in ω share the same oldest age.
In order to improve the global search ability of the COA, when φ > 1 a random factor γ is introduced: the γth individual is selected at random among the oldest individuals of ω to die.
Second, when the COA is used to optimize the ELM parameters, the way it expresses cultural interaction within the pack is problematic. The alpha influence $\delta_1 = alpha^{p,t} - soc_{cr_1}^{p,t}$ and the pack influence $\delta_2 = cult^{p,t} - soc_{cr_2}^{p,t}$ cannot reflect the influence of the group on the individual, so both are replaced by improved forms. In addition, the interaction within a pack is also partly affected by the environment. In order to increase the global search ability of the algorithm, an environmental term is added to the new social condition $new\_soc_c^{p,t} = soc_c^{p,t} + r_1 \cdot \delta_1 + r_2 \cdot \delta_2$ of the coyotes, where β is a fixed reference value and $r_3$ is a random value in [0, 1]. This reflects not only the influence within the pack on individuals but also the influence of the environment on individuals.
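The random tie-breaking rule can be illustrated with the following sketch. It is not the paper's algorithm listing, which is not reproduced here; the function name, the in-place update of `pack` and `ages`, and the `sphere` objective are assumptions of this example.

```python
import random

def birth_and_death(pack, ages, pup, cost, rng):
    """Insert the pup if some coyote is worse adapted; return True if the pup survives.

    omega is the group of coyotes with a worse (higher) cost than the pup.
    The oldest member of omega dies; ties in age are broken at random (index gamma).
    """
    pup_cost = cost(pup)
    omega = [c for c in range(len(pack)) if cost(pack[c]) > pup_cost]
    if not omega:
        return False                      # phi = 0: the pup dies
    oldest_age = max(ages[c] for c in omega)
    oldest = [c for c in omega if ages[c] == oldest_age]
    gamma = rng.choice(oldest)            # random pick among the equally old coyotes
    pack[gamma] = pup                     # the pup takes the place of the dead coyote
    ages[gamma] = 0
    return True

def sphere(x):
    """Hypothetical test objective (not from the paper)."""
    return sum(v * v for v in x)

rng = random.Random(0)
pack = [[3.0], [2.0], [0.1]]
ages = [4, 4, 1]
survived = birth_and_death(pack, ages, [1.0], sphere, rng)
```

In this toy case the first two coyotes are both worse than the pup and equally old, so one of them is replaced at random while the best coyote is untouched.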
Returning to the ELM-LRF network, although the weights and bias vectors are randomly assigned, the ELM-LRF network can approximate any continuous function. However, in practical applications, random parameters have a significant impact on model performance because they do not guarantee that the model output is optimal. With the development of swarm intelligence algorithms, more and more researchers use them to optimize artificial intelligence systems, especially artificial neural networks. 26,27 Swarm intelligence optimization algorithms have good random search capability and fast convergence, and neural networks have strong generalization performance, so the two can be combined to exploit their respective advantages.
Swarm intelligence algorithms offer good random exploration ability and fast convergence. In this section, the improved COA (I-COA) is combined with the ELM-LRF network to form a more complete coal composition analysis model; I-COA is used to optimize the parameters of the ELM-LRF algorithm. This method is called the ICELM-LRF algorithm. The specific flow is shown in Algorithm 4 and proceeds as follows:
1) Initialize the ELM-LRF and the I-COA, and set the parameters, including the number of packs, the number of coyotes per pack, and the maximum number of iterations.
2) Evaluate the adaptation of the coyotes, and obtain the alpha coyote and the cultural tendency of pack i.
3) Call ELM-LRF to evaluate the new social conditions, and update the social conditions of pack i.
4) Apply the probability of transition between packs.
5) Call ELM-LRF to calculate the fitness of the pups, decide birth and death within the pack, and update the ages of the coyotes.
6) Repeat until the stopping condition is met, and then output the optimal ELM parameters and the optimal value.
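The sensitivity to random parameters can be seen in a small ELM sketch. This is a generic single-hidden-layer ELM with a sigmoid hidden layer and a ridge-regularized least-squares output layer, not the authors' ELM-LRF code; the regularization constant `C`, the hidden layer size, and the synthetic data are assumptions.

```python
import numpy as np

def train_elm(X, T, n_hidden, C=1.0, seed=0):
    """Random hidden layer, then output weights from regularized least squares."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # random input weights
    b = rng.standard_normal(n_hidden)                 # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))            # sigmoid hidden outputs
    # beta = (I / C + H^T H)^{-1} H^T T
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ T)
    return W, b, beta

def predict_elm(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Synthetic stand-in data: 20 samples, 5 features, 4 one-hot classes.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 5))
T = np.eye(4)[rng.integers(0, 4, size=20)]
W0, b0, beta0 = train_elm(X, T, n_hidden=8, seed=0)
W1, b1, beta1 = train_elm(X, T, n_hidden=8, seed=1)
scores = predict_elm(X, W0, b0, beta0)
```

Two different seeds give two different models on the same data, which is the variability that a search over the random parameters is meant to reduce.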
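The overall search loop can be sketched as follows. This is a deliberately simplified COA-style loop, not Algorithm 4: it keeps only the social condition update, and it omits birth and death, transitions between packs, and the I-COA modifications. The function names are hypothetical, clipping to the bounds is an assumption, and `toy_error` stands in for the real fitness, which would train an ELM-LRF with the candidate parameters and return its classification error. The default sizes mirror the settings reported later in the paper.

```python
import random
import statistics

def coa_search(fitness, lb, ub, n_packs=5, n_coyotes=3, max_iter=100, seed=0):
    """Simplified COA-style minimization (social update only)."""
    rng = random.Random(seed)
    packs = [[[l + rng.random() * (u - l) for l, u in zip(lb, ub)]
              for _ in range(n_coyotes)] for _ in range(n_packs)]
    for _ in range(max_iter):
        for pack in packs:
            alpha = min(pack, key=fitness)
            cult = [statistics.median(col) for col in zip(*pack)]
            for c, soc in enumerate(pack):
                cr1, cr2 = rng.choice(pack), rng.choice(pack)
                r1, r2 = rng.random(), rng.random()
                # Alpha and pack influences, clipped to the bounds (an assumption).
                new_soc = [
                    min(max(s + r1 * (a - x1) + r2 * (m - x2), l), u)
                    for s, a, m, x1, x2, l, u
                    in zip(soc, alpha, cult, cr1, cr2, lb, ub)
                ]
                if fitness(new_soc) < fitness(soc):
                    pack[c] = new_soc
    best = min((coyote for pack in packs for coyote in pack), key=fitness)
    return best, fitness(best)

def toy_error(params):
    """Hypothetical stand-in for the ELM-LRF classification error."""
    return sum(v * v for v in params)

start, start_cost = coa_search(toy_error, [-5.0, -5.0], [5.0, 5.0], max_iter=0)
best, best_cost = coa_search(toy_error, [-5.0, -5.0], [5.0, 5.0], max_iter=50)
```

Because updates are accepted only when they lower the fitness, the best cost after any number of iterations is never worse than that of the initial population.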
2.3. Acquisition and Processing of Spectral Data. The samples in this study were collected from the Shenhua Zibao Rixile, Jiajinkou, Zhijin, and Yimin mining areas. Four kinds of samples were measured: anthracite, bituminous coal, lignite, and gangue. The data set includes 315 samples, of which 282 are used as the training set and 33 as the test set. The anthracite label is 1, the bituminous coal label is 2, the lignite label is 3, and the gangue label is 4. The detailed data attributes of the coal types are shown in Table 1.
This work draws on spectroscopy and computer science to analyze, identify, and monitor coal. We collected a large number of coal and non-coal samples in several coal mining areas. A Spectra Vista SVC HR-1024 ground spectrometer was used to measure the spectral information of the samples. Spectra were measured in both outdoor and indoor environments. First, the surface of each sample was cleaned, and the coal sample was then cut into two parts: one to be ground and the other kept as a block. For outdoor experiments, solar radiation was used as the light source, and measurements were taken between 10:00 and 14:00 in sunlight, when the solar altitude angle was 60° and the sky was clear with few or no clouds. The scanning time was 1 s per scan, and the spectrometer probe was 480 mm from the sample surface. The surface of the coal sample was kept as horizontal as possible, and the spectrometer lens was kept perpendicular to the observation surface. Experimenters were required to wear dark clothing and not to move around in order to minimize the data error caused by reflection from clothing. Because the solar radiation varies over time, a whiteboard measurement was performed every 10 min to calibrate the equipment.
For indoor experiments, a halogen lamp was used as the light source, the scanning time was also 1 s per scan, and the spectrometer probe was 480 mm from the surface of the coal sample and perpendicular to it. The halogen lamp was 320 mm from the coal sample and formed an angle of 45° with its surface. In order to reduce environmental interference and avoid interference from other light sources, the spectral measurements were carried out in a closed room. Experimenters were required to wear dark clothing and not to walk around in order to reduce the interference of the environment with the spectral data. Figure 3 shows the sample collection and data collection process.
For the coal content, chemical analysis is used to determine the coal composition. The chemical analysis measures moisture, volatile matter, ash, fixed carbon, low calorific value, and sulfur content through complex experiments and standard instruments; these indicators are the main basis for judging coal quality. For example, to determine volatile matter, the coal is heated under air at high temperature: the organic matter in the coal reacts, part of the mass escapes as gas, and the rest remains in solid form. The SVC HR-1024 spectrometer has a total of 1024 bands, and its spectral range is 350−2500 nm. Figure 4 shows photos of different samples, and Figure 5 shows their spectral curves. Each coal sample has 961 spectral features. The spectral curves of different kinds of samples overlap, especially those of bituminous coal and anthracite: because coal itself is black, most of the incident energy is absorbed, the reflection intensity is low, and the spectrum carries little useful information. With manual methods, it is difficult to distinguish these coal types, especially anthracite, fat coal, and coking coal. Therefore, using machine learning methods for recognition can effectively improve the classification accuracy.

RESULTS AND DISCUSSION
3.1. Coal Classification Model Based on the ICELM-LRF Algorithm. The spectral data of coal have high dimensionality, strong correlation, and great redundancy. If unprocessed spectral data are used directly for modeling and identification, the identification ability of the model is reduced. Therefore, the ICELM-LRF algorithm established in this paper mainly aims to solve the problems of hyperspectral data and at the same time improve the accuracy of coal classification. Based on this idea, the parameters of the ICELM-LRF coal classification model are as follows. For the LRF network, ReLU is used as the activation function and average pooling as the sampling function. The kernel size of the convolutional layer is 4 × 4, the number of output features of the convolutional layer is 2, and the sampling size of the sampling layer is 3 × 3. For I-COA, the maximum number of iterations is 100, the number of packs is 5, and the number of individuals in each pack is 3. The sigmoid function is used as the activation function of the ELM algorithm.
In order to evaluate the performance of the ICELM-LRF model, this article establishes three models based on the LRF, COA, and ELM algorithms and compares them: LRF combined with the basic ELM (ELM-LRF model), LRF combined with the basic COA and ELM (CELM-LRF model), and LRF combined with the improved COA and ELM (ICELM-LRF model). Different experiments were carried out on indoor and outdoor spectra. The comparison results are shown in Figure 6, Figure 7, Figure 8, Figure 9, and Table 2.
Each model was run 30 times, and the best run is reported. Figure 6 shows the distribution of the prediction results of the ELM-LRF model on the test and training sets. Because the ELM structure and training parameters are not optimized, the classification results on both sets deviate considerably, particularly for the first class, anthracite. Therefore, the ELM-LRF model has limited ability to identify the type of coal. According to Figure 9 and Table 2, the test set classification accuracy of the ELM-LRF model in the spectral experiment is 75.76%, and the training time is 0.3460 s. Figure 7 shows the prediction results of the CELM-LRF model, which has fewer errors: it misclassifies 1 sample in the spectral experiment. Figure 8 shows the classification results of the ICELM-LRF model, which achieves 100% accuracy on the test set. According to Figure 9 and Table 2, the test set classification accuracy is 96.97% for the CELM-LRF model and 100% for the ICELM-LRF model. The training times of the CELM-LRF and ICELM-LRF models are basically the same, about 50 s.
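As a quick check of the resulting feature dimension, the arithmetic can be written out. The input side d = 31 is an assumption made here: it supposes that the 961 spectral features of each sample are arranged as a 31 × 31 array (961 = 31 × 31), which the text does not state explicitly. The function name is hypothetical.

```python
def elm_lrf_feature_count(d, r, K):
    """Side of each feature/pooling map and number of columns of the combined matrix H."""
    side = d - r + 1
    return side, K * side * side

# d = 31 is an assumption (961 = 31 * 31); r = 4 and K = 2 are the stated settings.
side, n_columns = elm_lrf_feature_count(d=31, r=4, K=2)
print(side, n_columns)
```

Under this assumption, each feature map is 28 × 28 and the combined layer matrix H has 2 × 28² = 1568 columns per sample.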
From the above analysis, it can be seen that the ICELM-LRF model performs well for coal classification and that using the LRF network to extract features for the ELM algorithm achieves highly accurate classification. Using the I-COA to optimize the ELM-LRF network further improves the performance of the classification model. Compared with the original COA, the improved COA has higher accuracy and stability. This provides a fast and efficient method for coal type identification.
3.2. Comparison with Other Methods. Principal component analysis combined with ELM has been widely used for many problems, and there have been many related studies in the field of spectroscopy. Jiang et al. 28 used Fourier transform spectroscopy combined with multivariate analysis to determine the hardness and solid content of pears. They used principal components and independent components to extract useful information from the original spectral data and then used ELM to build a regression model. Zheng et al. 29 applied ELM, PCA, 30 and spectroscopy to food classification and counting and compared the approach with other typical intelligent methods (including SVM, quadratic discriminant analysis, and the backpropagation (BP) network). Its accuracy was better than that of the other classification methods, and ELM ran much faster than these algorithms, which shows that ELM is a promising method for food classification. Liu et al. 31 combined laser-induced breakdown spectroscopy with machine learning methods to identify non-transgenic and transgenic corn and achieved 100% test accuracy. CNN 32 is a deep learning feature extraction method. It mainly uses multiple hidden layers to extract data features and uses backpropagation to find the optimal parameters of the network, and it has proved to be an efficient feature extraction method. This article compares the LRF, PCA, and CNN methods. For PCA, 10 principal components replace the original data. For the recognition algorithm, in addition to the ICELM algorithm, this article also uses RBF-SVM for comparative analysis. The experimental results are shown in Table 3 and Figure 10.
According to the results shown in Table 3 and Figure 10, the ICELM-LRF algorithm has the highest recognition accuracy among all algorithms, reaching 100%, but its training takes a long time. With CNN as the feature extractor, the most accurate model is CNN-IPELM, with a test set recognition accuracy of 93.02%; with PCA as the feature extractor, the most accurate model is PCA-IPELM, with a test set recognition accuracy of 91.87%. Features extracted with the LRF network enable the model to identify the type of coal effectively, and the accuracy is generally higher than that of the models using CNN and PCA. The results also verify the effectiveness of the improved coyote algorithm, which has a higher recognition accuracy than other typical algorithms. Table 4 compares manual experience, chemical testing methods, and the method proposed in this article, based on the identification costs of 206 coal samples. Although the manual experience method is very economical, its recognition accuracy is low. Chemical analysis requires expensive experimental reagents, and some chemical measuring instruments cost 2 million or more. Therefore, although the chemical method is accurate, it is costly and time-consuming. The proposed method costs no more than 350,000; it is more accurate than the traditional manual method and requires less investment and less time than the chemical analysis method. This provides a fast, high-efficiency, and low-cost analysis tool for coal classification technology.
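The reported test accuracies are consistent with a 33-sample test set, as a short calculation shows. The misclassification count of 1 for CELM-LRF is stated in the text; the count of 8 for ELM-LRF is inferred here from the reported 75.76% and is an assumption. The function name is hypothetical.

```python
def accuracy_percent(n_wrong, n_total=33):
    """Test accuracy in percent, rounded to two decimals."""
    return round(100.0 * (n_total - n_wrong) / n_total, 2)

for model, n_wrong in [("ELM-LRF", 8), ("CELM-LRF", 1), ("ICELM-LRF", 0)]:
    print(model, accuracy_percent(n_wrong))
```

With 33 test samples, each misclassified sample changes the accuracy by about 3 percentage points, so the gap between 96.97% and 100% corresponds to a single sample.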
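For reference, the PCA baseline step (keeping 10 principal components) can be sketched with NumPy as follows. This is an illustrative version, not the authors' code: the function name is hypothetical, and the random matrix only stands in for the 282 training spectra with 961 features each.

```python
import numpy as np

def pca_reduce(X, n_components=10):
    """Project centered data onto its leading principal components (via SVD)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:n_components]
    return Xc @ components.T, components, mean

# Synthetic stand-in for the training spectra (282 samples x 961 features).
rng = np.random.default_rng(0)
X = rng.standard_normal((282, 961))
Z, components, mean = pca_reduce(X, n_components=10)
```

Test spectra would be reduced with the same `mean` and `components`, that is, `(X_test - mean) @ components.T`, before being passed to the classifier.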

CONCLUSIONS
This paper presents a coal classification model that combines visible and infrared spectroscopy with a deep learning network. First, the LRF network is combined with the ELM algorithm to construct a coal recognition model. To improve the recognition accuracy, an improved coyote algorithm is proposed to optimize the ELM-LRF model. The experimental