Introduction

Materials and methods

Experimental materials and treatment methods

Spectral data collection

Spectral data preprocessing

Results and discussion

Prediction and analysis of the saponin concentration of ginseng powder

Conclusion

Conflicts of interest

## Introduction

Often referred to as “the king of all herbs”, ginseng is a fleshy root that grows mainly in cold regions. It is a common tonic herb whose most important active components are saponins (ginsenosides). According to the “Chinese Pharmacopoeia”, ginsenoside content plays a major role in assessing ginseng quality. However, ginsenoside content alone is not sufficient to fully evaluate ginseng quality, and the red color of ginseng is also used as a quality indicator (Raksakantong et al., 2012; Ning and Han, 2013).

Saponins, the major active components of ginseng, are still mostly determined by chemical analysis, including thin-layer chromatography-colorimetry, ultra-high-performance liquid chromatography with UV detection, light-scattering detection, and liquid chromatography-mass spectrometry. These methods require chemical pretreatment of the samples, are time-consuming, and introduce chemical contamination into the samples. They are also costly and cannot be used for real-time on-line analysis of large numbers of samples (Ha et al., 2014). Recently, spectral techniques have been applied to the analysis and prediction of saponin content. For example, Zhang et al. (2015) used near-infrared spectroscopy (NIRS) for the rapid determination of ginsenosides Rg1 and Re in the Chinese patent medicine Naosaitong pill, and Li et al. (2018) studied the prediction of saponin content in soapnut (Sapindus mukorossi Gaertn.) fruit by NIRS.

Spectral analysis technology has many advantages: it is fast, nondestructive, and reproducible, requires no sample pretreatment, is easy to operate, and is readily implemented for on-line analysis. Owing to these advantages, it has been widely used in civil sectors. In agriculture and the food industry, for example, it has been used for the qualitative and quantitative analysis of food components, for vegetation and pest detection, and for many other kinds of testing (Xin et al., 2007). Evaluations of spectral prediction models have shown good performance, with no significant difference between the spectral analysis results and the values obtained by other methods, which supports the use of spectral detection as a rapid nondestructive testing technology (Windham et al., 2003). Absorption spectra arise when molecules undergo transitions from lower to higher energy levels, a process well explained by quantum mechanics. In the near-infrared region, these transitions record the absorption of hydrogen-containing groups such as C-H, O-H, and N-H, and the absorption of these groups differs with their physical and chemical environment. NIRS can therefore reveal structural and compositional information about samples, which makes it suitable for detecting hydrogen-containing organic compounds. Because ginseng is rich in saponins, which contain such hydrogen-containing groups, NIRS can be used for ginseng quality testing (Xing and Chang, 2009; Tan et al., 2015; Gong et al., 2014).

In hyperspectral imaging, the shortwave infrared band lies between the visible band and the longer-wavelength infrared bands. Shortwave infrared imaging spectroscopy offers strong advantages in stability and image quality compared with other wave bands (Xu and Ying, 2002). With economic and technological development, the demand for imaging technology in spectral detection is also increasing. Visible and infrared spectral imaging technologies have advanced both in China and abroad, whereas shortwave band imaging technology remains comparatively underdeveloped, and Chinese researchers have therefore intensified their study of this area in recent years. Significant progress in digitalization has been made in agriculture and other areas (Cai et al., 2011; Wang et al., 2010), which has in turn driven research on spectral data preprocessing methods. The technology has considerable prospects and value for civil, industrial, military, and other applications (He et al., 2008; Sun et al., 2016). In this study, we apply shortwave hyperspectral imaging to ginseng powder, analyze the hyperspectral imaging characteristics of the ginsenosides, and establish a prediction model for ginsenoside content.

## Materials and methods

Experimental materials and treatment methods

Six-year-old artificially cultivated ginseng from a single source in the Changbai Mountains, Jilin, was used in this experiment. All selected roots were healthy and free of defects. Ginseng root (2 g) was washed, dried, and pulverized, yielding 36 powder samples. To ensure that the samples were representative, their total saponin contents were required to span a gradient; the measured total saponin content ranged from 57.52 mg/g to 63.96 mg/g, with an average of 60.64 mg/g. The ginseng was pulverized with an LD-Y400A high-speed universal pulverizer (Shanghai Dingshuai Electric Co., Ltd.) at 25,000 r/min for 1 min per run, giving a particle diameter of 80 µm. The powder samples were then divided into a test set and a prediction set for the spectral acquisition and model construction described below.

Spectral data collection

*Spectral acquisition equipment*

In this study, a shortwave imaging spectrometer (Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences) was used to collect the spectral data. The spectrometer was equipped with a Sony second-generation ILX511B CCD and had a wavelength range of 1000 nm to 2500 nm, a spectral resolution of 10 nm, 256 wave bands, an integration time of 65 ms, a spatial resolution of 1 mrad, 320 spatial pixels each measuring 30 μm × 30 μm, 14-bit digitization, and a frame rate of 10 fps to 100 fps. The light source was a 150 W halogen lamp. The dimensions of the instrument components were as follows: detector head, 392 mm × 170 mm × 151 mm; power supply, 120 mm × 240 mm × 70 mm; and controller, 260 mm × 260 mm × 95 mm. The spectrometer is particularly suited to applications that require high-speed image processing. The assembled shortwave imaging spectral detection system is shown in Figure 1.

*Spectral characteristics of image acquisition*

An imaging spectral device first acquires an image of the test sample, from which the spectrum of each point can then be extracted, so the spectra of many points are obtained from a single scan. Without an imaging device, a separate scan is needed for each point. Therefore, when spectra from many points are required, a spectral imaging device saves considerable time and effort. The image obtained in the experiment is shown in Figure 2.
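The single-scan advantage can be pictured as a three-dimensional data cube from which any point's spectrum is simply a slice. The sketch below is illustrative only: it assumes the scan is held as a NumPy array with axes (scan lines, spatial pixels, bands), takes 320 spatial pixels and 256 bands from the spectrometer description, and uses an arbitrary number of scan lines and random values; it does not reproduce ENVI's file format, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical cube layout: (scan lines, spatial pixels, spectral bands).
# 320 spatial pixels and 256 bands follow the spectrometer description;
# the number of scan lines (100) is arbitrary for this illustration.
N_LINES, N_PIXELS, N_BANDS = 100, 320, 256

def pixel_spectrum(cube: np.ndarray, row: int, col: int) -> np.ndarray:
    """Return the full spectrum (one value per band) of a single image point."""
    return cube[row, col, :]

def region_mean_spectrum(cube: np.ndarray, row0: int, col0: int, size: int) -> np.ndarray:
    """Average the spectra of a square block of pixels (a region of interest)."""
    block = cube[row0:row0 + size, col0:col0 + size, :]
    return block.reshape(-1, cube.shape[2]).mean(axis=0)

# One scan yields a spectrum for every one of the 100 x 320 points at once.
rng = np.random.default_rng(0)
cube = rng.random((N_LINES, N_PIXELS, N_BANDS))
spec = pixel_spectrum(cube, 10, 20)
roi = region_mean_spectrum(cube, 10, 20, 5)
print(spec.shape, roi.shape)
```

Without an imaging device, each call to `pixel_spectrum` would correspond to a separate physical measurement.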

*Ginseng spectral acquisition*

First, the light source of the spectral acquisition system was switched on to preheat, and a series of debugging procedures was performed on the acquisition system to ensure a stable experimental environment. The 36 ginseng powder samples prepared in advance were numbered and grouped: 24 were used as test samples and the remaining 12 as prediction samples. The samples were placed in turn in containers with a radius of 1 cm on the lifting platform of the shortwave imaging spectral acquisition instrument, and the platform was adjusted to set a suitable distance between the sample and the light source. Once the test environment had stabilized, the spectral information of each sample was stored in the form of images. ENVI software was used to extract the spectral information from the images. ENVI has a modular design and provides a complete set of remote sensing image-processing functions and an extensive secondary-development function library, forming a comprehensive image-processing system. When the acquired image is loaded into ENVI, five 3 mm × 3 mm regions are extracted from each sample image, and the corresponding spectra in the 900 nm to 2500 nm band are obtained (Liang et al., 2010). After the spectra have been acquired, they are calibrated against a white reference board, and the spectra from six points on each ginseng powder sample are averaged. The acquisition of a sample spectrum and the spectral image information of selected points are shown in Figure 3 and Figure 4, respectively. The C=O, 2C-H, and O-H functional groups show strong stretching-vibration absorption in the 2100 nm to 2500 nm region.
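White-board comparison is commonly implemented as a flat-field reflectance correction followed by averaging over regions. The sketch below assumes, beyond what the text states, that a dark reference (for example, a capped-lens frame) is also recorded, giving reflectance = (raw - dark) / (white - dark). Region positions and function names are hypothetical, and ENVI's own procedure may differ.

```python
import numpy as np

def to_reflectance(raw, white, dark, eps=1e-12):
    """Flat-field correction: relative reflectance per pixel and band.

    raw, white, dark: arrays of identical shape (e.g. lines x pixels x bands).
    The dark reference is an assumption; the text only mentions a white board.
    """
    raw, white, dark = (np.asarray(a, dtype=float) for a in (raw, white, dark))
    # Guard against division by zero in dead or saturated pixels.
    return (raw - dark) / np.maximum(white - dark, eps)

def sample_spectrum(reflectance, regions):
    """Average the mean spectra of several square regions given as (row, col, size)."""
    n_bands = reflectance.shape[2]
    means = [reflectance[r:r + s, c:c + s, :].reshape(-1, n_bands).mean(axis=0)
             for r, c, s in regions]
    return np.mean(means, axis=0)

# Toy usage: a uniform sample halfway between dark and white gives reflectance 0.5.
raw = np.full((4, 4, 3), 600.0)
white = np.full((4, 4, 3), 1100.0)
dark = np.full((4, 4, 3), 100.0)
R = to_reflectance(raw, white, dark)
spectrum = sample_spectrum(R, [(0, 0, 2), (2, 2, 2)])
```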

Spectral data preprocessing

*Spectral data preprocessing software*

The spectral data in this study were analyzed with Unscrambler 9.7 chemometrics software developed by the CAMO Corporation. Unscrambler 9.7 is a multifunctional data analysis program used mainly for principal component analysis, regression analysis, discriminant analysis, prediction, and experimental design, facilitating the analysis and interpretation of large data sets. It exchanges data with Microsoft Office and can transfer raw data directly to generate line graphs, histograms, and matrices. Besides basic calculation and sorting functions, it provides commonly used data-preprocessing functions and calibration methods, including multiple linear regression, principal component regression, partial least squares (PLS) regression, and other quantitative analysis methods. It can also perform two qualitative analysis methods: soft independent modeling of class analogy (SIMCA) and partial least squares discriminant analysis (PLS-DA). Using these tools, data import, principal component analysis, cross-validation, and regression equation formulation were carried out, and the ginsenoside PLS model was established (R. Glenn and Glenn, 2005; Liu and Chen, 2014).

*Spectral data preprocessing and modeling analysis*

After averaging, the spectral data were imported into Unscrambler 9.7 and subjected separately to noise reduction, optical path correction, differential, and combined preprocessing. After noise reduction by moving average or Savitzky-Golay (S-G) smoothing, the models performed worse than the model built without noise reduction, suggesting that the built-in noise-reduction function of the ENVI spectral acquisition software already removes the effect of noise effectively, so that no additional noise reduction is needed during preprocessing. Although standard normal variate (SNV) transformation, one of the optical path correction methods, gave a higher coefficient of determination, the root mean squared error of calibration (RMSEC) and the root mean squared error of prediction (RMSEP) of the resulting PLS model were unsatisfactory, indicating insufficient prediction accuracy and the need for further processing (Liu et al., 2017; Wu and Sun, 2016). PLS models built after the two differential preprocessing methods behaved similarly to those built after optical path correction. To obtain a PLS model of ginsenoside content with better prediction performance, two optical path correction methods, SNV transformation and multiplicative scatter correction (MSC), were each combined with first-order and second-order differentials, and the results of the models built with these four combinations were recorded.
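For readers unfamiliar with these preprocessing steps, the sketch below gives textbook NumPy versions of SNV, MSC, and moving-average smoothing, with spectra stored as rows of a matrix. They are not Unscrambler's implementations: details such as the standard-deviation divisor in SNV, the MSC reference spectrum (here the mean spectrum by default), and the edge handling of the moving average are assumptions.

```python
import numpy as np

def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and SD."""
    X = np.asarray(X, dtype=float)
    # ddof=1 (sample SD) is a choice; some tools divide by n instead.
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

def msc(X, reference=None):
    """Multiplicative scatter correction against a reference spectrum (default: mean spectrum).

    Each spectrum x is regressed on the reference, x ~ a + b * ref, and corrected as (x - a) / b.
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    out = np.empty_like(X)
    for i, x in enumerate(X):
        b, a = np.polyfit(ref, x, deg=1)  # polyfit returns [slope, intercept]
        out[i] = (x - a) / b
    return out

def moving_average(X, window=5):
    """Moving-average smoothing along the wavelength axis.

    np.convolve with mode="same" zero-pads, so the first and last window // 2
    points of each spectrum are attenuated.
    """
    X = np.asarray(X, dtype=float)
    kernel = np.ones(window) / window
    return np.array([np.convolve(x, kernel, mode="same") for x in X])
```

Both SNV and MSC remove additive and multiplicative differences between spectra, which is why spectra that differ only by scattering become identical after either correction.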

*Determination of the ginsenoside content*

A ginsenoside Re reference standard (10.5 mg) was accurately weighed, transferred to a 10 mL volumetric flask, dissolved in methanol, and diluted to the specified concentration. Different volumes of this ginsenoside Re control solution, with deionized water as the blank, were mixed with 0.5 mL of 5% (w/v) vanillin-glacial acetic acid solution and 5.0 mL of 70% (w/v) aqueous sulfuric acid. The mixtures were heated in a 60 ℃ water bath for 15 min, cooled for 10 min, and then left at room temperature for another 10 min. The absorbance at 544 nm was measured with a TU-1810 UV-Vis spectrophotometer, and a standard curve was plotted. The regression equation was *Y* = 0.47998 + 2.83571*X* (*R*^{2} = 0.99860), where *Y* is the absorbance and *X* is the concentration (mg/mL), and the total ginsenoside content of each sample was calculated from this equation (Ha et al., 2014).
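Inverting the standard curve gives the concentration of each measured solution. The short sketch below does only that step; with the reported coefficients it reproduces the mg/mL column of Table 1 (for example, an absorbance of 0.66 gives about 0.063483 mg/mL). The further conversion to mg/g depends on sample mass and dilution, which the text does not give, so it is deliberately omitted.

```python
# Standard curve from the text: absorbance = 0.47998 + 2.83571 * concentration (mg/mL).
INTERCEPT = 0.47998
SLOPE = 2.83571

def concentration_from_absorbance(absorbance: float) -> float:
    """Invert the linear standard curve to obtain concentration (mg/mL) at 544 nm."""
    return (absorbance - INTERCEPT) / SLOPE

# Usage: absorbances of samples 1 and 10 in Table 1.
for a in (0.66, 1.039):
    print(a, round(concentration_from_absorbance(a), 6))
```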

## Results and discussion

Prediction and analysis of the saponin concentration of ginseng powder

*Sample saponin concentration measurement results*

Tables 1 and 2 present the total saponin concentrations of the test-set and prediction-set ginseng powder samples and their statistics. The total saponin concentrations of the 24 test-set samples range from 58.48 mg/g to 63.96 mg/g, with an average of 60.84 mg/g and a standard error of 1.20 mg/g. Those of the 12 prediction-set samples range from 59.85 mg/g to 63.16 mg/g, with an average of 60.89 mg/g and a standard error of 1.05 mg/g.

Table 1. Total saponin contents of samples

| Number | Absorbance | Concentration (mg/mL) | Concentration (mg/g) | Number | Absorbance | Concentration (mg/mL) | Concentration (mg/g) |
|---|---|---|---|---|---|---|---|
| 1 | 0.66 | 0.063483 | 62.26409 | 19 | 0.755 | 0.096985 | 60.03994 |
| 2 | 0.665 | 0.065246 | 60.41019 | 20 | 0.754 | 0.096632 | 61.81072 |
| 3 | 0.759 | 0.098395 | 63.95682 | 21 | 0.890 | 0.144592 | 59.98458 |
| 4 | 0.647 | 0.058899 | 60.28424 | 22 | 0.823 | 0.120964 | 60.62687 |
| 5 | 0.722 | 0.085347 | 60.4757 | 23 | 0.752 | 0.095927 | 62.35229 |
| 6 | 0.731 | 0.088521 | 61.53868 | 24 | 0.851 | 0.130838 | 59.94502 |
| 7 | 0.809 | 0.116027 | 60.4178 | 25 | 0.84 | 0.126959 | 60.5236 |
| 8 | 0.71 | 0.081115 | 62.72507 | 26 | 0.694 | 0.075473 | 60.05756 |
| 9 | 0.893 | 0.14565 | 60.67223 | 27 | 0.707 | 0.080058 | 59.93741 |
| 10 | 1.039 | 0.197136 | 59.1383 | 28 | 0.845 | 0.128723 | 60.6697 |
| 11 | 1.017 | 0.189378 | 60.0955 | 29 | 0.725 | 0.086405 | 63.16336 |
| 12 | 0.895 | 0.146355 | 60.13067 | 30 | 0.716 | 0.083231 | 60.10038 |
| 13 | 0.804 | 0.114264 | 60.2717 | 31 | 0.748 | 0.094516 | 60.43541 |
| 14 | 0.791 | 0.10968 | 60.29185 | 32 | 0.781 | 0.106153 | 62.99965 |
| 15 | 0.723 | 0.0857 | 59.90492 | 33 | 0.75 | 0.095221 | 60.89385 |
| 16 | 0.714 | 0.082526 | 60.64195 | 34 | 0.768 | 0.101569 | 61.0198 |
| 17 | 0.722 | 0.085347 | 60.4757 | 35 | 0.733 | 0.089226 | 60.99712 |
| 18 | 0.689 | 0.07371 | 60.91146 | 36 | 0.728 | 0.087463 | 59.85102 |

Table 2. Statistics of ginseng powder saponin concentration

| Quality parameter | Sample set | Number of samples | Minimum (mg/g) | Maximum (mg/g) | Average (mg/g) | Standard error (mg/g) |
|---|---|---|---|---|---|---|
| Total saponin concentration of ginseng | Test set | 24 | 58.48 | 63.96 | 60.84 | 1.20 |
| | Prediction set | 12 | 59.85 | 63.16 | 60.89 | 1.05 |
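The real values listed later in Table 4 coincide with samples 25 to 36 of Table 1, so the prediction-set row of Table 2 can be recomputed as a consistency check. In this sketch, using only the Python standard library, the reported standard error of 1.05 mg/g is reproduced by the population standard deviation (divisor n); that interpretation is ours and is not stated in the text.

```python
import statistics

# Total saponin contents (mg/g) of Table 1 samples 25-36, i.e. the prediction set
# (these values match the "real values" in Table 4).
prediction_set = [60.5236, 60.05756, 59.93741, 60.6697, 63.16336, 60.10038,
                  60.43541, 62.99965, 60.89385, 61.0198, 60.99712, 59.85102]

summary = {
    "n": len(prediction_set),
    "min": min(prediction_set),
    "max": max(prediction_set),
    "mean": statistics.mean(prediction_set),
    # Population SD (divide by n); the sample SD (divide by n - 1) would be about 1.10.
    "pstdev": statistics.pstdev(prediction_set),
}
for key, value in summary.items():
    print(f"{key}: {value:.2f}" if isinstance(value, float) else f"{key}: {value}")
```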

*Analysis of preprocessing results of total saponin concentration*

The averaged shortwave infrared spectral reflectance data of the ginseng powder were processed by noise reduction, optical path correction, and differential preprocessing, and partial least squares regression was then used to model the spectra against the measured saponin concentrations. Table 3 summarizes the modeling results obtained after each preprocessing method.

Table 3. Statistics of preprocessing results of the spectral data for saponin concentration

| Preprocessing category | Method | Number of factors | R^{2} | RMSEC | RMSEP |
|---|---|---|---|---|---|
| None | | 10 | 0.9056 | 0.3672 | 1.4327 |
| Noise reduction preprocessing | Moving average | 10 | 0.8786 | 0.4164 | 1.4415 |
| | S-G smoothing | 10 | 0.9039 | 0.3706 | 1.5218 |
| Optical path correction preprocessing | MSC | 10 | 0.9629 | 0.2301 | 1.6981 |
| | SNV | 10 | 0.9822 | 0.1595 | 1.3642 |
| | Normalize | 10 | 0.9365 | 0.3013 | 1.6160 |
| Differential preprocessing | FD | 9 | 0.9936 | 0.0953 | 1.6391 |
| | SD | 8 | 0.9953 | 0.0823 | 1.5747 |
| Combination preprocessing | FD+MSC | 9 | 0.9963 | 0.0490 | 1.5030 |
| | FD+SNV | 8 | 0.9913 | 0.1115 | 1.5355 |
| | SD+SNV | 8 | 0.9972 | 0.0628 | 1.5268 |
| | SD+MSC | 8 | 0.9952 | 0.0832 | 1.4751 |

As Table 3 shows, and as in the modeling results for the degree of red coloration of the ginseng powder, the models built after moving-average or S-G smoothing noise reduction were less effective than the model built without noise reduction. This confirms the earlier conclusion that the built-in noise-reduction and scattered-light correction functions of the ENVI spectral acquisition software sufficiently eliminate the effect of noise, so no further noise reduction is needed when the spectra are preprocessed.

Among the optical path correction methods, SNV transformation gives the highest coefficient of determination (0.9822) and the lowest RMSEC (0.1595) and RMSEP (1.3642). However, the RMSEP of the SNV model is much larger than its RMSEC, indicating that the model fits the calibration samples far better than it predicts new samples; its prediction performance is therefore still inadequate, and further processing is required.

With differential preprocessing, the PLS models reach higher coefficients of determination (0.9936 for FD and 0.9953 for SD), but, as with the optical path correction methods, RMSEC remains much smaller than RMSEP. To obtain a better PLS model for the total saponin concentration, combinations of optical path correction and differential preprocessing were therefore investigated: SNV transformation and multiplicative scatter correction, two optical path correction methods based on different algorithms, were each combined with first-order and second-order differentials, and the results of the models built with these four combinations were examined.

Based on the evaluation indices of the models in Table 3, the combination of SNV transformation with the second-order differential (SD + SNV) gives the highest coefficient of determination of all the preprocessing methods (0.9972), with an RMSEC of 0.0628 and an RMSEP of 1.5268. Therefore, for the total saponin concentration of ginseng powder, SD + SNV was selected as the optimal spectral data preprocessing method.

*Analysis of the results of the total saponin concentration prediction model*

After the spectral data were preprocessed with the SD + SNV combination (SNV transformation followed by a second-order differential with 25-point smoothing), PLS modeling of the total saponin concentration was performed. The number of principal components was determined first: to avoid omitting useful components, up to 10 principal components were initially considered, and the absolute value of the prediction residual was obtained for each number of components. The resulting histogram of the absolute prediction residual against the number of principal components is shown in Figure 5.
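A second-order differential with 25-point smoothing is commonly computed as a Savitzky-Golay derivative. The NumPy sketch below derives S-G weights from a least-squares polynomial fit over the window; the polynomial order (2) is an assumption, since the text does not state it, and Unscrambler's edge handling may differ (here only points with a full window are kept).

```python
import math
import numpy as np

def savgol_coefficients(window: int, polyorder: int, deriv: int) -> np.ndarray:
    """Savitzky-Golay convolution weights for the given derivative order (unit spacing)."""
    half = window // 2
    x = np.arange(-half, half + 1, dtype=float)
    # Vandermonde matrix with columns x**0, x**1, ..., x**polyorder.
    A = np.vander(x, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse yields the fitted coefficient c_deriv;
    # the derivative at the window centre is deriv! * c_deriv.
    return np.linalg.pinv(A)[deriv] * math.factorial(deriv)

def savgol_derivative(X, window=25, polyorder=2, deriv=2):
    """Apply a window-point S-G derivative to each spectrum (row) of X.

    Only points with a full window are returned ("valid" mode), so each
    spectrum loses window - 1 points in total.
    """
    X = np.asarray(X, dtype=float)
    w = savgol_coefficients(window, polyorder, deriv)
    # np.convolve flips its kernel, so pass the reversed weights to apply them directly.
    return np.array([np.convolve(x, w[::-1], mode="valid") for x in X])
```

A quadratic input such as 3x² + 2x + 1 returns a constant second derivative of 6, which is a quick sanity check. SciPy users can obtain the same kind of filter from `scipy.signal.savgol_filter` with its `deriv` argument.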

As shown in Figure 5, the absolute value of the prediction residual generally decreases as the number of principal components increases and is smallest at 8 components, where the model is the most accurate and has the strongest prediction ability. The number of principal components of the optimal model was therefore set to 8.
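The component-selection step can be sketched as fitting PLS models with 1 to 10 components and keeping the count with the smallest cross-validated error. The text does not describe Unscrambler's internal algorithm, so this sketch uses a textbook NIPALS PLS1 and leave-one-out cross-validated RMSE as the residual criterion; both choices, and all function names, are assumptions for illustration.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """NIPALS PLS1. Returns (intercept, coefficients) so that y ~ b0 + X @ b.

    n_components must not exceed the rank of the centred X.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # centred residual matrices
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)               # weight vector
        t = Xr @ w                           # scores
        tt = t @ t
        p = Xr.T @ t / tt                    # X loadings
        q = (yr @ t) / tt                    # y loading
        Xr = Xr - np.outer(t, p)             # deflate X
        yr = yr - q * t                      # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    b = W @ np.linalg.solve(P.T @ W, Q)      # regression coefficients in X space
    return y_mean - x_mean @ b, b

def loo_rmse(X, y, n_components):
    """Leave-one-out cross-validated RMSE for a given number of components."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        b0, b = pls1_fit(X[keep], y[keep], n_components)
        errors.append(y[i] - (b0 + X[i] @ b))
    return float(np.sqrt(np.mean(np.square(errors))))

def choose_components(X, y, max_components=10):
    """Return the component count with the smallest cross-validated RMSE, plus all RMSEs."""
    rmse = [loo_rmse(X, y, a) for a in range(1, max_components + 1)]
    return int(np.argmin(rmse)) + 1, rmse
```

With 24 calibration spectra and many wavelength points, `choose_components(X, y, 10)` would mirror the scan over 1 to 10 components summarized in Figure 5.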

With the number of principal components set to 8, a PLS model of the total saponin concentration was established and then used to predict the total saponin concentration of the 12 ginseng powder samples in the prediction set. The predicted versus actual values of the test-set samples and of the prediction-set samples are shown in Figures 6 and 7.

Figures 6 and 7 show that the calibration model for total ginsenosides has a coefficient of determination of 0.9972 and an RMSEC of 0.0627, while the prediction model has a coefficient of determination *R*^{2} of 0.9999, an RMSEC of 0.0041 (lower than the RMSEP of 0.0043), and a bias of -0.5142. These results indicate that the model's predictions are reasonable and sufficiently accurate.
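For reference, the evaluation indices quoted above can be computed as follows. This is a minimal sketch using common definitions: RMSE divides by n (some chemometrics packages divide RMSEC by n minus the number of components minus 1), and bias is taken as the mean of predicted minus actual values, a sign convention that varies between tools. The toy data in the usage example are hypothetical.

```python
import math

def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root mean squared error: RMSEC on the calibration set, RMSEP on the prediction set."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def bias(actual, predicted):
    """Mean of (predicted - actual); sign conventions differ between software packages."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical toy data, not values from this study.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(actual, predicted), rmse(actual, predicted), bias(actual, predicted))
```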

The saponin prediction model is expressed as:

$$Y = 11.3628 - 9.2609X_1 + 0.0031X_2 - 0.0284X_3 + \cdots - 0.0003X_{1230} - 9.2609X_{1231} \tag{1}$$

where *Y* is the total ginsenoside concentration (mg/g) and *X*_{1}, ..., *X*_{1231} are the spectral reflectance values of the sample at the 1231 wavelength points.
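Applying Eq. (1) to a new sample amounts to a dot product plus an intercept. Because the equation elides most of its 1231 coefficients, the sketch below takes them as inputs rather than reproducing them; the function name is hypothetical. It also assumes that the spectrum has been preprocessed in the same way as the calibration data (SD + SNV), since PLS coefficients are only valid for spectra on the same scale.

```python
from typing import Sequence

def predict_concentration(intercept: float, coefficients: Sequence[float],
                          spectrum: Sequence[float]) -> float:
    """Evaluate Y = b0 + sum_i b_i * X_i for one preprocessed spectrum."""
    if len(coefficients) != len(spectrum):
        raise ValueError("need one coefficient per wavelength point")
    return intercept + sum(b * x for b, x in zip(coefficients, spectrum))

# Toy usage with two hypothetical coefficients (the full set has 1231).
print(predict_concentration(11.3628, [0.5, -1.0], [2.0, 1.0]))
```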

Table 4 presents the statistics of the prediction values for the ginseng powder samples in the prediction set obtained with the PLS prediction model of the total saponin concentration and their real values.

Table 4. Statistics of the real values and predicted values of the total saponin concentration of the prediction set samples

| Number | Real value (mg/g) | Predicted value (mg/g) | Residual (mg/g) | Relative error (%) |
|---|---|---|---|---|
| 1 | 60.524 | 60.638 | -0.114 | 0.19 |
| 2 | 60.058 | 60.665 | -0.607 | -1.01 |
| 3 | 59.937 | 59.995 | -0.058 | 0.10 |
| 4 | 60.670 | 59.558 | 1.112 | 1.83 |
| 5 | 63.163 | 60.443 | 2.720 | 4.31 |
| 6 | 60.100 | 59.367 | 0.733 | 1.22 |
| 7 | 60.435 | 60.941 | -0.506 | 0.84 |
| 8 | 63.000 | 59.757 | 3.243 | 5.15 |
| 9 | 60.894 | 59.165 | 1.729 | 2.84 |
| 10 | 61.020 | 61.268 | -0.248 | 0.41 |
| 11 | 60.997 | 60.707 | 0.290 | 0.48 |
| 12 | 59.851 | 61.975 | -2.124 | 3.55 |

From these data, the residuals between the predicted and real values of the 12 prediction-set samples were plotted, as shown in Figure 8.

Table 4 and Figure 8 show that the residuals of 9 of the 12 samples are within ±2 mg/g, with relative errors below 3%. In agricultural product sorting, a relative error of less than 5% is considered to meet practical production requirements. The residuals are randomly distributed above and below the zero-residual line, indicating a relatively good statistical fit of the ginseng shortwave imaging spectral detection experiment. The model therefore meets practical detection requirements and can be used as a grading reference for the total saponin concentration of ginseng powder. Taking into account the coefficient of determination, RMSEC, residual distribution, relative error, and other indicators, the predicted values of the prediction-set samples are highly correlated with and close to the real values, showing that the model predicts ginsenoside concentration reliably.
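The residual counts above can be checked directly from Table 4. In this sketch, residuals are computed as real minus predicted, which reproduces the Residual column, and relative errors as |residual| / real × 100, which reproduces the magnitudes in the Relative error column (the table's signs for that column are not applied consistently, so only magnitudes are compared).

```python
# Real and predicted total saponin concentrations (mg/g) from Table 4.
real = [60.524, 60.058, 59.937, 60.670, 63.163, 60.100,
        60.435, 63.000, 60.894, 61.020, 60.997, 59.851]
pred = [60.638, 60.665, 59.995, 59.558, 60.443, 59.367,
        60.941, 59.757, 59.165, 61.268, 60.707, 61.975]

residuals = [r - p for r, p in zip(real, pred)]                 # real - predicted
rel_err = [100 * abs(e) / r for e, r in zip(residuals, real)]   # magnitude, in %

within_2 = sum(abs(e) <= 2 for e in residuals)   # residuals within +/-2 mg/g
below_3 = sum(x < 3 for x in rel_err)            # relative error under 3 %
below_5 = sum(x < 5 for x in rel_err)            # relative error under 5 %
print(within_2, below_3, below_5)
```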

## Conclusion

(1) In this paper, a shortwave infrared imaging spectrometer, ENVI spectral acquisition software, and Unscrambler 9.7 chemometrics software were used to perform on-line nondestructive testing of ginseng quality. Imaging spectral detection was used to acquire images of the ginseng samples, spectral reflectance information was extracted from the images, and the extracted spectra were then preprocessed and analyzed quantitatively to build a prediction model.

(2) Comparing prediction models built after different preprocessing methods showed that the optimal preprocessing method for ginsenoside concentration is the combination of SNV transformation and the second-order differential (SD + SNV), which produced a prediction model with improved performance.

(3) The characteristic band selected in this experiment was 900 nm to 2500 nm, which showed significant detection characteristics, and a chemometric method based on this band was therefore used. PLS regression was used to model ginseng quality. For the total saponin concentration of ginseng, the calibration set gave a coefficient of determination *R*^{2} = 0.9715, RMSEC = 0.2017 mg/g, RMSEP = 1.9260, and bias = 0.0586 mg/g.

(4) The experiment shows that shortwave imaging spectral technology is practically feasible for real-time nondestructive detection of ginseng quality and meets the requirements of ginsenoside detection modeling. The imaging function of this technology is also a major advantage in ginseng quality detection: spectral detection data can be recorded rapidly in image form, which facilitates subsequent detection of other ginseng quality indicators during the drying process. Far-infrared drying is seen to increase the yield and quality of red ginseng.

## Conflicts of interest

The authors have no conflicting financial or other interests.