Predicting Marian Plum Fruit Quality without Environmental Condition Impact by Handheld Visible–Near-Infrared Spectroscopy

Handheld near-infrared spectroscopy was used to study the effect of integration time and wavelength selection on predicting marian plum quality, including soluble solids content (SSC), the potential of hydrogen ion (pH), and titratable acidity (TA). To represent actual conditions, on-tree fruits were scanned in the field, on the assumption that a robust model is more likely when it is developed under actual conditions. The main effect test showed that integration time did not statistically affect SSC, pH, and TA predictions (p-value > 0.05), whereas the wavelength range had a significant impact on prediction (p-value < 0.01). An integration time of 30 ms coupled with a wavelength range of 670–1000 nm was the optimal condition for SSC prediction, while an integration time of 20 ms with a wavelength range of 670–1000 nm was optimal for pH and TA prediction, because these settings gave the lowest root-mean-square error of cross-validation (RMSECV). The optimal models for SSC, pH, and TA could be improved using multiplicative scatter correction as spectral pre-processing. The improved models for SSC, pH, and TA gave coefficients of determination (r2) and root-mean-square errors of prediction (RMSEP) of 0.66 and 0.86 °Brix; 0.79 and 0.15; and 0.71 and 1.91%, respectively. The SSC, pH, and TA models could be applied for quality assurance and would benefit orchardists measuring on-tree fruit before harvesting.


INTRODUCTION
The marian plum (Bouea macrophylla) is a fruiting tree in Southeast Asia that is mostly planted in Thailand, Indonesia, and Malaysia. 1,2 In Thailand, the plantation area and yield are reported to be 2787.36 ha and 14,162 tons/year, 3 respectively. The marian plum has a sweet taste coupled with some acidic flavor 1 and has become particularly popular with consumers as a result. Therefore, soluble solids content (SSC), the potential of hydrogen ions (pH), and titratable acidity (TA) are important parameters for indexing the quality of marian plum fruit. The quality indices of the fruit should be checked rapidly to confirm the quality before harvesting. 4 Marian plum farmers use their experience to estimate fruit quality for trading. They count the age of fruit after flower blooming to determine the harvest time. Nevertheless, this method is not reliable and cannot guarantee the quality of the fruit because marian plum is a climacteric fruit, with its properties rapidly changing during the fruit ripening process and fruit development. 2 Moreover, there are many factors concerning the maturity of the fruit, such as climate variation, air temperature, soil moisture, soil fertilizer, air relative humidity, and so on. 2 Even the same age of fruit does not guarantee a similar quality. The SSC, pH, and TA can be determined by traditional methods, i.e., refractometry, pH meters, and titration acidity, respectively. However, these methods are destructive, analytically challenging, time-consuming, and costly. 5 For these reasons, low-cost, rapid, and non-destructive techniques are in constant demand for quality assurance testing. 6,7 For convenience, handheld near-infrared (NIR) spectrometers are recommended due to their small size, low cost, and easy movement during operation, especially in the field. 
8 Recently, portable NIR spectrometers have been applied for predicting the inner quality of agricultural products, including avocado, 9 Asian pears, 8 on-tree avocado fruit, 10 sugarcane stalks, 11 "Marsh" grapefruit, 12 pears, 13 cherries, 14 and "Sai Num Pung" tangerine fruit. 15 These studies show that NIR spectrometers are beneficial tools for quality assessment that can provide high-performance prediction. High accuracy can be obtained if the instrument setup is optimal, including the integration time, wavelength range, spectral pre-processing, and spectral acquisition method. However, in-field spectrometers have rather low accuracy because many factors influence their analysis; e.g., the surrounding temperature varies throughout the day. Hence, improving the performance of handheld spectrometers by reducing the impact of environmental conditions is essential. In the literature, the effect of sample temperature on model performance has been studied for, e.g., the sugar content of molasses, 16 the °Brix content of intact fruit, 17 the dry rubber content of latex, 18 and the dry matter of tomato fruit. 19 These studies concluded that temperature had a significant effect on model accuracy. Many researchers have also studied the effect of measurement location, such as in onions and garlic 20 and sugarcane stalks. 21,22 They recommended that the effect of measurement position should be investigated if the fruit was non-uniform.
Presently, no other reports applying NIR spectroscopy in the estimation of in-field marian plum quality were found during our literature review. The TSS and TA model of marian plum was developed using near-infrared transmittance, 23 and the samples were scanned in a controlled air-conditioned room (at 25°C). A calibration model for TSS and TA provided R values of 0.77 and 0.77 and RMSEP (root-mean-square error of prediction) values of 0.65°Brix and 0.03%, respectively. However, many studies have been carried out for mangoes, which are similar to the marian plum. 1 A literature review of NIR used for mango quality measurements reported that Nagle et al. 22 used portable NIR spectroscopy with a wavelength range of 700−1140 nm for determining the SSC and TA of "Chok Anan" mango, where the coefficients of determination (R 2 ) were 0.49 for SSC and 0.53 for TA. Jha et al. 24 utilized NIR spectroscopy across a wavelength range of 1200−2200 nm for estimating the SSC and pH of mango. It was found that partial-least-squares (PLS) regression gave the best results with correlation coefficients (r) for calibration and validation of 0.78 and 0.76 for SSC and 0.715 and 0.703 for pH, respectively. Marques et al. 4 used a handheld NIR spectrometer (960−1700 nm) to measure the SSC and TA of "Tommy Atkins" mango. A good prediction was found with the SSC model with an R 2 of 0.88−0.92, while the prediction of TA was not recommended because R 2 equaled 0.50. de Santos Neto et al. 25 applied a portable visible (VIS)−NIR spectrometer (699−999 nm) to evaluate the SSC of "Palmer" mango, with an R 2 of 0.87. Rungpichayapichet et al. 26 predicted the SSC and TA of mangoes (cv. Nam Dokmai subcv. Si Thong) using a shortwave (SW) NIR spectrometer (700−1000 nm), with R 2 values of 0.90 and 0.74 for SSC and TA, respectively. As in the above literature review, NIR spectroscopy can be used to reliably estimate the SSC. 
However, non-destructive tests for evaluating the TA and pH of fruit have rather low accuracy. The evaluation of their inner quality is also difficult due to non-uniformity. 27 Thus, the model development procedure cannot be fixed, and the integration time, the selection of optimal pre-processing, a suitable combination of wavelength range, and a proper measurement technique are keys to the analysis process. 28 Moreover, the factor that may affect the NIR prediction ability of fruit is integration time. It is the time during which both the light source and the voltage signal are turned on. Longer integration time means higher light intensity and shorter integration time provides low light intensity, which may affect the spectra of fruit samples if there are different path lengths in the scanned matter. The path length of fruit can change when the fruit and seed size varies; thus, integration time may be the factor that affects the accuracy of quality assessment. Comparing integration times of 200, 300, and 400 ms scanned using a visible−NIR spectrometer, Sanseechan et al. 11 found that 200 ms was optimal for the prediction of sugarcane stalk density and Phuphaphud et al. 29 reported that the fiber content model that provided the best accuracy was developed using an integration time of 300 ms.
Therefore, applying handheld visible (VIS)−NIR spectrometers for estimating the SSC, TA, and pH of fruits with temperature variation was investigated. The samples were scanned in a real-world situation (in-field scanning) to make a model without the influence of sample temperature, which may maintain model accuracy where sample temperature is changed. The sub-objectives were to (a) study the effect of integration times and wavelength range on the model accuracy, (b) find out the optimal integration times and wavelength range selection throughout the sample temperature variation, and (c) develop a model using spectral pre-processing. Finally, the utilization of optimal different integration times, wavelength ranges, and various spectral pre-processing methods of SSC, pH, and TA was extracted and confirmed based on sample temperatures of real situations.

RESULTS AND DISCUSSION
2.1. Spectral Data. The raw spectra and first derivative in a range of 570−1031 nm compared between integration times of 10, 20, and 30 ms are illustrated in Figure 1a,b, respectively, with the prominent peaks having a high variance of absorbance at 670 and 970 nm. Figure 1c shows the second derivative spectra compared between integration times of 10, 20, and 30 ms; the obvious peaks were at 581, 670, 740, 840, and 970 nm.
The absorption values of raw spectra for the integration times of 10, 20, and 30 ms were slightly different; this might be due to the fact that the peel and pulp of marian plum were soft, containing a high moisture content, and the intensity of light was highly absorbed by the object when using integration times of 10, 20, and 30 ms. Comparing the results of sugarcane stalk spectra scanning using integration times of 200, 300, and 400 ms, 11,29 the spectra from absorption were highly different, considering that the cane stalk peel was hard.
The raw spectra (interactance mode) trend of our study was similar to the absorption spectra measured by the transmittance mode of marian plum studied by Phonmakham et al. 23 The average raw spectral curve of marian plum fruit was different from those of mango 22,25 and apple, 30 even though marian plum had similar features to mango and apple but different colors and tissue constituents, which caused different absorption characteristics. 31 The obvious peaks were found at wavelengths of 581 nm, 670 nm (absorption of chlorophyll 32,33 ), 740 nm (C−H stretching, the fourth overtone of CH 3 ), 840 nm (3 × C−H str. + 2 × C−C stretching of benzene), and 970 nm (O−H stretching, the second overtone of H 2 O). 34 These prominent peaks had the greatest change when the composition of the samples differed, giving a high variance in the absorbance value. The peak information was beneficial for modeling. Hence, there is convincing evidence that the spectral information could determine the constituents of the fruit.
2.2. Reference Data. Table 1 lists the sample temperature (°C) and the statistical data of the marian plum fruit used for model development, including the maximum, minimum, average, range, and standard deviation of the SSC, pH, and TA. The sample temperature ranged from 29.4 to 36°C, with samples scanned from 8:30 AM to 4:00 PM. This temperature reflects the real situation in the field and shows that the temperature of samples from the same field can vary. The marian plum samples had the widest range for TA (16.45%), a medium range for SSC (7.9°Brix), and the narrowest range for pH (1.67). The TSS reported by Phonmakham et al. 23 had a minimum and maximum of 17.60 and 25.40°Brix, respectively.
Thus, a sweet and acidic (sour) taste was the major indicator of marian plum, 1 depending on growth location, chemical fertilizers, storage, and so on. 35 The relationship among the SSC, pH, and TA is displayed in
To define the optimal integration time and wavelength range, the effects of these factors were tested. Table 3 presents the results of the PLS models developed from raw spectra and validated by full cross-validation, showing the wavelength range, integration time, number of PLS factors, R 2 , r 2 , RMSEC (root-mean-square error of calibration), RMSECV (root-mean-square error of cross-validation), and bias. The main effects of integration time and wavelength range tested by one-way analysis of variance (ANOVA) are shown in Table 4. The results show that integration time did not statistically affect the accuracy (mean RMSECV) of SSC, pH, and TA prediction (p-value > 0.05). The wavelength range likewise did not affect the accuracy of SSC prediction (p-value > 0.05), but it strongly affected the accuracy of the pH and TA models (p-value < 0.01).
The wavelength range had more influence than integration times. The correct wavelength can directly correspond to the vibrational band of chemical content in analytes, and that is the spectral information used for model development. Even though differences may visually seem small among integration times of 10, 20, and 30 ms (see Figure 1), the difference of performance was usually detectable by multivariate analysis techniques.
Even though integration time did not affect the accuracy of SSC, pH, and TA prediction, different RMSECV trends were found with different integration times and wavelength ranges (see Figure 2). The integration time coupled with the wavelength range providing the lowest RMSECV should be selected as the scanning setting for model creation. The comparison of average RMSECV tested by the main effect of integration time and wavelength range is shown in Figure 2. For SSC prediction, the lowest RMSECV was obtained with a wavelength range of 670−1000 nm and an integration time of 30 ms (see Figure 2a,b). For the pH and TA models, the lowest RMSECV was obtained with a wavelength range of 570−1031 nm compared to the three remaining wavelength ranges.
A higher integration time can give a higher accuracy for SSC prediction when comparing the 10, 20, and 30 ms integration times. For pH and TA prediction, the model developed using an integration time of 20 ms had the same accuracy as that of 30 ms; hence, the optimal integration time was 20 ms because the time was shorter. The main components in fruit are water and starch, and the starch can be converted to glucose, fructose, and sucrose, which also have a hydrocarbon structure that strongly interacts with NIR radiation. A higher integration time can provide a sufficient balance between light intensity and these chemical components. This leads to integration times of 20 and 30 ms being more accurate for SSC, pH, and TA than 10 ms.
These results are close to those of Luo et al., 36 who found that wavelengths between 861 and 1074 nm provided the most robust model for predicting the SSC of apples. The optimal model for the prediction of mango was built utilizing a wavelength range of 700−1140 nm. 22 A wavelength range of
a TA = titratable acidity, N = number of samples, F = number of PLS factors, R 2 = coefficient of determination of the calibration set, RMSEC = root-mean-square error of calibration, r 2 = coefficient of determination of the validation set, RMSEP = root-mean-square error of prediction, RPD = ratio of prediction to standard deviation, and RSEP = relative standard error of prediction.
38 where it was found that wavebands at 461, 469, 947, and 1049 nm were the effective wavelengths for predicting sugar contents of "Fuji" apples; these wavelengths were related to the visible region, indicating the color of the sample. 39 The range between 500 and 750 nm corresponded to green (490−570 nm), yellow (570−585 nm), orange (585−620 nm), and red (620−740 nm). 25 In our results, the visible and NIR band (670−1000 nm) was selected for the model development for the SSC parameter, which means that the color of its peel might be related to the SSC of the fruit, while the vibrational band of water was strongly utilized, meaning that the vibrational band of water was confidently related to the SSC prediction. 22,40,41
Figure 3a demonstrates the histogram plot of marian plum temperature under real field conditions. Figure 3b−d shows the comparison of the SSC, pH, and TA of marian plum fruit predicted using a portable VIS−NIR spectrometer and measured by the reference laboratory method for the validation set. The PLS models were created from raw spectra using the optimal wavelength range and integration time and were validated using full cross-validation.
The scatter plot is marked by sample temperature range to illustrate the impact of the sample temperature. The samples were divided into two groups: 29.4−32.7°C was the low sample temperature group (blue marks) and 32.7−36°C was the high sample temperature group (red marks). The sample temperature under actual measurement conditions ranged from 29.4 to 36°C (see Table 1 and Figure 3a). The scatter plot shows that the two sample temperature groups gave similar accuracy, and their biases differed only slightly (see Figure 3b−d). To develop a robust model, the NIR model could use spectra obtained at various sample temperatures, as in a real situation, to reduce the impact of the surrounding environment. Because the bias difference between the two sample temperature groups was exceedingly small, the model does not need a bias value as an adjustable term in prediction. In addition, the points of all samples were dispersed throughout the plot, meaning that model development under actual conditions reduced the effect of sample temperature and increased confidence in the predicted value.
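The per-group bias comparison described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the sign convention (predicted minus measured), the function names, and the sample values are assumptions made for the example; only the 32.7°C split point comes from the text.

```python
def bias(predicted, measured):
    """Mean of (predicted - measured); the sign convention is an assumption."""
    return sum(p - m for p, m in zip(predicted, measured)) / len(measured)


def bias_by_temperature(temps_c, predicted, measured, split_c=32.7):
    """Return the bias of the low (<= split_c) and high (> split_c) temperature groups."""
    groups = {"low": ([], []), "high": ([], [])}
    for t, p, m in zip(temps_c, predicted, measured):
        key = "low" if t <= split_c else "high"
        groups[key][0].append(p)
        groups[key][1].append(m)
    # Skip a group that received no samples to avoid dividing by zero.
    return {k: bias(p, m) for k, (p, m) in groups.items() if m}


# Made-up SSC values (degrees Brix), for illustration only.
temps = [29.4, 31.0, 32.7, 33.5, 36.0]
pred = [16.0, 17.2, 15.1, 18.0, 16.4]
meas = [16.2, 17.0, 15.3, 18.3, 16.2]
group_bias = bias_by_temperature(temps, pred, meas)
```

If the two group biases are close to each other and to zero, no temperature-dependent bias correction is needed, which is the argument made in the paragraph above.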
However, the integration time and wavelength range providing the lowest RMSECV were selected as the model development conditions. An integration time of 30 ms was suitable for SSC equation creation, and 20 ms was optimal for pH and TA. The wavelength range between 670 and 1000 nm was suitable for SSC prediction, while 570−1031 nm should be used for pH and TA prediction. As the next step, the SSC, pH, and TA models were improved using spectral pre-processing techniques.
ACS Omega http://pubs.acs.org/journal/acsodf Article
2.4. Prediction for SSC, pH, and TA. Table 5 presents the results of the improved PLS models with spectral pre-processing for SSC, pH, and TA, including the spectral pre-processing method, number of PLS factors, coefficient of determination of the calibration set (R 2 ), coefficient of determination of the validation set (r 2 ), RMSEC, RMSEP, and bias. There was no outlier in the spectra scanned at 20 ms. However, three spectral outliers were found in the spectra scanned at 30 ms, and they were removed. MSC (multiplicative scatter correction) pre-processing gave the best model for SSC prediction, with the number of PLS factors, R 2 , RMSEC, r 2 , RMSEP, and bias being 7, 0.68, 0.75, 0.66, 0.86, and −0.12, respectively. For the pH model, the best model was also developed using MSC pre-processing, with the number of PLS factors, R 2 , RMSEC, r 2 , RMSEP, and bias being 5, 0.58, 0.20, 0.79, 0.15, and −0.0087, respectively. MSC pre-processing also provided the best model for the prediction of TA, with the number of PLS factors, R 2 , RMSEC, r 2 , RMSEP, and bias being 6, 0.68, 1.58%, 0.71, 1.91%, and −0.31%, respectively. For comparison, in the marian plum study by Phonmakham et al., 23 a calibration model for TSS using raw spectra obtained the best results for calibration and prediction (R = 0.90, RMSEC = 0.57°Bx and R = 0.88, RMSEP = 0.65°Bx, respectively).
For the TA model, the raw spectra obtained the best results for calibration and prediction (R = 0.98, RMSEC = 0.01% and R = 0.88, RMSEP = 0.03%, respectively). The ability of the validation model was indicated by r 2 . An r 2 between 0.66 and 0.81 indicated that the model could be utilized for screening, and an r 2 between 0.50 and 0.64 indicated that it could be used for rough screening. 12,27,42,43 On this basis, the SSC, pH, and TA models could be used for screening as quality assurance. In addition, the absolute ratios between the bias and the average measured value of the SSC, pH, and TA were 0.75% (|−0.1232/16.39|), 0.22% (|−0.00870/3.87|), and 4.24% (|−0.3104/7.32|), respectively. Therefore, the error over all samples was acceptable, and the models could be utilized for predicting future samples. Comparing the reliability of SSC models developed with different wavelength ranges (between 1000 and 2500 nm), the prediction accuracy obtained in this study was better than that reported by Jha et al., 24 who used portable shortwave infrared in a wavelength range of 1200−2200 nm with the diffuse reflectance mode to predict the SSC of mango, giving an r 2 of 0.58 and a standard error of prediction (SEP) of 3.228°Brix for the validation set. The accuracy of the present study was also better than that of Nascimento et al., 44 who developed PLS models by FT-NIR spectroscopy for the estimation of SSC in intact low-chilling "Aurora-1" peach fruit. Their model was developed using a wavelength range between 1000 and 2500 nm, and the best SSC model was obtained with the NIR spectra pre-processed with MSC (SEP = 1.02%, R 2 P = 0.45, and RPD (ratio of prediction to standard deviation) = 1.36). On the other hand, better reliability of the SSC model was reported by Marques et al., 4 who evaluated the potential of a handheld near-infrared (NIR) spectrometer with a wavelength range between 950 and 1650 nm for predicting the SSC of the "Tommy Atkins" mango. The r 2 and RMSEP were 0.88 and 0.66°Brix, respectively.
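The bias-to-mean ratios quoted in this section can be reproduced with a few lines of arithmetic. The sketch below uses only the bias and average measured values stated in the text; the function and variable names are illustrative.

```python
def bias_ratio_percent(bias, mean_measured):
    """Absolute ratio of bias to the average measured value, in percent."""
    return abs(bias / mean_measured) * 100.0


# (bias, average measured value) pairs as quoted in the text.
reported = {
    "SSC": (-0.1232, 16.39),
    "pH": (-0.00870, 3.87),
    "TA": (-0.3104, 7.32),
}

ratios = {name: round(bias_ratio_percent(b, m), 2) for name, (b, m) in reported.items()}
# ratios -> {"SSC": 0.75, "pH": 0.22, "TA": 4.24}
```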
Compared with the reports available for mangoes, which are of the same family as the marian plum, 1 our model gave better accuracy than that of Nagle et al., 22 who predicted the SSC and TA of mango (r 2 = 0.49 for SSC and r 2 = 0.53 for TA). The SSC and TA models of "Tommy Atkins" mangoes using a portable NIR spectrometer reported by Marques et al. 4 provided an R 2 for SSC from 0.88 to 0.92, which agrees with our outcome, while the TA model provided an R 2 of 0.5, which was less than our result (R 2 equals 0.68). The results were different due to different crops (mango versus marian plum) and different wavelength ranges. Comparing the SSC model with previous studies, an RMSEP of 0.58°Brix and R 2 of 0.76 were less than an RMSECV of 1.39% and R 2 of 0.87 reported by de Santos Neto et al., 25 who studied the SSC of "Palmer" mangoes via a portable VIS−NIR spectrometer with a wavelength range of 306−1140 nm. The R 2 of the SSC in the "Palmer" mango model was higher than that of the marian plum because the range of the SSC in mangoes (21.0−3.8) was wider than that of the marian plum (20.3−12.4). Our results gave better accuracy than for the mangoes studied by Huang et al., 45 which had R 2 values of 0.64 and 0.67 for the SSC and TA, respectively. The outcome was different because Huang et al. 45 used longwave NIR spectroscopy (950−1650 nm). The SSC and pH prediction model of apple reported by Abasi et al. 30 provided R 2 values for the SSC and pH of 0.86 and 0.84, respectively. They concluded that the accuracy of the NIR model depended on the noise removal performance. The differences among the reviewed PLS results arise because the result depends on the kind of fruit, the NIR region, the spectral pre-processing, and the model development procedure.

As a model improvement, the pre-treatment method improved model performance, raising r 2 from 0.60 to 0.66 for SSC, from 0.60 to 0.79 for pH, and from 0.59 to 0.71 for TA. The MSC spectra gave the best result for SSC, pH, and TA, providing the highest r 2 and lowest RMSEP. The MSC pre-treatment was preferred because it could compensate for path length variation in marian plum fruit, which arises naturally from different fruit and seed sizes.
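To illustrate why MSC helps with path length variation, the following is a minimal pure-Python sketch of the usual MSC formulation: each spectrum is regressed on a reference spectrum (here the mean spectrum, an assumption) and the fitted offset and slope are removed. It is not the implementation in the software used in the study, and the example spectra are made up.

```python
def mean_spectrum(spectra):
    """Column-wise mean of a list of equal-length spectra."""
    n = len(spectra)
    return [sum(col) / n for col in zip(*spectra)]


def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Fits x = intercept + slope * reference by least squares for each
    spectrum x, then returns (x - intercept) / slope.
    A spectrum with zero fitted slope would raise ZeroDivisionError.
    """
    ref = reference if reference is not None else mean_spectrum(spectra)
    ref_mean = sum(ref) / len(ref)
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for x in spectra:
        x_mean = sum(x) / len(x)
        slope = sum((r - ref_mean) * (v - x_mean) for r, v in zip(ref, x)) / ref_ss
        intercept = x_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in x])
    return corrected


# Made-up spectra: the same underlying shape with different offsets and
# scaling, mimicking additive and multiplicative scatter effects.
base = [0.20, 0.50, 0.90, 0.40]
spectra = [base, [0.10 + 1.5 * v for v in base], [-0.05 + 0.8 * v for v in base]]
corrected = msc(spectra)
```

In this toy case the three spectra differ only by offset and scale, so after correction they collapse onto the mean spectrum; real spectra also carry chemical differences, which MSC leaves in place.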
Figure 4a−f shows the scatter plots of the improved SSC, pH, and TA models, comparing the values predicted by NIR spectroscopy with those measured by the reference laboratory method for the calibration and validation sets, respectively. Figure 5 exhibits the regression coefficient value multiplied with the averaged MSC spectra plot and X-loading plot of the SSC (Figure 5a,b), pH (Figure 5b,c), and TA models (Figure 5d,e) obtained from PLS regression. The main goal of the calibration model developed by PLS regression was to find the optimal regression coefficients to give the highest accuracy. A high absolute value of the product of the regression coefficient and the absorbance at a given wavelength indicated that the vibration of the chemical bond related to that waveband had a highly significant influence on the calibration model.
The prominent peaks appearing in the regression coefficient value multiplied with the pre-treated spectra of the SSC model created with the MSC were around 873, 920, 947, and 970 nm (see Figure 5a). In comparison to other works, Nagle et al. 22 reported that the sensitive spectral region for the SSC prediction of mango was ∼910 nm in the regression coefficient. This outcome agreed with the results of Kawano et al., 40 who found that the highest correlation in the linear regression, including at 870, 878, 889, and 906 nm, was important in the estimation of sugar content in intact peaches. Meanwhile, several significant wavebands appeared in the regression coefficient value multiplied with the pre-treated spectra plot of the pH, including around 600, 618, 730, 816, 873, 940, and 960 (see Figure 5c). An important wavelength for predicting TA is shown in Figure 5d, with wavebands around 601, 618, 730, 814, 880, 932, and 960 nm. From the literature review, the sensitive spectral region that was between 920 and 925 nm was used for predicting the TA of mango. 22 The absorption bands with high regression coefficient value multiplied with the MSC spectra of the model for SSC, pH and TA prediction are shown in Table 6. The vibrational band of the C−H stretching of sucrose and O−H stretching of water affected the prediction of SSC, while the waveband of the visible (green and red), C−H stretching of carbohydrates and hydrocarbon and O−H stretching of water affected the prediction of pH and TA.
In the case of an industrial setting, the low cost of the spectrometer was considered. After model development, the optimal wavelength range, integration times, and spectral pretreatment method were obtained, i.e., 670−1000 nm, 20 ms,

CONCLUSIONS
This study has illustrated that, in a one-way ANOVA test, integration time did not influence the in-field measurement capability, while the wavelength range was significant. However, considering the integration time and wavelength range combination that provided the lowest RMSECV, it is recommended that fruit be scanned using an integration time of 30 ms for the SSC model and 20 ms for the pH and TA models. Wavelengths of 670−1000 nm are suggested for SSC, and wavelengths of 570−1031 nm are suggested for pH and TA prediction. In addition, the 10 ms integration time is not recommended for SSC, pH, or TA. Under these optimal conditions, the prediction models for SSC, pH, and TA can be improved using MSC pre-processed spectra. The experiment was designed to reduce the effect of sample temperature variation, since the surrounding temperature varies greatly throughout the day; the model was therefore created by combining fruit samples scanned throughout the day. As a result, the bias of the validation samples was exceedingly small, as was the difference in bias between the low and high sample temperature groups. For an industrial setting, it is recommended that the spectral resolution could be reduced from 1 to 40 nm per point for SSC prediction and from 1 to 50 nm per point for the pH and TA models.

MATERIALS AND METHODS
4.1. Sample Collection. This experiment was conducted during the 2018/2019 season. A total of 120 marian plum fruits of the "Tool Kraw" variety were collected from a commercial orchard located in Nakhon Nayok Province (14°09′33.1″ N, 101°10′30.1″ E) in eastern Thailand. To obtain a wide variety of chemical properties, the fruit samples were collected randomly at different maturity stages (between 60 and 75 days after flower blooming, which is the commercial harvesting period). Figure 7a shows the individual marian plums.
4.2. Spectral Acquisition. The portable VIS−SWNIR spectrometer was used, which was a custom-designed spectrometer. The specifications were established as a power supply with an internal battery or AC100−220 V via an AC adapter for about 1500 measurements/charge. The size was 350 (W) × 150 (D) × 250 (H) mm. The environment had a temperature of 10−40°C and humidity of 20−80%. The external communication was by exchanging data with a PC/ tablet (Windows 7, 8, 8.1, and 10) through a USB port. The measurement time was 10−500 ms. The light source was a high-density LED lamp with wavelengths of 570−1031 nm, and the measurement mode was interactance, with a diode array (P-TF1, HNK Engineering Co., Ltd., Hokkaido, Japan).
The samples were scanned in interactance mode over a wavelength range of 570−1031 nm with integration times of 10, 20, and 30 ms. The spectral data were recorded at a data interval of 1 point/nm. Every fruit was scanned for 10, 20, and 30 ms in the same position (middle of fruit). To make the temperature range for measurements realistic for actual conditions, the fruits were scanned under in-field conditions to obtain the temperature variation. Before scanning each sample, the sample temperature was recorded using an infrared digital thermometer (GM 320, Benetech, China), and then white and dark references were taken. A Teflon material was scanned as the white reference. The dark reference was recorded when the light source was turned off. The absorbance was calculated in log(1/R) units as
$$A = \log_{10}\left(\frac{1}{R}\right), \qquad R = \frac{R_{\mathrm{sample}} - R_{\mathrm{dark}}}{R_{\mathrm{white}} - R_{\mathrm{dark}}}$$
where $R_{\mathrm{sample}}$ is the reflection of the fruit sample, $R_{\mathrm{white}}$ is the reflection of the white reference (Teflon), and $R_{\mathrm{dark}}$ is the reflection of the dark reference recorded when the light source was turned off. Figure 7b illustrates the fruit spectral acquisition, measurement position, and schematic of the portable VIS−NIR spectrometer, including device features and a close-up view of the measurement probe. The fruit samples were scanned at the equatorial part at four sites perpendicular to each other, with the first position randomly selected. Each position was scanned once. For the whole fruit, the four spectra were averaged into one spectrum and stored for modeling.
4.3. Reference Analysis. After NIR scanning, the fruit at the NIR scanned part was cut around the circumference, with a width of ∼20 mm and a depth of ∼10 mm, and immediately squeezed for its juice (see Figure 7b). The juice was divided into three parts for determining the SSC, pH, and TA. The SSC values were determined by dropping the juice into the test hole of a portable refractometer (Nar−3Ta, ATAGO, Tokyo, Japan). The pH was determined using a pH meter (HI98107, Hanna, Romania).
The TA was determined by volumetric titration of 5 mL of marian plum juice, diluted in 100 mL of distilled water and titrated up to pH 8.1 with 0.1 mol/L NaOH expressed in g of 100 mL/L citric acid. The SSC and pH were analyzed in triplicate, while TA was determined in duplicate. The repeatability (Rep) and maximum coefficient of determination (R 2 max ), as well as the outliers of the measured reference values, were calculated from d i , d̅ , and SD y , which are the differences between triplicates (maximum − minimum) or duplicates, the average difference among triplicates or between duplicates, and the standard deviation of the measured value, respectively. Rep and R 2 max were used to pre-check the feasibility of NIR spectroscopy before the model development. Outliers were identified with an equation (4) in which y i is the reference value and y̅ is the average value. If the equation was satisfied, the sample was removed from the data set.
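The equations for Rep, R 2 max, and the outlier test did not survive in this text, so the sketch below uses commonly cited forms as explicit assumptions: Rep as the sample standard deviation of the replicate differences d i, R 2 max = (SD y ² − Rep²)/SD y ², and an outlier flagged when |y i − y̅|/SD y ≥ 3. The threshold of 3, the function names, and the example values are all illustrative and may differ from what the authors used.

```python
import statistics


def repeatability(replicates):
    """Rep, ASSUMED to be the standard deviation of d_i = max - min per sample."""
    diffs = [max(r) - min(r) for r in replicates]
    return statistics.stdev(diffs)


def r2_max(sample_means, rep):
    """ASSUMED form: (SD_y^2 - Rep^2) / SD_y^2."""
    sd_y = statistics.stdev(sample_means)
    return (sd_y ** 2 - rep ** 2) / sd_y ** 2


def outlier_indices(values, threshold=3.0):
    """Indices where |y_i - mean| / SD_y >= threshold (threshold is an assumption)."""
    mean_y = statistics.mean(values)
    sd_y = statistics.stdev(values)
    return [i for i, y in enumerate(values) if abs(y - mean_y) / sd_y >= threshold]


# Made-up triplicate SSC readings (degrees Brix) for three fruits.
triplicates = [(16.0, 16.2, 16.1), (18.0, 18.1, 18.4), (14.5, 14.5, 14.6)]
means = [statistics.mean(t) for t in triplicates]
rep = repeatability(triplicates)
r2max = r2_max(means, rep)

# Made-up reference values with one extreme entry at the end.
reference = [16.0, 16.5] * 10 + [30.0]
flagged = outlier_indices(reference)
```

A small Rep relative to SD y gives an R 2 max close to 1, meaning the reference method is precise enough that it does not cap the achievable model performance.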

4.4. Effect of Integration Time and Wavelength Range. The main effects of integration time (10, 20, and 30 ms) and wavelength range (570−1031, 670−1000, 700−1000, and 800−1000 nm) were examined on spectra recorded at a data interval of 1.0 point/nm. This gave 12 treatments for model creation (three integration time levels × four wavelength ranges). The SSC, pH, and TA models were established using partial-least-squares (PLS) regression on raw spectra and validated using leave-one-out full cross-validation. After modeling, the accuracy of each treatment was expressed as the root-mean-square error of cross-validation (RMSECV). The main effects of integration time and wavelength range were analyzed using analysis of variance (ANOVA) at significance levels of 0.05 and 0.01. The optimal combination of integration time and wavelength range was selected on the basis of RMSECV, considering both factors together.
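The treatment grid and the cross-validation error described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in the study: the function names are hypothetical, the spectrum is assumed to be a list starting at 570 nm with 1 point/nm, and the PLS model is left as a pair of caller-supplied `fit` and `predict` callables rather than implemented here.

```python
from itertools import product

INTEGRATION_TIMES_MS = (10, 20, 30)
WAVELENGTH_RANGES_NM = ((570, 1031), (670, 1000), (700, 1000), (800, 1000))
FIRST_WAVELENGTH_NM = 570  # assumed first recorded point, at 1 point/nm


def treatments():
    """All integration-time x wavelength-range combinations (3 x 4 = 12)."""
    return list(product(INTEGRATION_TIMES_MS, WAVELENGTH_RANGES_NM))


def crop(spectrum, start_nm, stop_nm):
    """Slice a full 570-1031 nm spectrum to [start_nm, stop_nm], inclusive."""
    lo = start_nm - FIRST_WAVELENGTH_NM
    hi = stop_nm - FIRST_WAVELENGTH_NM + 1
    return spectrum[lo:hi]


def rmsecv(X, y, fit, predict):
    """Leave-one-out RMSECV for any model given as fit/predict callables.

    `fit(X_train, y_train)` returns a model object and
    `predict(model, x)` returns one predicted value; both are placeholders
    for a PLS implementation supplied by the caller.
    """
    squared_errors = []
    for i in range(len(y)):
        # Hold out sample i and train on all the others.
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        squared_errors.append((y[i] - predict(model, X[i])) ** 2)
    return (sum(squared_errors) / len(y)) ** 0.5
```

In use, each of the 12 treatments would crop the spectra recorded at its integration time to its wavelength range, and `rmsecv` would be evaluated once per analyte to fill the table that the ANOVA is run on.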
4.5. NIR Modeling. After the effective scanning conditions were identified, one integration time level and one wavelength range level were obtained. The best condition for each analyte was the one whose model provided the lowest RMSECV. With the best integration time and wavelength range fixed, the SSC, pH, and TA models were optimized again using pre-processed spectra with PLS regression and validated using the test set method. This step was used to select an effective spectral pre-treatment method. The spectral pre-processing was performed using multivariate software (Unscrambler X 10.3, Camo, Norway) to reduce the signal noise. This is an important step because removing noise from the spectra used in the NIR model leads to more accurate prediction. 30 The spectra were pre-processed using the following methods: (a) the first derivative (Savitzky−Golay, segments = 9, orders = 2), (b) the second derivative (Savitzky−Golay, segments = 9, orders = 2), (c) the standard normal variate (SNV), and (d) the multiplicative scatter correction (MSC). The total data set was divided into a calibration set and a validation set (75% for the calibration set and 25% for the validation set). The reference values were arranged in ascending order. One in every four samples was selected for the validation set, and the remaining three were used for the calibration set, with the maximum and minimum values assigned to the calibration set. Several spectral pre-processing methods were needed to address the scattering problem. For example, SNV and MSC reduce the multiplicative scattering effects of sample surfaces that result from differences in particle size, shiny surfaces, and path length. 46 The first and second derivatives remove the baseline offset and separate overlapping peaks. 
46 After the modeling, the accuracy of each condition was compared using the coefficient of determination (R²: calibration set; r²: validation set), the root-mean-square error of calibration (RMSEC), the root-mean-square error of prediction (RMSEP), and the bias. The effective model was selected based on the highest r² and the lowest RMSEP. These parameters were calculated from y_i, y_pre, ȳ, n, and SD_y, which are the measured value of sample i, the corresponding predicted value, the average measured value, the number of samples, and the standard deviation of the measured values, respectively. The bias was the mean of the difference between the measured and predicted values in the validation set. The r² value was used to judge the level of application of NIR spectroscopy. An r² of 0.00−0.49 indicates a poor correlation, and the model is not recommended; 0.50−0.64 is suitable for rough screening; 0.66−0.81 for screening; 0.83−0.90 can be used with caution for most applications; 0.92−0.96 for quality assurance; and >0.98 for any application. 12,27,42,43 After the effective model was developed, including the wavelength range, integration time, and spectral pre-treatment method, these settings were used to test how far the spectral resolution of the spectrometer could be reduced while still establishing a valid model for fruit quality prediction in industrial settings. The resolution of the VIS−NIR spectra (wavelength interval) was set to 2, 3, 4, 5, 10, 20, 30, 40, and 50 nm per point. The models were developed again using the spectra at each interval. Their performances were then compared with that of the original setting (1 nm per point). The largest wavelength interval that still provided high accuracy was selected as the industrial setting for low-cost spectrometers.
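The sample split, the validation statistics, and the resolution reduction described in this section can be sketched as below. Several details are assumptions made for illustration rather than the procedure confirmed by this text: the function names are hypothetical, the third sample of each group of four is the one sent to validation, r² is computed as 1 − SS_res/SS_tot, and the coarser wavelength intervals are obtained by keeping every k-th point instead of averaging neighbouring points.

```python
import math


def split_by_sorted_reference(values, every=4, pick=2):
    """Return (calibration_idx, validation_idx) as lists of original indices.

    Samples are ranked by reference value in ascending order and one sample
    in each consecutive group of `every` goes to validation. The rank chosen
    inside each group (`pick`) is an assumption. The smallest and largest
    samples always stay in the calibration set.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    last = len(order) - 1
    cal, val = [], []
    for rank, idx in enumerate(order):
        is_extreme = rank == 0 or rank == last
        if rank % every == pick and not is_extreme:
            val.append(idx)
        else:
            cal.append(idx)
    return cal, val


def validation_stats(measured, predicted):
    """Return (r2, rmsep, bias) for paired measured and predicted values."""
    n = len(measured)
    mean_y = sum(measured) / n
    residuals = [y - p for y, p in zip(measured, predicted)]
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    r2 = 1.0 - ss_res / ss_tot
    rmsep = math.sqrt(ss_res / n)
    bias = sum(residuals) / n  # mean of (measured - predicted)
    return r2, rmsep, bias


def thin_spectrum(absorbance, step):
    """Keep every `step`-th point of a spectrum recorded at 1 nm per point."""
    return absorbance[::step]
```

Applying `thin_spectrum` with steps of 2 to 50 before refitting the model, and comparing the resulting r² and RMSEP with those at 1 nm per point, mirrors the resolution test described above.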