Artificial Intelligence-Based, Wavelet-Aided Prediction of Long-Term Outdoor Performance of Perovskite Solar Cells

The commercial development of perovskite solar cells (PSCs) has been significantly delayed by the constraint of performing time-consuming degradation studies under real outdoor conditions. These are necessary steps to determine the device lifetime, an area where PSCs traditionally suffer. In this work, we demonstrate that the outdoor degradation behavior of PSCs can be predicted by employing accelerated indoor stability analyses. The prediction was possible using a swift and accurate pipeline of machine learning algorithms and mathematical decompositions. By training the algorithms with different indoor stability data sets, we can determine the most relevant stress factors, thereby shedding light on the outdoor degradation pathways. Our methodology is not specific to PSCs and can be extended to other PV technologies where degradation and its mechanisms are crucial elements of their widespread adoption.

Single-junction perovskite-based solar cells (PSCs) have demonstrated certified power conversion efficiencies (PCEs) above 26% [1]. With PCEs on par with those of well-established commercial PV technologies, the research focus is now on improving PSCs' operational stability; for comparison, the operational lifetime of commercial silicon PV is approximately 20 years. Although a consensus on accelerated stability tests including various stress factors has been published [2], a strategy that permits the prediction of the PSCs' outdoor performance and lifetime from accelerated indoor aging tests is currently missing. Operational conditions outdoors include diurnal light/dark cycling and varying temperature, illumination intensity, and spectrum during sunlight hours. While the dependence of PSC performance on each of these factors has been determined, performance is highly sensitive to combinations of them [3], making predictions of outdoor daily energy yield rather complex [4−7]. Several attempts to predict PSC lifetimes based on indoor tests have been published, such as testing under constant illumination [8], a combination of continuous 1 sun illumination (ISOS-L1 protocol) tests and dark storage (ISOS-D protocol) tests [9], or damp heat testing at 85 °C and 85% relative humidity [10]. These works predicted lifetimes of 5−20 years for PSCs of various architectures, demonstrating the need for accelerated testing. A recent publication described a good correlation between modeling based on indoor tests of the light intensity- and temperature-dependent performance dynamics of PSCs and their long-term, nonreversible outdoor degradation [11]. Several studies simulated outdoor conditions in lab tests [12,13]. However, direct predictions of the detailed time- and climate-dependent outdoor PSC performance, including both daily power output and nonreversible degradation, based on accelerated indoor tests, are currently lacking.
Machine learning (ML) is a specialized area within artificial intelligence that focuses on the development of algorithms that acquire knowledge and make predictions or decisions by discerning patterns and insights within the data used for their training [−23]. Few published works have utilized ML tools to study factors affecting PSC stability [24,25]. Herein, we apply multiple ML algorithms to correlate indoor and outdoor performance testing of PSCs, successfully predicting outdoor time- and weather-dependent PCE patterns based on indoor constant illumination (ISOS-L) tests. The crucial aspect of the pipeline lies in training the algorithm with different combinations of indoor tests. Thus, we provide a robust way of determining the relevant outdoor degradation factors from specialized indoor accelerated tests.
The experimental data were generated from devices that were fabricated similarly and aged indoors and outdoors in two different laboratories: BGU and ICN2. PSCs were fabricated in the n-i-p configuration of FTO/c-TiO2/m-TiO2/CsMAFAPb(IBr)3/Spiro-OMeTAD/Au, and the solar cells were subjected to indoor (constant illumination, in air or N2) and outdoor (encapsulated) photostability tests at maximum power point (MPP) conditions using an MPP tracker. The details of device fabrication and testing are provided in the Supporting Information. Photographs of the devices are shown in Figure S1. Herein we present the results of the best performing machine learning algorithms; the results of additional algorithms can be found in the Supporting Information.
The core of our method is the prediction of the outdoor behavior given the outdoor environmental conditions and a set of indoor accelerated degradation tests performed at different light intensities and in different environments (Figure 1a). Each indoor test carries a degree of relevance to the outdoor behavior, as well as a degree of overlap with other indoor tests. Neither is always trivial to determine, which hinders the analysis of real-world degradation through accelerated tests. Our pipeline attempts to account for both effects. Initially, the prediction algorithm was trained using a single type of indoor test as input. This was repeated for all indoor tests. The test that produced the lowest error, i.e., the smallest difference between the predicted and actual outdoor performance tracks, bears the most relevance to the outdoor degradation mechanisms. As a second step, the algorithm was trained using combinations of indoor tests as input. Training a machine learning algorithm with strongly correlated data leads to inefficiency or, at worst, a decrease in prediction accuracy. However, viable combinations of uncorrelated inputs can significantly enhance the quality of the predictions. To this end, we have implemented a Fréchet distance metric [26] that determines whether two curves are correlated or not. This is achieved by taking into account both the positions and the ordering of the curve points, which accounts for stretching effects. The prediction errors and their relation to the errors of the previous step allowed us to determine dependencies between indoor tests. The entire pipeline is summarized in Figure 1a. This pipeline has two major outcomes. The first is a robust prediction of outdoor behavior based on indoor tests, combined with the actual environmental conditions of the area where the panels were deployed. The second is the determination of the most relevant indoor tests for outdoor behavior prediction and their interdependence.
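The curve comparison described above can be illustrated with the discrete Fréchet distance. The following is a minimal sketch and not the authors' implementation: the function names, the dynamic-programming formulation, and in particular the mapping from distance to a similarity score in (0, 1] are assumptions made for illustration, since the normalization used in this work is not specified here.

```python
import numpy as np


def discrete_frechet(p, q):
    """Discrete Frechet distance between two polylines of shape (n, d) and (m, d)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    n, m = len(p), len(q)
    # Pairwise Euclidean distances between every point of p and every point of q.
    dist = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=2)
    # coupling[i, j] = smallest achievable "leash length" for the prefixes
    # p[:i+1] and q[:j+1] when both curves are traversed in order.
    coupling = np.empty((n, m))
    coupling[0, 0] = dist[0, 0]
    for i in range(1, n):
        coupling[i, 0] = max(coupling[i - 1, 0], dist[i, 0])
    for j in range(1, m):
        coupling[0, j] = max(coupling[0, j - 1], dist[0, j])
    for i in range(1, n):
        for j in range(1, m):
            best_previous = min(
                coupling[i - 1, j], coupling[i - 1, j - 1], coupling[i, j - 1]
            )
            coupling[i, j] = max(best_previous, dist[i, j])
    return float(coupling[-1, -1])


def frechet_similarity(p, q):
    """Hypothetical mapping to (0, 1]; 1 means identical curves (an assumption)."""
    return 1.0 / (1.0 + discrete_frechet(p, q))


# Illustrative curves only (not measured data): a decaying track and a shifted copy.
t = np.linspace(0.0, 1.0, 50)
track_a = np.column_stack([t, np.exp(-t)])
track_b = np.column_stack([t, np.exp(-t) + 0.1])
print(discrete_frechet(track_a, track_b), frechet_similarity(track_a, track_b))
```

Because the recursion only allows each curve to advance or stay in place, the ordering of the points is respected, which is what distinguishes this measure from a simple point-set distance.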
The measurements of maximum power output versus time (see Figure 1b for a representative curve) were performed on devices fabricated with six different annealing temperatures. Since the annealing temperature has been shown to affect the final PCE and, by extension, the maximum power output [27], altering this value provides a natural data set that covers different behaviors. Four identical devices were fabricated per annealing temperature to ensure the reproducibility of results. The performance tracks (maximum power output vs time) of PSC devices with the same fabrication process were averaged to decrease the noise-to-signal ratio as much as possible. This process was applied to both the outdoor and indoor curves. It allowed us to capture the average trend but entirely discards the intrinsic deviations that are present in the perovskite fabrication process. To mitigate this, a follow-up work will include uncertainty quantification of the predictions as well as the predictions themselves. Six unique data points present a challenging start for any machine learning endeavor, while using all 24 would involve highly noisy measurements. To ensure that the algorithm was fairly tested despite the limited number of samples, we implemented a 6-fold validation strategy as outlined in Figure S2. Additionally, during the measurements of the outdoor conditions (irradiance and temperature), the sensor failed for some hours. To fill the data gaps, a K-nearest neighbor data imputation method was employed, along with a classification that determined the night and day cycle and set the irradiance values to zero during the night. Lastly, to extract more concentrated information from the data, we transformed the time series using the Daubechies 2 wavelet. The wavelet transformation was favored over the Fourier transformation because the time series in question show a degrading character that would not be captured in the pure frequency domain. The short-time Fourier transform was also considered as an alternative to wavelets, but the wavelet's flexibility of representation ultimately proved a crucial advantage that enhanced predictions significantly. Multiple wavelets were tested, and the final choice of the specific wavelet was made based on the k-fold test set prediction error.
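To make the wavelet step concrete, the following is a minimal sketch of a Daubechies 2 decomposition written directly in NumPy. It is not the authors' code: the periodic boundary handling, the number of levels, the function names, and the synthetic input are all assumptions, and coefficient ordering or sign conventions may differ from those of dedicated wavelet libraries.

```python
import numpy as np

SQRT3 = np.sqrt(3.0)
# Daubechies 2 (four-tap) scaling filter, normalized to unit energy.
H = np.array([1 + SQRT3, 3 + SQRT3, 3 - SQRT3, 1 - SQRT3]) / (4 * np.sqrt(2.0))
# Wavelet (high-pass) filter from the quadrature-mirror relation g[n] = (-1)^n h[3-n].
G = np.array([H[3], -H[2], H[1], -H[0]])


def db2_step(x):
    """One level of the periodic db2 transform: returns (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    if x.size < 4 or x.size % 2:
        raise ValueError("signal length must be even and at least 4")
    # Row k holds the samples x[2k], ..., x[2k+3], wrapping around at the end.
    idx = (2 * np.arange(x.size // 2)[:, None] + np.arange(4)[None, :]) % x.size
    windows = x[idx]
    return windows @ H, windows @ G


def db2_decompose(x, levels):
    """Multi-level decomposition; returns [a_L, d_L, ..., d_1]."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = db2_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


# Synthetic stand-in for a degrading power track with a daily oscillation.
t = np.linspace(0.0, 8.0, 64)
track = np.exp(-0.2 * t) * (1.0 + 0.3 * np.sin(2 * np.pi * t))
coeffs = db2_decompose(track, levels=3)
print([c.size for c in coeffs])
```

The coarse approximation coefficients carry the slow degradation trend, while the detail coefficients localize faster variations in time, which is the property that motivates wavelets over a pure frequency-domain representation here.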
The best performing method proved to be Kernel Ridge Regression (KRR) [28]. It combines the kernel trick and L2 regularization with a linear regression algorithm. The kernel trick is the process of projecting the data into a more informative data space by means of a kernel function. To counter overfitting, an additional term, specifically the squared Euclidean norm of the model parameters, is added to the error expression. The parameters of the models were optimized using Bayesian optimization, which outperformed the next best hyperparameter tuning method by at least 20%. This optimization provides a very efficient way of determining the hyperparameter values, as it learns from previously evaluated hyperparameters and makes a more educated guess after every iteration. This ensures both faster convergence and a higher probability of finding a better optimum. The loss function chosen was the mean square error, which can be interpreted as the average squared distance between each true point and its corresponding predicted value. The methods have been implemented in Python with the sklearn library [29]. Multiple additional algorithms were tested, notably Gaussian Processes [30], bidirectional Long Short-Term Memory networks [31], and Transformers [32]. The results can be seen in the SI in Figures S3−S5. The calculated Fréchet distances are listed in Table 1. We have chosen a cut-off of 0.95 and will therefore not test the combinations of 1.4 sun and 1 sun, as they are considered highly correlated and therefore redundant.
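As an illustration of how such a model can be evaluated on very few samples, the following is a minimal sketch of kernel ridge regression with 6-fold cross-validation in scikit-learn. The data are synthetic placeholders, and the RBF kernel, the fixed `alpha` and `gamma` values, the array shapes, and the helper name `cross_validated_mse` are assumptions; the Bayesian hyperparameter search is not shown.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold


def cross_validated_mse(X, Y, alpha=1e-2, gamma=0.05, n_splits=6):
    """Mean held-out MSE of an RBF kernel ridge model over n_splits folds."""
    fold_errors = []
    for train_idx, test_idx in KFold(n_splits=n_splits).split(X):
        # alpha is the L2 regularization strength; gamma sets the RBF kernel width.
        model = KernelRidge(kernel="rbf", alpha=alpha, gamma=gamma)
        model.fit(X[train_idx], Y[train_idx])
        prediction = model.predict(X[test_idx])
        fold_errors.append(mean_squared_error(Y[test_idx], prediction))
    return float(np.mean(fold_errors))


# Synthetic placeholders (NOT measurements): six samples, one per device type,
# each with 16 input features and 16 output values, e.g. wavelet coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 16))
Y = np.tanh(X @ rng.normal(size=(16, 16)) / 4.0)
print(cross_validated_mse(X, Y))
```

With six samples and six folds, every model is trained on five samples and scored on the one it has not seen, so each sample serves as test data exactly once.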
Since we expect a certain smoothness from the results, they have been denoised using an infinite impulse response (IIR) filter. The first half of the core results is shown in Figure 2, reflecting the relative relevance of the indoor tests to the outdoor behavior. Specifically, encapsulated devices were tested outdoors, while nonencapsulated devices were tested in the indoor setup in an air or N2 atmosphere. Therefore, it is natural to expect that the indoor tests performed in air have the least relevance to the outdoor tests. This is indeed verified: the air measurements (indoor test 3 in Figure 1a) generate predictions that are more than two times worse than the best ones. Further, the light intensity of 1 sun is expected to yield the most relevant indoor test. This is consistent with the results presented, with the 1 sun in nitrogen (indoor test 1) results outperforming the next best by 30%. As is evident, our pipeline provides not only qualitative evaluations but also quantitative ones, which allows for the precise determination of the stress factor importance. Using the conformity of the results to our expectations as a proof of concept, we can expand the algorithm in future work to tests that are nontrivially correlated with the outdoor behavior. Building on this, if tests that combine different stress factors provide better accuracy than separate ones, then we can assume that the degradation paths are nontrivially intertwined, which will affect our rationalization of the mechanism.
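The type and order of the IIR filter are not specified in the text. As a minimal sketch of the idea, assuming a first-order low-pass filter applied forward and then backward to cancel the lag, the denoising step could look like the following; the function names, the smoothing factor `alpha`, and the synthetic track are hypothetical.

```python
import numpy as np


def iir_lowpass(x, alpha=0.2):
    """First-order IIR low-pass: y[n] = alpha * x[n] + (1 - alpha) * y[n - 1]."""
    x = np.asarray(x, dtype=float)
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    y = np.empty_like(x)
    y[0] = x[0]  # initialize with the first sample to avoid a start-up transient
    for n in range(1, x.size):
        y[n] = alpha * x[n] + (1.0 - alpha) * y[n - 1]
    return y


def smooth_zero_phase(x, alpha=0.2):
    """Run the filter forward, then backward, so the two phase lags cancel."""
    forward = iir_lowpass(x, alpha)
    return iir_lowpass(forward[::-1], alpha)[::-1]


# Synthetic noisy prediction track (illustrative only).
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 200)
noisy = np.exp(-t) + 0.05 * rng.normal(size=t.size)
smooth = smooth_zero_phase(noisy)
print(np.std(np.diff(noisy)), np.std(np.diff(smooth)))
```

A smaller `alpha` smooths more aggressively but also flattens genuine fast features, so the value would need to be chosen against the sampling rate of the tracks.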
The second half of the core results is shown in Figure 3a and provides strong evidence that the algorithm has learned the real device behavior. The reconstruction of the outdoor behavior is remarkably accurate, with an average mean square error of 0.24, and the reconstructed curve fits the measured one almost perfectly on data not used during training. Further, as can be seen in Figure 3b, when the trained algorithm was presented with test data generated in another lab, without UV protection during aging (see the Supporting Information) and under significantly different environmental conditions (Mediterranean and desert climates at ICN2 and BGU, respectively), the prediction was accurate to within 1 order of magnitude. In fact, the discrepancy can credibly be attributed to the differences in aging protocols, especially the lack of UV protection. This is despite the fact that the indoor measurements at ICN2 were conducted without UV light. The double peaks in the prediction curve of Figure 3b can be attributed to a high-frequency noise component from the wavelet decomposition, caused by the difference in measuring frequencies across laboratories. This part of the pipeline not only verifies the results of the first part shown in Figure 2 but also adds value by predicting the outdoor PCE evolution. It further shows that, so long as the indoor and outdoor measurements are consistent with each other, a generally well-behaved prediction can be expected. By utilizing this functionality, the stability of new device types can be evaluated in a matter of days rather than months.
In conclusion, we have presented a robust pipeline that identifies the relevant stress factors of perovskite solar cell degradation in a qualitative and quantitative fashion. Further, it allows direct predictions of outdoor solar cell stability based on accelerated indoor stability tests without further modeling. Subsequently, the same pipeline is used to reconstruct the outdoor behavior based only on the relevant indoor tests. These findings are extremely important, as they can rationalize outdoor degradation mechanisms from relatively quick tests as well as provide insight into the degradation mechanisms as a whole. Further, given the quantitative nature of our factor importance, laboratories can choose to test a device with a less relevant indoor test if they judge that the accuracy loss is sufficiently compensated by the decreased test duration. In addition, once the relevant stress factors are identified, synthesis efforts can be directed toward mitigating these specific effects. The pipeline's results were achieved using only six different device types to train the algorithms. With an expanded data set, we could have efficiently utilized more complex algorithms such as Transformers and bidirectional Long Short-Term Memory (bLSTM) networks. These algorithms have proven robust and accurate when trained with large data sets. In a data set of the size that we are currently investigating, the additional complexity proved to decrease the quality of the prediction. In contrast, the kernel methods have a lower parametric load and are therefore better suited to our case study. Our pipeline can be applied without loss of generality to both low-throughput and high-throughput laboratories by choosing a suitable method. Since our pipeline is general and can be applied across technologies and laboratories, it can be used to provide general intuition that combines the findings of many independent and disjoint laboratories.
Experimental Methods (preparation of triple cation perovskite precursor solution, perovskite solar cell fabrication, encapsulation, and indoor and outdoor photostability studies) and Algorithms and Data Processing Methods (algorithm testing strategy and results of different algorithms) (PDF)

Figure 2. Comparative accuracy of predictions with different indoor tests as inputs.

Figure 3. Predicted (red) and true (blue) performance tracks of maximum power evolution in outdoor conditions. The true tracks are generated by averaging over the measured curves of all the devices with the same fabrication procedure. (a) Tracks measured in Barcelona, Spain, and (b) tracks measured in Sede Boqer, Israel.

Table 1. Fréchet Distance Denoting Curve Similarity (a value of 1 indicates identical curves).