Systematic Development of a Machine Learning-Based Asset Management Tool for Wastewater Pipeline Networks
  • Open Access
Article


  • Jake Stengel
    Department of Chemical Engineering, Rowan University, Glassboro, New Jersey 08028-1700, United States
  • Emmanuel Aboagye
    Department of Chemical Engineering, Rowan University, Glassboro, New Jersey 08028-1700, United States
  • Phuong Le
    Department of Chemical Engineering, Rowan University, Glassboro, New Jersey 08028-1700, United States
  • Matt DeNafo
    Atlantic County Utilities Authority (ACUA), Atlantic City, New Jersey 08401, United States
  • Dylan Snyder
    Department of Chemical Engineering, Rowan University, Glassboro, New Jersey 08028-1700, United States
  • Nathanial Nelson
    Department of Chemical Engineering, Rowan University, Glassboro, New Jersey 08028-1700, United States
  • Kirti Yenkie*
    Department of Chemical Engineering, Rowan University, Glassboro, New Jersey 08028-1700, United States
    *Corresponding author.

ACS ES&T Water

Cite this: ACS EST Water 2024, 4, 12, 5555–5565
https://doi.org/10.1021/acsestwater.4c00608
Published November 18, 2024

Copyright © 2024 The Authors. Published by American Chemical Society. This publication is licensed under CC-BY 4.0.

Abstract


Utility companies face significant challenges in managing wastewater distribution networks (WWDN), where continuous service delivery is crucial. The American Society of Civil Engineers (ASCE) rated the U.S. wastewater treatment infrastructure as inadequate, assigning it a D+ grade due to its high risk of failure. Such failures can lead to severe economic and environmental damage, with untreated waste contaminating ecosystems and causing costly cleanups. Traditionally, the industry has relied on reactive, subjective asset management, addressing issues only after failures occur. This reactive approach often results in unexpected expenses and strains operational budgets. To address these challenges, we propose a proactive, data-driven asset management framework for the wastewater industry. Our strategy aims to reduce unforeseen costs by minimizing the likelihood of asset failure, environmental risks, and financial losses. By leveraging machine learning, specifically random forest classification, and analyzing historical data, we developed a predictive tool in Python. This tool identifies high-risk assets, enabling prioritized maintenance actions, ultimately mitigating potential environmental impacts and associated costs.


Note Added after ASAP Publication

Due to a production error, this paper was published ASAP on November 18, 2024 with the wrong Supporting Information file. The corrected version was reposted on November 19, 2024.

Synopsis

This research presents a proactive asset management framework that uses machine learning to improve reliability and reduce costs in wastewater treatment networks.

1. Introduction


Infrastructure is crucial for national growth, productivity, and competitiveness. (1) However, poor infrastructure can negatively affect the community and economy it serves. (2) The American Society of Civil Engineers (ASCE) evaluates and grades United States (US) infrastructure quality, and its overall 2021 grade of “C–” indicates a critical need for attention. (3) This grade suffers when key infrastructure categories, such as roadways, energy, drinking water, and wastewater, are not maintained to meet the demands of a growing population. (4) Therefore, developing adaptable infrastructure and asset maintenance plans is imperative to promote social and economic growth.
For wastewater infrastructure, the most valuable asset is the wastewater distribution network (WWDN). (5) The last major mass retrofitting of US wastewater infrastructure occurred during the 1970s, after the Clean Water Act (CWA) was passed. The CWA extended the Federal Water Pollution Control Act, which outlined the basic structure for regulating pollutant discharges. (6) In the years that followed, public utility companies were granted limited public funding to upgrade WWDNs with new assets that met CWA requirements. (7) Since then, most WWDNs have gone about 50 years without significant enhancement.
Given that the average life expectancy of WWDN assets is approximately 50 to 75 years, many assets are reaching the end of their service lives, leading to a rise in asset failures across the nation. (8,9) When a WWDN asset fails, the surrounding environment and economy suffer, and a significant shift in resources is required to fix the failed asset. From an environmental standpoint, asset failures result in structural damage, surface erosion, adverse health impacts, reduced ecosystem quality, and pollution of local receiving waters. (10) Furthermore, once alerted to a failure, the responsible utility company must devote substantial capital to fixing it. This shift of resources raises operation and maintenance costs and draws down the fixed yearly budget available for large-scale replacements in the WWDN. In 2019, the ASCE reported over 240,000 breaks in wastewater treatment networks (WWTNs), costing the US an estimated $3 billion in immediate repairs. (11,12) As a result, many public utilities face the conundrum of stretching their budgets to fund large-scale replacements while simultaneously upgrading deficient WWDNs to meet CWA quality standards. However, without considerable investment in the wastewater industry, many treatment plants are unable to upgrade their WWDNs to meet growing population demands. (13) This lack of investment has led the ASCE to rate wastewater infrastructure at a low grade of “D+” from 2013 to 2021. (3,14) This consistent rating shows that wastewater infrastructure is poorly maintained and needs significant upgrades to reach the acceptable “B–” standard. Figure 1 shows the projected yearly financial needs to bring the current infrastructure to acceptable levels.

Figure 1. Funding gap representing the capital investment needed for the WWDN infrastructure across the United States reported by the ASCE.

As illustrated in Figure 1, the estimated funding gap and total national funding decreased gradually from 2009 to 2017 but increased drastically in 2021. This steep increase is attributed to a prolonged trend of underinvestment in critical water-related infrastructure, together with some wastewater treatment plants outliving their expected life cycles. (3) The 16,000 wastewater treatment plants (WWTPs) in the United States are functioning, on average, at 81% of their design capacity, and 15% of them have reached or even exceeded it. (15) In 2019, 63% of water and wastewater infrastructure capital needs went unmet, a gap of $81 billion that contributed to the drastic increase in the funding gap in 2021. (15) Additionally, operation and maintenance (O&M) costs for wastewater systems have increased as system components near the end of their expected service lives. (15) In 2019 alone, the O&M funding gap in the United States was $10.5 billion. With asset failures growing as treatment plants age, many utility companies find their O&M costs rising to cover the extra capital needed to fix these assets. Consequently, most wastewater companies cannot appropriate the funds needed to close the enormous funding gap associated with aging infrastructure.
To work within budget constraints, utility companies are advised to repair their WWDNs systematically, section by section. This strategy minimizes asset failure and ensures compliance with CWA standards. These companies commonly employ asset management plans (AMPs), which are structured, methodical procedures for the sustainable management of asset systems. (16) However, the United States Environmental Protection Agency (US EPA) has established only a few federal regulations on the content of wastewater AMPs and does not endorse a specific plan, which has led to diverse approaches based on EPA guidelines. (17) Each AMP typically includes an inspection method that fits the utility company's financial constraints and informs a general life cycle assessment of the WWDN. (18) Because most WWDNs are underground, direct visual inspection is not viable. Utilities therefore resort to alternative methods, such as ultrasonic and eddy current tools or magnetic fields, to detect system flaws. A prevalent tool for in-pipe inspection is the “pig” (pipeline integrity gauge), which, despite its effectiveness, costs around $35,000 per mile. (19) Inspecting the entire US public sewer network of about 800,000 miles with pigs would therefore cost approximately $28 billion. Consequently, many utility companies perform limited pig inspections and extrapolate the findings to their broader AMPs. The prohibitive cost of comprehensive inspection often leads to reactive rather than proactive management, which neither adequately prevents malfunctions nor minimizes maintenance expenses. (20,21)
Given these economic and environmental challenges, together with growing populations and aging infrastructure, the wastewater industry needs to refine its AMPs using predictive models. Such models could manage infrastructure proactively by forecasting potential failures and recommending timely maintenance. Advanced computational and data-driven techniques, especially machine learning (ML), can substantially change how assets are managed. Because ML could significantly enhance the management of extensive and critical pipeline networks, its applications in this field are increasingly being explored for predictive maintenance, better decision-making, and improved system reliability. Researchers are actively investigating how accurately ML can predict the condition of sewer systems and thereby inform more strategic AMPs. For example, Nguyen et al. (2022) (22) evaluated various ML models for predicting sewer conditions. They used 19 distinct features: nine physical attributes, such as pipe age, diameter, depth, and type, and ten environmental factors, such as rainfall, geology, population density, groundwater presence, and traffic volume. They tested 15 different ML models, including support vector machine (SVM), random forest (RF), artificial neural network (ANN), and logistic regression (LR), and found that the RF model had the best predictive accuracy. The study identified construction material and pipeline age as the most significant contributors to sewer deterioration. Similarly, Winkler et al. (2018) (23) demonstrated that decision tree ensemble methods, including AdaBoost, random forest, and RUSBoost, can predict the failure probability of sewer pipelines with accuracy as high as 96%. Fontecha et al. (2021) (24) adopted a two-step ML approach to evaluate the failure risk of urban sewer infrastructure. Their model first determines the likelihood that an asset is at risk and then classifies the type of risk for assets deemed vulnerable. This methodology helps decision-makers prioritize corrective measures for high-risk pipelines. Among the four ML models they created, XGBoost achieved the best accuracy and F1 scores.
Collectively, these studies suggest that ensemble ML methods predict sewer conditions more effectively than other models. Building on this insight, and on the asset management challenges facing utility companies described above, we developed an ML-based asset management model and an accompanying software tool to help utilities make well-informed decisions about their assets and thereby mitigate the adverse economic and environmental impacts of asset failures.

2. Materials and Methods


In this section, we describe the asset management framework and the ML model building. We then describe the development of a computational software tool for predicting the risk factors of wastewater assets, using a case study from our partner facility, the Atlantic County Utilities Authority (ACUA), which provides wastewater collection, distribution, and treatment for 14 New Jersey municipalities.

2.1. Conventional Asset Management Strategy

The initial step in designing an asset management model is the systematic collection of information on standard asset management practices. (17) The cyclical process depicted in Figure 2 is the basis of many popular asset management models. In step #1, utility companies keep an inventory of all the assets within their network. During steps #2 and #3, engineers and technicians evaluate the status and life cycle cost of each asset. In step #4, the companies determine the operating conditions, such as flow rates and pressures, needed to maintain a consistent level of service to customers. In step #5, cost-optimization strategies are implemented; for example, operation and maintenance costs are minimized by maximizing equipment performance and carrying out effective repairs. Finally, in step #6, all the information gathered in the previous steps is combined into a comprehensive asset management plan for the company. When new assets are added, the cycle restarts to update the plan.

Figure 2. Development process of an asset management plan.

While the process outlined above helps in developing a generic asset management plan, one major challenge of this iterative process is that steps #2 and #3 introduce substantial human error. These two steps are subjective because they depend entirely on the experience of the engineer or technician conducting the WWDN asset inspection. An experienced engineer may provide a sound assessment while an inexperienced one may not, introducing varying degrees of bias and error into the system. These inefficiencies in building an asset management plan contribute to asset failures, with significant economic and environmental consequences. Further complications arise when companies fail to plan their maintenance schedules efficiently during step #5, resulting in overspending on operation and maintenance. These compounded errors cost utility companies extra money and resources as they try to repair their WWDNs and maintain a constant level of service.
In view of these challenges, industries have tried to implement mathematical models for determining asset conditions rather than relying solely on the experience of engineers and technicians. Many WWTPs have also been exploring ML methods, which can capture mathematical relationships that are difficult to detect through observation and human experience. (25−27) While ML methods can discover the patterns inherent in data and build a model from these relationships, they are also subject to bias. (25) However, unlike humans, who may bias their judgments based on personal experience, an ML model has no experience; its bias comes from the data it is trained on and leans in whatever direction the data itself is skewed. Therefore, if the data presented to the ML algorithm are accurate and reliable, the resulting model will be correspondingly reliable. (25) Another advantage of ML models is that they lack the rigidity of models developed from human experience: they can learn from new data, so the model adapts when it receives a more accurate and precise data set than the original. As ML models learn from new data, their predictions improve, which is difficult to achieve with models developed from human experience. ML models can thus predict when WWDN assets are likely to fail, but they do not indicate when repairs should be scheduled. Consequently, to reduce operation and maintenance costs, an additional optimization method is needed to fulfill step #5 in Figure 2. Together, these methods offer a holistic, adaptive, and predictive asset management plan that minimizes costs and eliminates the inefficiencies of human error.

2.2. Overall Preventive Measurement Number (OPMN) Metric

One way to estimate the criticality of assets in water and wastewater treatment systems is to use a ranking system that calculates the risk factor associated with each asset. (28) Assessing the criticality of an asset requires examining both the likelihood and the consequences of failure. (28−32) Once this analysis is done, assets with the greatest likelihood of failure (risk probability, RP) and the most significant consequences upon failure (failure impact, FI) can be prioritized for maintenance, repair, and replacement. To use this system, RP and FI are calculated for each asset, and their product gives the asset's risk factor. In this work, we use this methodology to estimate the condition of each asset in the form of a risk factor, the overall preventive measurement number (OPMN). We develop two ML models, random forest classification (RFC) and the eXtreme gradient boosting classifier (XGBC), to predict RP and FI from the properties and characteristics of the WWDN pipelines that connect the 14 municipalities to the ACUA plant. (33,34)
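As a minimal sketch of the risk-factor calculation described above, assuming RP and FI have already been predicted as integer categories from 1 to 5, the snippet below multiplies them into an OPMN and ranks assets from highest to lowest. The `AssetRisk` class, the `prioritize` helper, and the asset IDs and scores are hypothetical illustrations, not part of the ACUA tool.

```python
from dataclasses import dataclass


@dataclass
class AssetRisk:
    asset_id: str  # hypothetical identifier, not an ACUA asset ID
    rp: int        # predicted risk probability category, 1 (lowest) to 5 (highest)
    fi: int        # predicted failure impact category, 1 (lowest) to 5 (highest)

    @property
    def opmn(self) -> int:
        """Risk factor (OPMN): product of likelihood (RP) and consequence (FI)."""
        return self.rp * self.fi


def prioritize(assets: list[AssetRisk]) -> list[AssetRisk]:
    """Order assets from highest to lowest OPMN; ties keep their input order."""
    for a in assets:
        if not (1 <= a.rp <= 5 and 1 <= a.fi <= 5):
            raise ValueError(f"{a.asset_id}: RP and FI must be in 1..5")
    return sorted(assets, key=lambda a: a.opmn, reverse=True)


if __name__ == "__main__":
    demo = [
        AssetRisk("PIPE-A", rp=4, fi=2),
        AssetRisk("PIPE-B", rp=2, fi=5),
        AssetRisk("PIPE-C", rp=5, fi=4),
    ]
    for asset in prioritize(demo):
        print(asset.asset_id, asset.opmn)  # PIPE-C 20, then PIPE-B 10, then PIPE-A 8
```

Because both factors run from 1 to 5, the OPMN ranges from 1 to 25; ties are simply left in input order here, and a real tool might break them with a secondary criterion.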

2.3. Analysis Framework

Developing a computational tool that embeds ML as its back-end and accurately evaluates the condition of a utility company's assets is an intricate process. Asset data generally arrive in a raw state and require preprocessing, so the tool must first be able to process the utility company's raw data and organize it for ML training. Once the data are ready, the ML model can be trained, validated, and tested, and then packaged as a computational tool for the utility company. The generalized framework for developing the asset management tool in this work is shown in Figure 3. The first step is data input management, in which we preprocess the data and determine the feature sets (i.e., the ML model inputs) needed for model development. These features can be shared between the RP and FI estimates. The second step is asset management model formulation, in which we develop ML models to predict the asset OPMN. To overcome the challenges of limited data, we also use cross-validation and sensitivity analysis. Finally, we develop the computational tool. The following subsections describe each step in detail.

Figure 3. Framework for the development of a generalized asset management tool for use in utility authorities.

2.3.1. Management of Data Input

The Atlantic County Utilities Authority (ACUA) is a wastewater treatment and waste collection facility in Southern New Jersey. It serves 14 municipalities within Atlantic County and can treat up to 40 million gallons of wastewater per day. The historical data provided by the ACUA cover over 60 miles of infrastructure assets, including approximately 250 manholes and 295 pipelines. In recent years, however, ACUA has encountered asset failures caused by unforeseen, premature pipe failure modes that the subjective asset management systems typical of the industry did not anticipate. One such failure involved the Ventnor–Margate force main, which serves three municipalities (Ventnor, Margate, and Longport). (35) The ACUA is nevertheless required to maintain a consistent level of service, even in the event of an asset failure. Amid growing concerns over these recurring failures, we developed a predictive asset management model to help forecast potential failures and recommend timely asset rehabilitation.
A total of 295 data points were acquired from the ACUA, each corresponding to one pipeline asset. The feature set comprised the pipeline location, total pipeline length, pipe type, type of flow through the pipeline, pipeline diameter, original life span and year of installation, replacement or rehabilitation year, remaining life span, failure impact, and risk probability. The location indicates the city within the ACUA jurisdiction where each pipeline is located. The pipe type indicates the construction material; there are nine types, including ductile iron pipe (DIP), asbestos cement pipe (ACP), polyvinyl chloride (PVC), reinforced concrete pipe (RCP), thermoplastic composite pipe (TCP), vitrified clay pipe (VCP), steel pipe, and high-density polyethylene (HDPE). Flow through the pipelines is either by gravity or pumped through force mains. The pipe type and flow type were converted to numerical data by label encoding. Population density for each city was also included in the analysis, since it plays a significant role in how an asset failure affects the surrounding communities. Based on the expert knowledge of ACUA engineers, we narrowed the feature set to the features shown in Table 1. The FI and RP labels are categorical values from 1 to 5, with 1 indicating the lowest consequence or likelihood and 5 the highest. Table 2 gives the meaning of each category in the RP and FI models.
Table 1. Features within Both Categories
Risk Probability (RP) | Failure Impact (FI) | Data Type
Flow Types (−) | Flow Types (−) | Categorical
Pipe Size – Diameter (in) | Pipe Size – Diameter (in) | Numerical
Years since last Inspection (yr) | Years since last Inspection (yr) | Numerical
Population density (people/mi²) | Population density (people/mi²) | Numerical
Pipe Material Type (−) | Pipeline Placement in WWDN (−) | Categorical
Pipe Segment Length (ft) | Flow Rates (MGD) | Numerical
Original Installation Year (yr) | | Numerical
Remaining Life (yr) | | Numerical
Table 2. Categorical Meanings for both RP and FI
Category | RP | FI
1 | Little to no chance of asset failure in the near future | Little to no impact on the surrounding environment in the event of asset failure
2 | Minor chance of asset failure in the near future | Minor impact on the surrounding environment in the event of asset failure
3 | Moderate chance of asset failure in near future | Moderate impact on the surrounding environment in the event of asset failure
4 | Major chance of asset failure in near future | Major impact on the surrounding environment in the event of asset failure
5 | High chance of asset failure in near future | Extreme impact on the surrounding environment in the event of asset failure
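The label encoding of the categorical pipe-type and flow-type features can be sketched in plain Python as below. The `label_encode` helper is hypothetical; it mimics the convention of scikit-learn's `LabelEncoder`, which assigns integer codes in sorted order of the unique labels, but the text does not state which implementation the authors used. The records are made up.

```python
def label_encode(values):
    """Return integer codes for `values` plus the sorted class list.

    Codes follow the sorted order of the unique labels (the same convention
    as scikit-learn's LabelEncoder); classes[code] recovers the label.
    """
    classes = sorted(set(values))
    code_of = {label: code for code, label in enumerate(classes)}
    return [code_of[v] for v in values], classes


# Illustrative records only; the abbreviations follow the pipe types in the text.
pipe_types = ["PVC", "DIP", "ACP", "PVC", "HDPE"]
flow_types = ["Gravity", "Force Main", "Gravity", "Gravity", "Force Main"]

pipe_codes, pipe_classes = label_encode(pipe_types)
flow_codes, flow_classes = label_encode(flow_types)
print(pipe_codes, pipe_classes)  # [3, 1, 0, 3, 2] ['ACP', 'DIP', 'HDPE', 'PVC']
print(flow_codes, flow_classes)  # [1, 0, 1, 1, 0] ['Force Main', 'Gravity']
```

Tree ensembles split on thresholds, so they can generally work with such integer codes, although the codes impose an ordering with no physical meaning; one-hot encoding is the usual alternative when that ordering is a concern.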
Rows with missing data were dropped, because imputation methods such as mean imputation can skew the data. Outliers were removed using the z-score criterion: any data point falling outside ±3 z-scores was excluded from the data set. After this preprocessing, the data set was reduced to 281 data points, which were used to build the model. In the preprocessed label set, there are 81, 60, 5, 130, and 4 data points in RP categories “1”, “2”, “3”, “4”, and “5”, respectively. For FI, there are 4, 140, 52, 52, and 33 data points in categories “1”, “2”, “3”, “4”, and “5”, respectively. Figure S1 presents the distribution of the feature set after final data preprocessing, showing the minimum and maximum values.
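A minimal sketch of the two cleaning rules above (drop incomplete rows, then remove points beyond ±3 z-scores), assuming records are dictionaries, z-scores use the population standard deviation (the default `ddof=0` in `scipy.stats.zscore`), and the filter is applied once rather than iteratively. The helper names and example records are hypothetical.

```python
import math
from statistics import fmean, pstdev


def is_missing(value):
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing(rows):
    """Drop every record that has at least one missing field (no imputation)."""
    return [r for r in rows if not any(is_missing(v) for v in r.values())]


def zscore_filter(rows, columns, threshold=3.0):
    """Keep records whose |z| <= threshold in every listed numeric column.

    z = (x - mean) / std with the population standard deviation; columns with
    zero spread are skipped because z is undefined there.
    """
    stats = {}
    for col in columns:
        xs = [r[col] for r in rows]
        stats[col] = (fmean(xs), pstdev(xs))

    def within(r):
        return all(sd == 0 or abs(r[col] - mu) / sd <= threshold
                   for col, (mu, sd) in stats.items())

    return [r for r in rows if within(r)]


# Hypothetical pipe records: ten ordinary segments, one extreme length, one with a gap.
rows = [{"id": i, "length_ft": 100.0, "diameter_in": 8.0} for i in range(10)]
rows.append({"id": 10, "length_ft": 10000.0, "diameter_in": 8.0})
rows.append({"id": 11, "length_ft": None, "diameter_in": 12.0})

complete = drop_missing(rows)
clean = zscore_filter(complete, ["length_ft", "diameter_in"])
print(len(rows), len(complete), len(clean))  # 12 11 10
```

In this example the extreme length reaches a z-score of about 3.16; with eleven records, that is the largest value a single outlier can reach under the population standard deviation, so small data sets leave little room for the ±3 criterion to trigger.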

2.3.2. Asset Management ML Model Development

In this study, two primary ML models were developed: random forest classification (RFC) and XGBoost classification (XGBC). Both are ensemble ML algorithms and have shown promising results in predicting sewer conditions in recent literature. (22,24) The data were normalized using a standard scaler and split into 70% for training and 30% for testing, with stratified splitting to ensure proportional representation of each class in both sets. Figure S2 shows a sensitivity analysis of RFC model accuracy with respect to the percentage of data in the training set and the number of simulated trees. This analysis also supported the 70:30 training–testing split, as it yielded better prediction accuracy.
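The normalization and stratified 70:30 split can be sketched with scikit-learn's `StandardScaler` and `train_test_split(..., stratify=y)`. The feature values are random placeholders, the labels simply reuse the FI class counts reported above, and `random_state=0` is arbitrary. Fitting the scaler on the training portion alone is an assumption of this sketch (it keeps test-set statistics out of the scaling); the text does not specify the order used.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder feature matrix: 281 records x 8 numeric features (random values,
# not ACUA data). Labels reuse the FI class counts reported in Section 2.3.1.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(281, 8))
y = np.repeat([1, 2, 3, 4, 5], [4, 140, 52, 52, 33])

# Stratified 70:30 split keeps each class's share similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Fit the scaler on the training portion only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

print(X_train_s.shape, X_test_s.shape)
print(np.bincount(y_train)[1:], np.bincount(y_test)[1:])  # per-class counts
```

Even the rarest class (four records) lands in both sets, which is the point of stratifying with such imbalanced labels.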
For model validation and hyperparameter tuning, stratified k-fold cross-validation with k = 3 was combined with the Hyperopt (36) library. The objective function maximized the mean cross-validation accuracy over 1000 iterations. The key hyperparameters tuned were the number of trees (“n_estimators”) and the maximum tree depth (“max_depth”). For the RFC model, feature importance was assessed using the “feature_importances_” attribute of the scikit-learn random forest classifier, which reports each feature's mean decrease in impurity across all decision trees. For the XGBC model, feature importance was determined by counting the number of times each feature was chosen for a split across all trees, with more frequently selected features considered more important. Each feature's importance was then expressed as a percentage of the model's total importance. We devised two approaches to calculating the OPMN feature importance: the generic WWDN average (GWA) and the weighted score average (WSA). The GWA does not use the predicted RP and FI scores; instead, it applies a geometric mean to the feature importances of the two models, which shows how each factor affects the overall system. The WSA uses the RP and FI scores in eq 1 to calculate the importance of each factor for an individual asset.
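$$\text{Feature Importance} = \frac{1}{2}\left[\left(\frac{\text{RP Score}}{5}\right)\times\text{Feature Importance}_{\text{RP}} + \left(\frac{\text{FI Score}}{5}\right)\times\text{Feature Importance}_{\text{FI}}\right] \tag{1}$$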
This approach gives the feature importance on a per-asset basis; to obtain the overall feature importance for the OPMN, we computed the mean over all assets. Using both approaches, we can determine the importance of each factor for the whole system and how it varies from asset to asset.
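Eq 1 and the two aggregation approaches can be sketched as follows, assuming each model's importances are available as a feature-to-percentage mapping. The helper names, feature keys, importance values, and predicted scores are hypothetical. Features that appear in only one model are treated as contributing zero from the other, and the GWA is restricted to features shared by both models, since the text does not say how unshared features are handled.

```python
import math


def wsa_importance(rp_score, fi_score, imp_rp, imp_fi):
    """Eq 1 for one asset: weight each model's importance by its predicted score / 5.

    imp_rp and imp_fi map feature name -> importance (%) from the RP and FI models.
    Assumption: a feature used by only one model contributes 0 from the other.
    """
    features = set(imp_rp) | set(imp_fi)
    return {
        f: ((rp_score / 5) * imp_rp.get(f, 0.0) + (fi_score / 5) * imp_fi.get(f, 0.0)) / 2
        for f in features
    }


def overall_wsa(scores, imp_rp, imp_fi):
    """Mean of the per-asset eq 1 importances over all (RP, FI) score pairs."""
    per_asset = [wsa_importance(rp, fi, imp_rp, imp_fi) for rp, fi in scores]
    features = per_asset[0].keys()
    return {f: sum(d[f] for d in per_asset) / len(per_asset) for f in features}


def gwa_importance(imp_rp, imp_fi):
    """GWA: geometric mean of the two models' importances, ignoring predicted scores.

    Assumption: restricted to features shared by both models, since a geometric
    mean with a missing (zero) factor is zero.
    """
    shared = set(imp_rp) & set(imp_fi)
    return {f: math.sqrt(imp_rp[f] * imp_fi[f]) for f in shared}


# Hypothetical importances (%), not the values in Figure 5.
imp_rp = {"years_since_inspection": 40.0, "diameter": 20.0, "pipe_material": 40.0}
imp_fi = {"years_since_inspection": 50.0, "diameter": 30.0, "flow_rate": 20.0}
scores = [(5, 5), (1, 3)]  # hypothetical predicted (RP, FI) for two assets

print(overall_wsa(scores, imp_rp, imp_fi))
print(gwa_importance(imp_rp, imp_fi))
```

An asset predicted at RP = FI = 5 keeps the plain average of the two models' importances, whereas low-risk assets scale their contributions down; averaging over assets therefore tilts the overall WSA toward the features that matter for high-risk pipelines.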

2.3.3. Setup and Computational Workflow

Once an ML model for predicting the OPMN is developed, the next stage is to build an interactive software tool that helps utility companies manage their assets efficiently. The software tool is written in Python. MySQL handles data management and access at the back-end because of its efficient data retrieval and storage. As indicated earlier, the RFC algorithm in the scikit-learn package is used for model development. On the front-end, the graphical user interface (GUI) is built with Tkinter, a standard Python library for creating GUI applications, chosen for its ease of use, cross-platform compatibility, and ability to create simple, intuitive interfaces. Finally, matplotlib and seaborn are used together to create the high-quality visualizations needed to interpret and communicate the application's results. Figure 4 summarizes the packages used for application development.
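To illustrate how the back-end and front-end might exchange prediction results, the sketch below stores predicted RP and FI categories in a table and returns assets ranked by OPMN. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a database server; the table and column names are hypothetical. With a MySQL driver such as MySQL Connector/Python, the same SQL would typically use `%s` placeholders instead of `?`, and a Tkinter widget such as `ttk.Treeview` could display the returned rows.

```python
import sqlite3


def build_store(predictions):
    """Create an in-memory asset table holding predicted RP and FI categories."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE asset_risk (
               asset_id TEXT PRIMARY KEY,
               rp INTEGER NOT NULL CHECK (rp BETWEEN 1 AND 5),
               fi INTEGER NOT NULL CHECK (fi BETWEEN 1 AND 5)
           )"""
    )
    conn.executemany("INSERT INTO asset_risk VALUES (?, ?, ?)", predictions)
    conn.commit()
    return conn


def top_risk_assets(conn, limit=10):
    """Return (asset_id, OPMN) pairs, highest OPMN first, for the GUI to display."""
    cur = conn.execute(
        "SELECT asset_id, rp * fi AS opmn FROM asset_risk "
        "ORDER BY opmn DESC, asset_id LIMIT ?",
        (limit,),
    )
    return cur.fetchall()


# Made-up model outputs for three hypothetical pipe segments.
conn = build_store([("PIPE-A", 4, 2), ("PIPE-B", 2, 5), ("PIPE-C", 5, 4)])
print(top_risk_assets(conn))  # [('PIPE-C', 20), ('PIPE-B', 10), ('PIPE-A', 8)]
```

Computing the OPMN in the query keeps the stored predictions as the single source of truth, so retraining the model only requires rewriting the RP and FI columns.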

Figure 4. Programming and software packages used for application development.

3. Results and Discussions


3.1. Optimal Hyperparameters and Cross-Validation

Table 3 lists the optimal hyperparameters for each model and the corresponding mean cross-validation score. The cross-validation scores show that the RFC model gives the higher accuracy for RP, while XGBC gives the higher accuracy for FI. Nevertheless, both models generalize well in predicting the RP and FI metrics, with cross-validation accuracies above 90%.
Table 3. Optimal Hyperparameters for both ML Models for RP and FI Predictions
 Random Forest Classification (RFC)XGBoost Classification (XGBC)
HyperparameterRPFIRPFI
Number of trees31011010110
Depth of trees31914
Cross-validation score (%)93.491.392.392.9

3.2. Model Evaluation

Table 4 shows the F1, precision, and accuracy scores of the RFC and XGBC models for RP and FI predictions on the test set. The F1 score is a widely used metric for evaluating classification models, particularly when classes are imbalanced. It combines precision and recall into a single metric by taking their harmonic mean, so a high F1 score indicates that both recall and precision are strong, implying few false positives and false negatives. Precision measures how reliably the model's positive predictions are correct: for a classifier, it is the number of true positives divided by the sum of true positives and false positives. Accuracy, on the other hand, is the ratio of correctly predicted observations to total observations. To estimate the F1 and precision scores of the developed models, we used the weighted average method, in which each class's score is multiplied by a weight proportional to the number of true instances of that class in the data set. Each class therefore contributes to the overall average according to how frequently it appears in the data, which accounts for class imbalance and gives a more informative model evaluation. Table 4 shows that XGBC outperforms RFC on all three metrics for both RP and FI. This difference may stem from how each method builds and combines its trees: random forest averages trees grown independently on random subsets of the data, whereas XGBoost adds trees sequentially, each correcting the errors of the previous ones. Nevertheless, both models generalize well for RP and FI predictions.
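The weighted averaging described above can be made concrete with a small plain-Python sketch on hypothetical labels. For this input it should agree with scikit-learn's `precision_score` and `f1_score` called with `average="weighted"`, but the helper names and data are illustrative only.

```python
from collections import Counter


def weighted_precision_f1(y_true, y_pred):
    """Support-weighted precision and F1 over the classes present in y_true.

    Each class's score is weighted by its number of true instances (support),
    the weighting scikit-learn applies with average="weighted".
    A class that is never predicted gets precision 0.
    """
    support = Counter(y_true)
    n = len(y_true)
    precision_w = f1_w = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        precision_w += precision * n_true / n
        f1_w += f1 * n_true / n
    return precision_w, f1_w


def accuracy(y_true, y_pred):
    """Fraction of observations whose predicted category matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical RP categories for six test assets.
y_true = [1, 1, 2, 2, 2, 4]
y_pred = [1, 2, 2, 2, 4, 4]
print(weighted_precision_f1(y_true, y_pred), accuracy(y_true, y_pred))
```

For these six labels the weighted precision is 0.75 and the weighted F1 is 2/3, while the accuracy is 4/6; the frequent class 2 carries half of the weight in both averages.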
Table 4. Model Evaluation for RFC and XGBC
Metric | RFC, RP | RFC, FI | XGBC, RP | XGBC, FI
F1 score (%) | 90.2 | 94.7 | 92.0 | 95.9
Precision score (%) | 88.8 | 94.3 | 91.3 | 95.3
Accuracy score (%) | 91.8 | 95.3 | 92.9 | 96.5

3.3. Model Feature Importance

Figure 5 compares the feature importance determined by the two ML models, each fitted to predict RP and FI. As Figure 5 shows, “years since last inspection” dominates the feature set and is crucial to predicting both RP and FI. Nguyen et al. (2022) (22) found that the most important feature for predicting sewer conditions is pipe type, followed by asset age. While their study emphasizes the intrinsic qualities of the pipeline materials and the time elapsed since installation, our research identifies “years since last inspection” as a paramount predictor, a feature that reflects pipeline age in relation to maintenance history. Pipeline age, as highlighted by Nguyen et al. (2022), indicates potential degradation, with older pipelines presumably at higher risk of failure. However, age alone may not fully capture the actual condition of the infrastructure. “Years since last inspection” adds the temporal dimension of care and monitoring, implying that regular inspections can significantly alter a pipeline's risk profile regardless of its chronological age. For instance, an older pipeline that is inspected frequently and thoroughly could be in better condition than a newer pipeline that has been neglected. This distinction matters because it shifts the focus from an inherent characteristic (age) to an actionable one (inspection interval). The link between inspection frequency and pipeline integrity is supported by the principle that proactive maintenance can considerably prolong the functional lifespan of assets and pre-empt catastrophic failures.
One notable observation from the XGBC model for RP prediction is that pipe type, original installation year, and population density have little effect on the model, even though these features carry some importance in the RFC model. We believe this reflects how XGBoost builds its ensemble. Gradient boosting adds trees sequentially, each fitted to the residual errors of the ensemble built so far, and the outputs of all trees are summed to form the final prediction. Because each new tree corrects the mistakes of the previous ones, the procedure tends to concentrate on the most influential variables early in the process. Consequently, if pipe type, original installation year, and population density do not offer strong initial gradients, or if their contribution to reducing the error is marginal compared with that of other features, XGBC may assign them low importance. The RFC model, in contrast, relies on bagging (bootstrap aggregating), in which many independently grown trees vote on the outcome; random forests also typically consider only a random subset of features at each split, so weaker features are used whenever stronger ones are not among the candidates. As a result, this model spreads importance more evenly across features, which may explain the higher importance it attributes to pipe type, original installation year, and population density. Similar reasoning applies to the FI models.
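The argument above can be illustrated with a toy, pure-Python sketch. It is not XGBoost or scikit-learn. Both ensembles use depth-one trees (stumps). The boosting loop fits each stump to squared-error residuals of the current ensemble. The forest combines bootstrap samples with one randomly chosen candidate feature per tree. The two-feature data set and all parameter values are hypothetical. Importance is credited to a feature as the reduction in squared error achieved by the splits that use it.

```python
import random


def best_stump(rows, targets, feature):
    """Best single split on `feature` under squared error.

    Returns (gain, threshold, left_mean, right_mean); gain is the drop in the
    sum of squared errors, and threshold is None when no split helps.
    """
    mean = sum(targets) / len(targets)
    total_sse = sum((t - mean) ** 2 for t in targets)
    best = (0.0, None, mean, mean)
    values = sorted({row[feature] for row in rows})
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for row, t in zip(rows, targets) if row[feature] <= thr]
        right = [t for row, t in zip(rows, targets) if row[feature] > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if total_sse - sse > best[0]:
            best = (total_sse - sse, thr, lm, rm)
    return best


def boosting_importance(rows, y, n_rounds=10, learning_rate=0.5):
    """Gradient boosting with stumps: each round fits the current residuals."""
    n_features = len(rows[0])
    importance = [0.0] * n_features
    pred = [sum(y) / len(y)] * len(y)  # start from the mean response
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, pred)]
        # Every feature is a candidate; the stump uses only the single best one.
        stumps = [best_stump(rows, residuals, f) for f in range(n_features)]
        f = max(range(n_features), key=lambda j: stumps[j][0])
        gain, thr, lm, rm = stumps[f]
        if thr is None:
            break
        importance[f] += gain
        pred = [p + learning_rate * (lm if row[f] <= thr else rm) for p, row in zip(pred, rows)]
    return importance


def forest_importance(rows, y, n_trees=200, max_features=1, seed=0):
    """Bagged stumps, each restricted to a random subset of candidate features."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    importance = [0.0] * n_features
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]  # bootstrap sample
        sample_rows, sample_y = [rows[i] for i in idx], [y[i] for i in idx]
        candidates = rng.sample(range(n_features), max_features)  # random feature subset
        stumps = {f: best_stump(sample_rows, sample_y, f) for f in candidates}
        f = max(stumps, key=lambda j: stumps[j][0])
        importance[f] += stumps[f][0]
    return importance


def normalize(importance):
    total = sum(importance)
    return [v / total for v in importance] if total else importance


# Hypothetical data: feature 0 separates the classes perfectly, feature 1 only weakly.
X = [[0.1, 0.5], [0.2, 0.1], [0.3, 0.9], [0.4, 0.3], [0.6, 0.2], [0.7, 0.8], [0.8, 0.6], [0.9, 0.7]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
boost = normalize(boosting_importance(X, y))
forest = normalize(forest_importance(X, y))
print("boosting:", boost, "forest:", forest)
```

On this data, boosting credits all of the importance to feature 0, because it always removes the residual error completely. The forest still gives feature 1 a share whenever feature 1 is the only candidate for a tree. Real implementations differ in tree depth, regularization, and how importance is defined, so this example shows only the direction of the effect.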

Figure 5. Breakdown of feature importance: (a) RP using the RFC model, (b) RP using the XGBC model, (c) FI using the RFC model, and (d) FI using the XGBC model.

Finally, Figure 6 shows the overall feature importance based on eq 1, averaged over all pipelines considered. For both models, years since last inspection has the highest impact, mainly because this feature ranks highest in both the RP and FI models. Taken together, the top two features of the RFC model suggest that most pipelines need to be reinspected and that inspections should start with the larger pipelines. The XGBC model likewise points to prioritizing an inspection schedule, but with inspections ordered by flow type (gravity or force main).
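Equation 1 is not reproduced in this section, so the sketch below assumes that each pipeline already has a per-feature importance vector (for example, its combined RP and FI contributions). It then performs only the averaging step described above, over all pipelines, and ranks the result. Feature names and values are hypothetical.

```python
from collections import defaultdict


def mean_feature_importance(per_pipeline):
    """Average each feature's importance over all pipelines and rank features, highest first.

    A feature missing from a pipeline's dict counts as zero, because the sum is
    divided by the total number of pipelines.
    """
    totals = defaultdict(float)
    for contributions in per_pipeline:
        for feature, value in contributions.items():
            totals[feature] += value
    n = len(per_pipeline)
    means = {feature: total / n for feature, total in totals.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-pipeline importances; not taken from the study.
pipelines = [
    {"years_since_inspection": 0.5, "pipe_size": 0.3, "flow_type": 0.2},
    {"years_since_inspection": 0.6, "pipe_size": 0.1, "flow_type": 0.3},
    {"years_since_inspection": 0.4, "pipe_size": 0.4, "flow_type": 0.2},
]
for feature, mean in mean_feature_importance(pipelines):
    print(f"{feature}: {mean:.3f}")
```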

Figure 6. Breakdown of the feature importance for the OPMN using eq 1. (a) Mean feature importance using the RFC model. (b) Mean feature importance using the XGBC model.

3.4. Asset Management Software Tool

Following the finalization of the machine learning model, an asset management plan (AMP) tool was developed using the Tkinter library in Python. The graphical user interface (GUI) removes the need for programming knowledge and provides a user-friendly interface for the wastewater sector, particularly the ACUA. The GUI is divided into three primary sections (tabs): high-risk pipelines, the pipeline database, and model information. The first section is shown in Figure 7. In this section, users can visualize the top 10 assets with the highest OPMN values together with their locations within the pipeline network. In addition, the “Generate Report” button allows users to generate a detailed Excel report of their wastewater infrastructure network.
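The two actions on this tab, ranking the top 10 assets by OPMN and exporting a report, can be sketched with the standard library alone. This is an illustrative assumption, not the tool's code. The field names (`asset_id`, `location`, `opmn`) and the asset values are hypothetical, and the report is written as CSV in place of the tool's Excel export.

```python
import csv
import heapq
import io


def top_risk_assets(assets, k=10):
    """Return the k assets with the highest OPMN, highest first."""
    return heapq.nlargest(k, assets, key=lambda a: a["opmn"])


def write_report(assets, stream):
    """Write a simple tabular report sorted by OPMN (CSV as a stand-in for Excel)."""
    writer = csv.DictWriter(stream, fieldnames=["asset_id", "location", "opmn"])
    writer.writeheader()
    for asset in sorted(assets, key=lambda a: a["opmn"], reverse=True):
        writer.writerow(asset)


# Hypothetical assets with distinct OPMN values, for illustration only.
assets = [{"asset_id": f"P{i:03d}", "location": f"Segment {i}", "opmn": (i * 37) % 100 / 10}
          for i in range(1, 26)]
top = top_risk_assets(assets)
buffer = io.StringIO()
write_report(top, buffer)
print(buffer.getvalue())
```

`heapq.nlargest` avoids sorting the whole network when only the highest-risk assets are needed, which matters little for small networks but scales well.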

Figure 7. GUI tab showing high-risk pipelines.

Users can view each pipeline in detail in the “Pipeline Database” tab. Clicking on a pipeline displays details such as its location, length, type of flow, material of construction, and years since last inspection, as shown in Figure 8. A pie chart also shows the percentage distribution of the main factors contributing to the RP and FI scores. In addition, this tab allows users to edit asset data, add new assets, and delete existing ones, so that the asset database can be kept up to date.

Figure 8. GUI tab showing detailed information about the individual asset and its corresponding OPMN.

After modifying data or adding or deleting assets, users can rerun the model under the “Model Information” tab to obtain updated predictions, as shown in Figure 9. Users can also add new features to the model or remove existing ones and, by clicking the “Run New Models” button, see how the retrained model performs. Additionally, the confusion matrix, a standard way to inspect how well a classification model performs, can be displayed by opening the dropdown button next to the “Feature Importance” cell and selecting “Confusion Matrix”.
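A confusion matrix counts how often each true class is predicted as each class. The minimal sketch below follows the same convention as scikit-learn's `confusion_matrix` (rows are true classes, columns are predicted classes). The risk labels and predictions are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes and columns are predicted classes, both ordered as in `labels`."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Fraction of predictions on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


# Hypothetical risk classes and predictions, for illustration only.
labels = ["low", "medium", "high"]
y_true = ["low", "low", "medium", "high", "high", "medium"]
y_pred = ["low", "medium", "medium", "high", "medium", "medium"]
matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>6}: {row}")
```

Off-diagonal cells show which classes the model confuses. In this toy example, one low-risk and one high-risk asset are both misclassified as medium risk.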

Figure 9. GUI tab showing the feature importance for the base and updated model.

The asset management tool developed in this study focuses specifically on the pipeline network of wastewater systems, not on the treatment facilities themselves. The model was developed to predict the risk associated with each asset in the network from features such as pipeline condition, flow rates, and installation years, which describe the infrastructure used to transport wastewater. Although the tool was developed with wastewater-specific data, the feature set consists of general pipeline characteristics that are equally relevant to drinking water systems. Whether dealing with wastewater or drinking water pipelines, utilities face similar challenges: aging infrastructure and increasing risk from deteriorating assets. As a result, the model should be readily adaptable to drinking water systems. For example, transitioning the tool to a drinking water system would mainly require replacing the data set with drinking water data, retraining the model, and retuning its hyperparameters, without modifying the underlying machine learning framework.
Although the model performed well on the small data set (see Figure S1 for the RFC model sensitivity analysis), a larger data set could enhance its predictive power and ability to generalize. Machine learning models generally benefit from larger data sets, which let them capture more diverse patterns and variations and improve their accuracy across a range of conditions. Several strategies can be pursued in future work to increase the data set size and further strengthen the reliability of the model. One is to collaborate with other utility companies that manage similar infrastructure: data-sharing agreements could provide access to larger data sets from comparable wastewater systems. From a data engineering perspective, synthetic data generation techniques such as Monte Carlo simulation can produce plausible pipeline feature scenarios and the corresponding condition data. (37,38) Transfer learning is another option for improving model performance on small data sets: the model is pretrained on a larger data set and then fine-tuned on the smaller target data set. (39,40) Thus, while the current data set size is a limitation, these strategies can be implemented in future work to address it.
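
To illustrate the Monte Carlo idea, the sketch below draws synthetic pipeline records from simple distributions and assigns each record a condition label using a toy degradation rule. Every distribution, range, coefficient, field name, and the reference year are hypothetical placeholders. In practice, these would need to be calibrated against utility data or a validated deterioration model before the synthetic records could be used for training.

```python
import math
import random

REFERENCE_YEAR = 2024  # hypothetical "current" year used to compute pipe age


def sample_pipelines(n, seed=0):
    """Draw n synthetic pipeline records by Monte Carlo sampling (toy distributions)."""
    rng = random.Random(seed)
    records = []
    for i in range(n):
        install_year = rng.randint(1950, 2020)
        age = REFERENCE_YEAR - install_year
        years_since_inspection = rng.randint(0, age)  # cannot predate installation
        record = {
            "asset_id": f"SYN{i:04d}",
            "install_year": install_year,
            "years_since_inspection": years_since_inspection,
            "diameter_in": rng.choice([8, 10, 12, 16, 24, 36]),
            "flow_type": rng.choice(["gravity", "force main"]),
            "avg_flow": rng.lognormvariate(0.0, 0.75),  # arbitrary units
        }
        # Toy degradation rule with hypothetical coefficients: older and
        # longer-uninspected pipes are more likely to be in poor condition.
        logit = 0.03 * age + 0.10 * years_since_inspection - 3.0
        p_poor = 1.0 / (1.0 + math.exp(-logit))
        record["poor_condition"] = int(rng.random() < p_poor)
        records.append(record)
    return records


synthetic = sample_pipelines(200, seed=1)
print(sum(r["poor_condition"] for r in synthetic), "of", len(synthetic), "flagged as poor")
```

A seeded generator keeps the synthetic data set reproducible, so models trained on it can be compared across runs.
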
For most of the assets analyzed, “years since last inspection” was the dominant factor, which indicates that more frequent inspections are needed to improve the state of these assets. The average flow rate also contributes significantly to the OPMN, which we attribute to the high volume of wastewater generated by the 14 municipalities served by the ACUA. The data used for this software were provided by the ACUA for its wastewater asset management; however, we believe this strategy can be readily adapted to asset management in other sectors.

4. Conclusions

We developed a machine learning predictive tool to evaluate asset conditions accurately and to support the creation of a comprehensive asset management plan for the wastewater treatment industry. The software application shows that “years since last inspection” dominates the predictions, with pipe size and wastewater flow rate also ranking highly. We attribute the dominance of years since last inspection to the fact that the longer a pipeline goes uninspected, the less is known about the conditions it has experienced. Failures in pipelines carrying higher flows have larger environmental impacts, as they affect more areas and customers, and they usually cost more to repair. Thus, with regular inspection and maintenance, we believe utilities can make well-informed decisions about the state of their assets. The significant influence of wastewater flow rate on the OPMN prediction was anticipated, given seasonal variations in wastewater generation. Using the Tkinter package in Python, we developed a software application that simplifies the asset management process and gives stakeholders access to the model output without requiring programming expertise. This asset management framework, coupled with the software application, enables the ACUA and other stakeholders in the industry to continuously monitor pipeline conditions, mitigate risks, and optimize pipeline rehabilitation costs. This research also highlights the potential of machine learning and computational models for asset management and provides a blueprint for future developments in the water and wastewater treatment industry.

Data Availability

The data and machine learning model are available from the kmygroup GitHub repository at https://github.com/kmygroup/Wastewater-Asset-Management-Project. The repository includes the Python and MySQL source code.

Supporting Information

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsestwater.4c00608.

  • Description of each Python file in the GitHub Repository (PDF)

Author Information

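  • Corresponding Author
    • Kirti Yenkie - Department of Chemical Engineering, Rowan University, Glassboro, New Jersey 08028-1700, United States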
  • Authors
    • Jake Stengel - Department of Chemical Engineering, Rowan University, Glassboro, New Jersey 08028-1700, United States
    • Emmanuel Aboagye - Department of Chemical Engineering, Rowan University, Glassboro, New Jersey 08028-1700, United States; ORCID: https://orcid.org/0000-0002-6090-527X
    • Phuong Le - Department of Chemical Engineering, Rowan University, Glassboro, New Jersey 08028-1700, United States
    • Matt DeNafo - Atlantic County Utility Authorities (ACUA), Atlantic City, New Jersey 08401, United States
    • Dylan Snyder - Department of Chemical Engineering, Rowan University, Glassboro, New Jersey 08028-1700, United States
    • Nathanial Nelson - Department of Chemical Engineering, Rowan University, Glassboro, New Jersey 08028-1700, United States
  • Author Contributions

    CRediT: Jake P Stengel data curation, formal analysis, methodology, software, visualization, writing - original draft; Emmanuel A Aboagye formal analysis, methodology, supervision, writing - review & editing; Phuong Le methodology; Matt DeNafo resources, supervision, validation; Dylan Snyder methodology; Nathaniel Nelson methodology; Kirti Maheshkumar Yenkie conceptualization, funding acquisition, project administration, resources, supervision, writing - review & editing.

  • Notes
    The authors declare no competing financial interest.

Acknowledgments

The authors would like to thank the Department of Chemical Engineering at Rowan University for their resources and support. This work was funded and supported by the Atlantic County Utilities Authority.

References

This article references 40 other publications.

  1. Arif, U.; Javid, M.; Khan, F. N. Productivity Impacts of Infrastructure Development in Asia. Econ. Syst. 2021, 45 (1), 100851, DOI: 10.1016/j.ecosys.2020.100851
  2. Munim, Z. H.; Schramm, H.-J. The Impacts of Port Infrastructure and Logistics Performance on Economic Growth: The Mediating Role of Seaborne Trade. J. Shipp. Trade 2018, 3 (1), 1, DOI: 10.1186/s41072-018-0027-0
  3. ASCE. Infrastructure. American Society of Civil Engineers. https://www.asce.org/infrastructure/ (accessed 2021 April 07).
  4. Kodongo, O.; Ojah, K. Does Infrastructure Really Explain Economic Growth in Sub-Saharan Africa? Rev. Dev. Finance 2016, 6 (2), 105–125, DOI: 10.1016/j.rdf.2016.12.001
  5. Rahbaralam, M.; Modesto, D.; Cardús, J.; Abdollahi, A.; Cucchietti, F. M. Predictive Analytics for Water Asset Management: Machine Learning and Survival Analysis. arXiv 2020, DOI: 10.48550/arXiv.2007.03744
  6. US EPA. Federal Water Pollution Control Act; US EPA, 1972.
  7. Houck, O. A. The Clean Water Act TMDL Program: Law, Policy, and Implementation; Environmental Law Institute, 2002.
  8. Raghuvanshi, S.; Bhakar, V.; Sowmya, C.; Sangwan, K. S. Waste Water Treatment Plant Life Cycle Assessment: Treatment Process to Reuse of Water. Proc. CIRP 2017, 61, 761–766, DOI: 10.1016/j.procir.2016.11.170
  9. Turner, J. San Simeon Community Services District - Estimated WWTP Life Expectancy Analysis; Phoenix Civil Engineering, Inc., 2016.
  10. Ugarelli, R.; Venkatesh, G.; Brattebø, H.; Di Federico, V.; Sægrov, S. Asset Management for Urban Wastewater Pipeline Networks. J. Infrastruct. Syst. 2010, 16 (2), 112–121, DOI: 10.1061/(ASCE)IS.1943-555X.0000011
  11. Mazumder, R. K.; Salman, A. M.; Li, Y.; Yu, X. Performance Evaluation of Water Distribution Systems and Asset Management. ASCE Libr. 2018, 24 (3), 24, DOI: 10.1061/(ASCE)IS.1943-555X.0000426
  12. ASCE. Wastewater: ASCE’s 2021 Infrastructure Report Card. https://infrastructurereportcard.org/cat-item/wastewater/ (accessed 2021 June 08).
  13. Bakri, B.; Arai, Y.; Inakazu, T.; Koizumi, A.; Yoda, H.; Pallu, S. Selection and Concentration of Pipeline Mains for Rehabilitation and Expansion of Water Distribution Network. Procedia Environ. Sci. 2015, 28, 732–742, DOI: 10.1016/j.proenv.2015.07.086
  14. ASCE. Infrastructure Report Card, 2017. https://www.infrastructurereportcard.org/cat-item/wastewater/.
  15. ASCE. A Comprehensive Assessment of America’s Infrastructure. https://infrastructurereportcard.org/wp-content/uploads/2020/12/National_IRC_2021-report.pdf.
  16. Physical Asset Management. In Physical Asset Management; Hastings, N. A. J., Ed.; Springer: London, 2010; pp 83–88, DOI: 10.1007/978-1-84882-751-6_7
  17. Allbee, S.; Rose, D. Fundamentals of Asset Management Session 0 - Executive Overview; US EPA, Mar 2017. https://www.epa.gov/sustainable-water-infrastructure/asset-management-workshops-training-slides.
  18. Eiber, B. Overview of Integrity Assessment Methods for Pipelines; Washington Cities and Counties Pipeline Safety Consortium, 2003.
  19. Mitchell, C. “WOOO – PIG – SOOIE!” - The Business of Pipeline Integrity II. https://rbnenergy.com/node/688 (accessed 2020 December 15).
  20. Baah, K.; Dubey, B.; Harvey, R.; McBean, E. A Risk-Based Approach to Sanitary Sewer Pipe Asset Management. Sci. Total Environ. 2015, 505, 1011–1017, DOI: 10.1016/j.scitotenv.2014.10.040
  21. Fenner, R. A. Approaches to Sewer Maintenance: A Review. Urban Water 2000, 2 (4), 343–356, DOI: 10.1016/S1462-0758(00)00065-0
  22. Nguyen, L. V.; Bui, D. T.; Seidu, R. Comparison of Machine Learning Techniques for Condition Assessment of Sewer Network. IEEE Access 2022, 10, 124238–124258, DOI: 10.1109/ACCESS.2022.3222823
  23. Winkler, D.; Haltmeier, M.; Kleidorfer, M.; Rauch, W.; Tscheikner-Gratl, F. Pipe Failure Modelling for Water Distribution Networks Using Boosted Decision Trees. Struct. Infrastruct. Eng. 2018, 14 (10), 1402–1411, DOI: 10.1080/15732479.2018.1443145
  24. Fontecha, J. E.; Agarwal, P.; Torres, M. N.; Mukherjee, S.; Walteros, J. L.; Rodríguez, J. P. A Two-Stage Data-Driven Spatiotemporal Analysis to Predict Failure Risk of Urban Sewer Systems Leveraging Machine Learning Algorithms. Risk Anal. 2021, 41 (12), 2356–2391, DOI: 10.1111/risa.13742
  25. Zhang, Y. New Advances in Machine Learning; BoD – Books on Demand, 2010.
  26. Hammond, P.; Suttie, M.; Lewis, V. T.; Smith, A. P.; Singer, A. C. Detection of Untreated Sewage Discharges to Watercourses Using Machine Learning. Npj Clean Water 2021, 4 (1), 1–10, DOI: 10.1038/s41545-021-00108-3
  27. Torregrossa, D.; Leopold, U.; Hernández-Sancho, F.; Hansen, J. Machine Learning for Energy Cost Modelling in Wastewater Treatment Plants. J. Environ. Manage. 2018, 223, 1061–1067, DOI: 10.1016/j.jenvman.2018.06.092
  28. Mashford, J.; Marlow, D.; Tran, D.; May, R. Prediction of Sewer Condition Grade Using Support Vector Machines. J. Comput. Civ. Eng. 2011, 25 (4), 283–290, DOI: 10.1061/(ASCE)CP.1943-5487.0000089
  29. Salman, B.; Salem, O. Risk Assessment of Wastewater Collection Lines Using Failure Models and Criticality Ratings. J. Pipeline Syst. Eng. Pract. 2012, 3 (3), 68–76, DOI: 10.1061/(ASCE)PS.1949-1204.0000100
  30. Syachrani, S.; Jeong, H. D.; Chung, C. S. Advanced Criticality Assessment Method for Sewer Pipeline Assets. Water Sci. Technol. 2013, 67 (6), 1302–1309, DOI: 10.2166/wst.2013.003
  31. NJDEP. Asset Management Technical Guidance. https://www.nj.gov/dep/assetmanagement/pdf/asset-management-plan-guidance.pdf (accessed 2023 October 29).
  32. EFC, N. M. T. Asset Management: A Guide For Water and Wastewater Systems. https://dec.vermont.gov/sites/dec/files/dwgwp/capacitydev/pdf/EFC%20Asset%20Management%20Guide%202006.pdf (accessed 2023 October 29).
  33. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12 (85), 2825–2830
  34. Buitinck, L.; Louppe, G.; Blondel, M.; Pedregosa, F.; Mueller, A.; Grisel, O.; Niculae, V.; Prettenhofer, P.; Gramfort, A.; Grobler, J. API Design for Machine Learning Software: Experiences from the Scikit-Learn Project. 2013, DOI: 10.48550/arXiv.1309.0238
  35. ACUA. ACUA Bader Field Sewer Leak Contained, Ultimate Repair Underway. http://www.acua.com/newsItem.aspx?id=8562 (accessed 2020 December 15).
  36. Bergstra, J.; Yamins, D.; Cox, D. Hyperopt: A Python Library for Optimizing the Hyperparameters of Machine Learning Algorithms; SciPy: Austin, TX, 2013; pp 13–19, DOI: 10.25080/Majora-8b375195-003
  37. Młyński, D.; Bugajski, P.; Młyńska, A. Application of the Mathematical Simulation Methods for the Assessment of the Wastewater Treatment Plant Operation Work Reliability. Water 2019, 11 (5), 873, DOI: 10.3390/w11050873
  38. Hawari, A.; Alkadour, F.; Elmasry, M.; Zayed, T. Simulation-Based Condition Assessment Model for Sewer Pipelines. J. Perform. Constr. Facil. 2017, 31 (1), 04016066, DOI: 10.1061/(ASCE)CF.1943-5509.0000914
  39. Niu, S.; Liu, Y.; Wang, J.; Song, H. A Decade Survey of Transfer Learning (2010–2020). IEEE Trans. Artif. Intell. 2020, 1 (2), 151–166, DOI: 10.1109/TAI.2021.3054609
  40. Hosna, A.; Merry, E.; Gyalmo, J.; Alom, Z.; Aung, Z.; Azim, M. A. Transfer Learning: A Friendly Introduction. J. Big Data 2022, 9 (1), 102, DOI: 10.1186/s40537-022-00652-w

  • References


    This article references 40 other publications.

    1. 1
      Arif, U.; Javid, M.; Khan, F. N. Productivity Impacts of Infrastructure Development in Asia. Econ. Syst. 2021, 45 (1), 100851,  DOI: 10.1016/j.ecosys.2020.100851
    2. 2
      Munim, Z. H.; Schramm, H.-J. The Impacts of Port Infrastructure and Logistics Performance on Economic Growth: The Mediating Role of Seaborne Trade. J. Shipp. Trade 2018, 3 (1), 1,  DOI: 10.1186/s41072-018-0027-0
    3. 3
      ASCE. Infrastructure. American Society of Civil Engineers. https://www.asce.org/infrastructure/. accessed 2021 April 07.
    4. 4
      Kodongo, O.; Ojah, K. Does Infrastructure Really Explain Economic Growth in Sub-Saharan Africa?. Rev. Dev. Finance 2016, 6 (2), 105125,  DOI: 10.1016/j.rdf.2016.12.001
    5. 5
      Rahbaralam, M.; Modesto, D.; Cardús, J.; Abdollahi, A.; Cucchietti, F. M. Predictive Analytics for Water Asset Management: Machine Learning and Survival Analysis. arXiv , 2020 DOI: 10.48550/arXiv.2007.03744 .
    6. 6
      US EPA Federal water pollution control actUS EPA1972
    7. 7
      Houck, O. A. The Clean Water Act TMDL Program: law, Policy, and Implementation; Environmental Law Institute, 2002.
    8. 8
      Raghuvanshi, S.; Bhakar, V.; Sowmya, C.; Sangwan, K. S. Waste Water Treatment Plant Life Cycle Assessment: Treatment Process to Reuse of Water. Proc. CIRP 2017, 61, 761766,  DOI: 10.1016/j.procir.2016.11.170
    9. 9
      Turner, J. San Simeon Community Services Districe - Estimated WWTP Life Expectancy Analysis. Phoenix Civil Engineering, Inc. 2016.
    10. 10
      Ugarelli, R.; Venkatesh, G.; Brattebø, H.; Di Federico, V.; Sægrov, S. Asset Management for Urban Wastewater Pipeline Networks. J. Infrastruct. Syst. 2010, 16 (2), 112121,  DOI: 10.1061/(ASCE)IS.1943-555X.0000011
    11. 11
      Mazumder, R. K.; Salman, A. M.; Li, Y.; Yu, X. Performance Evaluation of Water Distribution Systems and Asset Management. ASCE Libr. 2018, 24 (3), 24,  DOI: 10.1061/(ASCE)IS.1943-555X.0000426
    12. 12
      ASCE Wastewater Asce’s 2021 Infrastructure Report Card. https://infrastructurereportcard.org/cat-item/wastewater/. accessed 2021 June 08.
    13. 13
      Bakri, B.; Arai, Y.; Inakazu, T.; Koizumi, A.; Yoda, H.; Pallu, S. Selection and Concentration of Pipeline Mains for Rehabilitation and Expansion of Water Distribution Network. Procedia Environ. Sci. 2015, 28, 732742,  DOI: 10.1016/j.proenv.2015.07.086
    14. 14
      ASCE Infrastructure Report Card , 2017. https://www.infrastructurereportcard.org/cat-item/wastewater/.
    15. 15
      ASCE A Comprehensive Assessment of America’s Infrastructure. A Comprehensive Assessment of America’s Infrastructure, https://infrastructurereportcard.org/wp-content/uploads/2020/12/National_IRC_2021-report.pdf.
    16. 16
      Physical Asset Managment. Physical Asset Management. Hastings, N. A. J.;eds. Springer: London. 2010, pp. 8388.  DOI: 10.1007/978-1-84882-751-6_7 .
    17. 17
      Allbee, S.; Rose, D. Fundamentals of Asset Management Session 0 - Exectutive Overview. US EPA Mar 2017. https://www.epa.gov/sustainable-water-infrastructure/asset-management-workshops-training-slides.
    18. 18
      Eiber, B. Overview of Integrity Assessment Methods for Pipelines; Washington Cities and Counties Pipeline Safety Consortium, 2003.
    19. 19
      Mitchell, C. WOOO – PIG – SOOIE!” - The Business of Pipeline Integrity II. https://rbnenergy.com/node/688. accessed 2020 December 15.
    20. 20
      Baah, K.; Dubey, B.; Harvey, R.; McBean, E. A Risk-Based Approach to Sanitary Sewer Pipe Asset Management. Sci. Total Environ. 2015, 505, 10111017,  DOI: 10.1016/j.scitotenv.2014.10.040
    21. 21
      Fenner, R. A. Approaches to Sewer Maintenance: A Review. Urban Water 2000, 2 (4), 343356,  DOI: 10.1016/S1462-0758(00)00065-0
    22. 22
      Nguyen, L. V.; Bui, D. T.; Seidu, R. Comparison of Machine Learning Techniques for Condition Assessment of Sewer Network. IEEE Access 2022, 10, 124238124258,  DOI: 10.1109/ACCESS.2022.3222823
    23. 23
      Winkler, D.; Haltmeier, M.; Kleidorfer, M.; Rauch, W.; Tscheikner-Gratl, F. Pipe Failure Modelling for Water Distribution Networks Using Boosted Decision Trees. Struct. Infrastruct. Eng. 2018, 14 (10), 14021411,  DOI: 10.1080/15732479.2018.1443145
    24. 24
      Fontecha, J. E.; Agarwal, P.; Torres, M. N.; Mukherjee, S.; Walteros, J. L.; Rodríguez, J. P. A Two-Stage Data-Driven Spatiotemporal Analysis to Predict Failure Risk of Urban Sewer Systems Leveraging Machine Learning Algorithms. Risk Anal. 2021, 41 (12), 23562391,  DOI: 10.1111/risa.13742
    25. 25
      Zhang, Y. New Advances in Machine Learning; BoD – Books on Demand, 2010.
    26. 26
      Hammond, P.; Suttie, M.; Lewis, V. T.; Smith, A. P.; Singer, A. C. Detection of Untreated Sewage Discharges to Watercourses Using Machine Learning. Npj Clean Water 2021, 4 (1), 110,  DOI: 10.1038/s41545-021-00108-3
    27. 27
      Torregrossa, D.; Leopold, U.; Hernández-Sancho, F.; Hansen, J. Machine Learning for Energy Cost Modelling in Wastewater Treatment Plants. J. Environ. Manage. 2018, 223, 10611067,  DOI: 10.1016/j.jenvman.2018.06.092
    28. 28
      Mashford, J.; Marlow, D.; Tran, D.; May, R. Prediction of Sewer Condition Grade Using Support Vector Machines. J. Comput. Civ. Eng. 2011, 25 (4), 283290,  DOI: 10.1061/(ASCE)CP.1943-5487.0000089
    29. 29
      Salman, B.; Salem, O. Risk Assessment of Wastewater Collection Lines Using Failure Models and Criticality Ratings. J. Pipeline Syst. Eng. Pract. 2012, 3 (3), 6876,  DOI: 10.1061/(ASCE)PS.1949-1204.0000100
    30. 30
      Syachrani, S.; Jeong, H. D.; Chung, C. S. Advanced Criticality Assessment Method for Sewer Pipeline Assets. Water Sci. Technol. 2013, 67 (6), 13021309,  DOI: 10.2166/wst.2013.003
    31. 31
      NJDEP. Asset Management Technical Guidance. https://www.nj.gov/dep/assetmanagement/pdf/asset-management-plan-guidance.pdf. accessed 2023 October 29).
    32. 32
      EFC, N. M. T. Asset Management: A Guide For Water and Wastewater Systems. https://dec.vermont.gov/sites/dec/files/dwgwp/capacitydev/pdf/EFC%20Asset%20Management%20Guide%202006.pdf. accessed 2023 October 29.
    33. 33
      Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12 (85), 28252830
    34. 34
      Buitinck, L.; Louppe, G.; Blondel, M.; Pedregosa, F.; Mueller, A.; Grisel, O.; Niculae, V.; Prettenhofer, P.; Gramfort, A.; Grobler, J. API Design for Machine Learning Software: Experiences from the Scikit-Learn Project. 2013,  DOI: 10.48550/arXiv.1309.0238 .
    35. 35
      ACUA. ACUA Bader Field Sewer Leak Contained, Ultimate Repair Underway. Bader Field Sewer Leak Contained, Ultimate Repair Underway. http://www.acua.com/newsItem.aspx?id=8562. accessed 2020 December 15.
    36. 36
      Bergstra, J.; Yamins, D.; Cox, D. Hyperopt: A Python Library for Optimizing the Hyperparameters of Machine Learning Algorithms; SciPy: Austin: TX, 2013; pp 1319. DOI:  DOI: 10.25080/Majora-8b375195-003 .
    37. 37
      Młyński, D.; Bugajski, P.; Młyńska, A. Application of the Mathematical Simulation Methods for the Assessment of the Wastewater Treatment Plant Operation Work Reliability. Water 2019, 11 (5), 873,  DOI: 10.3390/w11050873
    38. 38
      Hawari, A.; Alkadour, F.; Elmasry, M.; Zayed, T. Simulation-Based Condition Assessment Model for Sewer Pipelines. J. Perform. Constr. Facil. 2017, 31 (1), 04016066,  DOI: 10.1061/(ASCE)CF.1943-5509.0000914
    39. 39
      Niu, S.; Liu, Y.; Wang, J.; Song, H. A Decade Survey of Transfer Learning (2010–2020). IEEE Trans. Artif. Intell. 2020, 1 (2), 151166,  DOI: 10.1109/TAI.2021.3054609
    40. 40
      Hosna, A.; Merry, E.; Gyalmo, J.; Alom, Z.; Aung, Z.; Azim, M. A. Transfer Learning: A Friendly Introduction. J. Big Data 2022, 9 (1), 102,  DOI: 10.1186/s40537-022-00652-w