Data-Driven Approach to Grade Change Scheduling Optimization in a Paper Machine

This paper proposes an efficient decision support tool for the optimal production scheduling of a variety of paper grades in a paper machine. The tool is based on a continuous-time scheduling model and generalized disjunctive programming. As the full-space scheduling model corresponds to a large-scale mixed integer linear programming model, we apply data analytics techniques to reduce the size of the decision space, which has a profound impact on the computational efficiency of the model and enables us to support the solution of large-scale problems. The data-driven model is based on an automated method of identifying the forbidden and recommended paper grade sequences, as well as the changeover durations between two paper grades. The results from a real industrial case study show that the data-driven model leads to good results in terms of both solution quality and CPU time in comparison to the full-space model.


INTRODUCTION
Modern industrial production processes are typically complex, multistage processes, and much research has been conducted to improve their efficiency using analytical techniques such as mathematical optimization. The application scope for optimization ranges from large-scale supply network planning to short-term production scheduling of individual machines. On the broader scope, Papageorgiou 1 provides a review of supply chain management and optimization. For short to medium terms, Maravelias and Sung 2 review the methodologies for production planning and scheduling.
Scheduling can be considered a key element of enterprise-wide optimization 3 and has been an area of intense research activity over the last 20 years. The major feature in the modeling of process-oriented scheduling problems is the time representation, which can be discrete or continuous. 4,5 This subject has been widely studied in many scheduling applications 6−10 and concerns the division of the time horizon of interest into a specific number of slots. The discrete-time representation divides the time horizon into slots of fixed duration, whereas the continuous-time representation relaxes this assumption. The major advantages of discrete-time approaches over their continuous-time counterparts are their simplicity and a tighter LP relaxation. 11 However, because discrete-time approaches need many more time slots to represent the data accurately, and therefore generate larger problems, the continuous-time representation is used in this paper.
The objective of this work is to propose an optimization framework for single-stage production scheduling of paper reels in a paper machine producing several paper grades with sequence-dependent changeovers. Changeovers occur when two different tasks are processed in the same unit in consecutive process operations. The changeovers are often associated with changing the operating conditions or with the cleaning of the equipment. 12−15 Oh and Karimi 12 presented a novel mixed integer nonlinear programming (MINLP) formulation to solve the single machine economic lot-scheduling problem on a single multiproduct production facility. Liu et al. 13 developed a hybrid discrete−continuous-time mixed integer linear programming (MILP) model for the medium-term planning of a single-stage multiproduct plant with one processing unit. In their model, the planning horizon is composed of a discrete number of weeks with a known demand, and each week is modeled by a continuous-time approach.
The problems occurring in the paper industry share similarities with many other production scheduling problems. For example, when changing the paper grade being produced on a paper machine, the production process does not stop, but the paper being produced during the transition may not meet customer quality criteria and is typically converted back to pulp for reuse. The time required to make the transition between the products depends on the starting and ending grades of the grade change. These sequence-dependent changeover times, or costs, are a common feature of paper production scheduling models.
Keskinocak et al. 16 developed a relatively comprehensive scheduling approach for paper production on a company scale, including order allocation to mills in addition to the actual production scheduling (including sequence-dependent setup times) and trimming on individual machines. Santos and Almada-Lobo 17 addressed the combination of paper production scheduling with sequence-dependent setup times and material flows in an integrated pulp and paper mill with a two-week planning horizon. Examples of sequence-dependent setup costs can also be found outside the paper industry. For example, Tang et al. 18 presented a scheduling solution for the iron and steel industry using a multiple traveling salesman approach.
This article has been motivated by a real-world application from the Finnish papermaking industry. The specific goal is to explore how historical data can be used to narrow down the decision space of a rigorous scheduling model addressing the sequence-dependent changeovers. The contribution of this work is threefold. First, we develop a continuous-time MILP model for the production scheduling of different paper grades in a real-world paper machine. The model allows the consideration of multiple intermediate due dates in the production process. Second, we use automatic methods to derive the changeover durations between two paper grades as well as to define the forbidden and recommended changeovers. Third, we use the past "best practices" to guide the optimization in order to reduce the search space of the full-space MILP model. We show that by combining data analytics and rigorous scheduling models, more efficient decisions can be made while incorporating the rule-based operations into the optimization model.
In the next section, we give a brief description of the papermaking process. Section 3 develops a novel MILP formulation based on generalized disjunctive programming 19−23 for the production scheduling of different paper grades on a paper machine. Section 4 applies data analytics techniques to extract information from historical data to support the full-space model developed in Section 3. Section 5 provides numerical examples to validate the proposed decision support tool, and conclusions are given in Section 6.

PROBLEM STATEMENT
Figure 1 presents a schematic overview of the papermaking process. 24,25 First, the paper machine is fed with pulp, the main ingredient of paper, to produce large rolls of paper called jumbo reels, which are often of fixed width for each paper machine. The jumbo reels then continue to another machine, called the winder, to be cut into rolls of smaller dimensions. The rolls either go straight to the customer or are cut into sheets at the cutters. In the end, the final packaged products are dispatched to customers or distribution centers. In this paper, we focus only on the production of different grades of jumbo reels, which are typically around 8−10 m wide and 25−35 km long.
Because of market requirements and global competition, more paper grades are manufactured nowadays, and consequently frequent grade changes are inevitable in the paper machine. A grade change is a transition from one set of operating conditions (e.g., speed, stock flow, and steam pressure) to another and takes from a few minutes to several hours, depending on the two grades involved. Because the paper machine is never idle, paper is also produced during the transition; it may not be sellable for quality reasons and is typically recycled into the pulper. Therefore, minimizing the grade change transition time can lead to a significant reduction in production loss and in the effort and energy needed to recycle off-spec waste.
Given the number of jumbo reels that need to be produced during a specific time horizon, we aim to sequence the production of paper grades in the paper machine while minimizing the grade change transitions. This objective function is challenging to deal with because the optimum of its linear relaxation is typically zero, often far away from the actual MILP solution. It is worth mentioning that, in practice, paper machine operators usually use a cyclic scheduling strategy to meet customer orders. Consequently, historical process data can be used to reduce the domain of variables and constraints, which has a profound impact on the CPU time of the full-space MILP model.
Industrial & Engineering Chemistry Research

MATHEMATICAL FORMULATION
We now present a continuous-time MILP formulation for production scheduling of jumbo reels in a paper mill. Such an MILP model (hereafter referred to as the full-space model) is capable of finding the optimal sequence of paper grades in a paper machine in a single step.
3.1. Sequencing Production Runs. We adopt a continuous-time formulation with the common reference grid given in Figure 2. It is made of a set of chronologically ordered production runs $r \in R = \{1, 2, \ldots, |R|\}$. Therefore, run $r$ must be performed after run $r-1$. As the continuous variable $SR_r$ represents the starting time of production run $r$ and $LR_r$ its duration, we have eq 1. Equation 2 enforces that all production runs are located within the given time horizon $h$. Note that the start time of the first run is a known datum, that is, $SR_1 = 0$.
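As a hedged sketch, the two conditions described in words above can be written as follows. This is an assumption derived from the verbal description (ordering of runs, containment in the horizon), not a reproduction of eqs 1 and 2; the transition time between runs is treated separately in Section 3.3.

```latex
\begin{align}
SR_r &\ge SR_{r-1} + LR_{r-1} && \forall r \in R,\ r > 1 \\
SR_r + LR_r &\le h && \forall r \in R
\end{align}
```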
3.2. Size and Length of Each Run. Let the Boolean variable $X_{i,r}$ = True indicate that run $r$ is processing grade $i$, and let $X_r^{\text{no } i}$ = True if no grade is being processed during run $r$. In the former case, the processing of grade $i$ is associated with the production of a certain number of jumbo reels ($V_{i,r}$). In the latter case, the variables $V_{i,r}$ are equal to zero. Notice that the disjunction in eq 3 is exclusive, meaning that at most one grade can be processed during each run $r \in R$.
Equations 4 and 5 are obtained using the convex hull reformulation of disjunction 3. Note that the binary variable $x_{i,r}$, which has a one-to-one relationship with the Boolean variable $X_{i,r}$, replaces $X_{i,r}$ to transform disjunction 3 into an MILP representation.
Equation 6 computes the length of a run $r \in R$ processing grade $i$.
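A minimal sketch of the form such constraints commonly take is given below. The bounds $v^{\min}$ and $v^{\max}$ (minimum and maximum number of reels per run) and the per-reel processing time $pt_i$ are notation assumed here for illustration; the block shows a standard convex hull of a "produce grade $i$ or nothing" disjunction and should not be read as the exact eqs 4−6.

```latex
\begin{align}
v^{\min}\, x_{i,r} \le V_{i,r} &\le v^{\max}\, x_{i,r} && \forall i \in I,\ r \in R \\
\sum_{i \in I} x_{i,r} &\le 1 && \forall r \in R \\
LR_r &= \sum_{i \in I} pt_i\, V_{i,r} && \forall r \in R
\end{align}
```

With this form, $x_{i,r} = 0$ forces $V_{i,r} = 0$, and a run with all $x_{i,r} = 0$ has zero length.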
The number of production runs ($|R|$) is a tuning parameter of the proposed model and must be postulated by the user. Because the resulting problem for $|R|$ is a relaxation of the one for $|R| - 1$, a typical procedure is to increase the cardinality of the set $R$ by one until no better schedule is discovered. In case the number of predefined runs is greater than the optimal number of runs to be performed, the problem will result in some dummy runs ($X_r^{\text{no } i} = 1$ or $\sum_{i \in I} x_{i,r} = 0$). To reduce the symmetry, dummy runs should be confined to the end of the ordered set $R$, see eq 7.
This can be reformulated into eq 8.
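The procedure of increasing $|R|$ until no better schedule is found can be sketched as a small driver loop. The callback `solve` is hypothetical: it stands for building and solving the MILP with a given number of runs and returning the best objective value, or `None` when the model is infeasible or returns no solution. The numbers in the usage lines are the Ex 1 cost values and the Ex 2 data-driven makespan values reported in Section 5.

```python
from typing import Callable, Optional, Tuple


def search_number_of_runs(
    solve: Callable[[int], Optional[float]],
    r_min: int,
    r_max: int,
    tol: float = 1e-6,
) -> Tuple[Optional[int], Optional[float]]:
    """Increase |R| one run at a time until the objective stops improving.

    `solve(n_runs)` is a hypothetical callback returning the best objective
    value (to be minimized) for a model with n_runs runs, or None if the
    model is infeasible or no solution was found.
    """
    best_runs, best_obj = None, None
    for n_runs in range(r_min, r_max + 1):
        obj = solve(n_runs)
        if obj is None:
            if best_obj is None:
                continue  # nothing feasible yet: keep adding runs
            break  # a solution exists already and this size gave nothing
        if best_obj is None or obj < best_obj - tol:
            best_runs, best_obj = n_runs, obj
        else:
            break  # no better schedule discovered: stop
    return best_runs, best_obj


# Usage with objective values quoted in Section 5 (lookup tables stand in
# for actual solver calls).
ex1_cost = {14: 41961.1, 15: 40213.8, 16: 40213.8}
ex2_makespan = {16: 290.48, 17: 290.48}  # 14 and 15 runs are infeasible
result_ex1 = search_number_of_runs(ex1_cost.get, 14, 16)
result_ex2 = search_number_of_runs(ex2_makespan.get, 14, 17)
```

The loop stops at the first size that brings no improvement, which mirrors the "confirm with one more run" practice used in the examples.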

3.3. Grade Change Transition and Forbidden Grade Changes. When two consecutive runs process two different grades, a transition period occurs between the runs. If grade $i$ has been processed at run $r-1$, and run $r$ is processing grade $i'$, there will be a transition time $TT_r$ (= $\tau_{i,i'}$) between the two runs. These conditions are expressed by logic proposition 9 and disjunction 10. The reformulation of logic proposition 9 and the convex hull reformulation of disjunction 10 result in constraints 11 and 12. Note that constraints 11 and 12 can be replaced by the single inequality constraint 13, obtained by combining eqs 11 and 12.
Note that the binary variable $x^{GChange}_{i,i',r}$ in eqs 11 and 12 can be treated as a continuous 0−1 variable. This is because (i) when both $x_{i,r-1}$ and $x_{i',r}$ equal one, constraint 11 forces $x^{GChange}_{i,i',r}$ to be 1, and (ii) when at least one of the variables $x_{i,r-1}$ and $x_{i',r}$ is zero, $x^{GChange}_{i,i',r}$ takes the value zero at the optimum. The latter holds because the variable $TT_r$ has a profound impact on the value of the objective functions in eqs 29 and 30, which are minimized, so there is no incentive to keep $x^{GChange}_{i,i',r}$ above zero. Except in Ex 2, we use constraints 11 and 12 in our implementations.
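The argument for relaxing $x^{GChange}_{i,i',r}$ can be checked by enumeration. The sketch below assumes that constraint 11 has the standard linking form $x^{GChange}_{i,i',r} \ge x_{i,r-1} + x_{i',r} - 1$ (an assumption, since the constraint is only described in words here) and computes the smallest value in [0, 1] that satisfies it, which is what a minimizing objective would select.

```python
from itertools import product


def min_gchange(x_prev: int, x_next: int) -> float:
    """Smallest value in [0, 1] with x_gchange >= x_prev + x_next - 1."""
    return max(0.0, float(x_prev + x_next - 1))


# Enumerate all binary assignments of the two run-assignment variables.
gchange_table = {
    (a, b): min_gchange(a, b) for a, b in product((0, 1), repeat=2)
}
```

For every binary pair the minimal feasible value is itself 0 or 1 and equals the product of the two assignment variables, so declaring the variable continuous does not change the solution under this assumed form.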
The start time of run $r$ must come after the end of the previous run $r-1$ and the subsequent changeover operation between the two runs (eq 14). Because of prohibitively long transition times, some grades cannot be processed in consecutive runs. This condition is imposed by logic proposition 15, which is reformulated into eq 16.
3.4. Inventory and Due Date Constraints. Let us suppose that the horizon of interest is partitioned into a number of time periods $t \in T = \{t0, t1, t2, \ldots\}$ and $ep_t$ is the end of period $t$ (note that $ep_{t0} = 0$, see Figure 3). The aim is to produce a predefined number of reels of each grade $i$ during each time period. Clearly, every run that is executed must complete in exactly one of the time periods. If run $r$ is completed during time period $t$ ($Y_{r,t}$ = True), its completion time $SR_r + LR_r$ should satisfy $ep_{t-1} \le SR_r + LR_r \le ep_t$. These conditions can be described through disjunction 17. The convex hull reformulation of the disjunction gives rise to the due date timing constraints 18−20. Note that a run that starts in period $t$ need not be completed in the same period. This is because eqs 19 and 20 apply only to the completion time ($SR_r + LR_r$) of a run $r$ and not to its start time ($SR_r$), meaning that $y_{r,t} = 1$ only implies $ep_{t-1} \le SR_r + LR_r \le ep_t$. For instance, run 4 processing grade 17 in Figure 3 starts during the first time period and is completed in time period 2.
Alternative due date timing constraints can be developed using the big-M constraints in eqs 21−23. They state that if run $r$ is processing a grade, it should complete in one of the time periods and its completion time should satisfy $ep_{t-1} \le SR_r + LR_r \le ep_t$. If run $r$ is a dummy run, its completion time is relaxed. As constraints 18−20 result in a tighter linear relaxation, they are used in our implementations.
The inventory balance at the end of each time period is defined in eq 24, stating that the number of reels of grade $i$ at the end of time period $t$ ($IV_{i,t}$) is equal to the number of reels of grade $i$ at the end of period $t-1$ plus the number of reels produced during time period $t$ minus the number sent to the downstream processor ($nw_{i,t}$). To satisfy the market demand on time, the number of jumbo reels of grade $i$ sent to the downstream processor should be exactly equal to $nw_{i,t}$, the number of jumbo reels of grade $i$ that needs to be produced during period $t$. The initial inventory of each grade is a known parameter, as stated by eq 25.
Note that the bilinear term $V_{i,r}\, y_{r,t}$ (= $VY_{i,r,t}$) can simply be linearized by eqs 26−28.
Figure 3 is a simple example illustrating constraints 18−25. The time horizon $h$ = 72 h has been divided into three time periods t1, t2, and t3, each lasting one day (24 h). There are a total of 8 production runs during the scheduling horizon, and each run is completed in exactly one time period. Let us assume that, from the planning level, we need 6, 8, and 24 jumbo reels of grade 7 and 0, 20, and 0 of grade 17 during periods t1, t2, and t3, respectively. As can be seen from Figure 3, 14 reels of grade 7 are produced during the first period and none during the second one. The initial inventory of grade 7 is zero, and during the first period 6 reels of it should be sent to the downstream processor. Thus, at time 24 h, the inventory of grade 7 will be 8 reels, which are used in the second period. The production of grade 7 is restarted at time point 48 h to satisfy 20 units of its demand needed during the third period. Run 4, processing grade 17, starts during period t1 and finishes within the second period t2. Note, however, that we can only count the reels of grade 17 at the end of run 4, which happens within the second period, meaning that the inventory of grade 17 at the end of period t1 will be 0. In other words, the 20 units of grade 17 produced during run 4 will only be available at the end of the second period.
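The inventory bookkeeping in this example can be replayed with a few lines of code. The sketch below rolls the balance "inventory at end of period = previous inventory + reels completed in the period − reels sent downstream" forward; the function name and argument layout are illustrative assumptions. It covers grade 17 over all three periods and grade 7 over the first two periods, for which the text states both production and demand.

```python
from typing import List


def inventory_profile(
    initial: int, produced: List[int], demand: List[int]
) -> List[int]:
    """Roll IV_t = IV_{t-1} + produced_t - demand_t forward over the periods.

    `produced[t]` counts reels of runs that are completed in period t, since
    reels only become available when their run ends.
    """
    levels = []
    level = initial
    for made, sent in zip(produced, demand):
        level = level + made - sent
        if level < 0:
            raise ValueError("demand cannot be met in this period")
        levels.append(level)
    return levels


# Grade 7, periods t1 and t2: 14 reels produced in t1, demands of 6 and 8.
grade7_levels = inventory_profile(0, produced=[14, 0], demand=[6, 8])
# Grade 17: run 4 ends in t2, so its 20 reels are credited to t2.
grade17_levels = inventory_profile(0, produced=[0, 20, 0], demand=[0, 20, 0])
```

Crediting the 20 reels of grade 17 to period t2, where run 4 ends, is what keeps the inventory at the end of t1 at zero.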

3.5. Objective Function. Two alternative objective functions are considered: minimum makespan in eq 29 and minimum cost in eq 30, the latter including the inventory cost, the grade change transition cost, and the cost of executing a production run.
For the minimization of makespan, the continuous variable $MS$ should be an upper bound on the end time of the last production run ($r = |R|$), as given in eq 31.
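A hedged sketch of how these objectives could be written is given below. The cost coefficients $c^{inv}_i$, $c^{tr}$, and $c^{run}$ are hypothetical symbols introduced for illustration; the block follows the verbal description (inventory cost, transition cost, cost per executed run) and is not a reproduction of eqs 29−31.

```latex
\begin{align}
\min \;& MS, \qquad MS \ge SR_{|R|} + LR_{|R|} \\
\min \;& \sum_{i \in I} \sum_{t \in T} c^{inv}_{i}\, IV_{i,t}
       + c^{tr} \sum_{r \in R} TT_r
       + c^{run} \sum_{i \in I} \sum_{r \in R} x_{i,r}
\end{align}
```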

DATA ANALYTICS
Historical process data contains information that is useful for defining the future operation of the process. The paper machine we consider in this work produces 20 groups of grades, each of which has a specific paper weight. Further, each of these groups contains various grades with different colors, brightness, coating material, and so on. However, in order to keep the size of the problem manageable, we only consider the groups of grades and refer to them simply as grades. In this section, we analyze a dataset recorded on the paper machine during one year of operation, from the beginning of July 2017 to the end of June 2018. We extract two types of information from the dataset: (i) the number of pairwise grade transition occurrences and (ii) their average duration. We consider eight signals, which correspond to paper weight, moisture, and coating weight measurements in different parts of the paper machine.
A historical dataset of such a process may be either labeled (i.e., the grade identifiers are recorded) or unlabeled. In this work, we consider the worse of these situations, that is, an unlabeled dataset. We demonstrate that useful information can still be extracted using data clustering, even if the grade identifiers are unknown. In the following, we describe our signal processing approach to identify the grade transition periods (Section 4.1) and our clustering approach to summarize the information into useful tables (Section 4.2).
4.1. Signal Processing. The dataset contains both measurement and set point signals for each of the measured quantities. The set point signal is the target value at which the control system aims to keep the measurement signal. The sampling interval of the measurement signals is around 30 s. However, the samples are not synchronized and contain idle periods during which no measurements are recorded. We apply the Kalman filter to the measurement signal in order to obtain a representative continuous signal. Figure 4 shows the measurement, its Kalman-filtered version, and the set point signal for all eight measured quantities during a representative operation window of 5 h. For Kalman filtering and general data processing, we use the Python modules Pykalman 26 (version 0.9.5) and Pandas 27 (version 0.23.0), respectively.
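To illustrate the idea of filtering an irregular measurement signal, the following is a minimal scalar Kalman filter in plain Python. It is a stand-in written for illustration and does not call Pykalman; the random-walk state model and the variances `process_var` and `meas_var` are assumptions, not the settings used in the study. Missing samples are represented by `None` and are bridged by the prediction step alone.

```python
from typing import List, Optional


def kalman_filter_1d(
    measurements: List[Optional[float]],
    process_var: float = 1e-3,
    meas_var: float = 1.0,
) -> List[Optional[float]]:
    """Scalar Kalman filter with a random-walk state model.

    A None entry marks a missing sample; the filter then only predicts,
    which carries the last estimate across the gap.
    """
    estimate: Optional[float] = None
    variance = 0.0
    filtered: List[Optional[float]] = []
    for z in measurements:
        if estimate is None:
            if z is not None:
                # Initialize the state with the first available sample.
                estimate, variance = z, meas_var
        else:
            variance += process_var  # predict: uncertainty grows
            if z is not None:
                gain = variance / (variance + meas_var)
                estimate += gain * (z - estimate)  # update with the sample
                variance *= 1.0 - gain
        filtered.append(estimate)
    return filtered


# Usage: a gap in an otherwise constant signal is bridged.
bridged = kalman_filter_1d([80.0, None, 80.0])
# Usage: a sudden jump is followed only partially after one sample.
step = kalman_filter_1d([0.0, 10.0])
```

A larger `process_var` makes the filtered signal follow the measurements more closely, while a larger `meas_var` smooths more strongly.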
Our procedure to identify the transition periods from these signals is the following. First, for each measured signal, we flag time periods where the absolute difference between the Kalman-filtered measurement and the set point signal exceeds an anomaly threshold of $r_a$ = 5%, that is, 1.96 times the standard deviation. In addition, we flag all time periods where the set point signal remains unchanged for less than a predefined minimum duration of a batch, $t_{batch}$. We use the value $t_{batch}$ = 30 min in this work, which the operators of the paper machine indicate as the minimum time to produce a single grade. Figure 4 shows the flagged time periods for the measured quantities.
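The first flagging rule can be sketched as follows. The sketch assumes that the standard deviation in question is that of the deviation between the filtered measurement and the set point over the analyzed window; the text does not spell this out, so this is an interpretation. The second rule (set point segments shorter than $t_{batch}$) is not covered here.

```python
from statistics import pstdev
from typing import List


def flag_anomalies(
    filtered: List[float], setpoint: List[float], z: float = 1.96
) -> List[bool]:
    """Flag samples whose deviation from the set point exceeds z sigma.

    Assumption: sigma is the population standard deviation of the
    deviation signal over the whole window passed in.
    """
    deviations = [f - s for f, s in zip(filtered, setpoint)]
    sigma = pstdev(deviations)
    return [abs(d) > z * sigma for d in deviations]


# Usage: twenty on-target samples followed by one large excursion.
flags = flag_anomalies([80.0] * 20 + [90.0], [80.0] * 21)
```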
Second, the time periods at which any of the signals is flagged are marked as potential transition periods (see the second-to-last subplot of Figure 4). Here, as the sampling of the measurement signals is not synchronized, the missing values are forward filled. In reality, all signals might temporarily lie within the anomaly threshold $r_a$ soon after the set point change but exceed the threshold after a short time (cf., e.g., a controlled variable overshoot). Therefore, we merge two potential transition periods into one if they are separated by less than a settling time of $t_{settle}$ = 2 min. In other words, the end of a potential transition period is marked as the time point from which onward no flagging occurs for 2 min.
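The merging rule can be sketched as a standard interval merge. Periods are given as hypothetical `(start, end)` pairs in minutes, and two periods are joined when the gap between them is shorter than the settling time.

```python
from typing import List, Tuple

Period = Tuple[float, float]


def merge_periods(periods: List[Period], t_settle: float = 2.0) -> List[Period]:
    """Merge flagged periods separated by less than t_settle (same units)."""
    merged: List[Period] = []
    for start, end in sorted(periods):
        if merged and start - merged[-1][1] < t_settle:
            # Gap shorter than the settling time: extend the previous period.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Usage: the 1 min gap is bridged, the 11 min gap is not.
potential = merge_periods([(0.0, 5.0), (6.0, 9.0), (20.0, 25.0)])
```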
Potential transition periods may include instances where a disturbance, rather than an intended grade change, causes the flagging. In order to exclude such instances, we finally mark as transition periods only those potential transition periods during which the set point of the representative paper weight signal changes (see the last subplot of Figure 4). The procedure avoids falsely marking short-term disturbances as transition periods. An example of such an instance is the first potential transition period in Figure 4. It is worth noticing that, in our approach, if a disturbance causes flagging within the settling time $t_{settle}$ of a transition period, it is included in the transition period. Such a disturbance, occurring soon after a grade transition, might well be caused by the transition. Therefore, its inclusion in the transition period is appropriate.
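This last filtering step can be sketched as below, assuming the set point changes of the representative paper weight signal are available as a list of time stamps (a hypothetical input format, in the same units as the periods).

```python
from typing import List, Tuple

Period = Tuple[float, float]


def confirm_transitions(
    potential: List[Period], setpoint_change_times: List[float]
) -> List[Period]:
    """Keep only periods that contain at least one set point change."""
    return [
        (start, end)
        for start, end in potential
        if any(start <= t <= end for t in setpoint_change_times)
    ]


# Usage: the first period contains no set point change (a disturbance),
# the second one does (a grade change).
transitions = confirm_transitions([(0.0, 9.0), (20.0, 25.0)], [22.0])
```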
Thus, our signal processing approach involves three hyperparameters, which need to be chosen before analyzing a dataset: the anomaly threshold $r_a$, the minimum batch duration $t_{batch}$, and the settling time $t_{settle}$.
4.2. Data Clustering. As a result of the signal processing described in the previous section, we obtain the start and end times of the transition periods. The time windows in between the transition periods are referred to as production periods. We enumerate the production periods between the transition periods and determine the average value of the representative paper weight during each period. We then cluster these values using k-means clustering 28 with an a priori defined number of 20 clusters. The k-means clustering is implemented using the Python module Scikit-learn 29 (version 0.19.1). Figure 5 visualizes the clustered production periods, according to their representative paper weight, as well as the centers of the clusters.
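For illustration, one-dimensional k-means (Lloyd's algorithm) can be written in a few lines of plain Python. This is a stand-in and does not call Scikit-learn; the deterministic initialization that spreads the starting centers over the sorted values is an assumption made here so that the example is reproducible. The paper weights in the usage lines are made-up values, not data from the study.

```python
from typing import List, Tuple


def kmeans_1d(
    values: List[float], k: int, iterations: int = 100
) -> Tuple[List[float], List[int]]:
    """Cluster scalar values into k groups with Lloyd's algorithm."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: spread the initial centers over the sorted values.
    centers = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: nearest center for every value.
        labels = [
            min(range(k), key=lambda j: abs(v - centers[j])) for v in values
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(
                sum(members) / len(members) if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Usage with made-up average paper weights of nine production periods.
weights = [60.1, 59.9, 60.0, 80.2, 79.8, 80.0, 100.0, 100.1, 99.9]
centers, labels = kmeans_1d(weights, k=3)
```

Each production period then inherits the label of its cluster as a grade identifier.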
We then assign each transition its source and destination grades. The grade identifiers are arranged in increasing order of representative paper weight. Finally, we filter out transitions with the same source and destination grade or with an unrealistically long duration (>3 h). With this procedure, we are able to identify 498 grade transitions in the dataset. Figures 6 and 7 show matrices of transition occurrences from a source to a destination grade and their average durations, respectively.
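The construction of the occurrence and average-duration tables can be sketched as follows. Transitions are given as hypothetical `(source, destination, hours)` triples; the two filters correspond to the rules stated above (same source and destination, or longer than 3 h). The sample records are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def transition_tables(
    transitions: List[Tuple[int, int, float]], max_hours: float = 3.0
) -> Tuple[Dict[Pair, int], Dict[Pair, float]]:
    """Count transitions per (source, destination) and average their duration."""
    counts: Dict[Pair, int] = defaultdict(int)
    total: Dict[Pair, float] = defaultdict(float)
    for source, dest, hours in transitions:
        if source == dest or hours > max_hours:
            continue  # filtered out: not a grade change, or unrealistically long
        counts[(source, dest)] += 1
        total[(source, dest)] += hours
    averages = {pair: total[pair] / counts[pair] for pair in counts}
    return dict(counts), averages


# Usage with invented records.
records = [(1, 3, 0.5), (1, 3, 1.0), (3, 1, 0.25), (2, 2, 0.1), (1, 5, 4.0)]
occurrences, avg_hours = transition_tables(records)
```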

4.3. Constraints Derived from Data Analysis.
The main goal of this work is to show how data analytics methods can support the solution of large-scale scheduling models. Figure 8a shows all possible sequences between grades (some sequences are forbidden), whereas Figure 8b is a directed graph based on the grade change table in Figure 6. It can be seen that historical data help to transform a dense graph into a sparse one by omitting a significant number of links (sequences) between grades.
Taking into consideration the sparse graph, one can see that if, for instance, run $r$ is processing grade 1, then the next run should process either grade 3 or grade 5. If we let $I_i$ denote the set of grades that can follow grade $i$, constraint 33, which is derived from logic proposition 32, can be added to the problem formulation.
By incorporating constraint 33 in the full-space model, we can guarantee that no forbidden sequence will occur during the production sequence. Note that constraint 33 also ensures that any dummy runs are placed at the start of the production sequence. Moreover, constraint 33 enables us to reduce the domain of variable $x^{GChange}_{i,i',r}$, replacing $i' \in I$ by $i' \in I_i$.
Note that, again, constraints 34 and 35 can be replaced by the single inequality constraint 36. Except in Ex 2, we use constraints 34 and 35 in our implementations.
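The successor sets $I_i$ can be derived directly from an occurrence table. The sketch below uses a small invented table in which grade 1 is followed only by grades 3 and 5, as in the example above, and compares the number of (source, destination) pairs in the dense and sparse graphs. The counts are illustrative, not values from Figure 6.

```python
from typing import Dict, Iterable, Set, Tuple


def successor_sets(
    occurrences: Dict[Tuple[int, int], int], grades: Iterable[int]
) -> Dict[int, Set[int]]:
    """Build I_i: the grades observed to follow grade i at least once."""
    successors: Dict[int, Set[int]] = {g: set() for g in grades}
    for (source, dest), count in occurrences.items():
        if count > 0:
            successors[source].add(dest)
    return successors


# Usage with an invented occurrence table for three grades.
grades = [1, 3, 5]
observed = {(1, 3): 4, (1, 5): 2, (3, 1): 1, (5, 3): 3}
allowed = successor_sets(observed, grades)
dense_pairs = len(grades) * (len(grades) - 1)  # all ordered pairs, i != i'
sparse_pairs = sum(len(s) for s in allowed.values())
```

The grade change variables and the associated constraints are only generated for the pairs in the sparse graph, which is where the reduction in model size comes from.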

COMPUTATIONAL RESULTS
To illustrate the capabilities of the proposed data-driven model, a case study based on an industrial-size test case with different demand scenarios has been considered. In all cases, the paper machine produces 20 different types of paper grades (G1−G20), and the processing time of a jumbo reel of paper is about 16.67 min ($pt_i$ = 0.2778 h). The MILP models were implemented in GAMS 24.9.1 and solved with CPLEX 12.7.1 running in parallel deterministic mode using up to four threads. The hardware consisted of an Intel i5-7300U (2.60 GHz, 8 GB of RAM) running Windows 10, 64-bit operating system. The termination criteria were either a relative optimality tolerance of 10^−6 or a maximum computational time of 3 h for the data-driven model and 5 h for the full-space model.
Remark: Our observation of the historical data indicates that no two consecutive runs have processed the same grade, which is why the diagonal elements of the grade change table in Figure 6 are zero. This limitation, which is not considered in the full-space model, is relaxed for the data-driven model in Ex 1 and Ex 4 by replacing 0 with 1 in the diagonal of the grade change table in Figure 6.
5.1. Ex 1. In this example, the aim is to manufacture a certain number of jumbo reels during a horizon of 300 h. The number of jumbo reels of each grade that should be produced is given in Table 1. The grade change transition duration between two grades (based on Figure 6) is given in Table 2. For a pair of grades with no link between them in the sparse graph in Figure 8b, the grade transition time is set to 3 h. Some grade transitions, indicated with the cross mark (×) in Table 2, are forbidden. We assume that the minimum and maximum numbers of jumbo reels that can be produced during a process run are 4 and 240, respectively. Moreover, the fixed cost of performing a run is $1000, and the cost resulting from a 1 h transition is $10,000. The unit inventory cost is set to zero. Note also that in this example we allow two consecutive runs to process the same grade, which amounts to replacing 0 with 1 in the diagonal of the occurrence matrix in Figure 6.
Here, we first solve Ex 1 for makespan minimization using the data-driven model with both the big-M and convex hull reformulations of the due date timing constraints. As can be seen from Table 1, 14 paper grades should be manufactured. Because each run can process one grade at a time, the minimum number of process runs is 14 ($|R| \ge 14$). The results in Table 3 indicate that both big-M and convex hull need the same number of discrete and continuous variables as well as constraints for the same number of process runs. However, the convex hull reformulation performs quite well when we use constraints 34 and 35 instead of eq 36. The convex hull also exhibits a tighter LP relaxation, as expected. In the remainder of this paper, we use the convex hull type of due date timing constraints.
Table 4 shows the model statistics and computational results for the full-space and data-driven models. For makespan minimization, it can be observed that both the full-space and data-driven models yield the same solution with 14 runs and confirm it with 15 runs. However, the data-driven model is much faster, that is, about three times faster with 14 runs and six times faster with 15 runs. Both formulations need the same number of binary variables, but there is a considerable difference in the number of continuous variables and constraints. This is because, in the data-driven model, the set $I$ in the domain of the changeover time constraint 11 and of the variable $x^{GChange}_{i,i',r}$ is replaced by $I_i$, which avoids unnecessary continuous variables and constraints.
For cost minimization, both models generate the same cost of $41,961.1 with 14 runs in a reasonable CPU time. The data-driven model performs quite well with 15 processing runs, being able to improve the solution to $40,213.8 in 231.2 s. This represents a one-order-of-magnitude time saving compared to the full-space model, which needs about 2 h to reduce the cost to $40,213.8. To prove the solution optimality, both models are solved again with 16 runs. From Table 4, it follows that the same optimum is found with both models. However, the data-driven model is roughly two times faster than the full-space model.
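The reported costs can be decomposed with simple arithmetic. With a zero inventory cost, a fixed cost of $1000 per run, and $10,000 per hour of transition (the Ex 1 parameters), the total transition time implied by a cost value can be backed out. The sketch assumes that every postulated run is actually executed (no dummy runs), which is an assumption made for this illustration and not a statement from the results tables.

```python
RUN_COST = 1000.0  # fixed cost per executed run ($), from Ex 1
TRANSITION_COST_PER_HOUR = 10000.0  # cost of a 1 h transition ($), from Ex 1


def schedule_cost(n_runs: int, transition_hours: float) -> float:
    """Total cost when the unit inventory cost is zero."""
    return RUN_COST * n_runs + TRANSITION_COST_PER_HOUR * transition_hours


def implied_transition_hours(total_cost: float, n_runs: int) -> float:
    """Back out the total transition time from a reported cost."""
    return (total_cost - RUN_COST * n_runs) / TRANSITION_COST_PER_HOUR


# Assumes all 14 (respectively 15) runs are executed.
hours_14 = implied_transition_hours(41961.1, 14)
hours_15 = implied_transition_hours(40213.8, 15)
```

Under this assumption, the extra run costs $1000 but shortens the total transition time by roughly a quarter of an hour, which is why the 15-run schedule is cheaper overall.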
Figure 9 depicts optimal schedules for Ex 1 using the data-driven and full-space models. The sequences of grades in both models when minimizing cost are identical (see the middle part of Figure 9). When minimizing the makespan, the grade sequences are quite different but correspond to the same makespan and total changeover transition time. This indicates that the problem can suffer from symmetric solutions.
5.2. Ex 2. Ex 1 is tackled again in Ex 2, but this time we add an intermediate due date for demand at time 168 h. Therefore, Ex 2 considers two due dates for the grades to be manufactured, the first at 168 h and the second at 300 h. The number of jumbo reels of each grade that should be produced in each period is given in Table 5. All the values presented in Table 5 (also in Table 7) are generated based on historical data. In this
Table 6 shows the model statistics and computational results of Ex 2 for the full-space and data-driven models. For makespan minimization, we again start by solving the models with 14 runs. As can be seen from Table 6, setting $|R|$ = 14 leads to an infeasible MILP using the data-driven model and returns no solution in 5 h using the full-space model. The same situation is observed when considering 15 process runs.
When the number of process runs is increased to 16, both models yield roughly the same feasible solution. The data-driven model, however, is very fast, finding a makespan of 290.48 in 680 s, that is, one order of magnitude faster than the full-space model, which cannot close the optimality gap within 5 h. With 17 runs, the data-driven model generates the same makespan as with 16 runs, but the CPU time increases by a factor of two. However, the full-space model with 17 runs did find a slightly better solution (289.93 vs 290.48) by the maximum time limit. This is because, unlike the data-driven model, the full-space model is allowed to process the same grade in two consecutive runs. This is apparent from Figure 10, where the full-space model for minimum makespan processes grade 13 in two consecutive runs, 11 and 12, while the data-driven model processes the same grade in run 4 and run 11.
For the cost minimization, the full-space model generates a solution of $58,536.1 with 16 runs but cannot close the

Table 8 shows the model statistics and computational results of Ex 3. When choosing 19 runs, the solver is unable to return a feasible solution using the full-space model within the computational time limit of 36,000 CPU s. The data-driven model finds a makespan of 355.39 and a cost of $45,844.44 within 3 h of computational time. With 20 runs, the full-space model again fails to generate any feasible solution for the problem. No improvement is observed using the data-driven model with 20 runs. The best-found schedules for Ex 3 using the data-driven model with 19 runs are depicted in Figure 11.
5.4. Ex 4: Sensitivity Analysis on the Number of Due Dates. In this example, we analyze the impact of the number of demand points (due dates) on the computational effort of the data-driven model. The aim is to produce 976 jumbo reels at minimum cost. We consider seven scenarios for the production of 976 jumbo reels during a time horizon of two weeks featuring multiple due dates ($|T|$ = 1, 2, ..., 7). Data for each scenario are presented in Tables S1−S7 (see the Supporting Information). The initial inventory is set to zero for all grades except grades 19 and 20, for which it is 70 and 60, respectively. Table 9 shows the computational results for Ex 4. When there is only one due date ($|T|$ = 1), the data-driven model requires 10 production runs to generate the first feasible solution, worth $27,783.3. By adding an intermediate due date at time point 168 h, the minimum number of runs rises to 13 and the CPU time increases to 1939.7 s. When demands need to be considered at three due dates ($|T|$ = 3), the data-driven model cannot close the optimality gap within 3 h of CPU time. This also happens when the number of due dates is four or seven ($|T|$ = 4, 7). Overall, it can be seen from Table 9 that both the solution quality and the CPU time depend strongly on how the required number of jumbo reels (976) is distributed among the different due dates.

CONCLUSIONS
This paper has presented a data-driven decision support tool for the production scheduling of different paper grades in a paper machine. Some of the problem constraints were first modeled using disjunctive programming and propositional logic and then translated into an MILP model that relies on a continuous-time representation. We identified pairwise grade transition occurrences and their durations from a historical dataset. The former led to constraints appended to the full-space model, which reduce the domain of the variables and have a profound impact on the CPU time of the full-space model. The data-driven model was illustrated by four real-world test problems and was shown to be very efficient, allowing the solution of a large-scale problem for which the full-space model fails to generate even a feasible solution. In future work, the proposed data-driven model will be generalized to tackle the long-term planning and scheduling problem arising at an integrated pulp and paper mill.
Seven scenarios for the production of 976 jumbo reels during a time horizon of two weeks featuring multiple due dates (PDF)

Figure 1. An overview of the papermaking process.

Figure 4. Example time window of transition period identification.

Figure 5. Clustering of production batches based on the average value of the representative paper weight signal. The centers of the clusters are marked by vertical black ticks. The colors indicate the points that belong to the same cluster.

Figure 6. Occurrences of changes from a source to a destination grade.

Figure 7. Average transition durations from a source to a destination grade.

Figure 8. Data analysis helps to omit a significant number of links between grades.

Figure 9. Schedules for Ex 1 using data-driven and full-space models.

Figure 10. Schedules for Ex 2 using data-driven and full-space models.

Table 1. Number of Jumbo Reels Needed during the Scheduling Horizon

Table 3. Results of Ex 1 for Minimum Makespan Using Big-M and Convex Hull

Table 4. Computational Results for Ex 1. a Relative gap.

Table 5. Number of Jumbo Reels Needed during Each Period in Ex 2

Table 6. Computational Results for Ex 2. a Relative gap. b Run 17 is dummy at termination.
Department of Chemical and Metallurgical Engineering, School of Chemical Engineering, Aalto University, 02150 Espoo, Finland; ABB Power Grids Research, 68309 Mannheim, Germany; Email: iiro.harjunkoski@aalto.fi

Binary variables
$x_{i,r}$: 1 if grade $i$ is being produced during run $r$
$x^{GChange}_{i,i',r}$: 1 if grade $i$ is changed to grade $i'$ during run $r$ (can be treated as a continuous 0−1 variable, see Subsection 3.3)
$y_{r,t}$: 1 if run $r$ ends during period $t$

Discrete variable
$VY_{i,r,t}$: number of jumbo reels of grade $i$ produced by run $r$ during period $t$

Continuous variables
$LR_r$: duration of run $r$ (h)
$SR_r$: start time of run $r$ (h)
$TT_r$: transition time between run $r-1$ and $r$
$V_{i,r}$: number of jumbo reels of grade $i$ produced during run $r$
$IV_{i,t}$: number of jumbo reels of grade $i$ at the end of period $t$
$MS$: makespan