Using design of experiment and Python for GaN growth experiment

Feb 26, 2021

For some reason, throughout my graduate study I was blissfully unaware of the method called “Design of Experiments”, or DOE for short. It wasn’t until later, when I was looking for a job, that I found out that most openings at semiconductor foundries and production lines require candidates to be familiar with DOE principles.

Since I was curious about the whole thing, I did a quick Google search to find out more. After watching an informative YouTube video from Gemba Academy, I realized that DOE is very useful even for someone doing academic research like me.

The idea

The main idea behind DOE is to characterize and optimize a system (a reactor, a sputtering system, growing plants, or even baking a cake) in a systematic way. The usual way to optimize a system’s output is to decide on a set of baseline parameters and change one factor at a time (OFAT) to see how each parameter affects the output. From there, we get some idea of which set of parameters gives the best outcome.

While this method may work, it doesn’t take into account the interactions between parameters. For example, if we vary temperature and pressure one at a time, we may miss a synergistic effect that only shows up when both are changed together.

Ignoring interactions also makes optimization harder, since we are trying to find the parameters that maximize (or minimize) our output. For a system with many parameters, OFAT scales poorly, so optimization quickly becomes impractical.

If we have an idea of how our system works, we may try to optimize the process by tweaking some parameters, maybe one or two at a time, and choosing the parameters for the next experiment based on the current result. This is called the best-guess approach. While it can give reasonable results (I have to admit that throughout my PhD I kind of did my optimization this way), it doesn’t guarantee the best solution, and we may end up changing parameters over and over without succeeding.

This is illustrated in the figure below for an experiment with two factors, X1 and X2. We change X1 until we get the best output, then start tweaking X2. We then iterate, changing X2 until we get the best output before changing X1 again. In the end, we don’t reach the optimum parameters that give the best result, and the procedure becomes too time-consuming with three or more factors.

Process optimization using OFAT and best guessing
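To make the figure’s point concrete, here is a small sketch of OFAT on a made-up response surface. The function `response`, the integer levels 0 to 10, and the starting point are all hypothetical, chosen only so that a strong X1·X2 interaction creates a narrow diagonal ridge; they say nothing about the MSE system. Each OFAT step tests every level of one factor while the other is held fixed.

```python
import itertools

LEVELS = range(0, 11)  # hypothetical: each factor can only be set to 0, 1, ..., 10

def response(x1, x2):
    """Hypothetical response with a strong X1*X2 interaction: a narrow ridge
    along x1 == x2 whose peak (y = 0) sits at (6, 6)."""
    return -20 * (x1 - x2) ** 2 - (x1 + x2 - 12) ** 2

def ofat(start, max_rounds=10):
    """Alternate between the factors, keeping the best level of one while the other is fixed."""
    x1, x2 = start
    for _ in range(max_rounds):
        new_x1 = max(LEVELS, key=lambda v: response(v, x2))
        new_x2 = max(LEVELS, key=lambda v: response(new_x1, v))
        if (new_x1, new_x2) == (x1, x2):
            break  # no single-factor change improves the output any more
        x1, x2 = new_x1, new_x2
    return x1, x2

ofat_best = ofat((0, 0))
grid_best = max(itertools.product(LEVELS, LEVELS), key=lambda p: response(*p))
print("OFAT stops at", ofat_best, "with y =", response(*ofat_best))
print("True optimum at", grid_best, "with y =", response(*grid_best))
```

Under these assumptions, OFAT stalls at (1, 1) with y = -100, while the best combination is (6, 6) with y = 0: climbing the diagonal ridge requires both factors to move together, which OFAT never tries.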

If you are interested in learning the procedure in detail, I really recommend the Coursera course Experimentation for Improvement, where the theory is discussed in more depth and there are code-along sessions in R. Since I already know Python and learning R is not my top priority right now, I searched around a bit and found out how to do the same thing in Python. Another consideration is that Python is free, unlike commercial tools such as Minitab, which can be costly if you don’t already have access to them.

For a deeper understanding of the theory, I’m using the book “Design and Analysis of Experiments” by Douglas C. Montgomery.

The system in question

As my guinea pig, I’m using the deposition system that I’m currently in charge of: the magnetron sputtering epitaxy (MSE) chamber, Nidhögg.

Nidhögg system, belonging to Linköping University’s Thin Film Physics Division

The system is an ultra-high vacuum sputter epitaxy system for the deposition of GaN. If you are interested in learning more about it, here’s a review paper (which I wrote, wink wink) about GaN sputtering in general. Currently, I’m trying to figure out which set of parameters gives the highest GaN deposition rate.

Screening experiment

We begin the DOE process by probing the characteristics of the system with a screening experiment. Ideally, with k factors we need to perform 2^k experiments (a full factorial design) to properly identify the strength of all the main effects and all the interaction terms between them.
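A quick sketch of how fast the full factorial grows with the number of factors, and what a half-factorial saves. `full_factorial` is a hypothetical helper written with the standard library, not part of any DOE package.

```python
import itertools

def full_factorial(k):
    """Return every combination of coded levels -1 (low) and +1 (high) for k factors."""
    return list(itertools.product((-1, 1), repeat=k))

for k in range(2, 7):
    full_runs = len(full_factorial(k))
    print(f"{k} factors: full factorial = {full_runs} runs, half-factorial = {full_runs // 2} runs")
```

For the four factors used below, this gives 16 runs for the full factorial and 8 for the half-factorial.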

For the initial screening, however, we can reduce the number of experiments we perform at the cost of reduced resolution. In DOE, you can roughly think of resolution as how well your model can tell the individual effects apart: at lower resolution, more main effects and interactions are aliased (confounded) with one another. While we want the highest resolution possible, in real-world conditions experiments can be costly and time-consuming. It is therefore a tradeoff: is the increased resolution worth the additional cost or not?

Factors, level selection, and coded level

For the MSE experiment, I would like to investigate how each process parameter affects the growth rate of the GaN film.

First, I select the factors I would like to investigate and choose two levels for each factor: one low and one high. Note that in a screening experiment it’s a good idea to include as many factors as possible, even though this reduces the resolution. In my case, the factors and levels are:

Total chamber pressure: 10 mTorr and 20 mTorr
Percentage of nitrogen in process gas: 50% and 70%
Substrate bias: 75 V and 125 V
Ga target power: 10 W and 16 W

These are the four factors I would like to investigate. I could have included substrate temperature in this experiment, but I didn’t, since one of the goals of MSE is low-temperature growth. Instead, the temperature is kept constant at 700 °C for all experiment runs.

For the level selection, the levels should be far enough apart for their effects to stand out above the system noise, but close enough together to avoid nonlinearities (for the initial screening we will fit the system response with a linear model). Another consideration is, of course, the limits of your system. In my case, I know that at around 20 W the liquid Ga target forms bubbles, resulting in system instabilities, which is why I set the high level of target power at 16 W.

Next, we code all the low and high levels as -1 and +1. This puts every factor on the same scale, so that we can compare the relative importance of each factor within this parameter range. For example, chamber pressure ranges from 10 mTorr to 20 mTorr, so 10 mTorr is -1 and 20 mTorr is +1. In addition, 15 mTorr would be 0, and 5 mTorr would be -2. I don’t feel like writing a formal formula, but I hope you get what I’m trying to say 😛.
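For reference, the coding described above is a linear map: subtract the midpoint of the low and high levels and divide by half the range, i.e. coded = (x − (x_high + x_low)/2) / ((x_high − x_low)/2). A minimal sketch in plain Python; the helper names are hypothetical.

```python
def to_coded(actual, low, high):
    """Map an actual setting to coded units: low -> -1, high -> +1, midpoint -> 0."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return (actual - center) / half_range

def to_actual(coded, low, high):
    """Inverse map from coded units back to the actual setting."""
    return (high + low) / 2 + coded * (high - low) / 2

# Chamber pressure: low = 10 mTorr, high = 20 mTorr
for pressure in (10, 15, 20, 5):
    print(f"{pressure} mTorr -> {to_coded(pressure, 10, 20):+.0f}")
```

This reproduces the values in the text (-1, 0, +1 and -2), and the same functions work for any factor, e.g. 13 W is the coded 0 for target power between 10 W and 16 W.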

The half factorial table

We begin by making a half-factorial table. As mentioned earlier, a full factorial with four factors requires 16 experiment runs. For the initial trial, I decided to perform a half-factorial with 8 runs instead. If a full factorial turns out to be needed, we can simply add the remaining 8 runs (the other half-fraction) to the experiment plan.

The table is set up so that every combination of levels of three of the factors is run. In a half-factorial design, the fourth factor is not varied independently: its coded value is the product of the other three factors’ coded values.
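The sketch below builds the same kind of half-fraction by hand with NumPy, to show the idea behind what dexpy produces: start from the full 2^3 factorial in the first three factors (A, B, C) and set the fourth column to D = ABC. The letters A to D stand for nitrogen composition, substrate bias, pressure and target power; the run order here may differ from dexpy’s standard order. The same construction also shows the price of halving the runs: pairs of two-factor interactions end up with identical columns (aliased), so their effects cannot be separated.

```python
import itertools
import numpy as np

names = ["A", "B", "C", "D"]  # A = N2 composition, B = substrate bias, C = pressure, D = target power

# Full 2^3 factorial in A, B, C (coded -1/+1), then the generator D = ABC
base = np.array(list(itertools.product((-1, 1), repeat=3)))
design = np.hstack([base, base.prod(axis=1, keepdims=True)])
print(design)

# Coded column of every two-factor interaction
interactions = {
    a + b: design[:, i] * design[:, j]
    for (i, a), (j, b) in itertools.combinations(enumerate(names), 2)
}

# Two terms are aliased when their columns are identical over all runs
aliases = [
    (p, q)
    for p, q in itertools.combinations(interactions, 2)
    if np.array_equal(interactions[p], interactions[q])
]
print(aliases)
```

This prints the pairs AB/CD, AC/BD and AD/BC. Because D = ABC, the defining relation is I = ABCD, which makes this a resolution IV design: main effects are not aliased with two-factor interactions, but two-factor interactions are aliased with each other. Appending an all-zero baseline run does not change this, since it only adds a zero to every interaction column.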

To implement DOE using Python, I’m using the DOE package dexpy for the factorial design, and statsmodels for calculating the main effects and the interaction terms.

import pandas as pd
import matplotlib.pyplot as plt
import dexpy.factorial
import dexpy.alias

# build the screening experiment: 4 factors in 2**3 = 8 runs (a half fraction)
gan_growth = dexpy.factorial.build_factorial(4, 2**3)
gan_growth.columns = ['N2_composition', 'Substrate_bias', 'Pressure', 'Target_power']

# regression formula: all main effects plus all two-factor interactions
eq = "(" + " + ".join(gan_growth.columns) + ")**2"

# baseline (center point): every factor at coded level 0
baseline = pd.DataFrame([[0, 0, 0, 0]], columns=gan_growth.columns)

# append the baseline to the factorial table
# (DataFrame.append was removed in pandas 2.0, so use pd.concat instead)
gan_growth = pd.concat([gan_growth, baseline], ignore_index=True)
gan_growth

From this code, we will get the following table:

Notice that the value for target power is the product of nitrogen composition, substrate bias, and pressure. In addition, I appended the baseline run (all coded values set to 0).

For experiment planning, it is useful to see the table in actual units:

import dexpy.design

# actual (uncoded) low and high levels for each factor
actual_lows = {'N2_composition': 0.5, 'Substrate_bias': 75, 'Pressure': 10, 'Target_power': 10}
actual_highs = {'N2_composition': 0.7, 'Substrate_bias': 125, 'Pressure': 20, 'Target_power': 16}
actual_design = dexpy.design.coded_to_actual(gan_growth, actual_lows, actual_highs)
actual_design

This gives us the table with the actual parameter values:

Half-factorial design with real parameter values

The last step before performing the experiments is to randomize the run order. This helps ensure that the results are not confounded by factors outside the design. Notice that the experiments with standard order 4–7 all use the high (+1, or 70%) nitrogen composition. If we don’t randomize the run order, the effect of nitrogen composition will be confounded with any drift that develops as we use the chamber, e.g. sputter target depletion or deteriorating chamber condition.

# shuffle all rows (frac=1 samples 100% of them) to get a random run order
experiment_order = actual_design.sample(frac=1)
experiment_order

Actual experiment order after randomization

The experiment order looks random enough now, so we will go ahead with it. Note that the baseline run (no. 8) is typically performed first, before the rest of the experiments.

Determining the main effects and interaction terms

Running the half-factorial experiments, including the thickness measurements, took me about two weeks. This highlights the efficiency of a half-factorial design: a full factorial would have taken almost a month. We then obtain the following table:

Half-factorial design table including the result

Now we fit a linear regression to estimate the strength of each factor and of any interactions between the factors. For a two-factor experiment, the model is as follows:

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{12} X_1 X_2 + \varepsilon$$

where y is the output, β0 is the intercept term (calculated by averaging all the results), β1 is the coefficient of factor X1, β2 is the coefficient of factor X2, β12 is the coefficient of the interaction between X1 and X2, and ε is the error term. This equation can be expanded to a k-factor experiment, including the interactions between factors. Interactions among three or more factors are typically negligible, so we can choose to ignore them.
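For intuition about what the regression computes, here is a minimal NumPy sketch for the two-factor model on a 2^2 design. The responses are made-up numbers, not measurements. Because the coded columns (1, X1, X2, X1X2) are orthogonal, the least-squares coefficients reduce to simple averages of column × response, and the intercept is the mean of all results, as stated above.

```python
import numpy as np

# 2^2 full factorial in coded units
x1 = np.array([-1, 1, -1, 1])
x2 = np.array([-1, -1, 1, 1])

# Hypothetical responses, for illustration only (not measured data)
y = np.array([10.0, 30.0, 20.0, 60.0])

# Model matrix with columns: intercept, X1, X2, X1*X2
X = np.column_stack([np.ones(4), x1, x2, x1 * x2])

# General least-squares solution
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)

# Shortcut valid for orthogonal +/-1 columns: each coefficient is mean(column * y)
shortcut = X.T @ y / len(y)
print(shortcut)
```

Both approaches give β0 = 30, β1 = 15, β2 = 10 and β12 = 5 (up to floating-point rounding for `lstsq`). Note that the classical “effect” of a factor (the average change in output from its low to its high level) is twice its coded coefficient.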

The least-squares regression can be done in Python using ols from the statsmodels package:

from statsmodels.formula.api import ols

# gan_growth must now contain the measured film thickness in a 'Thickness' column
lm = ols("Thickness ~ " + eq, data=gan_growth).fit()
print(lm.summary2())

This gives us the following table:

From this table we can see the relative effect of each factor on the thickness of the GaN layer. A positive coefficient means that increasing the factor increases the output (thickness), and a negative coefficient means the opposite. At this point, I would like to point out that you can’t rely on DOE alone: you should also understand the physics of your system to make sure that the results make sense. From this table, we learn that:

Increasing the nitrogen composition in the process gas reduces the thickness of the GaN film. This means that at our operating point, the growth is limited by the amount of gallium delivered to the film.
Increasing the substrate bias helps grow thicker films. The reason is still under investigation, but it is thought to be related to an increased sputtering yield.
Increasing the pressure reduces the thickness. We initially believed that increasing the pressure would increase the growth rate by supplying more ions and active species. However, our result suggests that the pressure is too high and ends up reducing the ionization rate of the process gas, hence the lower growth rate. This is an example of why it is important to understand the process to make sense of the result.
Increasing the target power increases the thickness, which is to be expected since more gallium is sputtered at higher target power.

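There are also some interaction terms. However, you may notice that some of them have exactly the same coefficients. This is due to aliasing in the half-factorial design: because target power is the product of the other three factors, each two-factor interaction column is identical to another two-factor interaction column, so the regression cannot tell them apart. So, is our half-factorial design good enough, or do we need to do a full factorial?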

The model’s prediction for the baseline run differs from the measured thickness by ~14 nm. This difference is due to nonlinearities in the system, which a linear model cannot capture. However, 14 nm is well below the coefficients of the main effects, so it is acceptable.
The interaction terms are also well below the coefficients of the main effects.
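One simple way to quantify the first check is to compare the baseline (center-point) run with the average of the factorial runs, a standard curvature check. The numbers below are hypothetical, not the measured thicknesses. Assuming the baseline is included in the fit (as in the code above), every other model column averages to zero over the nine runs, so the fitted intercept, and hence the model’s prediction at the baseline, is the mean of all nine runs; the model’s error at the baseline is then 8/9 of the curvature estimate.

```python
import numpy as np

# Hypothetical thicknesses in nm (illustration only, not the measured values)
factorial_runs = np.array([50.0, 90.0, 70.0, 110.0, 60.0, 100.0, 80.0, 120.0])
center_run = 94.0

# Curvature estimate: center point minus the mean of the factorial runs
curvature = center_run - factorial_runs.mean()

# Fitted intercept = mean of all runs, which is the model's prediction at the center
all_runs_mean = np.append(factorial_runs, center_run).mean()
baseline_error = center_run - all_runs_mean

print(f"curvature estimate: {curvature:.1f} nm")
print(f"model error at baseline: {baseline_error:.1f} nm")
```

With these made-up numbers the curvature estimate is 9 nm and the baseline error is 8 nm; as in the text, the decision is whether this is small compared with the main-effect coefficients.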

Therefore, our half-factorial screening design is good enough to describe the system. Now that we have a starting point from the first screening, we can decide how to optimize our system.

The next step is to change the system parameters along the direction of steepest ascent (or descent, if we want to minimize the output). This direction is easy to calculate from the coefficients of the main effects and the ratios between them (formally, this is the gradient of our fitted linear model). Any change to the parameters should therefore follow a ratio of -1:0.7:-0.6:1 (in coded units) for nitrogen composition, substrate bias, pressure, and target power respectively, where the negative sign for pressure reflects its negative effect on thickness. The parameters should be changed until the error between the result and the regression model becomes comparable to the main-effect coefficients. Afterwards, another factorial experiment can be run around the new operating point to recalculate the regression model.
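A minimal sketch of generating points along the path of steepest ascent, assuming a step of one coded unit for the factor(s) with the largest coefficient (a common convention) and converting back to actual units with each factor’s half-range. The coefficient values are illustrative: they follow the -1:0.7:0.6:1 magnitudes from the text, with the pressure sign taken as negative because increasing pressure reduced the thickness. The 20 W cap on target power comes from the bubble limit mentioned earlier; `MAX_TARGET_POWER` and `steepest_ascent_path` are hypothetical names.

```python
import numpy as np

factors = ["N2_composition", "Substrate_bias", "Pressure", "Target_power"]
lows = np.array([0.5, 75.0, 10.0, 10.0])
highs = np.array([0.7, 125.0, 20.0, 16.0])
center = (lows + highs) / 2       # coded 0 (the baseline)
half_range = (highs - lows) / 2   # size of one coded unit in actual units

# Illustrative coded main-effect coefficients (assumed signs, see lead-in)
coefs = np.array([-1.0, 0.7, -0.6, 1.0])

MAX_TARGET_POWER = 20.0  # W, liquid Ga target becomes unstable around here (hypothetical cap)

def steepest_ascent_path(n_steps):
    """Return points (actual units) along the steepest-ascent direction, starting at the baseline."""
    step_coded = coefs / np.abs(coefs).max()  # largest |coefficient| moves one coded unit
    step_actual = step_coded * half_range
    path = []
    for k in range(n_steps + 1):
        point = center + k * step_actual
        if point[factors.index("Target_power")] >= MAX_TARGET_POWER:
            break  # stop before leaving the stable operating window
        path.append(point)
    return path

for k, point in enumerate(steepest_ascent_path(5)):
    print(f"step {k}: " + ", ".join(f"{name}={value:.2f}" for name, value in zip(factors, point)))
```

With these assumptions, the path reaches an N2 fraction of 0.40, 135 V, 9 mTorr and 19 W after two steps and stops before the power limit. Points beyond the first step lie outside the screened range, so they are extrapolations that the experiments along the path have to confirm.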

Once the system shows strong nonlinearity, the factorial design and regression model can be extended with quadratic terms. At that point, we can calculate the location of the peak and find the best combination of parameters.
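Once a second-order model y = β0 + xᵀb + xᵀBx has been fitted (b holds the linear coefficients; B holds the quadratic coefficients on its diagonal and half of each interaction coefficient off the diagonal), the stationary point is x_s = -½ B⁻¹ b, and the signs of B’s eigenvalues tell whether it is a maximum, a minimum or a saddle point. A minimal NumPy sketch with hypothetical two-factor coefficients in coded units:

```python
import numpy as np

def stationary_point(b, B):
    """Stationary point x_s = -0.5 * B^-1 b of y = b0 + x.b + x.B.x."""
    return -0.5 * np.linalg.solve(B, b)

# Hypothetical coded-unit coefficients for two factors (illustration only)
b1, b2 = 2.0, 1.0       # linear terms
b11, b22 = -1.0, -2.0   # quadratic terms
b12 = 0.5               # interaction term

b = np.array([b1, b2])
B = np.array([[b11, b12 / 2],
              [b12 / 2, b22]])

x_s = stationary_point(b, B)
eigenvalues = np.linalg.eigvalsh(B)  # B is symmetric
if np.all(eigenvalues < 0):
    kind = "maximum"
elif np.all(eigenvalues > 0):
    kind = "minimum"
else:
    kind = "saddle point"
print(x_s, kind)
```

Here both eigenvalues are negative, so the stationary point is a maximum. If x_s falls outside the region covered by the experiments, it is an extrapolation and should be confirmed experimentally before it is trusted.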

While this might sound tedious, in the long run it is more effective than OFAT. I haven’t done the second set of experiments yet, due to some changes in the project I’m currently working on.

So there you have it, a basic way of doing DOE using Python, which will help you optimize your process. Let me know if you have any questions or suggestions!


Written by Aditya Prabaswara