Bayesian Optimization

Welcome to Bayesian Optimization

Ever wish there was a more efficient way to find the optimal set of conditions for your experiment? This module will guide you through using Bayesian Optimization to find the conditions for the highest yield from just a few given data points.

This module will help teach a specific machine learning topic even if you have NO coding experience. Now, you may be wondering, "What is machine learning?" Machine learning (ML) is a subfield of Artificial Intelligence (AI) in which programs learn from given data without being explicitly programmed to do so. ML is used in the engineering, science, and business industries to solve complex problems that cannot be easily defined by a model. In this module, a photobioreactor example will be used to demonstrate the topic by predicting the conditions for an optimal concentration of the product, lutein.

CLICK on the next tab to start learning!

Front page image

NOTE: If the module below is not appearing, try opening this page in a private/incognito window.


Due to the size of the dataset, it may take a few seconds to load.


Jump to Module

What is a Photobioreactor?

A photobioreactor (PBR) is a reactor used to cultivate products from phototrophic organisms, which use light as their main source of energy. PBRs are made of transparent materials such as plastics or glass to let this light shine through. Conditions can be easily controlled to maintain a healthy environment for the organisms. The parameters of the reactor in this module can be found under the Instructions tab or by clicking on a ❓ in the interactive module.


About Lutein

Our particular photobioreactor in this module is used to grow lutein. Lutein is a carotenoid, a red-orange-yellow pigment found primarily in marigolds. What makes lutein special is its use in medical treatments: it can help prevent cataracts in the human eye and delay cancers and macular degeneration. It is also commonly used in food dyes. However, marigolds are low in lutein content, which has led scientists to look for richer sources. The source of focus for this module is microalgae, which shows promise as a more efficient way of producing lutein than marigolds. To produce lutein from microalgae, nitrate is used as a nutrient and light is shined into the photobioreactor from either side. However, too much nitrate is not good for lutein production, since microalgae produce more lutein under stress. Too much biomass is also not good for lutein production, as overcrowding can block the light source.

PBR_image
Image Source: https://www.researchgate.net/figure/Schematic-representation-of-the-photobioreactor_fig1_259772693

How can this be put to action?

The main challenge in making microalgae a feasible source of lutein is industrializing it. First, the microalgae are very sensitive to high temperatures, making the process difficult to scale. The second part, our focus in this module, is optimizing the lutein output. Scaling up is an expensive process, so it is important that the lutein yield is high and cost efficient. The parameters of the experiment must be optimized to help solve this industrialization challenge, but how do we find the optimal conditions when experiments are expensive to conduct? This is where Bayesian Optimization comes into play.

Basics of Bayesian Optimization

What is it?

Bayesian Optimization (BO) is an optimization technique that arrives at an optimum of some unknown function using as few expensive trials as possible. These functions are called black box functions: systems where the internal function is unknown.

Why use it?

Since these black box functions have no known model, Bayesian Optimization is used to find their unknown optimal values. In the photobioreactor system, a few different conditions will be tested, varying the parameters each time. BO can then use this information to find a likely set of parameters for optimal lutein extraction.

How does it work?

BO uses a combination of exploration (making a prediction at an unknown set of parameters to find out more about the function) and exploitation (using information about known and predicted parameter sets to hone in on a possible optimum). If a parameter set, known or guessed, is considered far away from the optimum, it will not be pursued; if it is considered close to a theoretical optimum, new guesses will be made near that parameter set. BO is a coherent combination of an acquisition function and a surrogate model. The acquisition function can be thought of as making an educated guess based off of the groundwork the surrogate model has laid; the surrogate model effectively creates a pseudo-function for the acquisition function to guess within.
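This loop can be sketched in a few lines of Python. The sketch below is only an illustration, not the module's actual code: it minimizes a hypothetical stand-in for an expensive experiment, using scikit-learn's Gaussian process as the surrogate and a lower confidence bound as the acquisition function.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def run_experiment(x):
    # Hypothetical stand-in for an expensive black box experiment
    return (x - 2.0) ** 2

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(4, 1))          # a few initial samples
y = run_experiment(X).ravel()

candidates = np.linspace(0, 5, 501).reshape(-1, 1)
for _ in range(10):
    # Surrogate: fit a GP to everything observed so far
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                  normalize_y=True, alpha=1e-6).fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    # Acquisition: favor low mean (exploitation) or high uncertainty (exploration)
    lcb = mu - 2.0 * sigma
    x_next = candidates[np.argmin(lcb)]
    # Run the suggested experiment and add the result to the data set
    X = np.vstack([X, x_next.reshape(1, 1)])
    y = np.append(y, run_experiment(x_next[0]))

best_x = X[np.argmin(y), 0]                 # best conditions found so far
```

Each pass through the loop is one "suggest, run, update" cycle, just like repeatedly pressing the suggestion and run buttons in the interactive module.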






iteration data

optimizer convergence

situation with best parameters

Surrogate Model

The surrogate model works together with the acquisition function to find the optimum set of parameters. Its part of the job is to create a function that encapsulates many possible functions given the data set. Using the given data points, the surrogate model creates a probabilistic function that is representative of the unknown real function. It balances exploration and exploitation by placing a narrow confidence interval near known data points and a wider confidence interval in areas further away from them.


Surrogate Model
Image Source: https://neurocomputinglab.com/jobs/surrogate-modeling/

Acquisition Function

On the other side of the coin is the acquisition function, which uses the surrogate model to predict an optimal value. The acquisition function makes its predictions through a balance of exploration and exploitation. Predictions are made in areas of high uncertainty, commonly areas far away from the original data set or previous predictions. Predictions are also made by checking areas with a high mean compared to the original data set and previously predicted points; a prediction with a higher value than the current mean is considered more optimal.

Acquisition Function
Image Source: https://ekamperi.github.io/machine%20learning/2021/06/11/acquisition-functions.html
Image for GP
Image Source: https://www.php.cn/faq/662604.html

Gaussian Process (GP)

The Gaussian Process (GP) is the most commonly used surrogate model in Bayesian Optimization due to its strength in dealing with smooth functions. Unlike other regression models, GP maintains a distribution over the possible functions that could explain the observed data. This can be visualized as each point along the surrogate function having its own mean prediction and uncertainty. This is done through kernels, functions that measure the similarity between two data points (for example, comparing a point x to x') to help calculate the mean and uncertainty. All the points together create the confidence interval used by the acquisition function to make predictions.
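A quick way to see this behavior is to fit a GP to a few points and inspect its mean and uncertainty. This is a hypothetical sketch using scikit-learn (not the module's code); the uncertainty grows as the query point moves away from the data.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Three observed points from some unknown smooth function
X_train = np.array([[1.0], [3.0], [4.0]])
y_train = np.sin(X_train).ravel()

# RBF kernel: similarity between two points decays with their distance
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                              optimizer=None,   # keep the kernel fixed for clarity
                              normalize_y=True).fit(X_train, y_train)

# Query at a data point, between data points, and far from all data
X_query = np.array([[1.0], [2.0], [8.0]])
mu, std = gp.predict(X_query, return_std=True)
# std is near zero at the observed point and largest far from the data
```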

Image for RF
Image Source: https://medium.com/@roiyeho/random-forests-98892261dc49

Random Forest (RF)

As the name suggests, Random Forest (RF) is a "forest": a large group of decision trees that each learn a slightly different approximation of the true function and may give different results. Each tree is trained on a set built by the bootstrap method, where new training sets are created by randomly selecting data points with replacement from the original set. If the original data points are A, B, C, a bootstrap sample might look like A, A, C or B, A, C, where some points repeat, some are left out, and the order is randomized. The results of the decision trees are averaged to produce the prediction. This average is then also used to estimate the variance in the surrogate, by comparing the average result to the result of each individual decision tree.
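The averaging step can be seen directly in scikit-learn, where each fitted tree in the forest can be queried individually. This hypothetical sketch recovers the forest's prediction and a spread (the uncertainty estimate) from the individual trees:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 5, size=(40, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=40)   # noisy observations

# Each tree is fit on its own bootstrap sample of (X, y)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

x_query = np.array([[2.5]])
per_tree = np.array([tree.predict(x_query)[0] for tree in rf.estimators_])

mean_pred = per_tree.mean()   # the forest's prediction is the tree average
spread = per_tree.std()       # disagreement between trees -> uncertainty
```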

Image for ET
Image Source: https://www.researchgate.net/figure/Shows-the-diagram-of-the-extra-trees-regressor-32_fig1_392660477

Extra Trees (ET)

Extra Trees (ET) can be confused with RF due to their shared use of decision trees. However, there are key differences between the two. For instance, ET does NOT use bootstrapping; each tree sees the full data set. ET also increases the randomness at each node: decision nodes split randomly, unlike RF, which actively searches for the best split. ET really emphasizes randomness. The predictions from the trees are then averaged, and a confidence limit is derived from the variance among those predictions.
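Both models are available in scikit-learn, and the no-bootstrap difference is visible right in their default settings. A small check (hypothetical, just to illustrate the contrast):

```python
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

rf = RandomForestRegressor()
et = ExtraTreesRegressor()

# RF resamples the training set per tree; ET trains each tree on the full set
print(rf.get_params()["bootstrap"])   # True
print(et.get_params()["bootstrap"])   # False
```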

Expected Improvement (EI)
& Probability of Improvement (PI)

EI uses the known data points and the unknown areas to predict the magnitude of improvement a candidate could offer, helping decide which prediction to make. PI instead focuses on the spot with the highest chance of any improvement. Comparing the two, EI could be said to explore more in order to find a big jump in improvement, while PI is more conservative and makes guesses closer to known good values in order to secure some improvement.
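Under a Gaussian surrogate, both criteria have closed forms in terms of the predicted mean and standard deviation at a candidate point. Below is a sketch for minimization (the xi margin is a common but optional tweak, not something the module exposes):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best, xi=0.01):
    # How much better than `best` do we expect this point to be, on average?
    imp = best - mu - xi
    z = imp / sigma
    return imp * norm.cdf(z) + sigma * norm.pdf(z)

def probability_of_improvement(mu, sigma, best, xi=0.01):
    # How likely is this point to beat `best` at all?
    return norm.cdf((best - mu - xi) / sigma)

# Two candidates with the same predicted mean: EI favors the uncertain one,
# since a big downside surprise (a big improvement) is still possible there
ei_uncertain = expected_improvement(mu=1.0, sigma=1.0, best=1.0)
ei_confident = expected_improvement(mu=1.0, sigma=0.01, best=1.0)
```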

Lower Confidence Bound (LCB)

LCB looks for low-valued areas and then explores and exploits around them. LCB is usually used for minimization.
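LCB has the simplest form of the three: the surrogate mean minus a multiple of its uncertainty. A sketch (kappa is a hypothetical tuning knob trading exploration against exploitation):

```python
import numpy as np

def lower_confidence_bound(mu, sigma, kappa=1.96):
    # A low predicted mean OR a high uncertainty both make a point attractive
    return mu - kappa * sigma

# Three candidates: the middle one has the same mean as the first
# but far more uncertainty, so LCB picks it for the next experiment
mu = np.array([0.5, 0.5, 0.9])
sigma = np.array([0.1, 0.8, 0.1])
next_idx = int(np.argmin(lower_confidence_bound(mu, sigma)))  # -> 1
```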

GP Hedge

Unlike the other acquisition methods in this module, GP Hedge uses multiple strategies; in fact, it combines the previous three. It uses a softmax, which is simply how GP Hedge turns each strategy's running score into a probability of being chosen. At each step a strategy is selected according to these probabilities, its suggested point is evaluated, and its score is updated. This repeats until the best acquisition function wins out.
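A simplified sketch of that scoring loop is below. The rewards here are random placeholders; in the real method each strategy proposes a point and is rewarded based on the surrogate's value there.

```python
import numpy as np

rng = np.random.default_rng(0)
strategies = ["EI", "PI", "LCB"]   # the three strategies being hedged over
gains = np.zeros(3)                # running score for each strategy
eta = 1.0                          # how sharply scores translate to probability

for step in range(20):
    # softmax: turn the running scores into selection probabilities
    logits = eta * gains
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    choice = rng.choice(3, p=probs)   # pick a strategy to suggest the next point
    reward = rng.normal()             # placeholder for the surrogate value there
    gains[choice] += reward           # strategies that pay off get chosen more
```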

gerneral acquisition function
Image Source: https://www.researchgate.net/figure/Acquisition-functions-computed-from-eight-observations-green-of-an-otherwise-unknown_fig4_322306222

Random Sampling

It's as basic as they come. In random sampling, any part of the population has an equal chance of being selected. For the PBR, any set of parameters is just as likely to be sampled as the next.
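For the PBR this just means drawing each parameter uniformly from its slider range. A sketch with made-up (hypothetical) bounds for three parameters:

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical bounds: biomass (g/L), nitrate (g/L), light (umol/m2/s)
low = np.array([0.1, 0.5, 100.0])
high = np.array([1.0, 2.0, 200.0])

samples = rng.uniform(low, high, size=(8, 3))   # 8 random parameter sets
```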








Image for Random Sampling
Image Source: https://www.researchgate.net/figure/Static-sampling-strategies-A-Random-sampling-B-Latin-hypercube-sampling-C-Sobol_fig1_370656695

Sobol

Sobol and LHS are quasi-random sampling methods. Quasi-random sampling is random to an extent, but with the catch that there will not be large spaces left without a sample, unlike what is possible in random sampling. Sobol is deterministic, which means that given the same setup (such as the parameter bounds in the PBR), the sampling will always produce the same points (unlike random sampling, which differs every time). Sobol is also effective for sampling higher-dimensional problems.
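SciPy ships a Sobol generator, which makes the determinism easy to see: without scrambling, two separate generators produce exactly the same points. A hypothetical sketch for three PBR-like parameters:

```python
import numpy as np
from scipy.stats import qmc

# Unscrambled Sobol sequences are deterministic: same setup, same points
a = qmc.Sobol(d=3, scramble=False).random_base2(m=3)   # 2**3 = 8 points
b = qmc.Sobol(d=3, scramble=False).random_base2(m=3)

# Map the unit-cube points onto hypothetical parameter bounds
scaled = qmc.scale(a, l_bounds=[0.1, 0.5, 100.0], u_bounds=[1.0, 2.0, 200.0])
```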


Image for Sobol
Image Source: https://www.researchgate.net/figure/Static-sampling-strategies-A-Random-sampling-B-Latin-hypercube-sampling-C-Sobol_fig1_370656695

Latin Hypercube Sampling

Latin Hypercube Sampling (LHS) is another quasi-random sampling technique, sitting between random sampling and Sobol. It works by ensuring that each variable's full range is covered. An easy way to visualize this is with ingredients for a cake: divide the flour range into 1-2 cups, 2-3 cups, and 3-4 cups. It does not matter where within each range a sample is chosen, but each range must be covered, such as sampling 1.1, 2.5, and 3.2 cups.
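SciPy's Latin hypercube generator makes the cake example concrete: with three samples over the 1-4 cup range, exactly one sample lands in each one-cup bin. (The bounds are the hypothetical cake amounts from the text.)

```python
import numpy as np
from scipy.stats import qmc

lhs = qmc.LatinHypercube(d=1, seed=0)
cups = qmc.scale(lhs.random(n=3), l_bounds=[1.0], u_bounds=[4.0]).ravel()

# Exactly one sample falls in each of [1, 2), [2, 3), and [3, 4)
bins = sorted(np.floor(cups).astype(int).tolist())   # -> [1, 2, 3]
```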

Image for LHS
Image Source: https://www.researchgate.net/figure/Static-sampling-strategies-A-Random-sampling-B-Latin-hypercube-sampling-C-Sobol_fig1_370656695

How to use the module:

  1. Define Parameter Search Space

    This is where you will define the range for each parameter. Defining the range tells the machine where it is allowed to make guesses. The first parameters you can vary are the initial concentration of biomass and the initial concentration of nitrate; these materials are already in the reactor at the start of the experiment. You will then set the flow rate and the inlet nitrate concentration, which tell the machine how much nitrate will be entering the reactor and at what rate. Finally, the initial light intensity can be manipulated, which is the amount of light being shined through the reactor glass.

    Image for step 1
  2. Configure Initial Sampling & Model

    In this section, you can choose the surrogate model, acquisition function, sampling method, and number of initial points (the starting dataset that BO will use to make its first prediction).

    Image for step 2
  3. Run Experiment Workflow

    Once the first two sections have been set up, the final step is to start running the experiment. First, generate the initial data points. Next, calculate lutein for those data points. Then, show the prediction and the suggested next experiment. Based on the initial trial and first optimization run, the model will suggest a point; this suggested run can then be run to produce an updated model. This process may be repeated (going back and forth between buttons C & D) until the user is happy with their optimized result. The experiment can then be reset to try other models.

    Image for step 3

Guiding Questions:

Parameter Questions:

  1. What is the optimal lutein concentration with the default settings on the module?
  2. How does the optimal lutein concentration change when the light intensity range slider is shifted to 300-400 μmol/m2s?
  3. How does the optimal lutein concentration change when the initial concentration of biomass is increased to 5-6 g/L? Repeat this for the initial nitrate concentration. Which one seems to have a greater effect on lutein concentration? Does this make sense?
  4. Increase the inlet nitrate concentration range to 20-30 g/L. What effect does this have on lutein concentration? After testing this question and question 3, what does the overall trend for lutein concentration seem to be when initial concentrations are increased?
  5. Change the inlet flow rate to 0.0750-0.1001 L/h. What is the value of lutein's concentration? Why does the concentration increase/decrease like this compared to question 1?

Modeling Questions:

  1. When the default settings of the module are used, the optimal lutein outcome should be 0.0107 g/L. Try each surrogate model with 10 initial points and 10 additional trials, or until 0.0107 g/L of lutein is reached. Do this three times for each. Which model seems to be the best fit for this PBR?
  2. Repeat question 1 for each acquisition function.
  3. Repeat question 1 for each sampling method.
  4. Change the initial samples to 5. Run a few trials until 0.0107 g/L is reached, and then average the number of additional trials it took to reach that lutein concentration. Do this for 15 initial samples too. Based on comparing the averages of additional trials needed, does having more initial samples before Bayesian Optimization is used help speed up the computational arrival at the optimal lutein concentration?


