Photobioreactor Data-Driven Digital Lab Twin

Welcome to Digital Lab Twin

This module dives into the creation, analysis, and optimization of a data-driven model. Neural networks are powerful statistical models that learn from past data in order to predict new outputs. We'll be using one to predict the outputs of a photobioreactor over a week of experimentation.

A photobioreactor is a container, like a fish tank, filled with water and microscopic plant-like organisms called algae. It provides the algae with light, nutrients, and carbon dioxide to help them grow. The algae use the light to make their own food through a process called photosynthesis. The photobioreactor allows us to grow algae and use their biomass to produce clean and renewable energy, like biofuels. It also helps the environment, because the algae absorb carbon dioxide as they grow. In simple terms, a photobioreactor is a special container that helps algae grow using light, nutrients, and carbon dioxide to make clean energy and help the planet.

Example of a Photobioreactor

NOTE: If the module below is not appearing, try opening this page in a private/incognito window.

Due to the size of the dataset, it may take a few seconds to load.


Photobioreactor Info

Photobioreactor

This module is designed for optimizing a photobioreactor. A photobioreactor uses the characteristics of microalgae to obtain a desired product. Whether that product is biomass alone or a valuable pharmaceutical compound, this diverse group of microorganisms shares one defining trait: it uses solar energy to fix inorganic material into organic products.

In this specific case, the main product is lutein, which is valued in the pharmaceutical and food industries. The secondary product, biomass, can be used in the fields of biofuel and nutrition.

What You'll Change - PBR

In this module, you will seek to understand how various inputs will affect the output of the photobioreactor. The variation in the inputs will allow discovery of optimal culture operating conditions.

The adjustable parameters for this photobioreactor are light intensity, inlet flow of nitrates, inlet concentration of nitrates, initial nitrate concentration, and initial biomass concentration. Fixed features include the CO2 percentage and the aeration rate.

How to Evaluate

To evaluate this module, you can look at the maximum production of your desired product. Another option is to perform a cost-benefit analysis that weighs the value of each product against the cost of the materials used.

Neural Network Info

Machine Learning

A subfield of data science that focuses on using data and algorithms to mimic the way humans learn.

Neural Network

A machine learning architecture in which data is fed through layers of processing nodes called neurons. Neural networks excel at finding patterns in sets of data, and can be used for both prediction and classification. They are named for the way they mimic the function of the human brain, tying inputs to their respective outputs without being given explicit rules.

About This Module

Here, a problem has been identified and a dataset generated for use. The goal of the network is to take an input of the photobioreactor's state at a given time, and predict the state of that photobioreactor at the next measured time. By looping these single time-step predictions into each other, the neural network generates a whole experiment's worth of predicted values.
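The looping idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `rollout` and `toy_step_model` are hypothetical names, the toy model is a made-up stand-in for the trained network (it is not real photobioreactor kinetics), and the ordering of the values in each list is assumed.

```python
from typing import Callable, List, Sequence

State = List[float]  # assumed order: [biomass, nitrate, lutein] concentrations


def rollout(
    step_model: Callable[[Sequence[float]], State],
    initial_state: Sequence[float],
    operating_params: Sequence[float],
    n_steps: int,
) -> List[State]:
    """Chain one-step predictions into a full predicted experiment."""
    state = list(initial_state)
    trajectory = [state]
    for _ in range(n_steps):
        # Model input = current concentrations + fixed operating parameters.
        state = step_model(state + list(operating_params))
        trajectory.append(state)
    return trajectory


def toy_step_model(x: Sequence[float]) -> State:
    """Hypothetical stand-in for the trained network (made-up update rule)."""
    biomass, nitrate, lutein, light, inlet_flow, inlet_conc = x
    return [
        biomass * 1.01,
        nitrate * 0.99 + inlet_flow * inlet_conc,
        lutein + 0.001 * biomass,
    ]


# Initial concentrations, then light intensity, inlet flow, inlet concentration.
predicted = rollout(toy_step_model, [0.5, 1.0, 0.0], [150.0, 0.01, 10.0], n_steps=199)
print(len(predicted))  # 200 rows: the initial state plus 199 predicted steps
```

Because each prediction becomes the next input, any error made at one step is carried into every later step, which is why the repeated-prediction test matters.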

Data Background

The model is built upon a set of 20,000 datapoints spanning 100 different experiments. Each experiment spans 200 datapoints over the course of about a week (150 hours). Experimental parameters were chosen randomly within a range of reasonable values, an approach known as random sampling.

The model uses 6 inputs: the current values of the three concentrations we are measuring, the light intensity, and the inlet flow rate and inlet concentration of the nitrate (food) solution. Three outputs are produced: the concentrations at the next timestep. Think of the concentrations as being dynamic: they'll change with each timestep. The other three input parameters are static, staying fixed within an experiment and changing only from one experiment to the next.
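As a rough sketch of what random sampling could look like, the snippet below draws 100 sets of experimental parameters. The ranges are an assumption borrowed from the slider limits listed elsewhere on this page (the ranges actually used to build the dataset are not stated), uniform sampling is also an assumption, and the names `PARAMETER_RANGES` and `sample_experiment` are hypothetical.

```python
import random

# Assumed ranges, borrowed from the slider limits on this page.
PARAMETER_RANGES = {
    "initial_nitrate_g_per_L": (0.2, 2.0),
    "initial_biomass_g_per_L": (0.2, 2.0),
    "inlet_flow_L_per_h": (0.001, 0.015),
    "inlet_concentration_g_per_L": (5.0, 15.0),
    "light_intensity_umol_m2_s": (100.0, 200.0),
}


def sample_experiment(rng: random.Random) -> dict:
    """Draw one set of experimental parameters uniformly within each range."""
    return {
        name: rng.uniform(low, high)
        for name, (low, high) in PARAMETER_RANGES.items()
    }


rng = random.Random(0)  # fixed seed so the draw is repeatable
experiments = [sample_experiment(rng) for _ in range(100)]
n_datapoints = len(experiments) * 200  # 100 experiments x 200 datapoints each
print(n_datapoints)  # 20000
```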
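One way to picture the 6-input, 3-output layout is to build the training pairs from a single experiment's time series. The sketch below is an assumption about how such pairs could be assembled, not the module's actual preprocessing; `make_training_pairs`, the value ordering, and the numbers are all illustrative.

```python
from typing import List, Sequence, Tuple


def make_training_pairs(
    concentrations: Sequence[Sequence[float]],
    operating_params: Sequence[float],
) -> List[Tuple[List[float], List[float]]]:
    """Pair each time step's 6 inputs with the next step's 3 concentrations."""
    pairs = []
    for current, following in zip(concentrations, concentrations[1:]):
        inputs = list(current) + list(operating_params)  # 3 dynamic + 3 static
        targets = list(following)                        # 3 outputs
        pairs.append((inputs, targets))
    return pairs


# Three made-up time steps of [biomass, nitrate, lutein] from one experiment.
series = [[0.50, 1.00, 0.000], [0.52, 0.97, 0.001], [0.55, 0.93, 0.002]]
params = [150.0, 0.01, 10.0]  # light intensity, inlet flow, inlet concentration
pairs = make_training_pairs(series, params)
print(len(pairs), len(pairs[0][0]), len(pairs[0][1]))  # 2 6 3
```

Under this construction a series of n time steps yields n - 1 pairs, since the last step has no "next" state to predict.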

What You'll Change - Neural Network

You get to decide the inner workings of this neural network, and evaluate it based on a variety of accuracy measures. Start by choosing an optimizer and loss function, as these are the framework upon which all network adjustments are made. Then, choose how granular training will be, focusing on achieving a balance between high accuracy and quick runtime.
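To make these choices concrete, here is a minimal sketch of a training loop showing where the train split, epochs, batch size, and learning rate each enter. It assumes plain mini-batch gradient descent with a squared-error loss on a one-weight model, which is far simpler than the module's network and is not necessarily either of its optimizers; `train_linear` and the data are hypothetical.

```python
import random


def train_linear(data, train_split=0.8, epochs=50, batch_size=4,
                 learning_rate=0.05, seed=0):
    """Fit y = w * x with mini-batch gradient descent on squared error."""
    rng = random.Random(seed)
    data = list(data)
    rng.shuffle(data)
    n_train = int(len(data) * train_split)       # train split sets this boundary
    train, holdout = data[:n_train], data[n_train:]

    w = 0.0
    for _ in range(epochs):                      # one epoch = one pass over the training data
        rng.shuffle(train)
        for start in range(0, len(train), batch_size):
            batch = train[start:start + batch_size]
            # Gradient of the mean squared error with respect to w for this batch.
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * grad            # learning rate scales each update
    # Test loss: the same loss function, applied to the held-out data.
    test_loss = sum((w * x - y) ** 2 for x, y in holdout) / len(holdout)
    return w, test_loss


# Made-up data that follows y = 3x exactly.
points = [(i / 10, 3 * i / 10) for i in range(20)]
w, test_loss = train_linear(points)
print(round(w, 2), test_loss < 1e-3)
```

Raising `epochs` or shrinking `batch_size` increases the number of weight updates, and with it the runtime, which is the accuracy-versus-speed balance described above.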

How to Evaluate

There are several tools at your disposal to judge a model's accuracy. The simplest is the test loss, which applies the same loss function used in training to a set of holdout data. The lower this value, the better. A parity plot is also drawn, comparing actual values against model-predicted values. A perfectly accurate parity plot will draw a 45-degree line from bottom left to top right. Finally, a test experiment is run, comparing actual values against predicted ones. This will allow you to see if the model can handle repeated predictions over time.
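A parity plot can also be summarized numerically: every point sits on the 45-degree line exactly when the prediction equals the actual value, so the vertical gap to that line is simply the absolute error. The sketch below is one possible summary, with a hypothetical helper name and made-up values; it is not a metric the module itself reports.

```python
def parity_deviation(actual, predicted):
    """Return the mean and largest vertical distance from the y = x parity line."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    gaps = [abs(p - a) for a, p in zip(actual, predicted)]
    return sum(gaps) / len(gaps), max(gaps)


actual = [0.50, 0.80, 1.20, 1.60]     # made-up measured concentrations
predicted = [0.50, 0.85, 1.10, 1.60]  # made-up model outputs
mean_gap, worst_gap = parity_deviation(actual, predicted)
print(round(mean_gap, 4), round(worst_gap, 4))
```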

Tooltips

Trainsplit: Determine as a percentage how much of the data will be used to teach the model. What is left out of training will be used to validate and test.

Number of Neurons: Determine how dense each neural network layer is. The network contains 3 layers, with an activation function between each pair of layers. Denser networks are resource intensive, but thinner networks may compromise accuracy.

Epochs: Determine how many times the network will read over the training data. This heavily impacts the model's processing time. Generally, the more times the network reads over a set of data, the closer it will fit to that data.

Batch Size: Determine how many data points to feed the network at one time. An ideal batch size will help optimize runtime and model accuracy.

Learning Rate: Choose the step size that scales how far the optimizer may adjust the neuron weights, the internal parameters by which each neuron makes decisions. The lower this is, the smaller the change any given epoch will make to the model.

Optimizer: Choose the algorithm by which the neural network will adjust its neuron weights. Both choices can be efficient, but may require further tuning of other parameters.

Loss Function: Choose the function used to measure the error of your predictions. MSE (mean squared error) averages the squared differences between predicted and actual values, whereas MAE (mean absolute error) averages their absolute differences.
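The difference between the two loss functions is easiest to see on a small example. The sketch below uses made-up numbers in which three predictions are exact and one misses by 4.0.

```python
def mse(actual, predicted):
    """Mean squared error: average of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error: average of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 8.0]  # one large miss of 4.0

print(mse(actual, predicted))  # 4.0
print(mae(actual, predicted))  # 1.0
```

Squaring makes the single large miss count four times as much under MSE as under MAE here, so MSE penalizes occasional large errors more heavily.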

Instructions

To use the model, simply adjust sliders and dropdowns to your liking. Press the "Run" button on the "Test" tab to create a new neural network. Analyse your model's performance with the "Evaluate" tab. Aim to lower your model's error and increase its accuracy! To optimize your model, use the "Optimize" tab. Change experimental parameters to your liking, and hit "Run" to generate new predictions. To save predictions to your local machine, hit "Export Data".

Sliders

To generate data, you can change the following values with the corresponding slider:

Initial Nitrate Concentration: Set the initial nitrate concentration in g/L (0.2 - 2)

Initial Biomass Concentration: Set the initial biomass concentration in g/L (0.2 - 2)

Inlet Flow: Control the inlet flow of nitrate into the reactor (0.001 - 0.015 L/h)

Inlet Concentration: Inlet concentration of nitrate feed to the reactor (5 - 15 g/L)

Light Intensity: Control the light intensity (100 - 200 µmol/(m²·s))

Hover Tool

Hover the cursor over a line and you will be able to see the element name, time, and element concentration.

Note: the lutein concentration shown is 1000 times greater than the actual concentration, so that the lutein curve is visible on the plot.
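If you read a lutein value off the plot, the scaling in the note above can be undone with a single division. This tiny sketch assumes the value you are converting is one of the scaled values shown on the plot; the function and constant names are hypothetical.

```python
LUTEIN_DISPLAY_SCALE = 1000  # factor stated in the note above


def actual_lutein(plotted_value: float) -> float:
    """Convert a lutein value read from the plot back to the actual concentration."""
    return plotted_value / LUTEIN_DISPLAY_SCALE


print(actual_lutein(2.5))  # 0.0025
```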

Reset Button

This button resets the graph to its original state, based on the initial conditions before the sliders were changed.

Run Button

This button takes the current slider settings and creates a new plot based on those conditions.

Note: There are two Run buttons. The Run button on the Train tab updates the graphs in both the Train tab and the Optimize tab, while the Run button on the Evaluate tab updates only the graph in the Evaluate tab.

Export Button

This button writes the data points for Time, Nitrate Concentration, Biomass Concentration, and Lutein Concentration to a CSV file in your downloads folder. The file is named "exported_data_{timestamp}.csv", where the timestamp is the current time formatted as year-month-day-hour-minute-second.
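For anyone post-processing exports with a script, the naming pattern can be reproduced as follows. This is a sketch under assumptions: zero-padded fields, the column headers, and the helper names are guesses rather than the module's confirmed behaviour, and the file is written to a temporary folder instead of the downloads folder.

```python
import csv
import tempfile
from datetime import datetime
from pathlib import Path


def export_filename(now: datetime) -> str:
    """Build the file name as year-month-day-hour-minute-second (zero-padded, assumed)."""
    return f"exported_data_{now.strftime('%Y-%m-%d-%H-%M-%S')}.csv"


def export_rows(rows, folder: Path, now: datetime) -> Path:
    """Write the rows to a CSV file in the given folder and return its path."""
    path = folder / export_filename(now)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Time", "Nitrate", "Biomass", "Lutein"])  # assumed headers
        writer.writerows(rows)
    return path


rows = [(0.0, 1.0, 0.5, 0.0), (0.75, 0.98, 0.51, 0.001)]  # made-up values
with tempfile.TemporaryDirectory() as tmp:
    saved = export_rows(rows, Path(tmp), datetime(2024, 3, 5, 14, 7, 9))
    print(saved.name)  # exported_data_2024-03-05-14-07-09.csv
```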

Help Button

In the Train tab, small question-mark buttons appear next to the interactive elements. Each one explains what the tool is used for and how it can change your graph.

Guiding Questions

Machine Learning:

  1. Run the model under default conditions, and take note of the lowest loss. Change the training split to 0.1, and run again. What happens to loss?
    • Move to the Evaluate tab. Are your predictions over time more or less accurate using less training data?
  2. Do your predictions always make sense, understanding they represent concentrations in a solution? If not, what might this imply about the data-driven model?
  3. What single variable do you believe has the greatest impact on model accuracy?
  4. What single variable has the greatest impact on training time?
  5. Run the model under the same conditions multiple times. What results change upon each run?
    • Continue to adjust parameters, this time with a focus on improving model consistency.
  6. After running the model many times, do you notice any trends in where the model tends to fail? Is it more accurate predicting one or more outputs than others?
    • What steps might you take as a data scientist to improve your training data for the future?
  7. This module explores only a neural network as a tool for prediction. What other tools would be useful in solving this particular problem?

PBRs:

  1. Explore each of the inputs using the sliders. Which of these variables has the biggest effect on lutein production? Why might this be the case?
  2. Adjust the sliders with the intent to optimize lutein production. What are the inputs of this scenario?
  3. Another way to optimize the photobioreactor is to perform a cost-benefit analysis. With the measurements you have taken within your digital experiment, calculate the value of the lutein and biomass produced using an estimate of their prices. Compare this with the cost of the nitrate used, which varies with the inlet settings. After trying different combinations, record your highest value-to-cost ratio.
  4. List the different phases of algal growth. Looking at the biomass output, are you able to visualize these different stages?
  5. Why may the outputs of this model be inherently different from experimental data?
  6. Once an experiment is run, choose a way to calculate the difference between theoretical and experimental data.
  7. Once an experiment is run, input the values used in the experiment into the model. Once a graph is generated, export the data and graph your theoretical and experimental results together.
  8. Under high nitrate concentrations, biomass tends to increase rapidly. Why might we not see as large an increase in lutein production?
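For the cost-benefit question in the PBR list above, one possible way to set up the arithmetic is sketched below. Everything numeric here is a placeholder: the prices, reactor volume, and final concentrations are hypothetical, concentrations are assumed to be in g/L, and the sketch counts only the nitrate fed through the inlet (it ignores the initial nitrate charge and any change in liquid volume).

```python
def value_to_cost_ratio(
    final_lutein, final_biomass, volume_L,
    inlet_flow_L_per_h, inlet_conc_g_per_L, duration_h,
    lutein_value_per_g, biomass_value_per_g, nitrate_cost_per_g,
):
    """Ratio of product value to the cost of nitrate fed through the inlet."""
    product_value = volume_L * (
        final_lutein * lutein_value_per_g + final_biomass * biomass_value_per_g
    )
    # Nitrate fed = flow (L/h) x concentration (g/L) x time (h).
    nitrate_fed_g = inlet_flow_L_per_h * inlet_conc_g_per_L * duration_h
    return product_value / (nitrate_fed_g * nitrate_cost_per_g)


# Placeholder numbers only; substitute your own estimates and exported results.
ratio = value_to_cost_ratio(
    final_lutein=0.004, final_biomass=2.0, volume_L=1.0,
    inlet_flow_L_per_h=0.01, inlet_conc_g_per_L=10.0, duration_h=150.0,
    lutein_value_per_g=100.0, biomass_value_per_g=0.5, nitrate_cost_per_g=0.1,
)
print(round(ratio, 3))
```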