This module dives into the creation, analysis, and optimization of a data-driven model. Neural networks are powerful statistical models that learn from past data to predict new outputs. We'll use one to predict the outputs of a photobioreactor over a week of experimentation.
A photobioreactor is a container, like a fish tank, filled with water and microscopic, plant-like organisms called algae. It provides the algae with light, nutrients, and carbon dioxide, which they use to make their own food through a process called photosynthesis. Growing algae this way lets us use their biomass to produce clean and renewable energy, like biofuels. It also helps clean up the environment, because the algae absorb pollutants such as carbon dioxide as they grow.
NOTE: If the module below is not appearing, try opening this page in a private/incognito window.
Due to the size of the dataset, it may take a few seconds to load.
Photobioreactor
This module is built around optimizing a photobioreactor. A photobioreactor exploits the characteristics of microalgae to obtain a desired product, whether that product is biomass alone or a valuable pharmaceutical compound. What unites this diverse category of microorganisms is their use of solar energy to fix inorganic material into organic products.
In this specific case, the primary product is lutein, which is valued in the pharmaceutical and food industries. The secondary product, biomass, can be used in biofuel and nutrition applications.
What You'll Change - PBR
In this module, you will seek to understand how various inputs will affect the output of the photobioreactor. The variation in the inputs will allow discovery of optimal culture operating conditions.
The flexible parameters for this photobioreactor are light intensity, inlet flow of nitrates, inlet concentration of nitrates, initial nitrate concentration, and initial biomass value. Static features include CO2 percentage and its aeration rate.
How to Evaluate
To evaluate your results, you can look at the maximum production of your desired product. Another option is to perform a cost-benefit analysis that weighs the value of each product against the cost of the materials used.
Machine Learning
A subsection of data science that focuses on using data and algorithms to mimic human styles of learning.
Neural Network
A machine learning architecture where data is fed through layers of processing nodes called neurons. Neural networks excel at finding patterns in sets of data, and can be used for both prediction and classification. They are named for their loose resemblance to the human brain, which ties inputs to their respective outputs without explicit rules.
About This Module
Here, a problem has been identified and a dataset generated for use. The goal of the network is to take an input of the photobioreactor's state at a given time, and predict the state of that photobioreactor at the next measured time. By looping these single time-step predictions into each other, the neural network generates a whole experiment's worth of predicted values.
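The looping of single time-step predictions described above can be sketched as follows. This is a minimal illustration, not the module's actual code: `rollout` and `toy_step_model` are hypothetical names, and the input layout (three concentrations followed by three fixed experimental settings) is an assumption.

```python
import numpy as np

def rollout(step_model, initial_state, controls, n_steps):
    """Chain one-step predictions into a full predicted trajectory."""
    state = np.asarray(initial_state, dtype=float)
    controls = np.asarray(controls, dtype=float)
    trajectory = [state]
    for _ in range(n_steps):
        # Assumed input layout: current concentrations, then the fixed settings.
        model_input = np.concatenate([state, controls])
        # The prediction becomes the "current state" for the next step.
        state = np.asarray(step_model(model_input), dtype=float)
        trajectory.append(state)
    return np.stack(trajectory)  # shape: (n_steps + 1, number of concentrations)


def toy_step_model(model_input):
    """Hypothetical stand-in for a trained network: every concentration decays by 10%."""
    return 0.9 * model_input[:3]


predicted = rollout(toy_step_model, [1.0, 0.5, 0.0], [150.0, 0.01, 10.0], n_steps=199)
print(predicted.shape)  # (200, 3)
```

Because each prediction is fed back in as the next input, small one-step errors can accumulate over a long rollout, which is why a full test experiment is a stricter check than a single-step loss.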
Data Background
The model is built upon a set of 20,000 datapoints spanning 100 different experiments. Each experiment spans 200 datapoints over the course of about a week (150 hours). Experimental parameters were chosen randomly within a range of reasonable values, a practice known as random sampling.
The model uses 6 inputs: the current values of the three concentrations we are measuring (nitrate, biomass, and lutein), the light intensity, and the inlet flow rate and inlet concentration of the nitrate (food) solution. Three outputs are produced: the three concentrations at the next timestep. Think of the concentrations as dynamic, since they change with each timestep. The other three input parameters are static: they are set once per experiment and stay fixed for its duration.
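To make the input and output layout concrete, here is a small sketch of how one experiment's measurements could be turned into training pairs. The function name, the column order, and the random placeholder data are assumptions for illustration only.

```python
import numpy as np

def make_training_pairs(concentrations, controls):
    """Build (input, target) pairs from one experiment.

    concentrations: array of shape (n_times, 3), one row per measured timestep.
    controls: the 3 settings held fixed for the whole experiment.
    """
    concentrations = np.asarray(concentrations, dtype=float)
    controls = np.asarray(controls, dtype=float)
    current = concentrations[:-1]      # state at time t
    following = concentrations[1:]     # state at time t + 1 (the target)
    # Repeat the fixed settings on every row so each input has 6 columns.
    repeated_controls = np.tile(controls, (len(current), 1))
    inputs = np.hstack([current, repeated_controls])
    return inputs, following


rng = np.random.default_rng(0)
placeholder_experiment = rng.random((200, 3))  # random placeholder, not real data
inputs, targets = make_training_pairs(placeholder_experiment, [150.0, 0.01, 10.0])
print(inputs.shape, targets.shape)  # (199, 6) (199, 3)
```

An experiment with 200 datapoints yields 199 pairs under this scheme, because the last datapoint has no following state to predict.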
What You'll Change - Neural Network
You get to decide the inner workings of this neural network, and evaluate it based on a variety of accuracy measures. Start by choosing an optimizer and loss function, as these are the framework upon which all network adjustments are made. Then, choose how granular training will be, focusing on achieving a balance between high accuracy and quick runtime.
How to Evaluate
There are several tools at your disposal to judge a model's accuracy. The simplest is the test loss, which applies the same function you trained your model on to a set of holdout data. The lower these values, the better. A parity plot is also drawn, comparing actual values against model predicted values. A perfectly accurate parity plot will draw a 45 degree line from bottom left to top right. Finally, a test experiment is run, comparing actual values against predicted ones. This will allow you to see if the model can handle repeated predictions over time.
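As a rough sketch of the first two checks, the snippet below computes a loss on holdout values and fits a straight line through the parity points. The helper names and the four sample values are made up for illustration; a slope near 1 and an intercept near 0 correspond to the 45 degree line.

```python
import numpy as np

def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.mean((actual - predicted) ** 2))

def parity_line(actual, predicted):
    """Least-squares line through the parity points: predicted = slope * actual + intercept."""
    slope, intercept = np.polyfit(np.ravel(actual), np.ravel(predicted), deg=1)
    return float(slope), float(intercept)


# Made-up holdout values, for illustration only.
actual = np.array([0.0, 1.0, 2.0, 3.0])
predicted = np.array([0.1, 0.9, 2.1, 2.9])

test_loss = mean_squared_error(actual, predicted)
slope, intercept = parity_line(actual, predicted)
print(f"test loss: {test_loss:.3f}, parity slope: {slope:.2f}, intercept: {intercept:.2f}")
```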
Trainsplit: Determine as a percentage how much of the data will be used to teach the model. What is left out of training will be used to validate and test.
Number of Neurons: Determine how dense each neural network layer is. The network contains 3 layers, with an activation function between each. Denser networks are resource intensive, but thinner networks may compromise accuracy.
Epochs: Determine how many times the network will read over the training data. This heavily impacts the model's processing time. Generally, the more times the network reads over a set of data, the closer it will fit to that data.
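Batch Size: Determine how many data points to feed the network at one time. Larger batches generally shorten the runtime of each epoch, while smaller batches update the weights more often. A good batch size balances runtime and model accuracy.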
Learning Rate: Choose the step size that scales how far the optimizer may adjust neuron weights, the internal parameters by which each neuron makes decisions. The lower this is, the smaller the changes any given epoch will have on the model.
Optimizer: Choose an algorithm by which the neural network will adjust its inner neurons. Both choices can be efficient, but may require further tuning of other parameters.
Loss Function: Choose an algorithm to measure the error of your predictions. MSE (mean squared error) averages the squared errors, so it penalizes large errors more heavily, whereas MAE (mean absolute error) averages the absolute errors.
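The sketch below shows how the train split, epochs, batch size, learning rate, and loss function interact in a training loop. It makes several simplifying assumptions: a linear model stands in for the three-layer network, plain mini-batch gradient descent stands in for the module's optimizers, and the data is synthetic. All names are hypothetical.

```python
import numpy as np

def train_linear_model(X, y, train_split=0.8, epochs=50, batch_size=32,
                       learning_rate=0.1, loss="mse", seed=0):
    """Mini-batch gradient descent on a linear model (stand-in for the network)."""
    if loss not in ("mse", "mae"):
        raise ValueError("loss must be 'mse' or 'mae'")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_train = int(train_split * len(X))
    train_idx, holdout_idx = order[:n_train], order[n_train:]

    weights, bias = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):                          # one epoch = one pass over the training data
        shuffled = rng.permutation(train_idx)
        for start in range(0, n_train, batch_size):  # one weight update per batch
            batch = shuffled[start:start + batch_size]
            error = X[batch] @ weights + bias - y[batch]
            # Gradient of the chosen loss with respect to each prediction.
            grad = 2.0 * error if loss == "mse" else np.sign(error)
            # The learning rate scales the size of every adjustment.
            weights -= learning_rate * (X[batch].T @ grad) / len(batch)
            bias -= learning_rate * grad.mean()

    # Score on data the model never trained on.
    holdout_error = X[holdout_idx] @ weights + bias - y[holdout_idx]
    if loss == "mse":
        holdout_loss = float(np.mean(holdout_error ** 2))
    else:
        holdout_loss = float(np.mean(np.abs(holdout_error)))
    return weights, bias, holdout_loss


rng = np.random.default_rng(1)
X = rng.random((500, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5   # synthetic, noise-free target
weights, bias, holdout_loss = train_linear_model(X, y, epochs=200)
print(weights, bias, holdout_loss)
```

Raising the epochs or lowering the batch size increases the number of weight updates, and therefore the runtime, which is the accuracy versus runtime trade-off the sliders expose.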
To use the model, simply adjust sliders and dropdowns to your liking. Press the "Run" button on the "Test" tab to create a new neural network. Analyse your model's performance with the "Evaluate" tab. Aim to lower your model's error and increase its accuracy! To optimize your model, use the "Optimize" tab. Change experimental parameters to your liking, and hit "Run" to generate new predictions. To save predictions to your local machine, hit "Export Data".
To generate data, you can change the following values with the corresponding slider:
Initial Nitrate Concentration: Set the initial nitrate concentration in g/L (0.2 - 2)
Initial Biomass Concentration: Set the initial biomass concentration in g/L (0.2 - 2)
Inlet Flow: Control the inlet flow of nitrate into the reactor (0.001 - 0.015 L/h)
Inlet Concentration: Inlet concentration of nitrate feed to the reactor (5 - 15 g/L)
Light Intensity: Control the light intensity in the range of 100 - 200 umol/m2-s
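For readers who want to script experiments, the slider ranges above can be written down as a small table and sampled at random. This is only a sketch: the dictionary keys are hypothetical names, uniform sampling is an assumption, and the source does not state whether the original dataset was sampled from exactly these ranges.

```python
import random

# Slider ranges as listed above: (minimum, maximum). Key names are hypothetical.
SLIDER_RANGES = {
    "initial_nitrate_g_per_L": (0.2, 2.0),
    "initial_biomass_g_per_L": (0.2, 2.0),
    "inlet_flow_L_per_h": (0.001, 0.015),
    "inlet_concentration_g_per_L": (5.0, 15.0),
    "light_intensity_umol_per_m2_s": (100.0, 200.0),
}

def sample_conditions(rng):
    """Draw one set of experimental conditions uniformly within each range."""
    return {name: rng.uniform(low, high) for name, (low, high) in SLIDER_RANGES.items()}

def out_of_range(conditions):
    """Return the names of any settings that fall outside their slider range."""
    return [name for name, value in conditions.items()
            if not SLIDER_RANGES[name][0] <= value <= SLIDER_RANGES[name][1]]


conditions = sample_conditions(random.Random(0))
print(conditions)
print(out_of_range({"light_intensity_umol_per_m2_s": 250.0}))
```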
Hover the cursor over a line to see the element name, time, and element concentration.
Note: the plotted lutein concentration is 1000 times greater than the actual concentration, so that you are able to see the lutein curve.
This button resets the graph to its original state, based on the initial conditions before the sliders were changed.
This button takes the current slider conditions and creates a new plot based on them.
Note: There are two Run buttons. The Run button on the Train tab changes the graphs in both the Train tab and the Optimize tab. The Run button on the Evaluate tab changes only the graph in the Evaluate tab.
This button writes the data points for Time, Nitrate Concentration, Biomass Concentration, and Lutein Concentration to a CSV file in your downloads folder. The file is named "exported_data_{timestamp}.csv", where the timestamp is the current time formatted as year-month-day-hour-minute-second.
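As an illustration of the naming pattern, a file name of this form could be built as shown below. The function name is hypothetical, and zero-padded fields (for example "03" for March) are an assumption, since the description only gives the field order.

```python
from datetime import datetime

def export_filename(now=None):
    """Build a name like exported_data_<year-month-day-hour-minute-second>.csv."""
    now = now or datetime.now()
    # Assumed format: zero-padded fields joined by hyphens.
    return f"exported_data_{now.strftime('%Y-%m-%d-%H-%M-%S')}.csv"


print(export_filename(datetime(2024, 3, 5, 14, 7, 9)))
# exported_data_2024-03-05-14-07-09.csv
```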
In the Train tab, small question mark buttons appear next to the interactive elements. These buttons explain what each tool is used for and how it can change your graph.
Machine Learning:
PBRs: