Optimal Control of Lettuce Greenhouse Horticulture using Model-Free Reinforcement Learning
Summary
A greenhouse is an important growing system that provides a controlled climate and allows crops to be grown under a changing outdoor climate. Due to high energy costs and the scarcity of labor and resources, optimal and automated control of greenhouse horticulture is becoming increasingly important, with the aim of optimizing resource usage while maximizing crop production. Outdoor weather is a critical disturbance in greenhouse climate control and complicates both the modelling and the optimization process.
With the development of Artificial Intelligence (AI) and improved sensing techniques, Reinforcement Learning (RL) is receiving increasing attention because it learns control strategies from interaction data rather than relying on an accurate model. Up to now, most RL applications in greenhouse climate control have not taken outdoor weather forecasts into account when making control decisions, so useful information is discarded and the resulting control actions may be suboptimal. Therefore, in this project we investigated how the weather forecast horizon affects optimal control of greenhouse horticulture using reinforcement learning.
Among the deep RL approaches we reviewed, Soft Actor-Critic (SAC) and Twin-Delayed Deep Deterministic Policy Gradient (TD3) stood out because they handle continuous state-action spaces. Since weather predictions are reliable mainly in the short term due to forecast uncertainty, and long-term forecasts add unnecessary noise and enlarge the state space, our work focused on short-term forecast horizons of 0, 3, 7, 11, 15, 19, and 23 time steps of fifteen minutes. To investigate how the horizon affects control performance, these seven horizons were evaluated in experiments with the state-of-the-art continuous control algorithms SAC and TD3.
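As an illustration only (not taken from the thesis itself), the sketch below shows one way an RL observation could be augmented with an H-step weather forecast at fifteen-minute resolution; the function name, state dimensions, and number of weather variables are hypothetical assumptions.

    import numpy as np

    def build_observation(indoor_state, weather_forecast, horizon):
        # Concatenate the indoor climate state with the next `horizon` forecast
        # steps (15-minute resolution); horizon = 0 means no forecast is used.
        forecast_slice = weather_forecast[:horizon].flatten()
        return np.concatenate([indoor_state, forecast_slice])

    # Hypothetical sizes: 4 indoor climate variables, forecasts of 4 weather variables.
    indoor_state = np.zeros(4)
    weather_forecast = np.zeros((23, 4))
    for horizon in (0, 3, 7, 11, 15, 19, 23):
        obs = build_observation(indoor_state, weather_forecast, horizon)
        print(horizon, obs.shape)  # the observation grows with the forecast horizon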
After demonstrating the proposed approaches on a lettuce greenhouse, we found that SAC consistently outperformed TD3, achieving higher rewards in terms of crop production and net profit, while resource use was comparable. Furthermore, including weather forecasts proved essential for the learning stability of both algorithms as well as for their training and generalization performance, resulting in higher yields and net profits while reducing resource use and the number of indoor climate constraint violations. Moreover, we conclude that a four-hour weather forecast is the best option: longer forecasts did not improve performance, whereas shorter forecasts quickly degraded it.
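For illustration, a comparison like the one summarized above could be set up roughly as in the sketch below, which uses the Stable-Baselines3 implementations of SAC and TD3 on a placeholder continuous-control environment; in practice, a greenhouse environment exposing the forecast-augmented observation would be substituted.

    import gymnasium as gym
    from stable_baselines3 import SAC, TD3

    # Placeholder continuous-control task; a forecast-augmented greenhouse
    # environment would be used instead.
    env = gym.make("Pendulum-v1")

    for algo in (SAC, TD3):
        model = algo("MlpPolicy", env, verbose=0)
        model.learn(total_timesteps=10_000)  # train each algorithm on the same task
        model.save(f"{algo.__name__.lower()}_model")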