
Reinforcement Based Recommendations For An Entity In A Two Sided Electricity Market

Abstract: A method and system for reinforcement based recommendations for an entity in a two-sided electricity market is disclosed. Conventional systems for reinforcement based recommendations in a two-sided electricity market are specifically designed for either a buyer entity or a seller entity and are mostly not cognizant of the uncertainties in forecasting the aggregate market supply and demand curves. The disclosed reinforcement based recommendation framework is generic and can be used by either a buyer entity or a seller entity participating in the two-sided electricity market. Further, the disclosed method specifically trains a reinforcement model based on real time as well as forecasted electricity market state variables such as market demand, supply, clearing prices and clearing quantities, and mimics a clearing mechanism (market model) in the two-sided electricity market.


Patent Information

Filing Date: 12 November 2020
Publication Number: 25/2022
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Email: ip@legasis.in
Grant Date: 2025-10-30

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th Floor, Nariman Point, Mumbai - 400021, Maharashtra, India

Inventors

1. SARANGAN, Venkatesh
Tata Consultancy Services Limited, IIT-Madras Research Park, Block A, Second Floor, Phase - 2, Kanagam Road, Taramani, Chennai - 600113, Tamil Nadu, India
2. SUBRAMANIAN, Easwar
Tata Consultancy Services Limited, Deccan Park, Plot No 1, Survey No. 64/2, Software Units Layout, Serilingampally Mandal, Madhapur, Hyderabad - 500081, Telangana, India
3. BICHPURIYA, Yogesh
Tata Consultancy Services Limited, Tata Research Development & Design Centre, 54-B, Hadapsar Industrial Estate, Hadapsar, Pune - 411013, Maharashtra, India
4. MAHILONG, Nidhisha
Tata Consultancy Services Limited, Tata Research Development & Design Centre, 54-B, Hadapsar Industrial Estate, Hadapsar, Pune - 411013, Maharashtra, India
5. PEDASINGU, Bala Suraj
Tata Consultancy Services Limited, Deccan Park, Plot No 1, Survey No. 64/2, Software Units Layout, Serilingampally Mandal, Madhapur, Hyderabad - 500081, Telangana, India

Specification

Claims:

1. A processor-implemented method (300) for reinforcement based recommendations for an entity in a two-sided electricity market, comprising:
receiving a first set of input parameters associated with a double-sided electricity market, via one or more hardware processors, the first set of input parameters comprising a set of historic weather parameters and a plurality of historic market clearing parameters, wherein the plurality of historic market clearing parameters comprises a historic clearing price and a historic clearing quantity (302);
forecasting a set of state variable parameters for a pre-defined timeslot, by the one or more hardware processors, through a time-series forecast modelling technique based on the first set of input parameters, wherein the set of state variable parameters comprises a market demand forecast, a market supply forecast, a clearing price forecast and a clearing quantity forecast (304);
receiving a second set of input parameters and a third set of input parameters, via the one or more hardware processors, wherein the second set of input parameters are associated with a plurality of entities participating in the double-sided electricity market, and the third set of input parameters are associated with a bidding entity, where a recommendation is to be generated for the bidding entity (306);
generating a market model for the bidding entity, by the one or more hardware processors, based on the second set of input parameters, the third set of input parameters and the plurality of historic clearing parameters to obtain a clearing price and a clearing quantity from the market model through an optimization technique (308); and
training a reinforcement model for the bidding entity, by the one or more hardware processors, based on a reinforcement learning technique by modelling the interaction of the set of state variable parameters and the market model (310), comprising:
defining a state space for the reinforcement model based on the first set of input parameters, the set of state variable parameters, the second set of input parameters and the third set of input parameters (310A);
defining an action space for the reinforcement model based on the set of state variable parameters (310B); and
defining a reward function for the reinforcement model based on the clearing price and the clearing quantity obtained from the market model through the optimization technique (310C).

2. The processor-implemented method of claim 1, further comprising recommending a bidding parameter at a bidding time slot for the bidding entity, by the one or more hardware processors, by processing one or more real-time inputs from the double-sided electricity market using the trained reinforcement model, wherein the recommendation is based on a reinforcement learning (RL) objective function and the one or more real-time inputs are associated with the double-sided electricity market.

3. The processor-implemented method of claim 1, wherein the double-sided electricity market is a wholesale electricity market wherein multiple bids or asks per auction are placed one day in advance of electricity delivery by a plurality of entities, the plurality of entities includes a plurality of buyers of electricity and a plurality of sellers of electricity, and the bidding entity is one of a buyer or a seller of electricity in the double-sided electricity market.

4. The processor-implemented method of claim 1, wherein the time-series forecast modelling technique includes one of an auto-regressive integrated moving average (ARIMA) technique and an artificial deep neural network.
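By way of a non-limiting illustration only (not part of the claimed subject matter), the forecasting step above can be sketched with a minimal autoregressive model: an AR(1) fit by least squares, the simplest special case of the ARIMA family named in the claim. The function name and the sample series are hypothetical.

```python
def ar1_forecast(series, horizon):
    """Fit y[t] = a + b * y[t-1] by least squares on a historic series
    (e.g. clearing prices) and roll the model forward `horizon` timeslots."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    b = cov / var if var else 0.0
    a = my - b * mx
    forecasts, last = [], series[-1]
    for _ in range(horizon):
        last = a + b * last
        forecasts.append(last)
    return forecasts
```

In practice an ARIMA implementation or a deep neural network, as the claim contemplates, would replace this toy model.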

5. The processor-implemented method of claim 1, wherein the optimization technique includes one of operations research, linear programming, mathematical programming and a genetic algorithm.

6. The processor-implemented method of claim 1, wherein the reinforcement learning technique includes one of a double deep-Q-network (DDQN), a delayed Q-learning, and an actor-critic based approach using a Q-learning based Deep-Q-Network (DQN) technique.

7. The method as claimed in claim 1, wherein the step of defining the state space comprises identifying a set of state space parameters for time slots in a day starting from a first time slot (t) until a last bidding time slot (t + H), wherein the set of state space parameters comprises the first set of input parameters, the set of state variable parameters, the second set of input parameters and the third set of input parameters, and H is a number of time slots in a day.

8. The method as claimed in claim 1, wherein the step of defining the action space comprises determining a plurality of price bands (p) and a plurality of quantities (q) based on the set of state space parameters, wherein q is determined based on a plurality of distribution profiles of the third set of input parameters and p is determined based on the first set of input parameters.
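As a non-limiting sketch of the action-space definition above, the price axis can be discretized into bands and crossed with candidate bid quantities to yield a finite set of (p, q) actions. The bounds and quantity levels below are hypothetical; per the claim, the quantity levels would be derived from distribution profiles of the bidding entity's own parameters.

```python
def build_action_space(price_low, price_high, n_price_bands, quantity_levels):
    """Cartesian product of discretized price bands and candidate bid
    quantities. `quantity_levels` stands in for values drawn from the
    entity's distribution profiles; here they are passed in directly."""
    step = (price_high - price_low) / n_price_bands
    # use the midpoint of each band as its representative price
    price_bands = [round(price_low + step * (i + 0.5), 4)
                   for i in range(n_price_bands)]
    return [(p, q) for p in price_bands for q in quantity_levels]
```

Each element of the returned list is one admissible bidding action for the reinforcement model.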

9. The method as claimed in claim 1, wherein the reward function is defined for the plurality of buyers of electricity and the plurality of sellers of electricity in the electricity market and is expressed as:
r_(t+h) = β_1 × p_(t+h)^* × C_(a,t+h) + β_2 × p_(t+h)^* × U_(a,t+h)
wherein:
β_1 and β_2 are hyperparameters representing a specific buyer/seller;
C_(a,t+h) refers to a quantity cleared by the market for a bidding entity a, which is a function of the bids placed by the entity a;
U_(a,t+h) refers to the quantity of generation or demand that is not cleared by the market for the bidding entity a; and
C_(a,t+h) and p_(t+h)^* are received from the market model, p_(t+h)^* being the market clearing price.
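The reward expression above translates directly into code. The sketch below is illustrative only; the sign convention in the comment (a positive β_1 rewarding cleared volume and a negative β_2 penalizing uncleared volume) is one plausible interpretation, not a numeric choice stated in the claim.

```python
def reward(p_star, cleared_qty, uncleared_qty, beta1, beta2):
    """r_(t+h) = beta1 * p* * C_(a,t+h) + beta2 * p* * U_(a,t+h).

    For a seller, beta1 > 0 would reward revenue on the cleared quantity,
    while beta2 < 0 would penalize generation left uncleared."""
    return beta1 * p_star * cleared_qty + beta2 * p_star * uncleared_qty
```

Here p_star, cleared_qty and uncleared_qty would be supplied by the market model for the bidding entity at slot t + h.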

10. A system (100), comprising:
an input/output interface (106);
one or more memories (102); and
one or more hardware processors (104), the one or more memories (102) coupled to the one or more hardware processors (104), wherein the one or more hardware processors (104) are configured to execute programmed instructions stored in the one or more memories (102), to:
receive a first set of input parameters associated with a double-sided electricity market, via one or more hardware processors, the first set of input parameters comprising a set of historic weather parameters and a plurality of historic market clearing parameters, wherein the plurality of historic market clearing parameters comprises a historic clearing price and a historic clearing quantity;
forecast a set of state variable parameters for a pre-defined timeslot, by the one or more hardware processors, through a time-series forecast modelling technique based on the first set of input parameters, wherein the set of state variable parameters comprises a market demand forecast, a market supply forecast, a clearing price forecast and a clearing quantity forecast;
receive a second set of input parameters and a third set of input parameters, via the one or more hardware processors, wherein the second set of input parameters are associated with a plurality of entities participating in the double-sided electricity market, and the third set of input parameters are associated with a bidding entity, where a recommendation is to be generated for the bidding entity;
generate a market model for the bidding entity, by the one or more hardware processors, based on the second set of input parameters, the third set of input parameters and the plurality of historic clearing parameters to obtain a clearing price and a clearing quantity from the market model through an optimization technique; and
train a reinforcement model for the bidding entity, by the one or more hardware processors based on a reinforcement learning technique by modelling the interaction of the set of state variable parameters and the market model comprising:
define a state space for the reinforcement model based on the first set of input parameters, the set of state variable parameters, the second set of input parameters and the third set of input parameters;
define an action space for the reinforcement model based on the set of state variable parameters; and
define a reward function for the reinforcement model based on the clearing price and the clearing quantity obtained from the market model through the optimization technique.

11. The system of claim 10, wherein the one or more hardware processors are configured by the instructions to recommend a bidding parameter at a bidding time slot for the bidding entity by processing one or more real-time inputs using the trained reinforcement model, wherein the recommendation is based on a reinforcement learning (RL) objective function and the one or more real-time inputs are associated with the double-sided electricity market.

12. The system of claim 10, wherein the one or more hardware processors are configured by the instructions to define the state space by identifying a set of state space parameters for time slots in a day starting from a first time slot (t) until a last bidding time slot (t + H), wherein the set of state space parameters comprises the first set of input parameters, the set of state variable parameters, the second set of input parameters and the third set of input parameters, and H is a number of time slots in a day.

13. The system of claim 10, wherein the one or more hardware processors are configured by the instructions to define the action space by determining a plurality of price bands (p) and a plurality of quantities (q) based on the set of state space parameters, wherein q is determined based on a plurality of distribution profiles of the third set of input parameters and p is determined based on the first set of input parameters.

14. The system of claim 10, wherein the one or more hardware processors are configured by the instructions to implement the reward function defined for the plurality of buyers of electricity and the plurality of sellers of electricity, expressed as:
r_(t+h) = β_1 × p_(t+h)^* × C_(a,t+h) + β_2 × p_(t+h)^* × U_(a,t+h)
wherein:
β_1 and β_2 are hyperparameters representing a specific buyer/seller;
C_(a,t+h) refers to a quantity cleared by the market for a bidding entity a, which is a function of the bids placed by the entity a;
U_(a,t+h) refers to the quantity of generation or demand that is not cleared by the market for the bidding entity a; and
C_(a,t+h) and p_(t+h)^* are received from the market model.
Description:

FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION
(See Section 10 and Rule 13)

Title of invention:
REINFORCEMENT BASED RECOMMENDATIONS FOR AN ENTITY IN A TWO-SIDED ELECTRICITY MARKET

Applicant:
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021,
Maharashtra, India

The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD

The disclosure herein generally relates to recommendations in an electricity market for electricity procurement, and, more particularly, to reinforcement based recommendations for an entity in a two-sided electricity market.

BACKGROUND

An important outcome of deregulation of the electricity industry is the creation of electricity markets for a variety of services such as wholesale energy, balancing, and ancillary services, to name a few. In electricity markets, power generating companies and electricity distribution utilities are integral participants for selling and buying electricity, respectively. In such electricity markets, different entities/participants have their own objectives to optimize, wherein generators/sellers of electricity are interested in maximizing their own individual profits while the retail energy suppliers/buyers aim to reduce their energy purchase price. In a double-sided or two-sided electricity market, buyers and sellers place their bids one day before the actual delivery day, and multiple auctions are held for the same delivery day, with one auction for each time block.
In traditional approaches, entities that participate in the bidding process had to rely on their expertise to make the right assumptions with respect to bidding. However, this approach is prone to human error, which in turn can cause losses. Further, only a few existing techniques address the problem of efficient bidding mechanisms based on a learning framework. The existing learning frameworks arrive at optimal bidding strategies based on simplistic assumptions about the nature and mechanism of the auction process. Further, the existing learning frameworks are not generic and address the requirements of either a buyer or a seller alone. Further, the existing bidding mechanisms are mostly not cognizant of the uncertainties in forecasting the aggregate market supply and demand curves.

SUMMARY

Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method for reinforcement based recommendations for an entity in a two-sided electricity market is provided. The method includes receiving a first set of input parameters associated with a double-sided electricity market, the first set of input parameters comprising a set of historic weather parameters and a plurality of historic market clearing parameters, wherein the plurality of historic market clearing parameters comprises a historic clearing price and a historic clearing quantity. The method further includes forecasting a set of state variable parameters for a pre-defined timeslot through a time-series forecast modelling technique based on the first set of input parameters, wherein the set of state variable parameters comprises a market demand forecast, a market supply forecast, a clearing price forecast and a clearing quantity forecast. The method further includes receiving a second set of input parameters and a third set of input parameters, wherein the second set of input parameters are associated with a plurality of entities participating in the double-sided electricity market, and the third set of input parameters are associated with a bidding entity, where a recommendation is to be generated for the bidding entity. The method further includes generating a market model for the bidding entity based on the second set of input parameters, the third set of input parameters and the plurality of historic clearing parameters to obtain a clearing price and a clearing quantity from the market model through an optimization technique.
The method further includes training a reinforcement model for the bidding entity based on a reinforcement learning technique by modelling the interaction of the set of state variable parameters and the market model comprising: defining a state space for the reinforcement model based on the first set of input parameters, the set of state variable parameters, the second set of input parameters and the third set of input parameters, defining an action space for the reinforcement model based on the set of state variable parameters and defining a reward function for the reinforcement model based on the clearing price and the clearing quantity obtained from the market model through the optimization technique. The method further includes recommending a bidding parameter at a bidding time slot for the bidding entity by processing one or more real-time inputs from the double-sided electricity market using the trained reinforcement model, wherein the recommendation is based on a reinforcement learning (RL) objective function and the one or more real-time inputs are associated with the double-sided electricity market.
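As a minimal, non-limiting illustration of the training step summarized above, the sketch below uses tabular Q-learning, a simpler relative of the DQN/DDQN techniques the disclosure contemplates, with a simulated market model acting as the environment. The `market_model(state, action)` interface, the episode counts and all hyperparameter values are assumptions made for illustration only.

```python
import random

def train_q_table(market_model, states, actions, episodes=500,
                  alpha=0.1, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning against a simulated market environment.

    `market_model(state, action)` is assumed to return (reward, next_state);
    in the disclosure this role is played by the optimization-based market
    model that yields the clearing price and clearing quantity."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = rng.choice(states)
        for _ in range(24):  # one day of hourly bidding slots
            # epsilon-greedy action selection over the bidding action space
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda a_: q[(s, a_)])
            r, s_next = market_model(s, a)
            best_next = max(q[(s_next, a_)] for a_ in actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
    return q
```

In the disclosed framework the states would encode the forecasted demand, supply, clearing price and clearing quantity, the actions would be the (price, quantity) bids, and the reward would come from the market model's clearing outcome.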
In another aspect, a system for reinforcement based recommendations for an entity in a two-sided electricity market is provided. The system comprises a memory for storing instructions and is connected to one or more Input/Output (I/O) interfaces. The system further comprises one or more hardware processors coupled to the memory via the one or more I/O interfaces, wherein the one or more hardware processors are configured by the instructions to implement reinforcement based recommendations for an entity in a two-sided electricity market. The system is configured for receiving a first set of input parameters associated with a double-sided electricity market, via one or more hardware processors, the first set of input parameters comprising a set of historic weather parameters and a plurality of historic market clearing parameters, wherein the plurality of historic market clearing parameters comprises a historic clearing price and a historic clearing quantity. The system is further configured for forecasting a set of state variable parameters for a pre-defined timeslot, by the one or more hardware processors, through a time-series forecast modelling technique based on the first set of input parameters, wherein the set of state variable parameters comprises a market demand forecast, a market supply forecast, a clearing price forecast and a clearing quantity forecast. The system is further configured for receiving a second set of input parameters and a third set of input parameters, via the one or more hardware processors, wherein the second set of input parameters are associated with a plurality of entities participating in the double-sided electricity market, and the third set of input parameters are associated with a bidding entity, where a recommendation is to be generated for the bidding entity.
The system is further configured for generating a market model for the bidding entity, by the one or more hardware processors, based on the second set of input parameters, the third set of input parameters and the plurality of historic clearing parameters to obtain a clearing price and a clearing quantity from the market model through an optimization technique. The system is further configured for training a reinforcement model for the bidding entity, by the one or more hardware processors based on a reinforcement learning technique by modelling the interaction of the set of state variable parameters and the market model comprising: defining a state space for the reinforcement model based on the first set of input parameters, the set of state variable parameters, the second set of input parameters and the third set of input parameters, defining an action space for the reinforcement model based on the set of state variable parameters and defining a reward function for the reinforcement model based on the clearing price and the clearing quantity obtained from the market model through the optimization technique. The system is further configured for recommending a bidding parameter at a bidding time slot for the bidding entity by processing one or more real-time inputs from the double-sided electricity market using the trained reinforcement model, wherein the recommendation is based on a reinforcement learning (RL) objective function and the one or more real-time inputs are associated with the double-sided electricity market.
In yet another aspect, a non-transitory computer readable medium having embodied thereon a computer readable program for reinforcement based recommendations for an entity in a two-sided electricity market is provided. The program includes receiving a first set of input parameters associated with a double-sided electricity market, the first set of input parameters comprising a set of historic weather parameters and a plurality of historic market clearing parameters, wherein the plurality of historic market clearing parameters comprises a historic clearing price and a historic clearing quantity. The program further includes forecasting a set of state variable parameters for a pre-defined timeslot through a time-series forecast modelling technique based on the first set of input parameters, wherein the set of state variable parameters comprises a market demand forecast, a market supply forecast, a clearing price forecast and a clearing quantity forecast. The program further includes receiving a second set of input parameters and a third set of input parameters, wherein the second set of input parameters are associated with a plurality of entities participating in the double-sided electricity market, and the third set of input parameters are associated with a bidding entity, where a recommendation is to be generated for the bidding entity. The program further includes generating a market model for the bidding entity based on the second set of input parameters, the third set of input parameters and the plurality of historic clearing parameters to obtain a clearing price and a clearing quantity from the market model through an optimization technique.
The program further includes training a reinforcement model for the bidding entity based on a reinforcement learning technique by modelling the interaction of the set of state variable parameters and the market model comprising: defining a state space for the reinforcement model based on the first set of input parameters, the set of state variable parameters, the second set of input parameters and the third set of input parameters, defining an action space for the reinforcement model based on the set of state variable parameters and defining a reward function for the reinforcement model based on the clearing price and the clearing quantity obtained from the market model through the optimization technique. The program further includes recommending a bidding parameter at a bidding time slot for the bidding entity by processing one or more real-time inputs from the double-sided electricity market using the trained reinforcement model, wherein the recommendation is based on a reinforcement learning (RL) objective function and the one or more real-time inputs are associated with the double-sided electricity market.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
FIG. 1 illustrates an exemplary system for reinforcement based recommendations for an entity in a two-sided electricity market according to some embodiments of the present disclosure.
FIG.2 is a functional block diagram for reinforcement based recommendations for an entity in a two-sided electricity market according to some embodiments of the present disclosure.
FIG.3A and FIG.3B are flow diagrams illustrating a method of reinforcement based recommendations for an entity in a two-sided electricity market in accordance with some embodiments of the present disclosure.
FIG.3C is a flow diagram illustrating a method for generation of the reinforcement model 206 by defining a state space, an action space and a reward function in accordance with some embodiments of the present disclosure.
FIG.4 is a graph illustrating distribution profiles for defining the action space in accordance with some embodiments of the present disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS

Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
Electricity markets can be of different types such as day-ahead, intra-day, and balancing. The method disclosed herein is more specific to two-sided, day-ahead, wholesale electricity markets. In the two-sided markets, several entities such as a plurality of buyers and a plurality of sellers place their bids/asks with a market operator (who is a representative of the two-sided market) one day before the actual delivery day. Multiple auctions happen for the same delivery day with one auction for each time block. Further, the buyer or the seller can place multiple bids or asks per auction. In addition, the total volume of bids (available generation capacity) and the total volume of asks (smart city load to be met) can vary across auctions pertaining to a delivery day. The market operator matches the bids with the asks using a prescribed two-sided market clearing mechanism. The results of auction clearing are advertised to all participating entities. The participants honor their cleared commitments on the delivery day. The method disclosed herein considers the problem of optimizing a revenue or a purchase cost of electricity for a market player/entity/buyer/seller who participates in the periodic double auctions of a day-ahead wholesale electricity market by recommending a plurality of optimal bidding parameters, wherein the plurality of optimal bidding parameters consists of a selling price or purchase cost of electricity and a quantity of electricity.
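By way of a non-limiting stand-in for the prescribed clearing mechanism described above, a greedy uniform-price double-auction clearing can be sketched: buy bids are sorted by descending price, sell asks by ascending price, and volume is matched while the marginal bid covers the marginal ask. The midpoint price rule below is one common convention, not necessarily the mechanism that the disclosure's optimization-based market model implements.

```python
def clear_market(buy_bids, sell_asks):
    """Greedy uniform-price clearing of a two-sided auction.

    buy_bids / sell_asks: lists of (price, quantity) tuples.
    Returns (clearing_price, cleared_quantity); price is None if no trade."""
    bids = sorted(buy_bids, key=lambda x: -x[0])   # highest buyer first
    asks = sorted(sell_asks, key=lambda x: x[0])   # cheapest seller first
    cleared, price = 0.0, None
    i = j = 0
    bid_left = ask_left = 0.0
    while True:
        if bid_left == 0:
            if i >= len(bids):
                break
            bid_price, bid_left = bids[i]; i += 1
        if ask_left == 0:
            if j >= len(asks):
                break
            ask_price, ask_left = asks[j]; j += 1
        if bid_price < ask_price:      # curves no longer cross
            break
        q = min(bid_left, ask_left)
        cleared += q
        bid_left -= q
        ask_left -= q
        price = (bid_price + ask_price) / 2  # marginal pair sets the price
    return price, cleared
```

Such a simulator lets the reinforcement model be trained against many hypothetical clearing outcomes without interacting with the live market.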
Referring now to the drawings, and more particularly to FIG.1 through FIG.4 where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
FIG.1 is a functional block diagram of a system 100 for reinforcement based recommendations for an entity in a two-sided electricity market in accordance with some embodiments of the present disclosure.
In an embodiment, the system 100 includes a processor(s) 104, communication interface device(s), alternatively referred as input/output (I/O) interface(s) 106, and one or more data storage devices or a memory 102 operatively coupled to the processor(s) 104. The system 100 with one or more hardware processors is configured to execute functions of one or more functional blocks of the system 100.
Referring to the components of system 100, in an embodiment, the processor(s) 104, can be one or more hardware processors 104. In an embodiment, the one or more hardware processors 104 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more hardware processors 104 are configured to fetch and execute computer-readable instructions stored in the memory 102. In an embodiment, the system 100 can be implemented in a variety of computing systems including laptop computers, notebooks, hand-held devices such as mobile phones, workstations, mainframe computers, servers, a network cloud and the like.
The I/O interface(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, a touch user interface (TUI) and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface (s) 106 can include one or more ports for connecting a number of devices (nodes) of the system 100 to one another or to another server.
The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
Further, the memory 102 may include a database 108, which may store data related to historical electricity parameters like a historic demand, a historic price, a historic clearing price and a historic clearing quantity and like. Thus, the memory 102 may comprise information pertaining to input(s)/output(s) of each step performed by the processor(s) 104 of the system 100 and methods of the present disclosure. In an embodiment, the database 108 may be external (not shown) to the system 100 and coupled to the system via the I/O interface 106. Functions of the components of system 100 are explained in conjunction with functional overview of the system 100 in FIG.2 and flow diagram of FIG.3A and FIG.3B for reinforcement based recommendations for an entity in a two-sided electricity market.
The system 100 supports various connectivity options such as BLUETOOTH®, USB, ZigBee and other cellular services. The network environment enables connection of various components of the system 100 using any communication link including Internet, WAN, MAN, and so on. In an exemplary embodiment, the system 100 is implemented to operate as a stand-alone device. In another embodiment, the system 100 may be implemented to work as a loosely coupled device to a smart computing environment. The components and functionalities of the system 100 are described further in detail.
FIG.2 is a functional block diagram of the system of FIG.1, in accordance with some embodiments of the present disclosure. As depicted in the architecture, the FIG.2 illustrates the functions of the components of the system 100 for reinforcement based recommendations for an entity in a two-sided electricity market.
The system 100 for reinforcement based recommendations for an entity in a two-sided electricity market is configured for receiving a first set of input parameters associated with a double-sided electricity market, wherein the first set of input parameters comprises a set of historic weather parameters and a plurality of historic market clearing parameters, the plurality of historic market clearing parameters comprising a historic clearing price and a historic clearing quantity. The system 100 further comprises a forecasting module 202 configured for forecasting a set of state variable parameters for a pre-defined timeslot through a time-series forecast modelling technique based on the first set of input parameters, wherein the set of state variable parameters comprises a market demand forecast, a market supply forecast, a clearing price forecast and a clearing quantity forecast. The system 100 is further configured for receiving a second set of input parameters and a third set of input parameters, via the one or more hardware processors, wherein the second set of input parameters is associated with a plurality of entities participating in the double-sided electricity market, and the third set of input parameters is associated with a bidding entity for which a recommendation is to be generated. The system 100 further comprises a market model 204 configured for generating a market model for the bidding entity based on the second set of input parameters, the third set of input parameters and the plurality of historic market clearing parameters, to obtain a clearing price and a clearing quantity from the market model through an optimization technique.
The system 100 further comprises a reinforcement model 206 configured for training a reinforcement model for the bidding entity based on a reinforcement training technique by modelling the interaction of the set of state variable parameters and the market model, wherein a state space, an action space and a reward function are defined for the reinforcement model 206. Further, using the trained reinforcement model 206, the system 100 recommends a bidding parameter at a bidding time slot for the bidding entity, by the one or more hardware processors via a recommendation model 208, by processing one or more real-time inputs from the double-sided electricity market, wherein the recommendation is based on a reinforcement learning (RL) objective function and the one or more real-time inputs are associated with the double-sided electricity market. The functioning of the modules introduced in this section is explained in detail in FIG.3A and FIG.3B.
The various modules of the system 100 for reinforcement based recommendations for an entity in a two-sided electricity market are implemented as at least one of a logically self-contained part of a software program, a self-contained hardware component, and/or a self-contained hardware component with a logically self-contained part of a software program embedded into it, that when executed perform the method described herein.
Functions of the components of the system 100 are explained in conjunction with the functional modules of the system 100 stored in the memory 102 as depicted in FIG.2 and further explained in conjunction with the flow diagram of FIGS. 3A and 3B. FIG.3A and FIG.3B, with reference to FIG.1, are an exemplary flow diagram illustrating a method 300 for using the system 100 of FIG.1 according to an embodiment of the present disclosure.
The steps of the method of the present disclosure will now be explained with reference to the components of the reinforcement based recommendations for an entity in a two-sided electricity market system (100) and the modules (202-208) as depicted in FIG.2 and the flow diagrams as depicted in FIG.3A and FIG.3B. Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
At step 302 of the method (300), the one or more hardware processors are configured for receiving a first set of input parameters associated with a double-sided electricity market, the first set of input parameters comprising a set of historic weather parameters and a plurality of historic market clearing parameters, wherein the plurality of historic market clearing parameters comprises a historic clearing price and a historic clearing quantity.
In an embodiment, the set of historic weather parameters comprises parameters that include a temperature, a humidity, a solar radiation and a wind speed. Further, the plurality of historic market clearing parameters comprises a historic clearing price and a historic clearing quantity, which include a market clearing quantity of the entity, the sum of market clearing quantities across all entities, and the historic market clearing price at a time slot for several quantities and prices.
In an embodiment, the double-sided electricity market is a wholesale electricity market wherein multiple bids or asks per auction are placed one day in advance of electricity delivery by plurality of entities, the plurality of entities includes a plurality of buyers of electricity and a plurality of sellers of electricity and the bidding entity is one of a buyer or a seller of electricity in the double-sided electricity market.
At step 304 of the method (300), the one or more hardware processors 104 forecast via the forecasting module 202, the set of state variable parameters for a pre-defined timeslot, through a time-series forecast modelling technique based on the first set of input parameters, wherein the set of state variable parameters comprises a market demand forecast, a market supply forecast, a clearing price forecast and a clearing quantity forecast.
In an embodiment, the set of state variable parameters for a pre-defined timeslot (t+h) are: (i) the forecast total market demand (D̂_(t+h)) and supply (Ĝ_(t+h)) at t+h; (ii) the expected market clearing price at t+h, p̂_(t+h)^*; and (iii) the expected market clearing quantity at t+h, q̂_(t+h)^*. The time-series forecast modelling technique includes one of an auto-regressive integrated moving average (ARIMA) model and an artificial deep neural network for forecasting.
The step of forecasting is elaborately explained in this section. The forecasting module 202 forecasts the set of state variable parameters through a series of transformations. The forecasting module 202 receives (i) z_k: a historical value of a specific parameter to be forecast; (ii) e_k: a set of exogenous inputs such as weather parameters including historical and real-time temperature, wind speed, humidity and solar insolation; and (iii) c_k: a set of calendar inputs such as date and time. The set of state variable parameters is determined after a series of transformations that include a latent transformation (F_1), a concatenation (F_2) and a sophisticated regression coupled attention (F_3), which can be expressed as:

Set of state variable parameters = F_3(F_2(F_1(z_k, e_k, c_k)))
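As an illustration, the transformation pipeline above can be sketched as follows. This is a minimal sketch, not the disclosed implementation: the latent transformation F_1 is approximated by a per-input standardisation, the regression coupled attention F_3 by a plain linear map, and the function and weight names are assumptions.

```python
import numpy as np

def forecast_state_variable(z_k, e_k, c_k, weights):
    """Hypothetical sketch of F_3(F_2(F_1(z_k, e_k, c_k))).

    z_k     : historical values of the parameter being forecast
    e_k     : exogenous inputs (e.g. temperature, wind speed)
    c_k     : calendar inputs (e.g. hour-of-day, day-of-week)
    weights : stand-in regression weights for the trained F_3 stage
    """
    # F_1: latent transformation -- approximated here by standardisation
    def f1(x):
        x = np.asarray(x, dtype=float)
        return (x - x.mean()) / (x.std() + 1e-9)

    # F_2: concatenation of the transformed inputs into one feature vector
    features = np.concatenate([f1(z_k), f1(e_k), f1(c_k)])

    # F_3: regression stage (the disclosure couples it with attention;
    # a plain linear map stands in for it here)
    return float(features @ weights)
```

A trained forecaster would replace the stand-in stages with the ARIMA or deep neural network components named in the disclosure.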
At step 306 of the method (300), the one or more hardware processors 104 receive the second set of input parameters and the third set of input parameters, wherein the second set of input parameters is associated with a plurality of entities participating in the double-sided electricity market, and the third set of input parameters is associated with a bidding entity for which a recommendation is to be generated.
In an embodiment, the second set of input parameters includes a plurality of parameters such as the cumulative bid quantities and the bid prices of all other entities participating in the market, and the third set of input parameters includes the set of bid quantities and the bid prices of the bidding entity.
At step 308 of the method (300), the one or more hardware processors 104 generate a market model 204 for the bidding entity based on the second set of input parameters, the third set of input parameters and the plurality of historic market clearing parameters, to obtain a clearing price and a clearing quantity from the market model through an optimization technique.
In an embodiment, the optimization techniques include operations research, linear programming, mathematical programming and genetic algorithms. An example of the optimization technique is expressed as:
max_(q_b, q_s) ( Σ_(b∈B) Σ_(k=1)^(m′) p_(b,t+h)^k × q_(b,t+h)^k − Σ_(s∈S) Σ_(k=1)^m p_(s,t+h)^k × q_(s,t+h)^k )

subject to:

0 ≤ q_(b,t+h)^k ≤ q_(b,t+h)^(k,max) ∀ b, k

0 ≤ q_(s,t+h)^k ≤ q_(s,t+h)^(k,max) ∀ s, k

Σ_b Σ_k q_(b,t+h)^k − Σ_s Σ_k q_(s,t+h)^k = 0

wherein:
q_(b,t+h)^k and p_(b,t+h)^k respectively refer to the kth ‘buy’ bid quantity cleared by the double-sided electricity market for a buyer b and the price placed by the buyer b for a time slot t+h;
q_(s,t+h)^k and p_(s,t+h)^k respectively refer to the kth ‘sell’ bid quantity cleared by the double-sided electricity market for a seller s and the price placed by the seller s for the time slot t+h;
B and S respectively refer to the set of all buyers and the set of all sellers in the double-sided electricity market; and
q_(b,t+h)^(k,max) and q_(s,t+h)^(k,max) refer to the actual kth bid quantity submitted by the buyer b and the seller s respectively for the time slot t+h.
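For single-part bids, the welfare-maximising clearing above can be solved by a simple merit-order intersection rather than a general-purpose solver. The sketch below is an illustrative stand-in for the market model, assuming a mid-point clearing price; the function name and the (price, quantity) bid representation are assumptions, not from the disclosure.

```python
def clear_market(buy_bids, sell_bids):
    """Hypothetical merit-order clearing: buy_bids and sell_bids are
    lists of (price, max_quantity) tuples, i.e. the p^k and q^(k,max)
    pairs for slot t+h. Returns (clearing_price, clearing_quantity)."""
    # Sort buy bids by descending price, sell bids by ascending price
    buys = sorted(buy_bids, key=lambda b: -b[0])
    sells = sorted(sell_bids, key=lambda s: s[0])
    cleared, price = 0.0, None
    bi, si = 0, 0
    b_left = buys[0][1] if buys else 0.0
    s_left = sells[0][1] if sells else 0.0
    # Match while the best remaining buy price covers the best sell price
    while bi < len(buys) and si < len(sells) and buys[bi][0] >= sells[si][0]:
        q = min(b_left, s_left)
        cleared += q
        price = (buys[bi][0] + sells[si][0]) / 2  # mid-point clearing price
        b_left -= q
        s_left -= q
        if b_left == 0:
            bi += 1
            b_left = buys[bi][1] if bi < len(buys) else 0.0
        if s_left == 0:
            si += 1
            s_left = sells[si][1] if si < len(sells) else 0.0
    return price, cleared
```

This reproduces the balance constraint (total cleared buy quantity equals total cleared sell quantity) by construction; an exchange would instead use its published clearing rule.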
At step 310 of the method (300), the one or more hardware processors 104 train the reinforcement model 206 for the bidding entity by modelling the interaction of the set of state variable parameters and the market model as a Markov decision process (MDP). The generation of the reinforcement model 206 comprises defining a state space, an action space and a reward function.
In an embodiment, the MDP for training the reinforcement model is solved using a reinforcement training technique such as a double deep Q-network (DDQN), delayed Q-learning, actor-critic based approaches, or a Q-learning based deep Q-network (DQN) technique.
The disclosure proposes to implement the Q-learning based DQN technique, wherein conventional Q-learning is combined with deep neural networks to solve MDPs. The state variable parameters are fed as input to the DQN. The error between the actual reward observed for the action taken and the reward predicted by the DQN for the same action is used to train the DQN to predict the best possible action for a given input of state variables.
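A compressed sketch of the DQN idea described above, assuming a linear function approximator in place of a deep network; replay memory, target network and exploration are omitted for brevity, and all names are illustrative.

```python
import numpy as np

class TinyDQN:
    """Illustrative stand-in for the DQN of step 310: a (here linear)
    approximator maps the state variables to one Q-value per action
    and is trained on the temporal-difference (TD) error."""

    def __init__(self, n_state, n_actions, lr=0.01, gamma=0.95):
        self.W = np.zeros((n_actions, n_state))  # one weight row per action
        self.lr, self.gamma = lr, gamma

    def q_values(self, s):
        return self.W @ s

    def best_action(self, s):
        return int(np.argmax(self.q_values(s)))

    def update(self, s, a, reward, s_next):
        # TD target: observed reward plus discounted best future value
        target = reward + self.gamma * np.max(self.q_values(s_next))
        error = target - self.q_values(s)[a]
        # Gradient step on the squared TD error for the taken action
        self.W[a] += self.lr * error * s
```

A deep network would replace the weight matrix with stacked nonlinear layers, but the reward-prediction-error training signal is the same.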
An exemplary flow diagram, as shown in FIG.3C, for generation of the reinforcement model 206 by defining a state space, an action space and a reward function as implemented by the system of FIG.1, includes the following steps.
At step (310A), the method (300) includes defining the state space, which comprises identifying a set of state space parameters for time slots in a day starting from a first time slot (t) until a last bidding time slot (t + H). The set of state space parameters comprises the first set of input parameters, the set of state variable parameters, the second set of input parameters and the third set of input parameters. H is the number of time slots in a day.
In an embodiment, at a given first time slot t, bids are placed for time slots t+1, t+2, and so on until the last bidding time slot (t + H). When placing the bid for a time step t+h, 1 ≤ h ≤ H, the state space for the reinforcement model 206 consists of the following variables (obtained from the first set of input parameters, the set of state variable parameters, the second set of input parameters and the third set of input parameters):
The expected capacity (consumption/generation) of the bidding entity a at t+h (Û_(a,t+h));
The expected total market demand (D̂_(t+h)) and supply (Ĝ_(t+h)) at t+h;
The expected market clearing price at t+h (p̂_(t+h)^*) and the expected market clearing quantity (q̂_(t+h)^*);
The actual capacity (consumption or generation) of the bidding entity a that was available at t+h−24 (U_(a,t+h−24)) and t+h−168 (U_(a,t+h−168));
The actual total market supply and demand at t+h−24 (G_(t+h−24), D_(t+h−24)) and t+h−168 (G_(t+h−168), D_(t+h−168));
The actual market clearing price and quantity at t+h−24 (p_(t+h−24)^*, q_(t+h−24)^*) and t+h−168 (p_(t+h−168)^*, q_(t+h−168)^*).
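Assembling the state variables listed above into a feature vector for the reinforcement model might look as follows; the dictionary keys and function name are illustrative assumptions, not the disclosed interface.

```python
def build_state(expected, lagged_24, lagged_168):
    """Hypothetical assembly of the step-310A state vector for slot t+h.

    expected   : forecasts for t+h (hat-quantities)
    lagged_24  : actuals observed at t+h-24 (previous day)
    lagged_168 : actuals observed at t+h-168 (previous week)
    """
    return [
        expected["capacity"],     # U_hat_(a,t+h)
        expected["demand"],       # D_hat_(t+h)
        expected["supply"],       # G_hat_(t+h)
        expected["clear_price"],  # p_hat_(t+h)^*
        expected["clear_qty"],    # q_hat_(t+h)^*
        lagged_24["capacity"], lagged_24["supply"], lagged_24["demand"],
        lagged_24["clear_price"], lagged_24["clear_qty"],
        lagged_168["capacity"], lagged_168["supply"], lagged_168["demand"],
        lagged_168["clear_price"], lagged_168["clear_qty"],
    ]
```

The daily (24) and weekly (168) lags mirror the day-ahead auction cadence described in the disclosure.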
At step (310B), the method (300) includes defining the action space, which constitutes determining a plurality of price bands (p) and a plurality of quantities (q) based on the set of state space parameters, wherein q is determined based on a plurality of distribution profiles of the third set of input parameters and p is determined based on the first set of input parameters.
In an embodiment, the action space of the bidding entity for the reinforcement model 206 constitutes placing m bids (asks) for the pre-defined time slot t+h, 1 ≤ h ≤ H, based on the values of the state variables at time t+h. The step of defining the action space is elaborated in the steps below:
Picking m equally spaced price values p_1, …, p_m from the interval [0, p_max], where p_max is the maximum possible price value as seen in the historical logs.
Spreading the expected capacity (consumption or generation) of the given participant a at t+h (Û_(a,t+h)) across the m price values using one of the nine distribution profiles illustrated in the graphs shown in FIG.4, wherein the x-axis represents price and the y-axis represents quantity.
Obtaining a plurality of quantities (q) from the distribution profile based on the different ways of splitting Û_(a,t+h) across the m price bands.
Estimating the m bids for the market participant a for the pre-defined time slot t+h, given by {p, q}.
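The steps above can be sketched as follows. Only two illustrative profile shapes stand in for the nine profiles of FIG.4, and the function name and profile labels are assumptions.

```python
import numpy as np

def build_bids(capacity, p_max, m, profile):
    """Hypothetical sketch of step 310B: split the expected capacity
    U_hat_(a,t+h) across m equally spaced price bands in [0, p_max]
    using a distribution profile, returning m (price, quantity) bids."""
    prices = np.linspace(0.0, p_max, m)
    if profile == "uniform":
        weights = np.ones(m)                      # equal split
    elif profile == "ascending":
        weights = np.arange(1, m + 1, dtype=float)  # more volume at high prices
    else:
        raise ValueError("unknown profile")
    # Normalise so the quantities sum to the full expected capacity
    quantities = capacity * weights / weights.sum()
    return list(zip(prices.tolist(), quantities.tolist()))
```

Each (profile, price-grid) choice is one discrete action for the reinforcement model, which keeps the action space small and enumerable.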
At step (310C), the method (300) includes defining the reward function for the plurality of buyers of electricity and the plurality of sellers of electricity in the electricity market, which is mathematically expressed as:
r_(t+h) = β_1 × p_(t+h)^* × C_(a,t+h) + β_2 × p_(t+h)^* × U_(a,t+h)
wherein:
β_1 and β_2 are hyperparameters specific to a given buyer/seller;
C_(a,t+h) refers to the quantity cleared by the market for a bidding entity a, which is a function of the bids placed by the entity a; and
U_(a,t+h) refers to the quantity of generation or demand that is not cleared by the market for the bidding entity a.
The values of p_(t+h)^* and C_(a,t+h) are received from the market model 204. The value of U_(a,t+h) is calculated as U_(a,t+h) = Û_(a,t+h) − C_(a,t+h). Using these values, the reward function is determined.
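The reward computation above can be sketched directly; the β values shown are illustrative hyperparameters (a negative β_2 penalising uncleared capacity), not values from the disclosure.

```python
def reward(p_clear, cleared_qty, expected_capacity, beta1=1.0, beta2=-0.5):
    """Hypothetical sketch of the step-310C reward r_(t+h).

    p_clear           : market clearing price p_(t+h)^* from the market model
    cleared_qty       : C_(a,t+h), quantity cleared for entity a
    expected_capacity : U_hat_(a,t+h), the entity's expected capacity
    beta1, beta2      : illustrative per-entity hyperparameters
    """
    uncleared = expected_capacity - cleared_qty  # U_(a,t+h)
    return beta1 * p_clear * cleared_qty + beta2 * p_clear * uncleared
```

With these example weights, an entity earns clearing-price revenue on cleared volume and pays an opportunity cost on volume the market did not clear.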
Further, the bidding parameter is recommended at a bidding time slot for the bidding entity, by the one or more hardware processors 104 via the recommendation model 208, by processing one or more real-time inputs from the double-sided electricity market using the trained reinforcement model, wherein the recommendation is based on a reinforcement learning (RL) objective function and the one or more real-time inputs are associated with the double-sided electricity market. The bidding parameter is shared on the I/O interface 106.
In an embodiment, the RL objective function is expressed as:
J(·) = Σ_(h=1)^H E[γ^h r_(t+h) | s_t]
wherein:
γ is a discount factor of the MDP;
s_t is the state at time slot t; and
H refers to the number of time slots (or auctions) in a trading day.
The step of recommending the bidding parameter at a bidding time slot for the bidding entity using the recommendation model 208, by processing one or more real-time inputs, is elaborated in the steps below:
Forecasting the set of state variable parameters for the bidding time slot through the time-series forecast modelling technique based on the first set of input parameters.
Selecting the price bands and the distribution profile based on the RL model.
Using the selected price bands and distribution profile, the m bids for slot t+h, namely {p, q}, are constructed and submitted to the two-sided electricity market.
The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
EXPERIMENTAL RESULTS:
An experiment is conducted using the market data from the European Power Exchange (EPEX) for the years 2016-2019 and the bidding data of a large generator. The large generator’s daily generation capacity varies between 4900 MW and 5150 MW. On this data, the performance of the disclosed reinforcement learning based bidding strategy is tested. The performance of the disclosed reinforcement based bidding, in terms of its ability to maximize the revenue for the generating entity, has been compared against the following baselines:
Ideal: A baseline that is aware of the actual behavior of all other market participants (i.e., their bid/ask curves) without any error and determines the best action through an exhaustive evaluation of all available actions.
Exhaustive Action based on Forecasts (EAF): A baseline that estimates the behavior of other market participants through a simple moving average estimator before determining the best action in exactly the same manner as the Ideal baseline.
Historical: A baseline that derives the historical revenue of the generator G from market logs. Note that while the Ideal method is an upper bound on the performance that can be achieved, it is not realizable in practice. However, EAF can be realized in practice.
The data from the years 2016 and 2017 is used to train the reinforcement model 206, and the performance of the disclosed technique along with the other methods is shown in Table 1 below:

Table 1: Average daily revenue obtained by selling electricity in the market under various techniques for different generator capacities.
Thus, as depicted in Table 1, it can be observed that the method disclosed herein for recommendations for an entity in a two-sided electricity market is specifically trained to forecast different electricity market state variables such as market demand, supply, clearing prices and clearing quantities, and to mimic a clearing mechanism (market model) of the two-sided electricity market. The forecasted values and the market model are used to generate the reinforcement model, which is used for recommendation in a two-sided electricity market. The disclosed reinforcement based recommendation framework is generic enough to be used by either a buyer entity or a seller entity participating in the two-sided electricity market, thus making a more flexible, generic and efficient framework for recommending a bidding parameter.
It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
