Abstract: System and method(s) for data migration are described. The method may include identifying at least one functional module associated with a source database for migration where the at least one functional module represents a functional grouping of data in the source database. The method may further include determining a priority associated with each entity from amongst one or more entities of the at least one functional module to create inter-modular dependencies and inter-modular parallelism. The priority associated with the functional modules and the entities therein is indicative of the order of migration of each entity from the source database to a destination database. The method may also include generating a parallel framework for the data migration from the source database to the destination database based on the inter-modular dependencies and the inter-modular parallelism where the parallel framework may include one or more data migration rules.
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
(See section 10, rule 13)
1. Title of the invention: DATA MIGRATION
2. Applicant(s)
NAME: TATA CONSULTANCY SERVICES LIMITED
NATIONALITY: Indian
ADDRESS: Nirmal Building, 9th Floor, Nariman Point, Mumbai 400021, Maharashtra, India
3. Preamble to the description
COMPLETE SPECIFICATION
The following specification particularly describes the invention and the manner in which it
is to be performed.
TECHNICAL FIELD
[0001] The present subject matter relates, in general, to database management and, in
particular, but not exclusively, to data migration from one database to another.
BACKGROUND
[0002] Data migration can be a difficult exercise if it is not properly planned and executed.
Since software systems are expected to be available 24x7 to meet customer expectations, data migration must be carried out with precision and efficiency. Data migration is needed when software systems are migrated from legacy systems to new systems or when an existing system is upgraded to an improved system. To adopt a new system, existing data from the old system needs to be moved or migrated to the new system, where the data model might be different. New attributes may have been created or existing attributes may have been merged. To address this challenge, data needs to be migrated to the new system in an optimal way.
SUMMARY
[0003] This summary is provided to introduce concepts related to data migration from
one database to another, which is further described below in the detailed description. This summary is not intended to identify essential features of the claimed subject matter nor is it intended for use in determining or limiting the scope of the claimed subject matter.
[0004] In one embodiment of the present subject matter, method(s) and a system for data
migration are described. The method of data migration may include identifying at least one functional module associated with a source database where the at least one functional module represents a functional grouping of data in the source database. The method may further include determining a priority associated with each entity of the at least one functional module to create inter-modular dependencies and inter-modular parallelism. The priority associated with the functional modules and the entities thereof is indicative of the order of migration of each entity from the source database to a destination database. The method may also include generating a parallel framework for the data migration from the source database to the destination database
based on the inter-modular dependencies and the inter-modular parallelism where the parallel framework may include one or more data migration rules.
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] The detailed description is described with reference to the accompanying figures.
In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to reference like features and components.
[0006] Fig. 1 illustrates a computing environment, in accordance with an embodiment of
the present subject matter.
[0007] Fig. 2 illustrates components of a data migration system, in accordance with an
implementation of the present subject matter.
[0008] Fig. 3 illustrates a method to migrate data, in accordance with an implementation
of the present subject matter.
DETAILED DESCRIPTION
[0009] System(s) and method(s) for data migration from one database to another are
described. The systems and methods described herein can be implemented on a variety of devices, such as a server, a desktop computer, a notebook or a portable computer, a mainframe computer, a mobile computing device, and the like.
[0010] In today's e-business and competitive world, computing systems, software
applications, and web applications are expected to function round the clock and provide access to data at any given time and with guaranteed quality of service levels. At the same time, the systems and applications also evolve continuously to handle various factors like increase in the amount of data due to existing and new users, seasonal variations in workloads, etc.
[0011] Data migration involves two distinct parts: formulating a migration plan that
details the data to be moved along with the source(s) and destination(s) for the data, and executing the migration plan by moving the data to the specified destination(s). Specific frameworks are created to ease the planning phase and to speed up both the planning and the implementation of the data migration process. Moreover, in many situations postponing a migration task to a later time could result in significant revenue losses, and in such situations data must be migrated from source to destination in minimum time and without hindering the working of the system or the application.
[0012] The reluctance to transition databases between systems or applications can
stem from the amount of effort necessary to move the data from legacy systems to a new system. This effort involves a large amount of human resources, time, and cost. Moving data is an inherently disruptive operation that results in downtime and business interruptions. Where a database is large, the time required to migrate data from one database to the other may be considerable, and organizations that cater to a high number of users around the clock may not be willing to risk the resulting downtime of applications and servers.
[0013] As a result, organizations are uncomfortable with the resources and time required
for the migration. Further, data entities are generally migrated from the source system to the destination system one after another in a serial manner, which may make data migration a resource-intensive and time-consuming task. Furthermore, such a process may become more cumbersome when a huge volume of data is to be migrated.
[0014] Certain other data migration techniques, to reduce the required time of migration,
utilize the method of parallel migration. Parallel migration enables concurrent migration of multiple data entities from the source to the destination. However, the method of parallel migration often does not preserve the functional dependencies present in the data. Further, parallel migration of data without proper handling of the functional dependencies may cause irregularities in the data and, in certain cases, even corruption of the data.
[0015] According to an embodiment of the present subject matter, system(s) and
method(s) for data migration are described herein. The embodiments described herein allow data migration from a legacy system to a new system.
[0016] The data migration may occur from a legacy system, hereinafter referred to as
source, to a new system, hereinafter referred to as destination. Further, the data migration
between the source and destination may be performed through different networks, such as an on-chip network, telephone network, a computer network, a wireless network, and a peer-to-peer network.
[0017] Although the present subject matter is described in detail with reference to data
migration from a source database to a destination database, it will be understood that the concepts explained in context thereof can be extended to software and web applications as well. Additionally, for exemplary purposes, the data migration process involving parallel data migration is described with respect to a trading system and a health care system. However, it would be understood that the described systems and methods may be implemented in different systems and applications involving database management, such as inventory management systems, employee management systems, online shopping portals, banking systems, hotel management systems, and the like.
[0018] The data migration from source to destination is based on a parallel framework
developed for data migration involving technical and functional data migration processes. In one implementation, for the purpose of data migration, the functional modules present at the source are identified. The functional modules may represent a functional grouping of data in a database. For example, in a trading system, the database may include different functional modules, such as 'master data', 'order data', 'trade data', and 'payments data'. All such functional modules present at the source are identified and a precedence, referred to as a priority hereinafter, associated with each functional module is determined. The priority related to each functional module may represent the order in which the functional module, including data of various forms, can be migrated from the source to the destination.
[0019] For example, in the trading system described above, the 'master data' may have
a higher priority than any other functional module. Similarly, 'order data' may have a higher priority than the 'trade data' but at the same time may have a lower priority than the 'master data'. In determining the priority of each functional module, it is also possible that multiple functional modules are identified to have equal priorities. In such a scenario, the functional modules with equal priorities are identified as the modules that can be migrated in parallel from the source to the destination.
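The grouping of equal-priority modules described above can be sketched as follows. This is a minimal illustration only: the numeric priority values, the tie between 'trade data' and 'payments data', and the helper name `migration_waves` are assumptions made for the example and are not taken from the specification.

```python
from collections import defaultdict

# Hypothetical priorities for the trading-system modules (1 = migrated first).
# The tie between 'trade data' and 'payments data' is assumed for illustration.
MODULE_PRIORITY = {
    "master data": 1,
    "order data": 2,
    "trade data": 3,
    "payments data": 3,
}


def migration_waves(priorities):
    """Group modules by priority; each group may be migrated in parallel."""
    by_priority = defaultdict(list)
    for module, priority in priorities.items():
        by_priority[priority].append(module)
    # A lower number means a higher priority, so sort ascending.
    return [sorted(by_priority[p]) for p in sorted(by_priority)]


waves = migration_waves(MODULE_PRIORITY)
```

Each inner list in `waves` holds modules of equal priority, and the outer list gives the order in which those groups would be migrated.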
[0020] In one implementation of the present subject matter, upon identification of
priorities related to each functional module, the sequence in which the entities of each functional module can be migrated is identified. As with functional modules, certain entities in a functional module may require migration from the source to the destination before the others. To honor the dependencies between entities during data migration, the entities of each functional module are also assigned a priority. For example, the functional module 'master data' of the trading system may include entities, such as 'User' and 'Accounts'. In such a scenario, the entity 'User' may require migration before the migration of the entity 'Accounts', and therefore the entity 'User' would be provided with a higher priority than the entity 'Accounts'. It would be understood that, similar to the functional modules, several different entities within each module may have equal priority and may be determined to be migrated in parallel from the source to the destination.
[0021] Based on the identified priorities of the functional modules and the priorities of
entities within each functional module, inter-modular dependencies and inter-modular parallelism are determined. According to an implementation of the present subject matter, the inter-modular dependencies signify the precedence of the functional modules and the entities in relation to the other functional modules and the entities, respectively. Similarly, the inter-modular parallelism may signify the functional modules and the entities that can be migrated in parallel with other functional modules and entities, from the source to the destination. The inter-modular dependencies and parallelism may therefore help in determining the data migration where migration of an entity or a functional module is triggered based on the priority associated with the entity or the functional module.
[0022] Further, based on the inter-modular dependencies and the inter-modular
parallelism, a parallel framework is created to enable data migration from the source to the destination. In one implementation, the parallel framework includes the data migration rules created based on the inter-modular dependencies and parallelism for data migration. The data migration rules may include the schedule of migration of different entities, the schedule of migration of different functional modules, the scripting language to enable the data migration, the schedule of time instances for log creation, etc. Such a parallel framework for migration allows concurrent migration that reduces the time required for data migration from the source to the
destination without any conflict of functional dependencies. Therefore, such a parallel framework allows for optimized data migration from the source to the destination.
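One way such a schedule could be executed is sketched below, assuming the schedule has already been reduced to an ordered list of groups whose members may run concurrently. The function `run_framework`, the stand-in `fake_migrate`, and the entity names 'Orders' and 'Payments' are hypothetical; the specification does not prescribe a thread pool or any particular execution mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def run_framework(waves, migrate_entity, max_workers=4):
    """Migrate each group in order; items inside a group run concurrently."""
    completed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for wave in waves:
            # list() waits for every task in the group to finish (and
            # re-raises any exception) before the next group is started.
            list(pool.map(migrate_entity, wave))
            completed.append(list(wave))
    return completed


# Stand-in for the real copy step: it only records what was "migrated".
migrated = []


def fake_migrate(name):
    migrated.append(name)


order = run_framework(
    [["User"], ["Accounts"], ["Orders", "Payments"]], fake_migrate
)
```

Waiting for a whole group before starting the next is what keeps a lower-priority entity from being copied ahead of an entity it depends on, while still allowing equal-priority entities to overlap.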
[0023] Further, the parallel framework may also define the restart logic and restart
parameters for the purpose of log keeping and optimized data migration. The restart logic may define the status of migration process, i.e., it may include the status in terms of migrated entities and the functional modules. In situations of restart of data migration process, the restart logic ensures that the migrated entities or functional modules are not migrated again. Further, the restart logic may also track the extent of migration of an entity or a functional module so that in situations of unexpected failure or restarts, the data migration may resume from the last migrated instance, thereby ensuring consistency between the source and the destination and reducing the time taken in roll back of re-migrations.
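The restart behaviour described above can be illustrated with a small checkpoint sketch. The class `RestartLog`, its JSON file format, and the `fail_at` parameter used to simulate a failure are all assumptions made for this example; the specification does not define how the restart parameters are stored.

```python
import json
import os
import tempfile


class RestartLog:
    """Hypothetical checkpoint store: entity name -> records migrated so far."""

    def __init__(self, path):
        self.path = path
        self.state = {}
        if os.path.exists(path):
            with open(path) as fh:
                self.state = json.load(fh)

    def offset(self, entity):
        return self.state.get(entity, 0)

    def record(self, entity, offset):
        self.state[entity] = offset
        with open(self.path, "w") as fh:
            json.dump(self.state, fh)


def migrate(entity, rows, destination, log, fail_at=None):
    """Copy rows to destination, resuming after the last checkpointed row."""
    for i in range(log.offset(entity), len(rows)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated failure")
        destination.append(rows[i])
        log.record(entity, i + 1)


path = os.path.join(tempfile.mkdtemp(), "restart.json")
rows, destination = ["r1", "r2", "r3", "r4"], []
try:
    migrate("Patients", rows, destination, RestartLog(path), fail_at=2)
except RuntimeError:
    pass
# Restart: a fresh log object reads the checkpoint and skips r1 and r2.
migrate("Patients", rows, destination, RestartLog(path))
```

In this sketch the write to the destination and the checkpoint update are separate steps; a real implementation would likely need to make them atomic, for example within one database transaction, so that a failure between the two cannot duplicate a record.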
[0024] The above methods and systems are further described in conjunction with the
following figures. It should be noted that the description and figures merely illustrate the principles of the present subject matter. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the present subject matter and are included within its spirit and scope. Furthermore, all examples recited herein are principally intended expressly to be only for pedagogical purposes to aid the reader in understanding the principles of the present subject matter and the concepts contributed by the inventor(s) to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the present subject matter, as well as specific examples thereof, are intended to encompass equivalents thereof.
[0025] While aspects of described systems and methods for data migration can be
implemented in any number of different computing devices, environments, and/or configurations, the embodiments are described in the context of the following exemplary system(s) and method(s).
[0026] Fig. 1 illustrates a computing environment 100 implementing a data migration
system 102, according to an embodiment of the present subject matter. The computing
environment 100, apart from the data migration system 102, includes at least one source computing system 104 connected to one or more destination computing systems 106-1, 106-2, ..., 106-N via a network 108. The source computing system 104 represents a legacy system connected to the one or more new or advanced destination computing systems 106-1, 106-2, ..., 106-N to which data from the source computing system 104 may be migrated. For the purpose of explanation and clarity, the destination computing systems 106-1, 106-2, ..., 106-N are hereinafter collectively referred to as destination computing systems 106 and individually referred to as destination computing system 106.
[0027] The network 108 may be a wireless network, wired network, or a combination
thereof. The network 108 can be implemented as one of the different types of networks, such as intranet, telecom network, electrical network, local area network (LAN), wide area network (WAN), Virtual Private Network (VPN), internetwork, Global Area Network (GAN), the Internet, and the like. The network 108 may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), etc., to communicate with each other. Further, the network 108 may include a variety of network devices, including routers, bridges, servers, computing devices, and storage devices.
[0028] The data migration system 102, the source computing system 104, and the
destination computing systems 106 can be implemented as any of a variety of conventional computing devices including, for example, servers, a desktop PC, a notebook or a portable computer, a workstation, a mainframe computer, a mobile computing device, an entertainment device, and an Internet appliance. Although the source computing system 104 and the destination computing systems 106 are shown to be connected through a physical network 108, it would be appreciated by those skilled in the art that the source computing system 104 and the destination computing systems 106 may be distributed locally or across one or more geographic locations and can be physically or logically connected to each other.
[0029] Further, the data migration system 102, although shown separate from the source
computing system 104 and the destination computing systems 106, can be implemented by any of the source computing system 104 and the destination computing systems 106. Although the foregoing description is with respect to migration of data from the source computing system 104 to a single destination computing system 106, it will be appreciated that the data may be migrated to multiple computing systems, such as the destination computing systems 106-1 and 106-2, as well. Likewise, data may be migrated from multiple source computing systems 104 to a single destination computing system 106 or to multiple destination computing systems 106.
[0030] The source computing system 104, apart from other things, includes a source
database 110 having data, which is migrated by the data migration system 102 to the destination computing system 106. In one implementation, the source database 110 may include data related to an organization or an enterprise that may be utilized for management purposes, such as employee data, assets data, liability data, revenue data, and the like. In another implementation, the source database 110 may include data associated with a website or an online portal which may include pages data, user data, accounts data, products data, and the like. Therefore, it would be understood by those skilled in the art that the source database 110 may include data related to different fields of operation depending upon the industry of implementation.
[0031] The destination computing systems 106, to receive the data from the source
database 110, may include a destination database, such as 112-1, 112-2, ..., 112-N respectively, collectively referred to as destination databases 112 and individually referred to as destination database 112. It would be understood that the source database 110 and the destination database 112 can be managed by any database management system (DBMS) known in the art, such as Oracle DBMS, Access and SQL Server from Microsoft, DB2™ from IBM, and the open-source DBMS MySQL. Although the database is managed by a DBMS, for the sake of clarity and simplicity, the combination of the database and its respective DBMS is referred to as the source database 110 and the destination database 112.
[0032] The data migration system 102 includes, amongst other things, a framework
generation module 114. The framework generation module 114 can also be provided in an external storage media, which may interface with the data migration system 102. In one implementation, the framework generation module 114 generates a parallel framework for the data migration from the source computing system 104 to the destination computing system 106. According to an implementation of the present subject matter, the parallel framework includes data migration rules for migrating data from the source database 110 to the destination
database 112. The framework generation module 114 may generate the parallel framework based on various parameters associated with the source database 110, such as functional dependencies of the functional modules and of the entities within each functional module.
[0033] The parallel framework generated by the framework generation module 114 may
not only define the order of migration of data stored in the source database, but may also define the manner in which the migration may occur. For example, the parallel framework may define the scripting language to be utilized for data migration, the schedule of time instances where logs are to be maintained, etc.
[0034] The data migration system 102 performs the data migration from the source
database 110 to the destination database 112 based on the parallel framework. The details of the parallel framework generation along with the technique for data transfer are discussed in detail with respect to description of Fig. 2.
[0035] Fig. 2 illustrates various components of the data migration system 102, according
to an embodiment of the present subject matter. The data migration system 102 includes interface(s) 202, one or more processor(s) 204, and a memory, such as a memory 206, coupled to the processor(s) 204.
[0036] The interfaces 202 may include a variety of software and hardware interfaces, for
example, interfaces for peripheral device(s), such as a keyboard, a mouse, an external memory, and a printer. Further, the interfaces 202 may enable the data migration system 102 to communicate with different computing systems, such as the source computing system 104 (shown in Fig. 1) and the destination computing systems 106 (shown in Fig. 1). The interfaces 202 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example local area network (LAN), cable, etc., and wireless networks such as Wireless LAN (WLAN), cellular, or satellite. For the purpose, the interfaces 202 may include one or more ports for connecting a number of destination computing systems 106 to each other or to another computing system, such as the source computing system 104. In one implementation, the data migration system 102 communicates with the source computing system 104 and the destination computing systems 106 via the interfaces 202.
[0037] The processor 204 can be a single processing unit or a number of units, all of
which could include multiple computing units. The processor 204 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor 204 is configured to fetch and execute computer-readable instructions and data stored in the memory 206.
[0038] The functions of the various elements shown in the figures, including any
functional blocks labeled as "processor(s)", may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term "processor" should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, network processor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), read only memory (ROM) for storing software, random access memory (RAM), and non volatile storage.
[0039] The memory 206 may include any computer-readable medium known in the art
including, for example, volatile memory such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory 206 includes module(s) 208 and data 210. The modules 208, amongst other things, include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types.
[0040] The data 210 serves, amongst other things, as a repository for storing data
processed, received and generated by one or more of the modules 208. The modules 208 further include, for example, the framework generation module 114, a functional groups determination module 212, a priority determination module 214, data migration module 216, and other module(s) 218. The other modules 218 may include programs that supplement applications on the data migration system 102, for example, programs in the operating system. The data 210 includes data generated as a result of the execution of the one or more modules 208. The data
210 includes priority data 220, framework data 222 and other data 224. The other data 224 includes data generated as a result of the execution of one or more modules in the other module(s) 218.
[0041] In one implementation, the data migration system 102 optimizes the process of
data migration from the source database 110 to the destination database 112 by generating a parallel framework for data migration.
[0042] In operation, according to an implementation of the present subject matter, the
priority determination module 214 of the data migration system 102 identifies different functional modules in the source database 110. The functional modules present in the source database 110 are identified to enable migration of data based on functional dependencies, requirements and constraints. The functional modules represent a functional grouping of data in the source database 110. For example, data stored in the database of a trading system may include data related to multiple functional requirements of trading, stored in different functional modules, such as 'master data', 'order related data', 'trade related data', and 'payments related data'. Further, the data related to the field of trading may be stored in the source database under several entities that may not necessarily form one group due to the functional requirements.
[0043] In one implementation, functional modules of a database are defined based on the
industry of operation. Thus, in an example, a database implemented in a healthcare industry may include functional modules, such as master data, patients related data, staff related data, medicine related data, inventory data, billing data, and the like. The functional groups determination module 212 identifies such functional modules present in the database.
[0044] In one implementation, the priority determination module 214 may then associate
a priority with each of the identified functional modules of the source database 110. The priority determination module 214 may associate the priority with each functional module depending upon the required order of migration of each functional module. The order of migration may be dependent on the dependency of a functional module on another functional module, i.e., the functional dependency between the functional modules. For example, the functional module 'billing data' may inherit data from the functional module 'patients related data' and hence, for consistency of data, 'patients related data' should be migrated before 'billing data' and should be provided with a higher priority. In other words, a functional module required to be migrated earlier than another functional module would be associated with a higher priority, and the priority represents the order in which each functional module can be migrated from the source database 110 to the destination database 112. In said implementation, the priorities associated with each functional module may be stored in the priority data 220 by the priority determination module 214. In one implementation, the functional dependencies associated with the functional modules of the source database 110 are identified and stored in the DMTReplicateParent table of the source database.
[0045] For example, in the health care system described earlier, the master data may be
required to be migrated ahead of any other functional module and therefore may be associated with the highest priority, say P1. Similarly, the functional module 'patients data' may be required to be migrated after the 'master data' but before the 'medicine related data' and therefore, the priority determination module 214 may associate a priority, P2, with the functional module 'patients data' that is higher than that of 'medicine related data' but lower than the priority of the 'master data'. In said situation, if the functional module 'medicine related data' can be migrated concurrently with the other functional module 'inventory data', the priority determination module 214 would associate an equal priority, P3, with both the functional modules.
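The health care example above can be reproduced by deriving priorities from a dependency map, as in the following sketch. The function `assign_priorities` and the layout of `DEPENDS_ON` are hypothetical, and the dependency of 'inventory data' on 'patients related data' is an assumption chosen so that it shares a priority with 'medicine related data'; the specification states only the resulting order, not how it is computed.

```python
def assign_priorities(depends_on):
    """Return {module: priority}, where priority 1 is migrated first.

    A module's priority is one more than the largest priority among the
    modules it depends on, so independent modules share a priority.
    """
    priorities = {}

    def resolve(module, path=()):
        if module in path:
            raise ValueError(f"cyclic dependency at {module!r}")
        if module not in priorities:
            parents = depends_on.get(module, [])
            priorities[module] = 1 + max(
                (resolve(p, path + (module,)) for p in parents), default=0
            )
        return priorities[module]

    for module in depends_on:
        resolve(module)
    return priorities


# Assumed dependency map; only the resulting order comes from the text.
DEPENDS_ON = {
    "master data": [],
    "patients related data": ["master data"],
    "medicine related data": ["patients related data"],
    "inventory data": ["patients related data"],
    "billing data": ["patients related data"],
}
priorities = assign_priorities(DEPENDS_ON)
```

Here the integers 1, 2, and 3 correspond to the labels P1, P2, and P3 used in the example, and a cyclic dependency is reported as an error because no valid migration order would exist for it.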
[0046] Thus, the functional modules that are to be migrated concurrently, that is,
in parallel from the source database 110 to the destination database 112, may be associated with equal priorities. Such concurrent migrations may be hereinafter referred to as parallel migrations. It would be understood that the priority determination module 214 may associate multiple functional modules with equal priorities; however, depending upon the functional dependency, the priority determination module 214 may not associate equal priority with any of the identified functional modules. Similarly, according to one implementation, the priority determination module 214 may associate equal priorities with all the identified functional modules based on the identified possible order of migration.
[0047] It would be understood that the different functional modules may include multiple
entities involving functional dependency/relation with other entities. Each such entity may further have related attributes based on which the actual data is stored in the form of records at the source database 110. For example, 'patients related data' may be a functional module in the
health care industry. This functional module may include multiple entities such as 'Patients', 'Disease', and 'Medical history'. Each entity such as 'Patients' may include multiple attributes, such as patients name, patients age, patients address, and patients nationality and under these attributes the actual data related to different patients is stored in the form of records.
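The module, entity, attribute, and record hierarchy described above could be modelled as in the following sketch. The class names `Entity` and `FunctionalModule`, the attribute identifiers, and the placeholder record are assumptions for illustration only and do not reflect any schema given in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    attributes: list
    records: list = field(default_factory=list)

    def add_record(self, **values):
        # Reject values for attributes the entity does not define.
        unknown = set(values) - set(self.attributes)
        if unknown:
            raise KeyError(f"unknown attributes: {sorted(unknown)}")
        self.records.append(values)


@dataclass
class FunctionalModule:
    name: str
    entities: dict = field(default_factory=dict)

    def add_entity(self, entity):
        self.entities[entity.name] = entity


patients = Entity("Patients", ["name", "age", "address", "nationality"])
# Placeholder record, not real data.
patients.add_record(name="(placeholder)", age=0)

module = FunctionalModule("patients related data")
for entity in (patients, Entity("Disease", []), Entity("Medical history", [])):
    module.add_entity(entity)
```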
[0048] According to an implementation of the present subject matter, the priority
determination module 214 also associates priority with different entities of each functional module. As described above, each functional module may include several entities of the source database 110. Such entities may involve interdependencies or relations with other entities in the same functional module depending upon the structure of the source database 110. Hence, the priority determination module 214 determines the priority for each entity within each functional module based on the interdependencies of the entities. Although it has been described that there may be several entities within each functional module, it would be understood by those skilled in the art that there may be only a single entity in a functional module.
[0049] Similar to the association of priorities with the functional modules, the priority
determination module 214 may associate priorities with the entities of each functional module. Upon assignment of the priorities to the entities of each functional module, one entity may be associated with a higher priority whereas another entity may be associated with a lower priority. Further, the priority determination module 214 may also associate multiple entities of a functional module with equal priorities. The association of equal priorities with different entities is indicative of the fact that such entities can be migrated concurrently from the source database 110 to the destination database 112. It would be appreciated that the priority of an entity is determined relative to the other entities of the same functional module; the entities of different functional modules are not compared with each other to determine their relative priorities.
[0050] In another implementation, the priority determination module 214 is further
configured to create inter-modular dependencies and inter-modular parallelism based on the associated priorities with the entities of each functional module. Since the priority associated with each entity within a functional module is indicative of the order in which the entities of that functional module can be migrated, the priority determination module 214 creates the details of
dependencies of entities within each functional module and combines them with the dependencies of the functional modules to determine the overall dependency of entities relative to each other.
[0051] Referring to the example of health care, the functional module 'Patients data' may
include entities such as 'Patients', 'Disease', and 'Medical history'. The priority determination module 214 may have assigned priorities to these entities where the entity 'Patients' has the priority P1E1 and the entities 'Disease' and 'Medical history' have an equal priority P1E2. In said example, the P1 associated with each entity priority E1 and E2 describes the priority of the functional module to which the entities belong. The priority determination module 214 may store the data related to the priorities of the entities in the priority data 220.
[0052] Similarly, the functional module 'medicine related data' may include entities such
as 'Manufacturer' and 'Salts'. The priority determination module 214 may associate priorities of P3E1 and P3E2 with the entities 'Manufacturer' and 'Salts', respectively. Here, it would be understood that since the priority of the functional module 'medicine related data' is P3, the priority of each entity may begin with P3 or, in other words, is a combination of the priority of the functional module and the priority of the entity under consideration. Therefore, such association of priorities with the entities and the functional modules helps in creating the inter-modular dependencies and the inter-modular parallelism. Further, it will be understood that the specific way in which the priority of a functional module, say, P1, and the priority of an entity, say, P1E2, have been discussed is only for the purpose of explanation and not a limitation.
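By way of illustration only, the composite priorities discussed above may be sketched in Python as follows. The sketch assumes that priorities are stored as labels of the form P1E2 and that a lower number is migrated earlier; the dictionary PRIORITIES and the functions parse_priority and migration_batches are hypothetical names and do not form part of the described system.

```python
import re
from collections import defaultdict

# Hypothetical labels in the 'P<module>E<entity>' form used in the example above.
PRIORITIES = {
    "Patients": "P1E1",
    "Disease": "P1E2",
    "Medical history": "P1E2",
    "Manufacturer": "P3E1",
    "Salts": "P3E2",
}


def parse_priority(label):
    """Split a label such as 'P1E2' into (module_priority, entity_priority)."""
    match = re.fullmatch(r"P(\d+)E(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognised priority label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def migration_batches(priorities):
    """Return batches in migration order.

    Entities that share a composite priority land in the same batch and
    are candidates for concurrent migration (an assumption of this sketch).
    """
    grouped = defaultdict(list)
    for entity, label in priorities.items():
        grouped[parse_priority(label)].append(entity)
    return [sorted(grouped[key]) for key in sorted(grouped)]
```

In this sketch, 'Disease' and 'Medical history' fall into one batch because both carry the label P1E2, whereas 'Patients' precedes them with P1E1.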
[0053] The inter-modular dependencies may define a serial order in which the entities of
various functional modules are to be migrated. Similarly, the inter-modular parallelism may define the entities of various functional modules that are functionally independent and that can be migrated concurrently. The creation of the inter-modular dependencies and inter-modular parallelism for the entities and the functional modules may not only define the priority of entities within each functional module, but may also provide the relative priorities of the entities across various functional modules.
[0054] Upon creation of the inter-modular dependencies and inter-modular parallelism,
the framework generation module 114 generates a parallel framework based on the inter-modular dependencies and inter-modular parallelism. In one implementation, the parallel framework
generated based on the inter-modular dependencies and inter-modular parallelism includes data migration rules to define migration of data. The data migration rules included in the parallel framework may include, but are not limited to, a schedule of migration of the entities, a schedule of migration of functional modules, a schedule of time instances for log creation, a scripting language to enable data migration, and the like. The framework generation module 114 may store the data related to the parallel framework and the data migration rules in the framework data 222.
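One possible in-memory representation of such data migration rules is sketched below in Python. The class ParallelFramework, its field names, and the default value 'ksh' are assumptions made for illustration; the specification does not prescribe any particular data structure.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass
class ParallelFramework:
    """Hypothetical container for the data migration rules listed above."""

    # Batches of functional modules; modules in one batch have equal priority.
    module_schedule: List[List[str]]
    # For each functional module, batches of entities in migration order.
    entity_schedule: Dict[str, List[List[str]]]
    # Interval between log entries, standing in for the log creation schedule.
    log_interval_seconds: int
    # Scripting language used to execute the framework.
    scripting_language: str = "ksh"

    def steps(self) -> Iterator[Tuple[str, List[str]]]:
        """Yield (module, entity_batch) pairs in scheduled order."""
        for module_batch in self.module_schedule:
            for module in module_batch:
                for entity_batch in self.entity_schedule.get(module, []):
                    yield module, entity_batch
```

The steps method merely flattens the schedule for inspection; an executor would be free to run the modules of one batch, and the entities of one entity batch, concurrently.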
[0055] Based on the parallel framework, the data migration system 102 may migrate data
from the source database 110 to the destination database 112. It would be understood that since the parallel framework includes the schedule of migration of entities and functional modules based on the inter-modular dependencies and inter-modular parallelism, multiple entities and multiple functional modules can be migrated from the source database 110 in parallel based on functional dependencies. Such a parallel framework for migration allows concurrent migration without any conflict of functional dependencies and therefore reduces the time required for data migration from the source database 110 to the destination database 112. Thus, the parallel framework created to enable the data migration from the source database 110 to the destination database 112 allows data migration in an optimized manner.
[0056] In another implementation of the present subject matter, the data migration system
102 includes the data migration module 216 configured to migrate data from the source database 110 to the destination database 112 based on the parallel framework. For this purpose, the data migration module 216 is configured to execute the parallel framework using a scripting language. In said implementation, the data migration module 216 may execute the parallel framework using the scripting language defined in the data migration rules of the parallel framework itself. However, it would also be understood that the data migration module 216 may execute the parallel framework using any other scripting language.
[0057] As described above, the data migration module 216, for migration of data from
the source database 110 to the destination database 112 may execute the parallel framework. The parallel framework defines the inter-modular dependencies and the inter-modular parallelism to define the parallel/serial migration of the functional modules and the entities. Hence, in one
implementation of the present subject matter, the data migration module 216 is configured to initiate multiple processes in parallel, through the scripting language, to concurrently migrate data based on the parallel framework.
[0058] In certain situations, it is possible that the parallel framework may define and allow
the concurrent migration of several functional modules or entities based on functional dependency of the database. In such situations, the data migration module 216 may initiate multiple processes in parallel to migrate data from the source database 110 to the destination database 112, but the data migration module 216 may also be configured to validate and restrict the number of processes initiated in parallel to migrate data. Each process initiated in parallel to already existing processes may add an overhead to the processing capability of the data migration system 102. In such cases, to balance the load on the data migration system 102, the data migration module 216 may be configured to identify a threshold number of scripts, such as shell scripts, that can be initiated in parallel for data migration based on the number of processors available with the data migration system 102. In another implementation, the data migration module 216 may identify the threshold number based on a user input or data stored by a user in other data 224.
[0059] For example, if the data migration system 102 has 'n' available processors, the
data migration module 216 may initiate any number of processes in parallel where the total number of such processes is less than the value 'n'. In other words, the number of processes initiated in parallel is not more than the total number of available processors at the data migration system 102. It would be appreciated that the data migration processes initiated in parallel are validated before initiation to ensure that the data migration system 102 is not overloaded with the parallel processes to migrate data, so that the overall data migration process is not slowed down.
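The restriction on the number of parallel processes may be sketched in Python as shown below. The sketch is illustrative only: it uses worker threads in place of shell scripts, caps a user-supplied value at the number of available processors (an assumption not stated in the specification), and the names parallel_limit and run_batch are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_limit(user_limit=None):
    """Return the threshold number of concurrent migrations.

    Defaults to the processor count; a user-supplied value is honoured but
    capped at the processor count (an assumption of this sketch).
    """
    available = os.cpu_count() or 1
    if user_limit is not None:
        return max(1, min(user_limit, available))
    return available


def run_batch(tasks, limit):
    """Run callables with at most `limit` running at once.

    Results are returned in the order in which the tasks were given.
    """
    with ThreadPoolExecutor(max_workers=limit) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]
```

Tasks beyond the limit simply wait in the executor's queue until a worker becomes free, which is one way of ensuring the system is not overloaded.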
[0060] The process of parallel data migration is further explained with the help of the
following example:
SHELL SCRIPT FORMAT:
Run_Migration_Procedures.ksh
CORRESPONDING QUERY RUN BY THE SHELL SCRIPT:
SELECT 'DMT_PROCEDURE:'||
P1.DATA_FMLY||','||
DEP.REPLICATION,'Y',1,0))),MAX(P1.SEQUENCENO),'Y','N') PROC
FROM --DMT_REPLICATE_PARENT P1
(SELECT * FROM DMT_REPLICATE_PARENT
WHERE TABLENAME IN
('ST_DEAL7STJ}RDER7DB') ) P1
LEFT OUTER JOIN (SELECT * FROM dmt_replicate_parent WHERE SEQUENCENO=1) DEP
ON P1.DEP_PARENT_FMLY = DEP.DATA_FMLY
GROUP BY P1.DATA_FMLY, P1.REPLICATION
ORDER BY P1.DATA_FMLY
Table 1
[0061] The above depicted shell script and the query run by the shell script in Table 1
illustrate an example of the data migration process followed by the data migration system 102, according to an implementation of the present subject matter. In said implementation, the data migration module 216 may execute the shell script iteratively and the shell script may query the source database 110 upon every execution to generate an output defining the migration process from the source database 110 to the destination database 112. The execution of the shell script may also define the status of the migration process with respect to different functional modules present in the source database 110. The execution command of the shell script, as described in Table 1, defines the general format of a shell script that the data migration module 216 may execute for data migration. According to the format specified in Table 1, in one implementation, the data migration module 216 may execute the shell script as:
Run_Migration_Procedures.ksh ST_Transaction_Procedures.sql 3 ...(1)
[0062] The above shell script execution command (1) defines that the data migration
module 216 would execute the shell script 'run_migration_procedures.ksh' to migrate the data of the source database 110 where the list of functional modules present in the source database 110 is available in the file ST_transaction_procedures.sql. The execution of the command (1) also defines that during the generation of an output defining the migration process, the maximum number of parallel migrations shall not exceed 3. As described before, based on the execution of the command (1), the shell script may run a query as depicted in Table 1. Based on the result of the query, the data migration module 216 may generate an output, such as:
AFTER_MIG_REPLICATION,N,N
MIG_CPM_OD_REPLICATE,N,N
MIG_EXECUTIONS_REPLICATION,Y,Y
MIG_MERGE_INTERMEDIATE_TABLES,N,N
MIG_RD_REPLICATION,Y,Y
MIG_SAST_MOVEMENT_OFFSET1,N,N
MIG_SAST_MOVEMENT_OFFSET2,N,N
MIG_SAST_PROC_FX,N,N
MIG_SA_MOVEMENT,Y,Y
MIG_SA_PORT_MOVEMENT,Y,Y
MIG_STSA_TLD_OFFSET1,N,Y
MIG_ST_ADVICES2_REPLICATION,N,N
MIG_ST_ADVICE_DETAIL,N,N
MIG_ST_ADVICE_DETAIL_OD,N,Y
MIG_ST_ADVICE_OD_REPLICATION,N,N
[0063] The above described output represents the list and order of functional modules
which need to be migrated from the source database 110 to the destination database 112 based on the functional intra-modular dependencies and intra-modular parallelism. It would be understood by those skilled in the art that upon every iterative execution of the shell script, the list of functional modules along with the migration dependency may be updated depending upon the completion of migration of other functional modules. In other words, as the migration of one functional module is completed, the status of the functional module would be updated in the output and the migration status of other functional modules whose migration is directly dependent on the migration of such functional module would also be changed accordingly. This is further explained with the help of the following description.
[0064] Each functional module described above may be represented in the format:
functional module name, N/Y, N/Y
where each functional module name is followed by one of the status pairs N, N; N, Y; Y, Y; or Y, N. In the representation N, N, the first 'N' after the functional module name indicates that either the migration of the functional module has not yet started, or the migration has not yet completed successfully. The second 'N' represents that not all of the functional modules on which the migration of this functional module depends have yet successfully completed their migration.
[0065] Similarly, the representation N, Y after the functional module name indicates that either the
migration of the functional module has not yet started, or the migration has not yet completed successfully, and that all the functional modules on which the migration of this functional module depends have been migrated successfully. Further, the representation of a functional module in a manner such as Y, Y indicates that the functional module has been migrated successfully, and all the functional modules on which its migration depends have also been migrated successfully.
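The status entries described above may be interpreted programmatically, for instance as in the following Python sketch. It assumes each output entry is a comma-separated line consisting of a module name followed by two Y/N flags; the names ModuleStatus, parse_status, and is_ready are hypothetical.

```python
from typing import NamedTuple


class ModuleStatus(NamedTuple):
    name: str
    migrated: bool          # first flag: the module itself has been migrated
    dependencies_met: bool  # second flag: all modules it depends on are migrated


def parse_status(line):
    """Parse one entry such as 'MIG_RD_REPLICATION,Y,Y'."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != 3 or parts[1] not in {"Y", "N"} or parts[2] not in {"Y", "N"}:
        raise ValueError(f"malformed status entry: {line!r}")
    return ModuleStatus(parts[0], parts[1] == "Y", parts[2] == "Y")


def is_ready(status):
    """A module can be triggered when its entry reads N, Y."""
    return not status.migrated and status.dependencies_met
```

Under these assumptions, an entry such as MIG_STSA_TLD_OFFSET1,N,Y would be reported as ready, whereas MIG_SA_MOVEMENT,Y,Y would not, since it has already been migrated.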
[0066] The shell script would be executed iteratively by the data migration module 216
until the migration of all the functional modules has been triggered. Each triggered migration of a functional module may cause an update in its status entry in the output, where the first 'N/Y' in the functional module representation format is updated to 'Y' if the migration of the functional module is completed successfully. This update will trigger a status update of all the functional modules dependent on the functional module migrated successfully. In such a case, the status of such dependent functional modules may change and the second 'N/Y' entry may be updated to 'Y', thereby changing the status to 'N,Y'. Such modules may then be triggered by the script in the next loop. It would be understood by those skilled in the art that the representation Y,N is not possible and would not occur since no functional module would be migrated before the complete migration of the functional modules on which it depends.
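The iterative triggering described above may be simulated with the following Python sketch. It is a simplified model rather than the shell script itself: dependencies are held in a dictionary instead of being queried from the database, the modules of one round are migrated one after another instead of in parallel processes, and the names run_migration, dependencies, and migrate are hypothetical.

```python
def run_migration(dependencies, migrate, max_parallel):
    """Trigger modules round by round until every module has been migrated.

    dependencies maps each module to the set of modules it depends on.
    migrate is called once per module. Returns the list of rounds.
    """
    migrated = set()
    rounds = []
    while len(migrated) < len(dependencies):
        # Modules whose status would read 'N,Y': not migrated, dependencies met.
        ready = sorted(
            module
            for module, deps in dependencies.items()
            if module not in migrated and deps <= migrated
        )
        if not ready:
            raise RuntimeError("circular or unsatisfied dependency")
        batch = ready[:max_parallel]
        for module in batch:
            migrate(module)
        # Completed modules now read 'Y,Y' and unblock their dependants.
        migrated.update(batch)
        rounds.append(batch)
    return rounds
```

Because a module enters a round only after all of its dependencies are in the migrated set, the state Y,N cannot arise in this model either.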
[0067] The data migration module 216, apart from other things, may also implement a
restart logic. In situations of restart of data migration process, the restart logic may ensure that the migrated entities or functional modules are not migrated again. Therefore, the data migration
module 216 may implement the restart logic where a log of migration of each entity of the functional module from the source database 110 to the destination database 112 is maintained. This may be achieved by storing restart parameters, such as the extent of migration of records of an entity or a functional module, the time instance of the last migration, the completion status of the migrated entities, and the like. Hence, in situations of unexpected failure or restarts of the migration procedure, the migration may resume from the last migrated instance, thereby ensuring consistency between the source and the destination and reducing the time spent on rollback and re-migration. The data migration module 216 may also store the restart parameters in the framework data 222.
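A minimal sketch of such restart logic is given below in Python. It assumes that the restart parameter is simply the count of records already migrated per entity and that this count is kept in a JSON file; the file format and the names load_checkpoint, save_checkpoint, and migrate_entity are hypothetical and are not taken from the specification.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved {entity: records_migrated} log, or {} if none exists."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def save_checkpoint(path, checkpoint):
    """Write the log to a temporary file and rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(checkpoint, handle)
    os.replace(tmp_path, path)


def migrate_entity(entity, records, copy_record, path):
    """Copy records not yet logged as migrated; return how many were copied."""
    checkpoint = load_checkpoint(path)
    start = checkpoint.get(entity, 0)
    for index in range(start, len(records)):
        copy_record(records[index])
        checkpoint[entity] = index + 1
        save_checkpoint(path, checkpoint)
    return len(records) - start
```

Saving after every record keeps the sketch simple; a practical implementation would more likely log at the time instances defined in the data migration rules.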
[0068] In accordance with an implementation of the present subject matter, the data
migration module 216, during the migration of the data from the source database 110 to the destination database 112 may also utilize the intra-modular parallel options available in the DBMS of the source database 110 and the DBMS of the destination database 112. Based on the intra-modular dependencies, the data migration module 216 may utilize the functionalities provided by the DBMS of the source database 110 and the destination database 112 to migrate data more efficiently in addition to the usage of the parallel framework. For example, the '/*+ parallel(table, degree) */' hint may be utilized for bulky queries in Oracle to achieve intra-modular parallel options.
[0069] The data migration module 216, during migration of data from the source
database 110 to the destination database 112 according to the parallel framework, may also implement a bulk transfer approach of data migration. As would be understood by those skilled in the art, to utilize bulk transfer approaches, the entities to be migrated are divided into several groups and then each group may be transferred from the source to the destination in one migration process. The groups may be divided based on the logical constraints associated with each group. For example, an entity with 10 million records may be divided into 3 groups, grp1, grp2, and grp3, where grp1 may include 5 million records, grp2 may include 3 million records, and grp3 may include 2 million records. The three groups identified may have different associated logical constraints, and therefore, may be migrated separately. However, each group including multiple records, such as 5 million, 3 million, and 2 million, may be migrated in one migration process.
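The grouping described above may be sketched in Python as follows. The sketch assumes that a logical constraint can be expressed as a function mapping a record to a group name; the names split_into_groups, bulk_migrate, and bulk_insert are hypothetical.

```python
from collections import defaultdict


def split_into_groups(records, constraint):
    """Divide records into groups keyed by the value of a logical constraint."""
    groups = defaultdict(list)
    for record in records:
        groups[constraint(record)].append(record)
    return dict(groups)


def bulk_migrate(groups, bulk_insert):
    """Transfer each group in a single call; return records moved per group."""
    moved = {}
    for name, rows in groups.items():
        bulk_insert(rows)  # one migration process per group
        moved[name] = len(rows)
    return moved
```

For instance, a constraint that maps each record to its year of creation would yield one group per year, each of which is then handed to the bulk loader as a whole.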
[0070] Further, the data migration module 216 may utilize a non-recoverable approach
during bulk migrations. The utilization of the non-recoverable approach allows the data migration module 216 not to maintain records of such bulk migrations and therefore allows for faster migration of data from the source database 110 to the destination database 112. The data migration module 216 may also resize the undo, redo, and memory structures according to the requirements of the parallel transfers and the bulk transfers performed by the data migration module 216.
[0071] In certain situations, while migrating data from the source database 110 to the
destination database 112, the configuration of data structure at the destination may be different from the configuration at the source. In such situations, the data migration module 216 may create staging tables to temporarily stage the data before migrating it to the destination database. The creation of staging tables and temporary storage of data may include a partitioning approach to speed up full table scan operations, which may further enable proper migration of data from the source to the destination. The data migration module 216 may also remove the staging tables upon completion of the migration process to free the used space and allow reusability of the intermediate storage space, thereby providing efficient data migration.
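The use of a staging table may be illustrated with the following Python sketch based on the standard sqlite3 module. It is a simplified stand-in for the DBMS described herein: the source, staging, and destination tables reside in a single database, and the table and column names (src_patients, stage_patients, dst_patients, first_name, last_name, full_name) are hypothetical.

```python
import sqlite3


def migrate_via_staging(conn):
    """Stage source rows, reshape them for the destination, then drop the stage.

    Returns the number of rows that passed through the staging table.
    """
    cur = conn.cursor()
    # Stage the source data temporarily.
    cur.execute(
        "CREATE TABLE stage_patients AS "
        "SELECT id, first_name, last_name FROM src_patients"
    )
    # The destination model merges two source attributes into one.
    cur.execute(
        "INSERT INTO dst_patients (id, full_name) "
        "SELECT id, first_name || ' ' || last_name FROM stage_patients"
    )
    staged = cur.execute("SELECT COUNT(*) FROM stage_patients").fetchone()[0]
    # Remove the staging table to free the intermediate storage space.
    cur.execute("DROP TABLE stage_patients")
    conn.commit()
    return staged
```

Partitioning of the staging table, mentioned above, is omitted from the sketch because SQLite does not provide table partitioning.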
[0072] In addition, the data migration module 216 may also gather optimizer statistics on
the staged tables for better generation of database access paths and create appropriate indexes on the staged tables. The data migration module 216 may update the statistics of the entity tables after a sizable amount of data has been migrated for the entity so that the data migration module 216 may generate migration plans for the entity tables. In one example, this may be done by the execution of dbms_stats.gather_table_stats in an Oracle™ DBMS. The optimizer statistics of staged tables and indexing of staged tables would be known to a person skilled in the art and hence, the description of the same has been omitted here for the sake of brevity.
[0073] The data migration system 102, according to another implementation of the
present subject matter, may utilize a multi-phased approach of data migration. In such an approach, the historical data, or the data which has not been modified for quite some time, is migrated first. For this purpose, the data migration system 102 may first create a parallel framework for such kind of data and perform the migration from the source to the destination. The data migration system 102 may then migrate the online transaction data which may have been
modified within a time period of 2-3 days and similarly, for the migration of such data as well, a different parallel framework may be prepared. The data migration system 102 may finally migrate recent transaction data which may have been modified within 12-24 hours of data migration. Although it has been described that different parallel frameworks are generated for different kinds of data, it may be understood that a single parallel framework may include data migration rules pertaining to migration of different kinds of data.
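The multi-phased approach may be sketched in Python as shown below. The cut-off values of 24 hours and 3 days are taken as the upper ends of the ranges mentioned above and are assumptions of this sketch, as are the names assign_phase, phase_plan, and PHASE_ORDER.

```python
from datetime import datetime, timedelta

# Phases in the order in which they are migrated.
PHASE_ORDER = ["historical", "online", "recent"]


def assign_phase(last_modified, migration_time):
    """Classify a record by how long before the migration it was last modified."""
    age = migration_time - last_modified
    if age <= timedelta(hours=24):
        return "recent"
    if age <= timedelta(days=3):
        return "online"
    return "historical"


def phase_plan(records, migration_time):
    """Map each phase, in migration order, to the keys of its records.

    records maps a record key to its last-modified timestamp.
    """
    plan = {phase: [] for phase in PHASE_ORDER}
    for key, last_modified in records.items():
        plan[assign_phase(last_modified, migration_time)].append(key)
    return plan
```

Each phase of the resulting plan could then be migrated under its own parallel framework or under a single framework covering all three phases.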
[0074] Fig. 3 illustrates an exemplary method 300 for data migration, in accordance with
an implementation of the present subject matter. According to an aspect, the concepts of data migration are described with reference to the data migration system 102 described above.
[0075] The exemplary method may be described in the general context of computer
executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
[0076] The order in which the methods are described is not intended to be construed as a
limitation, and any number of the described method blocks can be combined in any order to implement the method, or an alternative method. Additionally, individual blocks may be deleted from the methods without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. The method is explained with reference to a data migration system, however, it will be understood that the method 300 can be implemented for a plurality of data migration systems.
[0077] At block 302, at least one functional module associated with a source database is
identified. In one implementation, the data stored in the source database is divided into functional modules based on functional dependencies of the industry of operation. Such functional modules may represent the functional grouping of data in the source database and may
include multiple entities of the source database. It would be understood that multiple entities may include several attributes and the data of the database would be stored in the form of records under such attributes.
[0078] At block 304, priority associated with the one or more functional modules and
each entity of the one or more functional modules is determined. The priority associated with each functional module may be determined based on the functional dependency of the functional modules. The functional dependency of the functional modules may define the order in which the functional modules can be migrated from the source database to a destination database. Similarly, determination of a priority associated with the entities of each functional module is done based again on the order in which the entities can be migrated from the source database to the destination database.
[0079] During determination of priorities, multiple functional modules may also be
associated with equal priority and similarly, multiple entities within a functional module may also be associated with equal priority of migration from the source database to the destination database. Association of an equal priority with more than one functional module or more than one entity may signify that these functional modules or entities can be migrated concurrently from the source database to the destination database. It would be understood that in situations when only one functional module is identified at the source database, the priority is only associated with the entities present in the functional module.
[0080] At block 306, inter-modular dependencies and inter-modular parallelism based on
the determined priority of each entity are created. Since the priorities associated with the functional modules and the entities of such functional modules signify the order in which the functional modules and the entities can be migrated, inter-modular dependencies and inter-modular parallelism are created based on the priorities. The inter-modular dependencies may define the serial order in which entities can be migrated whereas the inter-modular parallelism may define the entities that can be migrated in parallel from the source database to the destination database.
[0081] At block 308, a parallel framework for data migration based in part on the inter-
modular dependencies and inter-modular parallelism is generated. In one implementation of the
present subject matter, the parallel framework includes data migration rules that define the migration of data from the source database to the destination database. The data migration rules may include the schedule of migration of different entities, the schedule of migration of different functional modules, the scripting language to enable the data migration, the schedule of time instances for log creation, etc. Such parallel framework for migration allows concurrent migration of data that reduces the time required for data migration from the source to the destination without any conflict of functional dependencies. Therefore, such a parallel framework allows for optimized data migration from the source to the destination.
[0082] At block 310, a threshold number of shell scripts to be initiated in parallel for data
migration enabled by the parallel framework is identified. In one implementation, the parallel migration of data enabled by the parallel framework is validated by a data migration system, such as the data migration system 102. The data migration system 102 initiates parallel processes to migrate data from the source database to the destination database based on the number of processors available to the data migration system 102. In another implementation, the data migration system 102 may query a user to define the threshold of processors to be utilized for the purpose of migration and, based on the input, the data migration system 102 may identify the threshold number.
[0083] It would be understood that in situations where the number of parallel migrations to
be done exceeds the threshold number, the data migration module 216 of the data migration system 102 may choose among the parallel migrations and select up to a maximum of the threshold number of migrations based on any arbitration logic known in the art.
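One simple arbitration logic is sketched below in Python purely as an example; the specification leaves the choice of arbitration logic open. The sketch assumes that each pending migration is described by a (priority, name) pair in which a lower priority number is served first and ties are broken alphabetically by name; the function name select_for_launch is hypothetical.

```python
import heapq


def select_for_launch(ready, threshold):
    """Pick at most `threshold` migrations from the ready list.

    ready is a list of (priority, name) pairs. Returns (selected, deferred),
    where deferred keeps the original order of the remaining entries.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    selected = heapq.nsmallest(threshold, ready)
    chosen = set(selected)
    deferred = [item for item in ready if item not in chosen]
    return selected, deferred
```

The deferred migrations remain eligible and may be picked up in the next iteration once running migrations have completed.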
[0084] Although embodiments for methods and systems for data migration have been
described in a language specific to structural features and/or methods, it is to be understood that the invention is not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as exemplary embodiments for data migration.
I/We claim:
1. A method for data migration, the method comprising:
identifying at least one functional module associated with a source database for migration, wherein the at least one functional module represents a functional grouping of data in the source database;
creating inter-modular dependencies and inter-modular parallelism based on a priority associated with each entity of the at least one functional module, wherein the priority is indicative of the order of migration of each entity from the source database to a destination database; and
generating a parallel framework for the data migration from the source database to the destination database based in part on the inter-modular dependencies and the inter-modular parallelism, wherein the parallel framework includes one or more data migration rules.
2. The method as claimed in claim 1, wherein the determining further comprises associating a priority with each of the at least one functional module, and wherein the parallel framework generation is based in part on the priority associated with each of the at least one functional module.
3. The method as claimed in claim 2, wherein the priority associated with each of the at least one functional module is determined based on functional dependency between the each functional module.
4. The method as claimed in claim 1, wherein the rules of data migration comprise one or more of a schedule of migration of the entities, a schedule of migration of functional modules, a schedule of time instances for log creation, and a scripting language to enable data migration.
5. The method as claimed in claim 1, wherein the generating the parallel framework further comprises identifying a threshold number of scripts to be initiated in parallel for the data migration based on available processors to a data migration system.
6. The method as claimed in claim 1, wherein the parallel framework is executed utilizing a scripting language.
7. A data migration system (102) comprising:
a processor (204); and
a memory (206) coupled to the processor (204), wherein the memory (206) comprises:
a functional groups determination module (212) configured to identify at least one functional module associated with a source database for migration, wherein the at least one functional module represents a functional grouping of data in the source database;
a priority determination module (214) configured to determine a priority associated with each entity from amongst one or more entities of the at least one functional module to create inter-modular dependencies and inter-modular parallelism, wherein the priority is indicative of the order of migration of each entity from the source database to a destination database; and
a framework generation module (114) configured to generate a parallel framework for the data migration from the source database to the destination database based in part on the inter-modular dependencies and the inter-modular parallelism, wherein the parallel framework includes one or more data migration rules.
8. The data migration system (102) as claimed in claim 7, wherein the priority determination module (214) is further configured to determine priority associated with each of the at least one functional module based on functional dependency between the at least one functional module.
9. The data migration system (102) as claimed in claim 7, wherein the data migration system (102) further comprises a data migration module (216) configured to identify a threshold number of scripts to be initiated in parallel for the data migration based on available processors to a data migration system.
10. The data migration system (102) as claimed in claim 7, wherein the priority determination module (214) is configured to associate priority with each entity from amongst one or more entities of the at least one functional module based on functional dependency between the each entity.
11. The data migration system (102) as claimed in claim 8, wherein the priority determination module (214) is configured to associate priority with each of the at least one functional module based on functional dependency between the each functional module.
12. The data migration system (102) as claimed in claim 8, wherein the data migration module (216) is further configured to execute the parallel framework utilizing a scripting language.
13. The data migration system (102) as claimed in claim 7, wherein the rules of data migration comprise one or more of a schedule of migration of the entities, a schedule of migration of functional modules, a schedule of time instances for log creation, and a scripting language to enable data migration.
14. A computer-readable medium having computer-executable instructions that when executed perform acts comprising:
identifying at least one functional module associated with a source database for migration, wherein the at least one functional module represents a functional grouping of data in the source database;
creating inter-modular dependencies and inter-modular parallelism based on a priority associated with each entity from amongst one or more entities of the at least one functional module, wherein the priority is indicative of the order of migration of each entity from the source database to a destination database; and
generating a parallel framework for the data migration from the source database to the destination database based in part on the inter-modular dependencies and the inter-modular parallelism, wherein the parallel framework includes one or more data migration rules.
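By way of illustration only, the priority-driven ordering and the processor-based threshold recited above may be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names `build_waves`, `parallel_threshold` and `run_migration`, the `per_cpu` parameter, the entity names, and the convention that a lower priority number migrates earlier are all hypothetical and do not appear in the specification. Entities sharing a priority are treated as independent and are run in parallel; a group is started only after the preceding group has finished.

```python
import os
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_waves(entity_priority):
    """Group entities by priority (assumption: lower number migrates first)."""
    waves = defaultdict(list)
    for entity, priority in entity_priority.items():
        waves[priority].append(entity)
    # One list per priority level, in ascending priority order.
    return [sorted(waves[p]) for p in sorted(waves)]


def parallel_threshold(cpu_count=None, per_cpu=1):
    """Upper bound on scripts started in parallel, derived from processors.

    per_cpu is a hypothetical tuning knob; the result is never below 1.
    """
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, cpus * per_cpu)


def run_migration(entity_priority, migrate_entity, cpu_count=None):
    """Run each priority group in order; entities within a group in parallel."""
    results = []
    limit = parallel_threshold(cpu_count)
    with ThreadPoolExecutor(max_workers=limit) as pool:
        for wave in build_waves(entity_priority):
            # Consuming pool.map with list() blocks until the whole group
            # has finished, so the next group never starts early.
            results.append(list(pool.map(migrate_entity, wave)))
    return results


if __name__ == "__main__":
    # Hypothetical entities from two functional modules.
    priorities = {
        "customer.account": 1,
        "customer.address": 2,
        "billing.invoice": 2,
        "billing.payment": 3,
    }
    print(build_waves(priorities))
```

In this sketch the callable passed as `migrate_entity` stands in for whatever migration script is invoked per entity; a thread pool is used here only because such scripts typically wait on the database rather than on the processor.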
| # | Name | Date |
|---|---|---|
| 1 | 2809-MUM-2011-FORM 18(12-10-2011).pdf | 2011-10-12 |
| 1 | 2809-MUM-2011-US(14)-HearingNotice-(HearingDate-12-04-2021).pdf | 2021-10-03 |
| 2 | 2809-MUM-2011-CORRESPONDENCE(12-10-2011).pdf | 2011-10-12 |
| 2 | 2809-MUM-2011-Written submissions and relevant documents [27-04-2021(online)].pdf | 2021-04-27 |
| 3 | 2809-MUM-2011-FORM 1(23-11-2011).pdf | 2011-11-23 |
| 3 | 2809-MUM-2011-Correspondence to notify the Controller [09-04-2021(online)].pdf | 2021-04-09 |
| 4 | 2809-MUM-2011-CORRESPONDENCE(23-11-2011).pdf | 2011-11-23 |
| 4 | 2809-MUM-2011-CLAIMS [19-09-2018(online)].pdf | 2018-09-19 |
| 5 | 2809-MUM-2011-POWER OF ATTORNEY(25-11-2011).pdf | 2011-11-25 |
| 5 | 2809-MUM-2011-COMPLETE SPECIFICATION [19-09-2018(online)].pdf | 2018-09-19 |
| 6 | 2809-MUM-2011-DRAWING [19-09-2018(online)].pdf | 2018-09-19 |
| 6 | 2809-MUM-2011-CORRESPONDENCE(25-11-2011).pdf | 2011-11-25 |
| 7 | Form-3.pdf | 2018-08-10 |
| 7 | 2809-MUM-2011-FER_SER_REPLY [19-09-2018(online)].pdf | 2018-09-19 |
| 8 | Form-1.pdf | 2018-08-10 |
| 8 | 2809-MUM-2011-OTHERS [19-09-2018(online)].pdf | 2018-09-19 |
| 9 | 2809-MUM-2011-FER.pdf | 2018-08-10 |
| 9 | Drawings.pdf | 2018-08-10 |
| 10 | ABSTRACT1.jpg | 2018-08-10 |
| 1 | 2809mum2011_18-01-2018.pdf |