Abstract: In one example, a data migration system for migrating data to and from a Hadoop system comprises a connection manager to receive a request, from at least one user, for generating a connector, wherein the connector, when executed, establishes a connection with at least one of the Hadoop system and one or more data repositories, and to generate the connector based on the request. The data migration system further comprises a data migration module to establish a connection with the Hadoop system and the one or more data repositories using the generated connector and migrate the data to the Hadoop system. The data migration system includes a data validation module to generate a data validation file, indicative of validation parameters of the data to be migrated, and to validate the migrated data, based on the data validation file, after the migration of data to the Hadoop system is complete.
FORM 2
THE PATENTS ACT, 1970 (39 of 1970) & THE PATENTS RULES, 2003
COMPLETE SPECIFICATION (See section 10, rule 13) 1. Title of the invention: MIGRATING DATA TO AND FROM HADOOP SYSTEMS
2. Applicant(s)
NAME NATIONALITY ADDRESS
TATA CONSULTANCY Indian Nirmal Building, 9th Floor, Nariman
SERVICES LIMITED Point, Mumbai, Maharashtra 400021,
India
3. Preamble to the description
COMPLETE SPECIFICATION
The following specification particularly describes the invention and the manner in which it
is to be performed.
TECHNICAL FIELD
[0001] The present subject matter relates, in general, to data migration and, in
particular, to migration of data to Hadoop systems from various data repositories and from the Hadoop systems to the various data repositories.
BACKGROUND
[0002] The growth of data storage technologies has seen the development of
many heterogeneous data repositories which are used to store data and which act as
sources of data. Examples of the various types of data repositories, which vary in terms of
format and syntax in which the data is stored, include relational databases, file-systems,
mainframe systems, data warehouses, enterprise data management systems, and so on.
Nowadays, based on the type and volume of data being stored, a single enterprise stores
data in multiple types of data repositories. For example, unstructured data, such as
multimedia data and logs, may be stored in the form of file systems, and structured data, such
as records of employees, may be stored in the form of relational databases.
[0003] With time, the volume of data stored by the enterprise increases
exponentially. The enterprise usually exploits developments in networking technology to
store data in various types of data repositories which may be located in different
geographic regions and which are connected over a network to facilitate retrieval and
analysis of data as and when necessary. However, the speeds at which the conventional
types of data repositories process and analyze data have not been able to match up with the
ever increasing rate at which data is generated and stored by the enterprise. Hence, the
processing and the analyzing of large volumes of data have emerged as a challenge for the
enterprise irrespective of its size and the industry to which it pertains.
[0004] Recently, a data analytics framework, called Hadoop system, has been
widely adopted, for processing large volumes of data, due to its capability of handling large sets of structured data as well as unstructured data. The Hadoop system is an open source framework for distributed processing of massive data sets using a cluster of a plurality of computing nodes. Processing data on the Hadoop system involves transferring or migrating data from the various data repositories to the Hadoop system and transferring or migrating the processed data from the Hadoop system to the various data repositories so that the processed data may be used by various applications. Since each type of data repository has its own unique format and syntax for storing data, for writing queries to extract
data from the data repository and for executing the queries, data migration to and from the Hadoop system is very challenging.
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] The detailed description is described with reference to the accompanying
figure(s). In the figure(s), the left-most digit(s) of a reference number identifies the figure
in which the reference number first appears. The same numbers are used throughout the
figure(s) to reference like features and components. Some embodiments of systems and/or
methods in accordance with embodiments of the present subject matter are now described,
by way of example, and with reference to the accompanying figure(s), in which:
[0006] Figure 1 illustrates a network environment implementing the data migration
system for migrating data to and from Hadoop systems, according to an example of the present subject matter;
[0007] Figure 2 illustrates a method for migrating data from a data repository to a
Hadoop system, according to an example of the present subject matter; and
[0008] Figure 3 illustrates a method for providing incremental updates of data
from a data repository to a Hadoop system, according to an example of the present subject matter.
DETAILED DESCRIPTION
[0009] In order to utilize the faster processing and analyzing capabilities of
the Hadoop system, an enterprise has to migrate data stored in various types of data repositories to the Hadoop system. After the data has been processed by the Hadoop system, the enterprise has to migrate the processed data back to the data repositories. The conventional technique of migrating data from a data repository to Hadoop involves multiple steps. For example, to migrate data from a conventional data repository which follows the protocol of a relational database, a first set of commands or a first tool, compatible with the data repository, is executed to export the data from the data repository to a file (also referred to as an export operation). A second tool or a second set of commands, compatible with both the data repository and the Hadoop system, is then executed to convert the format of the file into a Hadoop compatible format. Thereafter, the Hadoop compatible format file is uploaded to the Hadoop system.
[0010] Thus, the conventional techniques of migrating data to the Hadoop system
from the data repository involve exporting the data from the data repository and creating a
dump of a portion of the data repository as a file. This exporting process requires a
significant amount of temporary storage space for storing the exported file. Also, the
conventional exporting process is inherently slow as the exporting process involves a lot
of read/write operations on the data repository. Further, the storage devices, such as hard
disks and magnetic tapes, which are used to store the data repositories, have limited input-
output (I/O) speeds which further reduce the speed of data migration.
[0011] While transferring the processed data from the Hadoop system to the data
repositories, the aforementioned steps are performed in reverse order, i.e., first the processed data is exported from the Hadoop system to a file. Then the file is processed to convert the file into a format which is compatible with the data repository. Thereafter, the file, in the compatible file format, is uploaded to the data repository (also referred to as an import operation). Thus, the process of migrating data to and from the Hadoop system is time consuming and involves a lot of processing power which is used in converting file formats and executing read/write operations.
[0012] Further, the aforementioned steps are specific to a data repository and
cannot be implemented “as-is” for other data repositories, even if the data repositories are
of the same type. For example, MySQL® (My Structured Query Language) and Oracle®
databases are both relational databases. However, the commands used to perform
import/export operations in MySQL® databases differ from the commands used to
perform import/export operations in Oracle® databases. Hence, the tools or set of
commands developed for migrating data from MySQL® databases to the Hadoop system
cannot be used for migrating data from Oracle® databases to the Hadoop system.
[0013] Hence, an enterprise has to invest in developing or purchasing multiple sets
of tools, for data migration to and from the Hadoop system, wherein each set of tools caters to data migration to and from a particular implementation of a particular type of data repository. This leads to very high costs. Further, performing data migration to and from the Hadoop system involves a user having to learn how to operate multiple sets of tools. Since the usage of the conventional tools for migration of data from the data repositories to the Hadoop system is complicated, there are greater chances of errors occurring in the data migration process. Further, the involvement of users who have a high level of technical expertise and the usage of multiple sets of tools for the data migration increase the costs associated with the data migration.
[0014] Further, the aforementioned steps of data migration involve engaging a
user who is technically familiar with both the Hadoop system and the data repository, so that the user can write and execute the set of commands or tools required for the export operation, the file format conversion and the import operation. Hence, the process of migrating data to and from the Hadoop system requires specialized manpower with special technical expertise which is available at a premium and which further increases the costs involved in data migration between the Hadoop system and the various types of data repositories.
[0015] Moreover, the conventional sets of tools, for data migration to and from the
Hadoop system, generally include a command line interface and lack a graphical user interface. In the command line interface, the user has to provide commands as lines of text at a visual prompt or a cursor and execute the commands. To use the command line interface, the user has to know the syntax of the commands and the various features associated with them. This increases the level of expertise required of the user who may use the conventional tools for data migration to the Hadoop system. Also, the time taken in data migration increases as the user may need to debug the syntax of the commands before execution or in case an error interrupts the execution. This makes using the conventional sets of tools very difficult.
[0016] Moreover, the conventional sets of tools do not have any mechanism for
notifying the user of success or any failures/errors in data migration to and from the Hadoop system. Further, the conventional sets of tools do not perform data validation to ascertain error-free data migration to and from the Hadoop system. Thus, there is always a risk of undetected errors in data migration which may force the enterprise to repeat the data migration process from the start.
[0017] Further, most of the conventional tools do not verify whether the user who
has initiated or is performing the data migration has the requisite permissions to do so. If the user, who is performing the data migration, is an unauthorized user, then there are security concerns from the perspective of the enterprise. For example, data which is vital for the enterprise may be stolen or may be corrupted. Thus, the conventional tools have security concerns associated with them.
[0018] Moreover, even if the enterprise invests in multiple sets of tools to migrate
data to the Hadoop system, managing the multiple sets of tools is challenging. The multiple sets of tools have to be regularly updated, troubleshooting operations may have
to be performed, and so on. This makes the management and running of the multiple sets of tools difficult.
[0019] In many cases, the conventional sets of tools are closely associated with the
applications running on the data repositories or on the Hadoop system. Thus, any updates
or changes made to either the conventional sets of tools or the applications affect the other.
Moreover, the conventional sets of tools cannot be reused for other applications “as-is”.
Hence, the conventional sets of tools are difficult to use and maintain.
[0020] Further, in many cases, over time, certain new records may be created in
the data repository or certain existing records, in the data repository, may be modified. In order to synchronize the changes made in the data repository, the conventional sets of tools are executed as a scheduled job, i.e., executed at regular intervals of time, for example, on a particular day every week. The conventional sets of tools usually export the whole data repository into the file, process the file and then migrate the processed file onto the Hadoop system. This leads to high usage of storage as well as bandwidth of the network.
[0021] The present subject matter describes systems and methods for migration of
data to Hadoop systems from various data repositories and from the Hadoop systems to
the various data repositories. In one implementation, a data migration system is
implemented for data migration from various types of data repositories to the Hadoop
system and vice-versa. The data migration system acts as a single tool for data migration
to the Hadoop system from various types of data repositories, such as file systems,
relational database management system (RDBMS), mainframe systems, enterprise
databases and enterprise message services, and vice-versa. Thus, the data migration
system eliminates the requirement of having multiple tools for data migration to and from
the various types of data repositories. This leads to lower costs of operation. The data
migration system includes a web based graphical user interface (GUI) which facilitates
users to easily migrate data to Hadoop systems from various types of data repositories and
vice-versa. Thus, operating the data migration system does not involve having a user with
special technical expertise. This lowers the costs of data migration operations.
[0022] The data migration system also supports various encryption techniques
which help in transferring data securely. Further, the data migration system supports the creation of various groups of users which facilitates implementing restrictions on the rights of the users to create connections with the various data repositories and the Hadoop system in order to prevent data migration operations from being performed by
unauthorized users. In one example, the data migration system implements various techniques of data validation to ensure that the data migration has been completed without any error. The data migration system also facilitates resuming any data migration operation which may have stopped while in progress due to various reasons, such as loss of connectivity with the data repositories and/or the Hadoop system. This eliminates having to restart a data migration operation and facilitates faster data migration to and from the Hadoop System. The data migration system also separates the data migration components from the applications running on the Hadoop system or on the data repositories. This facilitates addition or deletion or modification of the data migration components without affecting the operations or the functionalities of the applications. The data migration system also reduces the number of read/write operations on the data repositories as well as reduces the number of I/O operations to be performed on the storage devices. The data migration system also does not involve temporarily storing the huge export file of the data repositories and hence, eliminates having a large storage space for data migration operations.
[0023] In one implementation, the data migration system may be implemented as
various computing systems, such as servers, workstations and personal computers. The data migration system may also be implemented over other computing environments, such as distributed computing environment, wherein the components of the data migration system are distributed over multiple computing devices which are connected through a communication network.
[0024] In operation, an administrator of the data migration system defines various
groups, such as administrators, business users and operators, of users of the enterprise who have different privileges and rights. In one example, the administrators may be the persons who are in charge of the information technology (IT) infrastructure of the enterprise and the operators may be the persons who perform the day-to-day operations for maintaining the IT infrastructure and for providing various IT based facilities, such as networking. Similarly, business users may be understood as stakeholders of the enterprise who are responsible for the working of a business unit of the enterprise. For example, the administrators may have the right to create connectors to establish connection between the data repositories and the data migration system and create connectors to establish connection between the Hadoop system and the data migration system. In one implementation, the operators may have the permissions to use the pre-existing connectors created by the administrators to migrate data but may not have privileges to create new
connectors. The business users may have permissions to access and migrate data which pertains to their business unit and may not have privileges to access and migrate data which are of other business units. This facilitates in preventing unauthorized access and migration of data.
[0025] In one implementation, the administrator uses a GUI of the data migration
system to select the type of connector which is to be created. For example, the connector may be for establishing connection with respect to a relational database management system (RDBMS), a file system and so on. Thereafter, the administrator provides the various authentication details, such as username, password, internet protocol (IP) address of the server hosting the data repository and the port number on which the data repository is operating, of the data repository for creation of the connector. The data migration system then verifies whether a connection can be established with the data repository using the authentication details provided by the administrator. On successful establishment of connection, the data migration system provides the option of storing the connector for future use. If the data migration system is unable to connect to the data repository, then the data migration system may generate an error message for the administrator and provide the option of re-entering the authentication details to the administrator. The data migration system also enables the administrator to delete or modify the details associated with any existing connectors.
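By way of a non-limiting illustration only, the following Java sketch shows one possible representation of such a connector for a JDBC-accessible relational database, together with the verification step described above. The class name, field names, and the MySQL® JDBC URL format are assumptions made purely for illustration and do not limit the described system.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class RdbmsConnector {
    private final String host;     // IP address of the server hosting the data repository
    private final int port;        // port number on which the data repository is operating
    private final String database; // name of the database to connect to
    private final String username; // username to access the data repository
    private final String password; // password associated with the username

    public RdbmsConnector(String host, int port, String database,
                          String username, String password) {
        this.host = host;
        this.port = port;
        this.database = database;
        this.username = username;
        this.password = password;
    }

    // Verifies that a connection can be established with the provided
    // authentication details before the connector is stored for future use.
    public boolean verify() {
        String url = "jdbc:mysql://" + host + ":" + port + "/" + database;
        try (Connection connection = DriverManager.getConnection(url, username, password)) {
            return connection.isValid(5); // five-second validation timeout
        } catch (SQLException e) {
            // Connection failed; the caller may prompt for re-entry of the details.
            return false;
        }
    }
}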
[0026] Once the connectors have been defined, the users of the user groups which
have the requisite permissions may perform data migration operations. For the sake of brevity of description, the data migration operation is explained in the context of a particular type of data repository, namely a RDBMS. However, the same is not to be construed as a limitation, as the described techniques may be applied to other types of data repositories, such as file systems and mainframes, with minor modifications, as will be understood by those skilled in the art.
[0027] In one example, the operator with requisite permission may wish to migrate
data from the RDBMS to the Hadoop system. In operation, the operator may select the requisite connector to establish connection with the selected RDBMS which acts as a source and the requisite storage component of the Hadoop system which acts as the target. For example, the storage component may be HBase, which is a non-relational, distributed database, or Hadoop Distributed File System (HDFS) or Hive table of the Hadoop System. In one example, the operator has to select the requisite connector to establish the
connection of the data migration system with the requisite storage component of the Hadoop System.
[0028] Thereafter, the operator may perform various operations to select the data
which is to be migrated. For example, the operator may choose one or more tables of the RDBMS, choose one or more columns of the tables, and apply filter conditions on the chosen tables and/or columns to select the data which is to be migrated. In another example, the operator may select a view of the RDBMS as the data which is to be migrated. The view may be understood as a resultant table generated as a result of execution of one or more queries on the RDBMS.
[0029] On completion of the selection of the data which is to be migrated, the data
migration system may create a data validation file which may be used to validate the data post-migration. In one example, the data validation file may include various validation parameters, such as the number of records to be migrated, number of files to be migrated, volume of data to be transferred, and number of records which do not comply with the schema of the Hadoop system. In another example, the validation file may include predefined rules based on which migrated data may be validated.
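As a non-limiting illustration, the validation parameters enumerated above might be captured and persisted as sketched below in Java; the class name, its field names, and the simple key=value file format are hypothetical and chosen only for illustration.

import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DataValidationFile {
    long recordsToMigrate;    // number of records selected for migration
    long filesToMigrate;      // number of files selected for migration
    long bytesToTransfer;     // volume of data to be transferred
    long nonCompliantRecords; // records not complying with the target schema

    // Persists the validation parameters so that the migrated data can be
    // validated against them once the migration is complete.
    void write(String path) throws IOException {
        try (Writer writer = Files.newBufferedWriter(Paths.get(path))) {
            writer.write("recordsToMigrate=" + recordsToMigrate + "\n");
            writer.write("filesToMigrate=" + filesToMigrate + "\n");
            writer.write("bytesToTransfer=" + bytesToTransfer + "\n");
            writer.write("nonCompliantRecords=" + nonCompliantRecords + "\n");
        }
    }
}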
[0030] Thereafter, the data migration system initiates migration of data from the
RDBMS to the Hadoop system. In one example, the data migration system may transfer the data in batches. After the data has been transferred, the data migration system may validate the migrated data based on the data validation file. The data migration system may also generate notifications indicating successful data migration or errors in data migration based on the validation.
[0031] In one example, the operator may define the aforementioned data transfer
from the RDBMS to the Hadoop system as a scheduled job so that the Hadoop system is updated with the changes in the RDBMS. In such cases, the data migration system facilitates carrying out incremental transfer of data, which has been added or been modified since the last transfer, to the Hadoop system, which eliminates having to transfer the entire data. This facilitates faster data migration. In order to implement incremental data transfer, the data migration system may maintain a timestamp on the records of the Hadoop system. The timestamp may be understood as a sequence of characters or encoded information which identifies the time of occurrence of an event or an operation, such as creation of a record or modification of an existing record. At the time of transfer, the data migration system identifies those records in the RDBMS which were created or modified at a later time than indicated in the timestamp and selects those records for migration. Having
a timestamp also enables the data migration system to resume data migration from the RDBMS to the Hadoop system in case of any break in the data migration due to various reasons, such as an interruption in the connectivity between the RDBMS and the data migration system. In this case, the data migration system checks the timestamp of the record which has last been successfully migrated and resumes the data migration based on the same. This saves the data migration system from restarting the data migration process and saves time and network bandwidth.
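The timestamp-based incremental selection described above may be sketched, purely by way of illustration, as follows; the table name employee_records and the column last_modified are hypothetical and would vary with the source RDBMS.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

public class IncrementalSelector {
    // Selects only those records created or modified after the timestamp of
    // the last successful transfer, so the entire table need not be migrated.
    // The caller is responsible for closing the returned ResultSet.
    public ResultSet selectChangedRecords(Connection connection, Timestamp lastTransfer)
            throws SQLException {
        PreparedStatement statement = connection.prepareStatement(
                "SELECT * FROM employee_records WHERE last_modified > ?");
        statement.setTimestamp(1, lastTransfer);
        return statement.executeQuery();
    }
}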
[0033] In one implementation, the data migration system may also provide data
transformation capabilities to the data transferred to the Hadoop system. That is, the data migration system may allow transformation of data along with data migration. The transformation may include basic transformations, such as arithmetic expressions, string operations, and data operations based on pre-defined rules, and advanced transformations, such as database joins and HDFS joins, look-ups, and data filtering.
[0034] The manner in which the systems and methods for migration of data to and
from the Hadoop systems may be implemented has been explained in detail with respect to Figures 1-3. While aspects of the described systems and methods can be implemented in any number of different computing systems, transmission environments, and/or configurations, the embodiments are described in the context of the following exemplary system(s).
[0035] Figure 1 illustrates a network environment 100 implementing the
systems and methods for migration of data to a Hadoop system 102 from various data repositories 104, such as relational databases 104-1, mainframes 104-2, and file systems 104-N, and from the Hadoop system 102 to the various data repositories 104, according to an embodiment of the present subject matter. In one implementation, the Hadoop system 102 may comprise various storage components, such as a Hadoop Distributed File System (HDFS) 106, HBase tables 108, tables of Hive 110, and other components 112. The other components 112 may include the various processing modules of the Hadoop system 102, such as a mapping module (not shown in figure) and a reducing module (not shown in
figure) to implement parallel, distributed processing techniques to process and analyze large volumes of data. Further, since the mapping module and the reducing module may be implemented as mappers and reducers of MapReduce framework of the Hadoop system 102, the mapping module and the reducing module can be understood as nodes of the MapReduce framework implemented in the Hadoop system 102.
[0036] In one implementation, the network environment 100 includes a data
migration system 114 to implement data migration between the Hadoop System 102 and the data repositories 104. In one example, the data migration system 114, the Hadoop System 102, and the data repositories 104 may be communicatively connected through a network 116. Further, the network environment 100 may also include one or more client devices 118-1, 118-2, …, 118-N, individually and commonly referred to as client device(s) 118 hereinafter, connected to the data migration system 114 through the network 116.
[0037] The network 116 may be wireless networks, wired networks, or a
combination thereof. The network 116 can be a combination of individual networks,
interconnected with each other and functioning as a single large network, for example, the
Internet or an intranet. The network 116 may be any public or private network, including a
local area network (LAN), a wide area network (WAN), the Internet, an intranet, a peer to
peer network, and a virtual private network (VPN) and may include a variety of network
devices, such as routers, bridges, servers, computing devices, storage devices, etc.
[0038] The client devices 118 may include computing devices, such as a laptop
computer, a desktop computer, a notebook, a workstation, a mainframe computer, a mobile phone, and a personal digital assistant. It would be understood that the client devices 118 are used by various categories of users, such as administrators, operators and business users, to configure and operate the data migration system 114.
[0039] According to an example of the present subject matter, the data migration
system 114 may include a processor 120 and memory 122 coupled to the processor 120.
The processor 120 may be implemented as one or more microprocessors,
microcomputers, microcontrollers, digital signal processors, central processing units, state
machines, logic circuitries, and/or any devices that manipulate signals based on
operational instructions. Among other capabilities, the processor 120 may be configured
to fetch and execute computer-readable instructions stored in the memory 122.
[0040] The memory 122 may include any non-transitory computer-readable
medium known in the art including, for example, volatile memory, such as static random
access memory (SRAM), and dynamic random access memory (DRAM), and/or nonvolatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
[0041] The data migration system 114 may further include module(s) 124
and application data 126. The modules 124 may be communicatively coupled to the processor 120. The modules 124 may include routines, programs, objects, components, data structures, and the like, which perform particular tasks or implement particular abstract data types. The modules 124 may further include modules that supplement applications on the data migration system 114.
[0042] The application data 126 serves, amongst other things, as a repository for
storing data that may be fetched, processed, received, or generated by one or more of the modules 124, respectively. Although the application data 126 is shown internal to the depicted data migration system 114, it may be understood that it can reside in an external repository as well (not shown in figure).
[0043] In one example, the module(s) 124 may include a user management module
128, a connection manager 130, a notification generation module 132, a report generation module 134, a data validation module 136, a job scheduling module 138, a data security module 140, a data migration module 142, and other modules 144. The other module(s) 144 may include programs or coded instructions that supplement applications or functions performed by the data migration system 114. In one implementation, the data migration module 142 includes a file system data migration module 146, a RDBMS data migration module 148, a mainframe data migration module 150, and other data migration modules 152.
[0044] In one implementation, the data migration module 142 is configured to
migrate data from the data repositories 104 to the Hadoop System 102. The sub-modules of the data migration module 142 cater to various types of data repositories 104. For example, the file system data migration module 146 is configured to migrate data to the Hadoop system 102 from the file system 104-N and vice-versa. Similarly, the RDBMS data migration module 148 is configured to migrate data to the Hadoop system 102 from the relational databases 104-1 and vice-versa, and the mainframe data migration module 150 is configured to migrate data to the Hadoop system 102 from the mainframes 104-2 and vice-versa. The other data migration modules 152 may be configured to migrate data to the Hadoop system 102 from the other miscellaneous categories of data repositories, such as enterprise messaging services, and vice-versa.
[0045] In operation, the administrator of the data migration system 114 may use
the user management module 128 to define various groups, such as administrators, business users and operators, of users who have different privileges and rights of configuring and operating the data migration system 114. For example, the administrators may have the permissions to edit any configuration parameter, such as passwords, add or delete users, add or delete or modify user groups, of the data migration system 114. Similarly, the operators may have permissions to merely use the data migration system 114 as configured by the administrators and may have limited permissions to modify a subset of the configuration parameters of the data migration system 114. This facilitates in preventing unauthorized access and migration of data by users who do not have the requisite permissions.
[0046] Thereafter, the administrator may use the connection manager 130 to create
a connector to establish connection between one or more of the data repositories 104 and the data migration system 114. The administrator may also create connectors to establish connection between the storage components of the Hadoop system 102 and the data migration system 114. The connector may be understood as a driver or an underlying script which when executed establishes connection between the data migration system 114 and the Hadoop system 102 or between the data migration system 114 and the data repositories 104. The connectors may be reused whenever any operation requested by any of the users involves establishing a connection between the data migration system 114 and the Hadoop system 102 or between the data migration system 114 and the data repositories 104. After establishment of the connection, data transfer may be done between the data migration system 114 and the Hadoop system 102 or between the data migration system 114 and the data repositories 104.
[0047] In one implementation, the connection manager 130 may generate various
graphical user interfaces (GUIs) to facilitate the administrator to select the type of connector which is to be created. For example, the connector may be for establishing connection with respect to a RDBMS, a file system, and so on. Thereafter, the connection manager 130 prompts the administrator to provide the various authentication details to connect to the data repository 104. For example, to connect to the relational database 104-1, the connection manager 130 may prompt the administrator to provide authentication details, such as username, password, internet protocol (IP) address of the server hosting the relational database 104-1 and the port number on which the relational database 104-1 is operating, for creation of the connector. The connection manager 130 then verifies
whether a connection can be established with the data repository 104 using the authentication details provided by the administrator. If the connection manager 130 is successful in establishing a connection with the data repository 104, then the connection manager 130 provides the administrator with the option of storing the connector for future use. If the connection manager 130 fails to connect with the data repository 104, then the connection manager 130 may generate an error message for the administrator and provide the administrator with the option of re-entering the authentication details. The connection manager 130 also provides the administrator with the option to delete or modify the details associated with any existing connectors. In a similar manner, as described above, the connection manager 130 may facilitate the administrator in establishing connectors for each of the data repositories 104 and each of the storage components of the Hadoop System 102.
[0048] After the connectors have been established, the users of the user groups,
such as an operator, who have the requisite permissions may perform data migration operations. For the sake of brevity of description, the data migration operation is explained in the context of a particular type of data repository, namely, migration from the file system 104-N to the Hive 110 of the Hadoop system 102. However, the same is not to be construed as a limitation, as the described techniques may be applied to other types of data repositories, such as relational databases and mainframes, with minor modifications, as will be understood by those skilled in the art.
[0049] In operation, the operator selects the requisite connectors for establishing
connection with the file system 104-N and the Hive 110. In one example, the file system data migration module 146 provides the operator with an option to select the requisite connectors from the list of connectors which have been pre-defined using the connection manager 130. Thereafter, the file system data migration module 146 facilitates the operator to provide the path or the address of the file in the file system 104-N which is to be migrated to the Hive 110.
[0050] Further, the file system data migration module 146 prompts the operator to
enter the delimiter which has been used in the file. The delimiter may be understood as one or more character(s) used to specify the boundary between separate, independent fields whenever data is stored in plaintext format. Examples of popular file formats which use delimiters are the comma-separated values (CSV) file, in which the fields of data are separated using commas (,) as a delimiter, and the tab-separated values (TSV) file, in which the fields of the data are separated using tabs as a delimiter.
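Purely by way of illustration, splitting a record of such a delimited file on the operator-supplied delimiter may be sketched in Java as follows; quoting and escaping within fields are deliberately ignored in this minimal sketch.

import java.util.regex.Pattern;

public class DelimitedRecordParser {
    private final Pattern delimiter;

    public DelimitedRecordParser(String delimiter) {
        // Pattern.quote treats characters such as '|' literally, not as regex.
        this.delimiter = Pattern.compile(Pattern.quote(delimiter));
    }

    // Splits one line of a CSV- or TSV-style file into its independent fields.
    public String[] parse(String line) {
        return delimiter.split(line, -1); // -1 keeps trailing empty fields
    }
}

For instance, parsing the line "1,Smith," with a comma delimiter yields the three fields "1", "Smith", and an empty field.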
[0051] The file system data migration module 146 may also prompt the operator to
provide the metadata of the file. In one example, the operator may provide the metadata as a CSV file, whereas in another example, the operator may manually provide the metadata using the GUIs generated by the file system data migration module 146. In another implementation of the present subject matter, the data migration system 114 may include the metadata associated with the data in the data repositories 104. For example, for the relational database 104-1, the data migration system 114 may store metadata such as, but not limited to, table names, columns, column data types, referential integrities, and constraints. Similarly, for the file systems 104-N, metadata like file name, field delimiters, column names, and data types may be stored. This will also be helpful in creating or modeling the Hive schema once the data is migrated to the Hadoop ecosystem. Therefore, in situations where the data from the data repositories 104 is transferred to the Hive 110 and a similar data structure is to be created, the metadata stored with the data migration system 114 can be utilized. Further, the metadata information may also be retained for future use. For example, if the source data repository 104 is decommissioned after migration of data onto the Hadoop system 102 and is to be modeled in the Hadoop system 102, the metadata stored with the data migration system 114 may be utilized.
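As a non-limiting illustration of how the stored metadata may be used to model the Hive schema after migration, the following Java sketch generates a Hive CREATE TABLE statement from a hypothetical ordered map of column names to Hive data types; the example table and column names are assumptions.

import java.util.LinkedHashMap;
import java.util.Map;

public class HiveSchemaModeler {
    // Builds a Hive CREATE TABLE statement from stored metadata, where the map
    // preserves column order and maps each column name to a Hive data type.
    public String createTableDdl(String tableName, Map<String, String> columns,
                                 String fieldDelimiter) {
        StringBuilder ddl = new StringBuilder("CREATE TABLE " + tableName + " (");
        String separator = "";
        for (Map.Entry<String, String> column : columns.entrySet()) {
            ddl.append(separator).append(column.getKey()).append(' ').append(column.getValue());
            separator = ", ";
        }
        ddl.append(") ROW FORMAT DELIMITED FIELDS TERMINATED BY '")
           .append(fieldDelimiter).append("'");
        return ddl.toString();
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("emp_id", "INT");
        columns.put("emp_name", "STRING");
        System.out.println(new HiveSchemaModeler()
                .createTableDdl("employees", columns, ","));
    }
}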
[0052] Thereafter, upon obtaining the metadata, the file system data migration
module 146 prompts the operator to provide the details of the target Hive 110 tables. For example, the file system data migration module 146 may generate GUIs to facilitate the user to enter the names of the target tables of the Hive 110.
[0053] The file system data migration module 146 may also facilitate the operator
in performing various operations to select the data which is to be migrated. For example, the operator may choose one or more columns of the file, and apply filter conditions on the file and/or the chosen columns to select the data which is to be migrated. In one example, the operator may define various selection parameters to select the data which is to be transferred. For example, the operator may apply various Boolean operators on the data present in the file to select the data to be transferred. In one example, the data validation module 136 may verify whether the operator has the requisite privileges to access and perform operations on the selected file of the file system 104-N to prevent unauthorized access or operations on the data.
[0054] On completion of the selection of the data which is to be migrated, the data
validation module 136 may create a data validation file which may be used to validate the data post-migration. In one implementation of the present subject matter, the validation file
may be created based on predefined rules which may be defined, through the GUI, by the administrator or the operator. In one example, the data validation file may include various validation parameters, such as the number of records to be migrated, number of files to be migrated, volume of data to be transferred, and number of records which do not comply with the schema of the Hive 110. For example, some records present in the file may include more or fewer columns than are present in the Hive 110 and hence, cannot be correctly migrated due to a possible mismatch of columns. In one example, the operator may use the data security module 140 to encrypt the data which is to be migrated for security reasons. In said example, the data security module 140 may implement various conventionally known encryption techniques, such as symmetric key encryption and public key encryption algorithms, to encrypt the data which is to be migrated. The operator may also use the data security module 140 to force transmission of the data, which is to be migrated, over a secure communication protocol instead of a normal communication protocol. For example, the operator may use the data security module 140 to transmit the data from the file system 104-N to the Hive 110 over Secure File Transfer Protocol (SFTP) instead of File Transfer Protocol (FTP). This ensures additional security of the data which is being migrated.
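A minimal sketch of symmetric key encryption of a batch of data prior to transmission, using the standard javax.crypto API, is given below for illustration only; key management and the choice of cipher mode are omitted here and would, in practice, be handled by the data security module 140.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class BatchEncryptor {
    // Encrypts one batch of data with a freshly generated AES key; in practice
    // a managed key and an authenticated cipher mode would be used instead.
    public static byte[] encrypt(byte[] batch) throws Exception {
        KeyGenerator keyGenerator = KeyGenerator.getInstance("AES");
        keyGenerator.init(128);                      // 128-bit AES key
        SecretKey key = keyGenerator.generateKey();
        Cipher cipher = Cipher.getInstance("AES");   // default mode and padding
        cipher.init(Cipher.ENCRYPT_MODE, key);
        return cipher.doFinal(batch);                // ciphertext, safe to transmit
    }
}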
[0055] Thereafter, the file system data migration module 146 initiates migration of
data from the file system 104-N to the Hive 110. In one example, the file system data migration module 146 may transfer the data in batches. After the data has been transferred, the data validation module 136 may validate the migrated data based on the data validation file.
[0056] In one example, the notification generation module 132 generates various
notifications to alert the operator of the progress of data migration. For example, the notification generation module 132 may generate various audio-visual notifications to alert the operator of successful completion of the data migration. In another example, the notification generation module 132 generates notifications to alert the operator of any errors, such as lost connectivity to either the file system 104-N or the Hive 110, which may have occurred during the data migration.
[0057] In one example, the operator may use the job scheduling module 138 to
define the aforementioned data transfer from the file system 104-N to the Hive 110 as a scheduled job so that the Hive 110 is updated with the changes in the file in the file system 104-N. A scheduled job may be understood as a job, such as data migration or data validation, which has to be executed at a scheduled time or which may have to be executed
recursively at regular intervals, such as once on a particular day of every week. In these
cases, the file system data migration module 146 carries out the incremental transfer of
data, which has been added or been modified since the last transfer, to the Hive 110 which
eliminates having to transfer the entire data. This facilitates faster data migration.
[0058] In one example, the sub-modules of the data migration module 142
may maintain timestamps and details of successful transfers to perform incremental data transfer and to ensure successful data transfer. For instance, the file system data migration module 146 may associate a timestamp with each file being transferred along with the file details. The file system data migration module 146 may keep track of all successfully migrated file details, such as file name, file size, and number of bytes transferred, to monitor successful data transfer. The file system data migration module 146 may keep track of the point at which the file migration failed and may then, based on the file details, restart the file data transfer from the point of failure.
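By way of illustration, the tracking of successfully migrated file details may be sketched as a simple in-memory ledger, as below; the ledger structure and method names are hypothetical, and in practice such details would be persisted so that they survive a failure.

import java.util.HashMap;
import java.util.Map;

public class TransferLedger {
    // File name -> number of bytes confirmed as transferred.
    private final Map<String, Long> transferred = new HashMap<>();

    // Records progress after each successfully migrated portion of a file.
    public void recordProgress(String fileName, long bytesTransferred) {
        transferred.put(fileName, bytesTransferred);
    }

    // On restart after a failure, the transfer resumes at this offset rather
    // than at the beginning of the file.
    public long resumeOffset(String fileName) {
        return transferred.getOrDefault(fileName, 0L);
    }

    // A file is complete when the transferred byte count matches its size.
    public boolean isComplete(String fileName, long fileSize) {
        return resumeOffset(fileName) == fileSize;
    }
}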
[0059] Similarly, in case of data transfer from a RDBMS, a timestamp may be
associated with each record of the Hive 110. At the time of data migration, the RDBMS data migration module 148 may identify those records in the RDBMS 104-1 which have been created or modified at a later time than indicated in the timestamp and select those records for migration. Having a timestamp also enables the data migration module 142 to resume data migration from the data repository 104 to the Hive 110 in case of any break in the data migration due to various reasons, such as an interruption in the connectivity between the data repositories 104 and the data migration system 114. In this case, the data migration module 142 checks the timestamp of the record which has last been successfully migrated and resumes the data migration based on the same. This saves the data migration system 114 from restarting the data migration process and saves time and network bandwidth.
[0060] In one implementation of the present subject matter, the data migration
system 114 may also provide data transformation capabilities to the data being transferred, or already transferred, to the Hadoop system 102. That is, the data migration system 114 may allow transformation of data along with data migration. The transformation may include basic transformations, such as arithmetic expressions, string operations, and data operations based on pre-defined rules, and advanced transformations, such as database joins and HDFS joins, look-ups, and data filtering. In one implementation of the present subject matter, the transformation may either occur in real time, or may occur in batches.
[0061] In one implementation, a workflow management module (not shown in the
figure) may provide application GUIs to users, such as operators, to create expressions or define transformation rules for basic transformations. The application GUI may provide multiple provisions like, but not limited to, editing, deletion, and updating of the defined transformation rules. In one implementation, the application GUI may also provide provisions to execute expressions on column data migrated to the Hadoop system 102. Further, the application GUI may also provide provisions to execute multiple transformation rules on a single row at a time. Furthermore, the application GUI provides provisions to accept record omission rules or error criteria and to write the omitted records to a predefined file-system.
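As a non-limiting illustration, the application of basic transformation rules to a single row at a time, with omitted records written to a predefined error file, may be sketched in Java as follows; the Rule interface and the comma-joined error format are hypothetical.

import java.io.IOException;
import java.io.Writer;
import java.util.List;

public class BasicTransformer {
    interface Rule {
        String[] apply(String[] row);  // e.g. an arithmetic or string operation
        boolean omit(String[] row);    // record omission rule or error criterion
    }

    // Applies each rule to one row at a time; rows matching an omission rule
    // are written to the predefined error file and excluded from the output.
    public String[] transform(String[] row, List<Rule> rules, Writer errorFile)
            throws IOException {
        for (Rule rule : rules) {
            if (rule.omit(row)) {
                errorFile.write(String.join(",", row) + "\n");
                return null; // omitted from the migrated output
            }
            row = rule.apply(row);
        }
        return row;
    }
}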
[0062] Further, the data migration system 114 may also provide advanced
transformation functions, such as database joins and HDFS joins. In database joins, the data migration system 114 may transform data where one or more tables are joined to form a new table with a selected data set and operator defined conditions. In situations where the database join is undertaken in the batch mode, the data migration system 114 may migrate the data first and thereafter perform the join operation on the HDFS files. The data migration system 114 may allow joining of multiple tables under the same database and creation of conditions and rules for the purpose of the join. In one implementation of the present subject matter, the data migration system 114 may allow the operator to apply filter conditions at the table level and record level prior to the migration. Thereafter, upon performance of the database join transformations, the data migration system 114 may store the data in the HDFS 106 or the Hive 110.
[0063] The data migration system 114 may invoke a map-reduce program in the
Hadoop system 102 for handling the join operations after migration. The data migration system 114 may provide the details of the joining transformation, such as filter conditions, to the map-reduce program along with the input file path for the purpose of handling the join operations after the migration. In one implementation, the system may also map the column details with the corresponding file's column index position so that the map-reduce program can accept the columns from the file to execute the join operations. It would be appreciated that, for a file as the source for the transformation, the user may provide the metadata associated with the file.
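Purely by way of illustration, such a map-reduce program may be sketched as a conventional reduce-side join using the Hadoop MapReduce API, as below; the input file names left.csv and right.csv, the comma delimiter, and the join key at column index zero are assumptions made for this sketch.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class HdfsJoin {

    // Tags each record with the name of its source file and emits the join key.
    public static class JoinMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text record, Context context)
                throws IOException, InterruptedException {
            String source = ((FileSplit) context.getInputSplit()).getPath().getName();
            String[] columns = record.toString().split(",");
            context.write(new Text(columns[0]), new Text(source + "\t" + record));
        }
    }

    // Collects the records of both inputs per key and emits their inner join.
    public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            List<String> left = new ArrayList<>();
            List<String> right = new ArrayList<>();
            for (Text value : values) {
                String[] parts = value.toString().split("\t", 2);
                if (parts[0].equals("left.csv")) {
                    left.add(parts[1]);
                } else {
                    right.add(parts[1]);
                }
            }
            for (String l : left) {
                for (String r : right) {
                    context.write(key, new Text(l + "," + r));
                }
            }
        }
    }
}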
[0064] Further, the data migration system 114 may also allow HDFS joins. Since,
for the HDFS file system, there is no metadata available to the data migration system 114, the metadata may be received from the users, such as the operator, through the user interface.
In operation, the data migration system 114 may accept multiple file inputs and corresponding metadata for the join operation. Also, the user may provide instructions for the HDFS join by utilizing the column details provided as metadata.
[0065] In one implementation of the present subject matter, the data migration
system 114 may also support look-up and data filter transformation operations. It would be appreciated that look-ups are reference tables from which a main table can replace the corresponding entries. Therefore, the data migration system 114 may configure input look-up tables and allow defining of rules for the look-up transformation. Based on the defined rules, the main table and corresponding columns may be configured by the data migration system 114 and a transformed output may be generated as a HDFS file. The transformed output may thus include replaced data after referring to the configured look-up tables.
[0066] The data migration system 114 may also allow a data filtering operation as a
transformation of data. In situations of data filtering, the data migration system 114 may allow filtering of source data based on pre-defined rules. For example, if the operator wishes to remove the records having 'null' values in a database store 104, these records can be treated as errors. Therefore, the data migration system 114 may transfer the error records to a predefined HDFS path and transfer the remaining filtered data to a defined output path received from the operator.
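The data filtering operation described above may be sketched, by way of illustration only, as follows; the class name and the treatment of empty fields as errors are assumptions of this minimal sketch.

import java.io.IOException;
import java.io.Writer;

public class NullRecordFilter {
    // Routes records containing 'null' values to the predefined error path and
    // writes the remaining filtered records to the operator-defined output path.
    public void filter(String[] record, Writer outputPath, Writer errorPath)
            throws IOException {
        for (String field : record) {
            if (field == null || field.equalsIgnoreCase("null") || field.isEmpty()) {
                errorPath.write(String.join(",", record) + "\n"); // error record
                return;
            }
        }
        outputPath.write(String.join(",", record) + "\n"); // filtered data
    }
}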
[0067] In one implementation of the present subject matter, the workflow
management module of the data migration system 114 may provide a GUI to users, such as operators, to define a workflow for data migration. A workflow can be understood as a set of jobs that can be executed in a sequence or order. The order of execution of the jobs may either be defined by the users from time to time, or may be pre-defined based on the requirement for data migration. In one implementation of the present subject matter, the workflow management module may provide workflow management to the users, based on which the users may either select jobs and define priorities for each job, or may drag and drop jobs one after another in the order or sequence of their execution.
[0068] For example, an operator may first wish to migrate data from the
relational database 104-1 to the HDFS 106 and then clean the data utilizing a Pig script. Further, after cleansing, the user may also wish to perform a basic transformation on the cleaned data. In such a scenario, the operator may drag and drop the following jobs in the defined sequence: firstly, the operation of RDBMS TO HDFS, followed by the job of PIG CLEANSING, and finally, applying BASIC TRANSFORMATION. In an example, the
data migration module 142 of the data migration system 114 may keep track of the sequence of jobs. Prior to the execution of the jobs, the data migration module 142 may keep track of the jobs which can be executed in parallel and identify independent sequences where sequential execution can be done.
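As a non-limiting illustration, such a workflow of sequentially executed jobs may be sketched in Java as follows; the Job interface and the fluent then() method are hypothetical.

import java.util.ArrayList;
import java.util.List;

public class Workflow {
    interface Job {
        String name();
        void execute();
    }

    private final List<Job> jobs = new ArrayList<>();

    // Jobs are added in the order arranged by the operator, for example
    // RDBMS TO HDFS, then PIG CLEANSING, then BASIC TRANSFORMATION.
    public Workflow then(Job job) {
        jobs.add(job);
        return this;
    }

    // Executes the jobs strictly in the defined sequence; independent jobs
    // could instead be dispatched in parallel by the data migration module 142.
    public void run() {
        for (Job job : jobs) {
            System.out.println("Executing job: " + job.name());
            job.execute();
        }
    }
}

With such hypothetical job objects, the workflow above could be composed as new Workflow().then(rdbmsToHdfs).then(pigCleansing).then(basicTransformation).run().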
[0069] In yet another implementation of the present subject matter, the data
migration system 114 may allow entity based data migration, with sub-setting and masking, from the relational database 104-1 to the Hadoop system 102 and from the Hadoop system 102 to another Hadoop system 102. An entity can be understood as an abstract representation of data with an independent existence. Further, data sub-setting may be understood as a process of creating a referentially correct, business intact and cut-down version of databases, generally referred to as subset data. Also, data masking can be understood as the replacement of existing sensitive information, in a test or development environment, with information that looks real but is of no use to anyone who might wish to misuse it.
[0070] The data migration system 114 may allow entity level data migration from
the relational database 104-1 with all existing relationships. In one example, the data migration module 142 extracts and stores all relational objects from the relational database 104-1 for further reference. Also, the data migration system 114 may allow users to provide entity relationship details through the application GUI.
[0071] In one example, the data migration system 114 includes the report
generation module 134 which may be configured to generate various logs and reports
regarding the operations of the data migration system 114. These logs and reports may
facilitate troubleshooting and auditing of the data migration operations.
[0072] Thus, the data migration system 114 provides a single tool for data
migration to the Hadoop system 102 from various types of the data repositories 104, such
as file systems 104-N, relational databases 104-1, mainframes 104-2, enterprise databases
and enterprise message services, and vice-versa. Thus, the data migration system 114
eliminates the requirement of having multiple tools for data migration to and from the
various types of data repositories. This leads to lower costs of operation. The data
migration system 114 provides various GUIs which facilitate the users to easily migrate
data to Hadoop systems 102 from various types of data repositories 104 and vice-versa.
[0073] The data migration system 114 also supports various encryption techniques
which help in transferring the data securely. Further, the data migration system 114 supports the creation of various groups of users which facilitates implementing restrictions
on the rights of the users to create connections with the various data repositories 104 and the Hadoop system 102 in order to prevent data migration operations from being performed by unauthorized users. The data migration system 114 also implements various techniques of data validation to ensure that the data migration has been completed without any error. The data migration system 114 also facilitates resuming any data migration operation which may have stopped while in progress due to various reasons, such as loss of connectivity with the data repositories 104 and/or the Hadoop system 102. This eliminates having to restart a data migration operation and facilitates faster data migration to and from the Hadoop System 102.
[0074] Figure 2 illustrates a method 200 for migrating data from a data repository
to a Hadoop system, according to an example of the present subject matter. Figure 3
illustrates a method 300 for providing incremental updates of data from a data repository
to a Hadoop system, according to an example of the present subject matter.
[0075] The methods 200 and 300 may be described in the general context of
computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The methods 200 and 300 may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in either a local or a remote computer storage media, including memory storage devices.
[0076] The order in which the methods 200 and 300 are described is not intended to
be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the methods 200 and 300, or alternative methods. Additionally, individual blocks may be deleted from the methods without departing from the spirit and scope of the subject matter described herein. Furthermore, the methods 200 and 300 can be implemented in any suitable hardware, software, firmware, or combination thereof.
[0077] With reference to the method 200, as shown in Figure 2, at block 202, a
request for generating a connector, for connecting to a data repository 104, is received. In one example, the connection manager 130 receives the request for generating the connector from the users who have the requisite privileges, such as the administrators.
[0078] As shown in block 204, the authentication details for connecting to the data
repository 104 are received. In one example, the connection manager 130 receives the authentication details for connecting to the data repository 104. The authentication details may include the username to access the data repository 104, password associated with the username, the IP address of the server hosting the data repository 104 and the port number on which the data repository 104 is operating.
[0079] As illustrated in block 206, the connector for connecting to the data
repository 104 is generated. In one example, the connection manager 130 generates the
connector for connecting to the data repository 104. Once generated, the connector may be
reused for establishing connection with the data repository 104 in future.
[0080] As depicted in block 208, selection parameters to select the data, from the data
repository 104, which is to be migrated are received. In one implementation, the data migration module 142 receives the selection parameters from the administrator to select the data which is to be migrated. For example, the user may perform various operations, such as choosing particular columns, applying filters on the data and creating views, using the data migration module 142 to select the data which is to be migrated.
[0081] At block 210, a data validation file is generated for subsequent validation of
the data which is to be migrated. In one example, the data validation module 136 generates
the data validation file for subsequent validation of the data which is to be migrated. In
one example, the data validation file may include various validation parameters, such as
the number of records to be migrated, the number of files to be migrated, the volume of data to be transferred, and the number of records which do not comply with the schema of the Hadoop
system 102. In another example of the present subject matter, the data validation file may
include predefined rules based on which migrated data may be validated.
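[0081a] For illustration only, the data validation file of block 210 may be written as a simple key-value file carrying the validation parameters listed above. The Java sketch below is hypothetical; the file format and the key names are assumptions, since the specification does not prescribe any particular format:

```java
// Hypothetical sketch only: the key-value format and the key names are
// assumptions; the specification does not prescribe a file format.
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class ValidationFileWriter {
    public static void write(Path target, long records, long files,
                             long bytes, long nonCompliant) throws IOException {
        Properties p = new Properties();
        p.setProperty("records.expected", Long.toString(records));          // number of records to be migrated
        p.setProperty("files.expected", Long.toString(files));              // number of files to be migrated
        p.setProperty("bytes.expected", Long.toString(bytes));              // volume of data to be transferred
        p.setProperty("records.nonCompliant", Long.toString(nonCompliant)); // records not complying with the schema
        try (Writer w = Files.newBufferedWriter(target)) {
            p.store(w, "validation parameters for the data to be migrated");
        }
    }
}
```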
[0082] As shown in block 212, a request, for generating a connector for connecting
to a Hadoop system 102, is received. In one example, the connection manager 130 receives the request for generating the connector from the users who have the requisite privileges, such as the administrators.
[0083] As depicted in block 214, the authentication details, for connecting to the
Hadoop system 102, are received. In one example, the connection manager 130 receives the authentication details for connecting to the Hadoop system 102. The authentication details may include the username to access the Hadoop system 102, the password associated with the username, the IP address of the Hadoop system 102, and so on.
[0084] As illustrated in block 216, the connector for connecting to the Hadoop system 102 is generated. In one example, the connection manager 130 generates the connector for connecting to the Hadoop system 102. Once generated, the connector may be reused for establishing connection with the Hadoop system 102 in future.
[0085] At block 218, the data is migrated from the data repository 104 to the
Hadoop system 102. In one example, the data migration module 142 transmits the data from the data repository 104 to the Hadoop system 102. In one example, the data migration module 142 transmits the data in batches. Such transmission in batches may allow data to be transferred at different instances; for example, some part of the data may be transferred instantly, whereas another part may be transferred at a later stage. However, it would be appreciated that the data may also be transferred in real time based on operator preferences.
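[0085a] A minimal, non-limiting sketch of such batched transmission follows; the generic BatchMigrator class and its names are illustrative assumptions, with the sink standing in for whatever writes one batch to the Hadoop system 102:

```java
// Hypothetical sketch only: BatchMigrator and its names are illustrative;
// the sink stands in for whatever writes one batch to the Hadoop system.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class BatchMigrator<R> {
    private final int batchSize;

    public BatchMigrator(int batchSize) {
        this.batchSize = batchSize;
    }

    public void migrate(Iterator<R> source, Consumer<List<R>> sink) {
        List<R> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);                 // some data is transferred instantly...
                batch = new ArrayList<>(batchSize); // ...the rest at a later stage
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // flush the final partial batch
        }
    }
}
```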
[0086] At block 220, the data migrated, to the Hadoop system 102, is validated
based on the data validation file. In one example, the data validation module 136 validates the data based on the data validation file generated earlier.
[0087] As shown in block 222, notification messages, based on the validation, are
generated to notify one of a successful completion of data migration and error(s) in the data migration. In one example, the notification generation module 132 may generate notifications to alert the administrator of any error(s) that may have occurred during the data migration or may alert the administrator of the successful completion of the data migration.
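[0087a] Purely by way of example, blocks 220 and 222 may be combined as below: the counts observed after migration are compared against the validation file generated at block 210, and a notification message is produced. The key names mirror the hypothetical ValidationFileWriter sketch above and are likewise assumptions:

```java
// Hypothetical sketch only: the key names mirror the illustrative
// ValidationFileWriter sketch above and are not part of the specification.
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class MigrationValidator {
    public static String validate(Path validationFile, long recordsMigrated,
                                  long filesMigrated) throws IOException {
        Properties p = new Properties();
        try (Reader r = Files.newBufferedReader(validationFile)) {
            p.load(r);
        }
        long expectedRecords = Long.parseLong(p.getProperty("records.expected"));
        long expectedFiles = Long.parseLong(p.getProperty("files.expected"));
        if (recordsMigrated == expectedRecords && filesMigrated == expectedFiles) {
            return "Data migration completed successfully."; // success notification
        }
        return String.format("Error in data migration: expected %d records in %d files,"
                + " migrated %d records in %d files.",
                expectedRecords, expectedFiles, recordsMigrated, filesMigrated);
    }
}
```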
[0088] With reference to the method 300, as shown in Figure 3, at block 302, a
request is received for a scheduled incremental update job. In one example, the operator of the data migration system 114 may use the job scheduling module 138 to define the data transfer from the data repository 104 to the Hadoop system 102 as a scheduled job so that the Hadoop system 102 is updated with the changes in the file in the data repository 104. Herein, the data migration module 142 carries out the incremental transfer of data, which has been added or modified since the last transfer, to the Hadoop system 102, which eliminates having to transfer the entire data set again.
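[0088a] As one non-limiting way of realizing such a scheduled job in code, a standard Java scheduler may invoke the incremental-update logic at a fixed interval; the class below is an illustrative assumption, with the Runnable standing in for the incremental-update logic of method 300:

```java
// Hypothetical sketch only: the class is illustrative, and the Runnable
// stands in for the incremental-update logic of method 300.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class IncrementalUpdateScheduler {
    public static ScheduledExecutorService schedule(Runnable incrementalUpdateJob,
                                                    long intervalMinutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Run the job periodically so the cluster stays current with the repository.
        scheduler.scheduleAtFixedRate(incrementalUpdateJob, 0L, intervalMinutes,
                TimeUnit.MINUTES);
        return scheduler;
    }
}
```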
[0089] As shown in block 304, the connectors for connecting to the data repository
and the Hadoop system are loaded. In one implementation, the connection manager 130 loads the connectors for connecting to the data repository 104 and the Hadoop system 102 and establishes connection of the data migration system 114 with the data repository 104 and the Hadoop system 102.
[0090] As illustrated in block 306, a user input indicative of the field to be used as
a base for incremental updates is received. In one example, the operator may use the data
migration module 142 to select the field which is to be used as the base for incremental
updates.
[0091] As depicted in block 308, the timestamp present on an existing record in the
Hadoop system is read. In one implementation, the data migration module 142 reads the
timestamp present on an existing record in the Hadoop system 102.
[0092] As shown in block 310, the time of one of creation and modification of a record
in the data repository is read. In one example, the data migration module 142 reads the
time of one of creation and modification of a record in the data repository 104.
[0093] At block 312, it is determined whether the time of one of creation and
modification is later than the timestamp of the record present in the Hadoop system 102.
[0094] If, at block 312, it is determined that the record present in the data repository 104 has a later time of one of creation and modification than the timestamp of the record in the Hadoop system 102, then, as shown in block 314, the record is selected for updating the Hadoop system 102. In one example, the data migration module 142 selects the records of the data repository 104 which have a later time of one of creation and modification than the timestamp of the record in the Hadoop system 102.
[0095] As illustrated in block 316, the Hadoop system is updated based on the
selected record(s). In one example, the data migration module 142 updates the Hadoop
system 102 by migrating the selected records from the data repository 104 to the Hadoop
system 102.
[0096] If, at block 312, it is determined that the record present in the data repository 104 does not have a later time of one of creation and modification than the timestamp of the
record in the Hadoop system 102, then, as shown in block 318, the record present in the
data repository is ignored, i.e., not selected, for the purpose of updating the Hadoop
system 102. In one example, the data migration module 142 ignores the record(s) which
have an earlier time of creation or modification than the timestamp of the record in the
Hadoop system 102.
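[0096a] An illustrative sketch of the comparison of blocks 308 to 316 is given below; the Record type and method names are assumptions made for this example, and Java 16 or later is assumed for the record syntax:

```java
// Hypothetical sketch only: the Record type and method names are assumptions
// made for this example; Java 16 or later is assumed for the record syntax.
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

public class IncrementalSelector {
    // An illustrative repository record with its time of creation or modification.
    public record Record(String key, Instant lastChanged) {}

    // lastMigrated is the timestamp read from the existing record in the Hadoop
    // system (block 308); lastChanged is the time read at block 310.
    public static List<Record> selectForUpdate(List<Record> repositoryRecords,
                                               Instant lastMigrated) {
        return repositoryRecords.stream()
                .filter(r -> r.lastChanged().isAfter(lastMigrated)) // block 312: later?
                .collect(Collectors.toList());                      // block 314: select
    }
}
```

Records filtered out here correspond to the ignored records described above; only the selected records are then migrated to update the Hadoop system 102.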
[0097] Although embodiments for methods and systems for migration of data
between the Hadoop systems and the various data repositories have been described in a
language specific to structural features and/or methods, it is to be understood that the
invention is not necessarily limited to the specific features or methods described. Rather,
the specific features and methods are disclosed as exemplary embodiments for migration of data between the Hadoop systems and the various data repositories.
I/We claim:
1. A data migration system (114), for migration of data to a Hadoop system (102)
from a plurality of types of data repositories (104), the data migration system (114)
comprising:
a processor (120);
a memory (122) coupled to the processor (120);
a connection manager (130), executable by the processor (120), to:
receive a request, from at least one user, for generating a connector wherein the connector, when executed, establishes a connection of the data migration system (114) with at least one of the Hadoop system (102) and one or more of the data repositories (104);
generate the connector based on the request;
a data migration module (142), executable by the processor (120), to:
establish connection with the Hadoop system (102) and the one or more of the data repositories (104) using the generated connector;
migrate data from the one or more of the data repositories (104) to the Hadoop system (102);
a data validation module (136), executable by the processor (120), to:
generate a data validation file, indicative of validation parameters of the data to be migrated; and
validate the migrated data, based on the data validation file, after the migration of data to the Hadoop system (102) is complete.
2. The data migration system (114) as claimed in claim 1, wherein the connection
manager (130) further:
receives authentication details of the Hadoop system (102) and the one or more of the data repositories (104); and
generates the connector based on the received authentication details.
3. The data migration system (114) as claimed in claim 1, wherein the data migration
module (142) further:
receives selection parameters from the at least one user to select the data, from the one or more of the data repositories (104), which is to be migrated to the Hadoop system (102); and
selects the data, to be migrated, based on the received selection parameters.
4. The data migration system (114) as claimed in claim 1, wherein the data migration
system (114) further comprises a data security module (140), executable by the processor
(120), to:
verify whether the at least one user has the requisite permissions to access and migrate the selected data; and
terminate the data migration on verifying the at least one user not to have the requisite permissions.
5. The data migration system (114) as claimed in claim 4, wherein the data security
module (140) further:
encrypts the data to be migrated; and
transmits the data to be migrated over a secure communication protocol.
6. The data migration system (114) as claimed in claim 1, wherein the data migration
system (114) further comprises a user management module (128), executable by the
processor (120), to:
create a plurality of user groups, comprising at least one user, wherein each of the plurality of user groups is associated with a set of permissions and privileges; and
modify the set of permissions and privileges associated with at least one of the plurality of user groups.
7. The data migration system (114) as claimed in claim 1, wherein the data migration
system (114) further comprises a job scheduling module (138), executable by the
processor (120), to:
facilitate the at least one user to define the migration of the data as a scheduled job; and
execute the scheduled job as per the schedule to transmit data which has been one of created and modified, since a last migration, to the Hadoop system (102).
8. The data migration system (114) as claimed in claim 7, wherein the data migration
module (142) further:
associates a timestamp with each record in the Hadoop system (102);
on execution of the scheduled job, reads the timestamp associated with each record of the Hadoop system (102);
reads one of the time of creation and modification of a record in the data repository (104);
compares the timestamp with the one of the time of creation and modification of the record to determine whether the record has been one of created and modified since the last migration;
selects the record for migration on determining the record to have been one of created and modified since the last migration; and
migrates the record to the Hadoop system (102) to update the Hadoop system (102).
9. The data migration system (114) as claimed in claim 1, wherein the data migration
module (142) further comprises:
a file system data migration module (146) to migrate data to the Hadoop system (102) from a file system (104-N) of the data repositories (104);
a relational database management system (RDBMS) data migration module (148) to migrate data to the Hadoop system (102) from relational databases (104-1) of the data repositories (104); and
a mainframe data migration module (150) to migrate data to the Hadoop system (102) from mainframes (104-2) of the data repositories (104).
10. The data migration system (114) as claimed in claim 1, wherein the data migration module (142) performs data transformation operations, wherein the data transformation operations are one of basic transformations and advanced transformations.
11. The data migration system (114) as claimed in claim 1, wherein the data migration module (142) provides workflow management based on sequence of operations defined by the at least one user.
12. A computer implemented method for migration of data to a Hadoop system (102)
from a plurality of types of data repositories (104) by a data migration system (114),
wherein the data migration system (114) is communicatively coupled to the Hadoop
system (102) and the plurality of types of data repositories (104), the method comprising:
receiving a request, from at least one user, for generating a connector wherein the connector, when executed, establishes a connection of the data migration system (114) with at least one of the Hadoop system (102) and one or more of the data repositories (104);
generating the connector based on the request;
generating a data validation file, indicative of validation parameters of the data to be migrated;
establishing connection of the data migration system (114) with the Hadoop system (102) and the one or more of the data repositories (104) using the generated connector;
migrating selected data to the Hadoop system (102); and
validating the migrated data, based on the data validation file, after the migration of data to the Hadoop system (102) is complete.
13. The computer implemented method as claimed in claim 12, wherein the computer
implemented method further comprises:
receiving selection parameters from the at least one user to select the data, from the one or more of the data repositories (104), which is to be migrated to the Hadoop system (102); and
selecting the data, to be migrated, based on the received selection parameters.
14. The computer implemented method as claimed in claim 12, wherein the computer
implemented method further comprises:
verifying whether the at least one user has the requisite permissions to access and migrate the selected data; and
terminating the data migration on verifying the at least one user not to have the requisite permissions.
15. The computer implemented method as claimed in claim 12, wherein the computer
implemented method further comprises:
encrypting the data to be migrated; and
transmitting the data to be migrated over a secure communication protocol.
16. The computer implemented method as claimed in claim 12, wherein the computer
implemented method further comprises:
creating a plurality of user groups, comprising the at least one user, wherein each of the plurality of user groups is associated with a set of permissions and privileges; and
modifying the set of permissions and privileges associated with at least one of the plurality of user groups.
17. The computer implemented method as claimed in claim 12, wherein the computer implemented method further comprises providing entity based data migration with sub-setting and masking.
18. The computer implemented method as claimed in claim 12, wherein the computer implemented method further comprises:
facilitating the at least one user to define the migration of the data as a scheduled job; and
executing the scheduled job as per the schedule to transmit data which has been created or modified, since a last migration, to the Hadoop system (102).