Abstract: A data synchronization system (102) for data synchronization includes an identification module (112) configured to compare a plurality of columns of an updated schema of a database (108) to identify related columns, based at least on an exact matching technique and a pattern matching technique. The related columns are indicative of columns having at least one common term in column names. Based on the column names, an analysis module (212) is configured to categorize the related columns into one or more groups. The analysis module (212) is further configured to determine a plurality of sub-groups for each of the one or more groups. Further, the analysis module (212) is configured to generate a database report based on the determination of the plurality of sub-groups for each of the one or more groups, for synchronizing data in the database (108).
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
(See section 10, rule 13)
1. Title of the invention: DATA SYNCHRONIZATION
2. Applicant(s)
NAME NATIONALITY ADDRESS
TATA CONSULTANCY Indian Nirmal Building, 9th Floor, Nariman
SERVICES LIMITED Point, Mumbai, Maharashtra 400021,
India
3. Preamble to the description
COMPLETE SPECIFICATION
The following specification particularly describes the invention and the manner in which it
is to be performed.
TECHNICAL FIELD
[0001] The present subject matter relates, in general, to data synchronization, and in
particular, to a system and a method for synchronizing data in at least one schema of a database.
BACKGROUND
[0002] A database typically includes data and relationship of the data with an
application, such as a software application. Further, the data may go through various changes at different stages of the database, such as development, testing, and production. The data may be stored in the database in accordance with a specific format or structure, known as a database structure or a schema. The schema typically includes a large number of tables, where each table includes a plurality of columns and rows. The database may go through changes, such as addition or deletion of data in the schema that may be made by multiple users. For example, in a development stage, multiple developers can perform changes in the software application for enhancing capabilities of the software application leading to change in associated database. In another example, the changes can also be made during different stages of the software application, such as when the software application is in a testing stage, to address any defects identified during testing of the application, a testing team may propose changes in the software application and associated database. The changes in the database can also be associated with the schema of the database.
[0003] Typically, data synchronization systems are used to track and manage the
changes made to the database. The data synchronization systems may be able to track changes made in the schema. Such systems facilitate in tracking the changes made by multiple users in the schema of the database. The schema may get updated when the changes are made in the schema. For example, if any change is made in a column of a table by a developer, the schema gets updated.
SUMMARY
[0004] This summary is provided to introduce concepts related to data synchronization,
which is further described below in the detailed description. This summary is neither intended
to identify essential features of the claimed subject matter nor is it intended for use in determining or limiting the scope of the claimed subject matter.
[0005] A data synchronization system for data synchronization includes an
identification module configured to compare a plurality of columns of an updated schema of a database to identify related columns, based at least on an exact matching technique and a pattern matching technique. The related columns are indicative of columns having at least one common term in column names. An analysis module is configured to categorize the related columns into one or more groups based on the column names. The analysis module is further configured to determine a plurality of sub-groups for each of the one or more groups. Each of the plurality of sub-groups is indicative of a change in at least one of a data type and a column type of each of the one or more groups. Further, the analysis module is configured to generate a database report based on the determination of the plurality of sub-groups for each of the one or more groups for synchronizing data in the database.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] The detailed description is described with reference to the accompanying figures.
In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to reference like features and components.
[0007] Fig. 1 illustrates a network environment implementing a data synchronization
system, in accordance with an embodiment of the present subject matter.
[0008] Fig. 2a illustrates components of the data synchronization system, in accordance
with an embodiment of the present subject matter.
[0009] Fig. 2b illustrates an exemplary database report generated by the data
synchronization system, in accordance with the embodiment of the present subject matter.
[0010] Fig. 3 illustrates a method for synchronizing data in a database, in accordance
with an embodiment of the present subject matter.
DETAILED DESCRIPTION
[0011] Conventionally, various data synchronization systems are available for
synchronizing data in a database. The data may be stored in the database in accordance with a specific format or structure, known as a database structure or a schema. Considering an example of a software application for creating an attendance sheet. Data, such as name, address, and employee ID for the software application may be stored in different columns of at least one table of a schema of a database that may be associated with the software application. In another example, information associated with salaries and account details of employees may be stored in a plurality of columns of another table of the schema. The schema may go through various changes at different stages and in different environments.
[0012] A database environment may refer to different phases or stages in which the
database may be used. Examples of the database environments may include, but are not limited to, a development environment, a testing environment, a staging environment, and a performance environment. In an example, when more than one developer works on a single database, any change made in the schema of the database by one of the developers may affect the work done by other developers. For example, if relationship between any two tables of the schema is modified, the schema of the database gets changed. To avoid this, different copies of the schema may be created which may be provided on local machines of the developers. This may enable a developer to make changes in his copy of the schema without affecting the work done by other developers. Generally, when a change is made in the schema of the database, the schema gets updated. Such updated schema may be generated even for single developer projects/applications where different changes made in the schema come into existence at different points of time.
[0013] At times, while making the changes in the database, some of the developers or
testers may perform changes according to their convenience. For example, a developer may modify the name of a column from "phone_numbers" to "phone-nums" while making the changes in the database. In another example, the data type or column type may also differ for the modified columns in comparison to originally defined columns.
[0014] Further, the conventional data synchronization systems are used for managing
and synchronizing the updated schema of the database. The data synchronization systems synchronize different columns by matching names of the columns. As described above, different developers may provide different names to those columns or tables where they are doing some modifications. Accordingly, the data synchronization systems may not be able to identify such changes made to the schema and may render the database unsynchronized.
[0015] Moreover, relational databases have a foreign key relationship between columns
of table so that the columns must be of same data type. The foreign key relationship typically identifies a column or set of columns in one table that refers to a column or set of columns in another table. Any change in the foreign key relationship may change relationship between different tables of the schema and thereby results in change in the schema. However, for few data types such as “varchar”, the foreign key relationship is not followed and the columns with different length are allowed. Such an arrangement may not be reliable as, when the length of column exceeds a pre-defined length, the relation between the columns may break. For example, there may be various columns of different tables in the schema that may have some common attributes. For example, column names such as names, description, created by, updated by, may be commonly found in different tables of the schema. However, the data type may be different for each of these columns. For instance, a first table may have a column "created by" with a pre-defined data type having a pre-defined length, such as “varchar [20]”. A second table in the schema may have a column with the same name but with data type as “varchar [10]”. Therefore, if a user performs an operation which inserts a row in the second table with “varchar [20]” (more than 10 characters) for the “created by” column, this may result in failure or loss of data (if data truncation mode is set to true) while inserting a record. In these scenarios, it becomes difficult for a database administrator (DBA) to identify the columns which are not in sync in order to maintain the uniformity of the database.
[0016] At various instances, the foreign key relationships may not used in tables, either
due to performance issue or due to clustered databases. However, a software application associated with such a database may be configured to support such a relationship. This leads to mismatch of data type and other properties of the relational columns as there is no restriction on creating/modifying the columns. Furthermore, the conventional systems do not provide any
mechanism to maintain the uniformity in the data type and column type pertaining to related columns, making the system slow, thus, affecting performance of the system.
[0017] In accordance with the present subject matter, a system and a method for
synchronizing data in at least one schema of a database are described herein. A schema may be understood as a database structure comprising of a large number of tables, where each table is associated with a plurality of columns and rows. The schema gets updated when there is any change in the schema. The changes may include addition of columns, deletion of columns, modification of columns, modification of relations between two tables, and the like. In an example, the schema may get updated when there is a change in a column of any of the tables in the schema. Therefore, the updated schema may include all the changes made to at least one schema.
[0018] According to an implementation, the present subject matter includes a database
which stores data related to at least one schema. Further, the database may include information related to all tables, such as table names, sizes and number of rows and columns in each table, different columns and rows within each table, type of data stored in the columns, and the like. The database also comprises of information related to changes made in the at least one schema. As mentioned earlier, the schema may get updated when there is any change in the schema. The updated schema may include all the changes made to the at least one schema. In one implementation, the database can be externally associated with the system. In an alternative embodiment, the database can be present within the system. In the description hereinafter, the data related to the at least one schema is referred to as schema meta-data. In an implementation, the schema meta-data stored in the database may be obtained from the database. For example, MySql provides information_schema that stores the schema meta-data.
[0019] As indicated earlier, the schema may be understood as a database structure
comprising of a large number of tables and each table is associated with a plurality of columns and rows. Generally, when a change is made in the schema of the database, the schema may get updated. The updated schema may include all the changes made to at least one schema. In one implementation, the at least one schema may be created at different environments, such as a development environment, a testing environment, and a performance environment. The schema meta-data may be retrieved whenever synchronization of data in the at least one schema is to be
performed. Further, the data of the at least one schema contained in the database may be updated, whenever required. For example, new data may be added into the database, existing data can be modified, or non-useful data may be deleted from the database.
[0020] The present subject matter may facilitate in retrieving the schema meta-data of
an updated schema of at least one schema of the database. The updated schema may include all the changes made to the at least one schema in the database. Further, the schema meta-data may facilitate in identifying when were the changes made in the at least one schema, what were the changes made, and the like. In an implementation, the schema may get updated by different users belonging to a same environment. Alternatively, the schema may get updated by users belonging to different environments. In an example, a developer creates a column named as "employee-name" that may include names of the employees. Further, another developer updates the "employee-name" column by adding names of new employees and changes the name of the column to "emp-name". In such a scenario, the creation of the column "emp-name" may be understood as the updating of the schema. The updated schema may include both the columns "employee-name" and "emp-name". Although it is possible that the at least one schema may get updated when any type of changes are made to the at least one schema of the database, the description henceforth has been explained with reference to creation and synchronization of data in an updated schema when the changes are made to the columns of one or more tables in the schema.
[0021] Once the schema meta-data is retrieved, the present subject matter facilitates in
identification of related columns. The related columns are indicative of columns having at least one common term in column names. Therefore, the related columns may be understood as columns with similar names or partially similar names. In an implementation, the related columns may be identified by comparing column names of the updated schema of the database. Comparison based on column names is hereinafter referred as exact matching. For example, the columns named as "employee-names" may be matched together in the updated schema through exact matching technique.
[0022] In another implementation, a pattern matching technique may be employed for
identifying the related columns. The pattern matching technique may include matching of columns based on partially similar names. For example, if any part of the column name is
matching with that of another column, then those columns are identified as related columns. For identifying related columns by the pattern matching technique, a plurality of pre-defined patterns may be used. For example, a pre-defined pattern, such as "account_*_id" may be used to identify the columns with column names having at least the term “account_*_id”. Therefore, columns having “account_*_id” in their names will be identified as related columns. Accordingly, columns with names "account_credit_id ", “account_debit_id ", and "account_id" will be identified as related columns.
[0023] Further, the present subject matter facilitates in categorizing the identified related
columns in a manner such that the columns with similar names or similar patterns, i.e., partially similar names are grouped together. Each group of the identified related columns may include at least one sub-group. The at least one sub-group may be determined based on a data type and a column type of the each group. Furthermore, the present subject matter may include generation of a database report based on the grouping and sub-grouping of the related columns. The database report may indicate capacity of the database to hold data. The database report may include information of the grouped columns, such as tables associated with the columns, type of data stored in each column, maximum and minimum value each column can hold, default value of each column, required language type to represent the value, and null able information.
[0024] The database report may then be analyzed by a database administrator (DBA) or
a database associate to identify the columns that are not synchronized with other columns. The columns that are not synchronized may be understood as the columns which are not uniform in terms of data type, column type, and/or column name. For instance, if a group of columns contains more than one sub-group that means, this group of columns contains related columns having different/non-uniform data types or column types. This non-uniformity may cause a fatal error in the system. Therefore, the database associate or the DBA identifies the non-uniform data types of the related columns and may selectively apply relevant changes to the columns to make the columns uniform.
[0025] The present subject matter thus provides an efficient system and method for
management of the data in at least one schema of the database in terms of time and performance. Furthermore, the present subject matter may enable the administrator to identify one or more columns of the updated schema from the database report and identify the columns
which are not synchronized. The administrator may also be able to selectively apply relevant changes to the updated schema of the at least one schema to make the database uniform.
[0026] These and other advantages of the present subject matter would be described in
greater detail in conjunction with the following figures. While aspects of described systems and methods for synchronizing data in at least one schema of a database can be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary system(s).
[0027] Fig. 1 illustrates a network environment 100 implementing a data
synchronization system 102, hereinafter referred as system 102, in accordance with an embodiment of the present subject matter. In one implementation, the network environment 100 can be a public network environment, including thousands of personal computers, laptops, various servers, such as blade servers, and other computing devices. In another implementation, the network environment 100 can be a private network environment with a limited number of computing devices, such as personal computers, servers, laptops, and/or communication devices, such as mobile phones and smart phones.
[0028] The system 102 is communicatively connected to a plurality of user devices 104-
1, 104-2, 104-3...104-N, collectively referred to as user devices 104 and individually referred to as a user device 104, through a network 106. In one implementation, a plurality of users, such as programmers, developers, data architects, software architects, module leaders, projects leaders, database administrator (DBA), database associates, stakeholders, and the like, may use the user devices 104 to communicate with the system 102.
[0029] The system 102 and the user devices 104 may be implemented in a variety of
computing devices, including, servers, a desktop personal computer, a notebook or portable computer, a workstation, a mainframe computer, a laptop and/or communication device, such as mobile phones and smart phones. Further, in one implementation, the system 102 may be a distributed or centralized network system in which different computing devices may host one or more of the hardware or software components of the system 102.
[0030] The system 102 may be connected to the user devices 104 over a network 106
through one or more communication links. The communication links between the system 102
and the user devices 104 are enabled through a desired form of communication, for example, via dial-up modem connections, cable links, digital subscriber lines (DSL), wireless, or satellite links, or any other suitable form of communication.
[0031] The network 106 may be a wireless network, a wired network, or a combination
thereof. The network 106 can also be an individual network or a collection of many such individual networks, interconnected with each other and functioning as a single large network, e.g., the Internet or an intranet. The network 106 can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The network 106 may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), etc., to communicate with each other. Further, the network 106 may include network devices, such as network switches, hubs, routers, for providing a link between the system 102 and the user devices 104. The network devices within the network 106 may interact with the system 102, and the user devices 104 through the communication links.
[0032] The network environment 100 further comprises a database 108 that
communicates with the system 102 over the network 106. In another implementation, the database 108 may be connected directly with the system 102. It would be appreciated by those skilled in the art that system 102 and the database 108 may be implemented in the same or different computing devices. The database 108 may store all data inclusive of data associated with at least one schema, hereinafter referred to as schema meta-data. For example, the database 108 may store a schema meta-data 110. The schema meta-data 110 may include, but is not limited to, information of all tables in the at least one schema, such as their names, sizes and number of rows and columns in each table, names of the columns and rows, what tables they are used in, the type of data stored in the columns, and the like.
[0033] The schema meta-data 110 may also include an updated schema of at least one
schema. The updated schema may include all the changes made to the at least one schema. The changes may include addition of columns, deletion of columns, modification of columns, modification of relations between two tables, and the like. For example, when there is a change in a column of any of the tables in the schema, the schema may get updated. It will be
appreciated by a person skilled in the art that the one or more schema may belong to different environments in which the database 108 may exist. Examples of the environments may include, but are not limited to, a development environment, a testing environment, a staging environment, and a performance environment.
[0034] Although, at least one schema gets updated when any type of changes are made
to the corresponding schema of the database 108, the description henceforth has been explained with reference to creation and synchronization of data in an updated schema when the changes are made to the columns of one or more tables in the schema.
[0035] According to an implementation of the present subject matter, the system 102
may include an identification module 112 that may be configured to retrieve the schema metadata 110 from the database 108. Further, the identification module 112 may be configured to identify related columns of one or more tables of a schema based on the schema meta-data 110. The related columns may be understood to include columns with similar names or partially similar names. The related columns may be identified by comparing a plurality of columns in the updated schema. Accordingly, the identification module 112 may be configured to compare columns with similar names in the updated schema. In an implementation, comparison based on similar column names is hereinafter referred to as exact matching. For example, the columns named as "employee-names" may be matched together in the updated schema through exact matching technique.
[0036] In the said implementation, the comparison based on partially similar column
names is hereinafter referred to as pattern matching. For example, if any part of the column name is matching with the name of another column, then those columns are identified as the related columns. For identifying the related columns by the pattern matching technique, a plurality of pre-defined patterns may be used. For instance, a pre-defined pattern, such as "member_id" may be used to identify the columns with column names having the term “member_id”. The columns having “member” in their names will be identified as the related columns. Accordingly, columns with names "member_id", ‘‘to_member_id”, and "from_member_id" will be identified as the related columns.
[0037] Further, the system 102 may facilitate grouping of the identified related columns
in a manner such that the columns with similar names or similar patterns are grouped together. Each group of the identified related columns may include at least one sub-group. The at least one sub-group may be created based on a data type and a column type of the each group by the system 102. Furthermore, the system 102 may be configured to generate a database report based on the grouping of the related columns. The database report may indicate capacity of the database to hold data. The database report may include information of the grouped related columns, such as tables associated with the columns, type of data stored in each column, maximum and minimum value each column can hold, default value of each column, required language type to represent the value, and null able information.
[0038] The database report may then be analyzed by a database administrator (DBA) or
a database associate to identify the columns that are not synchronized with other columns. The columns that are not synchronized may be understood as the columns which are not uniform in terms of data type, column type, and column name. Therefore, the database associate or the DBA identifies the non-uniform data types of the related columns and may selectively apply relevant changes to the columns to make the columns uniform.
[0039] The manner in which the system 102 generates the database report and the
manner in which the database administrator (DBA) or the database associate analyses the database report, to make the columns uniform, are explained in greater detail according to the Fig. 2a.
[0040] Fig. 2a illustrates various components of the data synchronization system 102,
according to an embodiment of the present subject matter.
[0041] In said embodiment, the data synchronization system 102 includes one or more
processor(s) 202, a memory 204 coupled to the processor(s) 202, and interface(s) 206. The processor(s) 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) 202 are configured to fetch and execute computer-readable instructions and data stored in the memory 204.
[0042] The memory 204 may include any computer-readable medium known in the art
including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
[0043] The interface(s) 206 may include a variety of software and hardware interfaces,
for example, the interface(s) 206 may enable the data synchronization system 102 to communicate over the network 106, and may include one or more interface for peripheral device(s), such as a keyboard, a mouse, an external memory, a printer, etc. Further, the interface(s) 206 may include ports for connecting the data synchronization system 102 with other computing devices, such as web servers and external databases. The interface(s) 206 may facilitate multiple communications within a wide variety of protocols and networks, such as a network, including wired networks, e.g., LAN, cable, etc., and wireless networks, e.g., WLAN, satellite, etc. Further, the data synchronization system 102 also includes module(s) 208 and data 210.
[0044] The module(s) 208 include routines, programs, objects, components, data
structures, etc., which perform particular tasks or implement particular abstract data types. The module(s) 208 further include, in addition to the identification module 112, an analysis module 212, and other module(s) 214. The other module(s) 214 may include programs or coded instructions that supplement applications and functions, for example, programs in the operating system of the user device 104 and the data synchronization system 102
[0045] The data 210 serves, amongst other things, as a repository for storing data
processed, received and generated by one or more of the modules 208. The data 210 includes the schema meta-data 110, a report data 216, and other data 218. The schema meta-data 110 includes data associated with at least one schema of a database 108. For example, the schema meta-data 110 includes information related to all tables in the at least one schema of the database 108, such as their names, sizes and number of rows and columns in each table, names of the columns and rows, what tables they are used in, the type of data stored in the columns, and the like. The schema meta-data 110 also comprises of data associated with an updated schema of the at least one schema. The updated schema may include all the changes made to the
at least one schema in the database 108. As mentioned earlier, the schema may get updated when there is any change in the schema. For example, when changes are made in the two columns of same or different tables, the schema may get updated.
[0046] In one implementation, the at least one schema may be created at different
environments, such as a development environment, a testing environment, and a performance environment. The data of the at least one schema stored in the database 108 may be retrieved whenever synchronization of data in at least one schema is to be performed. The report data 216 includes a database report. The other data 218 includes data generated as a result of the execution of one or more other modules 214.
[0047] In the present embodiment, the schema meta-data 110 and the report data 216 are
depicted to be stored within the data 210, which is a repository internal to the data synchronization system 102. However, as described in the previous embodiment, the schema meta-data 110 may also be stored in the database 108 that is external to the data synchronization system 102.
[0048] According to the present subject matter, the identification module 112 may be
configured to retrieve the schema meta-data 110 for a pre-defined period, for example, past one year, from the database 108. As described previously, the schema meta-data 110 includes data associated with at least one schema of the database 108, i.e., information of all tables in the at least one schema of the database 108, such as their names, sizes and number of rows and columns in each table, names of the columns and rows, what tables they are used in, the type of data stored in the columns, and the like. The schema meta-data 110 also comprises of data associated with an updated schema of the at least one schema. The updated schema may include all the changes made to the at least one schema in the database 108.
[0049] According to an example, the schema meta-data 110 comprising of data of a
schema is depicted in Table 1 (provided below). As indicated earlier, the schema may be understood as a database structure comprising of a large number of tables and each table is associated with a plurality of columns and rows. According to said example, table names, column names, and data type and column type of the columns are considered.
Table 1
TABLE NAME COLUMN NAME DATA TYPE COLUMN TYPE
card card_id varchar varchar[30]
card expiry_date datetime datetime
card status tinyint tinyint[3]unsigned
member member_id int int[10] unsigned
member fname varchar varchar[45]
member lname varchar varchar[80]
member mname varchar varchar[75]
member dob date date
member status tinyint tinyint[3] unsigned
member_card member_id smallint smallint[5]
member_card card_id varchar varchar[20]
member_card status tinyint tinyint[3] unsigned
[0050] As is evident from the above table, some of the columns of different tables are
related to each other. The columns may be related if they have similar names or partially similar names. For example, columns named as “member_id” occur in two tables, i.e., in tables named as “member” and “member_card”.
[0051] Based on the retrieved schema meta-data 110, the identification module 112 may
be configured to identify related columns. The related columns are indicative of columns having at least one common term in column names. The identification module 112 may be configured to compare a plurality of columns of the updated schema of the at least one schema to obtain the related columns. The related columns may include columns with similar names or partially similar names. In an implementation, the identification module 112 may be configured to compare the columns with same name in the updated schema of the at least one schema to identify the related columns. The comparison based on similar column names is hereinafter referred to as exact matching. For example, as shown in table 1, columns named as “member_id” and “card_id” occur in the table named as “member_card”, but have same name as the columns in the tables “member” and “card”, respectively. Therefore, the identification module 112 may be configured to identify the related columns by the name "member_card" in the updated schema, through exact matching technique.
[0052] In an implementation, the identification module 112 may further be configured to
identify the related columns with partially similar names. In the said implementation, the comparison based on the partially similar names is hereinafter referred as pattern matching. For identifying the related columns by the pattern matching technique, a plurality of pre-defined patterns may be used. For example, a pre-defined pattern, such as "name” may be used to identify the columns with column names having the term “name”. Therefore, columns having “name” in their names will be identified as related columns. In an example, as shown in table 1, columns with names "fname", "lname", and "mname" will be identified as related columns. In another example, a pre-defined pattern, such as "member" may be used to identify the columns with column names having the term “member”. Therefore, columns having “member” in their names will be identified as related columns. Accordingly, columns with names "member-name", "member-details", “from_member_id”, "to_member_id", and "member_id" will be identified as the related columns.
[0053] Subsequent to identifying the related columns, the analysis module 212 may be
configured to categorize the related columns into one or more groups. The identified related columns may be grouped in a manner such that the columns with similar names or partially similar names are grouped together. For example, all the columns with name “member_id” or
columns with names having the term “member_id” are grouped together as first group and all the columns with name “card” or columns with names having the term “card” are grouped together as second group. Therefore, columns with names, such as “from_member_id”, "to_member_id", and "member_id" will be grouped together and columns with names having the term "card” will be grouped together.
[0054] In one implementation, the analysis module 212 may further be configured to
determine a plurality of sub-groups for each of the one or more groups of the related columns based on data type and column type of each of the groups. In an example, a group of columns with column name “member_id” may be sub-grouped on the basis of data type, such that the data type may be “smallint”. In said example, the group of columns may also be sub-grouped based on the column type, the column type being “signed smallint”. In another example, a group of columns with column name “card” may be sub-grouped based on the data type and column type, such that the data type and column type are “varchar” and “varchar[30]”, respectively.
[0055] Further, the analysis module 212 may be configured to generate a database report
based on the grouping and sub-grouping of the related columns. The database report may indicate capacity of the database 108 to hold the data. The database report may include information of the grouped columns, such that each grouped column has at least one sub-group. The information of the grouped columns may include tables associated with the columns, type of data stored in each column, maximum and minimum value each column can hold, default value of each column, required language type to represent the value, and null able information. The generated database report is stored as the report data 216 within the data synchronization system 102.
[0056] The database report may then be analyzed by a database administrator (DBA) or
a database associate to identify the columns that are not synchronized with other columns. The columns that are not synchronized may be understood as the columns which are not uniform in terms of data type, column type, and column name. For instance, if a group of columns contains more than one sub group, that means, this group of columns contains related columns having different/non-uniform data types or column types.
[0057] As described earlier in an example, columns with name “member_id”,
“from_member_id”, and “to_member_id” are grouped together and the group is named as “member_id”. In the said example, the group of columns with name “member_id” includes subgroups based on data type and column type. The group includes two sub-groups based on data type, such as, “smallint” and “integer” value. Moreover, the group has three sub-groups based on column type, such as, “signed smallint [5]”, “unsigned smallint [5]”, and “unsigned int [10]”. The database report is indicative that all the columns in the schema are not capable to hold same range of data.
[0058] The database associate or the database administrator (DBA) identifies the non-
uniform data types and column types of the related columns and may selectively apply relevant changes to the columns to make the columns uniform such that all the columns hold same range of data.
[0059] Fig. 2b illustrates an exemplary database report 250 generated by the data
synchronization system 102, in accordance with the embodiment of the present subject matter. As shown in Fig. 2b, the database report 250 contains data related to the database report, such as actual column names, names of the columns after grouping, data type and column type of each column, and required java language type to represent the columns. As depicted in the database report 250, columns with name “member_id”, “from_member_id”, and “to_member_id” are grouped together. The grouped columns are named as “member_id”. Further, it can also be seen from the database report 250, the group of columns with name “member_id” includes subgroups based on data type and column type. The group includes two sub-groups based on data type, i.e., “smallint” and “integer” value. Moreover, the group has three sub-groups based on column type, i.e., “signed smallint [5]”, “unsigned smallint [5]”, and “unsigned int [10]”.
[0060] Fig. 3 illustrates a method 300 for synchronizing data in a database, in
accordance with an embodiment of the present subject matter. The method 300 is implemented in computing device, such as a data synchronization system 102. The method 300 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method 300 may also be practiced in a distributed computing
environment where functions are performed by remote processing devices that are linked through a communications network.
[0061] The order in which the method 300 is described is not intended to be construed
as a limitation, and any number of the described method blocks can be combined in any order to implement the method 300, or an alternative method. Furthermore, the method 300 can be implemented in any suitable hardware, software, firmware or combination thereof.
[0062] At block 302, the method 300 may include retrieving schema meta-data 110
associated with at least one schema of a database 108. The schema meta-data data 110 may include information related to all tables in the at least one schema of the database 108, such as table names, sizes and number of rows and columns in each table, different columns and rows within each table, type of data stored in the columns, and the like. The schema meta-data 110 also comprises of data associated with an updated schema of the at least one schema.
[0063] At block 304, the method 300 may include identifying related columns from the
schema meta-data 110. The related columns are identified based at least on exact matching technique and pattern matching technique. The related columns may include columns with similar names or partially similar names. Comparison based on similar column names is hereinafter referred as exact matching. Further, comparison based on partially similar column names is hereinafter referred as pattern matching. For example, if any part of the column name is matching with that of another column, then those columns are identified as related columns. In accordance with the present subject matter, the identification module 112 of the data synchronization system 102 is configured to identify the related columns through exact matching technique and pattern matching technique.
[0064] At block 306, the method 300 may include categorizing the related columns into
groups based on the exact matching technique and pattern matching technique. Each group includes at least one sub-group based on data type and column type. In an example, a group of columns with column name “member_id” may be sub-grouped on the basis of data type, such that the data type may be “smallint”. In said example, the group of columns may also be sub-grouped based on the column type, the column type being “signed smallint”. In one implementation, the analysis module 212 is configured to group the related columns.
[0065] At block 308, the method 300 may include generating a database report based on
the grouping of the related columns. The data or the schema meta-data 110 within the database is synchronized based on the database report. The database report may include information of the grouped related columns, such as tables associated with the columns, type of data stored in each column, maximum and minimum value each column can hold, default value of each column, required language type to represent the value, and null able information. The database report may then be analyzed by a database administrator (DBA) or a database associate to identify the columns that are not synchronized with other columns. In one implementation, the analysis module 212 is configured to generate the database report.
[0066] Although embodiments for methods and systems for synchronizing data in at
least one schema of a database have been described in a language specific to structural features and/or methods, it is to be understood that the invention is not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as exemplary embodiments for synchronizing data in at least one schema of a database.
I/We claim:
1. A method for synchronizing data of a database (108), the method comprising:
retrieving schema meta-data (110) of an updated schema of the database (108), wherein the updated schema is indicative of changes made in the database (108);
comparing a plurality of columns of the updated schema to identify related columns based at least on an exact matching technique and a pattern matching technique, wherein the related columns are indicative of columns having at least one common term in column names;
categorizing the related columns into one or more groups based on the column names; and
generating a database report based on the categorization of the related columns for synchronizing the data in the database (108).
2. The method as claimed in claim 1, wherein the categorizing comprises determining a plurality of sub-groups for each of the one or more groups, wherein each of the plurality of sub-groups is indicative of a change in at least one of a data type and a column type of each of the one or more groups.
3. The method as claimed in claim 1, wherein the exact matching technique comprises identifying the columns having same column names.
4. The method as claimed in claim 1, wherein the pattern matching technique comprises identifying the columns based on a pre-defined pattern of the column names.
5. The method as claimed in claim 1, wherein the updated schema is created in one of a development environment, a testing environment, and a performance environment.
6. The method as claimed in claim 1, wherein the at least one sub-group is created based on a data type and a column type of each of the one or more groups.
7. A data synchronization system (102) comprising:
a processor (202);
an identification module (112) coupled to the processor (202), the
identification module (112) configured to compare a plurality of columns of an updated schema of a database (108) to identify related columns based at least on an exact matching technique and a pattern matching technique, wherein the related columns are indicative of columns having at least one common term in column names; and
an analysis module (212) coupled to the processor (202), the analysis module (212) configured to:
categorize the related columns into one or more groups based on the column names,
determine a plurality of sub-groups for each of the one or more groups, wherein each of the plurality of sub-groups is indicative of a change in at least one of a data type and a column type of each of the one or more groups; and
generate a database report based on the determination of the plurality of sub-groups for each of the one or more groups for synchronizing data in the database (108).
8. The data synchronization system (102) as claimed in claim 7, wherein the identification module (112) is further configured to retrieve schema meta-data (110) of an updated schema of the database (108).
9. The data synchronization system (102) as claimed in claim 7, wherein the exact matching technique comprises of identifying the columns having same column names.
10. The data synchronization system (102) as claimed in claim 7, wherein the pattern matching technique comprises of identifying the columns based on a pre-defined pattern of the column names.
11. The data synchronization system (102) as claimed in claim 7, wherein the database report is configured to indicate capacity of the database (108) to hold data.
12. A non-transitory computer-readable medium having embodied thereon a computer program for executing a method comprising:
retrieving schema meta-data (110) of an updated schema of a database (108), wherein the updated schema is indicative of changes made in the database (108);
comparing a plurality of columns of the updated schema to identify related columns based on an exact matching technique and a pattern matching technique, wherein the related columns are indicative of columns having at least one common term in column names;
categorizing the related columns into one or more groups based on the column names, wherein each of the one or more groups includes at least one sub-group; and
generating a database report based on the categorization of the related columns for synchronizing the data in the database (108).
| # | Name | Date |
|---|---|---|
| 1 | SPEC IN.pdf | 2018-08-11 |
| 2 | FORM 5.pdf | 2018-08-11 |
| 3 | FORM 3.pdf | 2018-08-11 |
| 4 | FIGURES IN.pdf | 2018-08-11 |
| 5 | ABSTRACT1.jpg | 2018-08-11 |
| 6 | 291-MUM-2013-FORM 26(4-3-2013).pdf | 2018-08-11 |
| 7 | 291-MUM-2013-FORM 18.pdf | 2018-08-11 |
| 8 | 291-MUM-2013-FORM 1(13-2-2013).pdf | 2018-08-11 |
| 9 | 291-MUM-2013-FER.pdf | 2018-08-11 |
| 10 | 291-MUM-2013-CORRESPONDENCE(4-3-2013).pdf | 2018-08-11 |
| 11 | 291-MUM-2013-CORRESPONDENCE(13-2-2013).pdf | 2018-08-11 |
| 12 | 291-MUM-2013-OTHERS [04-10-2018(online)].pdf | 2018-10-04 |
| 13 | 291-MUM-2013-FER_SER_REPLY [04-10-2018(online)].pdf | 2018-10-04 |
| 14 | 291-MUM-2013-COMPLETE SPECIFICATION [04-10-2018(online)].pdf | 2018-10-04 |
| 15 | 291-MUM-2013-CLAIMS [04-10-2018(online)].pdf | 2018-10-04 |
| 16 | 291-MUM-2013-US(14)-HearingNotice-(HearingDate-31-07-2020).pdf | 2020-07-08 |
| 17 | 291-MUM-2013-Correspondence to notify the Controller [27-07-2020(online)].pdf | 2020-07-27 |
| 18 | 291-MUM-2013-Written submissions and relevant documents [14-08-2020(online)].pdf | 2020-08-14 |
| 19 | 291-MUM-2013-PatentCertificate09-02-2021.pdf | 2021-02-09 |
| 20 | 291-MUM-2013-IntimationOfGrant09-02-2021.pdf | 2021-02-09 |
| 21 | 291-MUM-2013-RELEVANT DOCUMENTS [27-09-2022(online)].pdf | 2022-09-27 |
| 22 | 291-MUM-2013-RELEVANT DOCUMENTS [26-09-2023(online)].pdf | 2023-09-26 |
| 1 | 291_20-03-2018.pdf |