Abstract: A method and system are provided for pre-deployment performance estimation of input-output intensive workloads. In particular, the present application provides a method and system for predicting the performance of an input-output intensive distributed enterprise application on multiple storage devices without deploying the application and the complete database in the target environment. The present method comprises generating the input-output traces of an application on a source system with varying concurrencies; replaying the generated traces from the source system on a target system to which the application needs to be migrated; gathering performance data in the form of resource utilization, throughput and response time from the target system; and extrapolating the data gathered from the target system in order to accurately predict the performance of multi-threaded input-output intensive applications on the target system for higher concurrencies.
DESC:
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
METHOD AND SYSTEM FOR PRE-DEPLOYMENT PERFORMANCE ESTIMATION OF INPUT-OUTPUT INTENSIVE WORKLOADS
Applicant:
Tata Consultancy Services Limited
A Company incorporated in India under the Companies Act, 1956
having address:
Nirmal Building, 9th floor, Nariman point, Mumbai 400021,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.
CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
[001] The present application claims priority from Indian Provisional Patent Application No. 4445/MUM/2015, filed on November 26, 2015, the entirety of which is hereby incorporated by reference.
FIELD OF THE INVENTION
[002] The present invention generally relates to predicting the performance of distributed enterprise applications. More particularly, the present invention relates to predicting the performance of an input-output intensive distributed enterprise application on multiple storage devices without deploying the application and the complete database in the target environment.
BACKGROUND OF THE INVENTION
[003] The most effective method of finding the performance of an application on a storage system is to evaluate the application by running it on the platform of interest. However, migrating the application to a new environment and testing its performance is a non-trivial and extremely daunting task: it requires considerable effort to set up and subsequently fine-tune. Another close approach is running a synthetic workload generated by input-output subsystem characterization tools, where the synthetic workload has an access pattern very similar to that of the application. Though this approach is relatively easier, in most cases it does not reproduce the characteristics of the application or the workload accurately.
[004] The use of traces for input-output profiling has existed for many years, but its usage has found traction in recent years for predicting performance in cloud-based environments.
[005] Prior art illustrates creating an artificial workload to predict the performance of input-output intensive distributed enterprise applications, but the most significant issue with this approach is that the actual application workload is not run or tested in the target environment. Even though the approach does not require replicating the database on the target architecture, it still remains an extremely complicated procedure.
[006] Further, prior art also illustrates predicting the performance of web applications in cloud-based environments, but even though such prior art is capable of predicting the end-to-end performance of multiple resources with high accuracy, it remains a huge challenge for such approaches to hold good at high concurrency, i.e., a large number of users.
[007] Therefore, predicting the performance of an input-output intensive distributed enterprise application on multiple storage devices, without deploying the application and the complete database in the target environment, is still considered one of the biggest challenges of the technical domain.
SUMMARY OF THE INVENTION
[008] Before the present methods, systems, and hardware enablement are described, it is to be understood that this invention is not limited to the particular systems, and methodologies described, as there can be multiple possible embodiments of the present invention which are not expressly illustrated in the present disclosure. It is also to be understood that the terminology used in the description is for the purpose of describing the particular versions or embodiments only, and is not intended to limit the scope of the present invention which will be limited only by the appended claims.
[009] The present application provides a method and system for pre-deployment performance estimation of input-output intensive workloads.
[0010] In an embodiment of the present invention, a method and system are provided for pre-deployment performance prediction of an input-output intensive enterprise application in a computing environment with advanced storage devices to which the application needs to be migrated.
[0011] In another embodiment of the present invention, a computer implemented method is provided for predicting the performance of an input-output intensive enterprise application on target systems connected to advanced storage devices. The present method comprises generating the input-output traces of an application on a source system with varying concurrencies; replaying the generated traces from the source system on a target system to which the application needs to be migrated; gathering performance data in the form of resource utilization, throughput and response time from the target system; and extrapolating the data gathered from the target system in order to accurately predict the performance of multi-threaded input-output intensive applications on the target system for higher concurrencies.
[0012] In another embodiment of the present invention, a system (200) for predicting the performance of an input-output intensive enterprise application on a target system is provided. The system (200) has three major components: an I/O trace capture module (210), an I/O trace replay module (212) and an extrapolation module (214). The I/O trace capture module (210) generates the input-output traces of an application on a source system with varying concurrencies. The I/O trace replay module (212) replays the generated traces on a target system and gathers performance data in the form of resource utilization, throughput and response time from the target system. The extrapolation module (214) extrapolates the data gathered from the target system in order to accurately predict the performance of multi-threaded input-output intensive applications on the target system for higher concurrencies.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] The foregoing summary, as well as the following detailed description of preferred embodiments, are better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there is shown in the drawings exemplary constructions of the invention; however, the invention is not limited to the specific methods and system disclosed. In the drawings:
[0014] Fig. 1: illustrates a network implementation of a system for pre-deployment performance estimation of input-output intensive workloads, in accordance with an embodiment of the present subject matter;
[0015] Fig. 2: shows block diagrams illustrating the system for pre-deployment performance estimation of input-output intensive workloads, in accordance with an embodiment of the present subject matter;
[0016] Fig. 3: shows a flowchart illustrating the method for pre-deployment performance estimation of input-output intensive workloads, in accordance with an embodiment of the present subject matter;
[0017] Fig. 4: shows a graph illustrating application vs trace disk utilization extrapolated on the target system for JPetStore application during low end HDD to high end HDD migration in accordance with an exemplary embodiment of the present subject matter;
[0018] Fig. 5: shows a graph illustrating application and trace throughput extrapolated on the target system and application and trace response time extrapolated on the target system for JPetStore application during low end HDD to high end HDD migration in accordance with an exemplary embodiment of the present subject matter;
[0019] Fig. 6: shows a graph illustrating application vs trace data transfer via read and write system calls for JPetStore application during low end HDD to high end HDD migration in accordance with an exemplary embodiment of the present subject matter;
[0020] Fig. 7: shows a graph illustrating application vs trace disk utilization extrapolated on the target system for equiz application during low end HDD to high end HDD migration in accordance with an exemplary embodiment of the present subject matter;
[0021] Fig. 8: shows a graph illustrating application and trace throughput extrapolated on the target system and application and trace response time extrapolated on the target system for equiz application during low end HDD to high end HDD migration in accordance with an exemplary embodiment of the present subject matter;
[0022] Fig. 9: shows a graph illustrating application vs trace data transfer via read and write system calls for equiz application during low end HDD to high end HDD migration in accordance with an exemplary embodiment of the present subject matter;
[0023] Fig. 10: shows a graph illustrating application vs trace disk utilization extrapolated on the target system for JPetStore application during low end HDD to SSD migration in accordance with an exemplary embodiment of the present subject matter;
[0024] Fig. 11: shows a graph illustrating application and trace throughput extrapolated on the target system and application and trace response time extrapolated on the target system for JPetStore application during low end HDD to SSD migration in accordance with an exemplary embodiment of the present subject matter;
[0025] Fig. 12: shows a graph illustrating application vs trace data transfer via read and write system calls for JPetStore application during low end HDD to SSD migration in accordance with an exemplary embodiment of the present subject matter;
[0026] Fig. 13: shows a graph illustrating application vs trace disk utilization extrapolated on the target system for equiz application during low end HDD to SSD migration in accordance with an exemplary embodiment of the present subject matter;
[0027] Fig. 14: shows a graph illustrating application and trace throughput extrapolated on the target system and application and trace response time extrapolated on the target system for equiz application during low end HDD to SSD migration in accordance with an exemplary embodiment of the present subject matter; and
[0028] Fig. 15: shows a graph illustrating application vs trace data transfer via read and write system calls for equiz application during low end HDD to SSD migration in accordance with an exemplary embodiment of the present subject matter.
DETAILED DESCRIPTION OF THE INVENTION
[0029] Some embodiments of this invention, illustrating all its features, will now be discussed in detail.
[0030] The words "comprising," "having," "containing," and "including," and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items.
[0031] It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Although any systems and methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present invention, the preferred systems and methods are now described.
[0032] The disclosed embodiments are merely exemplary of the invention, which may be embodied in various forms.
[0033] The elements illustrated in the Figures inter-operate as explained in more detail below. Before setting forth the detailed explanation, however, it is noted that all of the discussion below, regardless of the particular implementation being described, is exemplary in nature, rather than limiting. For example, although selected aspects, features, or components of the implementations are depicted as being stored in memories, all or part of the systems and methods consistent with the disclosed system and method for pre-deployment performance estimation may be stored on, distributed across, or read from other machine-readable media.
[0034] The techniques described herein may be implemented in one or more computer programs executing on (or executable by) a programmable computer including any combination of any number of the following: a processor, a storage medium readable and/or writable by the processor (including, for example, volatile and non-volatile memory and/or storage elements), a plurality of input units, and a plurality of output devices. Program code may be applied to input entered using any of the plurality of input units to perform the functions described and to generate an output displayed upon any of the plurality of output devices.
[0035] Each computer program within the scope of the claims below may be implemented in any programming language, such as assembly language, machine language, a high-level procedural programming language, or an object-oriented programming language. The programming language may, for example, be a compiled or interpreted programming language. Each such computer program may be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a computer processor.
[0036] Method steps of the invention may be performed by one or more computer processors executing a program tangibly embodied on a computer-readable medium to perform functions of the invention by operating on input and generating output. Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, the processor receives (reads) instructions and data from a memory (such as a read-only memory and/or a random access memory) and writes (stores) instructions and data to the memory. Storage devices suitable for tangibly embodying computer program instructions and data include, for example, all forms of non-volatile memory, such as semiconductor memory devices, including EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROMs. Any of the foregoing may be supplemented by, or incorporated in, specially-designed ASICs (application-specific integrated circuits) or FPGAs (Field-Programmable Gate Arrays). A computer can generally also receive (read) programs and data from, and write (store) programs and data to, a non-transitory computer-readable storage medium such as an internal disk (not shown) or a removable disk.
[0037] Any data disclosed herein may be implemented, for example, in one or more data structures tangibly stored on a non-transitory computer-readable medium. Embodiments of the invention may store such data in such data structure(s) and read such data from such data structure(s).
[0038] The present application provides a computer implemented method and system for pre-deployment performance estimation of input-output intensive workloads.
[0039] Referring now to Fig. 1, a network implementation 100 of a system 102 for pre-deployment performance estimation of input-output intensive workloads is illustrated, in accordance with an embodiment of the present subject matter. Although the present subject matter is explained considering that the system 102 is implemented on a server, it may be understood that the system 102 may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, and the like. In one implementation, the system 102 may be implemented in a cloud-based environment. In another embodiment, it may be implemented as custom built hardware designed to efficiently perform the invention disclosed. It will be understood that the system 102 may be accessed by multiple users through one or more user devices 104-1, 104-2…104-N, collectively referred to as user devices 104 hereinafter, or applications residing on the user devices 104. Examples of the user devices 104 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation. The user devices 104 are communicatively coupled to the system 102 through a network 106.
[0040] In one implementation, the network 106 may be a wireless network, a wired network or a combination thereof. The network 106 can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The network 106 may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further the network 106 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
[0041] In one embodiment of the present invention, referring to Fig. 2, a detailed working of the various components of the system 102 is described.
[0042] A system (102) for pre-deployment performance estimation of input-output intensive workloads comprises a processor (202) and a memory (204) operatively coupled with said processor. In an embodiment of the present disclosure, the system (102) has three major components: an I/O trace capture module (210), an I/O trace replay module (212) and an extrapolation module (214). The I/O trace capture module (210) generates the input-output traces of an application on a source system with varying concurrencies. The I/O trace replay module (212) replays the generated traces on a target system and gathers performance data in the form of resource utilization, throughput and response time from the target system. The extrapolation module (214) extrapolates the data gathered from the target system in order to accurately predict the performance of multi-threaded input-output intensive applications on the target system.
[0043] In an embodiment of the invention, the I/O trace capture module (210) generates the traces in user mode. There are multiple ways and tools to trace the I/O calls of an application, depending upon the layer they operate on, e.g. kernel, user space, or a combination of both. User mode requires no modification in the application or the kernel. The input-output profile trace of the application of interest (AOI) is captured using the strace utility on a Linux system. In order to reduce the strace overhead and the size of the trace file, only input-output related calls are captured, namely the following I/O system calls: read(), write(), pread(), pwrite(), lseek(), fsync(), open(), close(). Each row in the captured trace consists of the process ID, timestamp value, offset and the input-output system call. In order to capture the trace, all the thread IDs spawned by MySQL are first found and strace is then attached to each of these IDs. Thus multiple trace output files are created. In order to maintain the same order of execution on the target system, all these files are merged into a single file and the system calls are sorted according to the timestamp value.
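By way of illustration only, and not by way of limitation, a minimal Python sketch of this merging step is given below; the trace file naming convention and the exact strace line format are assumptions made for the example.

```python
# Illustrative sketch: merge per-thread strace output files into a single
# timestamp-ordered trace. File names and line format are assumptions.
import glob
import re

IO_CALLS = {"read", "write", "pread", "pwrite", "pread64", "pwrite64",
            "lseek", "fsync", "open", "close"}

def merge_traces(trace_dir: str, merged_path: str) -> None:
    entries = []
    for path in glob.glob(f"{trace_dir}/mysql_thread_*.trace"):
        tid = path.rsplit("_", 1)[-1].split(".")[0]   # thread ID taken from the file name
        with open(path) as f:
            for line in f:
                # Expected form (from `strace -tt`): "12:31:05.123456 read(5, ...) = 4096"
                m = re.match(r"(\d{2}:\d{2}:\d{2}\.\d+)\s+(\w+)\(", line)
                if m and m.group(2) in IO_CALLS:
                    entries.append((m.group(1), tid, line.rstrip("\n")))
    entries.sort(key=lambda e: e[0])                  # order all calls by timestamp
    with open(merged_path, "w") as out:
        for ts, tid, line in entries:
            out.write(f"{tid} {line}\n")
```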
[0044] In another embodiment of the invention, the I/O trace replay module (212) copies the database files of the application to a temporary directory on the target system. The access path to any file in the trace file is replaced with the path to that file in the temporary directory. Commonly used input-output tracing tools are used to replay the input-output trace captured on the test system. The replay executes the input-output operations on the target system as recorded in the trace on the test system. Studies have shown that such commonly used tools scale within a difference of a few percentage points when compared with the original application. One drawback of these tools is that they are single threaded; hence replaying the trace for high concurrency is a challenge. The features of the tools are therefore modified to support multithreading. One of the challenges associated with the trace-replay method is maintaining the realism of the workload when the load profile is replayed on a target system. The input-output system calls are captured along with their timestamps. When the trace is replayed on the target system, it is ensured that the input-output calls are executed at the same time intervals as in the original system so that the workload is replicated correctly.
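A minimal sketch of this timed, multi-threaded replay idea is given below, assuming a simple in-memory representation of the trace records; the actual replay module builds on existing replay tools extended for multithreading, so the structure shown here is purely illustrative.

```python
# Illustrative sketch: re-issue each traced thread's I/O operations on the target
# system at the same relative time offsets as recorded on the source system.
import os
import threading
import time
from dataclasses import dataclass

@dataclass
class TraceOp:
    t_offset: float   # seconds since the start of the trace
    call: str         # "read", "write" or "fsync"
    path: str         # file path, already remapped to the temporary directory
    offset: int       # byte offset within the file
    length: int       # number of bytes to transfer

def _replay_one_thread(ops, start):
    fds = {}
    for op in ops:
        delay = op.t_offset - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)                     # preserve the original timing
        fd = fds.setdefault(op.path, os.open(op.path, os.O_RDWR))
        if op.call == "read":
            os.pread(fd, op.length, op.offset)
        elif op.call == "write":
            os.pwrite(fd, b"\0" * op.length, op.offset)
        elif op.call == "fsync":
            os.fsync(fd)
    for fd in fds.values():
        os.close(fd)

def replay(per_thread_ops):
    """per_thread_ops: {thread_id: [TraceOp, ...]}; one replay thread per traced thread."""
    start = time.monotonic()
    threads = [threading.Thread(target=_replay_one_thread, args=(ops, start))
               for ops in per_thread_ops.values()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```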
[0045] In another embodiment of the invention, the extrapolation module (214) uses a specific method for extrapolating the performance of an application for a large number of users on a given platform. The method first takes load testing results for a small number of users, in terms of throughput and resource utilization, as input. Though the input-output trace is replayed by the I/O trace replay module (212) and the disk utilization of the storage system is captured only on the database server, the extrapolation module (214) predicts the end-to-end response time and throughput of the application. In order to extrapolate throughput, the maximum throughput is first estimated based on the resource utilization information and service demand. A combination of linear regression and another statistical technique, the sigmoid curve (or S curve), is used to predict the performance until the application encounters the first bottleneck. Linear regression is used to predict the performance until the throughput reaches half of the maximum throughput, and beyond that point a sigmoid curve is fitted until the throughput reaches 90% of the maximum value. Next, the extrapolation module (214) uses a black-box technique that requires neither detailed modeling of the application functionalities nor any architectural simulation of the target system. At least two measurements are performed on the target platform and, using the performance statistics, the module extrapolates the throughput, response time and maximum number of users supported by the system. The bottleneck resources are pinpointed as well. Furthermore, resource utilization information at the various servers in a system (for example, the application server and the database server) is projected. However, the extrapolation module (214) makes the assumption that there is no software bottleneck in the application of interest. For extrapolation, performance data obtained by running the traces for two concurrency levels on the target system is sufficient. To improve the prediction accuracy, test system traces are run on the target system for multiple concurrencies as discussed above. The resource utilization for these multiple concurrencies is used as input and extrapolated for higher concurrencies to obtain performance metrics like resource utilization, throughput and response time.
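For illustration, a simplified Python sketch of this extrapolation logic is given below, assuming utilization is expressed as a fraction and the maximum throughput is estimated from the utilization law (U = X·D); the exact equations used by the extrapolation module are not limited to this assumed form.

```python
# Illustrative sketch: estimate maximum throughput from measured utilization and
# service demand, use linear regression up to half of the maximum throughput and a
# sigmoid (S) curve beyond it, capped at 90% of the maximum. Assumed formulation.
import numpy as np

def estimate_max_throughput(throughputs, utilizations):
    # Utilization law: U = X * D; the bottleneck service demand caps throughput at 1/D.
    demand = np.max(np.asarray(utilizations, float) / np.asarray(throughputs, float))
    return 1.0 / demand

def extrapolate_throughput(users, throughputs, utilizations, target_users):
    x_max = estimate_max_throughput(throughputs, utilizations)
    slope, intercept = np.polyfit(users, throughputs, 1)       # linear region fit
    n_half = (0.5 * x_max - intercept) / slope                 # concurrency at half of x_max
    k = slope / (0.25 * x_max)                                 # sigmoid slope matched at midpoint
    predictions = []
    for n in target_users:
        x_lin = slope * n + intercept
        if x_lin <= 0.5 * x_max:
            predictions.append(x_lin)                          # below half: linear regression
        else:
            x_sig = x_max / (1.0 + np.exp(-k * (n - n_half)))  # beyond half: S curve
            predictions.append(min(x_sig, 0.9 * x_max))        # capped at 90% of maximum
    return predictions
```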
[0046] Referring now to Fig. 3, a flowchart illustrating the method for pre-deployment performance estimation of input-output intensive workloads is shown.
[0047] The process starts at step 302, where the input-output traces of an application on a source system with varying concurrencies are generated. At step 304, the generated traces from the source system are replayed on a target system. At step 306, performance data in the form of resource utilization, throughput and response time are gathered from the target system. At step 308, the data gathered from the target system are extrapolated in order to accurately predict the performance of multi-threaded input-output intensive applications in the target system.
[0048] The following paragraphs contain experimental data which is intended to help a person skilled in the art understand the working of the invention. The experimental data is not to be construed as limiting the scope of the invention which is limited only by the claims.
[0049] In order to perform the experiment, two applications, equiz and JPetStore, are used. The equiz application provides an automated web-enabled technology platform to assess and verify the technical skills of people in a large software company. The application is implemented with Java servlets and stored procedures and includes an automatic code evaluation (ACE) framework. JPetStore is an eCommerce J2EE application benchmark that allows users to browse and search for different types of pets in five top-level categories. It provides detailed information on prices, inventory and images for all items within each category. Along with login authentication, it provides a full shopping cart facility that includes a credit card option for billing and shipping.
[0050] While performing the experiment, all test applications are deployed on an Apache Tomcat server and MySQL is used as the backend. The think time between the application pages is fixed at 5 seconds. Each application is run for a duration of 20 minutes. The storage system configurations used are provided in Table 1 below: a low-end HDD is used as the test storage system and a high-end HDD or SSD as the target storage system.
| Storage Type | Disk Model | RPM | No. of Disks | IO Scheduler | File System | Interface | System Configuration | Linux Kernel |
|---|---|---|---|---|---|---|---|---|
| Low-end HDD | Caviar SE Serial ATA Drive | 7200 | 1 | CFQ | ext4 | 300 MB/s Serial ATA 2.0 | 8-core Xeon CPU @ 2.6 GHz, 6 MB L2 cache | CentOS 6.5, 2.6.32 |
| High-end HDD | HP | 10000 | 1 | CFQ | ext4 | Dual-port SAS, 6 Gb/s | 16-core Xeon CPU @ 2.4 GHz, 12 MB L2 cache | CentOS 6.6, 2.6.32 |
| SSD | Virident Systems Inc. FlashMax drive (Micron SLC-32) | - | 1 | Default | ext3 | PCIe | 16-core Xeon CPU @ 2.4 GHz, 12 MB L2 cache | CentOS 6.6, 2.6.32 |

Table 1
[0051] The IO trace of the applications on the database was captured for multiple concurrencies using the widely available Linux utility strace. The same trace files are run on the target systems using ioreplay. ioreplay is run in exact mode so that the trace is executed on the target system in the same time as on the source system. This mode of ioreplay preserves the think time of the original application.
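Purely as an example of how the capture step can be scripted, the snippet below attaches strace to every MySQL thread and records only the I/O-related calls with microsecond timestamps; the output directory layout is an assumption, and the replay side uses ioreplay as described above.

```python
# Illustrative sketch: attach strace to each MySQL thread, capturing only the
# I/O-related system calls with timestamps (-tt); one output file per thread.
import os
import subprocess

IO_CALLS = "read,write,pread64,pwrite64,lseek,fsync,open,close"

def attach_strace(mysqld_pid: int, out_dir: str):
    procs = []
    for tid in os.listdir(f"/proc/{mysqld_pid}/task"):   # thread IDs spawned by mysqld
        procs.append(subprocess.Popen(
            ["strace", "-p", tid, "-tt",
             "-e", f"trace={IO_CALLS}",
             "-o", f"{out_dir}/mysql_thread_{tid}.trace"]))
    return procs    # terminate these processes once the load test finishes
```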
[0052] The disk utilization and throughput are measured using the iostat utility for all the concurrencies. The disk utilization and throughput data points for all the concurrencies are fed into PerfExt, the tool used in the experiment for extrapolation. The extrapolated values are compared with the actual performance data. The percentage error in metric prediction is calculated using equation (1):
% Error = (|Predicted value − Actual value| / Actual value) × 100 ……….. (1)
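A direct reading of equation (1) can be expressed, for example, as the following small helper (the exact form used in the experiments is assumed):

```python
def percentage_error(predicted: float, actual: float) -> float:
    # Equation (1): relative deviation of the predicted metric from the measured value.
    return abs(predicted - actual) / actual * 100.0
```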
[0053] PerfExt is the tool used for extrapolation in the instant experiments. It was developed by the inventors for extrapolating the performance of an application for a large number of users on a given platform; however, while implementing the method and system disclosed herein, any tool capable of extrapolation as described in this specification may be used instead of PerfExt. The working of the PerfExt tool is explained in the following paragraphs.
[0054] PerfExt takes load testing results for a small number of users, in terms of throughput and resource utilization, as input. Though the IO trace is replayed and the disk utilization of the storage system is captured only on the database server, PerfExt predicts the end-to-end response time and throughput of the application.
[0055] To extrapolate throughput, PerfExt first estimates the maximum throughput based on the resource utilization information. PerfExt uses a combination of linear regression and another statistical technique, the sigmoid curve (or S curve), to predict the performance until the application encounters the first bottleneck: linear regression is used to predict the performance until the throughput reaches half of the maximum throughput, and beyond that point a sigmoid curve is fitted until the throughput reaches 90% of the maximum value. PerfExt uses a black-box technique that requires neither detailed modeling of the application functionalities nor any architectural simulation of the target system.
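As an illustration of the S-curve step, the sketch below fits a logistic curve to measured (concurrency, throughput) points using SciPy; PerfExt's own formulation is not disclosed here, so the logistic form and the initial guesses are assumptions.

```python
# Illustrative sketch: fit a sigmoid (logistic) curve to measured throughput points.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(n, x_max, k, n0):
    # x_max: asymptotic maximum throughput; k: steepness; n0: concurrency at the midpoint.
    return x_max / (1.0 + np.exp(-k * (n - n0)))

def fit_sigmoid(users, throughput):
    users = np.asarray(users, float)
    throughput = np.asarray(throughput, float)
    p0 = [throughput.max() * 1.2, 0.01, np.median(users)]   # rough initial guess
    popt, _ = curve_fit(sigmoid, users, throughput, p0=p0, maxfev=10000)
    return popt   # (x_max, k, n0), usable to predict throughput at higher concurrencies
```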
[0056] A single-user test is performed on the target platform and, using the performance statistics, PerfExt extrapolates the throughput, response time and maximum number of users supported by the system. PerfExt pinpoints the bottleneck resource as well. Furthermore, PerfExt projects the resource utilization information at the various servers in a system (for example, the application server and the database server).
[0057] PerfExt is able to provide an accuracy of about 90% in the throughput and utilization metrics. However, it makes the assumption that there is no software bottleneck in the application of interest.
[0058] To perform extrapolation using PerfExt, performance data obtained by running the traces for two concurrency levels on the target system is sufficient. To improve the prediction accuracy, the test system traces were run on the target system for multiple concurrencies as discussed in the previous section. The resource utilization for these multiple concurrencies is used as input to PerfExt and extrapolated for higher concurrencies to obtain performance metrics like resource utilization, throughput and response time.
[0059] For the purpose of the experiment, the disclosed invention was evaluated using two applications, namely JPetStore and equiz. The performance of these applications was predicted on high-end HDD and SSD storage systems using the trace generated on a low-end HDD.
[0060] Experiment 1: Low-end HDD to high-end HDD migration prediction: The JPetStore application was run on the test system for 50, 100, 200, 300, 400 and 500 users and IO trace files were generated. The trace files are replayed on the target system. As shown in Fig. 4, the disk utilization obtained up to 500 users using the trace from the test system is similar to that of the actual application when run on the target system. The disk utilization is further extrapolated using PerfExt beyond 500 users. As shown in Fig. 4, both PerfExt and the actual application run show disk saturation at around 2500 users. The average error in predicting the disk utilization for the extrapolated values is less than 15%.
[0061] Also, the throughput, measured in pages/s, of the trace replay is similar to the corresponding application data up to 500 users, as shown in Fig. 5. The extrapolated throughput beyond 500 users is comparable with the actual application. Fig. 5 also illustrates that the predicted response time is comparable to that of the actual application up to 1500 users. At 2000 users, the predicted response time is 1.01 ms, and a similar response time is observed with the actual application at 2200 users. This difference in response time at the predicted maximum achievable concurrency of 2000 users is due to early bottleneck detection by PerfExt.
[0062] For validation of the trace replay, the data exchanged by the application trace and the actual application were compared. As shown in Fig. 6, the total data read and written is similar for the trace and the actual application for the tested numbers of users.
[0063] The equiz application was run for 50, 100 and 150 users on the test system and the IO trace was captured. The trace files are replayed on the target system and performance data is collected. As shown in Fig. 7, the disk utilization of the trace replay on the target system and of the corresponding application run is similar. When the utilization is extrapolated beyond 200 users, the disk bottleneck is predicted at around 750 users by the extrapolation tool, while it is observed at 800 users when the actual application is run. The average error in the utilization prediction is less than 7% for the extrapolated values.
[0064] The throughput difference between the trace replay and the actual application, as illustrated in Fig. 8, is insignificant. The extrapolated throughput shows a good match up to 600 users. Fig. 8 also shows that the predicted response time is the same as that observed with the actual application run under varying workload. Some difference in throughput and response time is observed only at the maximum achievable concurrency, i.e. 700 users.
[0065] As shown in Fig. 9, the total data exchanged on the target system by the trace replay and the actual application is comparable irrespective of the concurrency.
[0066] Experiment 2: Low-end HDD to SSD migration prediction: The trace generated for the two applications, i.e. JPetStore and equiz, on the low-end HDD was also run on the SSD and then extrapolated for higher concurrencies using the extrapolation tool. For the JPetStore application, as shown in Fig. 10, the predicted disk utilization is similar to the actual utilization up to 1500 users. At the maximum of 15000 users, the predicted utilization is 31.8% against an actual utilization of 24.8%; a memory bottleneck is observed at this workload. Also, the maximum throughput observed for the actual application is 2696 pages/s and is comparable to the predicted value of 2994 pages/s, as shown in Fig. 11. As shown in Fig. 11, a good match between the predicted and actual response time is observed up to 5000 users, and the difference widens with increasing workload. This increase in percentage error can be attributed to the very small values of response time, which result in higher experimental errors. Also, the total data read/s and written/s for the trace is similar to that of the actual application, as illustrated in Fig. 12.
[0067] For the equiz application, the maximum disk utilization obtained by running the actual application on the SSD is 46.26% for 2500 users, while the trace-extrapolated value is 55.27%, as shown in Fig. 13. The equiz application leads to a CPU bottleneck at 2500 users, restricting further extrapolation. A negligible difference in throughput is observed between the actual and extrapolated values, particularly for smaller concurrencies; the maximum difference is observed at the maximum achievable concurrency of 2000 users with trace extrapolation, as shown in Fig. 14. Though the response time is predicted correctly for smaller concurrencies, a very large error in prediction is observed at very high concurrency, as shown in Fig. 14. Also, the same amount of data is observed being read from and written to the storage device by the actual application and the trace, as shown in Fig. 15.
[0068] It should be noted that the description merely illustrates the principles of the present subject matter. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described herein, embody the principles of the present subject matter and are included within its spirit and scope. Furthermore, all examples recited herein are principally intended expressly to be only for explanatory purposes to aid the reader in understanding the principles of the invention and the concepts contributed by the inventor(s) to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.
CLAIMS:
1. A system (102) for pre-deployment performance estimation of input-output intensive workloads, comprising a processor (202) and a memory (204) operatively coupled with said processor, the system further comprising:
an I/O trace capture module (210) configured to generate the input-output traces of an application on a source system with varying concurrencies;
an I/O trace replay module (212) configured to replay the generated traces from the source system on a target system, wherein the target system is the system to which the application needs to be migrated;
the I/O trace replay module (212) further configured to collect performance data in the form of resource utilization, throughput and response time from the target system; and
an Extrapolation module (214) configured to extrapolate the data gathered from the target system in order to accurately predict the performance of multi-threaded input-output intensive application in the target system.
2. The system according to claim 1 wherein the I/O trace capture module (210) is further configured to generate the input-output traces in a user mode wherein the user mode is a mode which requires no modification in the application or kernel.
3. The system according to claim 2 wherein the I/O trace capture module (210) is configured to
create multiple trace output files;
merge the multiple trace output files into a single file; and
sort a system call corresponding to each of the merged multiple trace output files based on a timestamp value associated with the system call.
4. The system according to claim 1 wherein the I/O trace replay module (212) is configured to
copy the database files of the application to a temporary directory on the target system;
replace any access path to a database file in the trace with the path to that file in the temporary directory; and
replay the generated traces from the source system on the target system wherein the replays are executed at the same time interval as in the source system.
5. The system according to claim 1 wherein the Extrapolation module (214) is further configured to predict performance of multi-threaded input-output intensive application by implementing linear regression, sigmoid curve and black box techniques on the data gathered from the target system.
6. A method for pre-deployment performance estimation of input-output intensive workloads; said method comprising processor implemented steps of:
generating the input-output traces of an application on a source system with varying concurrencies using an I/O trace capture module (210);
replaying the generated traces from the source system on a target system, wherein the target system is the system to which the application needs to be migrated, using an I/O trace replay module (212);
collecting performance data in the form of resource utilization, throughput and response time from the target system using the I/O trace replay module (212); and
extrapolating the data gathered from the target system in order to accurately predict the performance of multi-threaded input-output intensive application in the target system using an Extrapolation module (214).
7. The method according to claim 6 wherein the input-output traces are generated in a user mode using the I/O trace capture module (210), wherein the user mode is a mode which requires no modification in the application or kernel.
8. The method according to claim 6 further comprising:
copying the database files of the application to a temporary directory on the target system;
replacing any access path to a database file in the trace with the path to that file in the temporary directory; and
replaying the generated traces from the source system on the target system wherein the replays are executed at the same time interval as in the source system.
9. The method according to claim 6 wherein performance of multi-threaded input-output intensive application is predicted by implementing linear regression, sigmoid curve and black box techniques on the data gathered from the target system using the Extrapolation module (214).