
Customized Zip File Input Format Using Mapreduce For Data Analysis

Abstract: MapReduce is a framework for processing very large datasets in parallel across a Hadoop cluster. Data analysis consists of two steps: the map phase and the reduce phase. The input data for a MapReduce job is initially stored in input files, which typically reside in the Hadoop Distributed File System (HDFS). The InputFormat describes the input specification of a MapReduce job and selects the files or other objects used as input. We can implement a custom InputFormat to handle a desired type of data and to feed a specific input file format to Hadoop MapReduce computations. File compression reduces storage space and speeds up data transfer. In the proposed invention, we collect homogeneous data and convert it into a Zip file, customize our InputFormat as a Zip File Input Format, provide the Zip file as input data, and write the code to analyze the data. By executing our MapReduce program, we obtain the analyzed output. 4 claims & 1 Figure
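A custom InputFormat for Zip input essentially needs a record reader that turns each archive entry into a key/value pair for the mapper. In Hadoop this would be written in Java, typically by extending `FileInputFormat`, overriding `isSplitable` to return `false`, and returning from `createRecordReader` a reader that walks the archive with `java.util.zip.ZipInputStream`. The following minimal Python sketch only mimics that record-emitting behavior in memory. It assumes one record per Zip entry (key = entry name, value = the entry's text), and `read_zip_records` is a hypothetical name, not part of Hadoop.

```python
import io
import zipfile
from typing import Iterator, Tuple


def read_zip_records(zip_bytes: bytes, encoding: str = "utf-8") -> Iterator[Tuple[str, str]]:
    """Yield one (entry name, decoded text) record per file in a Zip archive.

    Hypothetical helper that mimics a Zip record reader: the key is the
    entry's file name and the value is the whole entry decoded as text.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            yield info.filename, archive.read(info).decode(encoding)


# Usage: build a small archive in memory, then read it back as records.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("batch1.csv", "name,branch\nA,CSE\n")
    archive.writestr("batch2.csv", "name,branch\nB,ECE\n")

for key, value in read_zip_records(buffer.getvalue()):
    print(key, value.splitlines()[1:])
```

Emitting whole files as values keeps each CSV intact for the mapper; a variant could instead emit one record per line, keyed by the file name.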


Patent Information

Application #:
Filing Date: 30 April 2022
Publication Number: 19/2022
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email:
Parent Application:

Applicants

MLR Institute of Technology
Laxman Reddy Avenue, Dundigal-500043, Medchal-District, Hyderabad

Inventors

1. Mr. K. Sai Prasad
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043, Medchal-District, Hyderabad
2. Dr. B. Maduravani
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043, Medchal-District, Hyderabad
3. Dr. K. Srinivas Rao
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043, Medchal-District, Hyderabad
4. Dr. E. Anupriya
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043, Medchal-District, Hyderabad
5. Mr. P. Purushotham
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043, Medchal-District, Hyderabad
6. Mr. P Srinivasa Reddy
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043, Medchal-District, Hyderabad
7. Mr. T. Vinod
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043, Medchal-District, Hyderabad
8. Mr. K Shekar
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043, Medchal-District, Hyderabad

Specification

Description: Field of Invention
The present invention relates to a system and method for analyzing data from multiple files packaged in Zip format.
The Objectives of this Invention
In the proposed invention, we give a Zip file containing many homogeneous data files as the input and analyze all of the data in a single run. We implement a custom InputFormat to gain more control over the input data and to support a specific input file format as input to Hadoop MapReduce computations. In our invention, we collected homogeneous placement data files, converted them into a Zip file, and analyzed the data using a MapReduce program. This yields insights into all of the data in a single execution.
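The conversion of homogeneous data files into a single Zip file can also be scripted instead of done by hand. A minimal sketch, assuming the CSV files are already in memory as name-to-text pairs and that "homogeneous" means every file starts with the same header row; `pack_csv_files` is a hypothetical helper:

```python
import io
import zipfile
from typing import Mapping


def pack_csv_files(files: Mapping[str, str]) -> bytes:
    """Return the bytes of a Zip archive holding each CSV text under its name.

    Raises ValueError if the files do not share the same header row, which is
    the assumed meaning of "homogeneous" here.
    """
    headers = {(text.splitlines() or [""])[0] for text in files.values()}
    if len(headers) > 1:
        raise ValueError(f"files are not homogeneous: {sorted(headers)}")
    buffer = io.BytesIO()
    # ZIP_DEFLATED compresses each entry with DEFLATE (requires zlib).
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


placement_files = {
    "2021.csv": "name,branch,placed\nA,CSE,yes\n",
    "2022.csv": "name,branch,placed\nB,ECE,no\n",
}
archive_bytes = pack_csv_files(placement_files)
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    print(archive.namelist())  # ['2021.csv', '2022.csv']
```

Checking the header before packing catches a mismatched file early, before it reaches the MapReduce job.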
Background of the Invention
CN2010/102591847B discloses a document processing device, a document processing method, and a data management device. The steps in the documentation life cycle of a product are as follows: first, the input documents are examined; then, while the input PDF documents in the first format are evaluated in the specified analysis phase, two distinct types of filters are set, and the attached document file is included in the second format. Another application, CN2012/103714101A, describes data management equipment and data processing techniques. The equipment comprises a receiving unit, an identification unit, a counter used for calculation, a key analysis unit, and a response construction unit. A further invention, US2016/10268716B2, discusses a methodology for interpreting big data. A controller receives source information, which is then sent to the processing units for analysis. The server maintains a data model of a number of previously completed tasks, each having at most one job identifier, at most one text sequence associated with that job identifier, and a list of computation elements linked with that text sequence.
Summary of the Invention
Big data technologies are used to process enormous amounts of data, often measured in petabytes. Processing such a huge amount of data is a critical task, and analyzing it is another. The system we have invented offers an easy way to compress the data and analyze all of the files in a single execution. It is an automated process for analyzing a large number of homogeneous files. In this way, we have developed an invention that takes a Zip file of student data and analyzes it along different dimensions, all in a single execution.
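Analyzing a whole Zip file in one execution is tied to how Hadoop forms input splits. `FileInputFormat` consults `isSplitable` to decide whether a file may be cut into block-sized splits; a custom Zip InputFormat would normally return `false`, because a compressed Zip entry cannot be decoded starting from an arbitrary byte offset, so the whole archive becomes one split handled by one map task. The sketch below is a simplified Python model of that rule only: it ignores Hadoop's minimum/maximum split settings, its small slop factor, and zero-length files, and `Split`, `compute_splits`, and `not_zip` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Split:
    path: str
    start: int   # byte offset where the split begins
    length: int  # number of bytes in the split


def compute_splits(file_sizes: Dict[str, int], block_size: int,
                   is_splitable: Callable[[str], bool]) -> List[Split]:
    """Simplified model: splittable files are cut into block-sized pieces;
    non-splittable files become a single split covering the whole file."""
    splits: List[Split] = []
    for path, size in file_sizes.items():
        if not is_splitable(path):
            splits.append(Split(path, 0, size))  # whole file goes to one mapper
            continue
        for start in range(0, size, block_size):
            splits.append(Split(path, start, min(block_size, size - start)))
    return splits


def not_zip(path: str) -> bool:
    # Hypothetical rule: anything ending in .zip must be read whole.
    return not path.lower().endswith(".zip")


sizes = {"marks.csv": 250, "placements.zip": 250}
for split in compute_splits(sizes, block_size=100, is_splitable=not_zip):
    print(split)
```

The trade-off is that one large archive is processed by a single map task, so parallelism comes from supplying several archives rather than from splitting one.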

Detailed Description of the Invention
In the proposed invention, we give a Zip file containing many homogeneous data files as the input and analyze all of the data in a single run. We implement a custom InputFormat to gain more control over the input data and to support a specific input file format as input to Hadoop MapReduce computations. In our invention, we collect homogeneous placement data files, convert them into a Zip file, and analyze the data using a MapReduce program, obtaining insights into all of the data in a single execution. The working process of this invention is explained below.
This invention was developed to solve data analysis problems that become critical when dealing with a large amount of homogeneous data. When a user has homogeneous structured data stored in multiple files and is asked to analyze it, instead of examining each file separately, the user can submit all of the files at once by placing them in a Zip file and feeding it to the developed algorithm. The Hadoop framework allows users to develop a custom input format, against which the MapReduce code is written; our approach uses this provision to solve the problems discussed above. The use case of the proposed invention is explained below:
Collect the data, convert it into structured data using Excel sheets, and save it in .CSV format. Then convert the folder containing all of the data files into a Zip file using WinZip. Next, write a MapReduce program in the Eclipse IDE that takes the Zip file as the mapper's input and analyzes the data along different dimensions, and save it as a .JAR file. Move the JAR file to the Hadoop cluster using the WinSCP software. The multi-node Hadoop cluster, which is used to process huge amounts of data, stores the data in HDFS, and the JAR file we created runs the MapReduce job. Thus, even if a user has 1000 files in a Zip file, all of the data can be analyzed in a single job run, with results produced according to the user's input.
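The analysis step can be sketched in plain Python to show what the mapper and reducer compute. This is not Hadoop's Java `Mapper`/`Reducer` API: the in-memory `run_job` function stands in for the framework's shuffle, and the column names `name,branch,placed` are an assumed schema for the placement data, which the specification does not give. All function names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(file_name: str, text: str) -> Iterator[Tuple[str, int]]:
    """Emit (branch, 1) for every placed student in one CSV file.

    Assumes a hypothetical header row: name,branch,placed.
    file_name is the record key (the Zip entry name); it is unused here.
    """
    for row in csv.DictReader(io.StringIO(text)):
        if row["placed"].strip().lower() == "yes":
            yield row["branch"].strip(), 1


def reducer(branch: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Sum the 1s emitted for one branch across all files."""
    return branch, sum(counts)


def run_job(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """In-memory stand-in for a MapReduce job over (file name, text) records."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for file_name, text in records:
        for key, value in mapper(file_name, text):
            grouped[key].append(value)  # shuffle: collect values per key
    # Reduce keys in sorted order, as Hadoop presents keys sorted to a reducer.
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


records = [
    ("2021.csv", "name,branch,placed\nA,CSE,yes\nB,ECE,no\n"),
    ("2022.csv", "name,branch,placed\nC,CSE,yes\nD,ECE,yes\n"),
]
print(run_job(records))  # {'CSE': 2, 'ECE': 1}
```

In a real job the records would come from the Zip record reader, and Hadoop would perform the grouping between the map and reduce phases; the result is the single combined output across all files. Other "dimensions" would use different mapper keys over the same records.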
The proposed invention offers the following features. First, compressing the files not only reduces storage space but also makes data transfer faster. Second, it is very effective with large amounts of data, because the files need not be processed separately; they are all processed as a single Zip file. Third, the process is automated and requires little manual work, which reduces effort and improves the consistency of the results. Finally, a single combined output is produced for all of the files at once, making the data easy to analyze without extra effort.
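The space saving from compression depends heavily on the data. Zip entries are normally compressed with DEFLATE, which Python's `zlib` module also implements, so the ratio for a given file can be estimated as below. The repetitive sample rows are illustrative only, and `deflate_ratio` is a hypothetical helper; already-compressed content such as images or nested archives shrinks little or not at all.

```python
import zlib


def deflate_ratio(data: bytes, level: int = 6) -> float:
    """Compressed size divided by original size (lower is better).

    zlib.compress wraps a DEFLATE stream in a small zlib header and
    checksum, so the result approximates the size of a Zip entry.
    """
    if not data:
        return 1.0
    return len(zlib.compress(data, level)) / len(data)


# Repetitive, CSV-like rows, similar in shape to homogeneous data files.
rows = "".join(f"{i},CSE,yes\n" for i in range(1000)).encode("utf-8")
print(f"{len(rows)} bytes, ratio {deflate_ratio(rows):.2f}")
```

Homogeneous CSV files share headers and repeated field values, which is why they tend to compress well; the same reduction applies to the bytes transferred to the cluster.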

4 Claims & 1 Figure
Brief Description of the Drawing
The accompanying figure illustrates an exemplary embodiment of the invention.
Figure 1 illustrates the working of the proposed invention.
Claims: The scope of the invention is defined by the following claims:

Claim:
1. A system/method for customized zip file input format using MapReduce for data analysis, said system/method comprising the steps of:
a) The system starts with input data (1), which is in the form of a Zip file (2).
b) The Zip file is given to the HDFS (3) file system, then processed by the mapper (4) and the reducer (5), and finally the output (6) is received.
2. As mentioned in claim 1, the data is collected, converted into structured data using Excel sheets, and saved in .CSV format, and the folder containing all of the data files is converted into a Zip file using WinZip.
3. According to claim 1, a MapReduce program that processes the Zip file format and analyzes the data along different dimensions is written in the Eclipse IDE and saved as a .JAR file.
4. According to claim 1, the mapper and reducer analyze the input data and predict the output based on the user's needs.

Documents

Application Documents

# Name Date
1 202241025414-REQUEST FOR EARLY PUBLICATION(FORM-9) [30-04-2022(online)].pdf 2022-04-30
2 202241025414-FORM-9 [30-04-2022(online)].pdf 2022-04-30
3 202241025414-FORM FOR SMALL ENTITY(FORM-28) [30-04-2022(online)].pdf 2022-04-30
4 202241025414-FORM 1 [30-04-2022(online)].pdf 2022-04-30
5 202241025414-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [30-04-2022(online)].pdf 2022-04-30
6 202241025414-EVIDENCE FOR REGISTRATION UNDER SSI [30-04-2022(online)].pdf 2022-04-30
7 202241025414-EDUCATIONAL INSTITUTION(S) [30-04-2022(online)].pdf 2022-04-30
8 202241025414-DRAWINGS [30-04-2022(online)].pdf 2022-04-30
9 202241025414-COMPLETE SPECIFICATION [30-04-2022(online)].pdf 2022-04-30