subject

Yates' big file sorter

for this assignment, you will design and implement a sort program that still works correctly, even when there is not enough memory to store an entire input file. this will give you practice with collections, files, exceptions, and sorting algorithms.

program description

at runtime, your program should read the name of a "large" input file to sort, and the name to give the sorted output file.

the program will sort the lines in the input file, and save the sorted lines in the output file. your sort algorithm will be a hybrid of the mergesort algorithm and java's built-in sorting algorithm as a two-phase process, as described below.

phase 1: breaking the input file into manageable chunks

in the first phase, the sort algorithm will read in a fixed number of lines from the input file, and store them into an array. it should pick a manageable number that will fit into memory. next, it will sort that array using java's built-in arrays. sort method. then it will store the sorted array in a temporary file called "temp_0_0.txt".

the algorithm will repeatedly read in chunks of lines until it has read in the entire contents of the input file. each time it reads in a chunk of lines from the input file, it stores that chunk in an array, sorts the array, and saves the array in another temporary file ("temp_0_1.txt", "temp_0_2.txt", "temp_0_n. txt"). notice that it is reading in an amount that will fit into memory each time, so that it does not run out of memory.

after this first phase is complete, each of the "temp_0_i. txt" files is in sorted order, and together they contain all of the stuff that was in the input file. the second phase must merge the files together, while making sure that the merged files remain in sorted order.

phase 2: merging the chunks together

remember from class that the mergesort algorithm repeatedly merges two small sorted arrays into a larger sorted array. here, you will do the same, but instead repeatedly merge two small sorted files into a larger sorted file. take care that you only read in one line of each file into memory at a time don't read the entire files into memory, or you will run out of memory!

your sort algorithm should begin phase 2 by merging "temp_0_0.txt" and "temp_0_1.txt", and saving it to a new file called "temp_1_0.txt". next, it will merge "temp_0_2.txt" with "temp_0_3.txt", and save the merged file as "temp_1_1.txt". this will repeat until there are no more "temp_0_i. txt" files left. if there are an odd number of these files, the last one will have nothing to merge with. that's ok, it can be merged in later iterations. just copy it into the next available "temp_1_i. txt".

after merging pairs of the "temp_0_i. txt" files, the sort algorithm needs to begin merging pairs of the "temp_1_i. txt" files. it will begin by merging "temp_1_0.txt" and "temp_1_1.txt", and saving the result to "temp_2_0.txt". then, it will merge "temp_1_2.txt" and "temp_1_3.txt", and save the result to "temp_2_1.txt", and so on.

each time through the set of temporary files, the merging process cuts the number of temporary files roughly in half, because it merges two files into one. (i say "roughly" because there may be an odd number of files, in which case the last file does not get merged with anything.) this phase of the sorting must keep repeating, until there are only two temporary files left. when that happens, it should merge those temporary files, but instead of saving the result to another temporary file, it should save it to the output file. then the sort is finished.

guidelines for testing your program

you may find a reasonably large text file (~7 mb) here, which you may use to test your program. it contains aesop's fables, the complete works of shakespeare, mary shelley's frankenstein, and mark twain's the adventures of huckleberry finn.

the file is certainly small enough to fit into memory i didn't want to you to have to deal with an enormous file. however, you should choose a setting so that your program reads in less than the whole file at a time. at first, try reading in around 1/20 of the lines at a time into memory during the first phase. you should test your program with several different settings.

unit tests

write a junit test to make sure that your merge() function works correctly. be sure to test that it merges arrays that are the same size and two arrays that are not the same size.

ansver
Answers: 1

Another question on Computers and Technology

question
Computers and Technology, 22.06.2019 22:30
One of your customers wants you to build a personal server that he can use in his home. one of his concerns is making sure he has at least one backup of their data stored on the server in the event that a disk fails. you have decided to back up his data using raid. since this server is for personal use only, the customer wants to keep costs down. therefore, he would like to keep the number of drives to a minimum. which of the following raid systems would best meet the customer's specifications? a. raid 0 b. raid 1 c. raid 5 d. raid 10
Answers: 3
question
Computers and Technology, 23.06.2019 07:30
To check spelling errors in a document, the word application uses the to determine appropriate spelling. internet built-in dictionary user-defined words other text in the document
Answers: 2
question
Computers and Technology, 23.06.2019 10:30
How would you categorize the software that runs on mobile devices? break down these apps into at least three basic categories and give an example of each.
Answers: 1
question
Computers and Technology, 23.06.2019 13:30
Spoons are designed to be used for: spring hammering. applying body filler. identifying high and low spots. sanding highly formed areas.
Answers: 3
You know the right answer?
Yates' big file sorter

for this assignment, you will design and implement a sort program...
Questions
question
Mathematics, 14.04.2021 21:30
question
History, 14.04.2021 21:30
question
Mathematics, 14.04.2021 21:30
question
Mathematics, 14.04.2021 21:30
question
Physics, 14.04.2021 21:30
question
Mathematics, 14.04.2021 21:30
Questions on the website: 13722361