
Modify the WordCount program so that it outputs the word count for each distinct word in each file. The output of this DocWordCount program should be of the form ‘wordfilename count’, where ‘’ serves as a delimiter between word and filename and a tab serves as a delimiter between filename and count. Submit your source code in a file named DocWordCount.java.

Explanation: Consider two simple files, file1.txt and file2.txt.

$ echo "Hadoop is yellow Hadoop" > file1.txt
$ echo "yellow Hadoop is an elephant" > file2.txt

Running DocWordCount.java on these two files will give an output similar to that below, in which a delimiter separates each word from its filename.

Output of DocWordCount.java

yellowfile2.txt 1
Hadoopfile2.txt 1
isfile2.txt 1
elephantfile2.txt 1
yellowfile1.txt 1
Hadoopfile1.txt 2
isfile1.txt 1
anfile2.txt 1
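
The expected result can be checked without a Hadoop cluster. The following is a minimal sketch in plain Java that counts words per file for the two example files. The class name `DocWordCountSketch`, the method `count`, and the delimiter `#####` are all assumptions made for illustration; the delimiter the assignment actually requires is not visible in the question text above, so substitute the real one.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class DocWordCountSketch {
    // Hypothetical delimiter: the one required by the assignment is not shown in the question.
    static final String DELIMITER = "#####";

    // Counts every word per file. Keys have the form word + DELIMITER + filename.
    static Map<String, Integer> count(Map<String, String> files) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            // Simple whitespace tokenization, enough for the example files.
            for (String word : file.getValue().trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                counts.merge(word + DELIMITER + file.getKey(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("file1.txt", "Hadoop is yellow Hadoop");
        files.put("file2.txt", "yellow Hadoop is an elephant");
        // A tab separates the composite key from the count, as the assignment asks.
        count(files).forEach((key, n) -> System.out.println(key + "\t" + n));
    }
}
```

The sketch yields eight distinct word and file pairs, with "Hadoop" counted twice in file1.txt, which agrees with the sample output. The order of the lines differs because the sketch sorts its keys.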

Initial code that needs to be modified:

package org.myorg;

import java.io.IOException;
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.Logger;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class WordCount extends Configured implements Tool {

  private static final Logger LOG = Logger.getLogger(WordCount.class);

  public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new WordCount(), args);
    System.exit(res);
  }

  public int run(String[] args) throws Exception {
    Job job = Job.getInstance(getConf(), "wordcount");
    job.setJarByClass(this.getClass());

    // args[0]: comma-separated input paths; args[1]: output directory
    FileInputFormat.addInputPaths(job, args[0]);
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

    // Emits (word, 1) for every non-empty token on the input line.
    @Override
    public void map(LongWritable offset, Text lineText, Context context)
        throws IOException, InterruptedException {

      String line = lineText.toString();
      Text currentWord = new Text();

      for (String word : WORD_BOUNDARY.split(line)) {
        if (word.isEmpty()) {
          continue;
        }
        currentWord = new Text(word);
        context.write(currentWord, one);
      }
    }
  }

  public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Sums the counts collected for each key.
    @Override
    public void reduce(Text word, Iterable<IntWritable> counts, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable count : counts) {
        sum += count.get();
      }
      context.write(word, new IntWritable(sum));
    }
  }
}
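
One plausible way to get from WordCount to DocWordCount is to change only the key that the mapper emits, so that it carries the filename as well as the word. The reducer already sums counts per key and would not need to change. The following is a minimal sketch of that map step in plain Java, so it runs without Hadoop. The class `DocWordMapSketch`, the method `mapLine`, and the delimiter `#####` are hypothetical names chosen for illustration. In a real job the filename is commonly obtained inside `map` with `((FileSplit) context.getInputSplit()).getPath().getName()`, using `org.apache.hadoop.mapreduce.lib.input.FileSplit`; here it is simply passed in as a parameter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class DocWordMapSketch {
    // Same tokenizer as the WordCount mapper.
    private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

    // Hypothetical delimiter: replace it with the one the assignment specifies.
    static final String DELIMITER = "#####";

    // Stand-in for Map.map(): returns the keys that would each be written with a count of 1.
    static List<String> mapLine(String fileName, String line) {
        List<String> keys = new ArrayList<>();
        for (String word : WORD_BOUNDARY.split(line)) {
            if (word.isEmpty()) {
                continue; // the split can yield empty tokens at word boundaries
            }
            keys.add(word + DELIMITER + fileName);
        }
        return keys;
    }

    public static void main(String[] args) {
        mapLine("file1.txt", "Hadoop is yellow Hadoop").forEach(System.out::println);
    }
}
```

Because "Hadoop" is emitted twice with the same composite key for file1.txt, an unchanged summing reducer would report a count of 2 for that key.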
