
Example:
Data set: collections of text documents.
Problem: count the frequency of nouns that appear at least 100 times in the documents.
(i) Mapper function: tokenize each line into a set of terms (words), and filter out terms that are not nouns.
(ii) Mapper output: key is a noun, value is 1.
(iii) Reducer input: key is a noun, value is a list of 1's.
(iv) Reduce function: sums up the 1's for each key (noun).
(v) Reducer output: key is a noun, value is the frequency of the noun (nouns whose frequencies are below 100 are filtered out).

(a) Data set: Amazon book ratings data. Each line in the data file has 4 columns (reviewer id, book id, book genre, rating), where ratings are integer-valued, ranging from 1 to 4.
Problem: identify the highest rated book, i.e., the book with the highest average rating, for each book genre. Note that each book can have more than one rating (e.g., by different reviewers).

(b) Data set: movie preference data. Each record in the data file contains the movie title and the list of users who liked the movie. For example, the records
    jaws user111 user134 user313 user5812
    star_wars user111 user313 user388 user4422
Problem: for each pair of users, count the number of movies they both liked. The output may exclude pairs of users who do not have any movies they both liked.

(c) Data set: maximum and minimum daily temperature readings for weather stations from around the world. Each line in the data files has 4 columns (station id, date, max temperature, min temperature).
Problem: find the station id and date of anomalous temperature readings in the data set. A temperature reading is anomalous if the minimum daily temperature exceeds the maximum temperature for the given day.

(d) Data set: Instagram friendship graph. Each record corresponds to an Instagram user, followed by a list of his/her friends. For example, the graph data may contain the following records:
    john123 mary456 tom312 lee222
    mary456 john123
    tom312 john123 lee222
    lee222 john123 tom312
The first line above states that mary456, tom312, and lee222 are friends of john123.
Problem: find pairs of Instagram users who are not friends with each other but who share one or more common friends. This is known as the "friend-of-a-friend" (FoF) problem. For example, mary456 and tom312 are both friends of john123, but they are not friends with each other. The Hadoop program should only output the pair (u, v) if u < v. In the previous example, the program should only output the pair (mary456, tom312) but not (tom312, mary456).

(e) Data set: cancer data. Each line in the data file corresponds to a patient with the following nominal-valued attributes: patientid, gender, marital status, smoker, weight class, and class, where the class attribute has the value yes or no to indicate whether the patient has cancer. For example:
    12345, female, married, smoker, normal, yes
    13, male, single, nonsmoker, normal, no
    14423, male, married, smoker, overweight, yes
Problem: compute the Gini index for each of the following attributes: gender, marital status, smoker, and weight class, based on the distribution of their class values.
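The worked noun-counting example can be tried out without a Hadoop cluster. The following is only a minimal sketch in plain Python that imitates the map, shuffle, and reduce phases in memory. The names NOUNS, shuffle, and count_nouns are made up for illustration, and the tiny hand-written noun list is an assumption standing in for a real part-of-speech tagger, which the problem does not specify.

```python
from collections import defaultdict

# Assumption: a small hand-written noun list replaces a real
# part-of-speech tagger, which the problem statement does not name.
NOUNS = {"cat", "dog", "house"}


def mapper(line):
    """Steps (i)-(ii): tokenize a line and emit (noun, 1) per noun occurrence."""
    for term in line.lower().split():
        term = term.strip(".,;:!?\"'()")
        if term in NOUNS:
            yield term, 1


def shuffle(pairs):
    """Step (iii): group mapper output by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(noun, ones, min_count=100):
    """Steps (iv)-(v): sum the 1's and drop nouns below the threshold."""
    total = sum(ones)
    if total >= min_count:
        yield noun, total


def count_nouns(lines, min_count=100):
    """Hypothetical driver that runs the three phases in memory."""
    pairs = (pair for line in lines for pair in mapper(line))
    result = {}
    for noun, ones in shuffle(pairs).items():
        for key, total in reducer(noun, ones, min_count):
            result[key] = total
    return result
```

For instance, count_nouns(["The cat saw the dog.", "A cat ran home."], min_count=2) keeps only "cat", because "dog" appears once. The threshold is a parameter here so that small inputs can be tested; the problem itself uses 100.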
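For the cancer data problem, one possible design (a sketch, not the assignment's official solution) is to let the mapper emit the attribute name as the key and the pair (attribute value, class label) as the value, so that each reducer call sees everything it needs for one attribute. The sketch assumes the usual weighted Gini index of a split, i.e. the sum over attribute values v of (n_v / n) * (1 - sum over classes c of p(c | v)^2). The names ATTRIBUTES and gini_by_attribute are illustrative, and the map and reduce phases are simulated in plain Python rather than run on Hadoop.

```python
from collections import Counter, defaultdict

# Attribute names in column order, after the patient id and before the class.
ATTRIBUTES = ["gender", "marital status", "smoker", "weight class"]


def mapper(line):
    """Emit (attribute, (attribute value, class label)) for one patient record."""
    fields = [f.strip() for f in line.strip().rstrip(".").split(",")]
    _patient_id, *values, label = fields
    for attribute, value in zip(ATTRIBUTES, values):
        yield attribute, (value, label)


def reducer(attribute, pairs):
    """Compute the weighted Gini index of one attribute from its pairs."""
    by_value = defaultdict(Counter)
    for value, label in pairs:
        by_value[value][label] += 1
    n = sum(sum(counts.values()) for counts in by_value.values())
    gini = 0.0
    for counts in by_value.values():
        n_v = sum(counts.values())
        impurity = 1.0 - sum((k / n_v) ** 2 for k in counts.values())
        gini += (n_v / n) * impurity
    return attribute, gini


def gini_by_attribute(lines):
    """Hypothetical driver: group mapper output by key, then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(a, p) for a, p in groups.items())
```

On the three sample records from the problem statement, gender and weight class each come out at about 0.333, while marital status and smoker come out at 0, since every value of those two attributes maps to a single class in that tiny sample.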

