
Question 1 (Index Construction):
Suppose you have joined a search engine development team to design a search algorithm based on both the Vector model and the Boolean model.
You have collected the following unstructured documents and plan to apply an indexing technique to convert them into an inverted index.

Doc 1: data science is field to use scientific method, process, algorithm, system to extract knowledge.

Doc 2: data mining is the process to discover pattern in large data to involve method at the database system.

Doc 3: information system is the study of network of hardware and software that people use to process data.

To answer the questions below, give a detailed step-by-step procedure.
Remove all stop words and punctuation before creating the inverted index, then complete the following steps:
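The preprocessing step can be sketched in Python as below. This is a minimal sketch under stated assumptions, not a prescribed method: the stop-word list `STOP_WORDS` is hypothetical (the question does not give one), tokens are lowercased, punctuation is stripped with a regular expression, and no stemming is applied, so "science" and "scientific" stay separate terms.

```python
import re

# The three documents from the question, copied verbatim.
DOCS = {
    1: "data science is field to use scientific method, process, algorithm, system to extract knowledge.",
    2: "data mining is the process to discover pattern in large data to involve method at the database system.",
    3: "information system is the study of network of hardware and software that people use to process data.",
}

# Hypothetical stop-word list; the question does not specify one.
STOP_WORDS = {"a", "an", "and", "at", "in", "is", "of", "that", "the", "to"}


def preprocess(text):
    """Lowercase, drop punctuation, and remove stop words (no stemming)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


for doc_id, text in DOCS.items():
    print(f"Doc {doc_id}: {preprocess(text)}")
```

With this list, Doc 2 reduces to "data mining process discover pattern large data involve method database system". A larger stop list (for example, one that also drops "use") would change the term set used in every later step.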

Question 1.1:
Create a merged inverted list including the within-document frequencies for each term.
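A merged inverted list can be built by emitting one (term, document, frequency) record per distinct term in each document, sorting the records, and merging the records that share a term. The sketch below follows that procedure; it repeats the hypothetical stop list and no-stemming assumption from the preprocessing step so that it runs on its own.

```python
import re
from collections import Counter

DOCS = {
    1: "data science is field to use scientific method, process, algorithm, system to extract knowledge.",
    2: "data mining is the process to discover pattern in large data to involve method at the database system.",
    3: "information system is the study of network of hardware and software that people use to process data.",
}
# Hypothetical stop-word list; the question does not specify one.
STOP_WORDS = {"a", "an", "and", "at", "in", "is", "of", "that", "the", "to"}


def preprocess(text):
    """Lowercase, drop punctuation, and remove stop words (no stemming)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_merged_inverted_list(docs):
    """Return {term: [(doc_id, within-document frequency), ...]} ordered by term."""
    # Step 1: one (term, doc_id, tf) record per distinct term in each document.
    records = [
        (term, doc_id, tf)
        for doc_id, text in docs.items()
        for term, tf in Counter(preprocess(text)).items()
    ]
    # Step 2: sort the records by term, then by document number.
    records.sort()
    # Step 3: merge records that share a term into a single postings list.
    merged = {}
    for term, doc_id, tf in records:
        merged.setdefault(term, []).append((doc_id, tf))
    return merged


INDEX = build_merged_inverted_list(DOCS)
for term, postings in INDEX.items():
    print(f"{term:<12} {postings}")
```

Under these assumptions the list has 23 terms. "data" is the only term with a within-document frequency above 1 (it occurs twice in Doc 2), and only "data", "process", "system", "method" and "use" appear in more than one document.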

Question 1.2:
Use the index created above to build a dictionary and the related posting file.
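One common way to split the merged list into a dictionary and a posting file is sketched below. The layout is an assumption, not a required format: each dictionary entry stores the document frequency (df), the total collection frequency (cf) and an offset into a flat posting file, and the posting file stores the (doc_id, tf) records back to back. The preprocessing assumptions (hypothetical stop list, no stemming) are repeated so the block is self-contained.

```python
import re
from collections import Counter

DOCS = {
    1: "data science is field to use scientific method, process, algorithm, system to extract knowledge.",
    2: "data mining is the process to discover pattern in large data to involve method at the database system.",
    3: "information system is the study of network of hardware and software that people use to process data.",
}
# Hypothetical stop-word list; the question does not specify one.
STOP_WORDS = {"a", "an", "and", "at", "in", "is", "of", "that", "the", "to"}


def preprocess(text):
    """Lowercase, drop punctuation, and remove stop words (no stemming)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_dictionary_and_postings(docs):
    """Return (dictionary, posting_file).

    dictionary[term] = (doc_freq, total_freq, offset); the term's postings are
    posting_file[offset : offset + doc_freq].
    """
    merged = {}
    for doc_id in sorted(docs):
        for term, tf in Counter(preprocess(docs[doc_id])).items():
            merged.setdefault(term, []).append((doc_id, tf))
    dictionary, posting_file = {}, []
    for term in sorted(merged):
        postings = merged[term]
        # The offset is recorded before this term's postings are appended.
        dictionary[term] = (len(postings), sum(tf for _, tf in postings), len(posting_file))
        posting_file.extend(postings)
    return dictionary, posting_file


def lookup(term, dictionary, posting_file):
    """Fetch a term's postings through its dictionary entry (empty if unknown)."""
    if term not in dictionary:
        return []
    doc_freq, _, offset = dictionary[term]
    return posting_file[offset:offset + doc_freq]


DICTIONARY, POSTING_FILE = build_dictionary_and_postings(DOCS)
print(f"{'term':<12}{'df':>3}{'cf':>4}{'offset':>8}")
for term, (df, cf, offset) in DICTIONARY.items():
    print(f"{term:<12}{df:>3}{cf:>4}{offset:>8}")
print(lookup("use", DICTIONARY, POSTING_FILE))
```

Keeping only fixed-size entries in the dictionary lets it stay in memory while the posting file, which grows with the collection, can live on disk and be read by offset.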

Question 1.3:
Design three Boolean queries (for example, web AND search) and list the relevant documents for each query. Each query must contain at least two keywords, and no keyword may appear in only one document.
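Boolean queries can be evaluated as set operations on postings lists: AND is intersection, OR is union, and NOT is the complement against the whole collection. The sketch below uses three illustrative queries, not a required answer; they use only terms that occur in at least two documents under the assumed preprocessing (hypothetical stop list, no stemming).

```python
import re

DOCS = {
    1: "data science is field to use scientific method, process, algorithm, system to extract knowledge.",
    2: "data mining is the process to discover pattern in large data to involve method at the database system.",
    3: "information system is the study of network of hardware and software that people use to process data.",
}
# Hypothetical stop-word list; the question does not specify one.
STOP_WORDS = {"a", "an", "and", "at", "in", "is", "of", "that", "the", "to"}


def preprocess(text):
    """Lowercase, drop punctuation, and remove stop words (no stemming)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


# Postings as sets of document IDs; frequencies do not matter for Boolean retrieval.
POSTINGS = {}
for doc_id, text in DOCS.items():
    for term in preprocess(text):
        POSTINGS.setdefault(term, set()).add(doc_id)
ALL_DOCS = set(DOCS)


def docs_for(term):
    """Documents containing the term (empty set if the term is not indexed)."""
    return POSTINGS.get(term, set())


# Terms allowed in the queries: those appearing in two or more documents.
ELIGIBLE = sorted(t for t, ids in POSTINGS.items() if len(ids) >= 2)

QUERIES = {
    "data AND method": docs_for("data") & docs_for("method"),
    "use AND process": docs_for("use") & docs_for("process"),
    "method AND NOT use": docs_for("method") & (ALL_DOCS - docs_for("use")),
}

print("eligible terms:", ELIGIBLE)
for query, result in QUERIES.items():
    print(f"{query:<20} -> {sorted(result)}")
```

Under these assumptions the eligible terms are "data", "method", "process", "system" and "use", and the three queries return Docs 1 and 2, Docs 1 and 3, and Doc 2 only, respectively.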

Question 1.4:
Use the Vector model to query the inverted index and compare the results with those of the Boolean model. (Hint: you can use cosine similarity and set a similarity threshold.)
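A vector-model query with a cosine-similarity threshold can be sketched as below, next to a Boolean AND of the same keywords for comparison. Assumptions: terms are weighted by raw within-document frequency (tf-idf weighting is a common alternative and would change the scores), the query "data method" and the thresholds 0.3 and 0.2 are arbitrary illustrations, and preprocessing repeats the hypothetical stop list with no stemming.

```python
import math
import re
from collections import Counter

DOCS = {
    1: "data science is field to use scientific method, process, algorithm, system to extract knowledge.",
    2: "data mining is the process to discover pattern in large data to involve method at the database system.",
    3: "information system is the study of network of hardware and software that people use to process data.",
}
# Hypothetical stop-word list; the question does not specify one.
STOP_WORDS = {"a", "an", "and", "at", "in", "is", "of", "that", "the", "to"}


def preprocess(text):
    """Lowercase, drop punctuation, and remove stop words (no stemming)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


# Document vectors weighted by raw within-document frequency (an assumption).
DOC_VECTORS = {doc_id: Counter(preprocess(text)) for doc_id, text in DOCS.items()}


def cosine(q, d):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * d.get(t, 0) for t, w in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


def vector_search(query, threshold):
    """Rank documents by cosine similarity, keeping those at or above threshold."""
    q = Counter(preprocess(query))
    scores = [(doc_id, cosine(q, d)) for doc_id, d in DOC_VECTORS.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return [(doc_id, round(s, 4)) for doc_id, s in scores if s >= threshold]


def boolean_and(query):
    """Boolean AND of all query terms, for comparison (unranked)."""
    terms = preprocess(query)
    return sorted(d for d, vec in DOC_VECTORS.items() if all(t in vec for t in terms))


print("Boolean data AND method:", boolean_and("data method"))
print("Vector, threshold 0.3:  ", vector_search("data method", 0.3))
print("Vector, threshold 0.2:  ", vector_search("data method", 0.2))
```

With these weights, the 0.3 threshold returns the same documents as the Boolean AND query, but ranked: Doc 2 (about 0.588) scores above Doc 1 (about 0.426) because "data" occurs twice in it. Lowering the threshold to 0.2 also retrieves Doc 3 (about 0.224), which contains only "data"; the Boolean model can express that partial match only by switching to OR, and it never ranks its results.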
