Assignment 1 v2


Objective


The aim is to develop queries on your topic, index the Assignment 1 Document Collection, develop a Gold Standard and hence evaluate your system.


Developing Queries


You need to develop 20 queries, based on your chosen topic, as registered on Moodle and agreed with the lecturer. If you want to confirm what your topic is, you need to contact the lecturer. Each query appears in three forms in the Gold Standard: original_query, keyword_query, and kibana_query (see below). The 20 queries should be broken down into the following types:


Note that the queries are classified based on the type of information you are looking for. They are not classified based on the types of information in the query itself. So, for example a query like ‘When did Dire Straits play a concert in Dublin’ mentions an Organisation (Dire Straits), an Event (concert), and a Place (Dublin) in the query but the type of the query is in fact Date, because that is the type of the information we are looking for.


Use at least three types of query in your Gold Standard. Try searches of different types for your chosen topic and see what documents you can match. Based on that, decide which query types work best for you.


First, write down the original_query, which is a complete question in English. Next, develop a Kibana query (kibana_query in the .json) which matches relevant documents as well as it can, using keywords from the original_query, possibly combined into exact phrases, inside a ‘bool’ query with ‘must’, ‘should’ etc. clauses. See the lab sheets for hints. For finding a named entity such as a person or organisation, a phrase match generally works well. Finally, keyword_query is used in the Python code as a simple best_fields search and uses exactly the same keywords as the kibana_query.
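As a rough illustration of the three forms, the sketch below builds them for the Dire Straits example. The field name `parsed_text`, the helper `build_kibana_query`, and the layout of `entry` are assumptions made for illustration only; the required format is the one in gold_standard.json and the lab sheets.

```python
import json

# Hypothetical field name; use the text field of your own index.
TEXT_FIELD = "parsed_text"


def build_kibana_query(phrases, keywords, field=TEXT_FIELD):
    """Build an Elasticsearch 'bool' query: every phrase must match exactly,
    while the loose keywords only raise the score ('should')."""
    return {
        "query": {
            "bool": {
                "must": [{"match_phrase": {field: p}} for p in phrases],
                "should": [{"match": {field: k}} for k in keywords],
            }
        }
    }


original_query = "When did Dire Straits play a concert in Dublin?"
phrases = ["Dire Straits"]        # named entity kept as an exact phrase
keywords = ["concert", "Dublin"]  # remaining keywords

entry = {
    "original_query": original_query,
    # Same keywords as the kibana_query, written as one flat string.
    "keyword_query": " ".join(phrases + keywords),
    "kibana_query": build_kibana_query(phrases, keywords),
}

print(json.dumps(entry, indent=2))
```

Whether a keyword belongs under ‘must’ or ‘should’ is a judgement call for each query; the split shown here is only one possibility.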


You need to experiment with various queries to see whether they will retrieve relevant documents for your project. Once you are happy with your query, you will have a kibana_query based on the keywords, possibly combined into phrases etc., and a keyword_query based on the same keywords. Now, keep these fixed, and start formally searching for matching documents to put in the .json Gold Standard.


When you enter a query and obtain results, you need to look at the top 40 documents returned and, for each document, decide whether it contains the answer to the query and, if so, why. If a document contains the answer, we say it is ‘relevant’. If no document in the top 40 is relevant, you must change your query until at least one is. So you need to spend some time with Elasticsearch/Kibana in order to develop your queries.


We never look at documents beyond the first 40, so we never know how many relevant documents there really are for a query. We just estimate this, using the first 40.


You are allowed to reverse-engineer this. So, you can do some searches on your topic, and, based on the documents returned, see what questions are going to have relevant documents.


Developing the Gold Standard


Your Gold Standard will be a JSON file containing the queries. For each query there will be one or more relevant documents, each specified by its DocID (shown as “_id” in the document collection). For each relevant document, there will be a list of one or more sentences, taken from the document, which contain the answer to the query (we say that the sentences support the answer). Sometimes two sentences are needed, because one sentence alone does not demonstrate relevance, but in most cases one sentence will demonstrate the relevance on its own.


The format of the Gold Standard JSON is shown in the file gold_standard.json, which gives a complete example of a query and its matching documents. Also see worked_example_gold_standard.txt, which shows exactly how the data was derived. Finally, rgs_read_gold_standard.py is a useful program which can read in your .json and print out information about it. If that program crashes, there is a problem with your .json. The error message gives the line number in the .json, which is useful for tracking down mistakes.


The JSON you hand in must be syntactically correct (i.e. we must be able to parse it after submission and convert it to a Python dictionary). If rgs_read_gold_standard.py can read your .json then it is OK syntactically.
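A quick syntax check can also be done with the standard json module alone. This is a minimal sketch, not a replacement for rgs_read_gold_standard.py; the function name `check_json_text` and the `queries` key in the sample strings are hypothetical.

```python
import json


def check_json_text(text):
    """Return (True, data) if text parses as JSON, otherwise (False, message),
    where the message gives the line and column of the first error."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"


# Hypothetical samples: the second has a trailing comma on line 3.
good = '{"queries": [{"original_query": "example"}]}'
bad = '{\n  "queries": [\n    {"original_query": "example",}\n  ]\n}'

print(check_json_text(good))
print(check_json_text(bad))
```

To check a file, read it first, for example `check_json_text(open("2300000.json", encoding="utf-8").read())`.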


The Document Collection


The document collection is called the ‘Assignment 1 Document Collection’ and you will find it on Moodle. It has the same format as previous collections (e.g. collection_10_docs_per_topic.json etc) but contains more documents.


Evaluating the System


We will evaluate the standard Elasticsearch system, using both the keyword_query and the kibana_query. For keyword_query we will use a simple “best_fields” search, incorporating the keywords as the query. For kibana_query, we will submit the exact Kibana query you provide in the Gold Standard, using the Python interface to Elasticsearch.
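For orientation, a best_fields search in the Elasticsearch query DSL is a multi_match query whose type is best_fields. The sketch below only builds the request body; the field names `title` and `parsed_text` and the helper `build_keyword_search` are assumptions, and the call that sends the body to Elasticsearch is left out because it depends on the client version and index name used in the labs.

```python
def build_keyword_search(keyword_query, fields, size=40):
    """Request body for a simple multi_match search of type best_fields,
    asking for the top `size` hits."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": keyword_query,
                "type": "best_fields",
                "fields": fields,
            }
        },
    }


# Hypothetical field names; replace with the fields of your own index.
body = build_keyword_search("Dire Straits concert Dublin", ["title", "parsed_text"])
print(body)
```

The default of 40 hits mirrors the number of documents inspected when building the Gold Standard.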


We can evaluate the system automatically, at n = 5 and n = 10. Some code will be provided to do this, but you will need to combine the parts together.


Concerning Recall, we will assume that all relevant documents are returned in the first 40. This assumption is unlikely to be true, of course, but it is needed to make the project practical.
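To make the measures concrete, here is a minimal sketch of Precision and Recall at n under the assumption stated above, so that the relevant set for a query is exactly the set of DocIDs listed for it in the Gold Standard. The function names and the sample DocIDs are made up for illustration, and the code provided for the assignment may organise this differently.

```python
def precision_at_n(ranked_ids, relevant_ids, n):
    """Fraction of the top n returned documents that are relevant.
    Divides by n even if fewer than n documents were returned."""
    hits = sum(1 for doc_id in ranked_ids[:n] if doc_id in relevant_ids)
    return hits / n


def recall_at_n(ranked_ids, relevant_ids, n):
    """Fraction of the known relevant documents found in the top n."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:n] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


# Made-up ranking and Gold Standard set for one query.
ranked = ["d3", "d7", "d1", "d9", "d4", "d2", "d8", "d5", "d6", "d0"]
relevant = {"d1", "d2", "d3"}

for n in (5, 10):
    p = precision_at_n(ranked, relevant, n)
    r = recall_at_n(ranked, relevant, n)
    print(f"n={n} P={p:.2f} R={r:.2f}")
# n=5 P=0.40 R=0.67
# n=10 P=0.30 R=1.00
```

In this made-up case two of the three relevant documents appear in the top 5 and the third appears by rank 10, so Recall rises from 0.67 to 1.00 while Precision falls from 0.40 to 0.30.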


Video


You should prepare a short video (about 5 minutes long) describing your work. The video should be in English. It is recommended to use Zoom to do this. You simply start a Zoom meeting with just yourself, start recording the meeting, then sit at your computer making the recording.


You must have a camera and film yourself. The film of you must appear in the video in one corner.


1. The video should start with you saying your name, reading out your student PRID and reading out your Registration Number.


2. Next, you should take four Kibana queries (kibana_query in the JSON) from your Gold Standard. If you have queries of four or more types, demonstrate four queries of different types. If you have three types, choose one of each type, and then another which is one of the same three types. Enter each into Kibana and show the results returned. Scroll down the results until you find a document which is relevant to the query. Then explain very briefly why it is relevant, i.e. point to the parts which contain the answer.


3. Next, you should go to the Python code. Point to each function and explain briefly what it does (you do not need to go into detail, just a quick tour is needed).


4. Now run your Python evaluation program and show that it produces some output results.


5. Finally, bring up your .docx table of results and answer the question ‘Does Recall improve as n increases?’.


Note: This is not a test of English. As long as the video is coherent and understandable, it does not matter if the English is not quite correct. Also, if you make some mistake in what you intended to say, you can just correct it and carry on. No marks will be lost, we just concentrate on the content.


Hint: To show Kibana, say, in a Zoom meeting, you need to have it already running in a browser window. Then, start the meeting. There will only be one person at the meeting, i.e. you. Make sure your video is enabled so you can see yourself on the screen. In the presentation, do Share Screen, and select the browser window to share. Later, you can stop the sharing and then share PyCharm (you can use other ways of running Python if you prefer). Then stop sharing, and share the .docx. You may need to practise this a little and look at the resulting recording to make sure it is all working. We strongly recommend using Zoom, as we know it works and Essex has a license which you can use. However, you may use other software as long as it records the screen, shows a video of you throughout (on one side of the picture), has sound, and produces an .mp4 file of similar size to that produced by Zoom for the same meeting.


If you prefer not to make a video, we can evaluate your work in the lab. However, this will need to be in the evening, at a time stated by us, and this method will inevitably involve a lot of waiting around, but we will do our best. We may record that as well, but it will not need to show your face. Our suggestion is to try making a test video with a camera in the next few days, and see how you get on. If it does not work out, then you can request a lab evaluation.


Table of Results


You should hand in tables of results, and a distribution of query types. The format of the tables will be given to you as empty tables in a .docx. The .docx may ask you a few other simple questions, rather like the lab sheets.


All real numbers in the table should be presented with 2 decimal places, e.g. 1.00, 0.45.
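The examples 1.00 and 0.45 each have two digits after the decimal point. In Python, one way to produce that form is the `.2f` format specifier; the helper name `fmt` is only illustrative.

```python
def fmt(value):
    """Format a number with exactly two digits after the decimal point."""
    return f"{value:.2f}"


print(fmt(1.0))    # 1.00
print(fmt(0.454))  # 0.45
print(fmt(2 / 3))  # 0.67
```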


What to Hand In


Gold standard in JSON


Video in .mp4 [or lab evaluation]


Table of results in .docx


Suppose your ID number is 2300000. The files you upload to FASER must be called:


2300000.json


2300000.mp4


2300000.docx


Marking Scheme


Lab 1, Lab 2, Lab 3, Class 1 and Class 2 count for 4% each towards the entire Module, i.e. 20% in total of the entire Module.


Assignment 1 counts as 40% of the overall module total. Half of this is the 20% for the 3 labs and 2 classes. Assignment 1 is marked out of 100. Therefore, relative to the assignment (which is marked out of 100), 50% is the labs and classes. Effectively 10% of the assignment mark is for each lab or class. Once the assignment mark is scaled to be 40% of the entire module, each of those 10% will become 4% of the entire module.


