VisEx: Interactive Visual Exploration Task (e)
What is VisEx?
VisEx is a pilot task for establishing an evaluation framework for exploratory information access environments. It evaluates environments in which users interactively refine or elaborate their information needs and, through various activities, cumulatively collect the proper information. In such environments it is important to employ information visualization techniques for showing access results and to allow interaction with the visualized information. Allowing reformulation of queries and relevance feedback is also crucial. The purpose of VisEx is to evaluate such total information access environments.
In VisEx, the organizers prepare a common framework for exploratory information access environment systems (IAESs) and design experimental tasks. The participants submit the core of an IAES, which works within the common framework. Submitted IAESs are evaluated through laboratory experiments with human subjects, who are asked to pursue the experimental tasks in the given environments. Objective data such as elapsed time and the number of interactions are measured, and subjective data such as the (dis)satisfaction of subjects are also examined through questionnaires. The participants are expected to support the experiments, especially by explaining to subjects how to use the IAESs, though the main body of the experiments is conducted and managed by the organizers.
In this term, that is, in NTCIR-9, VisEx remains a kind of evaluation with human subjects. The genuine objective of VisEx, however, is to establish an efficient and effective methodology for the objective evaluation of interactive, exploratory information access environments. The results of the experiments in this term are expected to reveal relationships between the holistic evaluation of a total IAES and the benchmark evaluation of its components. They also provide material for modeling exploratory information access and its environments, such as logs of the behaviors of subjects using IAESs. The participants and organizers should discuss the details of the experiments together, such as what data should be measured for these purposes, and then share a vision of a new framework for evaluating exploratory information access environments.
An IAES with the architecture shown in the figure is postulated. With this IAES, users operate the IAES core and the text editor through a browser interface in order to tackle a given task. That is, they find the information they need using the IAES core and then record it using the editor, repeating this procedure interactively. The IAES core exploits an information retrieval (IR) engine and other function modules, such as those for displaying documents and constructing snippets, through a defined interface protocol. The role of the core is to obtain the user's information needs and send them to the IR engine, and to show the results in a form that lets the user understand them and pursue the task easily.
The organizers provide such an IAES, consisting of a sample IAES core and the framework, which in turn consists of an editor, an IR engine, and other function modules. The interface among these component modules is determined by the organizers through discussion with the participants. The sample IAES core is a baseline (reference) and resembles an ordinary web search engine interface: it shows the list of matched documents with their snippets and allows users to access a document itself by clicking a list item. The participants submit IAES cores that follow the given interface protocol and can be embedded in the IAES framework. The IAESs constructed from the submitted cores are evaluated in laboratory experiments with human subjects. In the experiments, the following experimental tasks are carried out by the subjects.
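To make the architecture concrete, the following is a minimal sketch of how a submitted core might sit between the framework's IR engine and the browser interface. The class and method names (`StubEngine.search`, `handle_query`) and the result format are illustrative assumptions; the actual interface protocol is to be determined by the organizers and participants.

```python
# A hypothetical sketch of an IAES core in the framework. All names and
# data shapes here are assumptions for illustration, not the real protocol.

from dataclasses import dataclass


@dataclass
class Hit:
    """One entry in the result list shown to the user."""
    doc_id: str
    snippet: str


class StubEngine:
    """Stand-in for the framework-provided IR engine."""

    def search(self, query, top_k):
        # A real engine would rank documents against the query;
        # this stub just fabricates top_k results.
        return [{"id": f"MAI-1998-{i:04d}", "snippet": f"...{query}..."}
                for i in range(top_k)]


class BaselineCore:
    """A minimal core, like the sample: query in, ranked list with snippets out."""

    def __init__(self, engine):
        self.engine = engine  # injected by the framework

    def handle_query(self, query, top_k=10):
        # Forward the user's information need to the IR engine, then
        # shape the results for display in the browser interface.
        results = self.engine.search(query, top_k)
        return [Hit(doc_id=r["id"], snippet=r["snippet"]) for r in results]


core = BaselineCore(StubEngine())
hits = core.handle_query("friendly fire NATO", top_k=3)
```

A visualization-oriented core would replace the flat `Hit` list with whatever structure its display component needs, while keeping the same engine-facing interface.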
The following two tasks are used in the experiments. An IAES core may be specialized for handling one task, though general-purpose IAESs that can handle both tasks are desirable. A participant may submit just one core specialized for one task and participate only in the experiment for that task.
- Event Collection Task
- The Event Collection Task uses the event-list questions of the NTCIR-7 ACLIA Task as the test set, which includes, for example, the following questions (information needs).
- Please tell me about incidents where NATO has recognized cases of friendly fire.
- Please tell me about airplane crashes that have happened in Asia.
- This task asks subjects to collect as many nuggets (event characteristics such as time and place) as possible in a given time period. It is also important that users carry out this activity comfortably, with little stress. The document set used is Mainichi newspaper articles (in Japanese) from 1998 to 2001.
- Trend Summarization Task
- The Trend Summarization Task concerns summarizing trends in time-series statistical information, such as the subjects of the NTCIR-5, 6, and 7 MuST workshops. The requested trends include not only the changes in a given statistic themselves but also their background and influence. The information needs can be expressed as follows.
- Please tell me a summary of the states of the cabinet approval rating from 1998 to 1999.
- Please tell me a summary of the changes of gasoline price from 1998 to 1999.
- This task asks subjects to collect as many nuggets as possible in a given time period; here the nuggets are the primitive pieces of information that constitute a requested summary. It is also important that users carry out this activity comfortably, with little stress. The document set used is Mainichi newspaper articles (in Japanese) from 1998 to 1999.
Both tasks have characteristics similar to those of the tasks adopted in the TREC Interactive Track, which asked subjects to achieve high aspect/instance recall in a given time period. In contrast to usual evaluation workshop tasks, the possible themes of the tasks, such as the questions and the names of statistics, are open to the participants in advance of the experiments, and participants can refer to them when designing and constructing IAES cores. The themes are nevertheless new to the subjects of the experiments, which should suffice for proper evaluation.
The experiments are planned to be conducted at a suitable place, gathering the IAESs and human subjects. The experiments are designed so that more than one subject uses more than one IAES for more than one theme of each task. The scale of the experiments, such as the number of subjects, will be determined taking the number of participants and submissions into account. The data planned to be measured include the elapsed time to achieve the task, recall (which must be a function of time), and the number of interactions, which can be logged as shown in the figure. Subjective measures concerning subjects' satisfaction and stress are also examined using questionnaires after the experiments.
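Recall as a function of time can be derived directly from a timestamped log of nugget collection. The sketch below assumes a simple `(timestamp_seconds, nugget_id)` event format, which is an illustrative assumption rather than the measurement format the organizers will specify.

```python
# Sketch: computing recall as a function of time from a nugget-collection
# log. The (timestamp, nugget_id) event format is an assumption.

def recall_over_time(events, relevant_nuggets):
    """Return [(t, recall)] where recall is the fraction of the relevant
    nuggets that the subject has collected by time t."""
    found = set()
    curve = []
    for t, nugget in sorted(events):
        if nugget in relevant_nuggets:
            found.add(nugget)  # duplicates do not raise recall
        curve.append((t, len(found) / len(relevant_nuggets)))
    return curve


# Hypothetical session: the subject records four nuggets, one a duplicate.
log = [(30, "n1"), (95, "n3"), (140, "n1"), (210, "n2")]
curve = recall_over_time(log, {"n1", "n2", "n3", "n4"})
# curve -> [(30, 0.25), (95, 0.5), (140, 0.5), (210, 0.75)]
```

Comparing such curves across IAESs shows not only the final recall but how quickly each environment lets subjects approach it.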
Needless to say, what matters most in evaluating interactive, exploratory IAESs is observing the detailed behaviors of the IAES cores submitted by the participants. These cores respond to and guide users' various information-seeking behaviors, such as querying, browsing, scanning, navigating, and selecting. Such micro-level observations of the interactions between users and environments should be related to macro-level measures such as recall and user satisfaction. This relation could lead to a model that estimates the performance of IAESs as a whole from the results of benchmark tests of their components, and may allow more efficient evaluation using simulation. The genuine objective of VisEx is to establish such an evaluation methodology. To achieve this goal, the participants are expected to join the discussion to share a taxonomy/ontology of the actions of both IAESs and their users, as common ground for micro-level evaluation. Submitted IAES cores should also have proper mechanisms that allow them to obtain as much log data for micro-level evaluation as possible.
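A core's logging mechanism for such micro-level evaluation could be as simple as appending timestamped, typed records. The action vocabulary below mirrors the behaviors named in the text (querying, browsing, scanning, navigating, selecting); the record format itself is an assumption, pending the shared taxonomy the discussion is meant to produce.

```python
# Sketch of micro-level interaction logging for an IAES core.
# The record format is an illustrative assumption.

import json
import time

ACTIONS = {"query", "browse", "scan", "navigate", "select"}


def log_event(log, action, detail, timestamp=None):
    """Append one timestamped interaction record to an in-memory log."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    log.append({"t": timestamp if timestamp is not None else time.time(),
                "action": action, "detail": detail})


events = []
log_event(events, "query", {"text": "friendly fire NATO"}, timestamp=0.0)
log_event(events, "select", {"doc_id": "MAI-1998-0101"}, timestamp=4.2)
print(json.dumps(events[0]))  # one JSON line per interaction
```

Restricting `action` to a shared vocabulary is what makes logs from different cores comparable; the `detail` payload can remain core-specific.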
VisEx emphasizes community-based activities such as discussion of these aspects, including the design of the IAES framework and the interfaces among its components. It is our hope that the organizers and participants will join together to make this workshop fruitful.
- Corpus of the answer nuggets for the ten questions used in the Event Collection Task, together with the newspaper articles that contain them. Named entities appearing in them are annotated with their type, such as date, place, and person, and with their thematic role in the event, such as agent and theme. This is the set of relevant documents and nuggets to be retrieved; it may also be used in designing visualization components. (Released in Oct. 2010)
- Corpus of the passages and newspaper articles that mention the time-series statistics the Trend Summarization Task deals with. The names of statistics, their values, dates, and so on are annotated. This is the set of relevant documents and passages to be retrieved, and it includes the nuggets constituting the requested summaries. The annotated information may also be useful for visualization design. (Released in Oct. 2010)
- Example summaries for the Trend Summarization Task constructed by humans. (Released in Mar. 2011)
- End of Oct. 2010 Participation Registration (First) Due
- End of Dec. 2010 Participation Registration Final Due
- End of Dec. 2010 IAES interface description release
- Latter part of Mar. 2011 Release of the Framework and Baseline IAES
- Latter part of Jul. 2011 Conduct Laboratory Experiments
- Latter part of Aug. 2011 Release Experiment Results
- Nov. 2011 Report Submission Final Due
- Dec. 2011 NTCIR-9 Workshop Meeting
- Please let the organizers know of your interest first.
- Then register your participation on the NTCIR-9 registration site.