CFP: MuST at NTCIR-7
NTCIR-7 conducts an innovative task, Multimodal Summarization for Trend Information (MuST), in which we investigate how to exploit "trends" as an abstract or summary of the information to be accessed. MuST aims to make cooperative use of linguistic and visual information in domains where textual and numerical information coexist. The purpose of the task includes encouraging cooperative and competitive research in order to establish technologies that support interactive and explorative information access by exploiting non-linguistic information organized together with linguistic information. Technologies developed in the task should be fundamental and general enough to be applicable to several settings, including those featured in other NTCIR tasks.
The task is twofold. The first part focuses on specific themes in the compilation of textual and numerical information for information access and the use of visual information in those settings, and develops evaluation criteria for those themes. The second part encourages a wide variety of research activities based on broad interests in this field.
The organizers sincerely appreciate your participation.
Information access has recently become more interactive and explorative. Information gathering is no longer achieved through a one-shot interaction in which a user describes his/her interest precisely and in return obtains just enough information relevant to it. Rather, it is a continuous process in which a user browses information gathered according to a vague interest and then seeks out the interesting parts whose details matter to him/her. Through this process, the user interactively focuses on some information and/or elaborates his/her interest. In addition, the information handled is not limited to text, but ranges over several media.
MuST considers "trends" to be the core of systems that support such information access, in which users browse outlines, elaborate their interests, and narrow down to the details. A trend is the first answer to the questions users become interested in, such as "How have things been going in the game machine industry since 2006?", "What changes have been seen in oil and gasoline prices this year?" and "How terrible were the typhoons last year?", and it can be a good starting point for an interactive and explorative information access process. The information composing a trend and the process of compiling a trend have several interesting features, such as the following.
- Obtaining trends requires compiling information that ranges over a given time period. Since that information contains a lot of redundancy, the compilation should be synthetic and well organized.
- Trends usually contain statistical data such as time-series data and geometrical data. Some statistics are more complicated and have additional dimensions along which they can be arranged.
- Not only factual information, such as reports of changes in statistical data, but also interpretations of those changes, analyses of their causes, and forecasts of their impact are important in trends.
- Information gathering and compilation of trends should cover not only several genres of textual information, such as newspaper articles and blog pages, but also many styles of information, such as numerical information in white papers.
MuST aims to support interactive and explorative information access, taking trends with these features as an example.
MuST has been conducted since 2005 as a pilot task of the NTCIR workshop. MuST at NTCIR-7 picks up some of the themes that many participants have been interested in there and organizes them into evaluation subtasks. It also welcomes research on a wide variety of themes based on the interests of the participants themselves. Participants are also encouraged to position their research themes in a wider view and to develop them into fundamental technologies for information compilation.
The task is twofold. The first part focuses on specific themes in the compilation of textual and numerical information and the use of visual information in those settings, and develops evaluation criteria for those themes. The second part encourages a wide variety of research activities based on broad interests in this field. In both, the MuST data set plays a major role. This data set was constructed through MuST as an NTCIR pilot task. It consists of 581 annotated articles on 27 topics from the Mainichi newspaper in 1998 and 1999. The topics were selected because their articles have typical features that can be summarized as trends. The annotations correspond to partial results of the syntactic and semantic analysis needed for information extraction and summarization.
The following are the specific subtasks for evaluation (hereafter, evaluation subtasks). The formal run of these evaluation subtasks uses Mainichi newspaper articles from 2000 as the data set. That is, articles from 1998 and 1999 are used for training and system development, and articles from 2000 are used for the evaluation. The run period is relatively long, around two weeks, which allows participants to tune their systems and/or to construct domain knowledge for the topics of the run. A visualization tool (the visualization platform) will be provided to participants so that they can visualize their results in a simple manner.
- Text to numerical data conversion (T2N), or extracting numerical data from texts
  - Input: A set of newspaper articles mentioning a specific statistic over a specific time period.
  - Task: Extracting all possible pieces of data for the given statistic from the input articles.
  - Output: A collection of pairs, each consisting of a time point and the value of the given statistic at that point.
  - Evaluation: Precision and recall as used in ordinary information extraction, and the similarity (for example, a small mean squared error) between the curve drawn from the extracted data and the real curve of the statistic (tentative).
- Numerical data to text conversion (N2T), or text generation from numerical data
  - Input: A set of data for a specific time-series statistic over a specific time period (a set of pairs, each consisting of a time point and the value of the statistic at that point).
  - Task: Generating a text that explains the trends and changes in the input statistical data.
  - Output: A text that explains the trends and changes in the input statistical data.
  - Evaluation: Subjective evaluation by human assessors, similar to that employed in the evaluation of text summarization.
- Alignment of textual information and time-series statistical data
  - Input: A set of newspaper articles mentioning a specific time-series statistic over a specific time period, and a set of data for that statistic over that time period (the input of T2N plus the input of N2T).
  - Task: Finding all possible pairs of a description in the given newspaper articles that mentions the given statistic and the corresponding data point in the given numerical data.
  - Output: Annotated newspaper articles and numerical data, in which the annotations represent the mutual correspondences between their parts.
  - Evaluation: Precision and recall based on correct data constructed by human assessors (tentative).
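As a rough illustration of the tentative evaluation for the text to numerical data conversion subtask, the sketch below scores a set of extracted (time point, value) pairs against a reference series. It is not the official scorer: the data representation, the function names, the matching rule (same time point, value within a small tolerance) and the restriction of the mean squared error to shared time points are all assumptions made for this example, and the sample values are invented.

```python
from typing import Dict, Tuple

# Assumed representation: time point (e.g. "1999-03") -> value of the statistic.
Series = Dict[str, float]


def precision_recall(extracted: Series, gold: Series,
                     tol: float = 1e-9) -> Tuple[float, float]:
    """Precision and recall of extracted pairs against the reference pairs.

    Assumed matching rule: an extracted pair is correct when the reference
    has the same time point and a value within `tol`.
    """
    correct = sum(
        1 for t, v in extracted.items()
        if t in gold and abs(gold[t] - v) <= tol
    )
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def mean_squared_error(extracted: Series, gold: Series) -> float:
    """Mean squared error over the time points present in both series.

    This is an assumed simplification of the curve comparison; it ignores
    points that appear in only one of the two series.
    """
    shared = sorted(set(extracted) & set(gold))
    if not shared:
        raise ValueError("no shared time points to compare")
    return sum((extracted[t] - gold[t]) ** 2 for t in shared) / len(shared)


if __name__ == "__main__":
    # Invented sample values, for illustration only.
    gold = {"1999-01": 100.0, "1999-02": 102.0,
            "1999-03": 101.0, "1999-04": 105.0}
    extracted = {"1999-01": 100.0, "1999-02": 103.0, "1999-04": 105.0}

    p, r = precision_recall(extracted, gold)
    print(f"precision={p:.2f} recall={r:.2f}")
    print(f"mse={mean_squared_error(extracted, gold):.3f}")
```

The two measures are kept separate on purpose: precision and recall penalize any value that is not exactly right, while the squared error distinguishes a near miss from a large one, which is closer to comparing the shapes of the two curves.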
A wide variety of research activities based on broad interests in this field (hereafter, free subtasks) are also welcome. They should use or refer to the MuST data set and be concerned with the compilation of textual and numerical information and the use of visual information in those settings. Participants engaging in the free subtasks may also use the visualization platform for their themes.
The MuST data set is provided to all participants. To use this data set, you must have the Mainichi Newspaper Full-Text Article Database CD-ROMs for 1998 and 1999. If you have not purchased that database, you should apply to the NTCIR project for approval to use it for research purposes when you register your participation in MuST. The evaluation subtasks also use the Mainichi Newspaper Full-Text Article Database CD-ROM for 2000 as a data set. During the workshop, you can use these data sets, including the Mainichi Newspaper articles, free of charge for research purposes.
Please visit the NTCIR-7 registration web page (http://research.nii.ac.jp/ntcir/cgi-bin/ntc7Registration.cgi) and fill in the information requested there. You are also expected to send the memorandum form on permission to use the NTCIR task participant test collection. For confirmation, the organizers would appreciate it if you also sent an e-mail to us (must-adminあcslab.kecl.ntt.co．jp) with the following information.
- the name of your organization
- the name of the representative of the team and his/her e-mail address
- subtasks you will participate in
- Dec 27, 2007 Registration deadline
- Jan 7, 2008 The first round table meeting
- Feb 2008 Dry run of evaluation subtasks
- Mar 2008 Midterm workshop, especially for the free subtasks
- Jun 2008 Formal run of the evaluation subtasks
- Aug 2008 The second round table meeting
- Sep 2008 Report submission deadline
- Dec 2008 NTCIR-7 workshop meeting
We are very sorry that most of the information on MuST is in Japanese. If you are interested in MuST at NTCIR-7 and do not understand Japanese, please send an e-mail to the organizers (must-adminあcslab.kecl.ntt.co．jp).