Torngat1
From CISTI-ICIST LAB WIKI
Project Torngat
Principal Investigator: Glen Newton, glen.newton@gmail.com
Ungava2 subproject. The goal of this work is to create semantic journal maps that support the user search experience in a large-scale digital library of scientific, technical and medical (STM) journal articles. By projecting article search results onto a semantic map, we seek to visualize and contextualize query results and to offer interactive tools with which users can refine queries and discover related articles.
This initial work aims to find a technique that can scale to tens of millions of terms. The prototype (described in the paper below; it requires Java in the browser) shows how LuSql, Lucene, Semantic Vectors and R's MDS are used to create a 'Map of Science' from the full text only (no metadata is used) of 5,733,721 articles from 2,231 journals.
Note that the application has been improved since the paper was written, so there are some small but noticeable differences between the prototype and the paper.
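The last step of the pipeline above, turning high-dimensional journal vectors into a two-dimensional map, can be illustrated with a minimal sketch. It uses Python with NumPy rather than R, and implements classical (Torgerson) MDS, which is the method behind R's cmdscale; this page does not say which R routine the prototype used, so treat that choice as an assumption. The journal names and four-dimensional vectors are invented toy data standing in for the vectors that Semantic Vectors would produce.

```python
import numpy as np


def cosine_distance_matrix(vectors):
    """Pairwise cosine distances (1 - cosine similarity) between the rows."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    similarity = np.clip(unit @ unit.T, -1.0, 1.0)
    return 1.0 - similarity


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances
    approximate the entries of the distance matrix `dist`."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the largest eigenvalues; clip small negatives caused by
    # non-Euclidean input distances.
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical toy data: two chemistry-like and two physics-like journals.
journal_names = ["Chem A", "Chem B", "Phys A", "Phys B"]
journal_vectors = np.array([
    [1.0, 0.1, 0.0, 0.0],
    [0.9, 0.2, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.1],
    [0.0, 0.1, 0.9, 0.2],
])

coords = classical_mds(cosine_distance_matrix(journal_vectors))
for name, (x, y) in zip(journal_names, coords):
    print(f"{name}: ({x:.3f}, {y:.3f})")
```

In this toy case the two chemistry-like journals land close together and far from the physics-like pair, which is the kind of structure the map is meant to expose. An eigendecomposition of a dense distance matrix is practical for a few thousand journals but would not scale directly to millions of articles.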
Plan
- Find and validate a method that can scale to large numbers of terms, and build a prototype visualization (completed)
- Evaluate the above method at the subject category level
- Validate its usefulness in a search context by projecting article search results onto the semantic journal mapping space, and create tools to support discovery (such as finding articles close to a given set of articles in semantic journal space)
- Evaluate additional use cases such as:
  - Given an arbitrary uploaded manuscript, project its location onto the semantic journal space and show the surrounding journals and articles. This is useful for finding additional citations and candidate journals for submission.
  - Extend the semantic journal space to represent time, and visualize journals' relative movements through the space over time
  - Visualize in three dimensions, which may reveal structure that is not visible in two dimensions
  - Given a series of terms, display their locations in the semantic journal space and show similar terms
- Find and validate a method that can scale to both large numbers of terms and large numbers of items, i.e. one able to create an article-level semantic space.
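Several of the plan items above come down to the same operation: take a vector for a new item (a search result, an uploaded manuscript, a term) and locate it relative to the journals. The following is a minimal sketch of one way this could be done, not the project's method: it ranks journals by cosine similarity in the high-dimensional space and places the item at the similarity-weighted average of its nearest journals' map coordinates. The function names, the weighting scheme, and all data are hypothetical.

```python
import numpy as np


def nearest_journals(query_vec, journal_vectors, k=3):
    """Return the indices and cosine similarities of the k journals
    most similar to the query vector."""
    unit_j = journal_vectors / np.linalg.norm(journal_vectors, axis=1, keepdims=True)
    unit_q = query_vec / np.linalg.norm(query_vec)
    sims = unit_j @ unit_q
    top = np.argsort(sims)[::-1][:k]
    return top, sims[top]


def place_on_map(query_vec, journal_vectors, journal_coords, k=3):
    """Assumed placement rule: similarity-weighted average of the map
    coordinates of the k nearest journals."""
    top, sims = nearest_journals(query_vec, journal_vectors, k)
    weights = np.clip(sims, 0.0, None)
    if weights.sum() == 0.0:
        # No positive similarity: fall back to an unweighted average.
        return journal_coords[top].mean(axis=0)
    return (weights[:, None] * journal_coords[top]).sum(axis=0) / weights.sum()


# Hypothetical toy data: journal vectors and their existing 2-D map positions.
journal_names = ["Chem A", "Chem B", "Phys A", "Phys B"]
journal_vectors = np.array([
    [1.0, 0.1, 0.0, 0.0],
    [0.9, 0.2, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.1],
    [0.0, 0.1, 0.9, 0.2],
])
journal_coords = np.array([
    [-0.5, 0.0],
    [-0.5, 0.1],
    [0.5, 0.0],
    [0.5, 0.1],
])

# A made-up manuscript vector that is mostly chemistry-like.
manuscript_vec = np.array([0.8, 0.3, 0.1, 0.0])

top, sims = nearest_journals(manuscript_vec, journal_vectors, k=2)
position = place_on_map(manuscript_vec, journal_vectors, journal_coords, k=2)
print("Nearest journals:", [journal_names[i] for i in top])
print("Map position:", position)
```

The ranked journal list corresponds to the "possible journals for submission" use case, and the same nearest-neighbour call applied to article vectors would give the surrounding articles. Averaging neighbour coordinates avoids recomputing the whole map for each new item, at the cost of only approximating where a full re-run of MDS would place it.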
Partners
- Alison Callahan, NRC summer student, starting a PhD at Carleton University in September 2009.
- Michel Dumontier, Assistant Professor, Department of Biology and School of Computer Science, Carleton University.
Publications
- Newton, G., A. Callahan & M. Dumontier. 2009. Semantic Journal Mapping for Search Visualization in a Large Scale Article Digital Library. Second Workshop on Very Large Digital Libraries at the European Conference on Digital Libraries (ECDL) 2009. Preprint (PDF)

