Exercise 2: Literature Review

Much scholarly effort involves searching and synthesizing existing published works. How might we augment this sense-making process to better find, connect, summarize, and/or remember a collection of papers?

For this exercise, you will turn a dataset describing a collection of papers into an informative artifact useful for understanding and/or navigating that collection. Possibilities include a visualization or other visual summary, a search tool, or the results of an analysis of the structure or content of the collection. Whatever you do should, in theory, be repeatable: given a different collection, can you readily produce a similar artifact?

You may work alone or in teams of two. You should publish your work as a web page and submit its URL to Canvas by October 28, 11:59pm PDT.

Assignment

We will provide a dataset that contains bibliographic information for all the readings in this course. We encourage you to start by examining the data and the various fields included. We also encourage you to look at online services (such as the Semantic Scholar API and the details it can provide about a paper) to help seed your thinking about what is possible.

Next, choose one task or question you’d like to support to aid researchers engaged in a literature review. Interesting aspects might include paper content, figures/imagery, chronology, popularity, citation networks, and co-authorship networks. But do not try to incorporate too many such aspects! There is a large swath of possible literature review needs. For this assignment you should focus on a limited, well-scoped, yet compelling aid.

Finally, design and prototype a corresponding interface and/or analysis algorithm to process the citation data and produce a result. Publish your results on a website accessible to course staff and students.
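As one illustration of a small, repeatable analysis step, the sketch below builds a weighted co-authorship edge list. It assumes each entry carries a CSL JSON record under a `csl` key (described in the Data section below) with the standard CSL `author` array of `{ given, family }` names. The function names and sample records are hypothetical, and matching authors by exact name string is a simplification.

```javascript
// Format one CSL JSON name. Personal names usually carry `given` and `family`;
// institutional authors may carry `literal` instead.
function authorName(author) {
  if (author.literal) return author.literal;
  return [author.given, author.family].filter(Boolean).join(" ");
}

// Count how often each pair of authors appears on the same paper.
// Entries with a null `csl` or no author list contribute no edges.
function coauthorEdges(papers) {
  const weights = new Map();
  for (const paper of papers) {
    const authors = paper.csl?.author ?? [];
    // De-duplicate and sort so each pair has one canonical ordering.
    const names = [...new Set(authors.map(authorName).filter(Boolean))].sort();
    for (let i = 0; i < names.length; i++) {
      for (let j = i + 1; j < names.length; j++) {
        const key = JSON.stringify([names[i], names[j]]);
        weights.set(key, (weights.get(key) ?? 0) + 1);
      }
    }
  }
  return [...weights].map(([key, weight]) => {
    const [source, target] = JSON.parse(key);
    return { source, target, weight };
  });
}

// Hypothetical sample records, not taken from the course dataset.
const samplePapers = [
  { csl: { author: [{ given: "Ada", family: "Lee" }, { given: "Bo", family: "Kim" }] } },
  { csl: { author: [{ given: "Bo", family: "Kim" }, { given: "Ada", family: "Lee" }, { given: "Cy", family: "Roe" }] } },
  { csl: null },
];

console.log(coauthorEdges(samplePapers));
```

Because the function takes any array of entries in this shape, it can be rerun on a different collection without changes. The resulting `{ source, target, weight }` records are one plausible input for a node-link diagram or a further network analysis.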

Data

The data is in JSON format, and combines the results from two different services: DOI.org and the Semantic Scholar API.

Paper Data: cse599d-paper-data.json (8.8MB)

The data is an array of objects. Each object has four top-level keys:

doi - The Digital Object Identifier of the paper.
s2id - The Semantic Scholar identifier of the paper.
csl - A CSL JSON object of citation data from DOI.org.
s2data - A JSON object with data queried from Semantic Scholar.

Note that some of these values may be null, as a paper might appear in one index but not the other. The dataset excludes any readings that are not listed in either of these two systems.
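A first pass over the data might check how many entries each service covers and how the papers are spread over time. The sketch below is one way to do that while guarding against null values. It assumes the CSL record stores its publication date in the standard CSL JSON `issued["date-parts"]` field. The function names and sample records are hypothetical, and the contents of `s2data` are left uninspected because they depend on which fields were queried.

```javascript
// Summarize which services returned data for each entry.
function summarizeCoverage(papers) {
  const summary = { total: papers.length, both: 0, cslOnly: 0, s2Only: 0 };
  for (const { csl, s2data } of papers) {
    if (csl != null && s2data != null) summary.both += 1;
    else if (csl != null) summary.cslOnly += 1;
    else if (s2data != null) summary.s2Only += 1;
  }
  return summary;
}

// Read the publication year from a CSL JSON record, or return null.
// CSL JSON dates look like { "date-parts": [[year, month, day]] },
// and the parts may be numbers or strings.
function publicationYear(paper) {
  const year = paper.csl?.issued?.["date-parts"]?.[0]?.[0];
  return year == null ? null : Number(year);
}

// Count papers per year, sorted by year; entries without a year are skipped.
function countByYear(papers) {
  const counts = new Map();
  for (const paper of papers) {
    const year = publicationYear(paper);
    if (year === null) continue;
    counts.set(year, (counts.get(year) ?? 0) + 1);
  }
  return new Map([...counts].sort((a, b) => a[0] - b[0]));
}

// Hypothetical sample records, not taken from the course dataset.
const sampleData = [
  { doi: "10.0000/example.1", s2id: "s2-example-1", csl: { issued: { "date-parts": [[2016, 4, 1]] } }, s2data: {} },
  { doi: "10.0000/example.2", s2id: null, csl: { issued: { "date-parts": [[2011]] } }, s2data: null },
  { doi: null, s2id: "s2-example-3", csl: null, s2data: {} },
  { doi: "10.0000/example.4", s2id: "s2-example-4", csl: { issued: { "date-parts": [["2016"]] } }, s2data: {} },
];

console.log(summarizeCoverage(sampleData));
console.log(countByYear(sampleData));
```

To run this against the provided file in Node.js, you could replace `sampleData` with the result of `JSON.parse(fs.readFileSync("cse599d-paper-data.json", "utf8"))`, assuming the file has been downloaded to the working directory.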

The dataset was constructed by querying for the paper ids provided in this list of papers (an array of { doi, s2id } objects):

Paper List: cse599d-papers.json

The @uwdata/citation-query library includes a bin/retrieve-papers.js script that will reproduce the dataset above when given the paper list as input.

Other Data Sources and Tools

You are free to further augment the provided data. If you are so inclined, you are also free to construct an alternative dataset (for example, based on your own literature reviews or a bibliography from one of your own papers). If you build an alternative dataset, we still encourage you to also share results from the provided dataset, as this may help with comparison and discussion across the class.

There are multiple free services that provide citation data, including the two used to build the provided dataset: DOI.org and the Semantic Scholar API.

For usage examples in JavaScript, see this notebook on Querying Citation Data or the @uwdata/citation-query library.
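If you would rather query a service directly, the sketch below shows one possible shape for the requests. It relies on DOI.org content negotiation (asking for the CSL JSON media type) and on the Semantic Scholar Graph API paper endpoint. The helper names, the example DOI, and the chosen `fields` are illustrative assumptions rather than part of the course tooling, and `fetchJson` assumes a runtime with a global `fetch` (a modern browser or Node.js 18+).

```javascript
// Describe a DOI.org request. Sending a CSL JSON Accept header asks the
// resolver for citation metadata rather than the publisher's landing page.
function doiRequest(doi) {
  return {
    url: `https://doi.org/${encodeURI(doi)}`,
    headers: { Accept: "application/vnd.citationstyles.csl+json" },
  };
}

// Describe a Semantic Scholar Graph API request for a paper looked up by DOI.
// The default `fields` here are an illustrative choice.
function s2Request(doi, fields = ["title", "year", "citationCount"]) {
  return {
    url: `https://api.semanticscholar.org/graph/v1/paper/DOI:${encodeURI(doi)}?fields=${fields.join(",")}`,
    headers: {},
  };
}

// Perform a request described by one of the helpers above and parse the JSON body.
async function fetchJson({ url, headers }) {
  const response = await fetch(url, { headers });
  if (!response.ok) {
    throw new Error(`Request failed (${response.status}): ${url}`);
  }
  return response.json();
}

// Example usage (performs network requests, so it is left commented out):
// const csl = await fetchJson(doiRequest("10.1000/182"));
// const s2 = await fetchJson(s2Request("10.1000/182"));
```

Keeping request construction separate from fetching makes it easier to add rate limiting or caching later, which matters when querying a whole collection. DOIs containing unusual characters may need more careful encoding than `encodeURI` provides.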

Other tools can be handy for parsing/conversion of citation formats:

Pandoc (Command line, Haskell)
Citation.js (JavaScript)

Submission Details

Your artifact should be published on the web. Hopefully this will be straightforward if you produced a visual and/or interactive artifact. A link to an online computational notebook is fine. If you instead produced an algorithmic analysis, create a web page that documents your work and presents the results. Please include any usage instructions, caveats, etc. as notes on your published page.

Submit the URL to your artifact on Canvas. Your webpage and submitted URL are due on October 28 by 11:59pm PDT.