Chapter 1 Introduction

This documentation describes the “news intensity” dashboard and API, which are tools for gauging the amount of relevant media coverage for any asset. Specifically, our news intensities (NI) measure the volume of news stories that are realistically pertinent to an asset, regardless of whether they mention that asset by name. For example, rare earths are used in making smartphones, so a story about rare-earth mining is potentially relevant to a company manufacturing smartphones, whether or not the company is mentioned in the story. Likewise, Peru is a leader in zinc mining, so a story about political turmoil in Peru may affect the prices of zinc futures, whether or not the story itself makes the connection.

The notion that asset prices are impacted by the nature and extent of media coverage (even when it does not contain new information) has found strong support in the finance literature (e.g. Huberman and Regev (2001), Engelberg and Parsons (2011), Tetlock (2011)). By capturing the amount of related media coverage for any asset, NI are expected to have incremental explanatory power for asset price movements.

The calculation of NI relies on multi-lingual semantic fingerprinting technology, which ingests text and outputs its “semantic fingerprint” in the form of a sparse 128x128 matrix of 0s and 1s. Each position in the matrix represents a topic, so the fingerprint of a text indicates which of the 16,384 possible topics the text is related to. Every day and for every supported language, we obtain the aggregate semantic fingerprint of online news stories from thousands of sources, which gives a count of relevant stories for each topic. These counts are then normalized so that the average normalized value across all topics is equal to 1. We also obtain semantic fingerprints for traded assets based on their textual descriptions; Ibriyamova et al. (2017) and Ibriyamova et al. (2018) show that such fingerprints have significant information content. The value of NI for any day/language/asset combination is the average of the normalized story counts across all topics related to the asset in question. An NI above 1 therefore indicates that the asset's topics received more coverage than the average topic on that day in that language.
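The calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the production implementation: the function names (`aggregate_counts`, `normalize_counts`, `news_intensity`) are hypothetical, fingerprints are represented as sets of topic indices rather than 128x128 matrices (assuming position `row * 128 + col` for a matrix cell), and the toy story data is made up for the example.

```python
GRID = 128
N_TOPICS = GRID * GRID  # 16,384 topic positions in a fingerprint


def aggregate_counts(story_fingerprints):
    """Count, for each topic, how many stories have that topic switched on.

    Each fingerprint is a set of topic indices (assumed encoding of the
    positions holding a 1 in the sparse 128x128 matrix).
    """
    counts = [0] * N_TOPICS
    for fingerprint in story_fingerprints:
        for topic in fingerprint:
            counts[topic] += 1
    return counts


def normalize_counts(topic_counts):
    """Scale per-topic counts so that their average across topics is 1."""
    total = sum(topic_counts)
    if total == 0:
        raise ValueError("no stories counted for this day/language")
    mean = total / len(topic_counts)
    return [count / mean for count in topic_counts]


def news_intensity(topic_counts, asset_topics):
    """Average the normalized counts over the topics in the asset fingerprint."""
    if len(asset_topics) == 0:
        raise ValueError("asset fingerprint is empty")
    normalized = normalize_counts(topic_counts)
    return sum(normalized[t] for t in asset_topics) / len(asset_topics)


# Toy data (hypothetical): three stories for one day and one language.
stories = [{0, 1, 2}, {1, 2}, {2, 5}]
counts = aggregate_counts(stories)

# Hypothetical asset whose description maps to topics 1 and 2.
ni = news_intensity(counts, {1, 2})
print(f"NI = {ni:.2f}")
```

Because so few topics are active in this toy data, the resulting NI is far above 1; with real news volumes spread over thousands of topics, values would be expected to lie much closer to 1. An asset whose fingerprint covered every topic would have an NI of exactly 1 by construction.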


Huberman, Gur, and Tomer Regev. 2001. “Contagious Speculation and a Cure for Cancer: A Nonevent That Made Stock Prices Soar.” The Journal of Finance 56 (1). Wiley Online Library: 387–96.

Engelberg, Joseph E, and Christopher A Parsons. 2011. “The Causal Impact of Media in Financial Markets.” The Journal of Finance 66 (1). Wiley Online Library: 67–97.

Tetlock, Paul C. 2011. “All the News That’s Fit to Reprint: Do Investors React to Stale Information?” The Review of Financial Studies 24 (5). Oxford University Press: 1481–1512.

Ibriyamova, Feriha, Samuel Kogan, Galla Salganik-Shoshan, and David Stolin. 2017. “Using Semantic Fingerprinting in Finance.” Applied Economics 49 (28). Taylor & Francis: 2719–35.

Ibriyamova, Feriha, Samuel Kogan, Galla Salganik-Shoshan, and David Stolin. 2018. “Predicting Stock Return Correlations with Brief Company Descriptions.” Applied Economics. Taylor & Francis, 1–15.