
Data Visualization

TL;DR: Data visualization project with D3 and React

Motivation: Climate change is a global issue that is disproportionately caused by certain countries while disproportionately impacting others. Taking immediate action to avoid climate disasters and humanitarian catastrophes is the 13th UN Sustainable Development Goal. To effectively combat rising temperatures, it is important to understand opinions on this issue on a worldwide scale, specifically the overall sentiment and stance of individual countries and how their opinions change over time. Though the main avenue of investigation so far has been general surveys with yes/no questions (see the Yale Climate Change Opinion project; Leiserowitz et al., 2020), there is less work on qualitatively analyzing textual content, which should in theory provide more nuanced information. Our aim is for everyone, regardless of background, to be able to explore and understand what factors influence their own country's opinion on climate change and to be aware of any biases or current trends. The visualization on this webpage reflects one of the best attempts so far at using natural language data from social media to tackle this problem!

Visualization explanations: This webpage includes a world map selector with a search feature for viewing a selection of statistics for any particular country, as well as a holistic map perspective that displays attributes such as sentiment, stance, and aggressiveness for all countries simultaneously. The first view allows you to filter for specific countries and attributes by using the dropdowns. The second view allows you to uncover broader trends, again by selecting any attribute with the dropdown. Both views allow you to step through the data over time with a time slider.

In our first view, a card with country data is shown alongside two charts displaying the world averages, including a line chart of the world average of the chosen attribute over time. We maximize readability by separating out these visualizations (as well as the time scale) and by doubly encoding the attribute with color.

In our second view, we encode geolocation data with spatial position, and encode the chosen attribute on a monochromatic scale (varying lightness within a single hue rather than varying hue, since these variables are quantitative; each attribute uses a different hue). Again, we separate out the time scale to allow the user to brush across specific periods.
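As a rough illustration of the monochromatic encoding described above, the sketch below maps a quantitative value to a color of fixed hue whose lightness varies with the value. It is written in Python for brevity and is not the project's D3 code. The function name `monochrome_color`, the hue table `ATTRIBUTE_HUES`, the lightness range, and the choice that larger values are darker are all assumptions made for illustration.

```python
import colorsys

# Hypothetical hue (in degrees) per attribute; the page only states that
# each attribute uses a different hue, not which one.
ATTRIBUTE_HUES = {"sentiment": 210, "stance": 120, "aggressiveness": 0}


def monochrome_color(value, lo, hi, hue_degrees):
    """Map value in [lo, hi] to a hex color of fixed hue.

    Assumption: larger values are drawn darker. Values outside the
    range are clamped to the nearest end.
    """
    if hi == lo:
        t = 0.0
    else:
        t = min(1.0, max(0.0, (value - lo) / (hi - lo)))
    # Lightness runs from 0.95 (near white) down to 0.25 (dark).
    lightness = 0.95 - 0.70 * t
    r, g, b = colorsys.hls_to_rgb(hue_degrees / 360.0, lightness, 0.65)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )
```

For example, `monochrome_color(0.8, 0.0, 1.0, ATTRIBUTE_HUES["sentiment"])` yields a darker blue than the same call with `0.2`, so the hue identifies the attribute while lightness carries the magnitude.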

Data Analysis: The climate change Twitter dataset is a collection of 14 million anonymized tweets gathered from 2006-2018 (Effrosynidis et al., 2022). The tweets were obtained by combining existing datasets that used Twitter's API (no longer free), and were identified using keywords such as “global warming”, “climate change”, “climate crisis”, etc. The authors obtained the actual text of the tweets by using a hydrator, and performed several natural language processing (NLP) tasks on the resulting data.

We focus on several key attributes resulting from this analysis: geolocation, sentiment, aggressiveness, and stance. For users who have enabled location sharing, a tweet's geolocation is represented by an x and y coordinate and can be assigned to a particular country. Sentiment (positive or negative), aggressiveness (aggressive or not), and stance (believer vs. neutral vs. denier) were automatically calculated via NLP and machine learning techniques. Processing on our end was done within a Google Colab notebook with the pandas Python library, where only geolocated tweets were subsampled from the original dataset and coordinates were mapped to countries. The sampled data was then converted into a month-separated JSON file for efficient dictionary lookup. This is the data that runs the current visualization.

Google Colab Link
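The processing steps above (keep geolocated tweets, assign each to a country, group by month into a lookup dictionary) can be sketched roughly as follows. This is not the notebook's code: it uses only the Python standard library instead of pandas so that it runs without dependencies, and the field names (`created_at`, `lng`, `lat`, `sentiment`, `aggressive`), the numeric sentiment score, the coarse bounding boxes in `COUNTRY_BOXES`, and the output layout are all assumptions made for illustration.

```python
import json
from collections import defaultdict

# Hypothetical, very coarse bounding boxes: (min_lng, min_lat, max_lng, max_lat).
# A real pipeline would use country polygons or a reverse geocoder.
COUNTRY_BOXES = {
    "Australia": (113.0, -44.0, 154.0, -10.0),
    "Iceland": (-25.0, 63.0, -13.0, 67.0),
}


def country_for(lng, lat):
    """Return the first country whose box contains the point, else None."""
    for name, (min_lng, min_lat, max_lng, max_lat) in COUNTRY_BOXES.items():
        if min_lng <= lng <= max_lng and min_lat <= lat <= max_lat:
            return name
    return None


def build_monthly_lookup(tweets):
    """Group geolocated tweets into {"YYYY-MM": {country: summary}}."""
    sums = defaultdict(
        lambda: defaultdict(lambda: {"n": 0, "sentiment": 0.0, "aggressive": 0})
    )
    for t in tweets:
        if t.get("lng") is None or t.get("lat") is None:
            continue  # keep only geolocated tweets
        country = country_for(t["lng"], t["lat"])
        if country is None:
            continue  # point falls outside every known country
        month = t["created_at"][:7]  # "2015-03-14" -> "2015-03"
        cell = sums[month][country]
        cell["n"] += 1
        cell["sentiment"] += t["sentiment"]
        cell["aggressive"] += 1 if t["aggressive"] else 0
    return {
        month: {
            country: {
                "count": v["n"],
                "sentiment": v["sentiment"] / v["n"],
                "aggressive_share": v["aggressive"] / v["n"],
            }
            for country, v in countries.items()
        }
        for month, countries in sums.items()
    }


def save_lookup(lookup, path):
    """Write the month-keyed dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(lookup, f)
```

Keying the output by month means the front end can fetch everything needed for one slider position with a single dictionary lookup, rather than filtering the full tweet list on every interaction.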

Task Analysis: We focus on the following three tasks (Munzner 2014), which can be investigated through our visualization:

1. Are there regional patterns in climate change opinion? In particular, across which continents, hemispheres, and latitude lines? This task involves discovering patterns by browsing over the geographical and temporal search space, while comparing various regions to one another.
2. What is the distribution of climate change sentiment across the globe? Are there more believers or deniers overall? Are tweets generally passive, or more aggressive? This task involves summarizing data across all regions, while also browsing over individual time slices.
3. How has climate change opinion changed over time? This task again involves discovering patterns by browsing over monthly period data, while comparing various regions to one another.
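The summarizing and over-time tasks both reduce to aggregating per-country values within each monthly slice. A minimal sketch of that aggregation is given below, assuming a month-keyed dictionary in which each country entry carries a tweet `count` and a mean value per attribute. That layout and the function name `world_average_by_month` are hypothetical, not taken from the project's code.

```python
def world_average_by_month(lookup, attribute):
    """Return [(month, world mean of attribute)] sorted by month.

    Each country's mean is weighted by its tweet count, so the result
    is the mean over all tweets in that month rather than over countries.
    """
    series = []
    for month in sorted(lookup):
        countries = lookup[month].values()
        total = sum(c["count"] for c in countries)
        if total == 0:
            continue  # no tweets in this slice
        weighted = sum(c[attribute] * c["count"] for c in countries)
        series.append((month, weighted / total))
    return series
```

The resulting list of (month, value) pairs is the kind of series a world-average line chart could plot, and comparing one country's entry against it in a given month supports the regional comparison task.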

Design Process: We drafted our initial sketches of the visualization with the above tasks in mind. From the beginning, we envisioned a world map with data points divided over countries, where individual attributes of the data could be selected. We realized this high-level summary was difficult to interpret on its own, so we brainstormed ways to zoom in and highlight the properties of individual countries (Shneiderman 1996), settling on the current iteration of our first visualization, including dropdown selection and alternate perspectives across different time slices.

Conclusion: Overall, this visualization accomplishes our main goals and supports our main list of tasks. It enables a key set of data analysis tasks by filtering tweet data across content and across time. Following the ICE-T visualization value paradigm (Wall et al. 2018), we optimize user tasks by allowing users to pinpoint selections with efficient menu interactions, allow the user to extract a multitude of patterns from multiple separate views, and provide a high-level data overview with map aggregation. Future work should address the last heuristic, confidence, by allowing the user to review particular instances of data, for example by including a viewer for particular tweets.

Until this work is implemented, we acknowledge that while interesting trends can indeed be quickly identified using this visualization, the techniques involved in the original paper are not perfect, and some measures, having been automatically calculated, may not be completely accurate. We encourage readers to explore this visualization deeply to start thinking about regional biases, but also to exercise reasonable caution when drawing conclusions.

Acknowledgements: React, D3, Python (pandas)

Cleveland, W. S., & McGill, R. (1984). Graphical perception: Theory, experimentation, and application to the development of graphical methods. Journal of the American Statistical Association, 79(387), 531-554.

Effrosynidis, D., Karasakalidis, A. I., Sylaios, G., & Arampatzis, A. (2022). The climate change Twitter dataset. Expert Systems with Applications, 204, 117541.

Leiserowitz, A., Maibach, E., Rosenthal, S., Kotcher, J., Bergquist, P., Ballew, M., Goldberg, M., Gustafson, A., & Wang, X. (2020). Climate Change in the American Mind: April 2020. Yale University and George Mason University. New Haven, CT: Yale Program on Climate Change Communication.

Shneiderman, B. (1996, September). The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings 1996 IEEE Symposium on Visual Languages (pp. 336-343). IEEE.

Wall, E., Agnihotri, M., Matzen, L., Divis, K., Haass, M., Endert, A., & Stasko, J. (2018). A heuristic approach to value-driven evaluation of visualizations. IEEE Transactions on Visualization and Computer Graphics, 25(1), 491-500.