December 15, 2018 - EPFL, Lausanne

In the months following the 2016 American election, media outlets have reported an increase in social divisiveness and the polarization of opinions. Many of these organizations point to modern social media as a potential cause of this divide. Through an analysis of the wide range of social and political discussions available on Reddit, we seek to measure how this polarization of opinions has evolved over time, in order to confirm or disprove this hypothesis.


A Quick History of Reddit

Reddit is an online discussion and social news platform where content is created, shared and ranked by the community. Founded in June 2005, Reddit now has more than 500,000 monthly active users and a total of over 3 billion comments. According to Wikipedia, which references a study published by Alexa Internet, Reddit is the 3rd most visited website in the United States and the 6th worldwide.

We highlight two features of Reddit that are relevant to this study. First, the website uses explicit user feedback to evaluate and rank its content. This feedback takes the form of a vote (an upvote for positive, a downvote for negative) that any user can cast on any post or comment. These votes are then aggregated into a single score, defined as the number of upvotes minus the number of downvotes.
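As a minimal sketch of the score described above, the snippet below computes upvotes minus downvotes. The `Votes` container and the vote counts are hypothetical and only serve as an illustration. It also shows why the score alone says little about polarization: a comment nobody cares about and a heavily contested one can end up with the same value.

```python
from dataclasses import dataclass


@dataclass
class Votes:
    """Hypothetical container for the votes cast on one comment."""
    upvotes: int
    downvotes: int


def score(votes: Votes) -> int:
    """Score as described in the text: upvotes minus downvotes."""
    return votes.upvotes - votes.downvotes


# Made-up examples: two very different situations collapse to the same score.
ignored = Votes(upvotes=1, downvotes=1)
contested = Votes(upvotes=500, downvotes=500)
print(score(ignored), score(contested))
```

Both comments print a score of 0, even though only the second one splits its audience.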


Fun Fact - A negative score does not necessarily indicate a polarizing discussion. On the blessed day of November 12, 2017, the entire Reddit community came to an agreement to mass downvote a reply from EA Community to a user frustrated about their questionable revenue model on Star Wars Battlefront. At the time of this writing, the comment has a score of -667,815 points, the lowest on the entire platform.


Second, Reddit separates its content into numerous user-created communities (formally named subreddits). Each subreddit is a forum of its own, with a specific set of rules and informal codes that members adhere to. They range from very large communities discussing a broad range of subjects (for instance /r/politics) to smaller niche groups centered around a very specific topic (e.g. /r/BirdsTakingTheTrain).


High-Level Statistics


On the left, one can see the daily number of comments posted on Reddit over the years (averaged over a period of 60 days). On the right are plotted the 10 biggest subreddits in terms of total comments posted.

In the chart above, we see that 9 of the 10 biggest Reddit communities are related to either politics or sports. This highlights a potential bias in our data, as the set of Reddit users might not be representative of the general population in terms of interests or behaviors.
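The 60-day averaging mentioned in the figure description can be sketched as a simple rolling mean. This is only an illustration: the text does not say whether the window is trailing or centered, so a trailing window is assumed here, and the function name `rolling_mean` and the daily counts are made up.

```python
from collections import deque
from typing import Deque, Iterable, List


def rolling_mean(daily_counts: Iterable[int], window: int = 60) -> List[float]:
    """Trailing mean over the last `window` days (fewer days at the start)."""
    recent: Deque[int] = deque()
    total = 0
    smoothed: List[float] = []
    for count in daily_counts:
        recent.append(count)
        total += count
        if len(recent) > window:
            # Drop the oldest day once the window is full.
            total -= recent.popleft()
        smoothed.append(total / len(recent))
    return smoothed


# Made-up daily comment counts, smoothed with a 3-day window for brevity.
print(rolling_mean([10, 20, 30, 40], window=3))
```

With `window=60`, the same function would smooth out weekly and seasonal fluctuations in the daily comment counts.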

Finding the Right Metric

The initial part of our research centered on the notion of divisiveness itself. Given the available data, how can we measure whether a particular discussion is polarized? What signals that a topic is controversial?

In our research, we define polarization as the degree to which people's opinions on a topic are clustered around several extreme, divergent positions. In other words, it can be thought of as the lack of a healthy middle ground in the opinion space of a community. Similarly, a particular topic is seen as controversial if it easily creates such a polarization. For example, gun control in America is a controversial topic because people tend to argue either for a complete ban on guns or for total freedom, which leaves little room for compromise.

Having said that, we quickly realized that polarization itself is a complex, multi-faceted notion that would be hard to measure by itself. In order to capture it properly, we decided to keep 4 metrics, each measuring a different aspect or consequence of a controversial topic. For each metric, we present its distribution over the dataset and its average over 4 meta-categories of subreddits (compiled with the help of /r/ListOfSubreddits).





In the plot on the right, we already notice that politics seems to be the most polarizing category. It simultaneously exhibits the lowest level of agreement and positivity and the highest level of negativity and vulgarity.

A Look at Communities

With all our metrics properly defined, we are now equipped to explore the dataset. Let's first look at how the different communities of Reddit compare with regard to agreement and positivity.

Looking at the plot below, some things already stand out. /r/The_Donald, a subreddit dedicated to unconditional fans of Donald Trump, is by far the subreddit with the highest agreement of all. We see two possible explanations for this result. In the first scenario, this community truly is in constant agreement, such that very few comments ever disagree with the general opinion. In the second, such comments do exist but are systematically removed by the moderation team. Either way, this result supports an impression that Redditors visiting this community might have had: /r/The_Donald is a big echo chamber.


Positivity Versus Agreement Score


We can also point out the location of /r/worldnews, /r/news and /r/politics, all of them in the bottom left. Since all three are non-partisan communities, their overall low agreement both supports our hypothesis about /r/The_Donald and provides a good point of comparison. Moreover, we notice that discussions related to politics tend to be the least positive of all.

On the other side of the plot, we notice that /r/aww and /r/trees stand out as the most wholesome communities. This is not surprising: /r/aww is a community dedicated to posting cute pictures of animals, while /r/trees is a subreddit reserved for regular users of a special kind of "tree." We leave it to the reader to figure out exactly which kind.


Hint - Snoop Dogg used to be a big contributor to /r/trees, accumulating a respectable 649,567 Reddit Karma points on his main account /u/Here_Comes_The_King.

The Evolution of the Divide

As stated in the introduction, another one of our motivations in this study was to understand how polarization and divisiveness have evolved over time. Is our society truly more polarized today, as some of the media seem to think? If that is the case, we should be able to find some hints supporting this theory while looking at long-term trends on Reddit.

We started by analyzing how our agreement metric has evolved over time. Looking at the results below, we find a fairly clear upward trend since 2012, with a sharp increase around November 2016. Concretely, this means that the average Reddit user agrees more with the content they see on the platform today than they did 10 years ago. Many factors can contribute to this. For instance, Reddit might have tweaked its underlying ranking algorithm (which determines, for example, which posts make it to the front page). Whatever the cause, the average Reddit user is less and less exposed to opinions they disagree with, and might experience a greater feeling of consensus as a result.

The echo chamber argument is also strengthened by the results in the second plot. There, we can see that the increase in agreement is coupled with a 20% decrease in the average number of communities a user interacts with. Taken together, these results support one possible explanation: online communities have become more polarized as users increasingly interact with like-minded individuals.


The Echo Chamber Effect


In the first figure, one can see how the agreement factor, despite an initial dip between 2010 and 2012, has steadily increased since then. In the second plot, we see that the average number of subreddits in which a user participates has decreased by approximately 20% over the same period.
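The second plot relies on the average number of subreddits in which a user participates. A minimal sketch of how such a quantity could be computed is given below. It assumes a hypothetical record layout of one `(year, author, subreddit)` tuple per comment and a yearly granularity; neither is specified in the text, and the sample comments are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record layout: (year, author, subreddit), one per comment.
Comment = Tuple[int, str, str]


def mean_subreddits_per_user(comments: Iterable[Comment]) -> Dict[int, float]:
    """Average number of distinct subreddits per active user, for each year."""
    seen: Dict[int, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for year, author, subreddit in comments:
        # A set ignores repeated comments in the same subreddit.
        seen[year][author].add(subreddit)
    return {
        year: sum(len(subs) for subs in users.values()) / len(users)
        for year, users in sorted(seen.items())
    }


# Made-up comments from two users in two different years.
comments = [
    (2012, "alice", "politics"), (2012, "alice", "aww"), (2012, "alice", "news"),
    (2012, "bob", "aww"), (2012, "bob", "trees"),
    (2016, "alice", "politics"), (2016, "alice", "politics"),
    (2016, "bob", "aww"), (2016, "bob", "trees"), (2016, "bob", "news"),
]
result = mean_subreddits_per_user(comments)
change = (result[2016] - result[2012]) / result[2012]
print(result, f"{change:.0%}")
```

In this toy example the average drops from 2.5 to 2.0 subreddits per user, a relative decrease of 20%. Only users who commented in a given year are counted for that year.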

Next, we asked ourselves how these results correlate with the other metrics we defined, such as positivity and negativity. We saw that positivity on Reddit increased overall from 2008 to 2012 and then reached a fairly steady plateau. Over the same period, negativity decreased in an almost perfectly mirrored fashion.

Finally, we note that the initial dip in positivity coincides with the 2008 financial crisis, and that positivity seems to rise as the economy recovered. Establishing whether there is a significant correlation is beyond the scope of this study.


Positivity Versus Negativity Over Time


Here, we see that the positivity and negativity metrics mirror each other over time. After an initial decrease between 2006 and 2008, positivity rose until 2012 and has been fairly flat since then.

Conclusions

In conclusion, we have seen multiple ways to indirectly observe the effects of polarization, a notion that is hard to measure directly. Through these metrics, we found that Reddit users are more polarized today, in the sense that, on average, users take part in a smaller number of communities, and those communities exhibit a higher level of unanimity. This supports the hypothesis that modern social media and their inherent filter bubbles create an echo chamber effect.

Data Source: This entire study was based on a free, publicly available dataset containing over 3 billion Reddit comments ranging from December 2005 to March 2017. It was kindly aggregated by Reddit user /u/Stuck_In_the_Matrix, and more information can be found in a Reddit post he wrote about it. We thank him deeply for his contribution.

[1] Davidson et al. (2017). Automated Hate Speech Detection and the Problem of Offensive Language. Proceedings of the 11th International AAAI Conference on Web and Social Media.