Michael Jackson: The King of Pop at the Service of Wikispeedia

ADA 2023

Wikispeedia Dataset


Wikipedia has grown so large that users have found a way to pass the time with it: navigating from one article to another in as little time as possible. Of course, many of these articles are about people of all kinds, big and small, famous and not so famous.

Browsing through the celebrity articles in the game, a pattern emerges that raises questions about virtual representation and its impact on our understanding of the real world. It leads us to think about cultural diversity and how reality is reflected in the digital worlds we create. Follow us on this dive into the heart of Wikispeedia, where the data reveals a very specific facet of our society: the representation of ethnic groups, and the place each of them holds in general culture, remains unequal, even if attitudes are evolving and moving away from this outdated vision.


Ethnicity is the quality or fact of belonging to a population group or subgroup made up of people who share a common cultural origin or descent. Looking deeper into the dataset, we categorized people by ethnicity, that is, into broad groups of people sharing a common racial, national, tribal, religious, linguistic or cultural background.

Our analysis revealed a very unequal distribution of ethnic groups in the Wikispeedia game. We were struck by how many more articles there are about celebrities of Caucasian origin than about celebrities of other ethnicities. This uneven distribution is shown below.


Distribution of people's ethnicity within Wikispeedia

Number of people in each ethnic group within the game

Furthermore, we noticed a peculiarity in our dataset: the King of Pop, Michael Jackson, is not part of the game!

Data story issues

Being (a little) megalomaniac, he cannot help worrying about his absence from a game in which he could be the key to the players' victory. Let's help him find out which ethnicity he should highlight on his Wikipedia page to become that key. We start our analysis with the games in which the target was a celebrity. Then, given MJ's desire to be a major asset in a victory, we focus on the winning games, and a central question arises from our first analysis:

Does the ethnicity of the targeted celebrity influence the outcome of the game?

We first looked at how the games played are distributed according to the ethnicity of the targeted person. The majority of targeted celebrities are white, and, as presented below, the distribution is clearly unequal. Players are therefore more likely to target a white person than someone of another ethnicity. This can be explained on the one hand by a general culture oriented towards the West, and therefore towards Caucasian people, and on the other hand by the greater share of white people in the Wikispeedia game.


Most targeted people in winning paths

This chart shows all the people targeted more than 20 times in winning games (the distribution of ethnicity is much the same in losing games)

MJ offers us a naive analysis below, simply comparing victories and defeats for paths targeting white people and people of other ethnicities. Once again, this analysis shows us that there are many more paths targeting white people, but it does not allow us to draw any conclusions about the impact of ethnicity on the outcome of the game.


Distribution of victory and defeat depending on the ethnicity of the target

Result of the naïve analysis, considering only the source and the target of the pathway
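To make the limits of the naive comparison concrete, here is a minimal sketch of how such a per-group tally could be computed. The function name, the record layout and the toy records are assumptions made for illustration; they are not the Wikispeedia data, nor necessarily the code used for the figure above.

```python
from collections import defaultdict

def win_rate_by_group(games):
    """Tally games per target group.

    games: iterable of (target_group, won) pairs (hypothetical layout).
    Returns {group: (number_of_games, win_rate)}.
    """
    totals = defaultdict(int)
    wins = defaultdict(int)
    for group, won in games:
        totals[group] += 1
        wins[group] += int(won)
    return {group: (totals[group], wins[group] / totals[group]) for group in totals}

# Made-up toy records, NOT taken from the Wikispeedia dataset.
toy_games = [
    ("white", True), ("white", False), ("white", True), ("white", False),
    ("other", True), ("other", False),
]
naive_result = win_rate_by_group(toy_games)
print(naive_result)
```

In this toy example one group has twice as many paths as the other while both have the same win rate, which is why raw counts of victories and defeats alone cannot tell us anything about the effect of ethnicity.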

It's more complicated than that, Michael. We need to dig deeper into the analysis to see more clearly. Many factors need to be considered: the ethnicity of the target is not the only reason why people win or lose. We therefore opted for a more precise analysis of how hard it is to win a game depending on the ethnicity of the targeted personality, while also taking into account the difficulty of the path and the skill of the players.


Distribution of victory and defeat depending on the ethnicity of the target

Result of the analysis after matching paths on their propensity score to control for confounders
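As an illustration of what matching on a propensity score involves, the sketch below pairs each path from one group with the path from the other group that has the closest score. It assumes the propensity scores have already been estimated (for instance with a logistic regression on path difficulty) and uses a greedy one-to-one strategy with a caliper. The function name, the caliper value and the toy scores are all hypothetical, and the matching actually used for the figure above may differ.

```python
def match_on_propensity(treated, control, caliper=0.05):
    """Greedy 1-to-1 nearest-neighbour matching without replacement.

    treated, control: lists of (path_id, propensity_score) pairs.
    Returns (treated_id, control_id) pairs whose scores differ by at
    most `caliper`; treated paths with no close enough partner are dropped.
    """
    available = dict(control)  # control_id -> score, shrinks as paths get used
    pairs = []
    for t_id, t_score in sorted(treated, key=lambda item: item[1]):
        if not available:
            break
        # Closest remaining control path in terms of propensity score.
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs

# Made-up scores for illustration only.
toy_treated = [("t1", 0.30), ("t2", 0.62), ("t3", 0.95)]
toy_control = [("c1", 0.31), ("c2", 0.60), ("c3", 0.10), ("c4", 0.33)]
matched_pairs = match_on_propensity(toy_treated, toy_control)
print(matched_pairs)
```

Victories and defeats are then compared only within the matched pairs, so that the two groups are similar with respect to the confounders that went into the score. Greedy matching is simple but order dependent; an optimal matching would be a reasonable alternative.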

Where could this bias come from?

Understanding where the difference comes from may help MJ choose. Is Wikispeedia solely responsible for this bias, or is there a sociological explanation for this phenomenon? Perhaps players are more attracted to articles with a positive tone and succeed more often when targeting them. A sentiment analysis of the pages could help Michael Jackson decide.


Average sentiment of the People articles

The average positive, negative and neutral sentiment extracted from the Wikispeedia plain text for people in the same category

The sentiment analysis shows no difference between groups: Wikispeedia People pages are mostly neutral. But the bias may have another origin. We did not consider links in the previous analysis, and the number of links on the pages may well affect the games.

The whole Wikispeedia game is based on links: you click on a link to go from one page to another while trying to get closer to the target page. The number of links going in and out of each page is therefore a key characteristic affecting the result. To maximize the chance of reaching a target person, links to that person's page should appear on as many other pages as possible. So the bias could come from an unequal distribution of links to People pages across other pages. A quick analysis gives the following result.


Number of links toward People pages

For each person, we count the number of links on other pages that lead to his or her page.
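The count described in this caption can be sketched as follows. The representation of links as (source, target) pairs and the article names are assumptions made for illustration; the sketch also assumes that a page linking several times to the same person counts once and that self-links are ignored, which may differ from the exact convention used for the figure.

```python
from collections import Counter

def count_incoming_links(links, people):
    """Count, for each person, the distinct other pages that link to them.

    links: iterable of (source_article, target_article) pairs.
    people: set of article names that are People pages.
    """
    incoming = Counter({person: 0 for person in people})
    seen = set()
    for source, target in links:
        if target not in people or source == target:
            continue  # not a link to a person, or a self-link
        if (source, target) in seen:
            continue  # the same page linking twice is counted once
        seen.add((source, target))
        incoming[target] += 1
    return incoming

# Hypothetical article names, not taken from the dataset.
toy_links = [
    ("Page_1", "Person_A"), ("Page_2", "Person_A"), ("Page_2", "Person_A"),
    ("Person_A", "Person_A"), ("Page_1", "Page_2"),
]
in_links = count_incoming_links(toy_links, {"Person_A", "Person_B"})
print(in_links)
```

People with no incoming link keep a count of zero, which matters here: a page that no other page links to can hardly be reached by clicking.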

White people have more links on other pages leading to them than people of other ethnicities, so white people should be at an advantage. Yet we found earlier that there are more victories when the target belongs to another ethnic group.

"It doesn't matter if you're black or white"

- Michael Jackson


Surprisingly, it seems easier to win when the targeted person belongs to 'other ethnicities'. This bias does not seem to originate in Wikispeedia itself but rather in players' general knowledge. Indeed, the sentiment analysis shows no difference in the way pages are written, and the distribution of links is unequal in favor of white people. Although the bias in victories favors other ethnicities, people of other ethnicities remain under-represented in the game and have fewer links leading to their pages. Moreover, the causal analysis should be interpreted with care, as only a single matching based on the propensity score was performed, and some confounders, such as player level, are difficult to account for due to a lack of data. So, if MJ wants to be seen more often, being affiliated with the 'white' ethnicity seems to be the best choice. However, if he aims to help players win, he should choose to belong to another ethnicity.