split command. I used a lot of code posted on the Python-Graph-Gallery site (❤️) to draw the plots.
This is the smallest dataset: Twitter has published 770 users potentially connected to Iranian propaganda (much of the information, such as userid, user_display_name and user_screen_name, is hashed by Twitter's choice).
Looking at the plots, the Iranian propaganda seems to have been very intense in 2017 and 2018, since over 50% of the users were created in those two years. Most accounts have a description, and the word “journalist” is common in them. Over half of the users have fewer than 100 followers and follow fewer than 100 accounts.
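Percentages like these can be computed with a few lines of standard-library Python. This is a minimal sketch, not the code behind the plots: it assumes the users CSV has columns named `account_creation_date` (starting with a `YYYY` year), `user_profile_description`, `follower_count` and `following_count`, so check the header of your file first. The two sample rows are invented only to show the input format.

```python
import csv
import io
from collections import Counter


def user_summary(rows):
    """Share of users per creation year, with a description, and with <100 followers/following.

    Assumed column names (check the CSV header): account_creation_date,
    user_profile_description, follower_count, following_count.
    """
    total = len(rows)
    years = Counter(row["account_creation_date"][:4] for row in rows)
    with_description = sum(1 for row in rows if row["user_profile_description"].strip())
    few_followers = sum(1 for row in rows if int(row["follower_count"]) < 100)
    few_following = sum(1 for row in rows if int(row["following_count"]) < 100)
    return {
        "year_share": {year: count / total for year, count in sorted(years.items())},
        "description_share": with_description / total,
        "under_100_followers": few_followers / total,
        "under_100_following": few_following / total,
    }


# Invented two-user sample, only to illustrate the expected input format.
sample = io.StringIO(
    "userid,account_creation_date,user_profile_description,follower_count,following_count\n"
    "u1,2017-03-01,journalist,50,20\n"
    "u2,2018-06-11,,300,80\n"
)
summary = user_summary(list(csv.DictReader(sample)))
print(summary)
```

To run it on the real data, replace `sample` with `open(path, newline="", encoding="utf-8")`; the path and the UTF-8 encoding are assumptions.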
INTERNET RESEARCH AGENCY DATASET
According to Twitter, there are 3,841 IRA users, almost five times the number of Iranian users. In the file ira_users_csv_hashed.csv I count only 3,836 users (3,837 rows).
Another difference from the Iranian users is the creation date: IRA users were created mainly in 2013 and 2014, two years before the 2016 US elections. Curiously, only two new accounts were created in 2018. Most IRA accounts also have a description (73%), a share similar to the Iranian one.
These 3,836 users tweeted 9,007,377 (nine million) times from 2012 to 2018. In 2015 alone there were 3,132,628 (three million) tweets, 45% of them retweets: almost three times the number of Iranian tweets from 2012 to 2018.
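To reconcile the file with Twitter's figure, it helps to count CSV records and distinct `userid` values separately, because a plain line count also includes the header row (and, if done naively, any line breaks inside quoted descriptions). A minimal sketch, assuming a `userid` column; on the real file you would pass `open("ira_users_csv_hashed.csv", newline="", encoding="utf-8")`, where the UTF-8 encoding is an assumption. The sample is invented.

```python
import csv
import io


def count_rows_and_users(stream, id_column="userid"):
    """Return (rows including header, data records, distinct ids) for a CSV stream.

    csv.reader treats a quoted field that contains newlines (for example a
    long profile description) as part of a single record.
    """
    reader = csv.reader(stream)
    header = next(reader)
    id_index = header.index(id_column)
    records = 0
    ids = set()
    for row in reader:
        records += 1
        ids.add(row[id_index])
    return records + 1, records, len(ids)


# Invented sample: three records, one of them a repeated userid.
sample = io.StringIO("userid,follower_count\nu1,10\nu2,5\nu1,7\n")
print(count_rows_and_users(sample))  # (4, 3, 2)
```

If the second and third numbers match, every record is a distinct user and the extra row is just the header.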
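Because the IRA tweet files take several gigabytes, a streaming pass with `csv.DictReader` keeps memory use flat while counting tweets per year and the share of retweets. A minimal sketch, assuming columns named `tweet_time` and `is_retweet` with the value formats noted in the docstring; verify them against the actual header. The sample rows are invented.

```python
import csv
import io
from collections import Counter


def yearly_tweet_stats(stream):
    """Count tweets per year and the share of retweets, reading one row at a time.

    Assumptions: tweet_time starts with the year (YYYY-...), and is_retweet
    holds "True"/"False" strings.
    """
    tweets = Counter()
    retweets = Counter()
    for row in csv.DictReader(stream):
        year = row["tweet_time"][:4]
        tweets[year] += 1
        if row["is_retweet"].strip().lower() == "true":
            retweets[year] += 1
    # Counter returns 0 for years without retweets, so no special case is needed.
    return {year: (tweets[year], retweets[year] / tweets[year]) for year in sorted(tweets)}


sample = io.StringIO(
    "tweetid,tweet_time,is_retweet\n"
    "1,2014-12-31 23:59,False\n"
    "2,2015-01-02 10:00,True\n"
    "3,2015-05-20 08:30,False\n"
)
print(yearly_tweet_stats(sample))
# {'2014': (1, 0.0), '2015': (2, 0.5)}
```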
There is another important difference between the two datasets. The Iranian propaganda machine is “classic”: both the number of accounts and their effects (interactions) grow in 2017 and 2018. The IRA accounts, instead, were mostly created from 2013 to 2014 (2,639, 68%); 2015 is the year with the highest activity, with more than three million tweets, but the years with the most interactions are 2016 and 2017. In those two years their tweets received 30,059,381 (thirty million) hearts, 25,615,627 (twenty-five million) retweets and 1,962,259 (almost two million) replies. One hypothesis is that the IRA propaganda machine improved over the years, reaching more people with fewer tweets. This is only an unconfirmed hypothesis; there may also be other causes.
The most common language in the tweets is Russian, followed by English. Among the other European languages, the most used are German, Bulgarian, Italian and Spanish.
Another difference from the Iranian tweets is the user-agent plot: the IRA accounts used many different applications to post their tweets. The most common user-agent is Twitter Web Client (28%), followed by twitterfeed (16%) and TweetDeck (6%).
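The “more reach with fewer tweets” hypothesis can be checked roughly by dividing interactions by the number of tweets per year; the same counting gives the language and user-agent shares. A minimal sketch, assuming the columns `tweet_time`, `like_count`, `retweet_count`, `reply_count`, `tweet_language` and `tweet_client_name`. `rows` can be a list of dicts or a `csv.DictReader`; the example rows are invented.

```python
from collections import Counter


def interactions_per_tweet(rows):
    """Average likes + retweets + replies per tweet, grouped by the year the tweet was posted.

    Assumed columns: tweet_time, like_count, retweet_count, reply_count.
    Empty counts are treated as zero.
    """
    tweets = Counter()
    interactions = Counter()
    for row in rows:
        year = row["tweet_time"][:4]
        tweets[year] += 1
        for column in ("like_count", "retweet_count", "reply_count"):
            interactions[year] += int(row[column] or 0)
    return {year: interactions[year] / tweets[year] for year in sorted(tweets)}


def top_shares(rows, column, n=3):
    """Return the n most frequent values of a column with their share of all rows."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return [(value, count / total) for value, count in counts.most_common(n)]


rows = [
    {"tweet_time": "2015-01-01", "like_count": "0", "retweet_count": "1", "reply_count": "0",
     "tweet_language": "ru", "tweet_client_name": "twitterfeed"},
    {"tweet_time": "2015-02-01", "like_count": "1", "retweet_count": "0", "reply_count": "",
     "tweet_language": "ru", "tweet_client_name": "Twitter Web Client"},
    {"tweet_time": "2016-05-01", "like_count": "10", "retweet_count": "5", "reply_count": "1",
     "tweet_language": "en", "tweet_client_name": "Twitter Web Client"},
]
print(interactions_per_tweet(rows))  # {'2015': 1.0, '2016': 16.0}
print(top_shares(rows, "tweet_client_name", n=2))
```

With a `csv.DictReader`, each function consumes the reader, so each one needs its own pass over the file.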
MY TWO CENTS
In my opinion, the two Twitter datasets are not so good, for two main reasons:
- Why are these users considered potential propaganda accounts? How did Twitter choose them? There may be various reasons why Twitter does not make this information public, but it limits what researchers can analyze.
- Most screen names are hashed and Twitter has not shared information about the followers, so it is not possible to build a network of these propaganda accounts.
Looking at the plots, from 2016 onward the favorites on propaganda tweets increase sharply and exceed the number of retweets. This happens in both the IRA and the Iranian propaganda. There may be many reasons for this, but I find it curious that at the end of 2015 Twitter replaced the “star” symbol with the “heart” symbol, saying that the change had quickly increased user interactions.
On this topic, I suggest an article by Wired, “Twitter’s Dated Data Dump Doesn’t Tell Us About Future Meddling”, and an analysis by Ilja, “Election Hacking: Exploring 10 Million Tweets from the Russian Internet Research Agency Dataset, Pt. 1 – Approaching 5.3GB in R”.
I hope to have time to write part II: I would like to analyze the word frequency in the descriptions and tweets, the most frequent hashtags and some other statistics.
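Whether favorites overtake retweets from 2016 can be checked year by year with a small aggregation. A minimal sketch, assuming `tweet_time`, `like_count` and `retweet_count` columns and one like/retweet total per tweet, as the column names suggest. Because of that, it groups totals by posting year rather than by the moment each like or retweet happened. The sample rows are invented.

```python
from collections import Counter


def years_likes_exceed_retweets(rows):
    """Return the years in which total likes are greater than total retweets.

    Counts are attributed to the year the tweet was posted (tweet_time),
    not to the moment each like or retweet happened. Assumed columns:
    tweet_time, like_count, retweet_count. Empty counts are treated as zero.
    """
    likes = Counter()
    retweets = Counter()
    for row in rows:
        year = row["tweet_time"][:4]
        likes[year] += int(row["like_count"] or 0)
        retweets[year] += int(row["retweet_count"] or 0)
    return [year for year in sorted(likes) if likes[year] > retweets[year]]


rows = [
    {"tweet_time": "2015-03-01", "like_count": "1", "retweet_count": "3"},
    {"tweet_time": "2016-07-01", "like_count": "10", "retweet_count": "5"},
    {"tweet_time": "2017-01-01", "like_count": "", "retweet_count": "2"},
]
print(years_likes_exceed_retweets(rows))  # ['2016']
```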
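Hashtag and word frequencies of this kind can be counted with regular expressions and `collections.Counter`. A minimal sketch, not the analysis planned for part II: hashtags are matched as `#` followed by word characters and lowercased, and words are runs of Unicode letters of at least three characters, so Cyrillic text works too. Word counts also include the text of hashtags, and there is no stop-word removal, which a real analysis would need.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")   # '#' followed by letters, digits or '_'
WORD = re.compile(r"[^\W\d_]+")   # runs of Unicode letters only


def top_hashtags(texts, n=10):
    """Most frequent hashtags (case-insensitive) across an iterable of strings."""
    counts = Counter(tag.lower() for text in texts for tag in HASHTAG.findall(text))
    return counts.most_common(n)


def top_words(texts, n=10, min_length=3):
    """Most frequent words of at least min_length letters (no stop-word removal)."""
    counts = Counter(
        word.lower()
        for text in texts
        for word in WORD.findall(text)
        if len(word) >= min_length
    )
    return counts.most_common(n)


tweets = ["Yellow vests in #Paris #GiletsJaunes", "#giletsjaunes again in Paris"]
print(top_hashtags(tweets, n=2))  # [('giletsjaunes', 2), ('paris', 1)]
```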
These days in France there have been many demonstrations by the yellow vests. Of course, a hashtag was born on Twitter to talk about them:
#GiletsJaunes. To better understand how propaganda can affect Twitter, I suggest three tweets:
For 3 days, I’m capturing the English tweets with the #GiletsJaunes hashtag. Based on 34K tweets, I made this cool retweet interaction graph.
Let see what we can find 🔽🔽🔽 pic.twitter.com/2ValzktKoX
— Elliot Alderson (@fs0c131y) 7 December 2018
— x0rz (@x0rz) 6 December 2018
#GilletsJaunes Social Media Monitoring update: between 11/17 and 12/08 we’ve seen 6.505.389 tweets with @thefool_it.
Secondary HT are #Macron, #Paris, #24novembre, #1erDecembre.
More information on included images. RT welcome, pls ask for permission bf publication. pic.twitter.com/khdFgEGf9p
— Matteo G.P. Flora (@lastknight) 9 December 2018