About Iran and IRA Twitter datasets (for fun) – Part III

This is the third and last post about the Internet Research Agency dataset, which Twitter shared in October 2018. In Part II, I focused on the European situation, especially Germany, Italy, France and Spain, to understand whether the Russian government might have tried to spread disinformation there as it did in the US. In this post I want to focus on Italy and answer the question: has the Internet Research Agency tried to manipulate information in Italy?

In Part I, I showed that English and German are, respectively, the first and the third most used languages in the dataset. It is therefore worth looking at some plots of English and German tweets, since they help clarify the IRA's schemes and whether the same schemes were also applied to Italy.

United States

The first plot shows the trend of tweets (light blue line) and retweets (green line) written in English. The first English tweets date from 2011, but most of the volume falls between July 2014 and April 2017.

English tweets and retweets trend

The plot shows a major change in the Internet Research Agency’s strategy during this period. Between July 2014 and May 2016 tweets outnumber retweets (especially between July 2014 and July 2015), whereas between June 2016 and April 2017 the opposite is true. By September 2017 the volume of tweets is almost zero. This graph shows that the IRA propaganda started two years before the 2016 US elections, and at that time the accounts created original tweets and media. During the last months of the election campaign, however, the propaganda strategy changed: the accounts retweeted more than they tweeted, perhaps to amplify real people’s viral tweets. The next plot provides further information on the strategy used by the IRA.
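A minimal sketch of how monthly tweet and retweet counts like those in the plot above could be computed with Python's standard library. It assumes the CSV released by Twitter has columns named `tweet_language`, `tweet_time` (formatted like `2016-09-01 14:05`) and `is_retweet` stored as the text `True`/`False`; the function names are hypothetical, and this is not the code behind the plots.

```python
import csv
from collections import Counter
from datetime import datetime

# Assumption: timestamps look like "2016-09-01 14:05"; adjust if your copy differs.
TIME_FORMAT = "%Y-%m-%d %H:%M"


def load_rows(path):
    """Read the whole CSV into a list of dicts (one dict per tweet)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def is_retweet(row):
    """The CSV is assumed to store booleans as text, e.g. "True"/"False"."""
    return str(row["is_retweet"]).strip().lower() == "true"


def monthly_trend(rows, language):
    """Return [(month, tweets, retweets), ...] for one tweet_language, sorted by month."""
    tweets, retweets = Counter(), Counter()
    for row in rows:
        if row["tweet_language"] != language:
            continue
        month = datetime.strptime(row["tweet_time"], TIME_FORMAT).strftime("%Y-%m")
        if is_retweet(row):
            retweets[month] += 1
        else:
            tweets[month] += 1
    months = sorted(set(tweets) | set(retweets))
    return [(m, tweets[m], retweets[m]) for m in months]


def retweet_dominated_months(trend):
    """Months in which retweets outnumber original tweets."""
    return [m for m, t, rt in trend if rt > t]
```

For example, `retweet_dominated_months(monthly_trend(load_rows("ira_tweets.csv"), "en"))` would list the months in which retweets dominate; the file name is a placeholder.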

Creation date of accounts plot

This plot shows the creation dates of the accounts that tweeted in English at least once. The red lines represent the accounts that also set ‘EN’ (English) as their profile language. A large group of English accounts was created between June 2013 and September 2014, starting almost a year before the propaganda began, which suggests that the disinformation operation may have been planned months, perhaps years, in advance. These results agree with an awesome analysis published by Symantec.
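A sketch of the account-level view behind a creation-date plot like this one. It assumes hypothetical column names `userid`, `tweet_language`, `account_creation_date` (formatted `YYYY-MM-DD`) and `account_language`; the key design choice is that each account is counted once, no matter how many tweets it wrote.

```python
from collections import Counter


def creation_months(rows, tweet_language):
    """Histogram of account creation months ("YYYY-MM") for accounts that posted
    at least once in `tweet_language`, split by profile language.

    Returns (matching, other): `matching` counts accounts whose account_language
    equals tweet_language (case-insensitive), `other` counts the rest.
    """
    # One entry per account, so prolific accounts are not counted many times.
    accounts = {}
    for row in rows:
        if row["tweet_language"] == tweet_language:
            accounts[row["userid"]] = (
                row["account_creation_date"][:7],  # "YYYY-MM-DD" -> "YYYY-MM"
                row["account_language"],
            )
    matching, other = Counter(), Counter()
    for month, profile_language in accounts.values():
        if profile_language.lower() == tweet_language.lower():
            matching[month] += 1
        else:
            other[month] += 1
    return matching, other
```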

Germany

In Part II, I already wrote about the IRA propaganda in Germany, but these plots give insight into the volume of tweets involved and the period in which they were written. The first plot shows the trend of tweets and retweets: the volume of tweets written in German grew quickly during 2017, peaking in September.

German tweets and retweets trend

Why this spike? Federal elections were held in Germany in September 2017. The period between April 2016 and October 2016 is peculiar: during this time the well-known account @erdollum was very active and wrote many tweets in German, some of which appeared in German online newspapers. I’m not quite sure what led to the 2016 propaganda activity, but I have two main theories: either the Internet Research Agency wanted to spread disinformation to polarize public opinion on the Brexit referendum, or it wanted to make the German accounts look realistic and trustworthy so that they could be used again in the future.

Creation date of accounts plot [de]

This graph shows that many accounts (red columns) were created in October 2015 and July 2016, and a small set of accounts was created between January and April 2017. The propaganda in the United States and in Germany differs in some respects (e.g. there are more American trolls than German ones, and the propaganda in the US began earlier and was more organized than in Germany). I don’t know the reasons, but:

The US elections were the primary target of the Russian trolls
English is more common than German
Germany has fewer voters than the US

These facts, and perhaps others, may have affected the disinformation operation in Germany.

Italy

Now I will try to answer the initial question. Italy is a unique case in this dataset: the banned Russian accounts wrote very few tweets in Italian, but they made many retweets of Italian accounts, mainly Italian media and journalists. Original Italian tweets are almost zero, but there is an anomalous spike of about 13,000 Italian retweets between March and April 2017.

Italian tweets and retweets trend

The plot also shows a small light blue spike in August 2015, but this is a false positive: those tweets are not written in Italian, and Twitter’s language classification in the dataset is wrong. There are some errors in the language classification, but they are basically irrelevant. I would also like to point out the number of active “Italian” accounts and their creation dates.

Creation date of accounts plot [it]

All these retweets were made by just nine accounts: seven have ‘it’ (Italian) as their profile language and two have ‘en’ (English). All nine were created on the same date, 6 March 2017, and started retweeting right away. These accounts were certainly coordinated and probably automated or semi-automated: there is a pattern both in their daily rhythm and in the growth of their tweet volume.
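The same-day creation and the immediate start of activity can be checked with a small grouping step. This sketch assumes columns named `userid`, `tweet_language`, `tweet_time` (`YYYY-MM-DD HH:MM`) and `account_creation_date` (`YYYY-MM-DD`); for each creation date it lists the accounts and how many days each waited before its first post in the given language. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime

TIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed tweet_time format


def accounts_by_creation_date(rows, language):
    """Group the accounts that posted in `language` by creation date.

    Returns {creation_date: [(userid, days_until_first_post), ...]}; a lag of 0
    means the account started posting on the day it was created.
    """
    first_post, created = {}, {}
    for row in rows:
        if row["tweet_language"] != language:
            continue
        uid = row["userid"]
        posted = datetime.strptime(row["tweet_time"], TIME_FORMAT)
        # Keep the earliest post seen for each account.
        if uid not in first_post or posted < first_post[uid]:
            first_post[uid] = posted
        created[uid] = date.fromisoformat(row["account_creation_date"])
    groups = defaultdict(list)
    for uid, day in created.items():
        groups[day].append((uid, (first_post[uid].date() - day).days))
    return {day: sorted(members) for day, members in groups.items()}
```

A single key holding many accounts, all with small lags, is the kind of signal described above.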

1. Tweet volume

The nine accounts fall into two sets: those active only in March and those also active in other months.

The two sets and their two patterns are easy to identify. Four accounts were active only in March; their hashes are:

2f929662caeaa8ac1405b9eecabcae76698e805b757bec9cf0358f52f962373b
7a746a6b77f61084a3ee732e71a3aa436e33f463825829c7286a5203ddcd6e4f
8b3aa79f9434a59439dbfbf33093f1c5ef41686e4f8c75b361338a8399a6d6cf
372658464fd2be7d3c2c220eded91605f498483c984455ab63cd4ae74dd18895

The other five accounts were also active in other months and share the same pattern between March and May. Their hashes are:

378410ae3ae135e4f2271e4cad9190bae2d5b029c5888114a7b0088342751a90
7279bc41cba54b4b5920031ecab92f4e1b806cd82980fc453d1ba97acbc036e3
15db31e780e1d2df7f5d8ad7ea0d3cacfde539ef93fe71572696aabc4b19ed36
8cf62b0d819688e71454a1997832b5e9c720bfb5a0cae070be1f2c0eb8c13792
f82084575688f694cdf1328a7a28bc453f3b9d0c52029ac28d55ab535b1ebb5f

The two groups did not make the same number of retweets, but they followed the same pattern, so I think they form a network (or a sub-network of the Russian trolls).
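One way to separate the two sets is to look at which months each account was active in. A sketch, assuming a `userid` column (hashed or not) and a `tweet_time` column that starts with `YYYY-MM`; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def monthly_activity(rows, userids):
    """Per-account Counter of rows per month ("YYYY-MM") for the given userids."""
    activity = defaultdict(Counter)
    for row in rows:
        if row["userid"] in userids:
            activity[row["userid"]][row["tweet_time"][:7]] += 1
    return activity


def split_by_active_months(rows, userids, month="2017-03"):
    """Split accounts into those active only in `month` and those also active elsewhere."""
    activity = monthly_activity(rows, userids)
    only = sorted(u for u, months in activity.items() if set(months) == {month})
    others = sorted(u for u, months in activity.items() if set(months) != {month})
    return only, others
```

The per-month counters returned by `monthly_activity` can also be plotted side by side to compare the growth pattern of each account, not just the months it was active in.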

2. Daily rhythm

These users’ daily rhythm also shows a pattern.

Eight accounts are most active on Tuesdays between 3 p.m. and 5 p.m. and on Wednesdays between 8 a.m. and 10 a.m., so I think these accounts were all automated. Only one user does not follow the pattern; its hash is: 372658464fd2be7d3c2c220eded91605f498483c984455ab63cd4ae74dd18895.
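A weekday and hour count per account is enough to surface a rhythm like this. The sketch below assumes `userid` and `tweet_time` columns (`YYYY-MM-DD HH:MM`) and uses the timestamps as they are, without any time-zone conversion, so the hours it reports are in whatever zone the file uses. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed tweet_time format
# Fixed English names, so the output does not depend on the system locale.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def weekly_rhythm(rows, userid):
    """Count one account's rows per (weekday, hour) slot, with no time-zone conversion."""
    slots = Counter()
    for row in rows:
        if row["userid"] == userid:
            posted = datetime.strptime(row["tweet_time"], TIME_FORMAT)
            slots[(DAYS[posted.weekday()], posted.hour)] += 1
    return slots


def peak_slots(rows, userid, n=3):
    """The n busiest (weekday, hour) slots for one account."""
    return weekly_rhythm(rows, userid).most_common(n)
```

Comparing `peak_slots` across the nine accounts is a quick way to spot the one that does not share the common peaks.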

This account retweeted many Italian tweets, but its bio read “Diretor e Professor Universitário. Cruzeirense apaixonado” (Portuguese for “Director and university professor. Passionate Cruzeiro fan”), so the account was probably South American, while its profile language was English. One possible reason for these discrepancies is a deliberate choice of “weak” patterns to make the network harder to detect (this is just an unconfirmed hypothesis).

There are two other reasons why I think these accounts were automated. The first is the user agent (the Twitter client) used by the nine accounts: all of them used only Twitter Web Client to publish tweets and retweets. The second concerns 8 March and the hashtag #MakeHerSmile: eight accounts wrote only one tweet each, and one account wrote two. All ten tweets were published on 8 March 2017, carried the hashtag #MakeHerSmile and were about International Women’s Day. You can download the CSV file with these ten tweets here.
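Both checks reduce to simple filters. This sketch assumes columns named `userid`, `tweet_client_name`, `tweet_time`, `is_retweet` (text `True`/`False`) and `tweet_text`, and searches the tweet text for the hashtag rather than relying on a dedicated hashtag column; all function names are illustrative.

```python
import re
from collections import Counter


def client_usage(rows, userids):
    """Count (userid, tweet_client_name) pairs for the given accounts."""
    return Counter(
        (row["userid"], row["tweet_client_name"])
        for row in rows
        if row["userid"] in userids
    )


def hashtag_tweets(rows, hashtag, day, originals_only=True):
    """Rows posted on `day` ("YYYY-MM-DD") whose text contains #hashtag, ignoring case."""
    # The word boundary after the tag avoids matching longer tags such as #MakeHerSmileAgain.
    pattern = re.compile(r"#" + re.escape(hashtag) + r"\b", re.IGNORECASE)
    matches = []
    for row in rows:
        if not row["tweet_time"].startswith(day):
            continue
        if originals_only and str(row["is_retweet"]).strip().lower() == "true":
            continue
        if pattern.search(row["tweet_text"]):
            matches.append(row)
    return matches
```

If every account shows a single client in `client_usage`, and `hashtag_tweets(rows, "MakeHerSmile", "2017-03-08")` returns a handful of near-identical posts, that matches the two signals described above.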

Finally, I also reversed the hashes, and many of these accounts also appear in the FiveThirtyEight dataset. Their usernames are: @rossirossivin, @AnnaRoman0, @Giovanna__Moret, @1lorenafava1, @GattiSilgatti, @VittoreGuidi, @sergio_maestri, @MariaLuigi5, @frannervia.
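A minimal sketch of what "reversing" a hash can mean in practice: hash a list of candidate identifiers (for example, taken from another dataset such as FiveThirtyEight's) and look for matches with the hashes listed above. This assumes the hashes are unsalted SHA-256 hex digests of the exact identifier strings; the post does not say which field was hashed or how, so that input and the function names here are hypothetical.

```python
import hashlib


def sha256_hex(identifier):
    """Lowercase hex SHA-256 digest of an identifier string (UTF-8)."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


def match_hashes(target_hashes, candidates):
    """Map each target hash to the candidate string that produces it.

    Assumption: unsalted SHA-256 over the exact candidate string; if a salt or a
    different input was used, nothing will match.
    """
    lookup = {sha256_hex(c): c for c in candidates}
    return {h: lookup[h.lower()] for h in target_hashes if h.lower() in lookup}
```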

All activity from these accounts stopped in 2017, so there is no connection between them and the hashtag #MattarellaDimettiti.

My Two Cents

In conclusion, in my opinion, the answer to the initial question is “No, but…”. Italy does not appear to have been targeted by Russia: too few accounts shared Italian news, and they wrote almost no original tweets. The Italian news shared by the trolls covers a wide range of topics, so either there was no strategy, or the Russian trolls did not know Italian as well as they knew English and German. Hence I think Russian propaganda did not target Italy, but there was a network of nine coordinated accounts that shared Italian news for unknown reasons. In general, the Internet Research Agency dataset shared by Twitter is very useful for understanding how a government disinformation campaign works, and in my opinion the strategies used by the Internet Research Agency could still work today. I hope to share my code soon in my GitHub repository My-Twitter-World.

And that’s all! So long, and thanks for all the fish 🐬
