
Preprocessing

Data cleaning

The gathered tweets may contain duplicates, since a tweet is very likely to be retweeted by other users. To avoid processing the same tweet more than once, we remove the duplicates from searchTweets.csv and store the unique tweets in uniqueSearchTweets.csv.

Save the script below as remDupTweets.py in your project folder and run it.

You will see the output saved as uniqueSearchTweets.csv inside the search-data folder.

'''
Remove duplicate tweets from the gathered search results.
'''

import csv

# Python 3: newline='' lets the csv module handle line endings itself.
# The files are assumed to be UTF-8 encoded.
with open("search-data/searchTweets.csv", newline='', encoding='utf-8') as src, \
     open("search-data/uniqueSearchTweets.csv", "w", newline='', encoding='utf-8') as dst:
    alltweets = csv.reader(src)
    noDup = csv.writer(dst)  # store unique tweets

    tweets = set()  # normalized tweet texts that were already written
    for row in alltweets:
        # Normalize first (lowercase, no newlines) so the duplicate check
        # compares exactly the text that is written to the output file.
        t = row[2].lower().replace('\n', '')
        if t not in tweets:
            noDup.writerow([row[0], row[1], t])
            tweets.add(t)
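
To check that the cleaning step worked, it can help to count how often each tweet text occurs in the output. The sketch below is one possible way to do that, not part of the original script: the helper name duplicate_texts and the sample rows are made up for illustration, and it assumes the tweet text is in the third column (index 2), as in the script above.

```python
import csv
import io
from collections import Counter


def duplicate_texts(csv_file, text_col=2):
    """Return {text: count} for every tweet text that appears more than once.

    csv_file is any open text file (or file-like object) holding CSV rows;
    text_col is the column index of the tweet text (assumed to be 2 here).
    Rows that are too short to have that column are skipped.
    """
    counts = Counter(
        row[text_col] for row in csv.reader(csv_file) if len(row) > text_col
    )
    return {text: n for text, n in counts.items() if n > 1}


# Small in-memory sample standing in for a CSV file (hypothetical rows).
sample = io.StringIO(
    "1,alice,great flight today\n"
    "2,bob,lost my luggage\n"
    "3,carol,great flight today\n"
)
print(duplicate_texts(sample))  # {'great flight today': 2}
```

To check the real output, pass open("search-data/uniqueSearchTweets.csv", newline='') instead of the sample. An empty dictionary means no tweet text appears twice.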

We will now run sentiment analysis on the preprocessed unique tweets.

Get started with Sentiment Analysis using TextBlob.
