How to Scrape specific Tweets from Twitter using Twint?

In this tutorial, I'll cover:

* Advantages of using Twint over official Twitter API

* How to download & install/execute Twint on Kali Linux OS

* How to scrape specific Tweets from Twitter using Twint

* Twint commands


Scrape Targeted Tweets using Twint

Disclaimer: This tutorial is strictly for educational purposes. Please don't use it for illegal or unfair activities. This blog doesn't promote illegal activities, and I'm not responsible for any of your actions. Think and act logically.


Advantages of using Twint:

  1. No rate limits, and no cap on history (the official Twitter API returns only the 3,200 most recent Tweets from a user's timeline)

  2. Fast setup

  3. Anonymous scraping; no Twitter sign-up needed

  4. Easy to use


To install Twint, open a terminal on Kali Linux and run:

sudo pip3 install twint

Then clone the Twint GitHub repository with the command below:

git clone https://github.com/twintproject/twint

After cloning the repository from GitHub, move into the directory and install its requirements:

cd twint
sudo pip3 install -r requirements.txt

To install pipenv, if it isn't already present on your system, use the command below.

pip3 install pipenv

Create a virtual environment by running this command in the terminal:

python3 -m venv venv

To activate the virtual environment:

source venv/bin/activate
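
To confirm that the activation worked, you can ask Python itself. This is a minimal sketch, assuming the environment was created with the standard venv module as above; the function name in_virtualenv is my own and not part of Twint.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the system-wide Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

If this prints False after activation, the shell is most likely still using the system interpreter.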

You have now successfully set up the environment for Twint. Here are a few Twint commands to help you get started with your Twitter OSINT.

sudo twint -u username -o filename --csv

The command above scrapes tweets from the given username. The -o option specifies the output filename, and --csv sets the format; here the results are saved as CSV so they can be opened in a spreadsheet such as Excel. The data scraped by Twint includes the Twitter username, tweet date and time, tweet content, hashtags, like count, retweet count, and more.
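
If you'd rather inspect the saved file in Python than in a spreadsheet, something like the sketch below might help. The column names (date, username, tweet, likes_count and so on) are assumptions on my part, and the sample rows are made up; check the header row of your own file before relying on them.

```python
import csv
import io

# Hypothetical sample standing in for the file written by `-o filename --csv`.
# The column names are assumptions; compare them with your real header row.
SAMPLE_CSV = """date,time,username,tweet,likes_count,retweets_count
2020-01-15,10:30:00,example_user,Hello world #python,12,3
2020-01-16,18:05:42,example_user,Second tweet #osint #python,40,9
"""


def load_tweets(handle):
    """Read CSV rows from an open file-like object into a list of dicts."""
    return list(csv.DictReader(handle))


def most_liked(rows):
    """Return the row with the highest like count."""
    return max(rows, key=lambda row: int(row["likes_count"]))


rows = load_tweets(io.StringIO(SAMPLE_CSV))
top = most_liked(rows)
print(top["date"], top["tweet"])
```

For a real file, pass load_tweets a handle opened with open("filename", newline="", encoding="utf-8") instead of the in-memory sample.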

sudo twint -s "something" --until 2020-02-01

In the above command, -s specifies the term you're searching for; it can be a keyword, a personal detail such as an email ID or phone number, or anything else. The --until option specifies the date up to which Twitter data will be scraped, written as YYYY-MM-DD.
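
Twint's help text gives its date examples in the form 2017-12-27, so a date written day-first, such as 01.02.2020 for 1 February 2020, needs reordering before it goes after --since or --until. Here is a small sketch of that conversion; the helper name to_twint_date and the assumed input formats are mine, not part of Twint.

```python
from datetime import datetime


def to_twint_date(text: str, source_format: str = "%d.%m.%Y") -> str:
    """Convert a date string into the YYYY-MM-DD form shown in Twint's help.

    `source_format` is an assumption about how the input is written; the
    default treats "01.02.2020" as 1 February 2020 (day first).
    """
    parsed = datetime.strptime(text, source_format)
    return parsed.strftime("%Y-%m-%d")


print(to_twint_date("01.02.2020"))              # 2020-02-01
print(to_twint_date("01/02/2020", "%m/%d/%Y"))  # 2020-01-02
```

An input that doesn't match the stated format raises ValueError, which is usually preferable to silently scraping the wrong date range.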

For the full list of options, run the command below in your terminal.

sudo twint -h 

TWINT - An Advanced Twitter Scraping Tool.



optional arguments:
  -h, --help            show this help message and exit
  -u USERNAME, --username USERNAME
                        User's Tweets you want to scrape.
  -s SEARCH, --search SEARCH
                        Search for Tweets containing this word or phrase.
  -g GEO, --geo GEO     Search for geocoded Tweets.
  --near NEAR           Near a specified city.
  --location            Show user's location (Experimental).
  -l LANG, --lang LANG  Search for Tweets in a specific language.
  -o OUTPUT, --output OUTPUT
                        Save output to a file.
  -es ELASTICSEARCH, --elasticsearch ELASTICSEARCH
                        Index to Elasticsearch.
  --year YEAR           Filter Tweets before specified year.
  --since DATE          Filter Tweets sent since date (Example: "2017-12-27
                        20:30:15" or 2017-12-27).
  --until DATE          Filter Tweets sent until date (Example: "2017-12-27
                        20:30:15" or 2017-12-27).
  --email               Filter Tweets that might have email addresses
  --phone               Filter Tweets that might have phone numbers
  --verified            Display Tweets only from verified users (Use with -s).
  --csv                 Write as .csv file.
  --json                Write as .json file
  --hashtags            Output hashtags in seperate column.
  --cashtags            Output cashtags in seperate column.
  --userid USERID       Twitter user id.
  --limit LIMIT         Number of Tweets to pull (Increments of 20).
  --count               Display number of Tweets scraped at the end of session.
  --stats               Show number of replies, retweets, and likes.
  -db DATABASE, --database DATABASE
                        Store Tweets in a sqlite3 database.
  --to USERNAME         Search Tweets to a user.
  --all USERNAME        Search all Tweets associated with a user.
  --followers           Scrape a person's followers.
  --following           Scrape a person's follows
  --favorites           Scrape Tweets a user has liked.
  --proxy-type PROXY_TYPE
                        Socks5, HTTP, etc.
  --proxy-host PROXY_HOST
                        Proxy hostname or IP.
  --proxy-port PROXY_PORT
                        The port of the proxy server.
  --essid [ESSID]       Elasticsearch Session ID, use this to differentiate
                        scraping sessions.
  --userlist USERLIST   Userlist from list or file.
  --retweets            Include user's Retweets (Warning: limited).
  --format FORMAT       Custom output format (See wiki for details).
  --user-full           Collect all user information (Use with followers or
                        following only).
  --profile-full        Slow, but effective method of collecting a user's Tweets
                        and RT.
  --translate           Get tweets translated by Google Translate.
  --translate-dest TRANSLATE_DEST
                        Translate tweet to language (ISO2).
  --store-pandas STORE_PANDAS
                        Save Tweets in a DataFrame (Pandas) file.
  --pandas-type [PANDAS_TYPE]
                        Specify HDF5 or Pickle (HDF5 as default)
  -it [INDEX_TWEETS], --index-tweets [INDEX_TWEETS]
                        Custom Elasticsearch Index name for Tweets.
  -if [INDEX_FOLLOW], --index-follow [INDEX_FOLLOW]
                        Custom Elasticsearch Index name for Follows.
  -iu [INDEX_USERS], --index-users [INDEX_USERS]
                        Custom Elasticsearch Index name for Users.
  --debug               Store information in debug logs
  --resume TWEET_ID     Resume from Tweet ID.
  --videos              Display only Tweets with videos.
  --images              Display only Tweets with images.
  --media               Display Tweets with only images or videos.
  --replies             Display replies to a subject.
  -pc PANDAS_CLEAN, --pandas-clean PANDAS_CLEAN
                        Automatically clean Pandas dataframe at every scrape.
  -cq CUSTOM_QUERY, --custom-query CUSTOM_QUERY
                        Custom search query.
  -pt, --popular-tweets
                        Scrape popular tweets instead of recent ones.
  -sc, --skip-certs     Skip certs verification, useful for SSC.
  -ho, --hide-output    Hide output, no tweets will be displayed.
  -nr, --native-retweets
                        Filter the results for retweets only.
  --min-likes MIN_LIKES
                        Filter the tweets by minimum number of likes.
  --min-retweets MIN_RETWEETS
                        Filter the tweets by minimum number of retweets.
  --min-replies MIN_REPLIES
                        Filter the tweets by minimum number of replies.
  --links LINKS         Include or exclude tweets containing one o more links. If
                        not specified you will get both tweets that might contain
                        links or not.
  --source SOURCE       Filter the tweets for specific source client.
  --members-list MEMBERS_LIST
                        Filter the tweets sent by users in a given list.
  -fr, --filter-retweets
                        Exclude retweets from the results.
  --backoff-exponent BACKOFF_EXPONENT
                        Specify a exponent for the polynomial backoff in case of
                        errors.
  --min-wait-time MIN_WAIT_TIME
                        specifiy a minimum wait time in case of scraping limit
                        error. This value will be adjusted by twint if the value
                        provided does not satisfy the limits constraints
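
Several of these options can be combined in one call. The sketch below only assembles the argument list from flags that appear in the help text above and prints it; it doesn't run Twint. The function name build_twint_command, its parameter names, and the output file corona.csv are hypothetical choices of mine.

```python
import shlex


def build_twint_command(search=None, username=None, since=None,
                        until=None, limit=None, verified=False,
                        output=None, as_csv=False):
    """Assemble a twint argument list from options listed in the help text."""
    args = ["twint"]
    if username:
        args += ["-u", username]
    if search:
        args += ["-s", search]
    if since:
        args += ["--since", since]
    if until:
        args += ["--until", until]
    if limit is not None:
        args += ["--limit", str(limit)]
    if verified:
        args.append("--verified")
    if output:
        args += ["-o", output]
    if as_csv:
        args.append("--csv")
    return args


command = build_twint_command(search="corona", since="2020-01-01",
                              until="2020-02-01", limit=100,
                              verified=True, output="corona.csv",
                              as_csv=True)
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in command))
```

To actually run the command from Python, the list could be handed to subprocess.run(command, check=True), which requires Twint to be installed and on your PATH.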

Using Twint, you can scrape and mine specific information about a target, such as:

  1. Personal details like an email ID or phone number shared by the target on Twitter.

  2. Connections of the target.

  3. Photos of workplace/home.

  4. Hashtags

  5. Retweets

  6. Specific Keywords

  7. Investigation & Data analysis purposes

  8. Travel Records, Upcoming events etc.

  9. Medical records

  10. Political records
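
Items like hashtags and specific keywords are easy to tally once the tweets have been scraped. This is a rough sketch, assuming the tweet texts are already loaded as plain strings; the sample tweets are made up, and the regular expression is a simplification rather than Twitter's own hashtag rule.

```python
import re
from collections import Counter

# Simplified pattern: a '#' followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")


def count_hashtags(tweets):
    """Count hashtags case-insensitively across an iterable of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts


# Made-up tweet texts used only for illustration.
sample_tweets = [
    "Learning #OSINT with #Python",
    "More #python tips",
    "No tags here",
]
print(count_hashtags(sample_tweets).most_common(2))
# [('python', 2), ('osint', 1)]
```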


twint -s "corona" --verified

This command scrapes tweets about "corona" from verified accounts only. Here "corona" is the search keyword, and the --verified option restricts the results to tweets from verified accounts.


Twint Twitter scraping demo


Thanks for reading this blog post! Have a nice day.


© 2020 BY ANUKIRAN GHOSH
