Skip to content

nutso/twitter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Twitter ~ Playing with Python

Requirements:

  • Python
  • Scrapy
  • BeautifulSoup4

Usage: change to the user_history directory and run:

scrapy crawl user_history_spider -s LOG_FILE=scrapy.log -s LOG_LEVEL='INFO' -o scraped_data.json -a handle=BarackObama

(replace "BarackObama" with the Twitter user handle of interest)

Tweets are returned as best approximating the Twitter tweet format (ref: https://dev.twitter.com/overview/api/tweets) as possible.

Additional fields include:

  • created_at_ts: UTC timestamp corresponding to the created_at field
  • entities item contains pure array lists of the correpsonding entity, without additional information (i.e. tweet.entities.hashtags is an array of hashtag terms only)
  • date_scraped: date and time when the record was scraped ("now")
  • url: actual url from which the record was scraped

About

playing with python

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages