1.) Streams Twitter data.
2.) Filters the data based on the keywords in the first column of DiseaseHashtags.csv.
3.) Organizes the Twitter data into the following document structure:
## Data structure for tweets to pass into mongo collection
my_data = {
    'id': decoded['id'],
    'text': decoded['text'],
    'place': {'country': country,
              'full_name': full_name},
    'user': {'screen_name': decoded['user']['screen_name'],
             'location': decoded['user']['location']},
    'entities': {'hashtags': hashes}
}
4.) If a mongod.exe MongoDB instance is running, streams each tweet record into a collection called 'twitter_healthcare' in a database called 'twitter'; a sketch of the whole pipeline follows this list.
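
A minimal sketch of steps 1-4, assuming the tweepy 3.x StreamListener API and a local mongod instance; the credential placeholders, the HealthcareListener name, and the defensive guards are illustrative, not the script's actual code:

import csv
import json

import tweepy
from pymongo import MongoClient

# 1.) Keywords to filter by: first column of DiseaseHashtags.csv
with open('DiseaseHashtags.csv') as f:
    keywords = [row[0] for row in csv.reader(f) if row]

collection = MongoClient()['twitter']['twitter_healthcare']

class HealthcareListener(tweepy.StreamListener):  # hypothetical name
    def on_data(self, raw_data):
        decoded = json.loads(raw_data)
        if 'id' not in decoded:  # skip limit notices / keep-alives
            return True
        place = decoded.get('place') or {}
        # 3.) Reshape the raw tweet into the document structure above
        my_data = {
            'id': decoded['id'],
            'text': decoded['text'],
            'place': {'country': place.get('country'),
                      'full_name': place.get('full_name')},
            'user': {'screen_name': decoded['user']['screen_name'],
                     'location': decoded['user']['location']},
            'entities': {'hashtags': decoded['entities']['hashtags']}
        }
        # 4.) Stream the record into twitter.twitter_healthcare
        collection.insert_one(my_data)
        return True

auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')  # placeholders
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_SECRET')
tweepy.Stream(auth, HealthcareListener()).filter(track=keywords)  # 2.) keyword filter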
Some basic pymongo search queries on the data being streamed into the twitter.twitter_healthcare collection.
These queries can be run dynamically against the collection while tweets are still being streamed in.
1.) location_pipeline: builds an aggregation pipeline to find the top tweet locations in the dataset.
2.) hashtag_pipeline: builds an aggregation pipeline to find the top hashtags in the dataset.
3.) project_matches_pipeline: builds an aggregation pipeline that returns the tweets from a specific location.
4.) aggregate: runs the created pipelines against the collection (hedged sketches follow this list).
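
Hedged sketches of the three pipelines, assuming hashtags are stored as Twitter's list of {'text': ...} dicts and that location comes from user.location; the exact stages and limits in the script may differ:

from pymongo import MongoClient

collection = MongoClient()['twitter']['twitter_healthcare']

# 1.) Top tweet locations, most frequent first
location_pipeline = [
    {'$group': {'_id': '$user.location', 'count': {'$sum': 1}}},
    {'$sort': {'count': -1}},
    {'$limit': 10}
]

# 2.) Top hashtags: unwind the hashtag array, then count each tag
hashtag_pipeline = [
    {'$unwind': '$entities.hashtags'},
    {'$group': {'_id': '$entities.hashtags.text', 'count': {'$sum': 1}}},
    {'$sort': {'count': -1}},
    {'$limit': 10}
]

# 3.) Tweets from one location, projected down to a few fields
def project_matches_pipeline(location):
    return [
        {'$match': {'user.location': location}},
        {'$project': {'_id': 0, 'text': 1, 'user.screen_name': 1}}
    ]

# 4.) Run a pipeline on the live collection
for doc in collection.aggregate(hashtag_pipeline):
    print(doc)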
DiseaseHashtags.csv: CSV file containing the sets of keywords to filter tweets by; the first column is the one used.
fields.txt: list of field names used when exporting the Mongo collection to CSV.
Command-line invocation to export the Mongo collection as CSV with the field names specified in fields.txt (example below).
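
The repo's exact command isn't reproduced here; one standard way to do this export is MongoDB's mongoexport tool, whose --fieldFile option reads one field name per line. The fields.txt contents below are an assumption based on the document structure above, and the output filename is a placeholder:

fields.txt (one dotted field name per line):
id
text
place.country
place.full_name
user.screen_name
user.location
entities.hashtags

mongoexport --db twitter --collection twitter_healthcare --type=csv --fieldFile fields.txt --out twitter_healthcare.csv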