ecfs-python

A Python library that scrapes fcc.gov to retrieve rulemaking information from the Electronic Comment Filing System.

Usage

from ecfs import FccProceeding

proceeding = FccProceeding(docket_number="14-28")
# Returns a list with one dict per page of search results:
proceeding.get_comment_urls()
"""
[
    {
        "page_url": "http://apps.fcc.gov/ecfs/comment_search/execute?proceeding=14-28&pageSize=100&pageNumber=1",
        "comment_urls": [
            "http://apps.fcc.gov/ecfs/comment/view?id=6017609065",
            "...",
        ],
        "page_number": "1"
    },
    {
        ...
    },
]
"""

TODO:

- Package for pip.
- Implement get_comment_data_for_comment_url. It would return information about a filing, such as the date filed and the filer's name, with an option to make an additional HTTP request to get the full text.
- Tests (maybe).
- Filter by date. This would let you scrape once, save the data (e.g., as JSON somewhere), and then each day thereafter incrementally retrieve only new filings, appending them to the JSON file.
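Date filtering is not implemented yet. Until it is, one stand-in for the incremental workflow described above is to deduplicate by comment URL rather than by date: keep a JSON file of URLs already seen and append only the new ones. The function merge_new_urls and the file layout (a JSON list of URL strings) are assumptions for illustration, not part of this library.

```python
import json
import os
from typing import Iterable, List


def merge_new_urls(path: str, fresh_urls: Iterable[str]) -> List[str]:
    """Hypothetical helper: append unseen URLs to the JSON list at `path`.

    Returns only the URLs that were newly added. The file is created on the
    first call that has something to write.
    """
    stored: List[str] = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            stored = json.load(f)

    known = set(stored)
    new: List[str] = []
    for url in fresh_urls:
        if url not in known:  # skip filings saved on a previous run
            known.add(url)
            new.append(url)

    if new:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(stored + new, f, indent=2)
    return new
```

Deduplicating by URL still requires re-scraping every page each day; true date filtering would avoid that, which is why it remains on the TODO list.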

How

There's also a Ruby cousin to this library: http://github.com/adelevie/ecfs. The Ruby gem takes a somewhat different approach: it downloads the Excel file that fcc.gov lets you export, then parses the rows of that file to get filing information. This works really well (fast and accurate) until a result set exceeds 10,000 filings. This Python library is an attempt at building a somewhat slower, ~~but more dependable FCC ECFS scraper~~*. It eschews the spreadsheet and instead visits each web page of filing results, which means one HTTP request to fcc.gov for every 100 filings (the maximum displayed per page).

Please don't slam fcc.gov with an irrationally large number of requests. For your convenience, I added an optional sleep parameter to FccProceeding.__init__(): the number of seconds the script waits between HTTP requests to fcc.gov.
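The page-by-page approach can be sketched as follows: build one search URL per page of 100 filings and pause between requests, which is the role of the sleep parameter. This is not the library's code. The names page_url and iter_pages are hypothetical, the URL format is copied from the usage example above (the legacy apps.fcc.gov endpoint may no longer respond), and the caller is assumed to already know how many pages to request.

```python
import time
from typing import Callable, Iterator, Tuple
from urllib.parse import urlencode

# Endpoint taken from the usage example; fcc.gov may have changed it since.
SEARCH_URL = "http://apps.fcc.gov/ecfs/comment_search/execute"
PAGE_SIZE = 100  # maximum number of filings fcc.gov displays per page


def page_url(docket_number: str, page_number: int) -> str:
    """Hypothetical helper: the search URL for one page of a docket."""
    query = urlencode({"proceeding": docket_number,
                       "pageSize": PAGE_SIZE,
                       "pageNumber": page_number})
    return f"{SEARCH_URL}?{query}"


def iter_pages(docket_number: str, num_pages: int,
               fetch: Callable[[str], str],
               sleep: float = 1.0) -> Iterator[Tuple[int, str]]:
    """Yield (page_number, page_html) for pages 1..num_pages.

    Waits `sleep` seconds between consecutive requests (not before the
    first one) so the scraper stays polite to fcc.gov.
    """
    for n in range(1, num_pages + 1):
        if n > 1 and sleep > 0:
            time.sleep(sleep)
        yield n, fetch(page_url(docket_number, n))
```

Passing the HTTP call in as `fetch` (for example, `lambda u: urllib.request.urlopen(u).read().decode()`) keeps the pagination and throttling logic testable without touching the network.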

*Limitations

Apparently, the same 10,000-document limit that applies to spreadsheets also applies to paginated result sets. Translation: this library suffers from the same limitation as its Ruby cousin, but takes many more HTTP requests to achieve the same results. For example, visit http://apps.fcc.gov/ecfs/comment_search/execute?proceeding=14-28&perPage=100&pageNumber=100. You'll notice a message indicating that there are more than 10,000 records in the 14-28 docket. However, the pagination stops at page 100 (100 pages of 100 documents = 10,000 documents).
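The cap works out as simple arithmetic: with 100 filings per page and pagination ending at page 100, no more than 10,000 filings are reachable, however large the docket. The sketch below makes that concrete; the function name reachable and its defaults are hypothetical and simply restate the numbers observed above.

```python
from typing import Tuple

PER_PAGE = 100   # filings shown per results page
MAX_PAGES = 100  # pagination stops here, per the observation above


def reachable(total_filings: int,
              per_page: int = PER_PAGE,
              max_pages: int = MAX_PAGES) -> Tuple[int, int]:
    """Hypothetical helper: (filings actually reachable, page requests needed)."""
    cap = per_page * max_pages           # 100 * 100 = 10,000
    visible = min(total_filings, cap)    # anything past the cap is unreachable
    requests = (visible + per_page - 1) // per_page  # integer ceiling division
    return visible, requests


# A hypothetical docket of 25,000 filings still yields only 10,000:
print(reachable(25_000))  # (10000, 100)
print(reachable(2_345))   # (2345, 24)
```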

adelevie/ecfs-python

Folders and files

Latest commit

History

Repository files navigation

ecfs-python

Usage

TODO:

How

*Limitations

License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages