Automation tool for danbinews members to upload 'ddaddasi' articles.

danbi2990/ddaddasi


Warning: This program is only for the members of danbinews

What is it?

It helps danbinews members upload 'ddaddasi' entries to the website.
'ddaddasi' (따끈따끈 시사용어) roughly means 'news terminology' in Korean.

How to build

How it works

  • Suppose you want to find information about 'IMF' (International Monetary Fund).
  • The program crawls newspaper websites and scrapes articles that include the keyword.
  • It crawls the following websites:
    • Khan (경향), Hani (한겨레), ChoSun (조선), Joins (중앙), DongA (동아), Seoul (서울), NewSis (뉴시스), HanKyung (한국경제), MaeKyung (매일경제), Herald (헤럴드), NewsOne (뉴스원), MunHwa (문화일보)
  • When the user clicks 'Upload', the inputs (Title, Definition, Explanation, Links to articles) are wrapped in HTML tags and uploaded to the website.
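
The keyword filtering and HTML wrapping steps above can be sketched roughly as follows. This is a minimal illustration, not the program's actual code: the function names, the article dictionary layout, and the choice of HTML tags are all assumptions, and the articles are taken as already downloaded so that no network access is needed.

```python
from html import escape


def filter_by_keyword(articles, keyword):
    """Keep articles whose title or body mentions the keyword (case-insensitive).

    `articles` is assumed to be a list of dicts with "title", "body" and "url" keys.
    """
    needle = keyword.lower()
    return [
        a for a in articles
        if needle in (a["title"] + " " + a["body"]).lower()
    ]


def build_entry_html(title, definition, explanation, links):
    """Wrap the four inputs in HTML tags. The tag layout is an assumption."""
    items = "".join(
        f'<li><a href="{escape(url)}">{escape(text)}</a></li>'
        for text, url in links
    )
    return (
        f"<h3>{escape(title)}</h3>"
        f"<p><strong>{escape(definition)}</strong></p>"
        f"<p>{escape(explanation)}</p>"
        f"<ul>{items}</ul>"
    )


# Hypothetical sample data standing in for scraped articles.
sample_articles = [
    {"title": "IMF raises forecast", "body": "Growth outlook improves.",
     "url": "https://example.com/a"},
    {"title": "Weather", "body": "Sunny all week.",
     "url": "https://example.com/b"},
]

hits = filter_by_keyword(sample_articles, "imf")
entry_html = build_entry_html(
    "IMF",
    "International Monetary Fund",
    "Lends to member countries & monitors the global economy.",
    [(a["title"], a["url"]) for a in hits],
)
```

Escaping each input with `html.escape` keeps characters such as `&` or `<` in the user's text from breaking the generated markup.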

Branch multiprocessing

In most cases, the plain requests module works well. However, some websites must be crawled with PhantomJS: HanKook (한국), KookMin (국민), YonHap (연합), and SeoKyung (서울경제).

The problem is that PhantomJS slows down the program, which is why this functionality was removed from the master branch.
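
One way to express the split between the two kinds of sites is a simple lookup. The sketch below is an assumption about how such a choice could be made: the site keys are taken from this README, while the function name and the returned labels are hypothetical.

```python
# Sites that, according to this README, cannot be crawled with plain requests.
PHANTOMJS_SITES = {"HanKook", "KookMin", "YonHap", "SeoKyung"}


def fetch_method(site):
    """Return which fetcher a site needs: "phantomjs" or "requests"."""
    return "phantomjs" if site in PHANTOMJS_SITES else "requests"
```

Keeping the slow sites in one set makes it easy to leave them out entirely, as the master branch does.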

To mitigate this, the multiprocessing branch forks a child process at startup, and the child creates 4 PhantomJS threads. While the user is filling in the inputs, the PhantomJS instances are initialised in the background and become ready.
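
The warm-up idea can be sketched with the standard multiprocessing and threading modules. This is only an illustration under stated assumptions: `start_browser` is a hypothetical stand-in that sleeps instead of launching PhantomJS, and the function names and the use of a pipe to report readiness are not taken from the branch itself.

```python
import multiprocessing
import queue
import threading
import time

NUM_BROWSERS = 4  # the README says the child creates 4 PhantomJS threads


def start_browser(index):
    """Hypothetical stand-in for launching one PhantomJS instance."""
    time.sleep(0.1)  # simulate the slow startup
    return f"browser-{index}"


def warm_up_browsers(count=NUM_BROWSERS):
    """Start `count` browsers in parallel threads; return them once all are ready."""
    ready = queue.Queue()
    threads = [
        threading.Thread(target=lambda i=i: ready.put(start_browser(i)))
        for i in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(ready.get() for _ in range(count))


def child_main(conn):
    """Entry point of the child process: warm up, then report readiness."""
    conn.send(warm_up_browsers())
    conn.close()


def start_warmup_process():
    """Launch the child process and return it with the parent's pipe end."""
    parent_conn, child_conn = multiprocessing.Pipe()
    child = multiprocessing.Process(target=child_main, args=(child_conn,))
    child.start()
    return child, parent_conn
```

In a real program, `start_warmup_process()` would be called once at launch, inside an `if __name__ == "__main__":` guard, and `parent_conn.recv()` would be called only when the user starts a search, by which time the instances have usually finished starting.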

Images

Startup: (screenshot)

On Search: (screenshot)

Before Upload: (screenshot)
