Simple library to count syllables in a word, based on the CMU Pronouncing Dictionary

anson-vandoren/syllabifier

Syllabifier

Count the number of syllables in arbitrary English words

Adapted from Anthony Evans' original repository, with some major changes:

  • Ported to Python 3 and rewrote much of the code in a more Pythonic style
  • Made a few relatively minor corrections to the syllabification rules, following "English Words: A Linguistic Introduction" by Heidi Harley (Blackwell Publishing)
  • Removed ambisyllabicity rules for onset and coda

Please see Anthony Evans' README file for a detailed background to the project.

Set up

  • Requires Python 3
  • Clone this repo. No further installation required.

Usage

One word at a time:

python3 syllable3.py linguistics

Or several (space-separated):

python3 syllable3.py colourless green ideas

When using the code as a library, call the num_syllables(word: str) function if you only need the syllable count of a word.

Output

If the input word is found in the dictionary, a phonemic, syllabified transcript is returned. For example, for the word linguistics:

linguistics: 3 syllables: <o:L|n:IH|c:NG> <o:GW|n:IH|c:None> <o:ST|n:IH|c:KS>

Each syllable is made up of an onset ('o'), a nucleus ('n'), and a coda ('c'). Phonemes are written in capitals, in ARPAbet notation. In line with phonological theory, the nucleus must have content, whereas the onset and coda may be empty; an empty coda is shown as None, as in the second syllable above.
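If you need to post-process the printed transcript, the onset, nucleus, and coda of each syllable can be pulled back out with a regular expression. The following is a minimal sketch based only on the sample output above. The names Syllable and parse_transcript are hypothetical and are not part of this library, and the sketch assumes an empty onset is printed as None in the same way as an empty coda.

```python
import re
from typing import List, NamedTuple, Optional


class Syllable(NamedTuple):
    """One syllable; onset and coda are None when empty (hypothetical helper type)."""
    onset: Optional[str]
    nucleus: str
    coda: Optional[str]


# Matches one "<o:...|n:...|c:...>" group as it appears in the sample output.
_SYLLABLE_RE = re.compile(r"<o:([^|]*)\|n:([^|]*)\|c:([^>]*)>")


def _slot(text: str) -> Optional[str]:
    # Assumption: an empty slot is printed as the literal text "None".
    return None if text in ("", "None") else text


def parse_transcript(line: str) -> List[Syllable]:
    """Extract (onset, nucleus, coda) triples from one line of output."""
    return [
        Syllable(_slot(onset), nucleus, _slot(coda))
        for onset, nucleus, coda in _SYLLABLE_RE.findall(line)
    ]


sample = (
    "linguistics: 3 syllables: "
    "<o:L|n:IH|c:NG> <o:GW|n:IH|c:None> <o:ST|n:IH|c:KS>"
)
for syllable in parse_transcript(sample):
    print(syllable)
```

Returning None for an empty slot, rather than the string "None", keeps later checks such as `if syllable.coda:` straightforward.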

CMU Pronouncing Dictionary

Syllabifier depends on the CMU Pronouncing Dictionary of North American English word pronunciations. Version 0.7b was the current one at the time of writing, but loading it raises a UnicodeDecodeError, so this project still uses version 0.7a (amended to remove the erroneous 'G' from SUGGEST and related words). To use a newer version, obtain it from the dictionary download website, add the cmudict-N.nx(.phones|.symbols)* files to the CMU_dictionary directory, remove the '.txt' suffixes, and update the line VERSION = 'cmudict-n.nx' in cmuparser3.py.
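For orientation, a dictionary entry is a headword followed by its ARPAbet phonemes, with a stress digit (0, 1, or 2) on each vowel, so counting the phonemes that end in a digit gives a syllable count. The sketch below illustrates that idea on a few hand-written lines in the dictionary's layout. It is not necessarily how cmuparser3.py reads the file, and parse_cmudict and count_syllables are hypothetical names used only for this example.

```python
import re
from typing import Dict, Iterable, List


def parse_cmudict(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each headword to the phonemes of its first listed pronunciation."""
    entries: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and ";;;" comment lines.
        if not line or line.startswith(";;;"):
            continue
        word, *phonemes = line.split()
        # Alternate pronunciations carry a "(n)" suffix on the headword.
        word = re.sub(r"\(\d+\)$", "", word)
        entries.setdefault(word, phonemes)
    return entries


def count_syllables(phonemes: List[str]) -> int:
    """Count vowels, which are the phonemes that end in a stress digit."""
    return sum(1 for phoneme in phonemes if phoneme[-1].isdigit())


# Hand-written sample in the dictionary's layout, for illustration only.
SAMPLE = """\
;;; illustrative sample, not the real dictionary file
GREEN  G R IY1 N
IDEAS  AY0 D IY1 AH0 Z
LINGUISTICS  L IH0 NG G W IH1 S T IH0 K S
"""

entries = parse_cmudict(SAMPLE.splitlines())
for word, phonemes in entries.items():
    print(word, count_syllables(phonemes))
```

If the UnicodeDecodeError mentioned above is caused by non-UTF-8 bytes in the newer file (an assumption that has not been verified here), passing an explicit encoding such as latin-1 to open() may be worth trying before falling back to 0.7a.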
