add semicolon, change to uppercase #448

Merged
merged 5 commits into from
Oct 2, 2023
34 changes: 26 additions & 8 deletions lib/haskell/natural4/src/LS/XPile/LogicalEnglish/IdVars.hs
@@ -1,5 +1,7 @@
{-# OPTIONS_GHC -W #-}

{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE DuplicateRecordFields, RecordWildCards #-}
{-# LANGUAGE OverloadedStrings #-}
@@ -16,6 +18,7 @@ import Data.Coerce (coerce)
import Data.HashSet qualified as HS
import Data.Sequences (fromStrict, toStrict)
import Data.Text qualified as T
import Text.Regex.PCRE.Heavy qualified as PCRE
import Text.Replace (Replace (Replace), listToTrie, replaceWithTrie)

import LS.XPile.LogicalEnglish.Types
@@ -88,13 +91,13 @@ a config file that is kept in sync with the downstream stuff
(since have to do this kind of replacement in the converse direction when generating justification)
-}
replaceTxt :: T.Text -> T.Text
replaceTxt = toStrict . replaceWithTrie replacements . fromStrict
replaceTxt =
  replacePeriod . toStrict . replaceWithTrie replacements . fromStrict
  where
    replacements =
      listToTrie
        [ Replace "," " comma",
          Replace "." " dot ",
          Replace "%" " percent"
        [ Replace "," " COMMA",
          Replace "%" " PERCENT"
{- ^ it's cleaner not to put a space after `percent`
because it's usually something like "100% blah blah" in the encoding
So if you add a space after, you end up getting "100 percent blah blah", which doesn't look as nice.
@@ -105,17 +108,32 @@ replaceTxt = toStrict . replaceWithTrie replacements . fromStrict
""

>>> replaceTxt ("100.5 * 2" :: T.Text)
"100 dot 5 * 2"
"100 DOT 5 * 2"

>>> replaceTxt "100% guarantee"
"100 percent guarantee"
"100 PERCENT guarantee"

>>> replaceTxt "rocks, stones, and trees"
"rocks comma stones comma and trees"
"rocks COMMA stones COMMA and trees"
-}
]
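
Review note: the substitution that `replaceTxt` performs after this change can be illustrated with a small dependency-free sketch. It is an assumption-laden stand-in, not the code in this PR: it works on `String` rather than `T.Text`, uses a naive left-to-right scan instead of the trie from `Text.Replace`, and the names `replacementsSketch` and `replaceAllSketch` are hypothetical.

```haskell
import Data.List (isPrefixOf)

-- | Hypothetical replacement table mirroring the two new entries in the diff.
replacementsSketch :: [(String, String)]
replacementsSketch = [(",", " COMMA"), ("%", " PERCENT")]

-- | Scan left to right. At each position, apply the first table entry whose
-- key is a prefix of the remaining input; otherwise copy one character.
-- This is a naive linear scan, not the trie-based matching of Text.Replace.
replaceAllSketch :: [(String, String)] -> String -> String
replaceAllSketch _ [] = []
replaceAllSketch table input@(c : rest) =
  case [(k, v) | (k, v) <- table, not (null k), k `isPrefixOf` input] of
    (k, v) : _ -> v ++ replaceAllSketch table (drop (length k) input)
    [] -> c : replaceAllSketch table rest
```

With this table, `replaceAllSketch replacementsSketch "rocks, stones, and trees"` yields `"rocks COMMA stones COMMA and trees"`, matching the updated doctest. There is deliberately no trailing space after `PERCENT`, for the reason given in the comment above.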


-- LE has no trouble parsing dots that appear in numbers, i.e. things like
-- "clause 2.1 applies" are fine.
-- However, a dot used as a full stop, as in "The car is blue.", is not OK,
-- so that "." needs to be turned into "PERIOD".
replacePeriod =
  PCRE.gsub
    -- https://stackoverflow.com/a/45616898
    [PCRE.re|[a-zA-Z] + [^0-9\s.]+|\.(?!\d)|]
    (" PERIOD " :: T.Text)
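
Review note: the rule described in the comment above (keep dots inside numbers, rewrite full stops) can be sketched without a regex engine. This is a simplified illustration under stated assumptions, not the PCRE-based implementation in this PR: it operates on `String`, covers only the `\.(?!\d)` alternative of the pattern (a dot not immediately followed by a digit), and `replacePeriodSketch` is a hypothetical name.

```haskell
import Data.Char (isDigit)

-- | Hypothetical helper: rewrite every '.' that is not immediately followed
-- by an ASCII digit to " PERIOD ", and leave all other characters untouched.
replacePeriodSketch :: String -> String
replacePeriodSketch [] = []
replacePeriodSketch ('.' : rest)
  | nextIsDigit rest = '.' : replacePeriodSketch rest
  | otherwise = " PERIOD " ++ replacePeriodSketch rest
  where
    -- True only when the dot is directly followed by a digit, as in "2.1".
    nextIsDigit (c : _) = isDigit c
    nextIsDigit [] = False
replacePeriodSketch (c : rest) = c : replacePeriodSketch rest
```

Under these assumptions, `replacePeriodSketch "clause 2.1 applies"` returns the input unchanged, while `replacePeriodSketch "The car is blue."` returns `"The car is blue PERIOD "`.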

-- replaceHyphen =
-- PCRE.gsub
-- -- https://stackoverflow.com/a/31911114
-- [PCRE.re|(?=\S*[-])([a-zA-Z]+)\-([a-zA-Z]+)|]
-- \(s0:s1:_) -> mconcat [s0, " HYPHEN ", s1] :: T.Text

{- | Convert a SimplifiedL4 Cell to a VCell
The code for simplifying L4 AST has established these invariants:
* every IS NUM has had the IS removed, with the number converted to T.Text and wrapped in a MkCellIsNum