Newlines lead to ugly output #36

Open
mirabilos opened this issue Nov 6, 2022 · 3 comments

@mirabilos

See https://toot.mirbsd.org/@mirabilos/statuses/01GH65F7V9ZK7KQ6YRG3PF0AR2

  • newlines are converted to hard linebreaks
  • some spaces are lost (jschauma!Consequently)

Input feed: http://www.mirbsd.org/wlog.rss

@mirabilos
Author

Possible fix follows, although this is

diff --git a/feediverse.py b/feediverse.py
index cee0078..7161182 100755
--- a/feediverse.py
+++ b/feediverse.py
@@ -11,6 +11,8 @@ import feedparser
 from bs4 import BeautifulSoup
 from mastodon import Mastodon
 from datetime import datetime, timezone, MINYEAR
+# with https://github.com/matthewwithanm/python-markdownify/issues/82 applied
+from markdownify.markdownify import markdownify
 
 DEFAULT_CONFIG_FILE = os.path.join("~", ".feediverse")
 
@@ -37,6 +39,7 @@ def main():
     config = read_config(config_file)
 
     masto = Mastodon(
+        version_check_mode="none",
         api_base_url=config['url'],
         client_id=config['client_id'],
         client_secret=config['client_secret'],
@@ -50,11 +53,15 @@ def main():
         for entry in get_feed(feed['url'], config['updated']):
             newest_post = max(newest_post, entry['updated'])
             if args.verbose:
-                print(entry)
+                print("‣‣‣ entry {{{", entry, "}}}")
+            postbody = feed['template'].format(**entry)
             if args.dry_run:
-                print("trial run, not tooting ", entry["title"][:50])
+                print("trial run, not tooting {{{", postbody, "}}}")
                 continue
-            masto.status_post(feed['template'].format(**entry)[:499])
+            if len(postbody) > 500:
+                postfix = "…\n\n(more…)"
+                postbody = postbody[:(500 - len(postfix))] + postfix
+            masto.status_post(postbody, visibility='public')
 
     if not args.dry_run:
         config['updated'] = newest_post.isoformat()
@@ -83,7 +90,7 @@ def get_entry(entry):
     url = entry.id
     return {
         'url': url,
-        'link': entry.link,
+        'link': entry.get('link', ''),
         'title': cleanup(entry.title),
         'summary': cleanup(summary),
         'content': content,
@@ -92,6 +99,15 @@ def get_entry(entry):
     }
 
 def cleanup(text):
+    text = re.sub('\r+\n?', '\n', text)
+    text = re.sub(' *\n *', '\n', text)
+    text = re.sub('\n\n\n+', '\n\n', text, flags=re.M)
+    text = re.sub('\n+ *<', ' <', text)
+    text = markdownify(text)
+    text = re.sub('  \n  \n', '\n\n', text)
+    text = re.sub(' *\n\n+', '\n\n', text)
+    return text
+    # old HTML to plaintext output:
     html = BeautifulSoup(text, 'html.parser')
     text = html.get_text()
     text = re.sub('\xa0+', ' ', text)
diff --git a/markdownify b/markdownify
new file mode 120000
index 0000000..deec112
--- /dev/null
+++ b/markdownify
@@ -0,0 +1 @@
+../python-markdownify
\ No newline at end of file
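
The length handling in the patch above can be tried on its own. This is only a minimal sketch: `truncate_status`, `LIMIT` and `SUFFIX` are hypothetical names introduced here, and the 500-character limit is simply the value used in the diff, not something queried from the server.

```python
# Hypothetical helper mirroring the truncation step from the diff.
LIMIT = 500                # assumed status length limit, as hard-coded in the patch
SUFFIX = "…\n\n(more…)"    # marker appended when the body had to be cut


def truncate_status(body: str, limit: int = LIMIT, suffix: str = SUFFIX) -> str:
    """Return body unchanged if it fits, else cut it so that the
    shortened body plus suffix is exactly `limit` characters long."""
    if len(body) <= limit:
        return body
    return body[: limit - len(suffix)] + suffix


if __name__ == "__main__":
    long_body = "x" * 600
    short = truncate_status(long_body)
    print(len(short))          # length of the truncated status
    print(short.endswith(SUFFIX))
```

Compared with the old `[:499]` slice, this keeps short posts untouched and makes it visible to readers that a long post was cut.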

@mirabilos
Author

mirabilos commented Jan 28, 2023 via email

@mirabilos
Author

followup fix:

  • handle embedded single newlines better (translate to space everywhere, not just before tag)
  • drop inline images, they won’t work in a fediverse status anyway
--- a/feediverse.py
+++ b/feediverse.py
@@ -101,9 +101,10 @@ def get_entry(entry):
 def cleanup(text):
     text = re.sub('\r+\n?', '\n', text)
     text = re.sub(' *\n *', '\n', text)
-    text = re.sub('\n\n\n+', '\n\n', text, flags=re.M)
-    text = re.sub('\n+ *<', ' <', text)
-    text = markdownify(text)
+    text = text.replace('\n', '\1')
+    text = re.sub('\1\1\1+', '\n\n', text)
+    text = re.sub('\1+ *', ' ', text).strip()
+    text = markdownify(text, strip=['img']).strip()
     text = re.sub('  \n  \n', '\n\n', text)
     text = re.sub(' *\n\n+', '\n\n', text)
     return text

This applies on top of the previous, larger diff.
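
The newline handling from the follow-up `cleanup()` can be exercised without `markdownify`. The sketch below reproduces only the steps that run before the `markdownify()` call; `collapse_newlines` and `SENTINEL` are hypothetical names, and the sketch assumes the input never contains the control character `\x01` (which is what `'\1'` in the diff evaluates to in a Python string).

```python
import re

SENTINEL = "\x01"  # same character as '\1' in the diff; assumed absent from input


def collapse_newlines(text: str) -> str:
    """Pre-markdownify whitespace pass, following the follow-up diff."""
    text = re.sub(r"\r+\n?", "\n", text)   # normalise CR and CRLF to LF
    text = re.sub(r" *\n *", "\n", text)   # trim spaces around line breaks
    text = text.replace("\n", SENTINEL)    # mark every remaining newline
    # runs of three or more newlines become one paragraph break
    text = re.sub(SENTINEL * 3 + "+", "\n\n", text)
    # any other newline run (plus trailing spaces) becomes a single space
    text = re.sub(SENTINEL + "+ *", " ", text)
    return text.strip()


if __name__ == "__main__":
    sample = "foo\r\nbar  \n  baz\n\n\n\nqux\n"
    print(repr(collapse_newlines(sample)))
    print(repr(collapse_newlines("jschauma!\nConsequently")))
```

Note that, as in the diff, a run of exactly two newlines is also folded into a single space; only three or more survive as a paragraph break.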
