Adding support for WebVTT (Web Video Text Tracks) (.vtt) format #337

dsavinov-actionengine · 2024-05-17T18:18:54Z

Problem and/or solution

Adding parsing and compiling for WebVTT (Web Video Text Tracks) (.vtt) format

How to test

1. Running unit-tests

/openformats/tests/formats/vtt/test_vtt.py contains tests for vtt
Use pytest openformats/tests/formats/vtt/test_vtt.py to run tests

2. Through testbed

Use "VTT" handler in the testbed

Reviewer checklist

Code:

Change is covered by unit-tests
Code is well documented, well styled and is following best practices
Performance issues have been taken under consideration
Errors and other edge-cases are handled properly

PR:

Problem and/or solution are well-explained
Commits have been squashed so that each one has a clear purpose
Commits have a proper commit message according to TEM

dsavinov-actionengine · 2024-05-17T18:24:38Z

@kbairak please review

kbairak

Overall it looks good. I will take a more thorough look and invoke it through a debugger. I have a feeling some things can be simplified but I will have to experiment a bit. For the time being, just the one comment.

kbairak · 2024-05-23T06:39:44Z

openformats/formats/vtt.py

+        string = OpenString(timings, string_to_translate,
+                            occurrences=f"{start},{end}")


It's good practice to include an order with each OpenString. This is somewhat easy to do:

from itertools import count
order = count()
for ... in ...:
    string = OpenString(..., order=next(order))

This will ensure each string will get an auto-incrementing value for order.
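As a minimal, self-contained sketch of that pattern: FakeOpenString below is a hypothetical stand-in so the snippet runs without the library, and build_strings is an illustrative name, not a function from this PR. Only the itertools.count() usage is the point.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class FakeOpenString:
    """Hypothetical stand-in for OpenString; only the fields used here."""
    key: str
    text: str
    order: int


def build_strings(cues):
    """Create one string per (timings, text) cue with an increasing order."""
    order = count()  # yields 0, 1, 2, ... on successive next() calls
    return [
        FakeOpenString(timings, text, order=next(order))
        for timings, text in cues
    ]


strings = build_strings([
    ("00:00:01.000 --> 00:00:02.000", "Hello"),
    ("00:00:03.000 --> 00:00:04.000", "World"),
])
```

Because the counter lives outside the loop body, skipped sections do not leave gaps in the numbering.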

kbairak

Great work! I think that only the first comment requires a change (the one with the multiple occurrences of -->). The rest are optional.

kbairak · 2024-05-30T06:31:00Z

openformats/formats/vtt.py

+            str = src_strings[i];
+            if "-->" in str:
+                timings = str
+                timings_index = i


Minor because it is not going to affect performance by a lot, but we could break here.

Actually, we should break because the --> part could be in an actual subtitle and we want to consider the first occurrence as the timing.
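To illustrate the difference, here is a small sketch with hypothetical helper names (not the PR's code). Without an early exit, the last line containing --> wins, which may be subtitle text rather than the timing line.

```python
def find_timing_line(lines):
    """Return (index, line) of the FIRST line containing '-->'."""
    for i, line in enumerate(lines):
        if "-->" in line:
            return i, line  # early exit, equivalent to break
    return None, None


def find_timing_line_no_break(lines):
    """Same scan without the early exit: the LAST match is kept."""
    index, timings = None, None
    for i, line in enumerate(lines):
        if "-->" in line:
            index, timings = i, line
    return index, timings


section = [
    "1",
    "00:00:01.000 --> 00:00:02.000",
    "Press A --> B to continue",  # subtitle text that happens to contain '-->'
]
first = find_timing_line(section)
last = find_timing_line_no_break(section)
```

Here first points at the timing line (index 1), while last points at the subtitle text (index 2).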

kbairak · 2024-05-30T06:35:59Z

openformats/formats/vtt.py

+            offset += 1
+        return offset, string
+
+    def _format_timing(self, timing):


This is part of the standard library and might be better suited: https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior

dsavinov-actionengine · Jun 3, 2024

This function is based on a similar function in the SRT handler. It takes a string as input and returns a string too, so the built-in strftime()/strptime() are less convenient here. Can we leave this function as it is?
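For context, a string-in, string-out helper can be sketched as follows. This is not the PR's _format_timing; normalize_timestamp and its regular expression are assumptions made for illustration. It relies on one property of WebVTT timestamps: the hours part is optional, which a single fixed strptime() format does not cover.

```python
import re

# WebVTT timestamps look like "hh:mm:ss.ttt"; the "hh:" part may be omitted.
_TIMESTAMP = re.compile(r"^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})$")


def normalize_timestamp(value):
    """Return value as 'hh:mm:ss.ttt', raising ValueError if it is malformed."""
    match = _TIMESTAMP.match(value.strip())
    if match is None:
        raise ValueError(f"invalid timestamp: {value!r}")
    hours, minutes, seconds, millis = match.groups()
    # hours is None when the optional "hh:" prefix is absent
    return f"{int(hours or 0):02d}:{minutes}:{seconds}.{millis}"


short_form = normalize_timestamp("01:02.500")
long_form = normalize_timestamp("10:01:02.500")
```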

kbairak · 2024-05-30T06:38:11Z

openformats/formats/vtt.py

+        transcriber = Transcriber(template)
+        template = transcriber.source
+        stringset = iter(stringset)
+        string = next(stringset)


There is a small probability (and if we're honest, transifex will probably stop the compilation process before the interpreter gets here) that this will raise a StopIteration. Maybe a try/except that raises a ParseError("stringset cannot be empty") would fit here.
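A sketch of that guard, using a locally defined ParseError as a hypothetical stand-in for the library's exception class and first_string as an illustrative helper name:

```python
class ParseError(Exception):
    """Hypothetical stand-in for the library's ParseError."""


def first_string(stringset):
    """Return (first item, iterator over the rest); fail clearly if empty."""
    iterator = iter(stringset)
    try:
        first = next(iterator)
    except StopIteration:
        # Replace the bare StopIteration with a descriptive error.
        raise ParseError("stringset cannot be empty") from None
    return first, iterator


head, rest = first_string(["a", "b"])
```

Raising with from None hides the StopIteration traceback, so the caller sees only the descriptive error.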

kbairak · 2024-05-30T06:44:53Z

openformats/formats/vtt.py

+            hash_position = -1
+            if subtitle_section.count('-->') > 0:
+                arrow_pos = subtitle_section.index('-->')
+                end_of_timings = subtitle_section.index('\n', arrow_pos + len('-->'))
+                hash_position = end_of_timings + 1


How about:

Suggested change

-            hash_position = -1
-            if subtitle_section.count('-->') > 0:
-                arrow_pos = subtitle_section.index('-->')
-                end_of_timings = subtitle_section.index('\n', arrow_pos + len('-->'))
-                hash_position = end_of_timings + 1
+            try:
+                arrow_pos = subtitle_section.index('-->')
+            except ValueError:
+                hash_position = -1
+            else:
+                end_of_timings = subtitle_section.index('\n', arrow_pos + len('-->'))
+                hash_position = end_of_timings + 1

I know mine is longer, and this is a styling preference, so feel free to ignore it, but I feel it is more Pythonic.

dsavinov-actionengine · Jun 3, 2024

In the current PR code, .index('-->') cannot raise an exception because it is guarded by the condition .count('-->') > 0 on the previous line.
However, .index('\n', arrow_pos + len('-->')) can raise a ValueError, both in the current code and in your suggestion.
The new commit therefore reworks this part.

kbairak · 2024-05-30T06:45:07Z

openformats/formats/vtt.py

+            hash_position = -1
+            if subtitle_section.count('-->') > 0:
+                arrow_pos = subtitle_section.index('-->')
+                end_of_timings = subtitle_section.index('\n', arrow_pos + len('-->'))


Could this line raise a ValueError?

dsavinov-actionengine · Jun 3, 2024

In theory, it could.
The absence of '\n' after the timing means that the subtitle text is missing. In that case, the parser should already have raised an exception earlier (function _parse_section(), line 104). Nevertheless, the new commit adds ValueError handling here in the compile() function, too.
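One way to cover both failure modes in a single place is sketched below. find_hash_position is a hypothetical helper written for this illustration and is not necessarily how the follow-up commit handles it. Both str.index() calls sit in the same try block, so a missing arrow and a missing newline each fall back to -1.

```python
def find_hash_position(subtitle_section):
    """Return the index just past the timings line, or -1 if not found."""
    arrow = "-->"
    try:
        arrow_pos = subtitle_section.index(arrow)
        # Raises ValueError when no newline follows the timings.
        end_of_timings = subtitle_section.index("\n", arrow_pos + len(arrow))
    except ValueError:
        return -1
    return end_of_timings + 1


section = "1\n00:00:01.000 --> 00:00:02.000\nHello"
position = find_hash_position(section)
```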

kbairak

Great work!

Adding support for VTT format

045dc3f

kbairak reviewed May 23, 2024

Create OpenString order parameter

f3ab507

dsavinov-actionengine requested a review from kbairak May 27, 2024 10:10

kbairak requested changes May 30, 2024

fixed review comments

3fdc783

dsavinov-actionengine requested a review from kbairak June 3, 2024 11:46

kbairak approved these changes Jun 10, 2024

kbairak merged commit f286be5 into transifex:devel Jun 11, 2024
2 checks passed

dsavinov-actionengine deleted the support_vtt branch June 24, 2024 10:26

txsentinel mentioned this pull request Jun 25, 2024

Release: 0.0.123 #342

Merged
