Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TTF_GetScript to retreive the script to which unicode character belongs #312

Closed
wants to merge 1 commit into from

Conversation

j9hnny97
Copy link

At the moment SDL_ttf provides TTF_SetFontScriptName, a Harfbuzz related function that allows a script to be set (for a specific font) to be used for text shaping.

This is useful as long as the client knows the origin of its texts, allowing him to simply set the script relative to his language. If this isn't the case how would he know what to script to use? And what would happen if there are different characters in his text, each one requiring different scripts?

Harfbuzz already provides a way to obtain the best script to be used by each character. This PR exposes this functionally with TTF_GetScript, allowing the client to deal with this issue however he desires, without compromising the current library implementation

SDL_ttf.c Show resolved Hide resolved
SDL_ttf.c Outdated Show resolved Hide resolved
SDL_ttf.c Outdated Show resolved Hide resolved
@j9hnny97 j9hnny97 force-pushed the getscript branch 3 times, most recently from f9b636f to 624d5ac Compare May 27, 2024 13:08
SDL_ttf.c Outdated Show resolved Hide resolved
SDL_ttf.c Show resolved Hide resolved
SDL_ttf.h Outdated Show resolved Hide resolved
@sezero
Copy link
Contributor

sezero commented Jun 26, 2024

Better after the latest changes. I suggest the following to accomodate @slouken's arguments at #312 (comment): it makes it safer, just in case. Also changes some error returns to simply return TTF_SetError(), because TTF_SetError (which is SDL_SetError) already returns -1 (there are other places in existing code that needs a similar change, but that's out of the scope of this PR.)

diff --git a/SDL_ttf.c b/SDL_ttf.c
index d6f6301..54d3575 100644
--- a/SDL_ttf.c
+++ b/SDL_ttf.c
@@ -3143,25 +3143,25 @@ int TTF_SetFontScriptName(TTF_Font *font, const char *script)
 #endif
 }
 
-extern DECLSPEC int SDLCALL TTF_GetScript(Uint32 ch, char script[5])
+int TTF_GetScript(Uint32 ch, char *script, size_t bufsize)
 {
 #if TTF_USE_HARFBUZZ
 
     TTF_CHECK_POINTER(script, -1);
+    if (bufsize < 5) {
+        return TTF_SetError("Insufficient script buffer size");
+    }
 
     hb_buffer_t* hb_buffer = hb_buffer_create();
-
     if (hb_buffer == NULL) {
-        TTF_SetError("Cannot create harfbuzz buffer");
-        return -1;
+        return TTF_SetError("Cannot create harfbuzz buffer");
     }
 
     hb_unicode_funcs_t* hb_unicode_functions = hb_buffer_get_unicode_funcs(hb_buffer);
 
     if (hb_unicode_functions == NULL) {
-        TTF_SetError("Cannot get harfbuzz unicode funcs");
         hb_buffer_destroy(hb_buffer);
-        return -1;
+        return TTF_SetError("Cannot get harfbuzz unicode funcs");
     }
 
     hb_buffer_clear_contents(hb_buffer);
@@ -3179,8 +3179,7 @@ extern DECLSPEC int SDLCALL TTF_GetScript(Uint32 ch, char script[5])
     return 0;
 
 #else
-    TTF_SetError("Unsupported");
-    return -1;
+    return TTF_SetError("Unsupported");
 #endif
 }
 
diff --git a/SDL_ttf.h b/SDL_ttf.h
index 03b9045..aacb109 100644
--- a/SDL_ttf.h
+++ b/SDL_ttf.h
@@ -2333,11 +2333,12 @@ extern DECLSPEC int SDLCALL TTF_SetFontScriptName(TTF_Font *font, const char *sc
  * \param ch the character to check.
  * \param script on return, filled in with the the script as a null-terminated
  *               string of exactly 4 characters
+ * \param bufsize size of the script buffer, must be at least 5 (see above.)
  * \returns 0 on success, or -1 on error.
  *
- * \since This function is available since SDL_ttf ?.
+ * \since This function is available since SDL_ttf 2.24.0.
  */
-extern DECLSPEC int SDLCALL TTF_GetScript(Uint32 ch, char script[5]);
+extern DECLSPEC int SDLCALL TTF_GetScript(Uint32 ch, char *script, size_t bufsize);
 
 /* Ends C function definitions when using C++ */
 #ifdef __cplusplus

After you apply this, maintainers will review and decide, I guess.

@Magueija Magueija force-pushed the getscript branch 2 times, most recently from 35a716c to b164925 Compare June 28, 2024 10:37
@sezero
Copy link
Contributor

sezero commented Jul 1, 2024

This looks fair to me now. @1bsyl, @slouken: what do you think?

@1bsyl
Copy link
Contributor

1bsyl commented Jul 1, 2024

looks ok to me!
minor suggestion, since we have the function "TTF_SetFontScriptName()", maybe this should be called TTF_GetScriptName()

also what is the use case ? check the script of a string by the first char ?
If we want to validate a full string, we need a big loop, with per char, an hb_buffer_create/destroy().
Also usefull, could be an helper function to parse a long string, return an array a scripts and indexes of sub strings, for each string that has different scripts.
Or a function that get the script of a string and advance into the string as much as possible (behavior like strstr()).

@Magueija
Copy link

Magueija commented Jul 2, 2024

Hi @1bsyl,

The function has the name "TTF_GetScript()" since we also have the "TTF_SetScript()" and because it is not related with fonts as "TTF_SetFontScriptName()". However, if you still think we should add "Name" at the end, I am ok with it.

Regarding the use of it:
Each character has a defined script. 'A' belongs to the Latin script, while 'ب' belongs to the Arab script, but there are some characters that could be rendered with more than one script, it just depends on the context, for example, the application current language, and, on these cases, it should be decided by the application.
Actually there can be the case where the application could render the whole string just depending on the first character, as is recommended on the unicode.org. But this does not cover all cases, some other applications could want to render the string with multiple scripts (applying them for each chunk) because they have multi-language strings.

If we want to validate a full string, we need a big loop indeed, but if we leave it to the application, now it has the possibility to do some other things on this loop and avoids a loop on the SDL_TTF and another loop on the application.

Having an additional function like strstr() doesn't seem a bad ideia, but how do we decide what to do on the special cases? Which scripts should we choose?

See more: https://www.unicode.org/reports/tr24/

@1bsyl
Copy link
Contributor

1bsyl commented Jul 2, 2024

@Magueija Are you sure, I can't find TTF_SetScript ?!
otherwise I am ok with anything

@Magueija
Copy link

Magueija commented Jul 2, 2024

@1bsyl I just noticed it's deprecated and with HarfBuzz types, I will add the suffix.
https://wiki.libsdl.org/SDL2_ttf/TTF_SetScript
https://github.com/libsdl-org/SDL_ttf/blob/SDL2/SDL_ttf.h#L2281

slouken added a commit to slouken/SDL_ttf that referenced this pull request Sep 27, 2024
slouken added a commit to slouken/SDL_ttf that referenced this pull request Sep 27, 2024
@slouken slouken closed this in 6e4a977 Sep 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants