sng files encoding

Let us know about any problems in SongBeamer
ycharles
Posts: 1
Joined: Sun Aug 03, 2014 8:42 pm

Post by ycharles »

Good evening!

We use SongBeamer to project translations of some songs. While collaborating on edits to the sng files, we ran into encoding issues in the past (I don't know which version we used at that time; we now use v4.28a), because the sng files typically contain lyrics in German (with special characters such as ß, ü...), English, and French (with special characters such as œ, À, É...).

Now it seems that we no longer have problems, provided that we edit sng files only with SongBeamer and not with another text editor.


I was thinking of writing a script that extracts the song lyrics from sng files with mixed languages (so it only reads from them), in order to generate, for example, separate booklets with the lyrics in each language.

My question is: which encoding does SongBeamer actually use for sng files? If I open a file in a text editor (e.g. Notepad++), it reports the encoding as "ANSI", but if I am not mistaken, pure ANSI does not handle non-English characters.
The characters are displayed correctly, though, and it even looks like SongBeamer can still read the file after I modify it with Notepad++ (but for now I don't want to risk that with other files without more testing). In any case, if I want to read an sng file with an external script or custom program, it would help me a lot to know how it is encoded.

Thank you very much for your time and attention.
Best regards,

Yannick CHARLES
Sebastian
SongBeamer
Posts: 208
Joined: Sat Apr 22, 2006 10:58 pm

Post by Sebastian »

SongBeamer accepts 4 encodings for sng files: ANSI, UTF-8, Unicode, Big-Endian-Unicode.

When reading a *.sng file, SongBeamer looks at the byte order mark (the first 2-3 bytes of the sng file). This determines the encoding that is used when reading the file. If there is no byte order mark, SongBeamer assumes that the file is ANSI with codepage 1252. We don't use heuristics for detecting UTF-8 files, as this could lead to unpredictable results.

The byte order mark for UTF-8 is $EF $BB $BF. So if your file starts with these 3 bytes and it is UTF-8 you'll be fine.
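
For anyone writing an external reader script, the detection described above could be sketched in Python roughly as follows. This is only an illustration, not SongBeamer's actual code: the function names `decode_sng` and `read_sng` are made up for this example, and it assumes that "Unicode" and "Big-Endian-Unicode" mean UTF-16 little-endian and UTF-16 big-endian, which this thread does not confirm.

```python
import codecs

# Byte order marks checked in order. The two UTF-16 entries are an
# assumption about what "Unicode" / "Big-Endian-Unicode" mean.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def decode_sng(data: bytes) -> str:
    """Decode raw sng bytes: use the BOM if present, else codepage 1252."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # Strip the BOM so it does not end up in the decoded text.
            return data[len(bom):].decode(encoding)
    # No BOM: treat the file as ANSI with codepage 1252, no heuristics.
    return data.decode("cp1252")


def read_sng(path: str) -> str:
    """Read an sng file in binary mode and decode it (read-only)."""
    with open(path, "rb") as f:
        return decode_sng(f.read())
```

A script would then call something like `text = read_sng("song.sng")` (a placeholder file name) and work on the resulting string. Because the file is opened read-only, this does not touch the original sng file.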
phill
Posts: 2
Joined: Sun Nov 05, 2017 7:15 pm

Post by phill »

Hi,

Can you confirm this is still the case, especially with the mac version?
Sebastian
SongBeamer
Posts: 208
Joined: Sat Apr 22, 2006 10:58 pm

Post by Sebastian »

That is how it should be according to the specification. If you see any different behaviour, please let us know.
phill
Posts: 2
Joined: Sun Nov 05, 2017 7:15 pm

Post by phill »

Thank you for your reply.
I presume that by Unicode and Big-Endian-Unicode you mean UTF-16, or perhaps UCS-2, rather than UTF-32?