You cannot really find out automatically whether a file was originally written with encoding X.

What you can easily do, though, is verify whether the complete file can be decoded successfully (though not necessarily correctly) using a specific codec. If you find any bytes that are not valid for a given encoding, the file must be in some other encoding.

The problem is that many codecs are similar and share the same "valid byte patterns", merely interpreting them as different characters. For example, an ä in one encoding might correspond to é in another or ø in a third. The computer can't really tell which interpretation of the bytes results in correct, human-readable text (unless maybe you add dictionaries for all kinds of languages and let it run spell checks).

You must also know that some character sets are actually subsets of others. For example, ASCII is part of most commonly used codecs, such as some of the ANSI family or UTF-8. That means a text saved as UTF-8 that only contains simple Latin characters would be identical to the same file saved as ASCII.

However, let's get back from explaining what you can't do to what you actually can do:

For a basic check on ASCII / non-ASCII (normally UTF-8) text files, you can use the `file` command. It does not know many codecs, though, and it only examines the first few kB of a file, assuming that the rest will not contain any new characters.
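The weakness of only looking at the first few kB of a file can be shown with a small Python sketch. The helper `sniff_is_ascii` is hypothetical. It is a simplified stand-in for a prefix-only check, not how the real `file` command works internally, and the 4096-byte limit is an arbitrary assumption.

```python
# Hypothetical helper: a simplified stand-in for a prefix-only check,
# NOT how the real `file` command is implemented.
def sniff_is_ascii(data: bytes, limit: int = 4096) -> bool:
    # Look only at the first `limit` bytes and assume the rest of the
    # file contains nothing new.
    return data[:limit].isascii()


def full_is_ascii(data: bytes) -> bool:
    # Scan every byte: ASCII means all byte values are below 0x80.
    return data.isascii()


# 10,000 ASCII bytes followed by one UTF-8 encoded "ä" (b'\xc3\xa4').
data = b"a" * 10_000 + "ä".encode("utf-8")

print(sniff_is_ascii(data))  # True: the non-ASCII bytes lie past the prefix
print(full_is_ascii(data))   # False: a full scan finds them
```

A file that starts with plain English and only later contains a name like "Jäger" can therefore be reported as ASCII by a prefix-only check.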
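The check mentioned earlier, whether the complete file decodes under a given codec, is easy to do yourself. The Python sketch below tries each codec in a candidate list and keeps only those that decode every byte without error. The candidate list and the helper names `decodable_as` and `check_file` are illustrative assumptions, not part of any tool.

```python
from pathlib import Path

# Example candidate list; it is an arbitrary choice, not an exhaustive one.
CANDIDATES = ("ascii", "utf-8", "cp1252", "latin-1")


def decodable_as(data: bytes, candidates=CANDIDATES) -> list:
    """Return every codec in `candidates` that decodes *all* of `data`.

    Success only means no invalid byte sequence was found; it does not mean
    the resulting text is what the author intended.
    """
    ok = []
    for codec in candidates:
        try:
            data.decode(codec)  # strict error handling is the default
        except UnicodeDecodeError:
            continue  # an invalid byte: the file must be in something else
        ok.append(codec)
    return ok


def check_file(path: str) -> list:
    # Read the complete file as raw bytes so that every byte is checked.
    return decodable_as(Path(path).read_bytes())


sample = "café".encode("utf-8")  # b'caf\xc3\xa9'
print(decodable_as(sample))  # ['utf-8', 'cp1252', 'latin-1']
```

The UTF-8 bytes of "café" also decode without error as cp1252 and Latin-1, giving "cafÃ©". A passing check therefore rules codecs out but never proves which one is correct.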
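To illustrate that similar codecs interpret the same bytes as different characters, the sketch below decodes the single byte 0xE9 (which is "é" in Latin-1) with a few single-byte codecs that ship with Python. The choice of codecs is arbitrary. Each one decodes this byte without error, and nothing in the byte itself says which reading is right.

```python
data = b"\xe9"  # a single byte; it is "é" in Latin-1

# Each of these single-byte codecs decodes this byte without error,
# but they do not agree on which character it represents.
codecs_to_try = ["latin-1", "cp437", "mac_roman", "koi8_r"]
interpretations = {c: data.decode(c) for c in codecs_to_try}

for codec, char in interpretations.items():
    print(f"{codec:10} -> {char}")
```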
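The subset point about ASCII can be checked directly. In the Python sketch below, a string containing only ASCII characters is encoded with ASCII, UTF-8, cp1252 (a common Windows "ANSI" code page, chosen here as an example) and Latin-1. All four produce the same bytes, so the file alone cannot reveal which encoding was used to save it.

```python
text = "Hello, world!"  # only ASCII characters

encodings = ["ascii", "utf-8", "cp1252", "latin-1"]
encoded = {enc: text.encode(enc) for enc in encodings}

# Every encoding produces exactly the same bytes...
assert len(set(encoded.values())) == 1
print(encoded["utf-8"])  # b'Hello, world!'

# ...and only a non-ASCII character makes the encodings diverge.
print("é".encode("utf-8"), "é".encode("latin-1"))  # b'\xc3\xa9' b'\xe9'
```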