Converting text files with a French character encoding to UTF-8
I have some text files from MS-DOS, written in the 90s before UTF-8 was common, and I don't remember which character encoding was used.
When I open a file, it looks like this:
{**************************************************************************}
{ Projet : FIGDEMO (Exemple de la documentation) }
{ Unit<82> FIGURES }
{ Copyright (c) 1989 Borland International, Inc. }
{**************************************************************************}
The <82> (a single byte with hex value 0x82) should be é. I used the file command to detect the encoding:
file -bi PASCAL/FIGURES.PAS
application/octet-stream; charset=binary
Not so helpful, so I installed the Python program chardet:
pip install chardet
chardet PASCAL/FIGURES.PAS
PASCAL/FIGURES.PAS: Windows-1252 with confidence 0.711673640167364
Windows-1252 is the Western European encoding used by Windows, and it does cover French, but running iconv -f windows-1252 -t utf-8 PASCAL/FIGURES.PAS -o out.file doesn't give a good result: the <82> does not come out as é.
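A quick way to see why the chardet guess fails is to decode the problem byte by hand. This is only a minimal sketch: the sample bytes are typed in from the header shown above rather than read from the real file, and the variable names are made up for illustration.

```python
# Sample bytes copied from the header line; 0x82 is the byte shown as <82>.
raw = b"Unit\x82 FIGURES"

# Decode with the encoding chardet suggested.
as_cp1252 = raw.decode("cp1252")

# In Windows-1252, 0x82 is U+201A (a low single quotation mark), not é.
print(as_cp1252)
print(hex(ord(as_cp1252[4])))
```

So the conversion "succeeds" without an error, it just maps the byte to the wrong character.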
After searching the internet, I found out that the character encoding is CP850, the MS-DOS code page for Western European languages:
iconv -f CP850 -t utf-8 PASCAL/FIGURES.PAS -o out.file
{**************************************************************************}
{ Projet : FIGDEMO (Exemple de la documentation) }
{ Unité FIGURES }
{ Copyright (c) 1989 Borland International, Inc. }
{**************************************************************************}
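If iconv is not available, the same conversion can be sketched in Python. The function name convert_to_utf8 and its parameters are hypothetical, chosen only for this example. It reads and writes raw bytes so that the DOS line endings are left untouched.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="cp850"):
    """Re-encode one text file to UTF-8 (illustrative helper, not a library API)."""
    raw = Path(src).read_bytes()          # read bytes so nothing is translated
    text = raw.decode(source_encoding)    # interpret them as the DOS code page
    Path(dst).write_bytes(text.encode("utf-8"))
    return text


# 0x82 maps to é in both CP850 and the older CP437, so this one byte
# alone cannot tell the two DOS code pages apart.
for name in ("cp850", "cp437"):
    print(name, b"Unit\x82".decode(name))
```

A call such as convert_to_utf8("PASCAL/FIGURES.PAS", "out.file") would mirror the iconv command above. Since CP437 and CP850 agree on é but differ on other characters, it is worth checking a few more accented letters or box-drawing characters in the output before converting a whole directory.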