====== Latin1 to Utf-8 ======
===== Tools/Services which must be checked =====
* NFS
* Samba
* Windows
* sbclient
* rsync
* musync
===== Config changes =====
* Samba - smb.conf
unix charset = ISO8859-1 -> unix charset = UTF-8 (not tested yet)
===== Converting the filenames =====
* convmv
Dry run; this only prints what would be renamed:
convmv -r -f latin1 -t utf-8 /path/to/files
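convmv renames nothing unless it is called with --notest, so the real conversion is a second call with that option added. The following is a minimal sketch of that two-step workflow, not a tested procedure. The function names is_valid_utf8 and convert_names are made up for this example, and iconv is used here only as a quick way to check whether a name is already valid UTF-8; it is not part of convmv.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given string is valid UTF-8.
# Assumption: an iconv that exits non-zero on invalid input (e.g. glibc).
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Hypothetical wrapper: show the planned renames, then ask before renaming.
convert_names() {
    dir=$1
    # Without --notest, convmv only prints what it would do.
    convmv -r -f latin1 -t utf-8 "$dir"
    printf 'Rename for real? [y/N] '
    read -r answer
    if [ "$answer" = "y" ]; then
        # --notest makes convmv actually rename the files.
        convmv -r -f latin1 -t utf-8 --notest "$dir"
    fi
}
```

Usage would be convert_names /path/to/files. The validity check can be applied to a single name, for example is_valid_utf8 "$name" && echo "already UTF-8".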
===== Converting textfiles =====
* recode
To convert a text file from Latin-1 to UTF-8 you can use the recode tool:
recode latin1..utf-8
Given a filename as an additional argument, recode converts that file in place; without one it filters stdin to stdout.
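This goes wrong if the file is already in UTF-8 format: recode cannot tell, so every byte of a multi-byte character is treated as a separate Latin-1 character and the text ends up double-encoded.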
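The effect of converting twice can be seen on a single character. This is a small sketch that uses iconv in place of recode (an assumption, so that it runs without recode installed); the helper names latin1_to_utf8 and hex_bytes are invented for the example.

```shell
#!/bin/sh
# Assumption: iconv stands in for "recode latin1..utf-8" as a stdin filter.
latin1_to_utf8() { iconv -f ISO-8859-1 -t UTF-8; }

# Print stdin as one continuous lowercase hex string.
hex_bytes() { od -An -tx1 | tr -d ' \n'; }

# "é" is the single byte 0xE9 (octal 351) in Latin-1.
printf '\351' | latin1_to_utf8 | hex_bytes; echo
# → c3a9 (the correct UTF-8 encoding of "é")

# Converting the result a second time double-encodes it.
printf '\351' | latin1_to_utf8 | latin1_to_utf8 | hex_bytes; echo
# → c383c2a9 (displays as "Ã©" instead of "é")
```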
I wrote the following script, which converts only files that are detected as having an iso-8859* charset to utf-8. Each file is backed up before it is converted.
convert_files.sh:
#!/bin/sh
#
# Convert latin1 files to utf-8 files
#
BAK_EXT=".convert_backup"
# Read one filename per line from stdin
while IFS= read -r FILE; do
    # Only touch text files whose detected charset is iso-8859*
    if file -bi "$FILE" | grep -q "^text/.*charset=iso-8859"; then
        # Never convert our own backup copies
        case "$FILE" in
            *"$BAK_EXT")
                echo "Skipping $FILE"
                continue
                ;;
        esac
        echo "Converting $FILE"
        cp -a "$FILE" "$FILE$BAK_EXT"
        recode latin1..utf-8 "$FILE"
    fi
done
The script reads filenames from stdin, one per line. To convert all text files in your home directory, use:
find ~ -type f | convert_files.sh
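Because every converted file gets a .convert_backup copy next to it, a conversion that went wrong can be undone by moving the backups back. The following is only a sketch of how that could look; restore_backups is a hypothetical helper and not part of convert_files.sh, and it assumes the same BAK_EXT value as the script.

```shell
#!/bin/sh
# Assumption: same backup extension as in convert_files.sh.
BAK_EXT=".convert_backup"

# Hypothetical helper: move every backup below DIR back over its original.
restore_backups() {
    find "$1" -type f -name "*$BAK_EXT" | while IFS= read -r bak; do
        # Strip the backup extension to get the original filename.
        mv -- "$bak" "${bak%"$BAK_EXT"}"
    done
}
```

For example, restore_backups ~ would undo the conversion of the home directory shown above.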
--[[user:mschiff|mschiff]] 13:25, 19 Apr 2005 (CEST)