Latin1 to Utf-8

Tools/Services which must be checked

  • NFS
  • Samba
  • Windows
  • smbclient
  • rsync
  • musync

Config changes

  • Samba - smb.conf

unix charset = ISO8859-1 → unix charset = UTF-8 (not tested yet)

Converting the filenames

  • convmv

By default convmv only shows what would be done, without renaming anything:

 convmv -r -f latin1 -t utf-8 /path/to/files

Add --notest to actually rename the files.
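Before renaming anything, it can help to see which names are not valid utf-8 in the first place. The following is only a sketch: the function name non_utf8_names is made up for this page, and it relies on iconv exiting with an error when its input is not valid in the source encoding. It reads one name per line from stdin.

```shell
#!/bin/sh
# Hypothetical helper (not part of convmv): print every name read from
# stdin that is not valid utf-8. iconv fails on invalid input, so a
# utf-8 to utf-8 "conversion" works as a validity check.
non_utf8_names() {
  while IFS= read -r name; do
    if ! printf '%s' "$name" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
      printf '%s\n' "$name"
    fi
  done
}
```

A possible use is find /path/to/files | non_utf8_names. Names that contain newlines are not handled by this sketch.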

Converting textfiles

  • recode

To convert a text file that contains latin1 characters to utf-8, you can use the recode tool. It rewrites the file in place:

 recode latin1..utf-8 <filename>

Converting a file that is already in utf-8 is a problem: each byte of a multi-byte character is treated as a separate latin1 character, so all non-ASCII characters come out garbled.
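The double conversion problem can be seen with a single character. This sketch uses iconv instead of recode, only because iconv works as a filter on stdin; the function name latin1_to_utf8_hex is made up for the example. It prints the converted bytes as hex so the result does not depend on the terminal encoding.

```shell
#!/bin/sh
# Hypothetical helper: convert stdin from latin1 to utf-8 and print the
# resulting bytes as one hex string.
latin1_to_utf8_hex() {
  iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n'
}

# A latin1 "é" is the single byte 0xE9 (octal 351):
#   printf '\351' | latin1_to_utf8_hex        # prints c3a9 (correct utf-8)
# Feeding those utf-8 bytes (octal 303 251) through the conversion again:
#   printf '\303\251' | latin1_to_utf8_hex    # prints c383c2a9 (garbled)
```

The second result is what shows up as "Ã©" in a utf-8 terminal.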

I wrote the following script, which converts only files with an iso-8859* charset to utf-8. Each file is backed up before it is converted.

convert_files.sh:

 #!/bin/sh
 #
 # Convert latin1 files to utf-8 files
 #

 BAK_EXT=".convert_backup"

 # Read one filename per line from stdin
 while IFS= read -r FILE; do
   # Only touch text files that file(1) reports with an iso-8859-* charset
   if file -bi "$FILE" | grep "^text/" | grep -q "charset=iso-8859"; then
     # Never convert our own backup copies
     case "$FILE" in
       *"$BAK_EXT")
         echo "Skipping $FILE"
         continue
         ;;
     esac
     echo "Converting $FILE"
     # Keep a copy of the original next to the file
     cp -a "$FILE" "$FILE$BAK_EXT"
     recode latin1..utf-8 "$FILE"
   fi
 done

The script reads filenames from stdin. To convert all text files in your home directory, use

 find ~ -type f | convert_files.sh
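If a conversion turns out to be wrong, the backup copies can be moved back over the converted files. This is only a sketch under the assumption that the backups still carry the .convert_backup extension used by the script; the function name restore_backups is made up for this page.

```shell
#!/bin/sh
# Same extension as in convert_files.sh
BAK_EXT=".convert_backup"

# Hypothetical helper: move every backup below the given directory
# (default: current directory) back to its original name,
# overwriting the converted file.
restore_backups() {
  find "${1:-.}" -type f -name "*$BAK_EXT" | while IFS= read -r bak; do
    # Strip the backup extension to get the original filename
    orig="${bak%"$BAK_EXT"}"
    mv -f "$bak" "$orig"
  done
}
```

For example, restore_backups ~ would undo the conversion for the whole home directory. Once the converted files have been checked, the backups can simply be deleted instead.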

mschiff 13:25, 19 Apr 2005 (CEST)

linuxtips/latin1toutf8.txt · Last modified: 2012/01/14 05:03 by mschiff