Latin1 to Utf-8

Tools/Services which must be checked

Config changes

Samba (smb.conf): unix charset = ISO8859-1 → unix charset = UTF-8 (not tested yet)

Converting the filenames

First test what would be done. convmv only prints the planned renames and changes nothing unless the --notest option is given:

 convmv -r -f latin1 -t utf-8 /path/to/files

Converting textfiles

To convert a text file that contains latin1 characters to UTF-8, you can use the recode tool:

 recode latin1..utf-8 <filename>

This converts the file in place, overwriting the latin1 original with its UTF-8 version.

A problem arises if you convert a file that is already in UTF-8: recode treats each byte as a latin1 character and encodes it again, which garbles every non-ASCII character.
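As an illustration of that pitfall: a UTF-8 "ü" (bytes c3 bc) that is converted once more from latin1 comes out as "Ã¼" (bytes c3 83 c2 bc). The following is a minimal sketch of one way to guard against it. It assumes iconv is installed, which this page does not otherwise use, and the helper names is_valid_utf8 and latin1_to_utf8_once are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file already decodes as valid UTF-8.
# Pure ASCII files also count as valid UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Hypothetical helper: print the file as UTF-8 on stdout.
# Files that are already valid UTF-8 are passed through unchanged,
# everything else is assumed to be latin1 (ISO-8859-1).
latin1_to_utf8_once() {
    if is_valid_utf8 "$1"; then
        cat "$1"
    else
        iconv -f ISO-8859-1 -t UTF-8 "$1"
    fi
}
```

Unlike recode, this sketch writes to stdout and leaves the input file untouched, so running it a second time on its own output is harmless.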

I wrote the following script, which converts only files that are detected as having an iso-8859* charset to UTF-8. Each file is backed up before it is converted.

convert_files.sh:

 #!/bin/sh
 #
 # Convert latin1 files to utf-8 files.
 # Reads one filename per line from stdin.
 #
 BAK_EXT=".convert_backup"
 #
 while IFS= read -r FILE; do
   # Only touch text files that file(1) reports as iso-8859-*.
   if file -b -i "$FILE" | grep -q '^text/.*charset=iso-8859'; then
     # Never convert our own backup copies.
     case "$FILE" in
       *"$BAK_EXT")
         echo "Skipping $FILE"
         continue
         ;;
     esac
     echo "Converting $FILE"
     # Back up first; convert only if the backup succeeded.
     cp -a "$FILE" "$FILE$BAK_EXT" && recode latin1..utf-8 "$FILE"
   fi
 done

The script reads filenames from stdin, one per line. To convert all text files in your home directory, use

 find ~ -type f | convert_files.sh
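If a conversion turns out to be wrong, the backups can be copied back. The following is a minimal sketch, not part of the original script: the function name restore_backups is hypothetical, and it assumes the backups still carry the ".convert_backup" extension and that filenames contain no newlines.

```shell
#!/bin/sh
# Must match BAK_EXT in convert_files.sh.
BAK_EXT=".convert_backup"

# Hypothetical helper: move every "<name>.convert_backup" found below
# the given directory back over its converted "<name>".
restore_backups() {
    find "$1" -type f -name "*$BAK_EXT" | while IFS= read -r BAK; do
        # Strip the backup extension to get the original filename.
        ORIG="${BAK%"$BAK_EXT"}"
        echo "Restoring $ORIG"
        mv -f "$BAK" "$ORIG"
    done
}
```

For example, restore_backups ~ would undo the conversion for every file below the home directory that still has a backup.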

mschiff 13:25, 19 Apr 2005 (CEST)