Steve MICHELET

shared this problem
3 months ago

Employees Involved

SCM

Admin

UTF-8 problem with non-English articles

I have a problem with non-English articles.

In the generated articles I see text like this:

coût important (coût important)

entraîne (entraîne)

où (où)

Apparently this is a UTF-8 / ASCII encoding problem.

Can you resolve this? Thanks for your help :-)

Kind regards

Official Answer
Employee
SCM Posted 3 months ago

The immediate solution is to use a content filter.

Please use " Ã " as the filter so that SCM will remove any content containing that character.
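
For illustration only, here is a minimal Python sketch of what such a content filter does. It is not SCM's implementation: the function name `drop_paragraphs_containing` and the assumption that content is split into paragraphs on blank lines are both hypothetical.

```python
def drop_paragraphs_containing(text: str, marker: str = "Ã") -> str:
    """Return `text` without the paragraphs that contain `marker`.

    Hypothetical helper: paragraphs are assumed to be separated by blank lines.
    """
    paragraphs = text.split("\n\n")
    kept = [p for p in paragraphs if marker not in p]
    return "\n\n".join(kept)


sample = "Un coÃ»t important.\n\nUn texte propre."
print(drop_paragraphs_containing(sample))  # Un texte propre.
```

Note that a filter like this discards content rather than repairing it, and it would also drop correct text that legitimately contains "Ã", such as Portuguese words written in capitals.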

Comments (1)

Employee

The problem is most likely that the page being scraped has an incorrect character encoding.

SCM can read and write UTF-8 files fine.

I think the solution is to use a language filter that tries to detect the language of the content and throws away anything it can't figure out.
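
The symptom in the question can be reproduced in a few lines of Python, which may help confirm the diagnosis. The sketch below assumes the scraped bytes were UTF-8 but were decoded as Latin-1 (Windows-1252 gives the same result for these characters). It also shows a different technique from the language filter suggested above: trying to reverse the bad decoding instead of discarding the text. The function names `mojibake` and `fix_mojibake` are hypothetical, and this is not how SCM works internally.

```python
def mojibake(text: str) -> str:
    """Simulate the bug: UTF-8 bytes wrongly decoded as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def fix_mojibake(text: str) -> str:
    """Try to undo a UTF-8-read-as-Latin-1 mix-up.

    If the round trip fails, the text is assumed to be fine already
    and is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(mojibake("coût"))          # coÃ»t
print(fix_mojibake("coÃ»t"))     # coût
print(fix_mojibake("entraÃ®ne")) # entraîne
print(fix_mojibake("coût"))      # coût (already correct, left unchanged)
```

The repair only works when the wrong decoding really was Latin-1; for other mix-ups the text is returned as-is, so a filter would still be needed as a fallback.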
