Looking for a tool
Thread poster: Brandis (X)
Brandis (X)
Local time: 13:23
English to German
+ ...
Sep 23, 2004

Hi all! I am searching for a tool with which the complete (source) content of a website can be extracted; the format is of course .html. I have various websites here (automobiles, medical, etc.), and I thought a tool like this would be wonderful, especially for going about pre-planned TMs and developing the target content in the course of time. I shall appreciate all help.
Regards,
Brandis


 
Judy Rojas  Identity Verified
Chile
Local time: 07:23
Spanish to English
+ ...
Try WebReaper Sep 23, 2004

Hi:
Try WebReaper. You can download it at http://www.webreaper.net/download.html
Regards,
Ricardo


 
Brandis (X)
Local time: 13:23
English to German
+ ...
TOPIC STARTER
I know WebReaper Sep 23, 2004

Ricardo Martinez de la Torre wrote:

Hi:
Try WebReaper. You can download it at http://www.webreaper.net/download.html
Regards,
Ricardo
Hi! I know this tool already and am using others, but what I am searching for is a tool with a source-terminology extraction function that works across multiple webpages pertaining to one topic or product, with a view to building professional TMs. But thank you. A closer description is Trados TagEditor, where one can extract terminology from multiple bilingual files; I am in search of something similar, only as a separate tool.
brandis

[Edited at 2004-09-23 01:13]


 
Luciano Monteiro  Identity Verified
Brazil
Local time: 14:23
English to Portuguese
+ ...
Fusion Sep 23, 2004

Hello Brandis

You might like to try Fusion. It has a terminology feature that I think would suit your needs.

Best regards,

Luciano Monteiro


 
Marc P (X)  Identity Verified
Local time: 13:23
German to English
+ ...
Website retrieval and translation Sep 23, 2004

Here's one way of doing it:

First, retrieve the web site with wget. For example, if you want to retrieve the OmegaT web site at www.omegat.org/omegat/omegat.html, you enter:

wget http://www.omegat.org/omegat/omegat.html -r -p

on the command line. The -r option causes folders to be saved recursively (i.e. sub-folders will be saved); the -p option causes any files needed for complete display of the pages to be saved.

Then you create a new project in OmegaT and place all the files you have downloaded in the /source folder of that project exactly as you downloaded them, i.e. with the same folder structure. (You can of course create the empty project first, then on the command line, switch to the /source folder, and then download the web site into it directly.) When you have finished translating the html files in OmegaT, compiling the project in OmegaT will reproduce the structure with the translated files in the /target folder.
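For example, the download-directly-into-the-project variant might look like this on the command line (a rough sketch; the project folder name here is made up for illustration):

# after creating the empty project in OmegaT, change to its /source folder
cd ~/omegat-projects/website/source
# download the site into it: -r for recursion, -p for page requisites
wget http://www.omegat.org/omegat/omegat.html -r -p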

Get wget from:

http://wget.sunsite.dk/

and OmegaT (latest version 1.4.3 is just out, September 2004) from:

http://sourceforge.net/projects/omegat

wget and OmegaT both run on Linux and Windows.

Marc


 
Brandis (X)
Local time: 13:23
English to German
+ ...
TOPIC STARTER
Thank you Sep 23, 2004

Luciano Monteiro wrote:

Hello Brandis

You might like to try Fusion. It has a terminology feature that I think would suit your needs.

Best regards,

Luciano Monteiro
But Fusion doesn't cover the website localisation aspect directly; one would need further instrumentation to reproduce a target website mirroring the source. Additionally, Fusion limits term extraction to .doc files only. One could certainly convert .html files to .doc files and process them further, but the work involved is not feasible if one does it industrially. For large documents or multiple documents, Fusion in that sense is probably the best there is.
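(For what it's worth, the tag-stripping part of such a conversion can be roughed out on the command line; this is only a sketch that produces plain text, which a word processor would still have to save as .doc for Fusion:)

# crude batch conversion: strip the HTML tags from every page
for f in *.html; do
    sed 's/<[^>]*>/ /g' "$f" > "${f%.html}.txt"
done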
Rgds,
Brandis


 
Piotr Bienkowski  Identity Verified
Poland
Local time: 13:23
English to Polish
+ ...
Try SDLX Sep 23, 2004

But Fusion doesn't cover the website localisation aspect directly; one would need further instrumentation to reproduce a target website mirroring the source. Additionally, Fusion limits term extraction to .doc files only. One could certainly convert .html files to .doc files and process them further, but the work involved is not feasible if one does it industrially. For large documents or multiple documents, Fusion in that sense is probably the best there is.
Rgds,
Brandis


SDLX can handle web formats: HTML and HTML-like files (this week I was translating chunks of HTML files with incomplete HTML code, which its web formats filter accepted happily), and many other formats, including XML and SGML, as well as RC and some programming language files.

It will not download a website for you, but other than that it can handle translation of tagged files pretty well.

And until Sept. 30 it is available at half price.

For more information go to http://www.sdl.com/intltransday

HTH

Piotr


 
Brandis (X)
Local time: 13:23
English to German
+ ...
TOPIC STARTER
I have SDLX Sep 23, 2004

syntaxpb wrote:

But Fusion doesn't cover the website localisation aspect directly; one would need further instrumentation to reproduce a target website mirroring the source. Additionally, Fusion limits term extraction to .doc files only. One could certainly convert .html files to .doc files and process them further, but the work involved is not feasible if one does it industrially. For large documents or multiple documents, Fusion in that sense is probably the best there is.
Rgds,
Brandis


SDLX can handle web formats: HTML and HTML-like files (this week I was translating chunks of HTML files with incomplete HTML code, which its web formats filter accepted happily), and many other formats, including XML and SGML, as well as RC and some programming language files.

It will not download a website for you, but other than that it can handle translation of tagged files pretty well.

And until Sept. 30 it is available at half price.

For more information go to http://www.sdl.com/intltransday

HTH

Piotr
I was probably not clear in my posting. I was in fact looking for a free/shareware tool solely for the purpose of extracting single-word web content. If you know of any, I shall be thankful for all help. Regards, brandis


 
Piotr Bienkowski  Identity Verified
Poland
Local time: 13:23
English to Polish
+ ...
Terminology lists? Sep 24, 2004

Brandis wrote:

Piotr
I was probably not clear in my posting. I was in fact looking for a free/shareware tool solely for the purpose of extracting single-word web content. If you know of any, I shall be thankful for all help. Regards, brandis

Do you mean websites that contain terminology lists from different areas? If so, I don't think there is a universal tool for this specific task, because these lists can be in different formats, e.g. an HTML table, separate paragraphs, or lists (ordered and unordered).
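For instance, each format would need its own ad-hoc extraction, something like the following rough one-liners (glossary.html is a made-up file name, GNU grep is assumed, and nested tags would break them):

# pull the contents of table cells, then of list items, dropping the tags
grep -o '<td>[^<]*</td>' glossary.html | sed 's/<[^>]*>//g'
grep -o '<li>[^<]*</li>' glossary.html | sed 's/<[^>]*>//g'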

Piotr


 
Brandis (X)
Local time: 13:23
English to German
+ ...
TOPIC STARTER
I do not mean that Sep 24, 2004

syntaxpb wrote:

Brandis wrote:

Piotr
I was probably not clear in my posting. I was in fact looking for a free/shareware tool solely for the purpose of extracting single-word web content. If you know of any, I shall be thankful for all help. Regards, brandis


Do you mean websites that contain terminology lists from different areas? If so, I don't think there is a universal tool for this specific task, because these lists can be in different formats, e.g. an HTML table, separate paragraphs, or lists (ordered and unordered).

Piotr

Hi! Again a small correction. This could be any website. Take metal-working websites, for example: there may be anywhere from 100 to a few thousand of them, and all use some standard terminology in their product presentations or descriptions on the web. If one could extract that type of content so as to build a monolingual glossary initially, then switch to the target-language websites and compare, one would have a field-specific glossary, I guess. It is that kind of a tool I am looking for. So far, Fusion (which doesn't process .html files) offers a wonderful term extraction facility based on the files fed to it, whereas other tools actually require you to do the translation in order to generate a TM. My search is hence two-fold: term extraction (monolingual) using a functionality like the one in Fusion, but extracting from websites. In my case, the outsourcer either indicates the website or sends me the website for local processing, and I start with Trados, as I cannot process these sites directly in Fusion despite its term extraction ability. Sometimes my outsourcer gives me a TM covering 5-10% of the file and fights over the price. Another point is that most web content is a global publication (see KudoZ; mostly you see web references), so the idea, I guess, is obvious now.
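(A minimal sketch of the kind of monolingual extraction I mean, assuming the pages have already been downloaded, e.g. with wget as suggested above; the tag stripping is crude and the cut-off of 100 candidates is arbitrary:)

# strip tags, split the text into lowercase words, count and rank them
cat *.html | sed 's/<[^>]*>/ /g' | tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn | head -n 100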
Regards,
Brandis


 

