The differences between the following specialized terms (Q.3)

These are the differences between the following specialized terms: machine translation, machine-aided translation, multilingual content management and translation technology.

  • Machine translation: Sometimes referred to by the abbreviation MT, it is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its basic level, MT performs simple substitution of words in one natural language for words in another. Using corpus techniques, more complex translations may be attempted, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies.
  • Computer-assisted translation, computer-aided translation or CAT: A form of translation wherein a human translator translates texts using computer software designed to support and facilitate the translation process. Computer-assisted translation is sometimes called machine-assisted, or machine-aided, translation.
  • Multilingual content management: The management of content in more than one language. The content is information, mostly in the form of more or less structured text documents, but potentially also including audio clips, video clips and images.
  • Translation: The interpretation of the meaning of a text, and the subsequent production of an equivalent text, also called a translation, that communicates the same message in another language. The text to be translated is called the source text, and the language it is to be translated into is called the target language; the final product is sometimes called the “target text.”
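
The "simple substitution of words" mentioned in the machine translation definition can be sketched in a few lines of Python. This is only an illustration: the tiny English-to-Spanish lexicon and the function name are assumptions made up for this example, and real MT systems do far more than this.

```python
# Hypothetical toy lexicon (English -> Spanish), invented for this example.
LEXICON = {"the": "el", "cat": "gato", "eats": "come", "fish": "pescado"}

def translate_word_for_word(sentence, lexicon):
    """Replace each word by its lexicon entry; unknown words are kept unchanged."""
    words = sentence.lower().split()
    return " ".join(lexicon.get(word, word) for word in words)

print(translate_word_for_word("The cat eats fish", LEXICON))
```

Because each word is replaced in isolation, word order, agreement and idioms are ignored, which is why corpus techniques are needed for more complex translations.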


  • Wikipedia, 19 May 2008. Retrieved: 19 May 2008, 18:04

  • Wikipedia, 15 May 2008. Retrieved: 19 May 2008, 18:05

  • Wikipedia, 7 May 2008. Retrieved: 19 May 2008, 18:06

  • Wikipedia. Retrieved: 19 May 2008, 18:08


MT systems: example of translation (Q.3)

I have used Translendium to translate the following Spanish text into different languages:

  • SPANISH: Internet es un método de interconexión descentralizada de redes de computadoras implementado en un conjunto de protocolos denominado TCP/IP y garantiza que redes físicas hetereogéneas funcionen como una red lógica única, de alcance mundial. Sus orígenes se remontan a 1969, cuando se estableció la primera conexión de computadoras, conocida como ARPANET, entre tres universidades en California y una en Utah, EE.UU.


  • GERMAN: Internet ist eine dezentralisiert-Verbindungs-Methode von Computernetzen, in einer TCP/IP genannten Gesamtheit von Protokollen implementiert, und er garantiert, dass heterogene|ungleichartige körperliche|physikalische Netze als ein einzigartiges logisches Netz, von Weltreichweite funktionieren. Seine|Ihre Herkunft geht auf 1969 zurück, als man die erste Computerverbindung, als ARPANET gekannt, zwischen drei Universitäten in Kalifornien und einer in Utah, EE festlegte. UU.


  • ENGLISH: Internet is a method of decentralized interconnection of nets|networks of computers implemented in an ensemble|set of protocols named TCP/IP and it|he guarantees that heterogeneous physical nets|networks work as a unique logical net|network, of world scope. Its|His|Her|Their origins go back to 1969, when the first connection of computers, known as ARPANET, between three universities in California and one in Utah, EE, was established. UU.


  • FRENCH: Internet est une méthode d’interconnexion décentralisée de filets|réseaux d’ordinateurs exécutée dans un ensemble de protocoles dénommé TCP/IP et il garantit que des filets|réseaux physiques hétérogènes fonctionnent comme un filet|réseau logique unique, de portée mondiale. Ses|Leurs origines remontent à 1969, quand on a établi la première connexion d’ordinateurs, connue comme ARPANET, entre trois universités en Californie et une en Utah, EE. UU.

To sum up, we can see that Translendium has had some problems with word meanings, because for some words it gives more than one alternative, separated by “|”. For example, “físicas hetereogéneas” has been translated into German as “heterogene|ungleichartige körperliche|physikalische”. It has also split the abbreviation “EE.UU.” into “EE” and “UU.” in the German and English outputs.
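
The ambiguous words in such output can be listed automatically. The following is a minimal sketch, assuming that alternatives are always separated by “|” inside a single whitespace-delimited token, as in the outputs above; the function name is made up for this example.

```python
def ambiguous_tokens(text):
    """Return one list of alternatives for every token that contains '|'."""
    result = []
    for token in text.split():
        core = token.strip(".,;:")  # drop punctuation attached to the token
        if "|" in core:
            result.append(core.split("|"))
    return result

sample = "heterogene|ungleichartige körperliche|physikalische Netze"
for alternatives in ambiguous_tokens(sample):
    print(alternatives)
```

Each printed list holds the candidate translations the system could not choose between, so a human reviser knows where a decision is still needed.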


  • Wikipedia, 18 May 2008. Retrieved: 19 May 2008, 18:50.

  • Traductor automático. Retrieved: 19 May 2008, 18:51


Characteristics of a translation task according to the FEMTI report.

FEMTI, the Framework for Machine Translation Evaluation in ISLE, is a resource that helps MT evaluators define contextual evaluation plans. FEMTI consists of two interrelated classifications or taxonomies: the first one lists possible characteristics of the contexts of use that are applicable to MT systems. The second one lists the possible characteristics of an MT system, along with the metrics that were proposed to measure them.

According to the FEMTI report, the characteristics of the translation task refer to the information flow intended for the output, from the point of view of the agent (human or otherwise) who receives the translation. The main characteristics are the following:

  • Assimilation: The ultimate purpose of the assimilation task (of which translation forms a part) is to monitor a (relatively) large volume of texts produced by people outside the organization, in (usually) several languages.
  • Document routing or sorting: The purpose of document routing / sorting is to scan incoming translated documents quickly in order to send them to the appropriate points for further processing or storage.
  • Information extraction or summarization: The purpose of information extraction or summarization is to extract some portion(s) of the translated text, either manually or automatically, for subsequent processing or storage. Information extraction is typically concerned with filling templates by identifying atomic elements of events. In contrast, summarization aims to provide a self-contained and internally cohesive text which serves as a selective account of the original.
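
The idea of "filling templates by identifying atomic elements of events" can be illustrated with a small sketch. The template fields, the regular expressions and the function name below are assumptions chosen for this example only; they are not taken from the FEMTI report.

```python
import re

def fill_template(text):
    """Fill a tiny event template with a four-digit year and a name after 'known as'."""
    year = re.search(r"\b(1[0-9]{3}|20[0-9]{2})\b", text)
    name = re.search(r"known as ([A-Z][A-Za-z0-9]*)", text)
    return {
        "year": year.group(1) if year else None,
        "name": name.group(1) if name else None,
    }

text = ("Its origins go back to 1969, when the first connection of computers, "
        "known as ARPANET, was established.")
print(fill_template(text))
```

A summarizer, in contrast, would have to return a short, readable text rather than a set of isolated fields.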



  • FEMTI: a Framework for the Evaluation of Machine Translation in ISLE. (2002). Retrieved: 19 May 2008, 17:13


Explanation of the topics (Q.2)

The first topic that I am going to explain is “Corpus-based language modeling” which belongs to The Association for Computational Linguistics and Natural Processing Language (Columbus, Ohio).

Corpus linguistics is the study of language as expressed in samples (corpora) of “real world” text. This method is a data-driven approach to deriving a set of abstract rules by which a natural language is governed or else relates to another language. Originally built by hand, corpora are now largely derived by an automated process, whose output is then corrected.

Computational methods had once been viewed as a holy grail of linguistic research, which would ultimately manifest a ruleset for natural language processing and machine translation at a high level. Such has not been the case, and since the cognitive revolution, cognitive linguistics has been largely critical of many claimed practical uses for corpora. However, as computation capacity and speed have increased, the use of corpora to study language and term relationships en masse has gained some respectability.

The corpus approach runs counter to Noam Chomsky’s view that real language is riddled with performance-related errors, thus requiring careful analysis of small speech samples obtained in a highly controlled laboratory setting. Corpus linguistics does away with Chomsky’s competence/performance split; adherents believe that reliable language analysis best occurs on field-collected samples, in natural contexts and with minimal experimental interference.
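
As a concrete illustration of corpus-based language modeling, the sketch below estimates bigram probabilities from a tiny corpus by counting. The three-sentence corpus and the function names are invented for this example; a real model would be trained on a large corpus and would need smoothing for unseen word pairs.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count how often each word follows another, with sentence boundary markers."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts

def bigram_probability(counts, prev, cur):
    """Maximum-likelihood estimate of P(cur | prev); 0.0 if prev was never seen."""
    total = sum(counts[prev].values())
    return counts[prev][cur] / total if total else 0.0

corpus = ["the cat sleeps", "the dog sleeps", "the cat eats"]
model = train_bigram_model(corpus)
print(bigram_probability(model, "the", "cat"))
```

In this toy corpus “the” is followed by “cat” in two of its three occurrences, so the estimate is two thirds; nothing is assumed about the language beyond what the samples show.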

The second topic is “Pragmatics”, which belongs to The Association for Computational Linguistics and Natural Processing Language (Columbus, Ohio).

Pragmatics is the study of the ability of natural language speakers to communicate more than that which is explicitly stated. The ability to understand another speaker’s intended meaning is called pragmatic competence. An utterance describing pragmatic function is described as metapragmatic. Another perspective is that pragmatics deals with the ways we reach our goal in communication. Suppose a person wanted to ask someone else to stop smoking. This can be achieved by using several utterances. The person could simply say, ‘Stop smoking, please!’, which is direct and has a clear semantic meaning; alternatively, the person could say, ‘Whew, this room could use an air purifier’, which implies a similar meaning but is indirect and therefore requires pragmatic inference to derive the intended meaning.

Pragmatics is regarded as one of the most challenging aspects for language learners to grasp, and can only truly be learned with experience.

The last topic is “Speech Recognition”, which belongs to The Association for Computational Linguistics and Natural Processing Language (Columbus, Ohio).

Speech recognition (also known as automatic speech recognition or computer speech recognition) converts spoken words to machine-readable input (for example, to the binary code for a string of character codes). The term voice recognition may also be used to refer to speech recognition, but more precisely refers to speaker recognition, which attempts to identify the person speaking, as opposed to what is being said.

Speech recognition applications include voice dialing (e.g., “Call home”), call routing (e.g., “I would like to make a collect call”), domotic appliance control, content-based spoken audio search (e.g., find a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., word processors or emails), and voice control in aircraft cockpits (usually termed Direct Voice Input).


  • Wikipedia, 14 May 2008. Retrieved: 19 May 2008, 17:52

  • Wikipedia, 25 April 2008. Retrieved: 19 May 2008, 17:53

  • Wikipedia, 15 May 2008. Retrieved: 19 May 2008, 17:54

Research topics on HLT (Q2)

In this article we are going to look at the most recent research topics mentioned on several Human Language Technologies sites.

At the German Research Center for Artificial Intelligence, the following themes are elaborated in research, development and commercial projects:

  • exploiting – and automatically extending – ontologies for content processing
  • tighter integration of shallow and deep techniques in processing
  • enriching deep processing with statistical methods
  • combining language checking with structuring tools in document authoring
  • document indexing for German and English
  • automatically associating recognized information with related information and thus building up collective knowledge
  • automatically structuring and visualizing extracted information
  • processing information encoded in multiple languages, among them Chinese and Japanese.

The Edinburgh Language Technology Group works in the following areas:

  • Combining Shallow Semantics and Domain Knowledge (EASIE).
  • Text Mining for Biomedical Content Curation (TXM).
  • Cross-lingual Multi-agent Retail Comparison (CROSSMARC).
  • Smart Qualitative Data: Methods and Community tools for Data Mark-up (SQUAD).
  • Machine Learning for Named Entity Recognition (SEER).
  • Integrated Models and Tools for Fine-Grained Prosody in Discourse (Synthesis).
  • Joint Action Science and Technology (JAST).
  • AMI Consortium projects that are developing technologies for meeting browsing and for assisting people who participate in meetings from a remote location.
  • Study of how pairs collaborate when planning a route on a map (Collaborating using diagrams).

The Common Language Resources and Technology Infrastructure (CLARIN) describes its plans as follows: on March 17, 18 and 19 the CLARIN Kick-Off meeting will take place to start our pan-European research infrastructure work. We want to achieve a number of goals during these three days:

  • We need a broad and deep understanding of the goals of CLARIN by everyone involved. Yet we cannot assume that the knowledge is already sufficiently spread.
  • We need to start the interaction with everyone involved and interested and to take up the comments and ideas from all the experts.
  • We need to spread the relevant messages about the different layers of the work that is involved when setting up a research infrastructure, in particular since it involves aspects that have not yet been a topic of the general discussions in our field.
  • We need to create a positive atmosphere and the enthusiasm that will be important to meet our challenging goals.
  • We need to start the actual work in the working groups and invite all experts to participate.
  • Of course those who are partners in the EC funded project need to understand the rules of the game. In particular the double funding scheme – national and EC funding – needs careful attention from all of us. Other members need to be informed about the national groups.

The Association for Computational Linguistics and Natural Processing Language (Columbus, Ohio) invites student researchers to submit their work to the workshop on the following topics:

  • Pragmatics, discourse, semantics, syntax and the lexicon.
  • Phonetics, phonology and morphology.
  • Linguistic, mathematical and psychological models of language.
  • Information retrieval, information extraction, question answering.
  • Summarization and paraphrasing.
  • Speech recognition, speech synthesis.
  • Corpus-based language modeling.
  • Multi-lingual processing, machine translation, translation aids.
  • Spoken and written natural language interfaces, dialogue systems.
  • Multi-modal language processing, multimedia systems.
  • Message and narrative understanding systems.


Hans Uszkoreit and European centres for Human Language Technologies (Q1)

Hans Uszkoreit is Professor of Computational Linguistics at Saarland University. At the same time he serves as Scientific Director at the German Research Center for Artificial Intelligence (DFKI), where he heads the DFKI Language Technology Lab. By cooptation he is also Professor in the Computer Science Department.

Hans Uszkoreit studied Linguistics and Computer Science at the Technical University of Berlin and the University of Texas at Austin. During his time in Austin he also worked as a research associate in a large machine translation project at the Linguistics Research Center. In 1984 Uszkoreit received his Ph.D. in linguistics from the University of Texas. From 1982 until 1986, he worked as a computer scientist at the Artificial Intelligence Center of SRI International in Menlo Park, Ca. During this time he was also affiliated with the Center for the Study of Language and Information at Stanford University as a senior researcher and later as a project leader. In 1986 he spent six months in Stuttgart on an IBM Research Fellowship at the Science Division of IBM Germany. In December 1986 he returned to Stuttgart to work for IBM Germany as a project leader in the project LILOG (Linguistic and Logical Methods for the Understanding of German Texts). During this time, he also taught at the University of Stuttgart. In 1988 Uszkoreit was appointed to a newly created chair of Computational Linguistics at Saarland University and started the Department of Computational Linguistics and Phonetics. In 1989 he became the head of the newly founded Language Technology Lab at DFKI. He has been a co-founder and principal investigator of the Special Collaborative Research Division (SFB 378) “Resource-Adaptive Cognitive Processes” of the DFG (German Science Foundation). He is also co-founder and professor of the “European Postgraduate Program Language Technology and Cognitive Systems”, a joint Ph.D. program with the University of Edinburgh.

Uszkoreit is Permanent Member of the International Committee of Computational Linguistics (ICCL), Member of the European Academy of Sciences, Past President of the European Association for Logic, Language and Information, Member of the Executive Board of the European Network of Language and Speech, Member of the Board of the European Language Resources Association (ELRA), and serves on several international editorial and advisory boards. He is co-founder and Board Member of XtraMind Technologies GmbH, Saarbruecken, acrolinx gmbh, Berlin and Yocoy Technologies GmbH, Berlin. Since 2006, he has served as Chairman of the Board of Directors of the international initiative dropping knowledge.

His current research interests are computer models of natural language understanding and production, advanced applications of language and knowledge technologies such as semantic information systems, cognitive foundations of language and knowledge, grammar formalisms and their implementation, syntax and semantics of natural language and the grammar of German.

Among his recent publications we can find:

  • Uszkoreit, H., F. Xu, Weiquan Liu, J. Steffen, I. Aslan, J. Liu, C. Müller, B. Holtkamp, M. Wojciechowski (2007)
    A Successful Field Test of a Mobile and Multilingual Information Service System COMPASS2008. In Proceedings of HCI International 2007, 12th International Conference on Human-Computer Interaction, Beijing, 2007.
  • Uszkoreit, H., F. Xu, J. Steffen and I. Aslan (2006) The pragmatic combination of different cross-lingual resources for multilingual information services. In Proceedings of LREC 2006, Genova, Italy, May, 2006.
  • Uszkoreit, H., U. Callmeier, A. Eisele, U. Schäfer, M. Siegel, J. Uszkoreit (2004): Hybrid Robust Deep and Shallow Semantic Processing for Creativity Support in Document Production. In Proceedings of KONVENS 2004, Vienna, Austria.

Among Hans Uszkoreit’s earlier research positions we can find:

  • Research Assistant at the Center for Advanced Study in the Social and the Behavioral Sciences at Stanford. (1981-82).
  • Research Associate in the project METAL at Linguistics Research Center in Austin, Texas. (1977-80).
  • Research Assistant at the Cognitive Science Program at Stanford University.

These are some of the European Research Centres of Human Language Technologies:

  • National Centre for Language Technology (NCLT): Dublin, Ireland. It “carries out basic research and develops applications”.
  • OFAI Language Technology Group: An Austrian centre that conducts “research in modelling and processing human languages, especially for German”.
  • Edinburgh Language Technology Group (LTG): The LTG has been working since the 1990s on “building practical solutions to real problems in text processing”.
  • Language Technology Documentation Centre in Finland: Developed “in order to make speech-to-speech translation real”.


  • Hans Uszkoreit, February 2007. Retrieved: 21:05, 1 April 2008.

  • European research centres for Human Language Technologies. Retrieved: 22:41, 1 April 2008.

Human Language Technology (Q1)

There are many definitions of Human Language Technologies to be found on the Net. The first definition is on Wikipedia, where it appears under the name Natural Language Processing (NLP):

“Natural language processing (NLP) is a subfield of artificial intelligence and computational linguistics. It studies the problems of automated generation and understanding of natural human languages.

Natural-language-generation systems convert information from computer databases into normal-sounding human language. Natural-language-understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.”

Another definition of Human Language Technology is the one given by Hans Uszkoreit:

“Language technology, sometimes also referred to as human language technology, comprises computational methods, computer programs and electronic devices that are specialized for analyzing, producing or modifying texts and speech. These systems must be based on some knowledge of human language. Therefore language technology defines the engineering branch of computational linguistics”.

In his book “Language Technology A First Overview”, Hans Uszkoreit says:

“Language Technologies are information technologies that are specialized for dealing with the most complex information medium in our world: human language. Therefore these technologies are also subsumed under the term Human Language Technology. Human language occurs in spoken and written form. Whereas speech is the oldest and most natural mode of language communication, complex information and most of human knowledge is maintained and transmitted in written texts. Speech and text technologies process or produce language in these two modes of realization. But language also has aspects that are shared between speech and text such as dictionaries, most of grammar and the meaning of sentences. Thus large parts of language technology cannot be subsumed under speech and text technologies. Among those are technologies that link language to knowledge. We do not know how language, knowledge and thought are represented in the human brain. Nevertheless, language technology had to create formal representation systems that link language to concepts and tasks in the real world. This provides the interface to the fast growing area of knowledge technologies.

In our communication we mix language with other modes of communication and other information media. We combine speech with gesture and facial expressions. Digital texts are combined with pictures and sounds. Movies may contain language in spoken and written form. Thus speech and text technologies overlap and interact with many other technologies that facilitate processing of multimodal communication and multimedia documents.”


  • Hans Uszkoreit, book “Language Technology A First Overview”. Retrieved: 16:15, 1 April 2008.

  • German Research Center for Artificial Intelligence “Language Technology Lab”. Retrieved: 17:25, 1 April 2008.

  • Wikipedia, 18 March 2008. Retrieved: 18:23, 1 April 2008.