This WS anonymizes an input text by replacing proper nouns (names of people, places, corporations, etc.) with tags. The service automatically calls the FreeLing WS and uses its Named Entity Recognition tool to detect proper nouns. The supported languages are English, Catalan, Spanish, Asturian, Welsh, Galician, Italian, Russian and Portuguese.
Details:
ds_lsr_analysis :
analysis :
analysis_extension :
input :
...
Provider:
ws-iulaterm-upf-edu
This WS analyzes an already indexed corpus (see the CQP indexer WS for indexing details). It returns an Excel file with statistical metrics such as the number of nouns, verbs, n-grams, etc. The supported languages are Spanish and English.
ds_lsr_analysis :
analysis :
analysis_extension :
input :
description : Corpus analysis based on the CWB corpus workbench
installation : Clam Web Services default installation
...
Provider:
ws-iulaterm-upf-edu
This WS queries an already indexed corpus (see the CQP indexer WS for indexing details). It is based on the IMS Open Corpus Workbench (CWB). Language independent WS.
Input:
CorpusId: {id of the corpus you created with the CQP index web service}
Query: A CQP query.
Output:
The CQP output as it would appear on the command line.
Details:
ds_lsr_analysis :
analysis :
analysis_extension :
input :
description : CQP Query Web ...
Provider:
ws-iulaterm-upf-edu
CQP indexer WS based on the IMS Open Corpus Workbench (CWB). The input is an annotated corpus in tabular format. The output is the Corpus ID to be used by the CQPquery Web Service. Language independent WS.
Input:
corpus: Annotated corpus in tabular format.
structure: Structure of the corpus.
More info: http://cwb.sourceforge.net/documentation.php
Output: The corpus ID to be used by the CQP Query web service.
Input example:
corpus: http://ws02.iula.upf.edu/panacea/examples/ws/cqp/cqp...
Provider:
ws-iulaterm-upf-edu
This WS is based on Ted Pedersen's Text Similarity module. It measures the similarity of two documents based on the number of shared words scaled by the lengths of the files. Text Similarity WS computes the F-Measure, the Dice Coefficient, the Cosine, and the Lesk measure. Language independent WS.
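To make the idea of "shared words scaled by the lengths of the files" concrete, here is a minimal Python sketch of two of the measures. It assumes whitespace tokenization and counts shared words as a multiset intersection; the function names are hypothetical, and the Text Similarity module itself may tokenize and count overlaps differently.

```python
import math
from collections import Counter


def shared_word_count(tokens_a, tokens_b):
    """Count words shared by both token lists (multiset intersection)."""
    return sum((Counter(tokens_a) & Counter(tokens_b)).values())


def dice(tokens_a, tokens_b):
    """Dice coefficient: twice the shared count over the summed lengths."""
    total = len(tokens_a) + len(tokens_b)
    return 2 * shared_word_count(tokens_a, tokens_b) / total if total else 0.0


def cosine(tokens_a, tokens_b):
    """Cosine-style score: shared count over the geometric mean of the lengths."""
    if not tokens_a or not tokens_b:
        return 0.0
    shared = shared_word_count(tokens_a, tokens_b)
    return shared / math.sqrt(len(tokens_a) * len(tokens_b))


# Made-up sample documents, for illustration only.
doc_a = "the cat sat on the mat".split()
doc_b = "the cat lay on the rug".split()
print(dice(doc_a, doc_b), cosine(doc_a, doc_b))
```

Both scores range from 0 (no shared words) to 1 (identical word multisets), so they can be compared across document pairs of different lengths.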
Provider:
ws-iulaterm-upf-edu
This WS runs the Count function of Ted Pedersen's Ngram Statistics Package (a package used to identify word n-grams that appear in large corpora using standard tests of association such as Fisher's exact test, the log-likelihood ratio, Pearson's chi-squared test, the Dice coefficient, etc.). Language independent WS.
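As an illustration of what counting n-grams involves, the following Python sketch tallies the frequency of each word n-gram in a token list. It assumes whitespace tokenization; `count_ngrams` is a hypothetical helper and not part of the Ngram Statistics Package, whose output format and options are not reproduced here.

```python
from collections import Counter


def count_ngrams(tokens, n=2):
    """Return a Counter mapping each n-gram (as a tuple) to its frequency."""
    # zip over n shifted views of the token list yields consecutive n-grams.
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Made-up sample text, for illustration only.
tokens = "to be or not to be".split()
bigrams = count_ngrams(tokens, 2)
for ngram, freq in bigrams.most_common():
    print(" ".join(ngram), freq)
```

Frequency counts of this kind are the raw input that association tests such as the log-likelihood ratio are computed from.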
Details:
ds_lsr_analysis :
analysis :
analysis_extension :
input :
description :
cat: 'Funció Count del Ngram Statistics Pack...
Provider:
ws-iulaterm-upf-edu
This WS is based on Ted Pedersen's Ngram Statistics Package, which is used to identify word n-grams that appear in large corpora using standard tests of association such as Fisher's exact test, the log-likelihood ratio, Pearson's chi-squared test, the Dice coefficient, etc.
Details:
ds_lsr_analysis :
analysis :
input :
analysis_extension :
description :
cat: 'Ngram Statistics Package' de Ted Pedersen (s'utilitza per calcular la coocurrència entre paraules).
...
Provider:
ws-iulaterm-upf-edu
This web service calculates several lexicometric measures (tokens, types, hapaxes and type/token ratio) and displays them graphically.
Input: plain text corpus with one token per line
Input example:
This
is
an
example
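The four measures named above can be sketched in Python as follows. This is a minimal illustration that assumes case-sensitive counting with no normalization, and it does not reproduce the graphical display; the service may define the measures differently. The function name `lexicometrics` is hypothetical.

```python
from collections import Counter


def lexicometrics(lines):
    """Compute tokens, types, hapaxes and type/token ratio
    from one-token-per-line input."""
    tokens = [line.strip() for line in lines if line.strip()]
    freqs = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(freqs)  # distinct word forms
    n_hapaxes = sum(1 for count in freqs.values() if count == 1)  # forms seen once
    ttr = n_types / n_tokens if n_tokens else 0.0
    return {"tokens": n_tokens, "types": n_types, "hapaxes": n_hapaxes, "ttr": ttr}


# Made-up input in the one-token-per-line format.
sample = "This\nis\nan\nexample\nis\n".splitlines()
stats = lexicometrics(sample)
print(stats)
```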
Provider:
ws-iulaterm-upf-edu
Details:
ds_lsr_analysis :
analysis :
analysis_extension :
input :
description :
cat: Analitzador de dependències utilitzant Malt Parser
es: Analizador de dependencias utilizando Malt Parser
en: Dependency parsing using Malt Parser.
output :
installation : Clam Web Service default installation
name : malt_parser
type : Syntactic_Tagging
Provider:
ws-iulaterm-upf-edu
Details:
ds_lsr_analysis :
analysis :
input :
analysis_extension :
description :
cat: Analitzador de dependències utilitzant Bohnet Parser
es: Analizador de dependencias utilizando Bohnet Parser
en: Dependency parsing using Bohnet's graph-based Parser.
installation : Clam Web Service default installation
output :
name : bohnet_parser
type : Syntactic_Tagging
Provider:
ws-iulaterm-upf-edu
Details:
ds_lsr_analysis :
analysis :
analysis_extension :
input :
description :
cat: .
es: .
en: TMX translation units scrambler
output :
installation : Clam Web Service default installation
name : tmx_shuffling
type : Others
Provider:
ws-iulaterm-upf-edu
This WS converts PDF documents to plain text format. Language independent WS.
Details:
ds_lsr_analysis :
analysis :
input :
analysis_extension :
description :
cat: conversor de pdf a txt.
es: conversor de pdf a txt.
en: pdf to txt converter.
installation : Clam Web Service default installation
output :
name : pdftotext
type : Format_Conversion
Provider:
ws-iulaterm-upf-edu
A WS to convert HTML documents to plain text format. Language independent WS.
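For readers unfamiliar with this kind of conversion, here is a rough Python sketch that strips markup and keeps the text nodes. It uses only the standard library's `html.parser` and is not the tool behind the service; the class and function names are hypothetical, and real converters handle layout, links and entities far more carefully.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Made-up page, for illustration only.
page = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Title</h1><p>Hello world.</p></body></html>"
)
print(html_to_text(page))
```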
Details:
ds_lsr_analysis :
analysis :
input :
analysis_extension :
description :
cat: conversor d'html a txt.
es: conversor de html a txt.
en: Html to txt converter.
installation : Clam Web Service default installation
output :
name : html2text
type : Format_Conversion
Provider:
ws-iulaterm-upf-edu
This WS filters text with the sed command. Sed is typically used to extract part of a file using pattern matching or to substitute multiple occurrences of a string within a file.
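The two typical uses mentioned above can be mimicked in Python as a rough analogy. The sketch below uses Python's `re` module, whose regular expression syntax differs from sed's; the function names are hypothetical and the service's own parameters are not shown here.

```python
import re


def substitute(text, pattern, replacement):
    """Replace every match of pattern, similar to sed 's/pattern/replacement/g'."""
    return re.sub(pattern, replacement, text)


def extract_matching_lines(text, pattern):
    """Keep only lines containing a match, similar to sed -n '/pattern/p'."""
    return [line for line in text.splitlines() if re.search(pattern, line)]


# Made-up sample text, for illustration only.
sample = "colour one\nno match here\ncolour two colour"
print(substitute(sample, r"colour", "color"))
print(extract_matching_lines(sample, r"^colour"))
```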
Details:
ds_lsr_analysis :
analysis :
analysis_extension :
input :
description :
cat: .
es: .
...
Provider:
ws-iulaterm-upf-edu
Details:
ds_lsr_analysis :
analysis :
input :
analysis_extension :
description :
cat: conversor de Word doc a txt.
es: conversor de Word doc a txt.
en: Word doc to txt converter.
installation : Clam Web Service default installation
output :
name : catdoc
type : Format_Conversion
Provider:
ws-iulaterm-upf-edu
This WS converts the character encoding of the given files from one encoding to another. It is based on the Linux iconv command.
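Conceptually, the conversion decodes the input bytes from the source encoding and re-encodes them in the target encoding. The Python sketch below illustrates this; `convert_encoding` is a hypothetical helper, roughly comparable to running `iconv -f LATIN1 -t UTF-8` on a file, and it does not show the service's own parameters.

```python
def convert_encoding(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target."""
    return data.decode(source).encode(target)


# Made-up sample text containing non-ASCII characters.
latin1_bytes = "Català, español".encode("latin-1")
utf8_bytes = convert_encoding(latin1_bytes, "latin-1", "utf-8")
print(utf8_bytes.decode("utf-8"))
```

Characters that do not exist in the target encoding raise a `UnicodeEncodeError` in this sketch, so a real converter needs a policy for them (fail, skip, or transliterate).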
Details:
ds_lsr_analysis :
analysis :
analysis_extension :
output :
name : iconv
Provider:
ws-iulaterm-upf-edu
This processor extracts the desired columns from the input data. It is based on Linux awk.
Columns: the numbers of the desired columns, separated by commas.
Input: Raw data. The default column separator is a blank space or a tab.
You can optionally specify the input and output separators.
Example:
Columns: 4,2
Input: http://ws02.iula.upf.edu/panacea/examples/ws/columns_selector/input_example_1.txt or http://ws02.iula.upf.edu/panacea/examples/ws/columns_selector/input_example_2.txt
Output example: http://ws02.iula.upf....
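The column selection described in this entry can be approximated in Python as follows. This is a sketch under assumptions: columns are 1-based and printed in the requested order, as with awk's `{print $4, $2}`; the sample rows are made up and are not the contents of the linked example files; `select_columns` and its parameters are hypothetical names.

```python
def select_columns(lines, columns, in_sep=None, out_sep=" "):
    """Pick 1-based columns, in the requested order, from each line.

    in_sep=None splits on runs of blank spaces or tabs.
    """
    selected = []
    for line in lines:
        fields = line.split(in_sep)
        # Columns missing from a line become empty strings.
        picked = [fields[c - 1] if 0 < c <= len(fields) else "" for c in columns]
        selected.append(out_sep.join(picked))
    return selected


rows = ["a b c d", "1\t2\t3\t4"]
print(select_columns(rows, [4, 2]))
```

Passing `in_sep=","` and `out_sep=";"` corresponds to the optional input and output separators.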
Provider:
ws-iulaterm-upf-edu
Details:
ds_lsr_analysis :
analysis :
analysis_extension :
input :
description :
cat: Donat un fitxer, el parteix en fitxers més petits, del nombre de línies indicat com a paràmetre d'entrada (defecte 1000 línies)
es: Dado un fichero, lo parte en ficheros más pequeños, del número de líneas indicado como parámetro de entrada (defecto 1000 líneas).
en: Given a file, split it into smaller files containing the ...
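The splitting behaviour described in this entry can be sketched in Python as below. It assumes the input has already been read into a list of lines and uses the default of 1000 lines per chunk mentioned in the description; `split_lines` is a hypothetical name, and writing the chunks to separate files is left out.

```python
def split_lines(lines, chunk_size=1000):
    """Yield consecutive chunks of at most chunk_size lines."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(lines), chunk_size):
        yield lines[start:start + chunk_size]


# Made-up input of 2500 lines, for illustration only.
lines = [f"line {i}" for i in range(2500)]
chunks = list(split_lines(lines))
print([len(chunk) for chunk in chunks])
```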
Provider:
ws-iulaterm-upf-edu
This WS scrambles the lines of a parallel text corpus while keeping the alignment. The goal is to make it difficult to reproduce the original text. The input size limit is 100 MB. Language independent WS.
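Keeping the alignment means that both sides of the corpus must be reordered with the same permutation. A minimal Python sketch of that idea, assuming each side is a list of lines of equal length (the function name and the `seed` parameter are hypothetical):

```python
import random


def scramble_parallel(source_lines, target_lines, seed=None):
    """Shuffle two aligned line lists with the same permutation."""
    if len(source_lines) != len(target_lines):
        raise ValueError("both sides must have the same number of lines")
    # Shuffling the pairs, not the sides, keeps each line with its translation.
    pairs = list(zip(source_lines, target_lines))
    random.Random(seed).shuffle(pairs)
    if not pairs:
        return [], []
    shuffled_source, shuffled_target = zip(*pairs)
    return list(shuffled_source), list(shuffled_target)


# Made-up aligned sentences, for illustration only.
en = ["one", "two", "three", "four"]
es = ["uno", "dos", "tres", "cuatro"]
new_en, new_es = scramble_parallel(en, es, seed=0)
print(list(zip(new_en, new_es)))
```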
Details:
ds_lsr_analysis :
analysis :
analysis_extension :
input :
description :
cat: .
es: .
en: Web service to scramble the lines in a parallel corpus.
output :
installation : Clam Web ...
Provider:
ws-iulaterm-upf-edu
This WS scrambles the lines in a file. The goal is to make it difficult to reproduce the original text. The input size limit is 100 MB. Language independent WS.
Provider:
ws-iulaterm-upf-edu