
Collection of Categorized Web Services

For our work on machine learning for the annotation of web services, we gathered WSDL files from SALCentral and XMethods and organized them in a hierarchy.



The web services are classified hierarchically. The directory structure serves as the label: for example, a WSDL file in the communication\mail directory was classified as a "mail" web service, where "mail" is a subclass of "communication".
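As a minimal sketch of how the directory structure can be turned into labels, the following Python function returns the class hierarchy for a given WSDL path. The root directory name "collection" and the file name "service12.mail.wsdl" are hypothetical and only serve as illustration; the function name is not part of the collection.

```python
from pathlib import PureWindowsPath


def labels_from_path(wsdl_path, root):
    """Return the class hierarchy of a WSDL file, outermost class first.

    PureWindowsPath is used because it accepts both backslashes
    (as in communication\\mail) and forward slashes as separators.
    """
    relative = PureWindowsPath(wsdl_path).relative_to(PureWindowsPath(root))
    # Every path component except the file name itself is a class label.
    return relative.parts[:-1]


# Hypothetical example path; "collection" is an assumed root directory.
labels = labels_from_path(r"collection\communication\mail\service12.mail.wsdl",
                          "collection")
print(labels)
```

The last element of the returned tuple is the most specific class, and the elements before it are its superclasses.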

The labeled instances were crawled from the SALCentral website; the unlabeled instances (in the directory "unlabelled") come from the XMethods website.

Each .wsdl file is accompanied by a .txt file with the following structure:

  Line 1: service provider
  Line 2: original classification by SALCentral
  Line 3: service name
  Line 4: URL of the original WSDL file on the Web
  Line 5: plain text description of the service, crawled from the SALCentral/XMethods web page
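The structure above could be read with a small parser such as the following sketch. The field names are our own choice, and the code assumes that everything from line 5 onward belongs to the description, in case a description spans more than one line. The sample values are invented for illustration.

```python
from typing import NamedTuple


class ServiceInfo(NamedTuple):
    provider: str
    original_class: str
    name: str
    wsdl_url: str
    description: str


def parse_service_info(text):
    """Parse the contents of one .txt metadata file."""
    lines = text.splitlines()
    if len(lines) < 4:
        raise ValueError("expected at least 4 lines of metadata")
    provider, original_class, name, wsdl_url = (l.strip() for l in lines[:4])
    # Assumption: the description is everything from line 5 to the end.
    description = "\n".join(lines[4:]).strip()
    return ServiceInfo(provider, original_class, name, wsdl_url, description)


# Invented sample content, not taken from the collection.
sample = "\n".join([
    "Example Provider",
    "XMethods",
    "ExampleService",
    "http://example.org/service.wsdl",
    "Sends a test message.",
])
info = parse_service_info(sample)
print(info.name, info.original_class)
```

The function works on a string rather than a file so that the caller can choose the encoding when reading the .txt file.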

Note that the SALCentral classification is not very useful, which is why we wanted to have our own.

The filenames follow the pattern serviceNN.OriginalClassification.[txt|wsdl], where OriginalClassification is the label assigned by SALCentral. The XMethods website does not categorize web services, so line 2 of the .txt files for the unlabeled instances is always "XMethods".
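The naming scheme can be taken apart with a regular expression, as in the sketch below. It assumes that NN stands for one or more decimal digits and that the classification part may itself contain dots; both are assumptions, and the example file names are hypothetical.

```python
import re

# serviceNN.OriginalClassification.[txt|wsdl]
# Assumption: NN is one or more digits.
FILENAME_RE = re.compile(r"service(\d+)\.(.+)\.(txt|wsdl)")


def parse_filename(filename):
    """Return (number, original_classification, extension), or None
    if the name does not follow the naming scheme."""
    match = FILENAME_RE.fullmatch(filename)
    if match is None:
        return None
    number, classification, extension = match.groups()
    return int(number), classification, extension


# Hypothetical file names for illustration.
print(parse_filename("service12.mail.wsdl"))
print(parse_filename("readme.txt"))
```

Files that share the same number and classification but differ in extension form one .wsdl/.txt pair.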

The classes are highly unbalanced and unfortunately not noise-free.


26 Mar 2007, Andreas Hess, andreas at idirlion dot de