WWW2009 Developers Track



Kevin Miller. Will an easy to use language attract content experts?

Abstract: We propose a new web plugin, enabling content creators to step easily to the next level of producing compelling web applications from scratch without needing to learn obscure syntax and concepts. The plugin is based on the well-established Revolution desktop development system, an English-like scripting language descended from the HyperCard tradition. The same language will run on the server side and on the client side directly embedded in HTML and executed as the page is sent or loaded. We will demonstrate how to build a web application using a simple simulation complete with multimedia, integrated information and embedded Revolution tags. We will include details of the public beta program.

Fernando Moreno-Torres. A new tool to improve the filtering options in advanced searching

Abstract: We have developed a software application that analyzes an English text in detail and, fully automatically, labels it with linguistic attributes plus additional information.
With these labeled texts, a search engine is able to index the information in a way that provides new filtering possibilities for advanced searches.

Tom White and Christophe Bisciglia. Web data processing with MapReduce and Hadoop

Abstract: With massive growth in website traffic, extracting valuable information from clickstreams is a challenge as existing tools struggle to scale with web scale data. Apache Hadoop is a system for storing and processing massive amounts of data in parallel on clusters of commodity machines. With Hadoop and MapReduce it becomes feasible to make ad hoc queries over the massive datasets, opening up new possibilities for unearthing insights in web scale data.

This talk will consist of two parts. The first part will be a brief introduction to MapReduce and Hive, Hadoop's processing and data warehousing components, and will explain how these technologies are designed to handle big data. The second part will be a demo, showing how Hadoop can be used in practice to mine web logs.
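
As a rough illustration of the MapReduce model applied to web logs, the following Python sketch counts hits per URL in a single process. It is not Hadoop code: the log line format, the function names (`map_line`, `shuffle`, `reduce_counts`, `count_hits`) and the sample lines are all assumptions made for this example.

```python
from collections import defaultdict


def map_line(line):
    """Map step: emit (url, 1) for one log line.

    Assumed (hypothetical) log format: 'ip method url status'.
    """
    parts = line.split()
    if len(parts) >= 3:
        yield parts[2], 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one URL."""
    return key, sum(values)


def count_hits(lines):
    """Run map, shuffle and reduce over an iterable of log lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


# Made-up sample data for illustration only.
SAMPLE_LOG = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /about.html 200",
    "10.0.0.1 GET /index.html 304",
]
hits = count_hits(SAMPLE_LOG)
```

On a real cluster the map and reduce functions would run in parallel on many machines, with the framework handling the grouping step between them.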

Jeff Hammerbacher is giving a keynote, and was asked to have Cloudera also submit a more technical talk for this developer track. Tom White will be joining Jeff in Madrid. Christophe Bisciglia manages our conference schedule, and does not need to be cited for this submission.
Only Tom will be speaking for this talk.

Víctor Torres, Jaime Delgado, Xavier Maroñas, Silvia Llorente and Marc Gauvin. A web-based rights management system for developing trusted value networks

Abstract: We present an innovative architecture that enables the digital representation of original works and derivatives while implementing Digital Rights Management (DRM) features. The architecture’s main focus is on promoting trust within the multimedia content value networks rather than solely on content access and protection control. The system combines different features common in DRM systems such as licensing, content protection, authorization and reporting together with innovative concepts, such as the linkage of original and derived content and the definition of potential rights. The transmission of reporting requests across the content value network combined with the possibility for authors to preserve rights over derivative works enables the system to distribute income amongst all the actors involved in different steps of the creation and distribution chain. The implementation consists of a web application which interacts with different external services plus a desktop user application used to render protected content. It is currently publicly accessible for evaluation.

Geetha Manjunath, Thara S, Hitesh Bosamiya, Santhi Guntupalli, Vinay Kumar and Ragu Raman G. Creating Personal Mobile Widgets without Programming

Abstract: Our goal is to radically simplify the web so that end-users can perform their personal tasks with just a single click. For this, we define a concept called TaskLet to represent a task-based personal interaction pattern and propose a platform for automatically creating, sharing and executing them. TaskLets can be created even by a naive web user without programming knowledge, as we use the technique of Programming-By-Demonstration. TaskLets can be deployed on the client, in the cloud, or on a telecom provider network, enabling intuitive web interaction through widgets and thin mobile browsers as well as from mobile phones via SMS and voice. Our key innovation is a tool and platform that enable end-users to simplify their personally valuable tasks. We wish to share the proposed tool with both expert web developers and naive users of the WWW to promote a wider developer community for mobile web applications.

Peter Baumann. A Semantic Web Ready Service Language for Large-Scale Earth Science Archives

Abstract: Geo data are classically categorized into vector, raster, and metadata. The latter category receives plentiful attention from the Web research community; vector data are considered to some extent, but raster data, which contribute the largest data volumes, have been neglected so far. Hence, today service offerings can be consumed in a semantically adequate manner only at the metadata level, while requests addressing the contents of raster sets cannot be made at all, or only through APIs. In the end, they lack automatic content discovery, service chaining, and flexible retrieval and analysis.

The Open Geospatial Consortium (OGC) Web Coverage Processing Service (WCPS) standard defines a raster retrieval language suitable for ad-hoc navigation, extraction, and analysis of large, multi-dimensional data sets. Due to its formalized semantics definition it is suitable for reasoning and automatic service chaining. Based on real-life use cases drawn from remote sensing, oceanography, geophysics, and climate modeling, we discuss the language concept, design rationales, and features like discoverability, declarativeness, safe evaluation, and optimizability. Further, the reference implementation stack, which will be released as open source, is detailed.

Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Arnd Konig and Dong Xin. Query Portals

Abstract: Our goal is to enable users to efficiently and effectively search the web for informational queries and browse the content relevant to their queries. We achieve a unique “portal” like functionality for each query by effectively exploiting structured and unstructured content. We exploit existing structured data to identify and return per query a set of highly relevant entities such as people, products, movies, locations. Further, we are able to return additional information about the retrieved entities, such as categories, refined queries, and web sites which provide detailed information for each entity. The combination of search results and structured data creates a rich set of results, for the user to focus on and refine their search.

Christopher Adams and Tony Abou-Assaleh. Creating Your Own Web-Deployed Street Map Using Open Source Software and Free Data

Abstract: Street maps are a key element of Local Search; they make the connection between the search results and the geography. Adding a map to your website can be easily done using an API from a popular local search provider. However, the list of restrictions is lengthy and customization can be costly or impossible. It is possible to create a fully customizable web-deployed street map without sponsoring the corporate leviathans, at only the cost of your time and your server. Being able to freely style and customize your map is essential; it will distinguish your website from websites with shrink-wrapped maps that everyone has seen. Using open source software adds to the level of customizability: you will not have to wait two years for the next release and then maybe get the anticipated new feature or the bug fix; you can make the change yourself. Using free data rids you of contracts, costly transactions, and hefty startup fees. As an example, we walk through creating a street map for the United States of America.

A Web-deployed street map consists of a server and a client. The server stores the map data including any custom refinements. The client requests a portion of the map and the server renders that portion and returns it to the client, which in turn displays it to the user. The map data used in this example is the TIGER/Line data, which covers the whole of the USA. Another source of free road network data is OpenStreetMap, which is not as complete as TIGER/Line but includes additional data such as points of interest and streets for other countries. Sometimes the original data is not formatted in a manner that contributes to a good-looking, concise map. In such cases, data refinement is desired. For instance, performance and aesthetics of a map can be improved by transforming the street center lines to street polygons. For this task, we use the Python language, which has many extensions that make map data refinement easy. The rendering application employed is MapServer. MapServer allows you to specify a configuration file for your map, which consists of layers referencing geographical information, as well as the style attributes to specify how the layers are visualized. MapServer contains utilities to speed up the rendering process and organize similar data. On the front end, we need a web-page embeddable client that can process requests for map movements and scale changes in real time. In our experience, OpenLayers is the best tool for this task; it supports many existing protocols for requesting map tiles and is fast, customizable, and user friendly. Thus, deploying a street map service on the Web is feasible for individuals and not limited to big corporations.
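
To make the refinement step concrete, here is a minimal Python sketch of turning one straight center-line segment into a rectangular street polygon. It is an illustration under stated assumptions rather than the talk's actual code: the function name `segment_to_polygon` is hypothetical, coordinates are assumed to be in a projected system (for example meters) rather than raw longitude and latitude, and real street data would need multi-segment lines and proper joins at corners.

```python
import math


def segment_to_polygon(start, end, width):
    """Return the four corners of a rectangle of the given width
    centered on the straight segment from start to end.

    Assumes planar (projected) coordinates; hypothetical helper.
    """
    (x1, y1), (x2, y2) = start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("segment has zero length")
    # Normal vector to the segment, scaled to half the street width.
    nx = -dy / length * width / 2
    ny = dx / length * width / 2
    return [
        (x1 + nx, y1 + ny),
        (x2 + nx, y2 + ny),
        (x2 - nx, y2 - ny),
        (x1 - nx, y1 - ny),
    ]


# A 10-unit street running east, 2 units wide.
street = segment_to_polygon((0.0, 0.0), (10.0, 0.0), 2.0)
```

The resulting corner list can then be written out as a polygon feature for the renderer to draw as a filled shape instead of a stroked line.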

Marina Buzzi, Maria Claudia Buzzi, Barbara Leporini and Caterina Senette. Improving interaction via screen reader using ARIA: an example

Abstract: An interface that conforms to W3C ARIA (Accessible Rich Internet Applications) suite would overcome accessibility and usability problems that prevent disabled users from actively contributing to the collaborative growth of knowledge. In a previous phase of our study we first identified problems of interaction via screen reader with Wikipedia [2], then proposed an ARIA-based modified Wikipedia editing page  [1]. At this stage we only focused on the main content for editing/formatting purposes (using roles). To evaluate the effectiveness of an ARIA-based formatting toolbar we only dealt with the main content of the editing page, not the navigation and footer sections.
The next step using ARIA is to introduce landmarks (regions) and use the “flowto” property to be able to change the order of how page content is announced via screen reader. In this way the new UI becomes functionally equivalent to the original Wikipedia editing page and its appearance is very similar (apart from an additional combobox instead of a list of links), but usability is greatly enhanced. In this demo we will show the interaction via Jaws screen reader using both the original and the proposed Wikipedia editing pages.

Christian Bizer, Julius Volz and Georgi Kobilarov. Silk – A Link Discovery Framework for the Web of Data

Abstract: The Web of Linked Data is built upon two simple ideas: First, to employ the RDF data model to publish structured data on the Web. Second, to set explicit RDF links between data items within different data sources.

The Silk link discovery framework supports developers in accomplishing the second task. Using a declarative language, developers can specify which types of RDF links are set between data items and which conditions data items must fulfil in order to be interlinked. Link conditions can combine various similarity metrics, aggregation and transformation functions. Link conditions may take the graph around a data item into account, which is addressed using an RDF path language. Silk accesses remote data sources via the SPARQL protocol and can be employed in situations where terms from different vocabularies are mixed and where no consistent RDFS or OWL schemata exist.
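
The idea of a link condition that aggregates several similarity metrics can be sketched in plain Python. This is not Silk's declarative language: the metric functions, the weights, the threshold and the sample records below are all hypothetical and chosen only to show the shape of such a condition.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (lowercasing is the transformation)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def numeric_similarity(a, b, max_diff):
    """Similarity in [0, 1] that falls linearly to 0 at a difference of max_diff."""
    return max(0.0, 1.0 - abs(a - b) / max_diff)


def link_score(item_a, item_b):
    """Aggregate two metrics with a weighted average (weights are illustrative)."""
    label = string_similarity(item_a["label"], item_b["label"])
    population = numeric_similarity(
        item_a["population"], item_b["population"], max_diff=100_000
    )
    return (2 * label + 1 * population) / 3


def should_link(item_a, item_b, threshold=0.9):
    """Link condition: set a link only when the aggregated score is high enough."""
    return link_score(item_a, item_b) >= threshold


# Made-up records standing in for data items from two sources.
berlin_a = {"label": "Berlin", "population": 3_400_000}
berlin_b = {"label": "berlin", "population": 3_410_000}
bern = {"label": "Bern", "population": 130_000}

same_city = should_link(berlin_a, berlin_b)
different_city = should_link(berlin_a, bern)
```

A declarative specification expresses the same ingredients (metrics, transformations, aggregation, threshold) as configuration rather than code, so they can be changed without reprogramming.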

The talk gives an overview of the Silk framework and explains how developers can use Silk to interlink their data with the data cloud that is published by the W3C Linking Open Data community effort.

Georgi Kobilarov, Chris Bizer, Sören Auer and Jens Lehmann. DBpedia – A Linked Data Hub and Data Source for Web Applications and Enterprises

Abstract: The DBpedia project provides Linked Data identifiers for currently 2.6 million things and serves a large knowledge base of structured information. DBpedia has developed into the central interlinking hub for the Linking Open Data project: its URIs are used within named entity recognition services such as OpenCalais and annotation services such as Faviki, and the BBC has started using DBpedia as its central semantic backbone. DBpedia's structured data serves as background information in the process of interlinking datasets and provides a rich source of information for application developers. Besides making the DBpedia knowledge base available as linked data and RDF dumps, we offer a Lookup Service, which applications can use to discover URIs for identifying concepts, and a SPARQL endpoint, which can be used to retrieve data from the DBpedia knowledge base for use in applications.
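
As a small illustration of using the SPARQL endpoint from an application, the Python sketch below builds a request URL that asks for the English abstract of one resource. It only constructs the URL and performs no network access. The helper name `build_query_url` is hypothetical, and the property URI used for abstracts is an assumption that may differ between DBpedia releases.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"


def build_query_url(resource_name):
    """Build a SPARQL protocol GET URL for the abstract of one DBpedia resource.

    The abstract property URI below is an assumption for this sketch.
    """
    query = (
        "SELECT ?abstract WHERE { "
        f"<http://dbpedia.org/resource/{resource_name}> "
        "<http://dbpedia.org/ontology/abstract> ?abstract . "
        'FILTER (lang(?abstract) = "en") }'
    )
    # The SPARQL protocol passes the query text in the 'query' parameter.
    return DBPEDIA_ENDPOINT + "?" + urlencode({"query": query})


url = build_query_url("Madrid")
```

An application would fetch this URL with any HTTP client and ask for a result format such as SPARQL XML or JSON through the `Accept` header.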

This talk will give an introduction to DBpedia for web developers and an overview of DBpedia's development over the last year. We will demonstrate how DBpedia URIs are used for document annotation and how Web applications can, via DBpedia, use Wikipedia as a source of structured knowledge.

Philippe Poulard. Intrusive unit testing for Web applications

Abstract: Several tools have been designed for automating tests on Web applications. They usually drive browsers the same way people do: they click links, fill in forms, press buttons, and they check results, such as whether an expected text appears on the page.

WUnit is such a tool, but goes a step further: it can act inside the server, examine server-side components, and even modify them, which gives test authors more control. For example, in our tests we can get the user session server-side, store an arbitrary object (one that makes sense in our test) in the user session, and get the page that renders it. Cheating like this allows us to bypass the normal functioning of a Web application and perform truly independent unit tests, which gives more flexibility to test-driven development.
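
The session example can be sketched in Python with a toy application. This is not WUnit or Active Tags code: `TinyApp`, its `sessions` dictionary and the `render_cart` method are hypothetical stand-ins that only show what writing into server-side state before requesting a page looks like in a test.

```python
class TinyApp:
    """Hypothetical stand-in for a web application with server-side sessions."""

    def __init__(self):
        self.sessions = {}

    def render_cart(self, session_id):
        """Render the cart page from whatever is stored in the session."""
        cart = self.sessions.get(session_id, {}).get("cart", [])
        items = "".join(f"<li>{name}</li>" for name in cart)
        return f"<ul>{items}</ul>"


def test_cart_page_renders_injected_items():
    app = TinyApp()
    # Intrusive step: write directly into the server-side session,
    # skipping the login and add-to-cart pages a browser would need.
    app.sessions["s1"] = {"cart": ["book", "pen"]}
    page = app.render_cart("s1")
    assert page == "<ul><li>book</li><li>pen</li></ul>"
    return page


page = test_cart_page_renders_injected_items()
```

Because the test sets up the session itself, a failure points at the rendering code rather than at any of the pages that would normally populate the cart.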

In this session, we'll present the Active Tags framework, from which WUnit is derived, and show how to design a simple test suite for an AJAX-based application: how to get a page, fill in a form, upload files, and of course how to act on server-side components.

Olivier Rossel. The Web, Smart and fast.

Abstract: The Web is considered literature by many users, who cannot spend that much time browsing and reading textual content.
This issue is even bigger when dealing with Web 2.0 sites, where the user is more interested in data than textual content.

Considering the expectations of people for instant access to data on the Web, we propose the paradigm of the Smart & Fast Web.
We discuss the concept, the expectations, and the technologies.
Finally, we point to Datao.net, a project for a semantic web browser designed to surf the Web of Data while meeting the requirements of a Smart & Fast Web.

Sheila Méndez Núñez, Jose Emilio Labra Gayo and Javier De Andrés. Towards a Semantic Web Environment for XBRL

Abstract: XBRL is an emerging standard that allows enterprises to present their financial accounting information in an XML format, according to their applicable legislation. In this paper we present a system which offers the possibility of adding semantic annotations contained in a financial ontology to XBRL reports. Furthermore, we present a new approach that will apply probabilistic methods to determine the degree of similarity between concepts of different XBRL taxonomies. This approach will allow investors, enterprises and XBRL users in general to perform comparisons between reports that conform to different taxonomies, even ones belonging to different countries.

Jun Wang, Xavier Amatriain and David Garcia Garzon. Combining multi-level audio descriptors via web identification and aggregation

Abstract: In this paper, we present the CLAM Aggregator tool. It offers a convenient GUI to combine multi-level audio descriptors. A reliable method is embedded in the tool to identify users’ local music collections with open data resources. In the context of the CLAM framework and the Annotator application, Aggregator allows users to configure, aggregate and edit music information ranging from low-level frame scales, to segment scales, and further to any metadata from the outside world such as the semantic web. All these steps are designed in a flexible, graphical and user-defined way.

Ted Drake. The Future of Vertical Search Engines with Yahoo! BOSS

Abstract: While general search engines, such as Google, Yahoo!, and Ask, dominate the search industry, there is a new batch of vertical, niche search engines that could fundamentally change the behavior of search. These search engines are built on the open APIs of Yahoo, Google, and other major players. However, Yahoo’s recently released BOSS API has made these engines more powerful, more specialized, and easier to build and maintain.

These specialized search engines are finding the quality information in the haystack of general search terms. The general search engines, in turn, surface the niche results pages. This talk will discuss how to use the Yahoo! Boss search API to create a niche search engine. It will also show how this information can be mashed together with other APIs and finally how the results pages begin appearing in the general search engines.
The talk is appropriate for engineers, entrepreneurs, and managers.

Julio Camarero and Carlos A. Iglesias. A REST Architecture for Social Disaster Management

Abstract: This article presents a social approach to disaster management, based on a public portal called Disasters2.0, which provides facilities for integrating and sharing user-generated information about disasters. The architecture of Disasters2.0 is designed following REST principles and integrates external mashups, such as Google Maps. This architecture has been integrated with different clients, including a mobile client, a multiagent system for assisting in the decentralized management of disasters, and an expert system for automatic assignment of resources to disasters. As a result, the platform allows seamless collaboration of humans and intelligent agents, and provides a novel web2.0 approach for multiagent and disaster management research and artificial intelligence teaching.

Matt Sweeney. YUI 3: Faster, Lighter, Easier

Abstract: The Yahoo! User Interface Library has been widely adopted by the mainstream web development community, and is used to power websites worldwide.

YUI 3 is the next generation of YUI, tailored to the feedback that we have received from thousands of users and developers.

This talk will give an overview of YUI 3, including what's new, what's changed, and where we are going from here.

Patrick Sinclair, Nicholas Humfrey, Yves Raimond, Tom Scott and Michael Smethurst. Using the Web as our Content Management System on the BBC Music Beta

Abstract: In this paper, we describe the BBC Music Beta, providing a comprehensive guide to music content across the BBC. We publish a persistent web identifier for each resource in our music domain, which serves as an aggregation point for all information about it. We describe a promising approach to building web sites by re-using structured data available elsewhere on the Web: the Web becomes our Content Management System. We therefore ensure that the BBC Music Beta is a truly Semantic Web site, re-using data from a variety of places and publishing its data in a variety of formats.

Laurent Denoue, Scott Carter, John Adcock and Gene Golovchinsky. WebNC: efficient sharing of web applications

Abstract: WebNC is a browser plugin that leverages the Document Object Model for efficiently sharing web browser windows or recording web browsing sessions to be replayed later. Unlike existing screen-sharing or screencasting tools, WebNC is optimized to work with web pages where a lot of scrolling happens. Rendered pages are captured as image tiles and transmitted to a central server through HTTP POST. Viewers can watch the webcasts in real time or asynchronously using a standard web browser: WebNC relies only on HTML and JavaScript to reproduce the captured web content. Along with the visual content of web pages, WebNC also captures their layout and textual content for later retrieval. The resulting webcasts require very little bandwidth, are viewable on any modern web browser including those on the iPhone and Android phones, and are searchable by keyword.

Sharad Goel, Jake Hofman, John Langford, David Pennock and Daniel Reeves. CentMail: Rate Limiting via Certified Micro-Donations

Abstract: We present a plausible path toward adoption of email postage stamps--an oft-cited method for fighting spam--along with a protocol and a prototype implementation. In the standard approach, neither senders nor recipients gain by joining unilaterally, and senders lose money. Our system, called CentMail, begins as a charity fund-raising tool: Users donate $0.01 to a charity of their choice for each email they send. The user benefits by helping a cause, promoting it to friends, and potentially attracting matching donations, often at no additional cost beyond what they planned to donate anyway. Charitable organizations benefit and so may appeal to their members to join. The sender’s email client inserts a uniquely generated CentMail stamp into each message. The recipient’s email client verifies with CentMail that the stamp is valid for that specific message and has not been queried by an unexpectedly large number of other recipients. More generally, the system can serve to rate-limit and validate many types of transactions, broadly construed, from weblog comments to web links to account creation.
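
The stamp check described above (valid for one specific message, and not queried by too many recipients) can be illustrated with a short Python sketch. It is a toy under stated assumptions, not the CentMail protocol: the `StampServer` class, the HMAC-based stamp construction, the secret and the query limit are all hypothetical choices made for this example.

```python
import hashlib
import hmac


class StampServer:
    """Toy stand-in for a stamp service (names and limits are hypothetical)."""

    def __init__(self, secret, max_queries=3):
        self._secret = secret
        self._max_queries = max_queries
        self._queries = {}  # stamp -> number of valid verification queries so far

    def issue(self, message):
        """Issue a stamp bound to this exact message text."""
        digest = hmac.new(self._secret, message.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()

    def verify(self, message, stamp):
        """Accept only if the stamp matches the message and the
        number of queries for it is still within the limit."""
        expected = self.issue(message)
        if not hmac.compare_digest(expected, stamp):
            return False
        self._queries[stamp] = self._queries.get(stamp, 0) + 1
        return self._queries[stamp] <= self._max_queries


server = StampServer(secret=b"demo-secret", max_queries=2)
stamp = server.issue("Hello Bob")

# Three recipients query the same stamp; the third exceeds the limit.
results = [server.verify("Hello Bob", stamp) for _ in range(3)]
# Re-using the stamp on a different message fails the binding check.
forged = server.verify("Buy pills", stamp)
```

Binding the stamp to the message prevents re-use on other mail, while the query counter is what turns a one-cent stamp into a rate limit on bulk sending.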

Jason Hines and Tony Abou-Assaleh. Query GeoParser: A Spatial-Keyword Query Parser Using Regular Expressions

Abstract: There has been a growing commercial interest in local information within Geographic Information Retrieval, or GIR, systems. Local search engines enable the user to search for entities that contain both textual and spatial information, such as Web pages containing addresses or a business directory. Thus, queries to these systems may contain both spatial and textual components—spatial-keyword queries. Parsing the queries requires breaking the query into textual keywords, and identifying components of the geo-spatial description. For example, the query ‘Hotels near 1567 Argyle St, Halifax, NS’ could be parsed as having the keyword ‘Hotels’, the preposition ‘near’, the street number ‘1567’, the street name ‘Argyle’, the street suffix ‘St’, the city ‘Halifax’, and the province ‘NS’. Developing an accurate query parser is essential to providing relevant search results. Such a query parser can also be utilized in extracting geographic information from Web pages.

One approach to developing such a parser is to use regular expressions. Our Query GeoParser is a simple but powerful regular expression-based spatial-keyword query parser. Query GeoParser is implemented in Perl and utilizes many of Perl’s capabilities in optimizing regular expressions. By starting with regular expression building blocks for common entities such as numbers and streets, and combining them into larger regular expressions, we are able to handle over 400 different cases while keeping the code manageable and easy to maintain. We employ the mark-and-match technique to improve the parsing efficiency. First, we mark numbers, city names, and states. Next, we use matching to extract keywords and geographic entities. The advantages of our approach include manageability, performance, and easy exception handling. Drawbacks include a lack of geographic hierarchy and the inherent difficulty in dealing with misspellings. We comment on our overall experience using such a parser in a production environment and on what we have learnt, and suggest possible ways to deal with the drawbacks.
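
The building-block idea can be shown on the example query from the abstract. The sketch below uses Python rather than Perl and covers only this one address pattern. All pattern names, the short suffix and preposition lists, and the `parse_query` helper are assumptions for illustration; it does not implement the mark-and-match step.

```python
import re

# Small building blocks for common entities (illustrative, far from complete).
NUMBER = r"(?P<number>\d+)"
STREET_NAME = r"(?P<street>[A-Za-z]+(?: [A-Za-z]+)*?)"
SUFFIX = r"(?P<suffix>St|Ave|Rd|Blvd|Dr)\.?"
CITY = r"(?P<city>[A-Za-z]+(?: [A-Za-z]+)*)"
REGION = r"(?P<region>[A-Z]{2})"
PREPOSITION = r"(?P<preposition>near|in|at)"
KEYWORDS = r"(?P<keywords>.+?)"

# Larger expressions are assembled from the blocks above.
ADDRESS = rf"{NUMBER} {STREET_NAME} {SUFFIX}, {CITY}, {REGION}"
QUERY = re.compile(rf"^{KEYWORDS} {PREPOSITION} {ADDRESS}$")


def parse_query(text):
    """Return a dict of named components, or None if the query does not match."""
    match = QUERY.match(text)
    return match.groupdict() if match else None


parsed = parse_query("Hotels near 1567 Argyle St, Halifax, NS")
```

Keeping each entity in its own named pattern means a new street suffix or preposition is a one-line change, which is the maintainability benefit the abstract describes.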

Sean McCleese, Chris Mattmann, Rob Raskin, Dan Crichton and Sean Hardman. A Virtual Oceanographic Data Center

Abstract: Oceanographic datacenters at the National Aeronautics and Space Administration (NASA) are geographically sparse and disparate in their technological strengths and standardization on common Internet data formats. Virtualized search across these national assets would significantly benefit the oceans research community. To date, the lack of common software infrastructure and open APIs to access the data and descriptive metadata available at each site has precluded virtualized search. In this paper, we describe a nascent effort, called the Virtual Oceanographic Data Center, or VODC, whose goals are to overcome the challenge of virtualized search and data access across oceanographic data centers, and to provide a common and reusable capability for increasing access to large catalogs of NASA oceanographic data.

Clint Hall. Bootstrapping Web Pages for Accessibility and Performance

Abstract: In this talk, I present a technique called Web Bootstrapping, a process by which an accurate collection of only those static resources and metadata necessary for a unique experience is delivered passively, by the most performant means possible. In contrast to existing methodologies, rather than focusing on the web client's identity or version, this approach determines resources based on capability, form factor and platform by collecting the often-immutable attributes of the client. Bootstrapping allows for rule-based, externalized, server-side configuration, further promoting progressive enhancement and client performance.

Josep M. Pujol and Pablo Rodriguez. Towards Distributed Social Search Engines

Abstract: We describe a distributed social search engine built upon open-source tools, aiming to help the community to “take back the Search”. Access to the Web is universal and open, and so should be the mechanisms to search it. We envision search as a basic service whose operation is controlled and maintained by the community itself. To that end we present an alpha version of what could become the platform of a distributed search engine fueled by the shared resources and collaboration of the community.

Govind Kabra and Kevin Chang. Integration at Web-Scale: Scalable Agent Technology for Enabling Structured Vertical Search

Abstract: The Web today has “everything”: every object of interest in the real world is starting to find its presence in the online world. As such, the search needs of users are getting increasingly sophisticated. How do you search for apartments? How do you find products to buy? The traditional paradigm of Web search, starting from keyword input and ending in Web pages as output, stifles users, requiring intensive manual post-processing of search results.

As a solution, we present a platform for enabling vertical search. At the core is our novel agent technology for structured crawling. We showcase the promise of our platform using two concrete products that clearly demonstrate the possibility of Integration at Web-Scale.

The talk will show demos of two concrete products (apartment search, shopping search) and key underlying technologies.
