Web Proxy Portlet Replacement

Jasig would like to rewrite the existing Web Proxy portlet as a modern, Spring-based portlet project. This portlet could also serve as a more general content transformation portlet, replacing the legacy XSLT channel.

Potential Transformation Types

  • Web Proxy: Web proxying can be viewed as a simple transformation type in which the output closely resembles the input. Proxied content might still pass through a transformation pipeline that performs HTML validation, content clipping, and so on. This pipeline might use HttpClient and OWASP AntiSamy.
  • XSLT: Transform source XML using an XSLT stylesheet.
  • JSON: Deserialize JSON into a Java Map using the Jackson library, then render the Map as HTML through the configured Spring view name.
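
The transformation types above could share a single abstraction: a base transformer that turns source content into HTML, followed by an ordered chain of post-processing steps such as validation or clipping. The sketch below assumes that design; ContentTransformer, PassThroughTransformer, and TransformationPipeline are hypothetical names, and the steps are plain string functions standing in for AntiSamy or clipping filters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical abstraction: each transformation type (web proxy, XSLT, JSON)
// turns source content into HTML for the portlet to render.
interface ContentTransformer {
    String transform(String source);
}

// Web proxy as the simplest transformation type: the output equals the input.
final class PassThroughTransformer implements ContentTransformer {
    @Override
    public String transform(String source) {
        return source;
    }
}

// Runs a base transformer, then each post-processing step (validation,
// clipping, ...) in the order it was configured.
final class TransformationPipeline {
    private final ContentTransformer base;
    private final List<UnaryOperator<String>> steps;

    TransformationPipeline(ContentTransformer base, List<UnaryOperator<String>> steps) {
        this.base = base;
        // Defensive copy so later changes to the caller's list have no effect.
        this.steps = Collections.unmodifiableList(new ArrayList<>(steps));
    }

    String run(String source) {
        String result = base.transform(source);
        for (UnaryOperator<String> step : steps) {
            result = step.apply(result);
        }
        return result;
    }
}
```

Keeping the post-processing steps separate from the base transformer would let web proxy, XSLT, and JSON content reuse the same validation and clipping stages.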

Technology

  • SpringMVC
  • HttpClient 4 for requesting remote content
  • OWASP AntiSamy for validating remote content
  • NekoHTML for parsing HTML into processable SAX events
  • Jackson for JSON deserialization
  • Standard JDK classes for XSLT transformation
  • Evaluate jsoup (http://jsoup.org/) as a replacement for NekoHTML; it could also handle clipping and manipulation
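
For the XSLT transformation type, the JAXP classes bundled with the JDK are sufficient. The following is a minimal sketch; XsltRenderer is a hypothetical helper name, and wrapping TransformerException in unchecked exceptions is an illustrative choice, not a project decision.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: renders source XML through an XSLT stylesheet using
// only the JAXP classes bundled with the JDK.
final class XsltRenderer {
    private final Templates templates;

    XsltRenderer(String stylesheet) {
        try {
            // Compile the stylesheet once; Templates objects are thread-safe
            // and can be shared across requests.
            this.templates = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(stylesheet)));
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Invalid XSLT stylesheet", e);
        }
    }

    String render(String sourceXml) {
        try {
            // Transformer instances are not thread-safe, so create one per call.
            Transformer transformer = templates.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(sourceXml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

Caching the compiled Templates per configured stylesheet avoids re-parsing the XSLT on every render request.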

Features

  • Pluggable authentication
    • Form-based credential replay
    • Proxy-CAS
    • Delegated SAML
    • Certificate?
  • Proxying of embedded web resources, including CSS, JS, and images
  • HTML clipping, preferably using a jQuery-like selector syntax
  • Support a regular-expression whitelist of URLs that may be proxied
    • Track all rewritten URLs in the session so that a poorly written whitelist cannot be exploited to turn the portlet into an open proxy
  • Ability to load source content from the filesystem in addition to requesting remote web content
  • Mechanism for adding user attributes and other dynamic parameters to the initial URL
  • Cooperate with the Portlet 2.0 (JSR 286) caching controls
  • Persist HttpClient state (cookies) on a per-user, per-portlet-instance basis
  • Ability to add HTTP headers which could contain user attributes
  • Ability to rewrite proxied CSS so that its rules apply only to the proxied content
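
Pluggable authentication could be modeled as a strategy interface with one implementation per mechanism. The sketch below shows only form-based credential replay; ProxyAuthenticator and FormCredentialReplay are hypothetical names, and the "username"/"password" credential keys are assumptions rather than a portal API. The encoded string would be sent as the body of the login POST.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical strategy interface: one implementation per authentication
// mechanism (form replay, Proxy-CAS, delegated SAML, ...).
interface ProxyAuthenticator {
    // Returns the parameters to submit with the remote login request.
    Map<String, String> loginParameters(Map<String, String> portalCredentials);
}

// Form-based credential replay: maps the user's cached portal credentials
// onto the remote login form's field names.
final class FormCredentialReplay implements ProxyAuthenticator {
    private final String usernameField;
    private final String passwordField;

    FormCredentialReplay(String usernameField, String passwordField) {
        this.usernameField = usernameField;
        this.passwordField = passwordField;
    }

    @Override
    public Map<String, String> loginParameters(Map<String, String> portalCredentials) {
        // "username" and "password" are assumed keys, not a portal API.
        Map<String, String> params = new LinkedHashMap<>();
        params.put(usernameField, portalCredentials.getOrDefault("username", ""));
        params.put(passwordField, portalCredentials.getOrDefault("password", ""));
        return params;
    }

    // Encodes parameters as an application/x-www-form-urlencoded POST body.
    static String encodeForm(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }
}
```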
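
The whitelist and session-tracking requirements can work together: a URL is rewritten only if it matches the whitelist, and a later proxy request is honored only for a URL that was actually rewritten earlier in the same session. In this minimal sketch, ProxyUrlGuard is a hypothetical name, and the in-memory Set stands in for a PortletSession attribute.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical guard combining the regex whitelist with per-session tracking
// of rewritten URLs. The Set stands in for a PortletSession attribute.
final class ProxyUrlGuard {
    private final List<Pattern> whitelist;
    private final Set<String> rewrittenUrls = Collections.synchronizedSet(new HashSet<>());

    ProxyUrlGuard(List<String> whitelistRegexes) {
        this.whitelist = whitelistRegexes.stream()
                .map(Pattern::compile)
                .collect(Collectors.toList());
    }

    // matches() requires the whole URL to match, so a pattern cannot be
    // satisfied by a substring buried in an attacker-chosen URL.
    private boolean isWhitelisted(String url) {
        return whitelist.stream().anyMatch(p -> p.matcher(url).matches());
    }

    // Called while rewriting proxied content: only whitelisted URLs are
    // rewritten, and each one is remembered for this session.
    boolean recordRewrite(String url) {
        if (!isWhitelisted(url)) {
            return false;
        }
        rewrittenUrls.add(url);
        return true;
    }

    // Called when the browser requests a proxied URL: it must be whitelisted
    // and must have been issued by an earlier rewrite in this session.
    boolean mayProxy(String url) {
        return isWhitelisted(url) && rewrittenUrls.contains(url);
    }
}
```

Even with an overly broad regex, a client could then fetch only URLs that the portlet itself emitted, which is what keeps the portlet from acting as an open proxy.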
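
Adding user attributes to the initial URL could be handled with a simple placeholder template. The sketch below assumes a hypothetical {attributeName} placeholder syntax and a hypothetical UrlTemplate helper; values are form-encoded, which suits query-string parameters but not path segments, and missing attributes expand to an empty string.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: expands {attributeName} placeholders in a configured
// URL with URL-encoded user attribute values.
final class UrlTemplate {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([A-Za-z0-9_.]+)\\}");

    static String expand(String template, Map<String, String> userAttributes) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Unknown attributes become empty strings rather than failing.
            String value = userAttributes.getOrDefault(m.group(1), "");
            String encoded = URLEncoder.encode(value, StandardCharsets.UTF_8);
            // quoteReplacement keeps '$' and '\' in values from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(encoded));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Encoding each value before substitution prevents an attribute containing '&' or '=' from injecting extra query parameters.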