The content of this page was adapted from the README.txt file included with the WebProxy portlet. Please note that several places use the "????" notation because no reasonable value was known as of this writing. Perhaps someone will be able to fill those in.
Portlet Name: Authenticated Web Proxy
Version Number: 1.0
Designers: University of Wisconsin
Developers:
Eric Dalquist
David Grimwood
Nabeel Ramzan
Requirements spec: Authenticated Web Proxy Requirements Specifications, version 1.0
Design spec: Authenticated Web Proxy Design Document, version 1.0
Summary
The Authenticated WebProxy portlet allows seamless integration with web-based services regardless of the technology used to implement them. These services are represented within the portal as individual channels on a user's layout. All content from the proxied site is scraped, parsed, and rendered inside the portal. Pages are refreshed and kept inside the portal when the user interacts with them. HTTP standards are followed, allowing communication between the browser and dynamic web-based applications. Authenticated WebProxy also provides authentication (Form and Basic), clipping, and content caching, as well as a mechanism for passing user-specific information to the back-end application.
Configuration
All aspects of the Authenticated WebProxy portlet are configurable (i.e., HTTP Management, HTML Parsing, URL Filtering, and HTTP Clipping). Implementation classes are plugged in by placing references to them in the applicationContext.xml file in the WebProxy\web\WEB-INF directory.
Authentication
The portlet has been enhanced to add more methods of authentication. One, using Shibboleth, is documented in the uPortal manual here. Authentication using CAS proxy tickets is documented here.
Build and Installation
The Ant build file for the project provides a deploy target. It depends on the clean, init, compile, and dist targets to create the WAR file, and then copies the file into the designated web container. The web container home and portal directory locations need to be configured in the corresponding build.properties file.
In order for Authenticated WebProxy to use session and cache persistence, the latest version of CommonStorage must be deployed. The appropriate version of the JAR is included in this WebProxy release. Please see the CommonStorage documentation for detailed deployment instructions.
Database
The WebProxy portlet can cache content retrieved from other web servers to improve performance. Optionally, the cache can be stored persistently in a database so that it survives beyond user sessions. This requires two database tables, and as of this writing the portlet has no mechanism to create them automatically.
The database connection information is stored in /WEB-INF/datasource.properties. This is the database where the portlet expects to find its tables: WP_CACHE_STORE and WP_STATE_STORE. Two DDL scripts are provided with the portlet: create.sql and drop.sql. They are not part of the portlet's WAR file; they can be found in the doc directory of the WebProxy portlet project. The DDL in create.sql is Oracle-specific because it uses the VARCHAR2 and BLOB data types. For HSQLDB these need to be changed to VARCHAR and LONGVARBINARY.
Publishing
This channel can be published via the channel manager or by using the 'pubchan' target from your portal directory. However, since many specific configuration variables need to be properly set, it is recommended to use the channel manager method. To do so, please follow the steps outlined below:
- Log in as a user with admin privileges
- Click on 'Channel Manager'
- Click on 'Publish a new channel'
- Select Portlet for Channel Type
- Provide a Channel Title, Name, Functional Name, Description, and Timeout.
- The 'Portlet definition ID' for Authenticated Web Proxy is WebProxyPortlet.WebProxyPortlet
- No portlet preferences are required for this portlet
- Select 'Editable' for the Channel Controls, if you would like the end user to be able to go to a predetermined URL to edit the proxied web application configuration.
- Select the channel categories for the portlet
- Select the groups or people who should have access to this portlet
Configuration
You should now be in the custom CONFIG mode of the portlet. Please consult this readme for a description of each available configuration option.
General Configuration
- Base URL (REQUIRED): The Base URL is the starting point of the proxied application and will be the first page proxied for the end user after authentication.
edu.wisc.my.webproxy.webproxy.general.config.sBaseUrl
- This value must contain the protocol of the URL. (e.g., http://www.foo.bar/, http://foo.bar/example.html)
- This value may include placeholders for user attributes (e.g., http://www.foo.bar/${uid} becomes http://www.foo.bar/admin for the admin user at run time). User attributes used in this way must be declared in portlet.xml.
- Edit URL (OPTIONAL): This URL provides a link that allows the end user to configure the proxied web application for their own personal needs. This value must also contain the protocol. (e.g., http://foo.bar/edit.html)
edu.wisc.my.webproxy.webproxy.general.config.sEditUrl
- URL Rewrite Masks (OPTIONAL): Must contain the URLs of the web application that you would or would not like to be proxied, depending on the URL List Type (see below). The portlet URL list uses regular expressions for matching. (e.g., .foo.)
edu.wisc.my.webproxy.webproxy.general.config.sPortletUrl
- URL List Type: This value is set to Include by default. It designates how the portlet reads the portlet URL list. If set to Include, the portlet will rewrite and proxy all URLs that match an expression in the list. If set to Exclude, the portlet will proxy all URLs that do not match the expressions listed.
edu.wisc.my.webproxy.webproxy.general.config.sListType
- Target FName (OPTIONAL): Functional name of another channel to redirect the user to when a functional URL rewrite mask matches the URL.
edu.wisc.my.webproxy.webproxy.general.config.funcNameTarget
- Functional URL Rewrite Masks (OPTIONAL): Regular expression(s) matching URLs that should trigger a redirect of the user to another channel. If matched (or not matched, depending on the "Functional URL List Type" setting), the user is redirected to the channel identified by the functional name specified in the "Target FName" field.
edu.wisc.my.webproxy.webproxy.general.config.funcNameUrlRegEx
- Functional URL List Type: This value is set to Include by default. It designates how the portlet reads the Functional URL Rewrite Masks. If set to Include, the portlet will redirect the user to the alternative channel for all URLs that match an expression in the list. If set to Exclude, the portlet will redirect when the URL does not match the expressions listed.
edu.wisc.my.webproxy.webproxy.general.config.funcNameListType
- Pre-Interceptor class (Optional): This optional setting can be used to manipulate the HTTP request before it is sent to the web application. It requires a custom class. (e.g., edu.wisc.my.webproxy.CPreInterceptor)
edu.wisc.my.webproxy.webproxy.general.config.sPreInterceptor
- Post-Interceptor class (Optional): This optional setting can be used to manipulate the HTTP response after it is received from the web application. It requires a custom class. (e.g., edu.wisc.my.webproxy.CPostInterceptor)
edu.wisc.my.webproxy.webproxy.general.config.sPostInterceptor
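The placeholder and list-type rules described above can be illustrated with a minimal Java sketch. It is not taken from the portlet source: the class UrlRulesSketch and both of its methods are hypothetical. The sketch assumes that a mask counts as a match when the regular expression is found anywhere in the URL, and that a placeholder with no corresponding user attribute is left unchanged; the portlet's actual behavior may differ on both points.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of the WebProxy code base. */
final class UrlRulesSketch {

    // Matches placeholders of the form ${name}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} with the user attribute of that name (assumption: unknown names stay as-is). */
    static String expandPlaceholders(String url, Map<String, String> userAttributes) {
        Matcher m = PLACEHOLDER.matcher(url);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = userAttributes.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the value from being treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /**
     * Include: proxy a URL that matches at least one mask.
     * Exclude: proxy a URL that matches none of the masks.
     * Assumption: a mask matches if it is found anywhere in the URL.
     */
    static boolean shouldProxy(String url, List<String> masks, boolean include) {
        boolean matched = false;
        for (String mask : masks) {
            if (Pattern.compile(mask).matcher(url).find()) {
                matched = true;
                break;
            }
        }
        return include == matched;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = Collections.singletonMap("uid", "admin");
        System.out.println(expandPlaceholders("http://www.foo.bar/${uid}", attrs));

        List<String> masks = Arrays.asList(".foo.");
        System.out.println(shouldProxy("http://www.foo.bar/", masks, true));
        System.out.println(shouldProxy("http://www.foo.bar/", masks, false));
    }
}
```

The same include/exclude decision would apply to the Functional URL Rewrite Masks, with a redirect to the Target FName channel in place of proxying.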
Cache Configuration
Please note that turning on cache support will require a database connection with two WebProxy-specific tables. Please see the Database section for details.
- Use Cache (Optional): Select this if you would like to enable the Authenticated WebProxy cache. Values: true, false (default).
edu.wisc.my.webproxy.webproxy.cache.useCache
- Cache Scope (Optional): If edu.wisc.my.webproxy.webproxy.cache.user is selected, the cache is valid only for the individual portal user. If edu.wisc.my.webproxy.webproxy.cache.application is selected, all users share the data stored in the cache. Default: edu.wisc.my.webproxy.webproxy.cache.user
edu.wisc.my.webproxy.webproxy.cache.cacheScope
- Cache Timeout (Optional): The number of seconds for which the cache is valid.
edu.wisc.my.webproxy.webproxy.cache.cacheTimeOut
- Use expired data if the remote server is not responding (Optional): If selected, the portal will use expired data if the remote server stops responding. In that case you must also designate the number of seconds Authenticated WebProxy will wait before trying to contact the non-responding server again. Values: true, false (default).
edu.wisc.my.webproxy.webproxy.cache.useCExpired
- Retry Delay (Optional): If an HTTP timeout occurred and edu.wisc.my.webproxy.webproxy.cache.useCExpired is true, this value is the number of seconds to wait (i.e., to retain the item in cache) before a future request makes another attempt to access the resource. Default: 0 (do not retain cached data, so the next request retries).
edu.wisc.my.webproxy.webproxy.cache.retryDelay
- Persist Cache (Optional): Select this if you would like to keep the cache beyond the user's HTTP session. This makes sense only with user caching. Values: true, false. NOTE: It does not appear that EhPageCache.java actually uses this value and persists to a store.
edu.wisc.my.webproxy.webproxy.cache.persistCache
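One way to read the interaction between Cache Timeout, the use-expired option, and Retry Delay is sketched below in Java. This is an interpretation of the descriptions above, not the portlet's caching code: the class CachePolicySketch, the Action enum, and the decide method with its parameters are all hypothetical, and the sketch assumes the portlet tracks how long ago the last failed fetch happened.

```java
/** Hypothetical illustration only; not part of the WebProxy code base. */
final class CachePolicySketch {

    enum Action { SERVE_CACHED, FETCH_REMOTE }

    /**
     * @param ageSeconds              seconds since the cached entry was stored
     * @param cacheTimeOut            the Cache Timeout preference, in seconds
     * @param useExpired              the "use expired data" preference
     * @param secondsSinceLastFailure seconds since the last failed remote fetch, or -1 if none (assumed bookkeeping)
     * @param retryDelay              the Retry Delay preference, in seconds
     */
    static Action decide(long ageSeconds, long cacheTimeOut, boolean useExpired,
                         long secondsSinceLastFailure, long retryDelay) {
        // A fresh entry is always served from cache.
        if (ageSeconds < cacheTimeOut) {
            return Action.SERVE_CACHED;
        }
        // An expired entry is served only while a recent failure is still inside the retry delay.
        boolean insideRetryDelay =
                secondsSinceLastFailure >= 0 && secondsSinceLastFailure < retryDelay;
        if (useExpired && insideRetryDelay) {
            return Action.SERVE_CACHED;
        }
        // Otherwise try the remote server again; a retry delay of 0 always ends up here.
        return Action.FETCH_REMOTE;
    }

    public static void main(String[] args) {
        System.out.println(decide(10, 60, false, -1, 0));
        System.out.println(decide(120, 60, true, 5, 30));
        System.out.println(decide(120, 60, true, 45, 30));
    }
}
```

With the default Retry Delay of 0 the sketch never serves an expired entry on a later request, which is consistent with the documented default of retrying on the next request.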
HTTP Headers
- Header name/value pairs: A list of header names and their corresponding values that you would like to be included in all HTTP requests.
edu.wisc.my.webproxy.webproxy.httpheader.sHeaderName and edu.wisc.my.webproxy.webproxy.httpheader.sHeaderValue
Static HTML Configuration
- Static Header (Optional): Any HTML you would like prepended to the displayable parsed content.
edu.wisc.my.webproxy.webproxy.statichtml.sStaticHeader
- Static Footer (Optional): Any HTML you would like appended to the displayable parsed content.
edu.wisc.my.webproxy.webproxy.statichtml.sStaticFooter
Http Configuration
- Http Timeout (Required): The number of seconds Authenticated WebProxy waits before determining that the remote server is non-responsive.
edu.wisc.my.webproxy.webproxy.httpclient.httpTimeout
- Maximum Redirects (Optional): The maximum number of times the proxied site is able to redirect the end user. The default value is 5.
edu.wisc.my.webproxy.webproxy.httpclient.redirects
- Enable Authentication (Optional): Select this box if you would like to enable any type of authentication.
edu.wisc.my.webproxy.webproxy.httpclient.authEnable
- Type of Authentication (Optional): Select the type of authentication the web-based application requires. (BASIC, NTLM or FORM)
edu.wisc.my.webproxy.webproxy.httpclient.sAuthType
- Enable Session Persistence (Optional): Select this to have the end user's session persisted after the user logs out.
edu.wisc.my.webproxy.webproxy.httpclient.sessionPersistenceEnable
- Shared Session Key (Optional): A key for storing a session that is shared between a user's WebProxy portlets. If left blank, shared sessions are disabled. It is recommended that this value be unique so that only the intended proxied web applications share the session.
edu.wisc.my.webproxy.webproxy.httpclient.sessionKey
- Enter User Name (For Basic Auth ONLY): The username or LDAP value, with the option to prompt the end user for an individual username and persist this value beyond the end user's session. (e.g., photo_id, or 1234556789)
edu.wisc.my.webproxy.webproxy.httpclient.userName
- Enter Password (For Basic Auth ONLY): The password or LDAP value, with the option to prompt the end user for an individual password and persist this value beyond the end user's session. If using the LDAP value, you must wrap the value within ???? for the user-specific substitution to occur (e.g., ????). You must also add this user-attribute to your portlet.xml.
edu.wisc.my.webproxy.webproxy.httpclient.password
- Session Timeout (For Form Auth ONLY): The number of minutes until the user's credentials must be posted again.
edu.wisc.my.webproxy.webproxy.httpclient.sessionTimeout
- Authentication URL (For Form Auth ONLY): The URL the credentials will be posted to.
edu.wisc.my.webproxy.webproxy.httpclient.sAuthenticationUrl
- Additional Dynamic Authentication Parameters (For Form Auth ONLY): Dynamic parameters are parameters that must be posted for authentication and are not the same for every portal user. You can enter the parameter name, whether or not you would like to persist the value beyond the user's session, and whether the value is sensitive to the end user. (e.g., userName, password)
edu.wisc.my.webproxy.webproxy.httpclient.sDynamicParameterNames
edu.wisc.my.webproxy.webproxy.httpclient.sDynamicParameterValues
edu.wisc.my.webproxy.webproxy.httpclient.sDynamicParameterPersist
edu.wisc.my.webproxy.webproxy.httpclient.sDynamicParameterSensitive
- Additional Static Authentication Parameters: These parameters are the same for every user who has permission to use the Authenticated WebProxy portlet. If a parameter name does not have a corresponding value, leave the value blank.
edu.wisc.my.webproxy.webproxy.httpclient.sStaticParameterNames
edu.wisc.my.webproxy.webproxy.httpclient.sStaticParameterValues
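For form authentication, the static and dynamic parameters described above end up in a single POST to the Authentication URL. The Java sketch below shows one plausible way to combine them into a URL-encoded form body. It is an illustration under stated assumptions, not the portlet's code: the class FormAuthBodySketch and the method buildBody are hypothetical, static parameters are assumed to come first, and a parameter without a value is assumed to be sent as an empty string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of the WebProxy code base. */
final class FormAuthBodySketch {

    /** Builds an application/x-www-form-urlencoded body from static and per-user (dynamic) parameters. */
    static String buildBody(Map<String, String> staticParams, Map<String, String> dynamicParams) {
        // Assumption: static parameters first, then dynamic ones; a dynamic value wins on a name clash.
        Map<String, String> all = new LinkedHashMap<String, String>(staticParams);
        all.putAll(dynamicParams);

        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : all.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            String value = (entry.getValue() == null) ? "" : entry.getValue();
            body.append(encode(entry.getKey())).append('=').append(encode(value));
        }
        return body.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> staticParams = Collections.singletonMap("realm", "campus");
        Map<String, String> dynamicParams = Collections.singletonMap("userName", "j doe");
        System.out.println(buildBody(staticParams, dynamicParams));
    }
}
```

The parameter names realm and userName are made up for the example; in practice they would be whatever field names the proxied application's login form expects.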
Clipping Configuration
- Do Clipping: Select this checkbox if you would like to configure the Authenticated WebProxy portlet for HTML clipping. HTML clipping can be used to display only the content within certain Absolute Element Paths (e.g., /html/body/), Comments (e.g., <!--clipping-->), and Elements (e.g., <script>). Please keep in mind that once clipping is enabled the end user will see only the content that has been clipped; all other content will be dropped.
edu.wisc.my.webproxy.webproxy.clipping.sClippingDisable (see note below)
edu.wisc.my.webproxy.webproxy.clipping.sXPath
edu.wisc.my.webproxy.webproxy.clipping.sComment
edu.wisc.my.webproxy.webproxy.clipping.sElement
NOTE: The edu.wisc.my.webproxy.webproxy.clipping.sClippingDisable parameter is poorly named and implies the opposite of its behavior. Set it to true to PERFORM clipping.
HTML Parser Configuration
- Insert DocType (Optional): Default value of false. Specifies whether the HTML parser should override the public and system identifier values specified in the document type declaration.
edu.wisc.my.webproxy.webproxy.htmlparser.sInsertDocType
- Notify References (Optional): Default value of false. Specifies whether the XML built-in entity references (e.g., &amp;, &lt;, etc.) should be reported to the registered document handler. This applies only to the five pre-defined XML general entities, specifically "amp", "lt", "gt", "quot", and "apos". This is done for compatibility with the Xerces feature.
edu.wisc.my.webproxy.webproxy.htmlparser.????
- Balance Tags (Optional): Default value of false and only recommended for non-malformed HTML. Specifies whether the HTML parser should attempt to balance the tags in the parsed document. Balancing the tags fixes many common mistakes by adding missing parent elements, automatically closing elements with optional end tags, and correcting unbalanced inline element tags.
edu.wisc.my.webproxy.webproxy.htmlparser.sBalanceTags
- Strip JavaScript commenting (Optional): Default value of false. Specifies whether the scanner should strip HTML comment delimiters (i.e., "<!--" and "-->") from <script> element content.
edu.wisc.my.webproxy.webproxy.htmlparser.sScriptStripComment
- Strip Comments (Optional): Default value of false. Specifies whether the scanner should strip HTML comment delimiters (i.e., "<!--" and "-->") from <style> element content.
edu.wisc.my.webproxy.webproxy.htmlparser.sStripComments
- Report Errors (Optional): Default value of false; should be used only when debugging. Specifies whether errors should be reported to the registered error handler.
edu.wisc.my.webproxy.webproxy.htmlparser.sReportErrors
Testing
The applicationContext.xml file references the default implementations as of the initial check-in. For further testing, you can modify the existing implementations, or write your own and update applicationContext.xml accordingly.
Default Wiring (Oracle-specific)
<bean id="lobHandler" class="org.springframework.jdbc.support.lob.OracleLobHandler">
    <property name="nativeJdbcExtractor"><ref local="nativeJdbcExtractor"/></property>
</bean>
<bean id="nativeJdbcExtractor" class="org.springframework.jdbc.support.nativejdbc.CommonsDbcpNativeJdbcExtractor">
</bean>
Microsoft SQL Server Wiring:
Use the DefaultLobHandler for Microsoft SQL Server. As taken from the Javadocs for ClobStringType:
"...as DefaultLobCreator will work with most JDBC-compliant databases respectively drivers. In this case, the field type does not have to be CLOB: For databases like MySQL and MS SQL Server, any large enough text type will work."
Also, no NativeJdbcExtractor is necessary, as determined from http://forum.springframework.org/showthread.php?t=44047:
"You don't need a NativeJdbcExtractor for the DefaultLobHandler. It uses standard JDBC features so there is no need to access the native JDBC Connection."
<bean id="lobHandler" class="org.springframework.jdbc.support.lob.DefaultLobHandler"> </bean>
Running
What's there to say?