Setting up Squid Caching in a ZEO Cluster

Meta:

Valid for:  Silva 0.9
Author:     Jan-Wijbrand Kolman
Email:      jw@infrae.com
CVS:        $Revision: 1.4 $ $Date: 2002/12/23 20:52:08 $

Introduction

These notes were collected while setting up a Squid [1] / Apache / HTTPS / ZEO [7] cluster to run Silva in a cached, secured and clustered environment.

Although these notes try to assemble a concise set of instructions as accurately as possible, they do require above average knowledge and experience of Apache (including HTTPS) and Zope. Squid knowledge is highly recommended.

Feedback and/or corrections are welcome.

Problem description, Goals

Complex web applications may put a strain on system resources, decreasing performance. A possible solution to increase performance on web applications is to cache content, provided this content is static to a certain degree (e.g. in time spans of minutes, possibly hours, maybe even longer).

Squid can provide such a cache. It can act as a frontend server for underlying application servers. Each web request is handled by the cache first: it checks whether the requested object is in cache and not yet expired. If so, the object is served from cache. If not, the request is forwarded to the web application backend, which computes the object. Squid then serves this object and, if certain criteria are met, stores it for subsequent requests.

In a clustered environment, a Squid cache on one cluster node is able to communicate with the caches on sibling nodes. This helps spread the load of the web application over the different nodes even further: only one node needs to compute a requested object, while all other nodes may keep this object in cache.

We will set up Squid so that it will:

  • receive public web requests for objects,
  • serve these objects from cache, if available and not yet expired, or
  • query sibling (neighbor) caches for the object and serve it, or
  • forward the request to the web application backend (in this case Apache, which in turn is a frontend for the Zope instance), and
  • store the object, if possible, for future requests and queries.

We will also set up the web application backend so that it will:

  • Provide enough information for Squid to decide which objects may exist in cache at all (and what expiration conditions apply) using HTTP headers in the response to a web request.
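
The request flow described above can be sketched as a toy model in Python. This is only an illustration of the decision order (local cache, sibling caches, backend), not of how Squid is implemented. The class name "Cache", the "backend" callable and its "(body, max_age)" return value are assumptions made up for this sketch. An object fetched from a sibling is not stored locally, which is meant to mirror the "proxy-only" option used in the Squid configuration in this document.

```python
import time


class Cache:
    """Toy model of one cache node (hypothetical; not Squid's implementation)."""

    def __init__(self, backend, clock=time.time):
        self.backend = backend   # callable: url -> (body, max_age in seconds or None)
        self.siblings = []       # other Cache instances in the cluster
        self.clock = clock       # injectable clock, so expiry can be tested
        self.store = {}          # url -> (body, expires_at)

    def lookup(self, url):
        """Return a fresh local copy, or None. This is all a sibling gets to see."""
        entry = self.store.get(url)
        if entry is not None and entry[1] > self.clock():
            return entry[0]
        return None

    def fetch(self, url):
        """Serve a public request; returns (body, how_it_was_served)."""
        body = self.lookup(url)
        if body is not None:
            return body, "local hit"
        for sibling in self.siblings:
            body = sibling.lookup(url)
            if body is not None:
                # Not stored locally, comparable to "proxy-only".
                return body, "sibling hit"
        body, max_age = self.backend(url)
        if max_age:
            # The backend decides whether and how long the object may be cached.
            self.store[url] = (body, self.clock() + max_age)
        return body, "miss"
```

With two such nodes configured as each other's siblings, a first request on one node is a miss, a repeated request on the same node is a local hit, and a request for the same object on the other node is a sibling hit, so the backend computes the object only once until it expires.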

Requirements

  • ZEO Clients running as backend for Apache (e.g. FCGI, ProxyPass).
  • An Apache frontend for the ZEO Clients (needed to facilitate virtual hosting, SSL, FCGI etc.) running on port 8080
  • Squid installed.

Squid configuration

A minimal Squid configuration ([2] and [3]) follows. This configuration ignores most configuration options for tuning cache performance, file locations, RAM usage etc., which are not in the scope of this document:

## Squid port
http_port 80

## ACLs taken from standard Squid conf
acl all src 0.0.0.0/0.0.0.0
acl localhost src 127.0.0.1/255.255.255.255

## ACL for cache peers in network:
acl peers_src src node-1.domain.tld node-2.domain.tld \
                                             ... node-N.domain.tld
acl peers_dst dst node-1.domain.tld node-2.domain.tld \
                                             ... node-N.domain.tld

## ACL for public 
acl public_access dst virtualhost-1.domain.tld virtualhost-2.domain.tld \
                                           ... virtualhost-N.domain.tld
acl public_access_port port 80 8080

## Define cache siblings. Comment the lines which point to cache
## node "itself":
##cache_peer node-1.domain.tld sibling 80 3130 no-digest proxy-only
##cache_peer_access node-1.domain.tld allow public_access
##cache_peer_access node-1.domain.tld deny all

cache_peer node-2.domain.tld sibling 80 3130 no-digest proxy-only
cache_peer_access node-2.domain.tld allow public_access
cache_peer_access node-2.domain.tld deny all

...

cache_peer node-N.domain.tld sibling 80 3130 no-digest proxy-only
cache_peer_access node-N.domain.tld allow public_access
cache_peer_access node-N.domain.tld deny all

## Allow ICP communication between cache peers
icp_access allow peers_src

## FIXME: not sure about this option
prefer_direct off

## Host being accelerated
httpd_accel_host 127.0.0.1
httpd_accel_single_host on
httpd_accel_port 8080

## Proxy on, needed to make cache peers intercommunicate
## Without proper security measures, this could result in an "open
## proxy". 
httpd_accel_with_proxy on

## Keeps information for Apache; needed for virtual hosting       
httpd_accel_uses_host_header on

## Final access control
http_access allow peers_src peers_dst
http_access allow public_access public_access_port
http_access deny all
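
Because every node needs the same list of "cache_peer" lines with only its own entry commented out, the sibling section can be generated rather than edited by hand. The following Python sketch does that. The function name "peer_stanzas" is hypothetical, and the ports and options are simply copied from the configuration above.

```python
def peer_stanzas(nodes, this_node):
    """Return the cache_peer section for one node of the cluster.

    The stanza that points at this_node itself is commented out with "##",
    as in the hand-written configuration.
    """
    blocks = []
    for node in nodes:
        prefix = "##" if node == this_node else ""
        blocks.append("\n".join([
            "%scache_peer %s sibling 80 3130 no-digest proxy-only" % (prefix, node),
            "%scache_peer_access %s allow public_access" % (prefix, node),
            "%scache_peer_access %s deny all" % (prefix, node),
        ]))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    cluster = ["node-1.domain.tld", "node-2.domain.tld"]
    print(peer_stanzas(cluster, this_node="node-1.domain.tld"))
```

Running this once per node with a different "this_node" yields the per-node variants of the sibling section.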

HTTP Response headers

The HTTP protocol [8] defines several response headers to instruct intermediate caches what to do with requested objects.

The values for these HTTP headers need experimentation and are highly dependent on the nature of the web application, the content served, expected request patterns, cluster setup and estimated use of resources (CPU, memory, network capacity, redundancy, etc.).

HTTP header examples (as Zope Page Template TAL expressions):

  • Setting maximum age for this object::

    "request.RESPONSE.setHeader('Cache-Control','max-age=360')"

  • Setting an expiration date/time for this object::

    "request.RESPONSE.setHeader('Expires','Thu, 01 Dec 1994 16:00:00 GMT')"

Where "max-age" is in seconds and the "Expires" header expects an RFC 1123 formatted date string [9].

If both headers are present, "max-age" overrides "Expires", even if "Expires" is more restrictive [8].
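
As a rough illustration of these two headers, the Python sketch below formats an RFC 1123 date with the standard library and works out how long a response may be served from cache, letting "max-age" win over "Expires". The function names "rfc1123" and "freshness_lifetime" are made up for this example. The logic is deliberately simplified and ignores other Cache-Control directives, so it should not be read as a description of Squid's actual freshness calculation.

```python
import re
from email.utils import formatdate, parsedate_to_datetime


def rfc1123(timestamp):
    """Format a Unix timestamp as an RFC 1123 date string in GMT."""
    return formatdate(timestamp, usegmt=True)


def freshness_lifetime(headers, now):
    """Seconds a response may be served from cache, or None if unspecified.

    headers is a plain dict of response headers; now is a Unix timestamp.
    """
    cache_control = headers.get("Cache-Control", "")
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.I)
    if match:
        # max-age takes precedence over Expires.
        return int(match.group(1))
    expires = headers.get("Expires")
    if expires:
        remaining = parsedate_to_datetime(expires).timestamp() - now
        return max(0, int(remaining))
    return None
```

A value for the "Expires" header that lies one hour in the future could then be produced with "rfc1123(time.time() + 3600)".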

Caveats / ToDo

  • If either the Apache or the Zope instance is down, Squid will still return an HTTP (error) response to the requesting client. Can the load balancer distinguish "correct" responses from responses indicating serious errors on one of the nodes?
  • It should (could?) be possible to proactively invalidate objects in the Squid caches. This would improve performance, the validity of cached objects and responsiveness to changes in the content.
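
One possible direction for the invalidation item is Squid's PURGE request method. The Python sketch below sends a PURGE for one object to every cache node. It rests on several assumptions that the configuration in this document does not cover: Squid must be configured to accept PURGE from the issuing host, the object is assumed to be addressed by its virtual host name in the Host header, and the function names "purge" and "purge_everywhere" are invented for this example. It is untested against a real cluster.

```python
import http.client


def purge(node, virtual_host, path, port=80, timeout=5.0):
    """Send one PURGE request to a single cache node; return the HTTP status."""
    conn = http.client.HTTPConnection(node, port, timeout=timeout)
    try:
        conn.request("PURGE", path, headers={"Host": virtual_host})
        return conn.getresponse().status
    finally:
        conn.close()


def purge_everywhere(nodes, virtual_host, path, send=purge):
    """Purge one object on every node.

    Returns a dict mapping each node name to the HTTP status it answered
    with, or to an error string if the node could not be reached.
    """
    results = {}
    for node in nodes:
        try:
            results[node] = send(node, virtual_host, path)
        except (OSError, http.client.HTTPException) as exc:
            results[node] = "error: %s" % exc
    return results
```

A content change in Silva could then trigger something like "purge_everywhere(['node-1.domain.tld', 'node-2.domain.tld'], 'virtualhost-1.domain.tld', '/index.html')". Squid is expected to answer 200 when the object was removed and 404 when it was not in cache, but this should be verified for the Squid version in use.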

References

[1] Squid Caching Proxy

[2] Squid configuration

[3] Squid User Guide

[4] HTTP Caching and Zope

[5] Squid as an Accelerator for Zope

[6] VirtualHosting, HTTPS, Apache and Zope

[7] ZEO Clusters

[8] RFC 2616 HTTP/1.1

[9] RFC 1123 Requirements for Internet Hosts, Application and Support

Copyright © 2002-2004 Infrae. All rights reserved.
See also "LICENSE.txt" in the Silva package.

