Use Static HTML Pages (a Solution to Search Engines)

Post by FB-ke » August 16th 2003, 1:50 pm

Before we get started, I want to remind you to please be sure to keep a backup of all files you modify!!!

Introduction
After many people asked how to make a forum more search-engine friendly, and after a lot of suggestions, I have yet to come to a perfect solution. Yes, phpBB 2.0.5 and higher are supposed to be search-engine friendly, but I still haven't seen that work in practice (although Google has crawled the forum I develop for regularly since the upgrades).

I have tried to make something consistent by removing all session IDs if the user is anonymous; it is unreleased because it's still in the testing phase. I still haven't got a clue whether that is a solution (although I have added it in here as well ;-)).

What I do know is that, no matter what, Google does not like query strings attached to the URL, not even a single one. Google likes static pages, pages with an HTML extension, and nothing more. So why not rewrite certain URLs to just that? It's fairly easy, but takes quite a bit of work.

Where do we start?
How about a little explanation of mod_rewrite? It's an Apache module that rewrites URLs to other URLs using rules and/or conditions. Using simple or regular expressions, you can write powerful rules and conditions that will allow you to make it easier for your users (or in our case: Google) to access your forums.

It's all pretty straightforward once you get into it, but you may be stunned in the beginning - mainly when you need to figure out why that bloody Apache 2 server throws up a 500 server error while Apache 1.3.x runs fine when running per-directory rewrites :roll:.

Per-directory rewrites?
Yes, there are two different kinds of rewrites: per-server and per-directory.

Per-server is the best and fastest solution. You put the rewrite code inside the Apache configuration, so Apache reads it once at startup. mod_rewrite then rewrites the requested URL on the server, before any page is sent back to the browser.

Per-directory is the safest solution for web server administrators. You use a .htaccess file inside the directory where you want to rewrite the URLs. This is more practical for them, but may increase page loading time on your site, since Apache has to read the file on every request. How much depends on the setup, and I have never heard anyone say it just plain kills the connection (500 server error).

I take it that you've fallen asleep by now. Sorry... :blush:

What do we want to achieve?
Basically, we want to make sure that at least our index page, our forums and our topics are spidered correctly by Google (and other bots). So, instead of using the .php extension with a question mark and a whole bunch of variables after it, we'll rewrite the URLs to .html. They'll be converted to a format similar to this:

Code: Select all
viewtopic.php?p=<post_id>#<post_id> = viewpost/<post_id>.html
viewtopic.php?t=<topic_id> = viewtopic/<topic_id>.html
viewforum.php?f=<forum_id> = viewforum/<forum_id>.html


We will be using fake directories to define what page we want to see and in that directory, we retrieve a file with the specific id (post id, topic id, forum id).
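
If you want to double-check the mapping before touching Apache, here is a minimal sketch in plain PHP. The function name static_to_dynamic() is made up for this example, and the ID pattern (a positive number without a leading zero) is my assumption of what counts as a valid id. It only mimics the idea; it is not how mod_rewrite works internally.

```php
<?php
// Hypothetical helper: mimics the three URL mappings in plain PHP so they
// can be checked without a web server.
function static_to_dynamic($path)
{
    // pattern for the fake .html path => start of the real phpBB URL
    $rules = array(
        '#viewpost/([1-9][0-9]*)\.html$#'  => 'viewtopic.php?p=',
        '#viewtopic/([1-9][0-9]*)\.html$#' => 'viewtopic.php?t=',
        '#viewforum/([1-9][0-9]*)\.html$#' => 'viewforum.php?f=',
    );

    foreach ($rules as $pattern => $target)
    {
        if (preg_match($pattern, $path, $m))
        {
            return $target . $m[1]; // $m[1] is the captured id
        }
    }

    return false; // nothing matched, the path stays as it is
}

echo static_to_dynamic('viewtopic/42.html'), "\n"; // viewtopic.php?t=42
echo static_to_dynamic('viewforum/7.html'), "\n";  // viewforum.php?f=7
var_dump(static_to_dynamic('viewtopic/042.html')); // bool(false)
```

The fake directory name picks the script and the variable, the file name carries the id.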

How to do the per-server rewrite?
As I said, this is the fastest way to rewrite URLs, but you need server access for this. If you have server access, carry on reading this part, otherwise go to the next part which handles the per-directory rewrite.

Of course, you must have mod_rewrite activated in Apache (or Apache 2). Make sure this is the case. Now we can continue. :-)
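
Not sure whether mod_rewrite is active? The sketch below is one way to ask PHP, assuming PHP runs as an Apache module (apache_get_modules() does not exist under CGI or on the command line, in which case the answer is simply "unknown"). The function name rewrite_module_status() is made up for this example.

```php
<?php
// Hypothetical helper: reports whether mod_rewrite appears to be loaded.
// apache_get_modules() is only available when PHP runs as an Apache module.
function rewrite_module_status()
{
    if (!function_exists('apache_get_modules'))
    {
        return 'unknown';
    }

    return in_array('mod_rewrite', apache_get_modules()) ? 'loaded' : 'not loaded';
}

echo 'mod_rewrite: ' . rewrite_module_status() . "\n";
```

Drop it in a temporary file in your web root, open it in the browser, and delete it again afterwards.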

What we'll do is create a directory block that specifies which directory we want to rewrite files in, and how. Try to find a good place in the configuration yourself. If you have server access, I am going to assume that you know how to configure Apache. ;-)

Code: Select all
#
#-----[ OPEN ]--------------------------------------------
#
/path/to/httpd.conf

#
#-----[ FIND ]--------------------------------------------
#
<a good place to add the new directory block -- usually after the other directory blocks is fine>

#
#-----[ AFTER, ADD ]--------------------------------------
#
<Directory "/path/to/phpBB">
	RewriteEngine on
	RewriteRule    viewpost/([1-9][0-9]*)\.html$     viewtopic.php?p=$1#$1 [NE,L]
	RewriteRule    viewtopic/([1-9][0-9]*)\.html$    viewtopic.php?t=$1    [L]
	RewriteRule    viewforum/([1-9][0-9]*)\.html$    viewforum.php?f=$1    [L]
</Directory>

#
#-----[ SAVE & CLOSE ALL FILES ]--------------------------
#


Save the configuration and restart Apache. Now open your phpBB and try a URL such as http://yourdomain.com/phpBB2/viewtopic/1.html. The URL should be rewritten fine, except for the images: their paths will probably be wrong because of the fake directory we added. We'll fix that later on.

How to do the per-directory rewrite?
Maybe this will increase loading time (only a little bit though), but at least you'll be listed on Google. ;-)

First, you have to make sure that Apache allows you to use rewrites. The server administrator must have set the options for your site to at least Options FollowSymLinks. On my Apache 2.0.47 on Mac OS 10.2.6 that alone was not enough: I had to set "AllowOverride All" as well :-(, because a .htaccess file may only contain rewrite directives if AllowOverride permits them (FileInfo or All). If you don't know anything about this, contact your server administrator and just say you want to rewrite URLs using mod_rewrite. S/he'll know what to do ;-).

Anyway, create a file named .htaccess with Notepad (Mac: BBEdit). If this file already exists on your server, just append the following to it; otherwise paste it into the new file.

Code: Select all
RewriteEngine on
RewriteBase    /phpBB2
RewriteRule    viewpost/([1-9][0-9]*)\.html$     viewtopic.php?p=$1#$1 [NE,L]
RewriteRule    viewtopic/([1-9][0-9]*)\.html$    viewtopic.php?t=$1    [L]
RewriteRule    viewforum/([1-9][0-9]*)\.html$    viewforum.php?f=$1    [L]


See the line that says RewriteBase /phpBB2? It says that the phpBB directory is in /phpBB2. Example: http://yourdomain.com/phpBB2/. If you have a different path to your phpBB, change this accordingly (if it's in the site root, just a / will do). Make sure to never add a trailing slash!

Upload this file to your phpBB directory on the server, and try to access your phpBB. If it throws up a 500 server error, it probably means you don't have permission to use mod_rewrite. Contact your server administrator and ask him/her to give you the correct permissions.

Now try a URL such as http://yourdomain.com/phpBB2/viewtopic/1.html. The URL should be rewritten fine, except for the images: their paths will probably be wrong because of the fake directory we added. We'll fix that now.

Why are the paths for the images incorrect?
Well, because we added a fake directory. The image paths are relative, so the browser looks for the images in /phpBB2/viewtopic/templates/subSilver/images/ and not in /phpBB2/templates/subSilver/images/. We'll fix that with a simple change to the page header and its template file(s).

Code: Select all
#
#-----[ OPEN ]--------------------------------------------
#
includes/page_header.php

#
#-----[ FIND ]--------------------------------------------
#
//
// The following assigns all _common_ variables that may be used at any point
// in a template.
//

#
#-----[ BEFORE, ADD ]-------------------------------------
#
//
// Set the base href
//
// The script path is normalised so that the base href always ends in a slash
$script_path = trim(trim($board_config['script_path']), '/');
$script_path = ( $script_path == '' ) ? '/' : '/' . $script_path . '/';
$server_name = trim($board_config['server_name']);
$server_protocol = ( $board_config['cookie_secure'] ) ? 'https://' : 'http://';
$server_port = ( $board_config['server_port'] <> 80 ) ? ':' . trim($board_config['server_port']) : '';
$base_href = $server_protocol . $server_name . $server_port . $script_path;

#
#-----[ FIND ]--------------------------------------------
#
	'PAGE_TITLE' => $page_title,

#
#-----[ AFTER, ADD ]--------------------------------------
#
	'BASE_HREF' => $base_href,

#
#-----[ OPEN ]--------------------------------------------
#
# Make sure to edit this file for every template installed
#
templates/subSilver/overall_header.tpl

#
#-----[ FIND ]--------------------------------------------
#
<title>{SITENAME} :: {PAGE_TITLE}</title>

#
#-----[ AFTER, ADD ]--------------------------------------
#
<base href="{BASE_HREF}" />

#
#-----[ SAVE & CLOSE ALL FILES ]--------------------------
#


Now try to access one of the HTML URLs again. The images should load fine.
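
To play with the base href logic outside phpBB, here is a stand-alone sketch. The function build_base_href() and the sample values are made up for this example; $config only mimics the $board_config keys used above. It also makes sure the result ends in a slash, because a browser resolves relative paths against everything up to the last slash of the base href.

```php
<?php
// Hypothetical stand-alone version of the base href logic.
function build_base_href($config)
{
    $protocol = $config['cookie_secure'] ? 'https://' : 'http://';
    $port     = ($config['server_port'] != 80) ? ':' . trim($config['server_port']) : '';

    // Normalise the path so it starts and ends with exactly one slash.
    $path = trim(trim($config['script_path']), '/');
    $path = ($path == '') ? '/' : '/' . $path . '/';

    return $protocol . trim($config['server_name']) . $port . $path;
}

// Made-up sample settings
$config = array(
    'cookie_secure' => 0,
    'server_name'   => 'yourdomain.com',
    'server_port'   => 80,
    'script_path'   => '/phpBB2',
);

echo build_base_href($config), "\n"; // http://yourdomain.com/phpBB2/
```

With that base, templates/subSilver/images/ is looked up under /phpBB2/ again, no matter which fake directory is in the address bar.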

How to have Google see the HTML pages?
So, we can access those HTML pages ourselves now, but Google won't see them, because nothing on the forum links to them yet. So how do we fix that?

The simplest solution I could come up with was to write a function that rewrites a few URLs so they show up as HTML pages when the user is anonymous, and as the regular pages when s/he's not.
On top of that, we will make sure that no session IDs are ever attached to any other URL if the user is anonymous! This doesn't only cover viewtopic.php and viewforum.php, but all other files on your forum as well.

Here comes the function:

Code: Select all
#
#-----[ OPEN ]--------------------------------------------
#
includes/functions.php

#
#-----[ FIND ]--------------------------------------------
#
?>

#
#-----[ BEFORE, ADD ]-------------------------------------
#
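//
// Build the URL for a post, topic or forum: guests (and thus bots) get the
// static .html form, logged-in users get the regular URL with session id.
// Note: $append_info (e.g. '&start=15') is only used for logged-in users.
//
function rewrite_url($mode, $id, $append_info = '')
{
	global $phpEx, $userdata;

	switch( $mode )
	{
		case 'post':
			// The #<post_id> anchor goes on the link itself; the browser handles it and never sends it to the server
			$url = ( $userdata['user_id'] == ANONYMOUS ) ? "viewpost/$id.html#$id" : append_sid("viewtopic.$phpEx?" . POST_POST_URL . "=$id$append_info") . "#$id";
			break;

		case 'topic':
			$url = ( $userdata['user_id'] == ANONYMOUS ) ? "viewtopic/$id.html" : append_sid("viewtopic.$phpEx?" . POST_TOPIC_URL . "=$id$append_info");
			break;

		case 'forum':
			$url = ( $userdata['user_id'] == ANONYMOUS ) ? "viewforum/$id.html" : append_sid("viewforum.$phpEx?" . POST_FORUM_URL . "=$id$append_info");
			break;

		default:
			// Unknown mode: return an empty URL
			$url = '';
			break;
	}

	return $url;
}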

#
#-----[ OPEN ]--------------------------------------------
#
includes/sessions.php

#
#-----[ FIND ]--------------------------------------------
#
function append_sid($url, $non_html_amp = false)
{
	global $SID;

#
#-----[ AFTER, ADD ]--------------------------------------
#
	global $userdata;

	if( $userdata['user_id'] == ANONYMOUS )
	{
		return $url;
	}

#
#-----[ SAVE & CLOSE ALL FILES ]--------------------------
#
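
To see the effect without a running board, here is a trimmed-down sketch of the same idea. Everything prefixed with demo_ or DEMO_ is made up for this example, the session id is fake, and only the topic mode is shown; it is not the code you should paste into phpBB.

```php
<?php
// Stand-ins for phpBB's constant and session string (values made up for this demo).
define('ANONYMOUS', -1);
define('DEMO_SID', 'sid=0123456789abcdef');

// Simplified stand-in for the patched append_sid(): guests get the URL untouched.
function demo_append_sid($url, $user_id)
{
    if ($user_id == ANONYMOUS)
    {
        return $url;
    }

    return $url . ((strpos($url, '?') !== false) ? '&amp;' : '?') . DEMO_SID;
}

// Simplified stand-in for rewrite_url(), topic mode only.
function demo_topic_url($topic_id, $user_id)
{
    return ($user_id == ANONYMOUS)
        ? "viewtopic/$topic_id.html"
        : demo_append_sid("viewtopic.php?t=$topic_id", $user_id);
}

echo demo_topic_url(42, ANONYMOUS), "\n"; // viewtopic/42.html
echo demo_topic_url(42, 2), "\n";         // viewtopic.php?t=42&amp;sid=0123456789abcdef
```

A guest (or a bot) only ever sees the clean .html link, while a logged-in member keeps the regular URL with the session id attached.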


How to make the HTML URLs show up?
The mystery will be solved now. We're almost there. :-) After this, I take it you're ready for the big trials, and you'd better order yourself a Googlebot visit. ;-)

What we'll do is replace the most frequent URLs with calls to the rewrite_url function. It's not that hard at all.

Code: Select all
#
#-----[ OPEN ]--------------------------------------------
#
index.php

#
#-----[ FIND ]--------------------------------------------
#
								$last_post .= '<a href="' . append_sid("viewtopic.$phpEx?"  . POST_POST_URL . '=' . $forum_data[$j]['forum_last_post_id']) . '#' . $forum_data[$j]['forum_last_post_id'] . '"><img src="' . $images['icon_latest_reply'] . '" border="0" alt="' . $lang['View_latest_post'] . '" title="' . $lang['View_latest_post'] . '" /></a>';

#
#-----[ REPLACE WITH ]------------------------------------
#
								$last_post .= '<a href="' . rewrite_url('post', $forum_data[$j]['forum_last_post_id']) . '"><img src="' . $images['icon_latest_reply'] . '" border="0" alt="' . $lang['View_latest_post'] . '" title="' . $lang['View_latest_post'] . '" /></a>';

#
#-----[ FIND ]--------------------------------------------
#
								'U_VIEWFORUM' => append_sid("viewforum.$phpEx?" . POST_FORUM_URL . "=$forum_id"))

#
#-----[ REPLACE WITH ]------------------------------------
#
								'U_VIEWFORUM' => rewrite_url('forum', $forum_id))

#
#-----[ OPEN ]--------------------------------------------
#
viewforum.php

#
#-----[ FIND ]--------------------------------------------
#
	'U_VIEW_FORUM' => append_sid("viewforum.$phpEx?" . POST_FORUM_URL ."=$forum_id"),

#
#-----[ REPLACE WITH ]------------------------------------
#
	'U_VIEW_FORUM' => rewrite_url('forum', $forum_id),

#
#-----[ FIND ]--------------------------------------------
#
							$newest_post_img = '<a href="' . append_sid("viewtopic.$phpEx?" . POST_TOPIC_URL . "=$topic_id&view=newest") . '"><img src="' . $images['icon_newest_reply'] . '" alt="' . $lang['View_newest_post'] . '" title="' . $lang['View_newest_post'] . '" border="0" /></a> ';

#
#-----[ REPLACE WITH ]------------------------------------
#
							$newest_post_img = '<a href="' . rewrite_url('topic', $topic_id, '&view=newest') . '"><img src="' . $images['icon_newest_reply'] . '" alt="' . $lang['View_newest_post'] . '" title="' . $lang['View_newest_post'] . '" border="0" /></a> ';

#
#-----[ FIND ]--------------------------------------------
#
						$newest_post_img = '<a href="' . append_sid("viewtopic.$phpEx?" . POST_TOPIC_URL . "=$topic_id&view=newest") . '"><img src="' . $images['icon_newest_reply'] . '" alt="' . $lang['View_newest_post'] . '" title="' . $lang['View_newest_post'] . '" border="0" /></a> ';

#
#-----[ REPLACE WITH ]------------------------------------
#
						$newest_post_img = '<a href="' . rewrite_url('topic', $topic_id, '&view=newest') . '"><img src="' . $images['icon_newest_reply'] . '" alt="' . $lang['View_newest_post'] . '" title="' . $lang['View_newest_post'] . '" border="0" /></a> ';

#
#-----[ FIND ]--------------------------------------------
#
				$goto_page .= '<a href="' . append_sid("viewtopic.$phpEx?" . POST_TOPIC_URL . "=" . $topic_id . "&start=$j") . '">' . $times . '</a>';

#
#-----[ REPLACE WITH ]------------------------------------
#
				$goto_page .= '<a href="' . rewrite_url('topic', $topic_id, "&start=$j") . '">' . $times . '</a>';

#
#-----[ FIND ]--------------------------------------------
#
		$view_topic_url = append_sid("viewtopic.$phpEx?" . POST_TOPIC_URL . "=$topic_id");

#
#-----[ REPLACE WITH ]------------------------------------
#
		$view_topic_url = rewrite_url('topic', $topic_id);

#
#-----[ FIND ]--------------------------------------------
#
		$last_post_url = '<a href="' . append_sid("viewtopic.$phpEx?"  . POST_POST_URL . '=' . $topic_rowset[$i]['topic_last_post_id']) . '#' . $topic_rowset[$i]['topic_last_post_id'] . '"><img src="' . $images['icon_latest_reply'] . '" alt="' . $lang['View_latest_post'] . '" title="' . $lang['View_latest_post'] . '" border="0" /></a>';

#
#-----[ REPLACE WITH ]------------------------------------
#
		$last_post_url = '<a href="' . rewrite_url('post', $topic_rowset[$i]['topic_last_post_id']) . '"><img src="' . $images['icon_latest_reply'] . '" alt="' . $lang['View_latest_post'] . '" title="' . $lang['View_latest_post'] . '" border="0" /></a>';

#
#-----[ OPEN ]--------------------------------------------
#
viewtopic.php

#
#-----[ FIND ]--------------------------------------------
#
$view_forum_url = append_sid("viewforum.$phpEx?" . POST_FORUM_URL . "=$forum_id");

#
#-----[ REPLACE WITH ]------------------------------------
#
$view_forum_url = rewrite_url('forum', $forum_id);

#
#-----[ FIND ]--------------------------------------------
#
# For phpBB 2.0.3 Only
#
	'U_VIEW_TOPIC' => append_sid("viewtopic.$phpEx?" . POST_TOPIC_URL . "=$topic_id&start=$start&postdays=$post_days&postorder=$post_order&highlight=" . $HTTP_GET_VARS['highlight']),

#
#-----[ REPLACE WITH ]------------------------------------
#
# For phpBB 2.0.3 Only
#
	'U_VIEW_TOPIC' => rewrite_url('topic', $topic_id, "&start=$start&postdays=$post_days&postorder=$post_order&highlight=" . $HTTP_GET_VARS['highlight']),

#
#-----[ FIND ]--------------------------------------------
#
# For phpBB 2.0.4 - 2.0.6 Only
#
	'U_VIEW_TOPIC' => append_sid("viewtopic.$phpEx?" . POST_TOPIC_URL . "=$topic_id&start=$start&postdays=$post_days&postorder=$post_order&highlight=$highlight"),

#
#-----[ REPLACE WITH ]------------------------------------
#
# For phpBB 2.0.4 - 2.0.6 Only
#
	'U_VIEW_TOPIC' => rewrite_url('topic', $topic_id, "&start=$start&postdays=$post_days&postorder=$post_order&highlight=$highlight"),

#
#-----[ FIND ]--------------------------------------------
#
	$mini_post_url = append_sid("viewtopic.$phpEx?" . POST_POST_URL . '=' . $postrow[$i]['post_id']) . '#' . $postrow[$i]['post_id'];

#
#-----[ REPLACE WITH ]------------------------------------
#
	$mini_post_url = rewrite_url('post', $postrow[$i]['post_id']);

#
#-----[ SAVE & CLOSE ALL FILES ]--------------------------
#


There. What we've done is pass the URLs through the function, specifying which mode we need. I have pre-written 'post', 'topic' and 'forum', so you can access these three using the HTML pages.

If you want other pages to use the .html extension, e.g. viewprofile, just add a new case to the rewrite_url function and a matching RewriteRule (although I restrict viewprofile to registered members only).

One note about the 'post' rewrite. I have read through the entire mod_rewrite reference and the Rewrite Guide on apache.org, but I have not yet figured out why the hash is not added unescaped. I have told the rule not to escape the hash (the NE flag), yet it does anyway :-(. My best guess: the part after the # is handled by the browser only, so an internal rewrite has no way to hand an anchor back to it; that would take an external redirect (the R flag together with NE) or a link that already ends in #<post_id>. If anyone knows better, please let me know.
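
To illustrate how a URL with a hash splits up, here is a small sketch using PHP's parse_url(). The link is a made-up example; the point is only that the fragment is a separate part of the URL, which the browser keeps to itself when it requests the page.

```php
<?php
// Made-up example link with an anchor at the end
$link  = 'http://yourdomain.com/phpBB2/viewpost/5.html#5';
$parts = parse_url($link);

// The path is what the browser requests from the server ...
echo 'requested from the server: ' . $parts['path'] . "\n"; // /phpBB2/viewpost/5.html

// ... the fragment stays on the client and is used to jump to the anchor.
echo 'kept by the browser: #' . $parts['fragment'] . "\n";  // #5
```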


All done? Yeah!
Now that's done! Be sure you have at least notified Google of the presence of your website, by adding it to the list here.

Disclaimer:
This code is tested on phpBB 2.0.3 - 2.0.6 and works fine offline. I haven't tested it online yet and thus have no idea whether you will be listed on Google more successfully (although chances are extremely high!). Thus, I give no guarantee that this will have you listed (higher) on Google or any other search engine. I can only state this is one of the best solutions anyone has come up with thus far.
Last edited by FB-ke on September 3rd 2003, 11:39 am, edited 1 time in total.

Post by FB-ke » August 20th 2003, 6:20 pm

As per Patrick's request, I will explain how to use one session ID for all Googlebots that are on your site at the same time. If a lot of bots end up on your forum at once, they may kill your forum because the sessions table gets full and the database dies (no data lost though, don't worry!).

This add-on is mainly based on a patch I found on another website, but I can't remember where. All I know is that the IP used in that patch is no longer used by the Google bots. So I searched Google to find one of the IPs they currently use, and found '4044524d' (this is the encoded form of course; decoded it is 64.68.82.77) to be the most used. I modified the patch accordingly and made it look a little more structured.
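
In case the encoded IP looks like magic: phpBB 2 stores each of the four numbers of an IP as two hex digits. The helpers below are a sketch of that conversion for checking values by hand; the names demo_encode_ip() and demo_decode_ip() are made up, and they use str_split(), so they need a newer PHP than phpBB 2 itself requires.

```php
<?php
// Hypothetical helper: dotted IP => 8 hex digits, two per octet.
function demo_encode_ip($dotted)
{
    $octets = explode('.', $dotted);

    return sprintf('%02x%02x%02x%02x', (int) $octets[0], (int) $octets[1], (int) $octets[2], (int) $octets[3]);
}

// Hypothetical helper: 8 hex digits => dotted IP.
function demo_decode_ip($hex)
{
    $octets = array_map('hexdec', str_split($hex, 2));

    return implode('.', $octets);
}

echo demo_decode_ip('4044524d'), "\n";    // 64.68.82.77
echo demo_encode_ip('64.68.82.77'), "\n"; // 4044524d
```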

Code: Select all
#
# OPEN
#
includes/sessions.php

#
# FIND
#
function session_begin($user_id, $user_ip, $page_id, $auto_create = 0, $enable_autologin = 0)
{
	global $db, $board_config;
	global $HTTP_COOKIE_VARS, $HTTP_GET_VARS, $SID;

#
# AFTER, ADD
#
	global $HTTP_SERVER_VARS;

#
# FIND
#
		$session_id = md5(uniqid($user_ip));

#
# REPLACE WITH
#
		$session_id = ( !stristr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'googlebot') ) ? md5(uniqid($user_ip)) : md5('4044524d');

#
# FIND
#
function session_pagestart($user_ip, $thispage_id)
{
	global $db, $lang, $board_config;
	global $HTTP_COOKIE_VARS, $HTTP_GET_VARS, $SID;

#
# AFTER, ADD
#
	global $HTTP_SERVER_VARS;

#
# FIND
#
# For phpBB 2.0.4 Only
#
# Around line 230
#
	else
	{
		$sessiondata = '';
		$session_id = ( isset($HTTP_GET_VARS['sid']) ) ? $HTTP_GET_VARS['sid'] : '';
		$sessionmethod = SESSION_METHOD_GET;
	}

#
# AFTER, ADD
#
# For phpBB 2.0.4 Only
#
	if( empty($session_id) && stristr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'googlebot') )
	{
		$sessiondata = '';
		$session_id = md5('4044524d');
		$sessionmethod = SESSION_METHOD_GET;
	}

#
# FIND
#
# For phpBB 2.0.5 - 2.0.6 Only
#
# Around line 230
#
	else
	{
		$sessiondata = array();
		$session_id = ( isset($HTTP_GET_VARS['sid']) ) ? $HTTP_GET_VARS['sid'] : '';
		$sessionmethod = SESSION_METHOD_GET;
	}

#
# AFTER, ADD
#
# For phpBB 2.0.5 - 2.0.6 Only
#
	if( empty($session_id) && stristr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'googlebot') )
	{
		$sessiondata = array();
		$session_id = md5('4044524d');
		$sessionmethod = SESSION_METHOD_GET;
	}

#
# FIND
#
			if ($ip_check_s == $ip_check_u)

#
# REPLACE WITH
#
			if ( ($ip_check_s == $ip_check_u) || ($session_id == md5('4044524d') && (stristr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'googlebot'))) )

#
# SAVE & CLOSE ALL FILES
#


This code is compatible with phpBB 2.0.4 to 2.0.6.

Mind you, this will not give you more visits from Google, nor will it put you on Google. It only makes sure your database doesn't die when Google visits your forum.
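
Stripped of all the phpBB plumbing, the idea boils down to the sketch below. The function name demo_session_id() is made up, and the user agent string and the second encoded IP are just example values.

```php
<?php
// Hypothetical helper: every request whose user agent contains "googlebot"
// shares one fixed session id; everyone else gets a fresh one.
function demo_session_id($user_agent, $user_ip)
{
    if (stristr($user_agent, 'googlebot') !== false)
    {
        return md5('4044524d');
    }

    return md5(uniqid($user_ip));
}

// Two crawler requests coming from different (encoded) IPs
$bot_a = demo_session_id('Googlebot/2.1 (+http://www.googlebot.com/bot.html)', '4044524d');
$bot_b = demo_session_id('Googlebot/2.1 (+http://www.googlebot.com/bot.html)', '40445250');

var_dump($bot_a === $bot_b); // bool(true)
```

Keep in mind that the check only looks at the user agent, which any visitor can set to whatever they like, so anyone claiming to be Googlebot ends up in that same shared guest session.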