Friday, September 28, 2007

Hibernate dynamic mapping and Dom4J enabled sessions

Hibernate, from version 3.0 onward, provides a very useful feature for people who develop application frameworks: it lets you work directly with XML documents and elements that represent your entities.
Imagine that you have an application or an SDK which helps users manipulate data from different RDBMSs. Hibernate provides rich configuration facilities which let you configure it dynamically, in terms of adding mapping data or other configuration artifacts that are usually stored in hibernate.cfg.xml or an equivalent properties file.

As we are planning to use Hibernate dynamic mapping and the Dom4J entity mode, I am going to blog about it during my evaluation.
OK, Hibernate provides three kinds of entity mode:

POJO
DOM4J
MAP
The default mode is POJO, as it is the most commonly used one. The entity mode tells the session how it should handle entities. We can configure an individual session to use any of these modes when we need it, or we can set the default in the Hibernate configuration file by adding a property like default_entity_mode with the value dom4j to hibernate.cfg.xml (a short sketch of both options follows the project structure image below).

For our sample we will open a session with the dom4j entity mode. You can find a complete sample for this blog entry here. Make sure that you read the readme file in the project folder before you try to execute it. For this sample I used NetBeans 6.0 M6 (which really rules) and Hibernate 3.2.1. I won't walk through the steps to create the project, the XML files and so on, just the actions and core code required on the Hibernate side. You can see the project structure in the following image.



As you can see, it is a basic Ant-based project.
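Before going through the files, here is a small sketch (it is not part of the downloadable sample) of the two ways to get a dom4j-mode session; the default_entity_mode property is Hibernate 3's standard setting, and the second form is the one the sample's main class uses later on:

// Option 1: make dom4j the default entity mode for the whole SessionFactory,
// equivalent to setting default_entity_mode to dom4j in hibernate.cfg.xml.
// (imports: org.hibernate.*, org.hibernate.cfg.Configuration)
Configuration cfg = new Configuration().configure();
cfg.setProperty("hibernate.default_entity_mode", "dom4j");
SessionFactory factory = cfg.buildSessionFactory();

// Option 2: keep POJO as the default and ask an open session for a dom4j
// child session; it shares the connection and transaction of the parent session.
Session dom4jSession = factory.openSession().getSession(EntityMode.DOM4J);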
Let me give you the content of each file and explain it as much as I can. First of all, let's see what we have in hibernate.cfg.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE hibernate-configuration PUBLIC
    "-//Hibernate/Hibernate Configuration DTD 3.0//EN"
    "http://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd">
<hibernate-configuration>
  <session-factory>
    <property name="hibernate.connection.driver_class">com.mysql.jdbc.Driver</property>
    <property name="hibernate.connection.url">jdbc:mysql://localhost/hiberDynamic</property>
    <property name="hibernate.connection.username">root</property>
    <property name="hibernate.connection.password">root</property>
    <property name="hibernate.c3p0.min_size">5</property>
    <property name="hibernate.c3p0.max_size">20</property>
    <property name="hibernate.c3p0.timeout">300</property>
    <property name="hibernate.c3p0.max_statements">50</property>
    <property name="hibernate.c3p0.idle_test_period">3000</property>
    <property name="hibernate.dialect">org.hibernate.dialect.MySQLDialect</property>
    <mapping resource="Student.hbm.xml"/>
  </session-factory>
</hibernate-configuration>

The configuration file is a simple, traditional Hibernate configuration file with connection pooling enabled and the dialect set to the MySQL one.
We have one mapping file, named Student.hbm.xml, so we include it in the configuration file. If you do not have MySQL around, use Derby, which is included in NetBeans ;-) .

Log4J configuration is another traditional one, as you see:

log4j.appender.stdout=org.apache.log4j.FileAppender
log4j.appender.stdout.File=messages_dynamic.log
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ABSOLUTE} %5p %c{1}:%L - %m%n
log4j.rootLogger=WARN, stdout

We used a file appender which sends formatted log entries to a file named messages_dynamic.log in the project root directory. The next file we are going to look at is Student.hbm.xml, our mapping file, where we define Student as a dynamic entity.
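The mapping file is short; a minimal Student.hbm.xml along the following lines is enough for this sample (the table and column names here are placeholders of my own choosing, and the identifier is left to Hibernate's native generator):

<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC
    "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
    "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping>
    <class entity-name="Student" table="STUDENT">
        <id name="id" type="long" column="STUDENT_ID">
            <generator class="native"/>
        </id>
        <property name="name" type="string"/>
        <property name="lastName" type="string"/>
    </class>
</hibernate-mapping>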

As you can see, there is just one change in the mapping file: we have an entity-name attribute instead of a class attribute. Note that you can have both class and entity-name attributes, so an entity can be dynamic or mapped to a concrete class.

The next step is looking at our HibernateUtil class, well known in the community for booting Hibernate and managing the SessionFactory instance.
Here is its code:

package persistence;
import org.hibernate.*;
import org.hibernate.cfg.*;
public class HibernateUtil {
private static SessionFactory sessionFactory;

static {
try { sessionFactory = new Configuration().configure().buildSessionFactory(); }
catch (Throwable ex) { throw new ExceptionInInitializerError(ex); }
}

public static SessionFactory getSessionFactory() { return sessionFactory; }
public static void shutdown() { getSessionFactory().close(); }
}

Nothing extra here. Let's look at the last part, in which we use the dom4j session to manipulate our data.


package dynamic;

import java.util.*;
import org.hibernate.EntityMode;
import org.hibernate.Query;
import org.hibernate.Session;
import org.hibernate.Transaction;
import org.dom4j.*;
import persistence.HibernateUtil;

public class DynamicMapping {
    public static void main(String[] args) {
        // open a session that works with dom4j elements instead of POJOs
        Session session = HibernateUtil.getSessionFactory().openSession().getSession(EntityMode.DOM4J);

        // clean the table before inserting the sample data
        Transaction tx = session.beginTransaction();
        Query deleteQuery = session.createQuery("delete from Student");
        deleteQuery.executeUpdate();
        tx.commit();

        tx = session.beginTransaction();
        // create some students and save them
        {
            Element anStudent = DocumentHelper.createElement("Student");
            Element nameElement = DocumentHelper.createElement("name");
            nameElement.setText("Alice");
            Element lastNameElement = DocumentHelper.createElement("lastName");
            lastNameElement.setText("Cooper");
            anStudent.add(nameElement);
            anStudent.add(lastNameElement);
            session.save(anStudent);
        }
        {
            Element anStudent = DocumentHelper.createElement("Student");
            Element nameElement = DocumentHelper.createElement("name");
            nameElement.setText("Lea");
            Element lastNameElement = DocumentHelper.createElement("lastName");
            lastNameElement.setText("Connor");
            anStudent.add(nameElement);
            anStudent.add(lastNameElement);
            session.save(anStudent);
        }
        tx.commit();

        // list all students
        Query q = session.createQuery("from Student");
        List students = q.list();
        org.dom4j.Element el = (org.dom4j.Element) students.get(0);
        System.out.println(el.getText());
        for (Iterator it = students.iterator(); it.hasNext();) {
            org.dom4j.Element student = (org.dom4j.Element) it.next();
            System.out.println("Printing a Student's details: ");
            for (Iterator i = student.elementIterator(); i.hasNext();) {
                Element element = (Element) i.next();
                System.out.println(element.getName() + ": " + element.getText());
            }
        }

        // retrieve a student, update it and save it
        q = session.createQuery("from Student where name = :studentName");
        q.setParameter("studentName", "Alice");
        Element alice = (Element) q.uniqueResult();
        alice.element("name").setText("No Alice any more");
        tx = session.beginTransaction();
        session.save(alice);
        tx.commit();

        session.close();
        HibernateUtil.shutdown();
    }
}

At the beginning we create a session with the dom4j entity mode, so it returns Dom4J elements as our entities. In the next two blocks I create two students, one named Alice Cooper and the other Lea Connor (what does that last name remind you of? ;-) ). We simply ask our session to save them, just as we would in the usual POJO mode; the session knows what to do with dom4j elements because it was opened as a DOM4J session.
In the next block we query our table and retrieve all entities into a list, but this list is not a list of Student POJOs; instead it is a list of DOM4J elements, so we need to do some XML processing when we want to extract our entity properties. You can learn more about DOM4J here.

In the next step we retrieve a single row, edit it, and save it back to the database. It is all simple DOM4J manipulation applied to the returned elements to change your data.

The build file I used contains two targets that we will use for this project. The first one is hbm2ddl, which creates the database structure, and the second one is the run target, which executes our main class. There is no need to include the build file here; you can download the sample and check it yourself. Make sure you look at the readme file before digging into running the application.

In the next few days I will try to run a simple benchmark of some basic CRUD operations to get a rough idea of how the DOM4J entity mode performs in our environment.

NetBeans 6 M7 released. There are some cool features available and cool modules in the Update Center.

NetBeans 6 milestone 7 has been released and you can download it from the NetBeans web site in at least two ways:
  • Traditional setup file for separate packs
  • Using the new NetBeans installer, which allows you to customize your package before you download it.
The traditional way to download NetBeans 6, which is a development build, is via http://www.netbeans.info/downloads/dev.php, and the new installer I am going to talk about is available at http://nbi.netbeans.org/ .
The Google Pack installer is somewhat similar to the new NetBeans installer. In one simple paragraph: it lets you pick what you need in one mega installer, then download your customized package to your local machine and install those components on your system.
The features of the new installer that I really like are as follows (you may well find some other features that I have not mentioned):

  • Download what you need by customizing the download package.
  • From the bundle that you have downloaded, you do not have to install everything at the same time.

When you choose to install GlassFish you have the option to select its path, ports, and default JDK, which are important options.

  • You are allowed to uninstall some of the components when you no longer need them by running the installer again.

  • The installer registers any component that you install with the other components (application server, mobility features, ...).
  • Tight integration with the Windows Add or Remove Programs panel.
  • And many other features that I have not noticed yet.
But there are some other cool changes in the NetBeans IDE; look at what we have as our icon set. It reminds me of some Firefox themes with fantasy icons. I hope we will see some modularity mechanism for icons, which would allow NetBeans fans to develop new icon sets or use an older version of the icon set.

Based on the NetBeans 6 Feature Plan we should have RoR and JavaScript support in NetBeans 6 M7, so these features should be either inside the downloaded IDE or in the Update Center. Let's look and see what we have in the NetBeans new project and new file wizards, and also in the update centers.
Yes, we have JavaScript and RoR support built in.




Let's see what else we can find in the Update Center that is not available for 5.5.* or in NetBeans 6 M7 out of the box.
RoR: As you can see, RoR support is available in the Update Center.
If you are one of those RoR fans you should be happy now.
Languages: OK, there are tens of new languages supported, based on the new common scripting language support.
Support for these languages comes at several levels, from syntax highlighting to code completion.
You can try the one that you need, and if you find it useful you can vote for the feature. The NetBeans development team values community input.
Maven support: There are some other good modules which I have selected to be installed. As you can see, Maven support is there too.

OK, what really amazed me about the new modules for the NetBeans IDE is the JasperReports visual designer. It is really cool to see a report designer for JasperReports integrated into NetBeans.

I know that it offers only basic integration so far, but this is the initial release of this really cool module. Here are some images of the Jasper report designer integrated into NetBeans.



The project home page is https://jarvis.dev.java.net/ ; credit goes to the developers of this much-needed module.

The NetBeans 6 release is as amazing as the NetBeans 5 release was.

Wikipedia: Our virtual Middle Ages

Wikipedia embodies a democratic medievalism that does not respect claims to personal expertise in the absence of verifiable sources.

By Steve Fuller

Wikipedia, the online encyclopedia, is the most impressive collective intellectual project ever attempted - and perhaps achieved. It demands both the attention and the contribution of anyone concerned with the future of knowledge. Because of the speed with which it has become a fixture in cyberspace, Wikipedia's true significance has gone largely unremarked. Since its sixth anniversary in 2007, Wikipedia has consistently ranked in the top ten most frequently viewed Web sites worldwide. Every day it is consulted by 7% of all 1.2 billion Internet users, and its rate of usage is growing faster than that of Internet usage as a whole.

Wikipedia is an encyclopedia to which anyone with a modicum of time, articulateness, and computer skills can contribute. Anyone can change any entry or add a new entry, and the results will immediately appear for all to see - and potentially contest. "Wiki" is a Hawaiian root that was officially added to English in 2007 to signify something done quickly - in this case, changes in the collective body of knowledge. Some 4.7 million "Wikipedians" have now contributed to 5.3 million entries, one-third of which are in English, with the rest in more than 250 other languages. Moreover, there is a relatively large group of hardcore contributors: roughly 75,000 Wikipedians have made at least five contributions in any given 30-day period.

The quality of articles is uneven, as might be expected of a self-organizing process, but it is not uniformly bad. True, topics favored by sex-starved male geeks have been elaborated in disturbingly exquisite detail, while less alluring matters often lie fallow. Nevertheless, according to University of Chicago Law professor Cass Sunstein, Wikipedia is now cited four times more often than the Encyclopedia Britannica in US judicial decisions. Moreover, Nature's 2005 evaluation of the two encyclopedias in terms of comparably developed scientific articles found that Wikipedia averaged four errors to the Britannica's three. That difference probably has been narrowed since then.

Wikipedia's boosters trumpet it as heralding the arrival of "Web 2.0." Whereas "Web 1.0" facilitated the storage and transmission of vast amounts of different kinds of information in cyberspace, "Web 2.0" supposedly renders the whole process interactive, removing the final frontier separating the transmitter and receiver of information. But we have been here before - in fact, for most of human history.

The sharp divide between producers and consumers of knowledge began only about 300 years ago, when book printers secured royal protection for their trade in the face of piracy in a rapidly expanding literary market. The legacy of their success, copyright law, continues to impede attempts to render cyberspace a free marketplace of ideas. Before, there were fewer readers and writers, but they were the same people, and had relatively direct access to each other's work.

Indeed, a much smaller, slower, and more fragmented version of the Wikipedia community came into existence with the rise of universities in twelfth- and thirteenth-century Europe. The large ornamental codices of the early Middle Ages gave way to portable "handbooks" designed for the lighter touch of a quill pen. However, the pages of these books continued to be made of animal hide, which could easily be written over.

This often made it difficult to attribute authorship, because a text might consist of a copied lecture in which the copyist's comments were inserted and then perhaps altered as the book passed to other hands. Wikipedia has remedied many of those technical problems. Any change to an entry automatically generates a historical trace, so entries can be read as what medieval scholars call a "palimpsest," a text that has been successively overwritten. Moreover, "talk pages" provide ample opportunity to discuss actual and possible changes.

While Wikipedians do not need to pass around copies of their text - everyone owns a virtual copy - Wikipedia's content policy remains deeply medieval in spirit.

That policy consists of three rules:

  1. No original research;
  2. A neutral point of view;
  3. Verifiability.

These rules are designed for people with reference material at their disposal but no authority to evaluate it. Such was the epistemic position of the Middle Ages, which presumed all humans to be mutually equal but subordinate to an inscrutable God. The most one could hope for, then, was a perfectly balanced dialectic. In the Middle Ages, this attitude spawned scholastic disputation.

In cyberspace, the same practice, often dismissed as "trolling," remains the backbone of Wikipedia's quality control. Wikipedia embodies a democratic medievalism that does not respect claims to personal expertise in the absence of verifiable sources. To fully realize this ideal, participation in Wikipedia might be made compulsory for advanced undergraduates and Master's degree candidates worldwide. The expected norms of conduct of these students correspond exactly to Wikipedia's content policy: one is not expected to do original research, but to know where the research material is and how to argue about it.

Compulsory student participation would not only improve Wikipedia's already impressive collective knowledge base, but also might help curb the elitist pretensions of researchers in the global knowledge system.

(Steve Fuller is Professor of Sociology at the University of Warwick, United Kingdom. He is the author of The Knowledge Book: Key Concepts in Philosophy, Science and Culture.)

Copyright: Project Syndicate/Institute for Human Sciences, 2007. Exclusive to The Sunday Times

Fighting the OutOfMemoryError

Searching for memory leaks in an application is always a difficult task. I experienced memory problems in a Java web application two months ago. After running for some days, an OutOfMemoryError was thrown because the application ran out of heap memory.

Monitoring the application with jconsole - which is an excellent tool for this task and is included in the Sun JDK - showed that there was an amount of heap memory that could not be garbage collected and that slowly but steadily increased over time. This meant that I had to search for the references that prevented the objects from being cleared.


The first thing I did was to create multiple histograms of the heap while working with the application, destroying the session and starting garbage collection from jconsole. I did this using jmap, a small tool included in the Sun JDK since version 5:

jmap -histo PID > histo.txt

Of course I had to fill in the PID of the VM running my application. Comparing the generated histograms, I found that the number of instances of some objects in my application kept growing. These objects should have been garbage collected because they were just entity beans that should not have been referenced anymore.

To go to the next level I used new features of Sun JDK 6. The jmap version of JDK 6 has more options than the version of JDK 5 and is much faster. Creating the histogram with version 5 took 58 seconds compared to about 1 (!) second with version 6. Additionally I used the :live option of jmap 6 to get only live objects in my histogram:

jmap -histo:live PID > histo.txt

Now that I had identified some suspicious objects, I had to find out what references them. There is another tool that comes with Sun JDK 6 for exactly that purpose: jhat. jhat is able to analyze binary memory dump files of a Java VM. Therefore I created a dump file of my application using the following command line:

jmap -dump:live,format=b,file=heap.bin PID

This creates a binary dump file of all live objects in the VM with the given PID and writes it to the specified file. To look into the dump I ran jhat like this:

jhat -J-mx768m heap.bin

I had to add the memory option because otherwise jhat did not have enough memory to process my 50MB dump file. After jhat started successfully, I could see the following on my shell:

Reading from heap.bin...
Dump file created Thu Jun 21 11:17:40 GMT+01:00 2007
Snapshot read, resolving...
Resolving 570010 objects...
Chasing references, expect 114 dots..................................................................................................................

Eliminating duplicate references..................................................................................................................

Snapshot resolved.
Started HTTP server on port 7000
Server is ready.

Pointing my browser to localhost:7000 brought up the following page:

This is a list of all the classes loaded in the VM. I then searched for the classes I had identified before in the histogram, and on the details page of a class I clicked on the link named “All instances including subclasses”:

After selecting one of the instances I was able to see all other objects referencing this object instance:



The classes in the screenshots are not the classes of my project - these are just examples - but in my project I found out that the problem was references held by ThreadLocals. In my web application some persistent objects and whole transactions/sessions were bound temporarily to a ThreadLocal, but the references were never set to null.

In every thread of the Tomcat server (25 by default) references were held to objects that could not be garbage collected. Under load the server created even more threads, which were not closed early enough to avoid the OutOfMemoryError. To fix my problem I checked my code to make sure that after each request all the ThreadLocal references were cleared - and that resolved it.
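To make the fix concrete, here is a minimal sketch of the kind of cleanup that worked for me. The names are made up for this example: SessionHolder stands in for whatever class binds the session or transaction to a ThreadLocal in your application, and the filter simply has to be mapped to /* in web.xml so it runs around every request.

import java.io.IOException;
import javax.servlet.*;

public class ThreadLocalCleanupFilter implements Filter {

    /** Hypothetical holder that binds a per-request object to the current thread. */
    public static class SessionHolder {
        private static final ThreadLocal<Object> CURRENT = new ThreadLocal<Object>();

        public static void set(Object session) { CURRENT.set(session); }
        public static Object get() { return CURRENT.get(); }

        /** Drop the reference so the GC can reclaim the attached object graph. */
        public static void clear() { CURRENT.remove(); }
    }

    public void init(FilterConfig config) { }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        try {
            chain.doFilter(request, response);
        } finally {
            // Runs even if the request failed: never leave the ThreadLocal populated
            // once the worker thread goes back into Tomcat's pool.
            SessionHolder.clear();
        }
    }

    public void destroy() { }
}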

5 Regular Expressions Every Web Programmer Should Know

I’m going to assume you have a basic understanding of regular expressions at this point. If you’re a regex n00b (or /n0{2}b/, as I like to call them), or if you need a quick refresher, check out my previous post on the absolute bare minimum that every programmer should know about regular expressions. You won’t be disappointed.

So, without further ado, here are the five regular expressions that I have found the most useful for day-to-day web programming tasks.

Matching a username
This one’s quite easy, but it’s really invaluable if you’re trying to build a user registration system for a website. We typically want to limit usernames to a restricted set of characters in order to make development easier, and to keep malicious users from spoofing someone else’s name (e.g. replacing a space with multiple spaces or a newline character, which are all displayed the same by a web browser).

Without regular expressions, this would be a tedious exercise that would involve splitting the string into its component characters and examining each one individually. With regular expressions, it’s a breeze. First, let’s define what we want to accept; we’ll keep it simple and limit the example to the following characters:

Alphanumeric characters (letters and numbers)
The underscore character (_)
We’ll also want to enforce a 3 character minimum and a 16 character maximum length. Here’s the regular expression that matches this fairly standard set of criteria:

/[a-zA-Z0-9_]{3,16}/
If you’re familiar with regular expressions you may have noticed something missing at this point - don’t worry, I’ll get to it.

If you’ve read my introduction to regular expressions you should already know how this regex works. First we’re defining a character class that will match any letters (a through z, and A through Z) and any numbers (0 through 9), as well as the _ (underscore) character. Next comes an interval quantifier that tells the regex engine to match only sequences of between 3 and 16 characters. Because the quantifier follows a character class rather than a single character, it attaches itself to the entire class and will match every sequence between 3 and 16 characters long so long as each character falls within our restricted character set.

So what’s missing? As it stands our regex will match anywhere within a string. It won’t just match 'mike_84', it will also match '%! mike_84&', which contains several characters we don’t want. What we need are anchors: the ^ (caret) and $ (dollar) characters will anchor our regex to the beginning and end of the string, ensuring that the whole username meets our requirements and not just a portion of it.

So our revised regex will look like this:

/^[a-zA-Z0-9_]{3,16}$/
Here’s a quick PHP code snippet that shows how we can use this regex in production (we could just as easily use perl, java, ruby, or even javascript to do this validation).

function validate_username( $username ) {
if(preg_match('/^[a-zA-Z0-9_]{3,16}$/', $username)) {
return true;
}
return false;
}
Matching an XHTML/XML tag
Matching an XML or XHTML tag can be extremely useful if you’re scraping a website for data, or trying to quickly extract information from an XML document. A simple regex to accomplish this sort of extraction follows this form (the word ‘tag’ should be replaced with whatever tag you are looking for):

{<tag[^>]*>(.*?)</tag>}
The question mark following the star turns the star into a lazy quantifier. By default, quantifiers are greedy, meaning they’ll consume as much of the input text as they can. Lazy quantifiers, by contrast, will match as little of the input text as they can. If we used a greedy quantifier in this case, our regex would not work as advertised on an input document like

<tag>item 1</tag><tag>item 2</tag>
Instead of matching a single tag, a greedy quantifier would match up to the final closing tag in the input text.

Here’s a simple PHP function to extract the contents of each matching XML or XHTML tag as an array:

function get_tag( $tag, $xml ) {
$tag = preg_quote($tag);
preg_match_all('{<'.$tag.'[^>]*>(.*?)</'.$tag.'>}',
$xml,
$matches,
PREG_PATTERN_ORDER);

return $matches[1];
}
Matching an XHTML/XML tag with a certain attribute value (e.g. class or id)
This regex is very similar to the last example, except we only want tags with a certain attribute value. This comes in handy when you want to extract a tag with a particular class or ID value, for example. The regex is just slightly more complicated than our previous example (again, replace tag, attribute, and value with whatever you’re looking for):

{<tag[^>]*attribute\s*=\s*(["'])value\1[^>]*>(.*?)</tag>}
We use a character class to allow either single or double quotes around our value. The portion of the regex following the value is called a backreference. It will be replaced with whatever is captured by the first set of parentheses in the expression (either a single quote or double quote). That way we can be sure that the opening and closing quotes match.

Here’s a PHP function that shows how you can extract information from an XHTML document with this regex. The function takes an attribute, value, input text, and an optional tag name as arguments. If no tag name is specified it will match any tag with the specified attribute and attribute value.

function get_tag( $attr, $value, $xml, $tag=null ) {
if( is_null($tag) )
$tag = '\w+';
else
$tag = preg_quote($tag);

$attr = preg_quote($attr);
$value = preg_quote($value);

$tag_regex = "/<(".$tag.")[^>]*$attr\s*=\s*".
"(['\"])$value\\2[^>]*>(.*?)<\/\\1>/"

preg_match_all($tag_regex,
$xml,
$matches,
PREG_PATTERN_ORDER);

return $matches[3];
}
Matching and parsing an email address
This one comes courtesy of Cal Henderson, the programmer behind Flickr and author of Building Scalable Web Sites (a great read). For more information check out Cal’s article on parsing email addresses.

This one’s such a behemoth that it’s easier to digest when broken into its component parts. Constructing a regex like this is a bit like describing a grammar in Backus-Naur form (BNF), which is convenient because many of the things we’re trying to match are already described using BNF in their specifications. This is the case for email addresses, which are described in RFC 822. Anyway, here’s a PHP function that will check the validity of an e-mail address:

function is_valid_email_address($email){
$qtext = '[^\x0d\x22\x5c\x80-\xff]';
$dtext = '[^\x0d\x5b-\x5d\x80-\xff]';
$atom = '[^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c'.
'\x3e\x40\x5b-\x5d\x7f-\xff]+';
$quoted_pair = '\x5c[\x00-\x7f]';
$domain_literal = "\x5b($dtext|$quoted_pair)*\x5d";
$quoted_string = "\x22($qtext|$quoted_pair)*\x22";
$domain_ref = $atom;
$sub_domain = "($domain_ref|$domain_literal)";
$word = "($atom|$quoted_string)";
$domain = "$sub_domain(\x2e$sub_domain)*";
$local_part = "$word(\x2e$word)*";
$addr_spec = "$local_part\x40$domain";

return preg_match("!^$addr_spec$!", $email) ? 1 : 0;
}
The ‘\x##’ sequences are hexadecimal character references. It’s just a fancy way of specifying a character using its underlying code point (the numerical representation of a particular symbol) - for example, \x40 is the @ sign and \x2e is the dot. Otherwise this is a fairly straightforward, albeit incredibly complex, regular expression. I’ll refrain from any further analysis since it’s been done elsewhere.

Tim Fletcher has ported Cal’s original PHP function to Ruby and Perl, if that’s what you’re into.

Matching a URL
Matching a URL is a lot like matching an e-mail address, except that you tend to do it in more controlled, and less critical situations where you can tolerate a few false positives. I use this regex frequently in projects when I need to automatically generate links when a URL is typed in a comment field, for example. Like the email regex, this one’s a doozy, but it’s pretty easy to understand.

I’ve taken advantage of the ‘x’ and ‘i’ pattern modifiers for this regex. Pattern modifiers are tacked onto the end of a regex and change the way the regex engine interprets the expression. The ‘x’ modifier tells the engine to ignore whitespace, except when escaped or used inside of a character class. It also tells the engine to interpret any text following a ‘#’ character outside of a character class as a comment (i.e. ignore it). The ‘i’ modifier makes the regex case insensitive. This can drastically simplify a complicated regex like this one when case doesn’t matter. This regex is derived from one developed by Jeffrey Friedl in his book Mastering Regular Expressions.

{
\b
# Match the leading part (proto://hostname, or just hostname)
(
# http://, or https:// leading part
(https?)://[-\w]+(\.\w[-\w]*)+
|
# or, try to find a hostname with more specific sub-expression
(?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \. )+ # sub domains
# Now ending .com, etc. For these, require lowercase
(?-i: com\b
| edu\b
| biz\b
| gov\b
| in(?:t|fo)\b # .int or .info
| mil\b
| net\b
| org\b
| [a-z][a-z]\.[a-z][a-z]\b # two-letter country code
)
)

# Allow an optional port number
( : \d+ )?

# The rest of the URL is optional, and begins with /
(
/
# The rest are heuristics for what seems to work well
[^.!,?;"\'<>()[]{}sx7F-\xFF]*
(
[.!,?]+ [^.!,?;"\'<>()\[\]\{\}\s\x7F-\xFF]+
)*
)?
}ix
The comments in this expression are fairly self explanatory, so I don’t think it needs a whole lot of explanation. There are a few things to watch out for though. First, this regex will match some things that are not valid URLs. The regex assumes that any two-letter combination is a valid top-level domain (TLD), which is not the case. It also misses TLDs that were recently added to the IANA list like .travel, .name, and .museum. You can fix this by downloading the latest IANA TLD list and adding any missing TLDs in the list of alternatives mid-way through the expression.

That being said, this regex works great 99.9% of the time. Here’s a quick PHP function that will parse a section of text, replacing any URLs it finds with links. I’m going to assume you’ve set the variable $url_regex to the above pattern so I don’t have to repeat it here.

function auto_link( $text ) {
$url_regex = ...

return preg_replace( $url_regex,
'<a href="$0">$0</a>',
$text );
}
So that’s it. If you think I left something off the list that deserves mention, or if you have any suggestions for improvements, post a comment and let me know.

Thousands of convicted sex offenders evicted from MySpace

The number of convicted sex criminals found on MySpace has more than quadrupled since software designed to ferret them out began running in May, according to prosecutors in two US states.

Attorneys general from Connecticut and North Carolina said in published reports on Tuesday that MySpace identified 29,000 convicted sex offenders that had profile pages on the popular social-networking website.

MySpace refused to discuss the number but said its Sentinel Safe software is working "24 hours a day" and that profile pages belonging to sex criminals are deleted as soon as they are discovered.

"We partnered with Sentinel Safe to build technology to remove registered sex offenders from our site," MySpace chief security officer Hemanshu Nigam said in a written response to an AFP inquiry.

"Through this innovative technology, we're pleased that we've successfully identified and deleted these registered sex offenders and hope that other social networking sites follow our lead."

News Corporation-owned MySpace is the only Internet firm to develop a database and software to rid online properties of convicted sex offenders.

MySpace deletes the profiles but saves information about them for law enforcement officials, according to Nigam, a former US prosecutor who handled sex crimes.

MySpace is lobbying for a federal law requiring convicted sex offenders to register their e-mail addresses to make it easier to screen them from membership websites used by young people.

US law already requires people convicted of sex crimes to register their addresses with local police after they are released from custody.

There are approximately 600,000 registered sex offenders in the United States.

Nearly 180 million profiles are posted on MySpace, allowing users to post blogs, music and videos.

North Carolina attorney general Roy Cooper is backing state legislation that would require websites to confirm that children have their parents' permission before creating online profiles.

One chip does three memory jobs

By Chris Mellor

Memory system designer Silicon Storage Technology, Inc. (SST) has combined NAND and NOR-like flash memory with RAM to produce a single chip with three memory applications.
SST says this memory consolidation will simplify the host interface, shorten design time, reduce overall system costs and improve quality and reliability for a variety of mobile and embedded intelligent devices.
The three different memory types will be used for storing programme code, and for data and system operations (RAM).
The first All-in-OneMemory product, called the SST88VP1107, comes configured with 512 kbyte instant-on boot NOR, 128 Mbyte execute-in-place code storage in a NOR-like flash, 120 Mbyte data storage in NAND flash and 12 Mbyte system RAM for mobile and embedded applications.

These three memory functions are accessed via a single PSRAM bus. The data storage is accessed sequentially, being laid out as a memory-mapped ATA disk; the others are accessed randomly.
The NOR function is carried out by pseudo-NOR composed of RAM cache and NAND flash memory. This saves money by eliminating the need for costly, high-density NOR flash memory.
Caching NAND flash content helps extend endurance and improve reliability by minimizing direct read/write access to the NAND flash.
Sleek design

SST says the All-in-OneMemory chip is well-suited for applications that need high-density memory with enhanced performance, superior quality and high reliability.
It is a small package, at 10mm x 13mm x 1.44mm, and so is suited to deployment in hand-held intelligent devices with limited internal space as well as larger ones.
By intelligently managing all memory components with a resident 32-bit microcontroller, All-in-OneMemory offers instant secure boot, memory demand paging, NAND flash management and industry-standard ATA data storage protocol on a single PSRAM bus.
This reduces overall system complexity and lowers cost, SST claims.
SST plans to introduce other products in the All-in-OneMemory family and says the default configuration of the first product can be altered for specific application requirements.

The SST88VP1107 is sampling now with production chips expected to be available in December.
Pricing starts at $17.00 in 10K unit quantities.

http://www.lakbimanews.lk/education/edu1.htm

Hacking extortionist resurfaces

2006 Trojan besets users again, demands $300 to unlock encrypted files

By Gregg Keizer

“Ransomware” last seen in 2006 has reappeared and is trying to extort $300 from users whose files the malware has encrypted, a Russian security researcher said today.
GpCode, a Trojan horse which last made a run at users last summer, has popped up again, said Aleks Gostev, senior virus analyst with Moscow-based Kaspersky Lab Inc., in a posting to the research centre’s blog.
Noting the long quiet time, Gostev added: “So you can imagine our feelings this weekend, when some of our non-Russian users told us their documents, photos, archive files etc. had turned into a bunch of junk data, and a file called ‘read_me.txt’ had appeared on their systems.”

Ransom Note

The text file contained the “ransom” note.
“Hello, your files are encrypted with RSA-4096 algorithm. You will need at least few years to decrypt these files without our software.
All your private information for last 3 months were collected and sent to us. To decrypt your files you need to buy our software. The price is $300.”
So-called ransomware typically follows the GpCode pattern: malware sneaks onto a PC, encrypts files, and then displays a message demanding money to unlock the data.
Gostev hinted that the blackmailer was likely Russian.
“The e-mail address is one that we’ve seen before in LdPinch and Banker [Trojan horse] variants, programmes which were clearly of Russian origin,” he said.
The blackmailer’s claim that the files were enciphered with RSA-4096 — the RSA algorithm locked with a 4,096-bit key — is bogus, said Gostev. Another oddity, he added, was that the Trojan has a limited shelf life: from July 10 to July 15.
“Why? We can only guess,” said Gostev.

Advice

Kaspersky is working on a decryption scheme to recover the files; that process has been the usual salvation — and solution — for users attacked by ransomware. “[But] we’d just like to remind you, if you’ve fallen victim to any type of ransomware, you should never pay up under any circumstances.
“Contact your anti-virus provider, and make sure you back up your data on a regular basis.”

http://www.lakbimanews.lk/education/edu2.htm


Gadgets threaten energy savings

The growing popularity of hi-tech devices, such as flat-screen TVs and digital radios, threatens to undermine efforts to save energy, a report says. UK consumers spend £12bn a year on electronics, much of which is less efficient than older technology, a study by the Energy Saving Trust found. By 2020, these gadgets will account for about 45% of electricity used in UK households, the organisation projected. It said flat-screen TVs and digital radios were among the worst offenders. Paula Owen, author of the report, titled The Ampere Strikes Back, said household appliances currently consumed about a third of an average home’s electricity.

But she warned this was likely to increase as a result of people buying more energy-intensive devices.
“Your old-fashioned, bulky cathode ray tube TV on average consumed about 100 watts of electricity when it was switched on,” Dr Owen explained. “What we are seeing now is a trend for much bigger flat-screened TVs. On average, we are seeing a three-fold increase in the energy needed to power these TVs.
“Pretty much in every other sector [such as fridges and washing machines], we find that as the technology moves on, the products get more and more efficient.
“Consumer electronics does not work like that.”

‘Radio ga-ga’

The equivalent of 14 power stations will be needed just to power consumer electronic devices by 2020, the report warned.
By that time televisions on standby will consume 1.4% of all domestic electricity, it predicted.
Digital radios were also singled out by the report as being energy intensive.
“Traditional analogue radios consume about two watts when they are switched on,” Dr Owen said. “We’ve looked at digital radios and the average consumption of these is eight watts.” She added that listening to the radio via digital TVs or set-top boxes had an average consumption of more than 100 watts. Recent research by the communications watchdog Ofcom said that more than 80% of UK homes now had digital TV. More people are buying digital TVs or set-top boxes because by the end of 2012 the analogue TV signal will no longer be available in the UK. But not all new technology was criticised by the report.
“Mobile phones and their chargers are one area where we have seen an improvement,” Dr Owen said. A few years ago, she said, the current being drawn by chargers that were plugged in but not actually attached to a phone was about three to five watts. “We have done some testing on the newest mobile phones and chargers you can buy today and reassuringly we could see that ‘no-load’ consumption had fallen below one watt.”
“The simple message to people is switch things off when you have finished using them,” urged Dr Owen.

Using a Robot to Teach Human Social Skills

By Emmet Cole

Children with autism are often described as robotic: They are emotionless. They engage in obsessive, repetitive behaviour and have trouble communicating and socializing.
Now, a humanoid robot designed to teach autistic children social skills has begun testing in British schools.
Known as KASPAR (Kinesics and Synchronisation in Personal Assistant Robotics), the $4.33 million bot smiles, simulates surprise and sadness, gesticulates and, the researchers hope, will encourage social interaction amongst autistic children.
Developed as part of the pan-European IROMEC (Interactive Robotic Social Mediators as Companions) project, KASPAR has two “eyes” fitted with video cameras and a mouth that can open and smile.
Children with autism have difficulty understanding and interpreting people’s facial expressions and body language, says Dr. Ben Robins, a senior research fellow at the University of Hertfordshire’s Adaptive Systems Research Group, who leads the multi-national team behind KASPAR.
“Human interaction can be very subtle, with even the smallest eyebrow raise, for example, having different meanings in different contexts,” Robins said. “It is thought that autistic children cut themselves off from interacting with other humans because, for them, this is too much information and it is too confusing for them to understand.”
With this in mind, the team designed KASPAR to express emotion consistently and with the minimum of complexity.
KASPAR’s face is made of silicon-rubber supported on an aluminium frame. Eight degrees of freedom in the head and neck and six in the arms and hands enable movement.
The researchers hope that the end result is a human-like robot that can act as a “social mediator” for autistic children, a steppingstone to improved social interaction with other children and adults.
“KASPAR provides autistic children with reliability and predictability. Since there are no surprises, they feel safe and secure,” Robins said, adding that the purpose is not to replace human interaction and contact but to enhance it.
Robins has already tested some imitation and turn-taking games with the children and his preliminary findings are positive.
“When I first started testing, the children treated me like a fly on the wall,” he said. “But each one of them, in their own time, started to open themselves up to me. One child in particular, after weeks on end of ignoring me, came and sat in my lap and then took my hand and brought me to the robot, to share the experience of KASPAR with me.”
Using robots to interact with children is nothing new, although there’s been a lot of new research lately into this kind of work. The Robota dolls, a series of mini humanoid bots developed as part of the AURORA project, have been in use as educational toys since 1997.
The Social Robotics Lab at Yale is collaborating with a robotics team from the university’s department of computer science to develop Nico, a humanoid robot designed to detect vulnerabilities for autism in the first year of life.
Relying on a robot to teach human social skills might seem counterintuitive, but autism presents a special case, said Dr. Cathy Pratt, director of the Indiana Resource Center for Autism at Indiana University.
“Autistic kids often interact better with inanimate objects than with other people, so a project like this makes sense and might lead to a safe way for these kids to learn social skills,” she said.
However, autistic children often don’t make the connection between what they have learned in a training situation and the outside world, said Dr. Gary Mesibov, a professor of psychiatry at the University of North Carolina and editor of the Journal of Autism and Developmental Disorders.
“I think this project will still be worthwhile, even if the children don’t fully generalize what they have learned to the real world,” Mesibov said. “But the key question facing the researchers is whether the autistic children will be able to apply what they have learned from KASPAR in different situations and contexts.”
Face recognition and emotion processing is a major area of deficit for autistic children and hampers their social development, said Dr. Jennifer Pinto-Martin, director of the Center for Autism and Developmental Disabilities Research and Epidemiology at the University of Pennsylvania.
Although autistic children often respond well to training, the process can be very labour intensive and the quality of the trainer is paramount, Pinto-Martin said. “People who work in this area need more creative ways to train around the deficits of autism. The quality and consistency of the trainer can be hard to control, but that’s not the case with a robot.
“There is interactive computer software and video out there for testing and interaction, but the idea of using a robot trainer like KASPAR is a creative and wonderful step beyond current technologies and techniques,” she said.

Google Analytics collaborates with Roche Diagnostics Corporation and their Accu-Chek branded Web sites.

Business

Accu-Chek is the world's leading diabetes care brand, with products ranging from blood glucose monitoring systems and lancing devices to information management solutions and insulin delivery systems.

Part of Roche Diagnostics (www.roche-diagnostics.com), the Diabetes Care division is an important pioneer in the area of diabetes management, with more than 30 years of experience in diabetes monitoring and 20 years in insulin pump therapy. The Accu-Chek team's dedication to diabetes is leading the way in the development of standards of care to reduce the serious consequences diabetes has on both health and the economy.

Approach

The accu-chek.com websites, available in more than 30 countries, offer information and resources to help people better understand and manage diabetes. By visiting www.accu-chek.com, which links to any of the Accu-Chek sites around the world, people can find information to help them better manage their diabetes. This includes such topics as nutrition, foot care, checklists, events, as well as an overview of the broad range of Accu-Chek products - from blood-glucose systems and lancet devices to insulin delivery systems and tailor-made data management solutions.

Given the scope of its global web presence, the Accu-Chek web team directly supports individual country sites and implements best practices. "We think globally, but act locally," said Jim Lefevere, Manager of Marketing. "We were looking for a way to measure traffic or quantify what we were doing on the web. That way we can drive customer focus more effectively."

"The immediate ability to measure and track our entire web portfolio is a great asset - locally as well as for the whole organization. The level of granular reporting is invaluable. This information will ultimately lead to a higher benefit for our customers."

In order to measure online activities, many country marketing managers would summarize the log files in report form, which only provided the volume of visitors per month, or by time of day.

Google Analytics offered the web team a solution to support existing marketing programs internationally, with little or no training. Country marketing managers in the 20 countries where Google Analytics is deployed now have immediate access to on-demand web analytics available anytime from anywhere in the world in multiple languages.

Results

With Google Analytics, the Accu-Chek web team immediately identified increased efficiency at the local country level. In addition, they gained insight into global results.

"We want to provide the tools, advice and best practices so that local country marketing managers can achieve the metrics that they need, and have more time to do other parts of their job," said Gina Mencias, Marketing Manager, Internet content. "Google Analytics was very easy to deploy and manage, and so intuitive, most country managers could use it with no training."

Initially, the web team set a goal to measure a conversion number for registered users. This is the first step in a larger marketing campaign to attract, acquire and ultimately retain customers as loyal advocates.

"Google Analytics was very easy to deploy and manage, and so intuitive, most country managers could use it with no training."
Gina Mencias
Marketing Manager, Internet Content

"The immediate ability to measure and track our entire web portfolio is a great asset - locally as well as for the whole organization," said Lefevere. "The level of granular reporting is invaluable to quantifying success, benchmarking results, and making improvements. This information will help with budget planning, as well as with design of future marketing programs, and will ultimately lead to a higher benefit for our customers."

Some countries are already seeing results. Norway, for instance, was able to test and measure the effect of local advertising on Web site traffic. With Google Analytics, the marketing team was able to instantly report triple the number of visitors to the website during the period the ad was running in print.

"Metrics help us better understand what provides the most value to people on the web," Lefevere said. "This allows us to focus time and effort on areas of the site that will give customers more reasons to come back again."