Mypal/parser/html/java/htmlparser/ruby-gcj/README

66 lines
2.4 KiB
Plaintext

Disclaimer:
This code is experimental.
When some people say experimental, they mean "it may not do what it is
intended to do; in fact, it might even wipe out your hard drive". I mean
that too. But I mean something more than that.
In this case, experimental means that I don't even know what it is intended
to do. I just have a vague vision, and I am trying out various things in
the hopes that one of them will work out.
Vision:
My vague vision is that I would like to see HTML 5 be a success. For me to
consider it to be a success, it needs to be a standard, be interoperable,
and be ubiquitous.
I believe that the Validator.nu parser can be used to bootstrap that
process. It is written in Java. Has been compiled into JavaScript. Has
been translated into C++ based on the Mozilla libraries with the intent of
being included in Firefox. It very closely tracks to the standard.
For the moment, the effort is on extending that to another language (Ruby)
on a single environment (i.e., Linux). Once that is complete, intent is to
evaluate the results, decide what needs to be changed, and what needs to be
done to support other languages and environments.
The bar I'm setting for myself isn't just another SWIG generated low level
interface to a DOM, but rather a best of breed interface; which for Ruby
seems to be the one pioneered by Hpricot and adopted by Nokogiri. Success
will mean passing all of the tests from one of those two parsers as well as
all of the HTML5 tests.
Build instructions:
You'll need icu4j and chardet jars. If you checked out and ran dldeps you
are already all set:
svn co http://svn.versiondude.net/whattf/build/trunk/ build
python build/build.py checkout dldeps
Fedora 11:
yum install ruby-devel rubygem-rake java-1.5.0-gcj-devel gcc-c++
Ubuntu 9.04:
apt-get install ruby ruby1.8-dev rake gcj g++
Also at this time, you need to install a jdk (e.g. sun-java6-jdk), simply
because the javac that comes with gcj doesn't support -sourcepath, and
I haven't spent the time to find a replacement.
Finally, make sure that libjaxp1.3-java is *not* installed.
http://gcc.gnu.org/ml/java/2009-06/msg00055.html
If this is done, you should be all set.
cd htmlparser/ruby-gcj
rake test
If things are successful, the last lines of the output will list the
font attributes and values found in the test/google.html file.