This was a good week, at least from my point of view.
I’ve been writing a shell around a StackOverflow.com data dump that I can take into prison, so the guys can use it as a mini-Google when they run into technical problems instead of waiting until we come in on Tuesday nights.
To learn more, I decided to do as much as I could in ways I’d never tried before.
I’m using Git for SCM…and having all sorts of problems. Hopefully I’ll get those straightened out.
I’m writing all the code so far in Scala, which is my first real use of that language. (It was nice to see all eight cores of my desktop machine running at 98-100% utilization; I managed to create 3.5 million XML files in a single directory in about 1.5 hours.)
My plan so far has several steps.
First, split the posts.xml file from StackOverflow.com into one XML file per post. (That’s done; it’s where the three and a half million files came from.)
Next, I traverse all the little files and create another file containing thread associations: for example, post 1234 is a question, and posts 2345, 3456, and 4567 are answers to it.
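Here is a minimal sketch of the splitting step. It is written in Java rather than the Scala I'm actually using, but the javax.xml.stream classes it relies on can be called just as easily from Scala. It assumes the dump has the usual layout: a single <posts> element containing one <row Id="…" PostTypeId="…" …/> element per post. The class name PostSplitter and the Id-based file names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical helper: streams posts.xml and writes each <row> to its own file. */
class PostSplitter {
    static int split(InputStream postsXml, Path outDir) throws XMLStreamException, IOException {
        // StAX reads one event at a time, so the whole dump never has to fit in memory.
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(postsXml);
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        int count = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && reader.getLocalName().equals("row")) {
                // Assumed layout: <posts><row Id="..." PostTypeId="..." .../></posts>
                String id = reader.getAttributeValue(null, "Id");
                try (OutputStream out = Files.newOutputStream(outDir.resolve(id + ".xml"))) {
                    XMLStreamWriter w = outFactory.createXMLStreamWriter(out, "UTF-8");
                    w.writeStartDocument("UTF-8", "1.0");
                    w.writeEmptyElement("row");
                    // Copy every attribute of the original row; the writer escapes the values.
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        w.writeAttribute(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
                    }
                    w.writeEndDocument();
                    w.flush();
                    w.close(); // closes the writer only; try-with-resources closes the file
                }
                count++;
            }
        }
        reader.close();
        return count;
    }
}
```

Streaming the input is the design choice that matters here. A DOM parse of the full posts.xml would try to hold every post in memory at once.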
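The association step amounts to grouping answers under their questions. This Java sketch assumes the dump's convention that PostTypeId 1 marks a question, PostTypeId 2 marks an answer, and each answer's ParentId names its question. The Post class and the one-line-per-thread output format are hypothetical. The sketch also leaves out reading the attributes back out of the per-post files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: groups answers under their questions. */
class ThreadAssociations {
    /** The few attributes this step needs from one per-post file. */
    static final class Post {
        final long id;
        final int postTypeId;   // assumed: 1 = question, 2 = answer
        final Long parentId;    // set only on answers; null otherwise
        Post(long id, int postTypeId, Long parentId) {
            this.id = id;
            this.postTypeId = postTypeId;
            this.parentId = parentId;
        }
    }

    static Map<Long, List<Long>> group(List<Post> posts) {
        Map<Long, List<Long>> threads = new TreeMap<>();
        // Pass 1: every question starts a thread, even one with no answers yet.
        for (Post p : posts) {
            if (p.postTypeId == 1) threads.put(p.id, new ArrayList<>());
        }
        // Pass 2: attach answers; a directory listing may return answers before their question.
        for (Post p : posts) {
            if (p.postTypeId == 2 && p.parentId != null && threads.containsKey(p.parentId)) {
                threads.get(p.parentId).add(p.id);
            }
        }
        threads.values().forEach(answers -> answers.sort(null)); // ascending id order
        return threads;
    }

    /** One line per thread: question id, then answer ids (a hypothetical file format). */
    static String format(Map<Long, List<Long>> threads) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Long, List<Long>> e : threads.entrySet()) {
            sb.append(e.getKey());
            for (long answerId : e.getValue()) sb.append(' ').append(answerId);
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Take the example above: question 1234 with answers 2345, 3456, and 4567, fed in any order. It comes out as the line `1234 2345 3456 4567`.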
Then I traverse the thread-associations file and generate two sets of XML files, each set containing one file per discussion thread. One set will have an <?xml-stylesheet … ?> line that points the browser to an XSLT script for turning its contents into HTML; the other will be a full-text index file for Solr, carrying the URL of the matching display file as its payload.
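This is a sketch of what the two per-thread files might look like, built with the JDK's DOM and serializer classes. The stylesheet name thread.xsl, the <thread>/<post> element names, and the Solr field names id, url, and text are all assumptions; real Solr field names have to match whatever schema you configure. The <add><doc><field name="…"> shape is Solr's XML update format.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical helper: builds the two per-thread documents. */
class ThreadFiles {
    static Document newDocument() throws ParserConfigurationException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    /** Display file; the browser applies thread.xsl (an assumed name) to render it as HTML. */
    static Document display(long questionId, List<String> postBodies) throws ParserConfigurationException {
        Document doc = newDocument();
        // The processing instruction must come before the root element.
        doc.appendChild(doc.createProcessingInstruction(
                "xml-stylesheet", "type=\"text/xsl\" href=\"thread.xsl\""));
        Element thread = doc.createElement("thread");
        thread.setAttribute("id", Long.toString(questionId));
        for (String body : postBodies) {
            Element post = doc.createElement("post");
            post.setTextContent(body); // the serializer escapes < and & in this text
            thread.appendChild(post);
        }
        doc.appendChild(thread);
        return doc;
    }

    /** Index file in Solr's XML update format; field names must match your schema (assumed here). */
    static Document solr(long questionId, String displayUrl, List<String> postBodies)
            throws ParserConfigurationException {
        Document doc = newDocument();
        Element add = doc.createElement("add");
        Element solrDoc = doc.createElement("doc");
        solrDoc.appendChild(field(doc, "id", Long.toString(questionId)));
        solrDoc.appendChild(field(doc, "url", displayUrl)); // payload: link to the display file
        solrDoc.appendChild(field(doc, "text", String.join("\n", postBodies)));
        add.appendChild(solrDoc);
        doc.appendChild(add);
        return doc;
    }

    static Element field(Document doc, String name, String value) {
        Element f = doc.createElement("field");
        f.setAttribute("name", name);
        f.setTextContent(value);
        return f;
    }

    static String serialize(Document doc) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```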
Then I feed all the index files into Solr to make the other XML files searchable and linkable.
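One way to do the feeding is to POST each index file to Solr's XML update handler and then send a single <commit/> so the new documents become searchable. This Java 11+ sketch uses java.net.http.HttpClient. The update URL is an assumption: older Solr releases used http://host:8983/solr/update, and newer ones put a core or collection name in the path. The class name SolrFeeder is hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper: posts index files to a Solr XML update handler. */
class SolrFeeder {
    static HttpRequest updateRequest(URI updateUrl, HttpRequest.BodyPublisher body) {
        return HttpRequest.newBuilder(updateUrl)
                .header("Content-Type", "text/xml; charset=utf-8")
                .POST(body)
                .build();
    }

    static void feed(HttpClient client, URI updateUrl, List<Path> indexFiles)
            throws IOException, InterruptedException {
        for (Path file : indexFiles) {
            HttpResponse<String> r = client.send(
                    updateRequest(updateUrl, HttpRequest.BodyPublishers.ofFile(file)),
                    HttpResponse.BodyHandlers.ofString());
            if (r.statusCode() != 200) throw new IOException("Solr rejected " + file + ": " + r.body());
        }
        // A single commit at the end makes everything added above visible to searches.
        HttpResponse<String> commit = client.send(
                updateRequest(updateUrl, HttpRequest.BodyPublishers.ofString("<commit/>")),
                HttpResponse.BodyHandlers.ofString());
        if (commit.statusCode() != 200) throw new IOException("Commit failed: " + commit.body());
    }
}
```

Usage might look like `SolrFeeder.feed(HttpClient.newHttpClient(), URI.create("http://localhost:8983/solr/update"), files)`. Commit once rather than per file, since committing after each of millions of documents would be very slow.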
I’ll throw together a little CSS so that the XSLT-generated HTML will at least be readable; then I’ll take the whole shebang into the prison, where Mark Roberts will undoubtedly tweak the CSS to make everything look really cool. (I’m not really a CSS guy.)
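As a starting point for the XSLT-plus-CSS combination, here is a hypothetical thread.xsl. It turns a <thread> of <post> elements into an HTML page that links a stylesheet, assumed to be named thread.css. The Java wrapper applies the XSLT with javax.xml.transform, roughly as a browser would after reading the xml-stylesheet line, so the HTML can be checked without a browser. The element and file names are assumptions that match the sketch of the display file, not a finished design.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical thread.xsl plus a way to preview its output without a browser. */
class ThreadStylesheet {
    // Turns <thread><post>...</post></thread> into HTML that links thread.css (an assumed name).
    static final String THREAD_XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "  <xsl:output method=\"html\"/>"
      + "  <xsl:template match=\"/thread\">"
      + "    <html><head>"
      + "      <title>Thread <xsl:value-of select=\"@id\"/></title>"
      + "      <link rel=\"stylesheet\" type=\"text/css\" href=\"thread.css\"/>"
      + "    </head><body>"
      + "      <xsl:for-each select=\"post\">"
      + "        <div class=\"post\"><xsl:value-of select=\".\"/></div>"
      + "      </xsl:for-each>"
      + "    </body></html>"
      + "  </xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to one thread file's contents and returns the HTML. */
    static String render(String threadXml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(THREAD_XSL)));
        StringWriter html = new StringWriter();
        t.transform(new StreamSource(new StringReader(threadXml)), new StreamResult(html));
        return html.toString();
    }
}
```

Keeping all the styling in thread.css, not in the XSLT, means someone can restyle the pages without touching the transform.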
It turns out that my random choice, Scala, is excellent for this project, both because its Actors make it easy to do multiprocessing simply and correctly, and because it handles XML directly as a language feature rather than through libraries.
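Scala's Actors aren't reproduced here. As a swapped-in technique, this Java sketch gets the same one-task-per-item fan-out across every core with a fixed thread pool from java.util.concurrent. The class name ParallelMap is hypothetical. In the splitting step, `work` might be a function that writes one post file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical helper: applies work to every input using one thread per core. */
class ParallelMap {
    static <T, R> List<R> map(List<T> inputs, Function<T, R> work)
            throws InterruptedException, ExecutionException {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (T input : inputs) {
                futures.add(pool.submit(() -> work.apply(input))); // queued; runs on a free core
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) {
                results.add(f.get()); // waits in submission order; rethrows a task's failure
            }
            return results;
        } finally {
            pool.shutdown(); // lets the worker threads exit once the queue is drained
        }
    }
}
```

As with actors, the tasks share no mutable state, and that is what keeps the parallel version correct as well as fast.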
When I got into prison this evening, I was a little curious about how Lee Leonard had handled last week’s problem. He had a Hibernate-based servlet that couldn’t see changes to the MySQL database until it was killed and restarted. He also had a version using HSQLDB that worked fine, and he couldn’t figure out why the MySQL one didn’t.
I couldn’t either, but I recommended that he pull the java.sql.Connection out of the Hibernate session, run raw SQL over it, and examine the results with JDBC to see whether the same problem showed up. He did, and found that he couldn’t see the data with JDBC either. That’s about when I had to leave.
So it turned out that he had then tried using DriverManager, rather than Hibernate, to set up a JDBC connection and run raw SQL over it, and everything worked just fine. So he figured out how to create a Hibernate session factory from that JDBC connection, and Hibernate started working too. He wasn’t happy about abandoning the standard Hibernate configuration, though, so he wrote code to read it in and set the session factory’s properties accordingly.
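Lee's code isn't shown here. As a hypothetical illustration of reading the standard configuration back in, this Java sketch pulls every <property name="…"> element out of a hibernate.cfg.xml-style file into a java.util.Properties, which could then be applied when the session factory is built by hand. The class name CfgReader is made up, and the sketch makes no Hibernate calls. Because hibernate.cfg.xml files normally declare a DOCTYPE, the sketch turns off external DTD loading (a feature of the JDK's built-in parser) so that parsing never goes out to the network. That matters on a machine with no Internet access.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper: reads <property> entries from a Hibernate-style config file. */
class CfgReader {
    static Properties readProperties(InputStream cfgXml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Don't try to download the DOCTYPE's DTD; the parser is non-validating anyway.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        NodeList nodes = factory.newDocumentBuilder().parse(cfgXml).getElementsByTagName("property");
        Properties props = new Properties();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            // Property names are kept exactly as written in the file.
            props.setProperty(p.getAttribute("name"), p.getTextContent().trim());
        }
        return props;
    }
}
```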
That’s a hack, and he’s going through the Hibernate configuration to find out what the original problem was, but I’m very proud that he was able to do as much hacking as he did to localize the problem and work around it. I really hope he and Pillar Technology give each other a chance when he gets out.
Louis spent some time talking to me. Just as a new mother quickly forgets the agony of childbirth and is anxious to become pregnant again, Louis has quickly forgotten the agony and hassle and stress of managing last month’s tech conference, and is talking about putting on another one in July of next year.
He has a number of interesting ideas for it, but the one that hits me hardest is that he wants to see if we can get Uncle Bob Martin to keynote for us this time.
“Uncle Bob?” I said. “Here? Are you serious?”
“What can it hurt to ask?” he replied.
Sheesh. The guy doesn’t know the meaning of the word “intimidation.”
So I’ll see what I can do. Jeff Langr, this year’s keynote speaker, used to work for Uncle Bob. Maybe he can put in a good word for us.
I’m a little scared, though. Everything else Louis has set his hand to has blossomed in beautiful colors; if Uncle Bob keynotes at Marion Correctional, the JavaGuys in general and I in particular will probably get a whole lot more attention than I’m comfortable with.