Forums

http://blog.pythonanywhere.com/42/

The forums are now searchable...I read that, but I don't see how. Am I missing something?

Ah, yes I see I am missing something. You implemented the search functionality using a Google tracker. Okay, maybe that is a bit harsh, but I'll try and explain my perspective. I keep my browser locked down so sites that are monetized by spying on their users can't spy on me. So, items like the Facebook Like and Google +1 are all blocked in my configuration. The same issue arises with the Google AJAX Search API. If I unblock it, then Google knows where I spend my time. I guess this is my problem, but any other privacy-concerned PA users may have the same issue I did. They read the announcement that we got forum search capability, but then they don't see a UI for it.

I don't expect you to change your implementation to satisfy me, but wanted to let you know I didn't benefit from this new feature. We live in an exciting world of easy access to information, but that access comes at a price. Privacy is a price I think we give up too easily as a society. I for one try to do my part.

As a last note...for anyone thinking I'm a bit off the deep end here on the privacy issue. Before you judge, do yourself a favor. Install this add-on and take a look at how you are tracked...then decide for yourself if it's no big deal that Google, Facebook, Adobe, and too many more see what you are reading on the PA forums...or if in fact they are watching you across a much wider arena!!

I suspect that Google search was the easiest way to add good quality searching to the site at a time when I'm sure the development team have quite a number of high priority features and issues to be working on. There are solutions like Solr which make it easier to roll out your own search functionality, but it's still considerably more effort than just plugging in something like Google.

I appreciate that some people have privacy concerns, but the difficulty comes from the fact that companies like Google do provide some fairly compelling services - if the quality wasn't there it would be an easier decision. Also, it's clear that society as a whole has been willing to sacrifice a good deal of privacy for the sake of compelling services being available for free - I'm sure there are commercial alternatives to most of what Google offers which people could use should they wish. The difficulty for sites like PA, I suppose, is balancing the effort of implementing features with the number of users who'll find them beneficial. That's not trying to say your opinions are any less valid, just trying to look at it pragmatically. (^_^)

Personally I used to worry about these things, but these days I've decided that my general privacy is a reasonable price to pay for decent services. That's only a personal decision, of course, but I really don't do much that I worry about others finding out. That's not to say that I think everyone should feel that way, just that it's a personal choice that I've made consciously. Unfortunately for those to whom privacy is more important than it is to me, I suspect a lot of people have also made the same choice.

EDIT: On the subject of privacy (well, anonymity, really) I thought this article postulated an interesting trend towards less anonymity online.

@a2j: I don't know what tool you use to thwart the cookie collectors. But I think most privacy tools provide the functionality to exempt trusted sites. I use Ghostery on Firefox and have whitelisted pythonanywhere.com. Not that there was much to exempt: All Ghostery found on PA was a Google Analytics cookie and the Google Search widget.

Still, if you feel that whitelisting is a slippery slope, you can essentially replicate the toolbar by typing "search_term site:pythonanywhere.com/forums" in a Google text box.
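For anyone who would rather script that query than type it, here is a minimal Python sketch. The function name `forum_search_url` is made up for illustration, and it assumes Google's public `https://www.google.com/search?q=` URL form. It only builds the URL and does not fetch anything.

```python
from urllib.parse import urlencode


def forum_search_url(search_term, site="pythonanywhere.com/forums"):
    """Build a Google search URL restricted to one site via the site: operator.

    Hypothetical helper for illustration; not part of PythonAnywhere.
    """
    query = f"{search_term} site:{site}"
    # urlencode percent-escapes the colon and slash and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(forum_search_url("wsgi error"))
```

Paste the printed URL into the address bar and you get the same results as typing the query by hand, with no widget loaded on the forum page itself.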

Hi guys -- Cartroo has it just right, I'm afraid -- we looked at how much work it would be to write our own decent search function (with proper stemming so that code picked up coding and wsgi picked up WSGIfying and so on) or to build in something like Solr, and it felt like a biggish job. Whacking in Google's search widget, on the other hand, was half an hour's work one afternoon and given how important it was becoming to get search sorted on these forums, we went with that. Someone did privately suggest (via the feedback link) that we consider open-sourcing the forums so that others can help us out on this kind of thing -- that's a great idea, and we're definitely considering it. But of course there would be a little bit of work unpicking PA-specific stuff like the gold stars and the "PythonAnywhere staff" tags.
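To illustrate what stemming buys you, here is a toy Python sketch of a stemmed inverted index. It is not what the forums run: the suffix list, the sample posts, and the function names are all invented for this example, and a real search engine would use a proper stemmer (Porter's algorithm, for instance) rather than five hard-coded suffixes.

```python
import re
from collections import defaultdict

# Toy suffix list, checked in order. Assumption for illustration only;
# real stemmers have many more rules and exceptions.
SUFFIXES = ("fying", "ing", "ed", "s", "e")


def stem(word):
    """Lowercase the word and strip the first matching suffix,
    as long as at least three characters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(posts):
    """Map each stem to the set of post ids containing a word with that stem."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in re.findall(r"[A-Za-z]+", text):
            index[stem(word)].add(post_id)
    return index


def search(index, term):
    """Return the sorted ids of posts matching the stemmed search term."""
    return sorted(index.get(stem(term), set()))


# Hypothetical sample posts.
posts = {
    1: "I was coding a web app last night",
    2: "Tips for WSGIfying an existing script",
    3: "Where do I put my code?",
}
index = build_index(posts)

if __name__ == "__main__":
    print(search(index, "code"))
    print(search(index, "wsgi"))
```

Because "code" and "coding" both reduce to the same stem, a search for either finds both posts; the same goes for "wsgi" and "WSGIfying". Getting this right for all of English is exactly the fiddly part that makes an off-the-shelf engine attractive.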

Oh, and it appears there is a paid version of the Google widget -- that would get rid of the ads, and so we'll definitely use it, but perhaps it would also be less intrusive. Given that it's Google, though, I suspect it won't :-(

There are ads? (o_O)

/me pauses AdBlock and reloads.

Oh yes, so there are.

(^_^)

@giles: On the subject of doing search by steam, should it ever become a feature that's added later (either by the PA team, community, whoever), I suspect that something like Solr is the way to go. I always find it quite unsatisfactory having to run an external daemon like that, not to mention a Java one[1], but getting search right is one of those fiddly things that's probably best not re-invented unnecessarily.

[1] Sorry, my prejudices stem from being bitten by poor garbage collection in old JVMs in the past. Damn things just kept swallowing memory and then suddenly locked up for over a minute while they garbage collected endlessly. Might have been resolved in the last few years, haven't touched Java in quite a while.

@Cartroo -- yup, definitely the sticky-tape and drinking straws feel of using a Python interface to a Java daemon scanning text files generated from our database did worry us a bit when looking at some of the search options out there. Definitely need to get rid of the ads soonest though.

I guess it's potentially worth it to do search properly, presupposing the Google solution needs to be replaced. There's also PyLucene which gives a Python wrapper around the Java library on which Solr is based. Running a JVM attached to a Python process, though? Definitely drinking straws territory.

Presuming compiling code isn't a problem, Xapian also looks promising. It has Python bindings and C/C++ code tends to integrate with Python a lot more cleanly than Java (unless you're using Jython, I suppose).

At the expense of performance, there's also the pure Python Whoosh, but it looks quite a lot slower (unsurprisingly).

My feeling would be that one should plan for success and go for Xapian with Python bindings, but it's reassuring to know that there are a variety of options.

One minor point - I've noticed the Google Site Search box can take a couple of seconds to load, and all the page content shifts down a few lines once it's inserted. A few times this has caused me to click the link to the wrong post - fairly minor, but slightly irritating.

I know the accepted wisdom is to put as much JS towards the end of the page as possible, but in this case I wonder if it's worth moving it up to get it loading more quickly - not sure how much good that will do, however, if the problem is actually latency between me and Google as opposed to browser layout times.

Alternatively it looks like you might be able to impose a fixed min-height on one of the <div>s so that at least the page content shouldn't move around when the box loads. Not perfect but I think it should do the job (providing it doesn't mess up Twitter Bootstrap's styling).

Minor thing I know, but I figure if I've run into it then maybe other people are too.

Good point, I've been bitten by that once or twice. The quickest fix would probably be to put a min-height in; perhaps we could also put a "spinner" in there that gets removed once the Google stuff is loaded. I'll put it on our list.

Thanks. Not high priority by any means, but we wouldn't want the "todo" list to get too empty, would we? (^_^)