pinkie pie

mendel


Rich Lafferty's Journal

(mendelicious mendelusions)


Google Blog Search
Ages ago Google stopped providing excerpts for LiveJournal searches, but they've now made up for it with Google Blog Search, which lets you search the blogosphere or a particular blog with full excerpts (and some intelligence in terms of figuring out individual authors and blogs by name). It seems to work pretty well, although it appears you need to use the "http://www.livejournal.com/users/USERNAME" URLs even for paid users.
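For anyone scripting this, here is a minimal sketch of building the URL form that seems to work. The helper names are made up for illustration, and the `blogurl:` operator in the query string is an assumption about how Blog Search restricts results to one blog, not something confirmed above.

```python
def lj_user_url(username: str) -> str:
    """Return the www.livejournal.com/users/ form of a journal URL.

    This is the form the post says Blog Search appears to need,
    even for paid users.
    """
    return f"http://www.livejournal.com/users/{username}"


def blog_search_query(username: str, terms: str) -> str:
    """Build a query string limited to one journal.

    ASSUMPTION: a 'blogurl:' operator restricts results to a single blog.
    """
    return f"{terms} blogurl:{lj_user_url(username)}"


if __name__ == "__main__":
    print(blog_search_query("USERNAME", "some search terms"))
```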

(Cue doomsayers who will now predict that Google will begin excluding blogs from its regular search.)

The great part about this, if you ask me, is this:


[Google search box with three options: "mendel's journal", "all of LiveJournal", "all blogs"]


Whee!

Brand spankin'. (Also, the "limit to my LJ" part wasn't working until right now. See my lj_nifty post in about five minutes.)

The funny thing is I wrote this just yesterday. It's not obsolete...yet. Google certainly has the processing power to parse all the articles for keyword searches on their servers.

It uses a lot of bandwidth, though. I ran it from 7:35 to 9:35 Pacific this morning, and it used almost 30MB of bandwidth. It'll go up as more North Americans wake up and start posting.

I can't tell from their about page if they're getting feeds or if they're spidering. This certainly would mesh with what brad mentioned in the post where he talked about the stream you're using there, though, about companies asking for pings for all entries.

Ah, no, I stand corrected -- they are using feeds, not spidering. It's too bad they can't just integrate their existing non-bloggy index for older posts.

I stole this and made it a "backdated" post for December 31, 2037 at 23:59 (the last date/time combo LJ currently accepts), so that it's always the first post for people looking at my journal but doesn't show up in friends view.

Thanks for this!

Oh hey, that's a good idea. I just stuck it on my userinfo. I suppose the ideal way would be to integrate it into the style somehow. I really should get around to picking up S2.

About those URLs...

Probably because that's all we advertise here:

http://updates.sixapart.com/

Re: About those URLs...

Ah, that makes sense. Was Google one of the ones you wrote the push feed thingy for?

A user preference to choose between publishing the domain-alias URL, paid user URL, or free user URL would be neat, though. I suppose all I'd need to do is ping weblogs.com with a domain-alias URL whenever I posted and it wouldn't know that it wasn't LJ, though. (If I used the domain-alias feature for something other than a livegerbil.com URL, of course.)
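A rough sketch of what that manual ping could look like, assuming weblogs.com accepts the usual XML-RPC `weblogUpdates.ping(name, url)` call. The journal name and URL below are placeholders, and the endpoint mentioned in the comment is an assumption; the code only builds the request body and sends nothing.

```python
import xmlrpc.client


def build_ping(blog_name: str, blog_url: str) -> str:
    """Serialize a weblogUpdates.ping XML-RPC request body.

    The ping carries only a name and a URL, so the receiver sees whatever
    URL is supplied (for example a domain alias), not where it is hosted.
    """
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")


# Placeholder values, not a real journal.
payload = build_ping("Example Journal", "http://journal.example.com/")

if __name__ == "__main__":
    print(payload)
    # Actually sending it would look roughly like this
    # (ASSUMPTION: the endpoint is http://rpc.weblogs.com/RPC2):
    #   server = xmlrpc.client.ServerProxy("http://rpc.weblogs.com/RPC2")
    #   server.weblogUpdates.ping("Example Journal", "http://journal.example.com/")
```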

sixapart xml updates and spiders

Does updates.sixapart.com also provide feeds if someone has checked the preference to block search spiders? I've seen some (unverified) complaints by people who found their LJs on the google blog search when they thought they'd blocked Google by denying robots. The mechanism for supplying the data is entirely different, of course, but I expect many people will have assumed that their preference carried forward to this newer setup.
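To illustrate why the two mechanisms differ, here is a small sketch assuming the "block spiders" preference is expressed as a robots.txt rule (the thread doesn't confirm how LJ implements it). A robots rule is something a crawler checks before fetching a page; a consumer of a pushed update stream never makes that check unless the publisher filters the stream itself. The rule text, agent name, and URL are examples only.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt a journal might serve when the owner blocks spiders.
# ASSUMPTION: the preference maps to a rule like this one.
BLOCK_ALL = """User-agent: *
Disallow: /
"""


def crawler_may_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return whether a well-behaved crawler would fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://www.livejournal.com/users/USERNAME/"
    # A spider honoring robots.txt skips the page...
    print(crawler_may_fetch(BLOCK_ALL, "Googlebot", url))
    # ...but nothing in this check applies to entries delivered through a feed.
```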

Now, what would be cool is being able to add google (which happens to belong to someone already, pfft!) to your LJ Friends list, which would in turn allow the Googlebot to spider your Friends-locked entries for searching. Google's search results obviously shouldn't show an excerpt for these particular entries in the overall results, nor allow folks to view a cached version of the page ... but it would at least provide a means of getting the entries into your own search results.

Ya, Yo, Yap, Tao, Yaw, Yak, Yam, GAO, Lao, Mao

...and 30 seconds later I discover that someone with a ukulele blog on MSN republished my Lysenko piece. Yao.
