Saturday, August 22, 2009

The Schedule......

I was just reading a short post by Paul Graham on schedules: "Maker's Schedule, Manager's Schedule". A single meeting can sometimes blow up your whole day, and I have faced this problem many times before. Now, to some extent, I can work without meetings disturbing my schedule. :)

Yeah, especially when you are programming, you never know what new challenge you will encounter. Such unexpected challenges surely keep our eyes glued to the LCD and our hands on the ergonomic keyboard. In my experience, I have many a time forgotten a few social appointments, leading to short-lived hard feelings. A workaround is to schedule such work at night. :D

I am also used to the "grab coffee" kind of meetings with people, for two reasons. First, they give me some physical exercise, refresh my brain and give me time to breathe some fresh air. Second (this may sound a bit silly), I like eating and drinking a lot.

But, do I really need a schedule when I enjoy the work I am involved in?
Hmmm..... my heart would then respond:
yes...... it's better that I spend some time with family and friends too. :P

Tuesday, August 18, 2009

A9.com

A9.com provides product search services for Amazon and many other e-commerce websites. A9.com is a subsidiary of Amazon.

Offices: Palo Alto, California and Bangalore, India
Areas of work: Search (product search specifically)


The interface is pleasing enough and A9 is a good platform for searching for products. In particular, the product description section has enough detail about the product's features.

The interface also has a 'suggest' facility that hints at a few possible phrase combinations.
From the job profile provided by A9 (please report if the link is broken), I guess the company is involved in information extraction and data mining. It seems they are also trying to use log files as feedback for improvement.
A9 is another company that works in computational linguistics, natural language processing and information retrieval.

TextWise

TextWise builds semantic technology solutions. Similarity search, concept tagging and content categorization are a few of the solutions the company provides.
According to the About Us page of the company's website, they work in fields like extraction, search, categorization and classification, using both NLP and statistics to build solutions.

The company was formerly known as MNIS and MNIS-TextWise Labs.

Headquarters: Rochester, New York.
Funding type: VC funded
Founded: 1994

Friday, August 14, 2009

What is this hacker language? - 2

The hacker language is also called Leet. In one sense, Leet is an adjective used to refer to substitution ciphers.
This way of writing is also called leet speak.
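
To make the substitution-cipher idea concrete, here is a minimal sketch in Python. The mapping in LEET_MAP is only an assumption (one common set of letter-to-digit swaps); leet has no fixed standard, and the function names are made up for illustration.

```python
# Hypothetical mapping: one common set of leet substitutions, not a standard.
LEET_MAP = {"a": "4", "e": "3", "l": "1", "o": "0", "t": "7"}


def to_leet(text: str) -> str:
    """Lowercase the text and replace each mapped letter with its digit."""
    table = str.maketrans(LEET_MAP)
    return text.lower().translate(table)


def from_leet(text: str) -> str:
    """Invert the mapping. Only safe when the input has no genuine digits."""
    reverse = str.maketrans({digit: letter for letter, digit in LEET_MAP.items()})
    return text.translate(reverse)


print(to_leet("leet hacker"))  # 1337 h4ck3r
```

Decoding is ambiguous in general: a real "1" and a substituted "l" look the same, which is part of why leet is harder for a program to read than for a person.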

The grammar section in this article is something that caught my eye.
"Leet enjoys a looser grammar than standard English."

So will NLP systems in the future focus on leet grammar? No doubt that is too ambitious. Hopefully, once people come up with systems that can understand human language, this can be achieved.
How can problems like Named Entity Recognition and multi-word expression finding be mapped to such languages? :P

I guess there will be an AI movie about this kind of system that can crack the hackers' language.
Cool....... I think my thinking is going too wild :P

Wednesday, August 12, 2009

What is this hacker language?

While doing some experiments with the Google logo, I landed on the URL
http://www.google.com/intl/en_ALL/images/logo.gif

Then I tried checking whether directory listing was enabled (I was quite sure that it was not, but just for a bit of self-satisfaction) with this URL: http://www.google.com/intl/
but this resulted in a page not found message. The interesting part was that it suggested a URL to me:

www.google.com/intl/ja/

Now, I tried using the search syntax,
intl site:www.google.com/intl/

And my curiosity grew after seeing the word hacker in the URL http://www.google.com/intl/xx-hacker/

This is something interesting.
I learnt that there is a language called hacker language.
1. Why would Google host such a page?
2. Does anyone use that site?
3. Is there something special about this Hacker Language?
4. Or is this some sort of SEO technique (instead of just showing a page not found error, also suggesting another URL) to retain users?

Do we need another search engine from Google?

I am shocked to hear that Google is working on another search engine, Caffeine.

Developers are inviting users to use the new service hosted at
http://www2.sandbox.google.com/

And I guess most people will be eager to provide feedback. That is what keeps Google on top.
User feedback is a priceless treasure and Google is able to get it on a large scale.

Sometimes, I feel that we ourselves are making Google intelligent.

Anyways, at the end of the day, a user wants fast and relevant results.

Will people get addicted to the new Google Caffeine?
Well, I am already addicted to the natural Caffeine. ;)

Wednesday, August 5, 2009

Richard Feynman teaches Physics to students

Great news for physicists. Bill Gates, the co-founder of Microsoft, is trying to make the physics lectures given by Richard Feynman freely available.

He dreamed of this 20 years ago when he saw the series of lectures. All these years, he has been trying to get the copyrights to the lectures. Another instance showing that thoughts don't die easily. Thoughts live on over the years, and with constant effort they do come to life. :)
More at this link:
http://news.cnet.com/8301-13860_3-10286732-56.html

It has been a year since he left Microsoft, and even now he spends 20 percent of his time with Microsoft. He is working with Intellectual Ventures on spinning off a company called TerraPower, which develops nuclear reactors that run on depleted uranium.

I am eagerly waiting to listen to some of those lectures.

Tuesday, August 4, 2009

Page Cloaking / Code Swapping

Earlier, I had heard of a few ways of spamming: keyword stuffing, hidden text and small text. In fact, I was using them to get my page into the top results for some queries. Yippie..... I was able to drive traffic to some extent, but that didn't last long ( :-( ) as I had links from few sites with a higher page rank.
After figuring out that methods such as keyword stuffing and hidden text won't work, I removed them. As far as I understand, the only way to be in the top results is to be popular (indicating a high page rank), have a lot of backlinks, also have outlinks (not to spam) and have relevant content.

Of late, I have also heard of another spamming technique: page cloaking, or code swapping. It is a spamming technique where one page is submitted to the search engine and another page is shown to the end user. As a result, the content retrieved as a search result differs from the content seen by the user.
A few search engines combat this kind of spam by regularly revisiting the pages they have indexed. Spammers may have a tough time here as they don't know when the crawler/spider will show up again.
But if the visitor is identified as a search engine, a separate script can run, serve different content to the spider and fool it.
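
To see why revisiting alone may not catch this, here is a rough sketch in Python of the decision such a script makes. Everything in it is an assumption for illustration: the User-Agent tokens, the function names and the page file names are hypothetical, not taken from any real site.

```python
# Assumed substrings that show up in crawler User-Agent strings.
CRAWLER_TOKENS = ("googlebot", "bingbot", "slurp")


def looks_like_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent contains a known crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def choose_page(user_agent: str) -> str:
    """Pick which (hypothetical) page a cloaking script would serve."""
    if looks_like_crawler(user_agent):
        return "page_for_spider.html"
    return "page_for_user.html"
```

Since the choice hangs on a header that the visitor controls, a search engine can defeat this particular check simply by also fetching the page with a browser-like User-Agent.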

IP delivery is a variation of cloaking where the content is served based on the visitor's IP address.
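
A small sketch of the IP-based variant, using Python's standard ipaddress module. The crawler range below is an assumption: 192.0.2.0/24 is a block reserved for documentation, used here only as a stand-in, and content_for is a made-up name.

```python
import ipaddress

# Hypothetical crawler range (192.0.2.0/24 is reserved for documentation).
CRAWLER_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def content_for(ip: str) -> str:
    """Choose content by checking the visitor's IP against known ranges."""
    addr = ipaddress.ip_address(ip)
    if any(addr in network for network in CRAWLER_NETWORKS):
        return "content shown to the spider"
    return "content shown to the user"
```

Unlike the User-Agent header, the source IP is hard for a crawler to disguise, so this variant only fails when the crawler comes from an address the spammer does not know about.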

Search engines have grown intelligent enough to overcome this spamming to an extent using a few anti-spam techniques. I have heard that Bing, the new search engine from Microsoft, is trying to simulate users with bots to check whether the content retrieved by the search engine and by the user bot is similar.
Good idea, but this would eat up a lot of bandwidth. Still, given the huge amount of spam on the internet, it is worth it for giving relevant, spam-free results.
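
The comparison idea can be sketched in Python as below. This is not how any search engine actually does it, just my guess at the simplest version: fetch the page twice under two identities and compare the text. The fetch callable, the two User-Agent strings and the 0.9 threshold are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Callable


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two page bodies."""
    return SequenceMatcher(None, a, b).ratio()


def looks_cloaked(fetch: Callable[[str], str], threshold: float = 0.9) -> bool:
    """Fetch the same page as a crawler and as a user, then compare.

    `fetch` is a hypothetical callable that takes a User-Agent string
    and returns the page body served for it.
    """
    as_crawler = fetch("Googlebot/2.1")
    as_user = fetch("Mozilla/5.0")
    return similarity(as_crawler, as_user) < threshold
```

Every page needs two fetches instead of one, which is where the bandwidth cost comes from. Pages with rotating ads or timestamps will also differ slightly between fetches, so the threshold has to leave some slack.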

www[2-6].google.com

I am new to this stuff. This might be a simple fact. But, I am trying to understand some naming conventions used for servers.

www[2-6].google.com

The following domains redirect to www.google.

www2.google.com
www3.google.com
www4.google.com
www5.google.com
www6.google.com
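
The bracket notation in www[2-6].google.com is just shorthand for the list above. A minimal Python sketch that expands it, assuming a single numeric [start-end] range per pattern (expand_hosts is a made-up helper name):

```python
import re
from typing import List


def expand_hosts(pattern: str) -> List[str]:
    """Expand one numeric [start-end] range in a hostname pattern."""
    match = re.search(r"\[(\d+)-(\d+)\]", pattern)
    if match is None:
        return [pattern]  # nothing to expand
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]


for host in expand_hosts("www[2-6].google.com"):
    print(host)
```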

1. Does gfe have any special meaning in
gfe.core.l.google.com?

Also, I have heard about Google's freshbot and deepbot.
64.68.82.x - indicates freshbot
216.239.46.x - indicates deepbot
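
A small Python sketch for labelling an IP seen in a server log, using the two prefixes above. Reading the trailing "x" as a full /24 block is my assumption, the ranges are only what I have heard and may well be out of date, and classify_bot is a made-up name.

```python
import ipaddress

# Prefixes from the list above; treating "x" as a /24 block is an assumption.
BOT_NETWORKS = {
    "freshbot": ipaddress.ip_network("64.68.82.0/24"),
    "deepbot": ipaddress.ip_network("216.239.46.0/24"),
}


def classify_bot(ip: str) -> str:
    """Return the bot name whose range contains the IP, else 'unknown'."""
    addr = ipaddress.ip_address(ip)
    for name, network in BOT_NETWORKS.items():
        if addr in network:
            return name
    return "unknown"
```

Running this over the visitor IPs in an access log and counting hits per label would show how often each bot comes by.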

Observing the IPs of the bots may give information about Google's crawling strategy. :P