Tuesday, November 24, 2009

NetBase Solutions

Company: NetBase Solutions (formerly Accelovation, Inc.)
Location: Mountain View, CA
URL : http://netbase.com/
Areas: Search, NLP, Semantic Technologies.

NetBase Solutions is another company working on semantic technologies and applying them to search.
One of its solutions, Content Intelligence, is particularly interesting. As far as I understand, identification and categorization of entities into many classes drives the whole process. It would be interesting to see how this is actually used to enrich the user experience and cater to user needs.

Another interesting thing about NetBase is their use of deep linguistic parsing, and that too on billions of documents. Deep linguistic parsing is time-consuming; nevertheless, current trends in cloud computing offer one solution to the problem.

Their semantic search for health is something that could be useful to everyone. (Cause-effect relationship identification is one problem that must be addressed in such cases, and NetBase seems to have done that.)

Isn't that interesting?

NetBase is a technology partner of Elsevier, and its semantic indexing technology powers illumin8.

Friday, November 20, 2009

PhD Comics - comics in reality


I am quite sure most students can closely relate to this 'reality' comic:

The comic is taken from :
http://www.phdcomics.com/comics/archive.php?comicid=1139

Wednesday, November 18, 2009

History of Internet and who is going to rule?

1. For readers interested in knowing more about the Internet, this would be a nice read:
History of the Internet in a nutshell.

2. And now Google is trying to make the web faster. The new protocol SPDY (SPeeDY) is being tested in the labs.
The documentation can be found here.
Link for the SPDY source code.

What would be the impact if SPDY is successful?
Will Google file a patent for SPDY?

watch out ;)

Friday, November 13, 2009

Image search on Blogspot

Issuing a query in Google as mentioned in the previous post is most likely an image search over a set of Blogger (Blogspot) images.

The recall of the system, though, is very low.
I took an example to check whether the system really searches Blogspot images:
I have this image on Blogger:
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRBwG-JEIzHUfRRvYzbw7gMk4V9LZr5K0An70X-2k5LVQPBt72_iDVkWd7EDTL4o1G_2xyiOGE-q68QdnAiLFqpN0EUiFOtEM3G7icRCxPTMkUdir4lYRt1yLpqk3GvZJmzDpGhb0FNmk/s1600-h/sollu.jpg

which is uploaded on http://medhatithi.blogspot.com/2009/05/reunion-at-jan-pashas-wedding.html

Querying Google with site:2.bp.blogspot.com sollu didn't show the actual image.

Hacking into gmail for images????

Try querying Google with the following query (the number 2 can be substituted with any of the digits 1, 2 or 3):
site:2.bp.blogspot.com mail

What can we infer from the results?
1. Is it a hack (:P) showing images from other people's Gmail accounts?
2. Is it just showing images that link to a Blogspot account?

Let me know if you know anything about this; I would like to learn more about it.

Thanks in anticipation.


Thursday, November 12, 2009

The brain behind 'Quick Sort'

Sir Charles Antony Richard Hoare, the developer of the Quick Sort algorithm, talks about some of his ideas in an interview.

He also speaks about the projects that he is involved with at MSR, Cambridge.
The full article can be found in CACM here.

I like that.....,I like that......

This is about a posting for a Ph.D. studentship in Sentiment Analysis at the University of Wolverhampton. The complete job posting can be found here.

There are two reasons behind writing this post.
1. The deadline for receipt of an email is 30 November, 2009. Candidates selected for further consideration will be sent a reply email within 48 hours of the receipt of their original email. Non-receipt of an email means that you have not been shortlisted.

Those are a few lines, verbatim, from the posting.
Getting a reply within 48 hours of mailing them, and "non-receipt of any such reply implies that the candidate is not shortlisted", is what caught my attention. I rarely see people or organizations that at least specify when you can expect a reply (positive or negative) from them. (People who have faced the problem of not receiving any reply after applying will appreciate this kind of posting.)
:)

I think job ads of this type would draw great attention from applicants.

2. Students working on summarization may find this demo link useful for carrying out a few experiments by tweaking some parameters.
There are other demos too.

I appreciate the job posting and also the idea of providing demos for experimentation.

Tuesday, November 10, 2009

Transliteration now in Mobiles :)

Transliteration is an interesting problem, and Tachyon, the company known for Quillpad, has now targeted the mobile market.

I was expecting this from Tachyon, but it came a bit earlier than I expected.
More about the release here.
The application is just 355 KB, which is interesting in itself.

A recent workshop on transliteration is here.

Thursday, November 5, 2009

tachyon technologies

Company: tachyon technologies
Located In: Bangalore and Chennai.
URL: http://tachyon.in/
Areas: Machine Learning, Artificial Intelligence, Programming Language Design.

The site design is itself something different.
Quillpad and Team Talk are two products from the company.

More details about Tachyon's Team Talk can be found here.

Intelligent, smart Engines

Here is a small conversation between Inferno and me.

Me: Hey, buddy! What would intelligent, smart, search engines look like?
Inferno: They are the ones that can understand the user's information need, take into account the current trends, blah blah blah..... and give results.
Me: That is good to hear. But could you give an example so that I can have a better understanding?
Inferno: Hmmm! It's something like this. You query with the phrase 'hiring again', and the so-called intelligent, smart search engine would suggest 'firing again' as the correct query.

Me: That is something interesting. :) .
Inferno: I understand, I know ;)
Kernel level overtake:
User commands disabled
Shutting down Audacious.
Praneeth, Now get back to work.
Me: ..........
..........
...........
Inferno is intelligent too......

Tuesday, November 3, 2009

Software That Fixes Itself :O

I just read an article, "Software That Fixes Itself": a new tool that aims to fix misbehaving programs without shutting them down.
This is part of research at MIT carried out by the group headed by Martin Rinard, a computer science professor there.

A nice read...

If you are interested more, this is the paper about the system.

Monday, October 26, 2009

10 million servers- where are we heading to?

"Google envisions 10 million servers" says Google's Jeff Dean.

As a system user and a guy with a passion for large and powerful systems, the statement makes me want to know more about the network, the electricity, the storage capacity and, especially, the amount of time it takes to carry out a computation. More importantly, I would be interested in the size of the hardware maintenance team. Isn't that interesting?

The same news would raise a different set of questions for other people (with no computer background).
A person from India would immediately think of the power consumption and the bandwidth required for the data centers.

The actual questions remain (at least to me):
1. Do we need such huge computing power?
2. Where are we finally heading in this 'competition for success, name, power'?
3. What happens to nature with 10 million servers dissipating so much heat?

Is Google thinking of Technology and its effect on Nature in the long run?
Is 'Technology' making our lives easy or difficult?

No doubt, this is highly debatable. :)

Update: You can find the presentation here
Thanks Sandeep for the link :)

Nobel Peace Prize

The Nobel Peace Prize should be awarded to newspapers and TV channels (excluding spiritual and mythology channels, NGC and cartoon channels) that do not report a death, mishap or accident of any kind for 24 continuous hours.

And the peace prize authorities can, without any fear, amend the rules. For sure, no TV channel or newspaper would be eligible :P

What do you say ? [;)]

Tuesday, October 13, 2009

Aria2 Project- Update

Aria2 is good. I was able to download 653 MB in 2 hours 28 minutes.
I used only one option, the one that specifies the number of connections (threads) to be used.
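For the record, the invocation was roughly of this form (a sketch only: the URL is a placeholder, and the option name is taken from the aria2 manual, so check aria2c --help on your version):

# -s 5 asks aria2 to download the file over up to 5 connections
[praneeth@inferno]$ aria2c -s 5 http://example.org/big-file.iso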

Now I should try exploring the other options as well.

Aria2 Project- Lightweight multi-protocol download utility

There are many download accelerators available for Windows (Free Download Manager, Download Accelerator) that help in managing bulky downloads.

Wget is, no doubt, one of the most widely used utilities. But given the volume of the files for my task, wget is not suitable.

Recently I came across the aria2 project. Features:

  • light-weight
  • multi-protocol
  • multi-source
  • can be operated from the command line
  • supports HTTP/HTTPS, FTP, BitTorrent ( :) ) and Metalink
  • built-in XML-RPC interface; aria2 can be controlled via this interface
  • Runs on Linux, FreeBSD, Mac OS X and Windows.
Hmmm..... that's a decent list of features, enough to at least motivate me to try it out.
Lemme (aka let me, expanded for the crawlers :P) see how useful this turns out to be.

BTW: previously, I was using aget, a multi-threaded accelerator that supports HTTP downloads. And yippie, it helped me fill my HD with many useful lectures and files, but it was limited to HTTP only. :(
Adding to this, the server guys have blocked the usage of aget. :(

PS: Here is a small list of available informal contractions.
If you are more interested in the linguistic aspects of contractions and their limitations in usage, please read this. I love this article by Browning.
Here is another by Radford.

OK, that's too much of a deviation from the topic, so lemme stop here. ;)

Thursday, October 8, 2009

SVOX

I came to know about a job posting, related to developing parsers etc., on the LINGUIST List.

Company: SVOX
URL: www.svox.com
Location: Munich, Germany.
Areas: Speech Dialog, TTS, speech-related software.

How To Write A Scientific Paper

Ahh.... don't misunderstand; I am not here to teach any guidelines that fit the title of this post. ;)

The article is from Improbable Research, research that makes people laugh and then think.

This is the link for the article.

Hmmm... yeah the article had stuff that left me thinking :)
A nice read.

Wednesday, October 7, 2009

Linux renew ip address - Force DHCP

We were struggling to bring the network up on an Ubuntu machine (using DHCP).
We tried the commands ifdown and ifup, but they were unable to bring the network up.

The whole problem was with the IP: the client was not releasing the current IP.
The solution is to release the current IP and request a new one.

Steps:
1. Release the current IP.
2. Obtain a new IP.
3. Restart the network.

Open the terminal and type the following commands

1. Release the current IP
[praneeth@inferno]$ sudo dhclient -r

2. Obtain a new IP
[praneeth@inferno]$ sudo dhclient

3. Restart the network
[praneeth@inferno]$ sudo ifdown eth0
[praneeth@inferno]$ sudo ifup eth0

OR

[praneeth@inferno]$ sudo /etc/init.d/networking restart



That's it, the network should be up and running :)

Saturday, September 26, 2009

Evolve 24

Company Name: Evolve 24
Headquarters: St. Louis
Other offices: Chicago, Illinois.
Areas: NLP, IE/IR, social networks
Website: http://www.evolve24.com

Though I didn't get much time to go through this company's profile, from the JD (for a scientist position) they have posted on the LINGUIST List (JD link here), it seems Evolve24 is involved in a wide range of areas like NLP, IE/IR, ML, social network analysis, etc.

A quick overview of the site gave the impression that the company is involved in estimating risk, and in identifying and analyzing information posted on the web.
It would be interesting to see how closely their work matches one of my ideas, posted here.

In case you get a chance to learn about the work at Evolve24, please let me know.

Monday, September 7, 2009

Textual Analytics

Location: Bangalore
Core Areas: Information Extraction
Website: www.textualanalytics.com

The Bangalore-based company is mostly into IE from webpages and text.
One of their solutions, IntelliExtract, claims to extract named entities, ideas, concepts, associations and relationships with high precision, i.e., it converts unstructured text into structured data. They also claim to be able to do this for various file types.
Removal of unwanted text (advertisements, banners, etc.) is one variant, and named entity identification is the other.

InfoProfiler, another solution from the company, is centered around extracting important information and presenting it in a way that is easy to understand.
Some of the features of InfoProfiler overlap with those of IntelliExtract.

More information can be found at the company website.

Saturday, August 22, 2009

The Schedule......

I was just reading a short post by Paul Graham on schedules: the manager's schedule and the maker's schedule. A meeting can sometimes blow up your whole day, and I have faced the same problem many a time before. Now, to some extent, I can work without meetings disturbing my schedule. :)

Yeah, especially when you are programming, you never know what new challenge you will encounter. Such unexpected challenges surely keep our eyes glued to the LCD and hands on the ergonomic keyboards. In my experience, many a time I forgot a few social appointments, leading to brief hard feelings. A workaround is to schedule such work at night. :D

I am also used to the "grab coffee" kind of meetings with people, for two reasons. One, they give me some physical exercise, refresh my brain and let me breathe some natural air. Second (this may sound a bit silly), I like eating and drinking a lot.

But do I really need a schedule when I enjoy the work I am involved in?
Hmmm..... my heart would then respond:
yes...... it's better that I spend some time with family and friends too. :P

Tuesday, August 18, 2009

A9.com

A9.com provides product search services for Amazon and many other e-commerce websites. A9.com is a subsidiary of Amazon.

Offices: Palo Alto, California and Bangalore, India
Areas of work: Search(product search specifically)


The interface is pleasing, and A9 is a good platform for searching for products. In particular, the product description section has enough detail about the features of each product.

The interface also features a 'suggest' facility, giving hints of a few possible phrase combinations.
From the job profile provided by A9 (please report if the link is broken), I guess the company is involved in information extraction and data mining. It also seems they are trying to use log files (as feedback for improvement).
A9 is another company that works in computational linguistics, natural language processing and information retrieval.

TextWise


TextWise builds semantic technology solutions. Similarity search, concept tagging, content categorization, etc. are a few of the solutions provided by the company.
According to the About Us page of the company's website, they work in fields like extraction, search, categorization and classification, using both NLP and statistics to build their solutions.

The company was formerly known as MNIS and then MNIS-TextWise Labs.

Headquarters: Rochester, New York.
Funding type: VC funded
Founded: 1994

Friday, August 14, 2009

What is this hacker language? - 2

The hacker language is also called Leet. In one sense, Leet refers to a kind of substitution cipher.
This kind of speech is also called leetspeak.
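Just to make the substitution-cipher point concrete, here is a toy one-liner using tr (the character mapping is my own arbitrary choice, not any official leet table):

# map a,e,i,l,o,s,t to 4,3,1,1,0,5,7
[praneeth@inferno]$ echo "leet speak is elite" | tr 'aeilost' '4311057'
1337 5p34k 15 31173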

The grammar section in this article is something that caught my eye.
"Leet enjoys a looser grammar than standard English."

So will NLP systems in the future focus on leet grammar? No doubt that is too ambitious. Hopefully, once people come up with systems that can understand human language, this can be achieved.
How can problems like named entity recognition and multi-word expression finding be mapped to such languages? :P

I guess there will be an AI movie about this kind of system, one that can crack the hackers' language.
Cool....... I think my thinking is going too wild :P

Wednesday, August 12, 2009

What is this hacker language?

While doing some experiments with the Google logo, I landed on the URL
http://www.google.com/intl/en_ALL/images/logo.gif

Then, to check whether directory listing was enabled (I was quite sure it was not, but just for a bit of self-satisfaction), I tried the URL http://www.google.com/intl/,
but this resulted in a page-not-found message. The interesting part was that it suggested a URL:

www.google.com/intl/ja/

Now, I tried the search syntax
intl site:www.google.com/intl/

And my curiosity grew after seeing the word hacker in the URL http://www.google.com/intl/xx-hacker/
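For the curious, a quick way to check whether such a page actually exists is to fetch just the HTTP status line with curl (responses may differ by location and change over time, so I am not pasting mine):

# HEAD request; print only the status line
[praneeth@inferno]$ curl -s -I http://www.google.com/intl/xx-hacker/ | head -n 1
[praneeth@inferno]$ curl -s -I http://www.google.com/intl/ | head -n 1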

This is something interesting.
I learnt that there is a language called the hacker language.
1. Why would Google host such a page?
2. Does anyone use that site?
3. Is there something special about this hacker language?
4. Or is this some sort of SEO technique (instead of just showing a page-not-found error, also suggesting another URL) to retain users?

Do we need another from Google?

I am shocked to hear that Google is working on another search engine, Caffeine.

Developers are inviting users to use the new service hosted at
http://www2.sandbox.google.com/

And I guess most people will be eager to provide feedback. That is what keeps Google on top.
User feedback is a priceless treasure, and Google is able to get it on a large scale.

Sometimes, I feel that we ourselves are making Google intelligent.

Anyways, at the end of the day, a user wants fast and relevant results.

Will people get addicted to the new Google Caffeine?
Well, I am already addicted to the natural Caffeine. ;)

Wednesday, August 5, 2009

Richard Feynman teaches Physics to students

Great news for physicists: Bill Gates, the co-founder of Microsoft, is trying to make the physics lectures given by Richard Feynman freely available.

He dreamed of this 20 years ago, when he first saw the series of lectures. All these years, he has been trying to get the rights to the lectures. Another instance showing that thoughts don't die easily; thoughts live on over the years and, with constant effort, they do take life. :)
More in this link
http://news.cnet.com/8301-13860_3-10286732-56.html

It has been a year since he left Microsoft, and even now he spends 20 percent of his time with the company. He is also working with Intellectual Ventures on spinning off a company called TerraPower, which develops nuclear reactors that run on depleted uranium.

I am eagerly waiting to listen to some of those lectures.

Tuesday, August 4, 2009

Page Cloaking / Code Swapping

Earlier, I had heard of a few ways of spamming: keyword stuffing, hidden text and small text. In fact, I was using them to get my page into the top results for some queries. Yippie..... I was, to some extent, able to drive traffic, but that didn't last long ( :-( ), as I had links from only a few sites with a higher PageRank.
After figuring out that methods such as keyword stuffing and hidden text won't work, I removed them. As far as I understand, the only way to stay in the top results is to be popular (indicating a high PageRank), have a lot of backlinks, have outlinks (not to spam) and have relevant content.

Of late, I have also heard of another spamming technique: page cloaking, or code swapping. This is a technique where one page is submitted to the search engine and another page is shown to the end user, so there is a difference between the content retrieved as a search result and the content actually seen by the user.
Some search engines combat this kind of spam by regularly revisiting the pages they have indexed; spammers may have a tough time here, as they don't know when the crawler/spider will show up again.
But if the visitor is identified as a search engine, a separate script can run, serve different content to the spider and fool it.

IP delivery is a variation of cloaking where the content is served based on the visitor's IP address.

Search engines have grown intelligent enough to overcome this spamming to an extent, using a few anti-spam techniques. I have heard that Bing, the new search engine from Microsoft, is trying to simulate users with bots to check whether the content retrieved by its crawler and by the user bot is similar.
A good idea, but it would eat up a lot of bandwidth. Still, given the huge amount of spam on the internet, it is worth it for relevant, spam-free results.
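A crude way to check a page for user-agent based cloaking yourself is to fetch the same URL with two different User-Agent strings and diff the results (example.com is a placeholder; real Googlebot requests also come from Google's IP ranges, so this only catches naive cloaking):

# Fetch as a regular browser
[praneeth@inferno]$ curl -s -A "Mozilla/5.0" http://example.com/ > as_browser.html
# Fetch as Googlebot (user-agent string only)
[praneeth@inferno]$ curl -s -A "Googlebot/2.1 (+http://www.google.com/bot.html)" http://example.com/ > as_bot.html
# Any difference is a hint of cloaking
[praneeth@inferno]$ diff as_browser.html as_bot.html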

www[2-6].google.com

I am new to this stuff, and this might be a simple fact, but I am trying to understand some naming conventions used for servers.

www[2-6].google.com

The following domains redirect to www.google.com (a quick DNS check follows the list):

www2.google.com
www3.google.com
www4.google.com
www5.google.com
www6.google.com
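One way to poke at these names is a plain DNS lookup; the answers vary by location and over time, so I am not pasting mine here:

# See what www2.google.com actually resolves to (CNAME chain and A records)
[praneeth@inferno]$ host www2.google.com
[praneeth@inferno]$ dig +short www2.google.com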

1. Does 'gfe' have any special meaning in the name gfe.core.l.google.com?

Also, I have heard about the Google freshbot and deepbot:
64.68.82.x - indicates freshbot
216.239.46.x - indicates deepbot

Observing the IPs of these bots may give some information about Google's crawling strategy. :P
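If you run a site, a rough way to see which of the two ranges visits you is to count them in your access log (the log path below is just an example; in the common log format the client IP is the first field):

# Hits from the 'freshbot' range
[praneeth@inferno]$ grep -c "^64\.68\.82\." /var/log/apache2/access.log
# Hits from the 'deepbot' range
[praneeth@inferno]$ grep -c "^216\.239\.46\." /var/log/apache2/access.log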

Thursday, July 30, 2009

Even a bad recession is good

I was reading the article Entrepreneurship during a slump by Tim Draper.
A nice read. The author cites companies like GE, IBM, Microsoft, Shell Oil, AT&T, Merck, Johnson & Johnson, Sun Microsystems, Skype, Kodak, Polaroid, HP and Adobe, which were all started during economic downturns.

A very short summary of the article:
According to the author, a recession is the best time to start a company; he takes it positively and states that recession is good :) .
The reasons being:
1. Managers think creatively.
2. During a recession, a long-lasting culture of frugality is built into the company.
3. Entrepreneurs don't face "venture fratricide."

Here is a passage (verbatim) I liked in the article.

That is my advice for entrepreneurs in these times: If you take an entrepreneurial risk, make sure you go after something big. Extend your imagination. Think flying and self-navigating cars, holo-decks, brain enhancers, salt water purifiers, fusion energy, and space travel.


Another short counterpoint from him:
"Bad news is good news to the press."
The article is from Communications of the ACM, August 2009.

Is Google search always the best????

I was searching for a basic tutorial on threads in Java. These days I use Bing for my searches. But this time Bing didn't give satisfactory results. As usual, I tried my luck on Google, and the results were no better than Bing's. :(

Finally, I tried whether Yahoo search could be of any help, and yippie.... I could find related content. :)
This has been my experience before as well: whenever Google fails to show relevant results, I try the same queries on Yahoo, and the results on Yahoo! surprise me. They are more relevant, results I never saw in Google's first few pages.


Now this raises a question in my mind... is Google search always the best?
When is Yahoo better than Google?

Hope the new deal between Yahoo and Microsoft brings some change in the search market.

Anyway, the tutorial by Brian Goetz on Java threads is good. You can get a PDF version here.

This tutorial is presented by developerWorks.

Friday, July 24, 2009

How Robots.txt can be useful sometimes.

What's new from Google? No doubt most internet users would be eager to click any article with such a heading or text.
Indeed, I am one of them. ;)

At this point, I recall a moment when I won a bet of good food. I was able to gain access to one of my friends' Yahoo accounts 6 years back. It was sheer luck that I knew his personal information, and the security question was very easy to answer. Yeah, that was it. This friend of mine recollected the incident a few years back, and that was when I got the idea of having a security check against such hacks.
My idea was to record the previous login time (and make it non-editable).
Finally, Google has come up with a similar feature in Gmail (showing the login time, IP address and time spent). Storing the IP address is a very good idea: it actually gives us an idea of the location of the hacker. A feature useful enough to make sure that no one else is reading your (personal) mails. This is one reason why I like Gmail.

Well, coming to the actual point: I was searching for a parameter in robots.txt to limit the rate at which a bot can scan my site. Request-rate is the parameter/setting I can use.
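A sketch of what such an entry could look like (note: Request-rate and Crawl-delay are non-standard extensions, honoured by some crawlers and ignored by others, including Google's, so treat this as a hint rather than a guarantee):

User-agent: *
# ask the crawler for at most 1 page every 10 seconds
Request-rate: 1/10
Crawl-delay: 10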
While I was reading a few articles, Google's own robots.txt, http://www.google.com/robots.txt, caught my eye. It surprised me that Google disallows crawling of most of its own content. But this was no wasted effort: I came to know of Google Ventures, useful for entrepreneurs.
So robots.txt gives us some useful information too (if not to robots).
Perhaps I will gain some useful information from another site's file one day.

There are many lines in Google's robots.txt, but they come with a Disallow directive prefixed.
No harm, I am no robot to be disallowed :P

Wednesday, July 22, 2009

Looking for Internship in France???

Xerox Research Centre Europe is part of the Xerox Innovation Group, which has over 550 researchers and engineers and works in the areas of parsing & semantics, machine learning, large-scale data mining, textual & visual pattern analysis, cross-language technologies, statistical analysis, XML, visualization and software development.

Organization: Xerox Research Centre Europe.
Location: Grenoble.
Working Language: English
Employees and Researchers: ~ 550
Website: www.xrce.xerox.com

The center also has internship positions for Master's and PhD students in mathematics, linguistics, psychology.
http://www.xrce.xerox.com/internships/home.html

Good news for Indian students :)
Xerox also has Open Innovation partnerships with institutions in India.
More details about this are available at:
http://www.xrce.xerox.com/internships/home_India.html

There are a few demos available on their site:
http://www.xrce.xerox.com/competencies/content-analysis/homepage.en.html

The language guesser/identifier demo is an appreciable effort, and it would be good if such work were done for Indian languages too. :)

I feel XRCE is a good place for students pursuing their Master's degree to do an internship.

PS: Please let me know of any broken links in this post.
PS2: The information provided is subject to change. Please check the main site for more details. :)

Monday, July 20, 2009

Installing Flash Plugin for Firefox 3.5 on FC11 x64

Four simple steps for installing Adobe Flash Player for Firefox 3.5 on FC11 x64

1) Download the libflashplayer tarball (libflashplayer-10.0.22.87.linux-x86_64.so.tar.gz) from the Adobe website.

2) Uncompress the tarball

[praneeth@inferno ]$ tar -zxvf libflashplayer-10.0.22.87.linux-x86_64.so.tar.gz

3) Now copy the file libflashplayer.so from the current directory to /usr/lib64/mozilla/plugins/ (this needs root privileges)

[praneeth@inferno ]$ sudo cp libflashplayer.so /usr/lib64/mozilla/plugins/

4) Restart Firefox :)

Saturday, July 18, 2009

H5 Technologies

I was browsing and found this company, which provides IR and document analysis solutions for legal departments and law firms.
They have offices in SF and New York, and also an office in Mumbai.
The work is interesting, and here is a brief description of the company.

Location: SF, NY
Work: IR and Document Retrieval for law firms and legal departments.
Areas: NLP, IE, IR, Linguistics.
More about the company at http://www.h5.com/about/index.html

Here is a short description of the company:

H5 is the leading provider of information retrieval and document analysis services for Fortune 500 corporate legal departments and leading law firms. H5 was recognized with a number three ranking among the fastest growing technology companies by Deloitte's Technology Fast 50 Program. H5, a privately held company with strong venture capitalist backing, was also included in the "Cool Vendors in Content Management, 2007" report by premier analyst firm, Gartner, Inc.

H5 is an information retrieval firm that helps law firms and corporations search, assess, and manage electronically stored information. Through a full range of advisory, document review, and litigation support services, H5 finds the information clients need for litigation and investigations, compliance, and litigation readiness. Our approach - which combines advanced technologies with expertise in law, linguistics, computer science, and statistics - consistently reduces clients' information management costs while minimizing their risk.

Employment opportunities with H5 are differentiated by our unique value proposition. We offer an environment where you will partner with a diverse team of sophisticated knowledge workers who share a drive to succeed, a passion for solving our clients' most challenging problems and a strong intellectual curiosity. Our expectations for performance are high and each team member has a clear line of sight connecting their individual contributions toward the achievement of department and corporate objectives. For more information please visit the website listed above.
http://www.h5.com

I would personally classify this as an NLP and IE/IR company: natural language processing/information retrieval, because document analysis requires a bit of NLP.

The work is interesting because it would reduce a lot of manual effort. Let me know if you come across more information about the nature of the work.

On-Demand Webinar: Scaling Hadoop for MapReduce Applications

Google's MapReduce mechanism has had a considerable effect on computing.

Webinar registration link: https://dct.sun.com/dct/forms/reg_us_2005_941_0.jsp?

Hadoop, a Java framework for carrying out distributed processing, is gaining importance. The framework is specially designed for data-intensive applications.
Hadoop is inspired by Google's MapReduce.

Organizations like Facebook, A9, Powerset and the NYTimes use Hadoop for distributed computing.

Hadoop was created by Doug Cutting. He also contributed to Nutch, an open-source search technology.
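As a flavour of how it is used, here is a rough sketch of running the classic WordCount example that ships with Hadoop (the jar name and the input/output paths vary by version and installation):

# copy local documents into HDFS
[praneeth@inferno]$ hadoop fs -put local_docs input
# run the bundled WordCount job: map emits (word, 1), reduce sums the counts
[praneeth@inferno]$ hadoop jar hadoop-examples.jar wordcount input output
# look at the results
[praneeth@inferno]$ hadoop fs -cat output/part-*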

Friday, July 10, 2009

A smart tutorial by Smart on Lucene

I found this quick introduction to Lucene by Smart:

http://www.informit.com/articles/article.aspx?p=461633

He explains the three basic steps involved in using the Lucene library (a quick command-line sketch follows the list):

1. Creating an Index

2. Indexing an Object

3. Full-Text Searching
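For a quick taste without writing any code, the demo classes that ship with Lucene walk through the same index-then-search cycle (class names are from the Lucene demo; jar names and arguments depend on the release you download):

# build an index over a directory of text files
[praneeth@inferno]$ java -cp lucene-core.jar:lucene-demos.jar org.apache.lucene.demo.IndexFiles ~/docs
# run full-text queries against that index (reads queries interactively)
[praneeth@inferno]$ java -cp lucene-core.jar:lucene-demos.jar org.apache.lucene.demo.SearchFiles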

Wednesday, July 8, 2009

Changing default Java in Ubuntu

Just perform these steps to change the default Java in Ubuntu. I had been using the absolute path all these days...... but now I need not do that anymore :)
1. Check the current version of Java using the command

java -version

This may produce output of the form:
java version "1.6.0_06"
Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
Java HotSpot(TM) Server VM (build 10.0-b22, mixed mode)

2. Check the available Java versions installed on your system.

update-java-alternatives -l

java-6-sun 63 /usr/lib/jvm/java-6-sun
java-gcj 1042 /usr/lib/jvm/java-gcj


3. Now select the Java version you want as the default (this needs root privileges).

sudo update-alternatives --config java
There are 4 alternatives which provide `java'.

Selection Alternative
-----------------------------------------------
1 /usr/bin/gij-4.2
+ 2 /usr/lib/jvm/java-gcj/jre/bin/java
* 3 /usr/lib/jvm/java-6-sun/jre/bin/java
4 /usr/bin/gij-4.1

4. Enter your Java selection number.

That's it! You are done.........
Now check the new default Java using

java -version

Wednesday, June 24, 2009

It's good to know negatives too!!!

Yippie..... I got an offer from company X. I got another offer from company Y.
And now the dilemma begins..... whether to go for company X or company Y.

The natural tendency is to search for each company's positive points and select the one with more positives.

Interestingly, what if one could come up with an 'oracle' that lists the positives and, especially, the negatives of a company?

In simple terms:
'A spider crawls the web for content about companies and performs sentiment analysis to get the polarity of sentences for ranking.'
A user gives the name of a company he would like to know about.
Input: company X
Output: the positives and negatives of company X. (A toy sketch of the polarity-counting step follows.)
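Just to give a flavour of the polarity step, here is a deliberately naive sketch: count positive versus negative cue words in a (hypothetical) file of crawled sentences about the company. Real sentiment analysis is of course far more involved.

# crude positive count
[praneeth@inferno]$ grep -o -i -w -E "good|great|excellent" companyX_sentences.txt | wc -l
# crude negative count
[praneeth@inferno]$ grep -o -i -w -E "bad|poor|lawsuit" companyX_sentences.txt | wc -l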

This thought flashed through my mind when I was browsing for details about a company (from which I got an offer). From a few employees, I heard that the company is closing down and that there are a few legal issues the company is involved in.

The task is quite challenging.
I am eagerly waiting for a product that focuses on this work.