Problem Statement:
Let's say you have a number of documents. Each document has a number of words (we refer them as terms). The problem is to identify two documents which are most similar.
Given below is one of the solution approaches
Use of Term Document Matrix:
Each row of a Term document matrix (let's name it as D) is a document vector,
with one column for every term in the entire corpus of documents. The matrix is sparse as not all document might contain a term. The value in each cell of the matrix is the term frequency.
example:
docid term1 term2 term3 term4 term n
d1 2 1 0 0 3
d2 0 2 3 4 1
d3 1 0 4 2 0
.
.
The transpose of the same Term Document Matrix (DT) will look as follows
docid d1 d2 d3 .............. dn
term1 2 0 1
term2 1 2 0
term3 0 3 4
.
.
We can create a Similarity Matrix (S) by multiplying D with DT { e.g. S = D * DT}
The structure of Similarity matrix will be as follows
docid d1 d2 d3 ............. dn
d1 x11 x12 x13 x1n
d2 x21 x22 x23 x2n
d3 x31 x32 x33 x3n
.
.
dn xn1 xn2 xn3 xnn
where Xmn = SUM Product of all term frequencies of docids dm and dn
Intuitively higher the value of Xmn, the more similar are the documents with doc ids dm and dn.
Coming back to our original problem, find which to documents are most similar.
Simply look into the Similarity matrix and find our the row, col of the highest value in the matrix.
Reflections on impact of Digital on Business, agility in software development, Leadership and Organizational behavior.
Saturday, May 18, 2013
Sunday, May 5, 2013
The Serving Leader
"The Serving Leader" by Ken Jennings and John Stahl-Wert was my weekend reading. The book talks about five actions that can transform teams, business and community. My key takeaways from the book are
- Serving leaders run to great purpose by holding out in front of their team a "reason why" that is so big that it requires and motivates everybody's best efforts
- They qualify to be first by putting other people first
- They raise the bar of expectation by being highly selective in choice of team leaders and by establishing high standards of performance
- They teach serving leader principles and practices and remove obstacle to performance. These actions multiply the serving leader's impact by educating and activating tier after tier of leadership
- They build on strength by arranging each person in the team to contribute what he or she is best at. This improves everyone's performance and solidifies teams by aligning the strengths of many people
Some of the thoughts were counter intuitive such as "focus over strength over addressing weaknesses". Yet the argument that "it is foolish to pour all our energy into turning weaknesses to serviceable mediocrity" - makes profound sense.
Overall a good read and fodder for introspection.
Subscribe to:
Posts (Atom)