Keywords in Context
Very often we want to see a keyword in context (KWIC), that is, the keyword together with a number of words on either side of it.
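To make the idea concrete, here is a minimal sketch of the windowing on its own, applied to a made-up sentence. The function name `kwic` and the sample tokens are illustrative assumptions. Unlike the script later in this section, it uses plain case-insensitive string equality rather than regular-expression matching. Note how the context is cut short when the keyword falls near the start or end of the text.

```python
def kwic(tokens, keyword, window):
    """Return (left, keyword, right) triples for each token equal to keyword.

    Comparison is case-insensitive; `window` is the number of tokens kept
    on each side, clipped at the start and end of the token list.
    """
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = tokens[max(0, i - window):i]  # clip at the start of the list
            right = tokens[i + 1:i + 1 + window]  # slicing clips at the end
            hits.append((" ".join(left), token, " ".join(right)))
    return hits


tokens = "the cat sat on the mat with the dog".split()
for left, word, right in kwic(tokens, "the", 2):
    print(f"{left} [{word}] {right}")
```

The first hit has no left-hand context at all, because 'the' is the very first token.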
The following text is taken from the EDINA Community Report 2014.
This Community Report sets out what EDINA does. EDINA develops and delivers world-class online services and expertise that benefit research and education in the UK and beyond. Our work is a significant part of the contribution that Jisc makes as a champion for the use of digital technologies in Higher and Further Education and in the skills sector. We seek ways to assist Jisc member organisations to succeed more effectively in their mission to improve outcome and increase impact within limited budgets. For researchers, students and their teachers this means enhancing their productivity with services that both inspire and save time, helping to make the imagined possible!
What you find written here complements other summaries of our activity and the services we deliver, as found on the Jisc website, on the EDINA website, and in the EDINA Annual Review, which forms part of our formal accountability to Jisc and its stakeholders. The uptake and use of our services continues to grow, as it has done consistently since 1995/96 when EDINA first began its part, leveraging value from the University of Edinburgh in which we are based for the wider UK academic community.
With the kwic.py script we will be able to type:
python kwic.py edina.txt services 3
and get back every instance of the word ‘services’ in the text file, each shown with three words of context on either side.
When the script is working, the output is printed to the console and should look like this:
delivers world-class online [services] and expertise that
their productivity with [services] that both inspire
activity and the [services] we deliver, as
use of our [services] continues to grow,
The script is reproduced below.
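```python
# Python 3 version of kwic.py
import re
import sys

# command line arguments: text file, target word (treated as a regular expression), window size
file_name = sys.argv[1]
target = sys.argv[2]
window = int(sys.argv[3])

# assumes the text file is UTF-8 encoded
with open(file_name, encoding="utf-8") as a:
    text = a.read()

tokens = text.split()  # split on whitespace
keyword = re.compile(target, re.IGNORECASE)

for index in range(len(tokens)):
    # match() succeeds if the token begins with the target
    if keyword.match(tokens[index]):
        start = max(0, index - window)  # don't run past the start of the text
        finish = min(len(tokens), index + window + 1)  # ...or past the end
        lhs = " ".join(tokens[start:index])
        rhs = " ".join(tokens[index + 1:finish])
        print("%s [%s] %s" % (lhs, tokens[index], rhs))
```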
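Because the script uses `keyword.match`, a token is reported whenever it begins with the target, and the target is read as a regular expression. This means a keyword with punctuation attached (for example `services,`) is still found, but it also means that a search for `service` would report `services.` and `Serviceable` as well. If only whole words are wanted, one possible variant is sketched below. The function name `kwic_whole_word` and the sample sentence are hypothetical. It escapes the target with `re.escape`, strips punctuation from each end of a token, and uses `fullmatch` so the whole stripped token must match:

```python
import re
import string


def kwic_whole_word(text, target, window):
    """KWIC lines where a token matches `target` exactly, ignoring case and
    any punctuation attached to the start or end of the token."""
    tokens = text.split()  # split on whitespace, as in kwic.py
    # re.escape treats the target literally; fullmatch requires the whole
    # (punctuation-stripped) token to match, not just its beginning.
    keyword = re.compile(re.escape(target), re.IGNORECASE)
    lines = []
    for index, token in enumerate(tokens):
        if keyword.fullmatch(token.strip(string.punctuation)):
            start = max(0, index - window)
            finish = min(len(tokens), index + window + 1)
            lhs = " ".join(tokens[start:index])
            rhs = " ".join(tokens[index + 1:finish])
            lines.append("%s [%s] %s" % (lhs, token, rhs))
    return lines


sample = "Service users value our services. Serviceable kit and services, too."
for line in kwic_whole_word(sample, "services", 2):
    print(line)
```

Prefix matching is not wrong in itself: it lets a search for `servic` collect several forms of a word at once, so which behaviour is preferable depends on the question being asked.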