Blog Archives

timetrack improvements

I’ve just added a couple of improvements to timetrack that allow you to append to existing time recordings (either with an amount like 15m or using live to time additional minutes spent and append them). You can also remove entries using timetrack rm instead of remove – saving keystrokes is what programming is all about. … Continue reading timetrack improvements

Posted in Open Source, PhD, timetrack

AI can’t solve all our problems, but that doesn’t mean it isn’t intelligent

A recent opinion piece I read on Wired called for us to stop labelling our current specific machine learning models AI because they are not intelligent. I respectfully disagree. AI is not a new concept. The idea that a computer could ‘think’ like a human and one day pass for a human has been around since … Continue reading AI can’t solve all our problems, but that doesn’t mean it isn’t intelligent

Posted in machine learning, PhD, philosophy, Work

ElasticSearch: Turning analysis off and why it’s useful

I have recently been playing with ElasticSearch a lot for my PhD and started trying to do some more complicated queries and pattern matching using the DSL syntax. I have an index on my local machine called impact_studies which contains all 6637 REF 2014 impact case studies in JSON format. One of the … Continue reading ElasticSearch: Turning analysis off and why it’s useful

Posted in analysis, elasticsearch, indexing, PhD

Freecite python wrapper

I’ve written a simple wrapper around the Brown University Citation parser FreeCite. I’m planning to use the service to pull out author names from references in REF impact studies and try to link them back to investigators listed on RCUK funding applications. The code is here and is MIT licensed. It provides a simple method … Continue reading Freecite python wrapper

Posted in citations, freecite, PhD, rcuk, ref, references

Scrolling in ElasticSearch

I know I’m doing a lot of flip-flopping between SOLR and Elastic at the moment – I’m trying to figure out key similarities and differences between them and where one is more suitable than the other. The following is an example of how to map a function f onto an entire set of indexed data in elastic using … Continue reading Scrolling in ElasticSearch

Posted in elasticsearch, lucene, PhD, results, scan, scroll
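
The excerpt above is cut off before the original example, so here is a minimal sketch of the general pattern it describes: mapping a function f over every document by pulling results one page at a time until a page comes back empty. Everything named here is an assumption for illustration. `FakeScrollBackend`, `scroll_all` and `map_over_index` are hypothetical, and the backend is an in-memory stand-in rather than a real Elasticsearch client, so this is not the code from the post.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

Doc = Dict[str, Any]


class FakeScrollBackend:
    """Hypothetical in-memory stand-in for an index that supports scrolling."""

    def __init__(self, docs: List[Doc]) -> None:
        self._docs = docs
        # scroll_id -> (offset of the next page, page size)
        self._cursors: Dict[str, Tuple[int, int]] = {}

    def search(self, size: int) -> Tuple[str, List[Doc]]:
        """Open a scroll and return (scroll_id, first page of hits)."""
        scroll_id = f"scroll-{len(self._cursors)}"
        self._cursors[scroll_id] = (size, size)
        return scroll_id, self._docs[:size]

    def scroll(self, scroll_id: str) -> Tuple[str, List[Doc]]:
        """Return (scroll_id, next page of hits); the page is empty when exhausted."""
        offset, size = self._cursors[scroll_id]
        self._cursors[scroll_id] = (offset + size, size)
        return scroll_id, self._docs[offset:offset + size]


def scroll_all(backend: FakeScrollBackend, size: int = 2) -> Iterator[Doc]:
    """Yield every document, fetching one page at a time."""
    scroll_id, hits = backend.search(size)
    while hits:  # an empty page means there is nothing left to fetch
        yield from hits
        scroll_id, hits = backend.scroll(scroll_id)


def map_over_index(f: Callable[[Doc], Any], backend: FakeScrollBackend, size: int = 2) -> List[Any]:
    """Apply f to every document in the index, in scroll order."""
    return [f(doc) for doc in scroll_all(backend, size)]
```

With five documents `{"id": 0}` to `{"id": 4}` and a page size of 2, `map_over_index(lambda d: d["id"] * 2, backend, size=2)` returns `[0, 2, 4, 6, 8]`. Against a real cluster the two backend methods would be replaced by the client's initial search call and its follow-up scroll call, and f would stay the same.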

Keynote at YDS 2015: Information Discovery, Partridge and Watson

Here is a recording of my recent keynote talk at York Doctoral Symposium on the power of Natural Language Processing through Watson and my academic/PhD topic, Partridge. 0-11 minutes: history of mankind, invention and the acceleration of scientific progress (warming people to the idea that farming out your scientific reading to a computer … Continue reading Keynote at YDS 2015: Information Discovery, Partridge and Watson

Posted in extraction, ibm, information, PhD, retrieval, scientific, watson, Work, yds

SAPIENTA Web Service and CLI

Hoorah! After a number of weeks I’ve finally managed to get SAPIENTA running inside docker containers on our EBI cloud instance. You can try it out at http://sapienta.papro.org.uk/. The project was previously running via a number of very precarious scripts that had a habit of stopping and not coming back up. Hopefully the new docker … Continue reading SAPIENTA Web Service and CLI

Posted in docker, PhD, script, web, websockets

SSSplit Improvements

Introduction: As part of my continuing work on Partridge, I’ve been working on improving the sentence-splitting capability of SSSplit, the component used to split academic papers from PLosOne and PubMedCentral into separate sentences. Papers arrive in our system as big blocks of text with the occasional diagram or formula and in order … Continue reading SSSplit Improvements

Posted in demo, improvements, java, PhD, regex, split, sssplit, test, Work

Tidying up XML in one click

When I’m working on Partridge and SAPIENTA, I find myself dealing with a lot of badly formatted XML. I used to manually run xmllint --format against every file before opening it but that gets annoying very quickly (even if you have it saved in your bash history). So I decided to write a Nemo script … Continue reading Tidying up XML in one click

Posted in PhD, processing, tidy, Work, xml