Google Scholar Metadata: Monitoring Papers with Google Spreadsheets

Published by Tomas Vitvar on July 10, 2009 in Ideas, Tools

Google Scholar is a great tool for monitoring citations of (your) papers. The major problem I have with it, though, is that it does not expose any metadata about papers and citations. I have thus created a simple solution that extracts papers’ titles and their citation counts and exposes them in an Atom feed. The nice thing is that all you need for that is Google Spreadsheets.

See my citation spreadsheet and the related Atom feed for more details. In case you want to have your own, just create a copy of the spreadsheet (File/Make a Copy), change the value of the name field or change the Google Scholar URL as you like (on the input sheet), and share the citations sheet as an Atom feed (Share/Publish as a Web Page, then select the citations sheet and the ATOM format). After that, just add the feed as a new subscription in your favorite feed reader. Note that the feed gets updated every 5 minutes (you cannot change this interval).

UPDATE: I am now displaying the citations of my papers on the citations tab of my publications page. I simply parse the Atom feed (XML) in JavaScript and render it as HTML. However, due to the current XHR cross-domain restrictions, I had to create a proxy in Apache on my own domain to access the feed URL on the Google domain.
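
The parse-and-render step mentioned in the update could be sketched roughly as follows. This is a minimal sketch, not necessarily the code used on the publications page. The feed layout (one <entry> per paper, the title in <title>, the citation count as plain text in <content>) and the function names parseEntries, decodeXml, escapeHtml and renderList are assumptions made for illustration. The regex-based extraction only suits a simple, well-formed feed; in a browser, DOMParser would be the more robust choice.

```javascript
// Decode the five predefined XML entities (ampersand last, to avoid double decoding).
function decodeXml(text) {
    return text.replace(/&lt;/g, "<").replace(/&gt;/g, ">")
        .replace(/&quot;/g, '"').replace(/&apos;/g, "'")
        .replace(/&amp;/g, "&");
}

// Extract (title, citation count) pairs from an Atom feed string.
// Assumption: each <entry> holds the paper title in <title> and the
// citation count as plain text in <content>; the real feed may differ.
function parseEntries(atomXml) {
    var entries = [];
    var entryRe = /<entry\b[^>]*>([\s\S]*?)<\/entry>/g;
    var match;
    while ((match = entryRe.exec(atomXml)) !== null) {
        var body = match[1];
        var title = /<title\b[^>]*>([\s\S]*?)<\/title>/.exec(body);
        var content = /<content\b[^>]*>([\s\S]*?)<\/content>/.exec(body);
        entries.push({
            title: title ? decodeXml(title[1].trim()) : "",
            // A non-numeric count falls back to 0.
            citations: content ? (parseInt(content[1], 10) || 0) : 0
        });
    }
    return entries;
}

// Escape text before placing it into HTML.
function escapeHtml(text) {
    return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Render the entries as a simple HTML list.
function renderList(entries) {
    var items = entries.map(function (e) {
        return "<li>" + escapeHtml(e.title) + " (" + e.citations + ")</li>";
    });
    return "<ul>" + items.join("") + "</ul>";
}
```

With the feed text fetched through the proxy, renderList(parseEntries(feedText)) yields an HTML string that can be assigned to the innerHTML of the target element.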

Exclude Self-traffic from your Website’s Access Reports

Published by Tomas Vitvar on January 13, 2009 in Ideas, Programming

I use Google Analytics to track my website’s traffic. Since I often use my website to quickly look up a paper I wrote and refer to it, I access it quite often. For this reason I need to exclude my own traffic from the reports generated by Google Analytics. Google Analytics suggests two ways to do this, but neither of them really suits me. The first option is to exclude all traffic from one or more IP addresses. I could set a filter to exclude traffic from my work and home networks, but such a filter would also exclude traffic from my colleagues, which I do not want (my workplace has a single public IP shared by all outgoing connections). The other option is to set a variable (cookie) on all pages you want to exclude and create a filter based on that variable. This option is not any better, as I would need to set the variable for every new page I create: call a specific JavaScript method when the page loads, deploy the page to my web server, access the page to set the cookie for my own access, and then remove the JavaScript method call and redeploy the page for public access.

Fortunately, a very simple solution came to my mind (it currently only works in Firefox). First, I create a custom variable in the Firefox configuration settings called general.useragent.extra.private and set its value to “my_agent” (please use your own unique identification; type about:config in your browser’s address bar to create and set the variable). This adds the “my_agent” string to the browser’s user agent identification, which JavaScript running in the browser can read from the userAgent property of the navigator object (navigator.userAgent). After that you just add a simple condition to the JavaScript code on your page that calls the Google Analytics tracking methods. The script could look like:

<script type="text/javascript">
    if (navigator.userAgent.indexOf('my_agent') == -1) {
        var pageTracker = _gat._getTracker("UA-xxx");
        pageTracker._trackPageview();
    }
</script>

Update: In Safari, you can set a custom user agent string by enabling the Develop menu (go to "Preferences->Advanced->Show Develop Menu in Menu Bar") and then choosing "User Agent->Other..." in the Develop menu to set the user agent string.
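
To make the user agent condition easy to test outside the browser, it can be pulled into a small function. This is only a sketch: shouldTrack and its parameters are names introduced here for illustration (they are not part of Google Analytics), and 'my_agent' stands for whatever unique marker you configured.

```javascript
// Returns true when the visit should be reported, i.e. when the
// user agent string does NOT contain the private marker.
// shouldTrack is a hypothetical helper, not a Google Analytics function.
function shouldTrack(userAgent, marker) {
    // Without a marker there is nothing to exclude, so track everything.
    if (!marker) {
        return true;
    }
    return String(userAgent).indexOf(marker) === -1;
}

// In the page, the helper would guard the usual tracking calls, e.g.:
// if (shouldTrack(navigator.userAgent, 'my_agent')) { /* call the tracker */ }
```

The empty-marker check matters because indexOf('') always returns 0, which would otherwise switch tracking off for every visitor.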

Success of a PhD Endeavor

Published by Tomas Vitvar on July 11, 2008 in Ideas, Research

What are the success factors in completing a PhD degree? There is no ideal student who knows what he/she wants from the very beginning till the end. However, there are several things a student should learn in order to be well prepared to work as a researcher in the future. Although a PhD degree is awarded based on a successful defense of a PhD thesis (a report on the research results that a student produces during his/her studies), a PhD student should also be active in the community, publish papers in conferences and journals, and, most importantly, do some innovative work. A PhD thesis is thus a report describing the results of PhD work, which reflects the student’s life.

In my view, you as a PhD student should ideally:

  • Know what to do. Before you start a PhD you should already know what your PhD should be about. This does not mean that the idea has to be clear, but the direction of your work should be. The direction usually depends on the research group you are working with; however, frequent changes of topic are not a good sign.
  • Be at the right place. It is important that you are affiliated with the right group, one that works in the area of your PhD topic. A student alone is unlikely to do valuable research, as research areas are usually too broad for one person to cover. Talking to professors and attending meetings, lectures, etc. is the most important way of getting enough background knowledge for your thesis.
  • Have the experience. Depending on the research topic, it is sometimes important that you have some experience with “real life”. Research is about creating new methods, technologies, or techniques which can be used for better solutions to real problems. A lack of real-world experience might leave the resulting work detached from practice.
  • Have motivation. Doing a PhD is a long way to go. Getting familiar with the field, learning how to publish and write, finding gaps to address, etc. are all important aspects of your research. It is very easy to lose motivation along the way. You may feel that what you do does not have any value. You may feel that you cannot do any innovative work, as what you do has already been done by hundreds of others. You may feel “down” when your paper is rejected by a workshop or a conference. At some point you will understand that all of this is part of learning how to do research, how to publish and how to write.
  • Keep deadlines. It is easy to say “there is enough time, I will do it tomorrow” or “there will be another opportunity to publish my paper”. Postponing your deadlines is the first step towards losing your way in your research and, most importantly, in completing your thesis. External deadlines, such as those set by conferences or journals, are very important because you cannot change them, so keeping them always brings you a step forward.
  • Know your supervisor. A good supervisor is one of the most important things in your PhD. A supervisor is an expert in your field who gives you feedback on your intermediate results and teaches you what technical quality means for your work. He/she should also provide you with access to the community, that is, introduce you to people and research groups and point you to publishing opportunities (well-established conferences, journals, magazines, etc.). The supervisor is also usually busy, as he/she might have several students, manage several projects, etc. Regardless of what your supervisor does or does not do, he/she is the person who approves your work and eventually your thesis. So it is important that you learn how to deal with your supervisor, i.e. what his/her requirements are and what you have to do to fulfill them.
  • Have enough time. Students are usually young, knowledgeable and enthusiastic, so it is easy to commit them to too many things. You can end up teaching, working on projects, managing projects, organizing conferences and meetings, or evangelizing your research field. All of these are important tasks that are certainly worth learning. However, you should keep them in line with your original PhD work and not commit to too many of them, as they can easily distract you from your work.

I have also seen students start their PhD for several different reasons. The first group of students just want to extend their student life: they feel they are still too “young” to start a “serious” life and want to stay in touch with the university and the student style of living. The second group are naturally born theoreticians and researchers who want to push their idea forward, make it right and make it real. The third group love expressing themselves in front of an audience; they love to teach and explain things to others. The fourth group want to get their degree because they think it will give them an advantage in finding a good position in the future. There is certainly a big overlap between these groups. What they all have in common is that, as a PhD student, you are supposed to learn what research is about so that you are well prepared to work as a researcher in the future.

RESTful Services and Semantic Descriptions

Published by Tomas Vitvar on January 10, 2008 in Ideas, Projects, Research

Today, we had a WSMO phone conference where we discussed semantic annotations for RESTful services. I presented the work done by Amit Sheth and his group on SA-REST (see the presentation below and my previous post).

RESTful services are usually described in free-text form in HTML, while service descriptions (i.e. service contracts) are not explicitly defined. In addition, when creating Web 2.0 applications (mashups), integrating the data produced or consumed by these services remains an open problem. A developer must either implement a mediator or change the implementation of a service (if possible) to conform to the integration needs. SA-REST introduces a novel approach to annotating RESTful service descriptions in HTML using microformats. Semantic descriptions can significantly improve data integration and the automation of the service lifecycle. SA-REST proposes to use W3C recommendations where possible, thus the annotation mechanism is based on RDFa and GRDDL.

SA-REST, however, does not define any forms of semantic descriptions but assumes that such descriptions will be reused. In this respect, SA-REST is analogous to the semantic annotation of WSDL using SAWSDL. In the WSMO WG, we have recently done work on WSMO-Lite (see our paper at the ECOWS 2007 conference), which defines a minimal lightweight service ontology that can be used to annotate WSDL services by means of SAWSDL. This is a new approach to augmenting existing service descriptions (within or outside of enterprises) in a bottom-up fashion. However, it is important to note that WSMO-Lite is independent of WSDL (and SAWSDL). In this respect, we plan to use WSMO-Lite as a concrete service ontology for the annotation of RESTful services, and possibly build on top of SA-REST. This will introduce a second annotation mechanism for WSMO-Lite, making it possible to use both WSDL and RESTful services as mechanisms for invocation and communication. We call this annotation mechanism MicroWSMO.

MicroWSMO together with WSMO-Lite are the core specifications of the upcoming EU-funded project SOA4ALL. The goal of this project is to enable SOA architectures in the large-scale Web environment, where semantics will play the central role in service provisioning, automation, and scalability.

Linux Freedom — Basis for Integrated Home Building on Open Source

Published by Tomas Vitvar on January 3, 2008 in Ideas, Linux

I have recently purchased a Linksys WRTSL54GS router, and installed a third-party firmware called OpenWrt.

It is really amazing, as I got an almost fully-fledged Linux running in a very small box (see for example the packages available for the White Russian distribution of OpenWrt). In fact, I had always been thinking about having a Linux server for my home network management; however, I did not like the idea of having a big server (due to the power consumption, space and possibly noise). I deliberately chose the WRTSL54GS as it is quite powerful compared to other devices in its class: it has 8MB of flash, 32MB of RAM and a USB port, and costs only around $100. The only drawback is that Linksys does not sell it in Europe, so I had to get one on eBay directly from the US. With shipment costs it came to something around 100€. Alternatives would be, for example, ASUS, FREECOM, Siemens, etc.; however, they are usually more expensive, less powerful, or already include an HDD. There is also an interesting book, Linksys WRT54G: Ultimate Hacking, describing various Linux distributions for various Linksys devices as well as hardware hacking tips and tricks. It is a must-have too if you plan to do more than just click around the router’s management web interface.

So my hands are free to configure my SOHO network directly using iptables, cron, samba, and other great stuff which OpenWrt offers. With more devices on my network it will be even more fun to manage it all…

Related to the need to integrate data from the various devices on a SOHO network, there is a joint initiative between DERI Galway and EPFL called semantic reality. The idea is to develop a technology for seamless integration of various devices (sensors) in ad-hoc networks. Although I am not personally involved in this project, I believe that Linux-based firmware provides great flexibility in functionality and can serve as the basis for integrated home building on open source software.

SA-REST: Semantic Annotations for RESTful Services

Published by Tomas Vitvar on November 8, 2007 in Ideas, Research

IEEE Internet Computing magazine published, in its November/December issue, an article by Amit Sheth et al. entitled SA-REST: Semantically Interoperable and Easier-to-Use Services and Mashups. They discuss how to enable semantic annotations for RESTful services in a way analogous to SAWSDL (see my previous post and article on SAWSDL). They define a very simple mechanism, based on RDFa and GRDDL, to mark input, output, lifting, lowering and fault in the REST specification, which is usually available as an XHTML page. The main point is that since REST providers usually describe their services in textual form on the web, there is no explicit, formal definition of the input, output or fault schemas for messages. SA-REST introduces a microformat-style semantic description embedded in the XHTML page that specifies the REST service.
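
To give a rough feeling for what such embedded annotations might look like to a consuming tool, here is a small sketch that collects them from an XHTML fragment. Everything specific in it is an assumption for illustration: the attribute layout (property="sarest:input" with the concept URI in content) is only modeled on the keywords mentioned above, the example.org URIs are placeholders, and extractAnnotations is a hypothetical helper. Please consult the article for the exact syntax.

```javascript
// Collect SA-REST-style annotations from an XHTML service description.
// Assumption: annotations appear as RDFa attributes of the form
// property="sarest:<keyword>" content="<URI>", in this attribute order.
function extractAnnotations(xhtml) {
    var annotations = [];
    var re = /property="sarest:([\w-]+)"\s+content="([^"]*)"/g;
    var m;
    while ((m = re.exec(xhtml)) !== null) {
        annotations.push({ keyword: m[1], reference: m[2] });
    }
    return annotations;
}

// Hypothetical service description; all URIs are placeholders.
var description =
    '<p about="http://example.org/search">' +
    'Input: <meta property="sarest:input" content="http://example.org/onto#Query"/> ' +
    'Output: <meta property="sarest:output" content="http://example.org/onto#Result"/>' +
    '</p>';

var found = extractAnnotations(description);
```

A real consumer would rather run a GRDDL transformation or an RDFa parser over the page than match attributes with a regular expression; the sketch only shows that the annotations live inside the human-readable description itself.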

I only wonder why the authors define input and output keywords for SA-REST and do not adopt the SAWSDL modelReference. The SAWSDL modelReference is a more generic annotation that you can use for any kind of service description, including the information model (as they do with input and output) as well as functional descriptions (capabilities such as preconditions and effects) or non-functional descriptions. In my opinion, it would also be handy if the annotation framework were the same as the one introduced by SAWSDL, as this would allow us to work with an independent semantic layer on top of technologies like WSDL and REST.

What I like in this work is how semantic annotations for services are done in a microformat style, as metadata about resources (in this case the XHTML describing a RESTful service). They use RDFa and GRDDL for that purpose. This approach very much complements our work on WSMO-Lite and is in line with what we plan to introduce in our conceptual models for services around WSMO. This will all happen in the EU FP7 project SOA4ALL and in the W3C Incubator Group called SWS-Testbed (Amit contributes to this group with semantic annotations for REST too).