
PHP Function Information Downloader

The Resulting XML Document from a PHP Downloader Request

I’ve spent much of the last day building a little application in preparation for more work on my final year project. As I’ve mentioned before, my project is to build an IDE for the web programming language PHP, so one of the things I have to be able to do is autocomplete and autocorrect PHP function names, and give the user information on functions.

To do this I need the information in the first place. After searching online for a machine-readable list of functions, their parameters and their descriptions (perhaps in JSON or XML format), I was unable to find anything of use, so I decided I would have to scrape this information from the PHP.net website myself.

Scraping for functions was somewhat easier than I thought it might be, because the people at PHP.net maintain a quick reference page available at http://www.php.net/quickref.php. After downloading this page, my program searches for anchor tags (links) in the unordered list element called ‘quickref_functions’. It checks that each link points to a page about a function, and not a page about a class, by ensuring the URL starts with “/manual/en/function”. I then grab the hyperlink reference from each anchor tag.
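The link-gathering step can be sketched in Python as below. This is only a rough illustration and not the code from my application: it assumes ‘quickref_functions’ is the id attribute of the list (it could equally be a class), the names QuickrefLinkParser and extract_function_urls are made up for this example, and the sample HTML is a hand-written stand-in for the real page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class QuickrefLinkParser(HTMLParser):
    """Collect function-page links inside <ul id="quickref_functions">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.ul_depth = 0  # > 0 while inside the target list (counts nested <ul>s)
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul":
            if self.ul_depth > 0:
                self.ul_depth += 1
            elif attrs.get("id") == "quickref_functions":  # assumption: it is an id
                self.ul_depth = 1
        elif tag == "a" and self.ul_depth > 0:
            href = attrs.get("href") or ""
            # Keep function pages only; class pages start with /manual/en/class
            if href.startswith("/manual/en/function"):
                self.urls.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "ul" and self.ul_depth > 0:
            self.ul_depth -= 1


def extract_function_urls(html, base_url="http://www.php.net/"):
    """Return absolute URLs of the function pages linked from the quickref list."""
    parser = QuickrefLinkParser(base_url)
    parser.feed(html)
    return parser.urls


# Hand-written sample standing in for the downloaded quickref page.
sample = """
<ul id="quickref_functions">
  <li><a href="/manual/en/function.abs.php">abs</a></li>
  <li><a href="/manual/en/class.arrayobject.php">ArrayObject</a></li>
</ul>
<ul><li><a href="/manual/en/function.acos.php">outside the list</a></li></ul>
"""
print(extract_function_urls(sample))
```

Only the abs link survives here: the class page is filtered out by its prefix, and the second list is ignored because it is not the quick reference list.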

Once I have parsed the entire page I am left with a list of URLs, each of which points to a page with all the information about a particular function. For example, the first URL I always have in the list is http://www.php.net/manual/en/function.abs.php, which gives all the information about the function abs().

I go through and download each of the pages at these URLs and parse them, taking out the function signature, which looks like this:

number abs ( mixed $number )

I also take out information about each parameter, the return type, the description and a link to the comments section. In the coming few days I hope to be able to parse out information allowing me to flag deprecated functions, provide popular comments about the function and show code examples, amongst other things.
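As a rough idea of how a signature like the one above could be broken into its parts, here is a minimal Python sketch. It is not the parser from my application: parse_signature and SIGNATURE_RE are names invented for this example, and it only copes with simple signatures of the form "type name ( type $param , ... )". Optional parameters written in square brackets, default values and by-reference markers would need extra handling. It also assumes that a function with no parameters is written with "void" between the parentheses.

```python
import re

# Hypothetical pattern for simple signatures such as "number abs ( mixed $number )"
SIGNATURE_RE = re.compile(
    r"^\s*(?P<return_type>\S+)\s+(?P<name>\w+)\s*\((?P<params>.*)\)\s*$"
)


def parse_signature(signature):
    """Split a simple signature into return type, function name and parameters."""
    match = SIGNATURE_RE.match(signature)
    if match is None:
        raise ValueError("unrecognised signature: %r" % signature)

    params = []
    raw_params = match.group("params").strip()
    # Assumption: "( void )" marks a function that takes no parameters
    if raw_params and raw_params != "void":
        for part in raw_params.split(","):
            param_type, _, param_name = part.strip().partition(" ")
            params.append({"type": param_type, "name": param_name.strip()})

    return {
        "return_type": match.group("return_type"),
        "name": match.group("name"),
        "params": params,
    }


print(parse_signature("number abs ( mixed $number )"))
```

For the abs signature this gives a return type of number, a name of abs and a single parameter of type mixed called $number.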

At the moment, once I have downloaded and parsed all 4700+ function pages, I output them to an XML document. Eventually I will insert all of them as records into a NoSQL database.
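Writing the parsed functions out as XML might look something like the following Python sketch. The element and attribute names (functions, function, returnType, description, parameters, parameter) are invented for this example and are not necessarily the layout my downloader produces, and the record for abs is placeholder data typed in by hand.

```python
import xml.etree.ElementTree as ET


def functions_to_xml(functions):
    """Serialise a list of parsed function records to an XML string.

    The element names are hypothetical; any consistent schema would do.
    """
    root = ET.Element("functions")
    for fn in functions:
        node = ET.SubElement(root, "function", name=fn["name"])
        ET.SubElement(node, "returnType").text = fn["return_type"]
        ET.SubElement(node, "description").text = fn["description"]
        params_node = ET.SubElement(node, "parameters")
        for param in fn["params"]:
            ET.SubElement(
                params_node, "parameter", type=param["type"], name=param["name"]
            )
    return ET.tostring(root, encoding="unicode")


# Placeholder record standing in for one parsed function page.
sample_functions = [
    {
        "name": "abs",
        "return_type": "number",
        "description": "Absolute value",
        "params": [{"type": "mixed", "name": "$number"}],
    }
]
print(functions_to_xml(sample_functions))
```

Keeping each function as a plain dictionary until the last moment should also make the later move to a NoSQL database easier, since the same records could be inserted more or less as they are.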

Because downloading and parsing all the information takes quite a while, I will ship my IDE with a pre-made database of PHP functions, but will allow users to attempt a redownload of all functions via an advanced settings panel, in order to pick up new functions or changes to old ones. I think this will be a feature that only the most advanced users use, if anyone does at all, but it is nonetheless an interesting unique selling point for my product.

Danny.


Posting Source Code on WordPress

I’ve posted a lot of non-computer-science blog posts recently, but this is supposed to be “the blog of a budding computer scientist!”. So here’s a post which I hope will help a few of my fellow computer science bloggers who use the WordPress blogging platform (which, by the way, is fantastic).

Quite often when I see blog posts that contain source code, it’s formatted in an annoying way, doesn’t have any colour coding or, in the worst case, is a screenshot of an IDE. It’s impossible for people to copy your code if you take a screenshot of it, and in my experience if you post your code online you want people to copy and adapt it for their own use.

On WordPress you could use <pre> tags in the HTML editor to make code boxes like the following:

//Here's some <pre> formatted code
public static void Main()
{
      Console.WriteLine("Hello WordPress");
}

That’s all well and good: it keeps the code separate from the content of the blog post and gives it a different font and background colour to differentiate it as code. However, those of us who are used to working in an IDE such as Visual Studio, with its syntax highlighting, may find it less friendly to read. This is where one of WordPress’ best features comes in.

The [ sourcecode ] tag allows you to post fully colour-coded source code in a variety of languages including C#, C++, JavaScript and XML. It also adds some other features like line numbering, code printing, copy to clipboard and view source. It looks like this:

//Here's some [ sourcecode ] formatted code
public static void Main()
{
      Console.WriteLine("Hello WordPress");
}

Much better!

All you have to do is wrap your code like so:

[ sourcecode language="LANGUAGECODEHERE" ]
          //Code here
[ /sourcecode ]

Write it without the space in front of sourcecode (which I’ve had to put in to prevent WordPress from actually making it into a source code box). You then have to replace LANGUAGECODEHERE with the code corresponding to the programming language you are posting:

  • C#’s code is “csharp”
  • XML’s code is “xml”
  • PHP’s code is “php”
  • Java’s code is “java”

You can see all 29 codes on this helpful WordPress help page.

I hope this helps a few of my fellow CS bloggers! 🙂

Danny