Experiences of developing under and porting to Windows

1 Introduction

These notes were written to record my specific experiences of getting a particular set of code working under the Microsoft Windows XP operating system. The code in question consisted of components written in Tcl and Python.

2 Development environment

I use the Windows command prompt (cmd.exe, often called the "DOS prompt") as a substitute for the UNIX command prompt. To ensure that the Python and Tcl interpreters can be run directly from it, I made sure that the folders containing the appropriate executables were added to my Path.

To do this, I selected Start/Control Panel/System, clicked on the Advanced tab, and selected the Environment Variables button. This should bring up a dialogue for setting and editing the environment variables.

To add to the Path, highlight this variable in the list and then click on "Edit". You can then check that the Path contains the folders for Tcl and Python. If it does not, add them by appending the appropriate folders, using a semicolon as the separator, e.g.:

...;C:\Python25;C:\Program Files\Tcl
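To check the result from a script rather than by eye, a small Python sketch like the following splits a Windows Path value on ";" and reports which of the wanted folders are missing. The function name missing_from_path is my own, hypothetical choice, and the folder names are just the example values above; adjust them to your own installation.

```python
import os

def missing_from_path(wanted, path_value=None):
    """Return the folders in `wanted` that are not listed in a Windows Path value."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def norm(folder):
        # Windows folder names are case-insensitive, and a trailing "\" is optional
        return folder.rstrip("\\").lower()

    # Windows separates Path entries with ";" (os.pathsep on that platform)
    present = {norm(entry) for entry in path_value.split(";") if entry}
    return [folder for folder in wanted if norm(folder) not in present]

if __name__ == "__main__":
    print(missing_from_path([r"C:\Python25", r"C:\Program Files\Tcl"]))
```

An empty list means both folders are already on the Path.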

3 Batch files

The applications that I've been working with were primarily developed under UNIX (with an eye on making them work later under MS Windows), and rely on a variety of different environment variables being set.

One way to replicate this is to set the same variables globally, using the Environment Variables dialogue described above. However, my preferred method (for now at least) is to create a batch file for each application, which sets the variables locally and then executes the appropriate command to launch the application.

Doing this required learning some batch file syntax and conventions.

3.1 Naming

Batch files should have a ".bat" or ".cmd" extension.

3.2 Setting variables

Within a batch file you can use the "setlocal" and "endlocal" commands to define a context for (re)setting environment variables, e.g.:

setlocal
set DBCCP4I_TOP=C:\CCP4\pjb93\
set Path=%Path%;C:\Python25
endlocal

Note that surrounding a name with "%" characters causes it to be treated as a variable reference, and the variable's value is substituted in its place (so %Path% above expands to the current value of Path).

Note also that Windows uses a semicolon (";") as the path separator, whereas UNIX uses a colon (":").

3.3 Running a program

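Use the "call" command to run a program. Strictly, "call" is only required when running another batch file (without it, control does not return to the calling batch file); an executable such as python.exe can also be run simply by giving its name, but "call" does no harm. For example: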

call python myscript.py

3.4 Echoing to DOS window

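Use the "echo" command, which behaves much like its UNIX counterpart.

By default, each command in a batch file is echoed to the window before it is executed; this can be turned off with "@echo off". The "echo off" part stops subsequent commands from being echoed, and the leading "@" suppresses the echoing of that line itself. Messages printed explicitly with "echo" (such as "Starting dbviewer" in the example below) are still displayed.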

3.5 Using batch file parameters

It is possible to access parameters specified when the batch file is executed, using the %x notation. "x" is a number from 0 to 9. %0 is the name of the batch file, and %1 ... %9 are the first 9 arguments passed to the batch file. (To access the 10th and beyond, use the "shift" command - I haven't investigated this).

The %* notation is a special form which references all the arguments except for %0.

An example (from the documentation): if the following command was run:

mybatch.bat C:\folder1 D:\folder2

then within the batch file these parameters could be accessed e.g. using the line:

xprog %1 %2

which would be expanded to:

xprog C:\folder1 D:\folder2
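To make the substitution rule concrete, here is a small Python sketch that mimics it: %0 to %9 are replaced by the corresponding argument, a missing argument becomes an empty string, and %* becomes all arguments after %0. This is a simplified model of my own (expand_params is a hypothetical helper); it ignores "%%", the "%~1" modifiers and the effect of "shift", and it only approximates %*, which in cmd.exe reproduces the original argument text.

```python
import re

# Matches %0 ... %9 and %* (simplified: "%%", "%~1" modifiers etc. are not handled)
_PARAM = re.compile(r"%(\*|[0-9])")

def expand_params(line, argv):
    """Substitute batch-style parameters in `line`; argv[0] plays the role of %0."""
    def substitute(match):
        token = match.group(1)
        if token == "*":
            # %* is every argument except %0 (approximated by joining with spaces)
            return " ".join(argv[1:])
        index = int(token)
        # A parameter that was not supplied expands to an empty string
        return argv[index] if index < len(argv) else ""
    return _PARAM.sub(substitute, line)

argv = ["mybatch.bat", r"C:\folder1", r"D:\folder2"]
print(expand_params("xprog %1 %2", argv))  # xprog C:\folder1 D:\folder2
```

Using a function as the replacement means the backslashes in the arguments are inserted literally rather than being treated as escape sequences.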

3.6 Comments

Use "rem" to add comments to the batch file, e.g.:

rem Batch script for dbviewer

3.7 An Example

The following is an example of a small batch file that is used to launch a Tcl program:

@echo off
rem Batch script for dbviewer
rem Generated automatically at 13:38:14 15 May 07
echo Starting dbviewer
setlocal
set DBCCP4I_TOP=\CCP4\pjb93\BIOXHIT\Bioxhit_db\dbccp4i
set PYTHONPATH=%PYTHONPATH%;%DBCCP4I_TOP%\dbccp4i;%DBCCP4I_TOP%\ClientAPI
call wish %DBCCP4I_TOP%\application\viewer.tcl %1
endlocal
echo Finishing dbviewer

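The setlocal/endlocal pair creates a "local environment", similar to a UNIX subshell, in which environment variables can be created or modified so that the program runs correctly. Any changes are discarded when endlocal is reached (or when the batch file finishes).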
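When a program is launched from Python rather than from a batch file, a comparable effect can be sketched with the subprocess module: the child is given a modified copy of os.environ, so the calling script's own environment is left unchanged. The function name run_with_local_env and its parameters are my own, hypothetical choices, not part of the applications described here.

```python
import os
import subprocess
import sys

def run_with_local_env(cmd, set_vars=None, pythonpath_extra=()):
    """Run `cmd` (waiting for it to finish) in a modified copy of the environment.

    As with setlocal/endlocal, the changes are seen only by the child process;
    os.environ in the calling script is left untouched.
    """
    env = os.environ.copy()
    env.update(set_vars or {})
    # Extend PYTHONPATH using the platform's separator (";" on Windows, ":" on UNIX)
    entries = [p for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    entries.extend(pythonpath_extra)
    if entries:
        env["PYTHONPATH"] = os.pathsep.join(entries)
    return subprocess.call(cmd, env=env)

if __name__ == "__main__":
    # Demo: the child sees DBCCP4I_TOP, the parent does not
    child = [sys.executable, "-c", "import os; print(os.environ.get('DBCCP4I_TOP'))"]
    run_with_local_env(child, {"DBCCP4I_TOP": r"\CCP4\pjb93\BIOXHIT\Bioxhit_db\dbccp4i"})
    print(os.environ.get("DBCCP4I_TOP"))  # still unset in the parent (unless set beforehand)
```

For the dbviewer example, cmd would be the wish command followed by the path to viewer.tcl (built with os.path.join), with DBCCP4I_TOP in set_vars and the dbccp4i and ClientAPI folders in pythonpath_extra.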

4 Programming Tips

4.1 Path separators

UNIX and Windows path separators are different; however, Python and Tcl both provide functions that work around these differences. You should therefore avoid hardcoding paths containing directory separators, and use these functions instead.

For example, instead of:

set my_path [subst $base]/foo/bar

use the "file join" command:

set my_path [file join $base foo bar]

In Python, use the "os.path.join" function, e.g.:

import os
my_path = os.path.join(base, "foo", "bar")
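On Windows os.path is the ntpath module, and on UNIX it is posixpath. Both modules can be imported on either platform, which gives a quick way to see what the same join produces on each system. The helper name joined_both_ways is hypothetical.

```python
import ntpath     # the implementation behind os.path on Windows
import posixpath  # the implementation behind os.path on UNIX

def joined_both_ways(base, *parts):
    """Return (windows_result, unix_result) for the same path join."""
    return ntpath.join(base, *parts), posixpath.join(base, *parts)

print(joined_both_ways("base", "foo", "bar"))
# ('base\\foo\\bar', 'base/foo/bar')
```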

However, there are some gotchas, particularly for Tcl scripts, because the Windows separator ("\") is also Tcl's escape character. For example, a string such as "\my\string" is transformed into "mystring", i.e. the path separators are substituted away.

For Tcl, therefore, it is recommended that you use the alternative separator "/" (which happens to be identical to the UNIX one), since Tcl accepts it on Windows too. This doesn't appear to cause problems provided that the same convention is used consistently within each path, so avoid mixing "\" and "/" separators in the same name.
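Python string literals have a milder version of the same gotcha: backslash sequences such as "\t" and "\n" in an ordinary string are turned into control characters. The sketch below shows the problem and two workarounds, a raw string or forward slashes converted to native separators with ntpath.normpath (which also tidies up mixed separators). The path itself is a made-up example.

```python
import ntpath

bad = "C:\temp\new"      # "\t" becomes a tab and "\n" a newline
raw = r"C:\temp\new"     # raw string: backslashes are kept as typed
slashes = "C:/temp/new"  # forward slashes, as recommended for Tcl above

print(len(bad), len(raw))  # 9 11
# normpath converts "/" (and mixed separators) to the Windows separator
print(ntpath.normpath(slashes) == raw)  # True
```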

4.2 Username and home directory

In UNIX the USER environment variable gives the user's login name, and the HOME environment variable gives their home directory. Under Windows XP, the USERNAME and USERPROFILE variables perform the equivalent functions.

Aside: it seems that in Tcl the tcl_platform array also has a "user" element that gives the username, and that this works for both Windows and Linux.

The Microsoft Windows XP Professional Product Documentation has a Command shell overview that includes a list of the available environment variables (in the section "Using environment variables with Cmd.exe").
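A script that needs these values on both platforms can simply try the UNIX names first and fall back to the Windows XP ones. This is a minimal sketch (user_and_home is a hypothetical helper); the order is my own choice, and it means a HOME variable set on Windows (e.g. by UNIX-style tools) would take precedence. The standard library also offers getpass.getuser(), which consults several of these variables, and os.path.expanduser("~") for the home directory.

```python
import os

def user_and_home(env=os.environ):
    """Return (username, home directory), trying UNIX names first, then Windows XP's."""
    user = env.get("USER") or env.get("USERNAME")
    home = env.get("HOME") or env.get("USERPROFILE")
    return user, home

print(user_and_home())
```

Passing a plain dictionary as env makes it easy to check the behaviour for each platform, e.g. with made-up values such as {"USERNAME": "pjb93", "USERPROFILE": r"C:\Documents and Settings\pjb93"}.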

4.3 Executable programs

In UNIX the names of executable programs are taken literally, so an executable called "foo.exe" is entirely distinct from one called "foo".

Under Windows, however, "foo" and "foo.exe" are interchangeable: when searching for a command, the command shell tries appending each of the extensions listed in the PATHEXT environment variable (such as .COM, .EXE, .BAT and .CMD), so typing "foo" runs "foo.exe" automatically.
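The extension lookup can be illustrated with a simplified Python model. candidate_names is a hypothetical helper, the default list is a shortened subset of the XP default, and the real shell has extra rules not modelled here. In practice, shutil.which (Python 3.3 and later) already applies PATHEXT when searching the Path on Windows.

```python
import os

# A shortened version of the XP default; the real list is read from %PATHEXT%
DEFAULT_PATHEXT = ".COM;.EXE;.BAT;.CMD"

def candidate_names(command, pathext=None):
    """Simplified model of the file names Windows tries when `command` is typed."""
    if pathext is None:
        pathext = os.environ.get("PATHEXT", DEFAULT_PATHEXT)
    extensions = [ext for ext in pathext.split(";") if ext]
    known = [ext.upper() for ext in extensions]
    # A name that already ends in one of the listed extensions is used as typed
    if os.path.splitext(command)[1].upper() in known:
        return [command]
    # Otherwise each extension is tried in turn, in PATHEXT order
    return [command + ext for ext in extensions]

print(candidate_names("foo", DEFAULT_PATHEXT))
# ['foo.COM', 'foo.EXE', 'foo.BAT', 'foo.CMD']
```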

4.4 Platform Information

Tcl has a global array called tcl_platform, whose elements hold useful information about the current platform. The following script is a useful way to examine these values:

foreach name [array names tcl_platform] {
  puts "$name \t $tcl_platform($name)"
}

On my Windows XP machine this gives the following output:

osVersion 	 5.1
byteOrder 	 littleEndian
threaded 	 1
machine 	 intel
platform 	 windows
os 	 Windows NT
user 	 pjb93
wordSize 	 4

while on my SuSE Linux 9.0 installation (running under VMware) the same script gives:

osVersion 	 2.1.4-99-default
byteOrder 	 littleEndian
machine 	 i686
platform 	 unix
os 	 Linux
user 	 pjx
wordSize 	 4

Note that both outputs were produced with Tcl 8.4, yet not all elements appear to be available on all systems (for example, "threaded" is missing from the Linux output). The major/minor version number of the Tcl interpreter is available in the tcl_version variable, e.g. via "puts $tcl_version" inside tclsh.

Python offers the sys.platform attribute, which identifies the system that the interpreter is running under, and the os.name attribute, which gives the name of the operating-system-dependent module that has been imported (e.g. "nt" on Windows, "posix" on UNIX). Note that sys.platform is "win32" for all flavours of Windows.

The os module also defines a number of "portability constants", for example os.sep defines the string to separate directories and os.pathsep defines the string to separate entries in the system path.
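A rough Python counterpart of the tcl_platform loop above can collect these attributes in one place. This is a sketch; the selection of values (and the platform_summary name) is my own, and sys.byteorder is included only as the nearest equivalent of Tcl's byteOrder element.

```python
import os
import sys

def platform_summary():
    """Collect a few platform values, similar in spirit to Tcl's tcl_platform array."""
    return {
        "sys.platform": sys.platform,    # e.g. "win32" on Windows, "linux2"/"linux" on Linux
        "os.name": os.name,              # "nt" on Windows, "posix" on UNIX
        "os.sep": os.sep,                # separator between directory names
        "os.pathsep": os.pathsep,        # separator between entries in the Path
        "sys.byteorder": sys.byteorder,  # "little" or "big", cf. Tcl's byteOrder
    }

for name, value in sorted(platform_summary().items()):
    print("%s \t %r" % (name, value))
```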

4.5 Starting background processes in Tcl and Python

In Tcl, exec behaves in the same way on UNIX and Windows systems: if the last argument of the command being exec'ed is an ampersand ("&"), the command is run as a background process, and exec returns immediately with the process ID(s) rather than the command's output.

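In Python you can use the os.spawnl(...) function with the os.P_NOWAIT flag to start a process in the background on both UNIX and Windows. Note that os.spawnl does not search the Path, so it is safest to give the full path to the interpreter (available as sys.executable). With os.P_NOWAIT the return value is the ID of the new process (a process handle on Windows) rather than an exit status, e.g.:

import os, sys
pid = os.spawnl(os.P_NOWAIT, sys.executable, "python", "bgscript.py")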
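The subprocess module (available since Python 2.4) offers an alternative: subprocess.Popen returns as soon as the child has started, so the child runs in the background until you choose to wait for it. A minimal sketch, with start_in_background as a hypothetical helper name:

```python
import subprocess
import sys

def start_in_background(script, *args):
    """Start `script` under the current Python interpreter without waiting for it."""
    # sys.executable avoids relying on "python" or "python.exe" being on the Path
    return subprocess.Popen([sys.executable, script] + list(args))
```

For example, proc = start_in_background("bgscript.py") starts the script and returns at once; proc.pid gives the process ID, proc.poll() returns None while the child is still running, and proc.wait() blocks until it finishes.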

5 UNIX versus Windows line-endings

UNIX and Windows use different control characters to denote line endings in text files. UNIX uses a line-feed character (LF) while Windows uses carriage-return followed by a line-feed (CR+LF).

The effect of this difference can be seen, for example, if you take a text file written under Windows, move it to a Linux machine and then load it into emacs: the lines may appear to end with "^M", which is how emacs shows the extra carriage-return character used by Windows. Going the other way, a UNIX text file may appear as a single long line when viewed in some Windows programs, such as Notepad.

As well as causing problems when source code files are transferred between systems, the line-ending differences can also trip up programs that read text files line by line by splitting on newline (LF) characters: each line of a Windows file read this way keeps a trailing carriage return.
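A short Python illustration of the problem, using made-up data as it would be written by a Windows program: splitting on LF alone leaves the CR attached, whereas splitlines() recognises LF, CR+LF and CR as line endings.

```python
data = b"first line\r\nsecond line\r\n"  # CR+LF line endings, as written on Windows

# Splitting on LF alone leaves the CR attached to each line
print(data.split(b"\n"))   # [b'first line\r', b'second line\r', b'']

# splitlines() treats LF, CR+LF and CR all as line endings
print(data.splitlines())   # [b'first line', b'second line']
```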

For a more substantial treatment of the issues, background and so forth, see the Wikipedia entry on Newline.

5.1 Converting Files

There are a number of ways to convert files.

Python includes a couple of scripts in its Tools/scripts/ directory that convert between different line endings, for example crlf.py, which converts Windows (CR+LF) line endings to UNIX (LF) ones. Its usage under UNIX is:

crlf.py file1 [file2 [...]]

There are also UNIX utilities like dos2unix, unix2dos etc.

Under Windows, UNIX line endings can be converted by loading the file into the "Edit" program and then resaving with the same name.
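If none of these tools is to hand, a small converter is easy to sketch in Python. This is my own minimal version, similar in spirit to crlf.py but not taken from it; the function names and the to_unix.py script name are hypothetical. It works on bytes in binary mode, so no automatic translation interferes, and can convert in either direction.

```python
import re
import sys

# CR+LF is listed first so that it is treated as one ending, not CR then LF
_LINE_END = re.compile(rb"\r\n|\r|\n")

def convert_line_endings(data, newline=b"\n"):
    """Return `data` (bytes) with every CR+LF, CR or LF ending replaced by `newline`."""
    # A function replacement inserts `newline` literally (no escape processing)
    return _LINE_END.sub(lambda match: newline, data)

def convert_file(path, newline=b"\n"):
    """Rewrite a file in place; return True if its contents changed."""
    with open(path, "rb") as f:  # binary mode: read the raw line endings
        data = f.read()
    converted = convert_line_endings(data, newline)
    if converted != data:
        with open(path, "wb") as f:
            f.write(converted)
    return converted != data

if __name__ == "__main__":
    # Usage: python to_unix.py file1 [file2 ...]  (prints the names of changed files)
    for name in sys.argv[1:]:
        if convert_file(name):
            print(name)
```

Passing newline=b"\r\n" converts in the other direction, from UNIX to Windows line endings.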

5.2 Reading Files

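Both Tcl and Python can deal with the different line endings automatically when working with text: on input, Tcl channels use "-translation auto" by default, and Python does the same when a file is opened with universal newline support (mode "rU" in Python 2; the default text mode in Python 3); on output, both write the platform's native line ending. However, you need to take care if you are reading "raw" data from files (e.g. in binary mode), for example if files from one platform might be read on another and you are trying to match line endings with regular expressions.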

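The os.linesep attribute in Python gives the string used to terminate lines on the current platform ("\r\n" on Windows, "\n" on UNIX). Note that the Python documentation advises against using it as the line terminator when writing files opened in text mode; use "\n" and let the automatic translation take care of it.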
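Because os.linesep only describes the current platform, a regular expression that matches a file from either platform needs to accept every form of line ending. A minimal sketch (split_any_newline is a hypothetical helper name):

```python
import re

# Any of the common line endings; "\r\n" is listed first so that it is
# treated as a single ending rather than a CR followed by an LF
LINE_END = re.compile(r"\r\n|\r|\n")

def split_any_newline(text):
    """Split raw text into lines, whichever platform wrote it."""
    return LINE_END.split(text)

print(split_any_newline("unix\nwindows\r\ncr-only\rend"))
# ['unix', 'windows', 'cr-only', 'end']
```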



P.J.Briggs@dl.ac.uk