What is a compiler?

A compiler is necessary to make your source code (.c, .cpp, or .cc files) into a running program. If you’re just starting out, you’ll need to make sure that you have one before you start programming. There are many compilers available on the internet and sold commercially in stores or online. If you have Mac OS X, Linux, or another *nix variant (such as Unix or FreeBSD), you likely have a compiler such as gcc or g++ installed already.

Compiler terminology

  • Compile: Colloquially, to convert a source code file into an executable; strictly speaking, compilation is only an intermediate step
  • Link: The act of taking compiled code and turning it into an executable
  • Build: The process of creating the end executable (what is often colloquially referred to as compilation). Tools exist to help reduce the complexity of the build process (makefiles, for instance).
  • Compiler: Generally, “compiler” refers to both a compiler and a “linker”
  • Linker: The program that generates the executable by linking
  • IDE: Integrated Development Environment, a combination of a text editor and a compiler, such that you can compile and run your programs directly within the IDE. IDEs usually have facilities to help you quickly jump to compiler errors.

    Understanding the Compilation Process

    What compilers are available?

      Windows/DOS

    • Borland Find out how to download and set up Borland’s free command-line compiler
    • DJGPP Read about DJGPP, a DOS-based compiler
    • Dev-C++ and Digital Mars Read about Dev-C++, a good Windows-based compiler, and Digital Mars

      Windows Only

    • Microsoft Visual C++ Read about Visual C++

      *nix

    • g++ is a C++ compiler that comes with most *nix distributions.
    • gcc is a C compiler that comes with most *nix distributions.

      Macintosh

    • Apple’s own Macintosh Programmer’s Workshop is a compiler I’ve never used, but it comes directly from Apple and is free.
    • Codewarrior My experiences with Codewarrior are limited to Java programming, though it’s gotten good reviews in the past. It’s a full IDE rather than just a compiler, meaning that it has a text editor and debugger integrated with the compiler so you can do all your work from one place.

    Binary Trees: Part 1

    The binary tree is a fundamental data structure in computer science, useful for rapidly storing sorted data and rapidly retrieving it. A binary tree is composed of nodes (this tutorial also calls them leaves), each of which stores data and links to up to two child nodes, which can be visualized as sitting below the parent node, one to the left and one to the right. The relationship between a parent node and its children is what makes the binary tree so efficient: the child on the left has a lesser key value (i.e., the value used to search for a node in the tree), and the child on the right has an equal or greater key value. As a result, the nodes on the far left of the tree hold the lowest values, and the nodes on the far right hold the greatest. More importantly, each child node is itself the root of a new, smaller binary tree. Because of this, it is easy to access and insert data using search and insert functions called recursively on successive nodes.
    The typical graphical representation of a binary tree is essentially that of an upside down tree. It begins with a root node, which contains the original key value. The root node has two child nodes; each child node might have its own child nodes. Ideally, the tree would be structured so that it is a perfectly balanced tree, with each node having the same number of child nodes to its left and to its right. A perfectly balanced tree allows for the fastest average insertion of data or retrieval of data. The worst case scenario is a tree in which each node only has one child node, so it becomes as if it were a linked list in terms of speed. The typical representation of a binary tree looks like the following:

               10
             /    \
            6      14
           / \    /  \
          5   8  11  18

    The node storing the 10, represented here merely as 10, is the root node. It links to the left and right child nodes, with the left node storing a lower value than the parent node and the right node storing a greater value. Notice that if one removed the root node and its right subtree, the node storing the value 6 would be the root of a new, smaller binary tree.
    The structure of a binary tree makes the insert and search functions simple to implement using recursion, and the two functions are very similar. Inserting data involves searching for an unused position in the tree that follows the rules for placing nodes: a lower value goes to the left of a node, and a greater or equal value goes to the right. The insert function is generally recursive: it keeps moving down the levels of the tree until it finds such a position. At each step it checks whether the current node is empty; if so, it stores the data there along with the key value (in most implementations an empty node is simply a NULL pointer in the parent node, so the function also has to create the node). If the node is already filled, the function compares the key value to be inserted with the key value of the current node: if it is less, insert is called recursively on the left child node; if it is greater than or equal, insert is called recursively on the right child node.
    The search function works in a similar fashion. It checks whether the key value of the current node is the value being searched for. If not, it checks whether the value is less than the value of the node, in which case it is called recursively on the left child node; if the value is greater, it is called recursively on the right child node. Of course, it is also necessary to check that the left or right child node actually exists before following it.
    Because a balanced binary tree with n nodes has about log (base 2) n layers, the average search takes on the order of log (base 2) n steps. Filling a binary tree with n values therefore takes roughly n * log (base 2) n steps. Let’s take a look at the necessary code for a simple implementation of a binary tree. First, it is necessary to have a struct, or class, defined as a node.

    struct node
    {
      int key_value;
      struct node *left;
      struct node *right;
    };

    The struct has the ability to store the key_value and contains the two child nodes which define the node as part of a tree. In fact, the node itself is very similar to the node in a linked list. A basic knowledge of the code for a linked list will be very helpful in understanding the techniques of binary trees. Essentially, pointers are necessary to allow the arbitrary creation of new nodes in the tree.

    There are several important operations on binary trees, including inserting elements, searching for elements, removing elements, and deleting the tree. We’ll look at three of those four operations in this tutorial, leaving removing elements for later.

    We’ll also need to keep track of the root node of the binary tree, which will give us access to the rest of the data:

    struct node *root = 0;

    It is necessary to initialize root to 0 so that the other functions can recognize that the tree does not yet exist. The destroy_tree function shown below frees all of the nodes in the tree stored under the node passed in as leaf.

    #include <stdlib.h>  /* for free() */

    void destroy_tree(struct node *leaf)
    {
      if( leaf != 0 )
      {
          /* free both subtrees before the node that points to them */
          destroy_tree(leaf->left);
          destroy_tree(leaf->right);
          free( leaf );
      }
    }

    The function destroy_tree goes to the bottom of each part of the tree (that is, it keeps descending while there is a non-null node), deletes that leaf, and then works its way back up. It deletes the leftmost node, then the right child of that node’s parent, then the parent itself; it then works its way back to the other child of the parent of the node it just deleted, and continues in this way up to the node on which destroy_tree was originally called. In the example tree above, the order of deletion of nodes would be 5 8 6 11 18 14 10. Note that it is necessary to delete all the child nodes to avoid wasting memory.
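The deletion order described above is what is usually called a post-order traversal. As a minimal sketch of that order (the helper name `record_postorder` and its `out` and `count` parameters are made up for illustration and are not part of the tutorial's code), the same recursion can record keys instead of freeing nodes:

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int key_value;
    struct node *left;
    struct node *right;
};

/* Hypothetical helper: visits nodes in the same order destroy_tree frees
   them (left subtree, right subtree, then the node itself) and appends
   each key to out. Returns the updated number of keys written.
   The caller must make out large enough for every node in the tree. */
size_t record_postorder(const struct node *leaf, int *out, size_t count)
{
    assert(out != NULL);
    if (leaf != NULL) {
        count = record_postorder(leaf->left, out, count);
        count = record_postorder(leaf->right, out, count);
        out[count++] = leaf->key_value;
    }
    return count;
}
```

Called on the root of the example tree with `count` set to 0, this fills `out` with 5 8 6 11 18 14 10, the same order in which destroy_tree frees the nodes.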

    The following insert function will create a new tree if necessary; it relies on pointers to pointers in order to handle the case of a non-existent tree (the root pointing to NULL). In particular, by taking a pointer to a pointer, it is possible to allocate memory if the root pointer is NULL.

    void insert(int key, struct node **leaf)
    {
        if( *leaf == 0 )
        {
            *leaf = malloc( sizeof( struct node ) );
            if( *leaf == 0 )
                return; /* out of memory: nothing was inserted */
            (*leaf)->key_value = key;
            /* initialize the children to null */
            (*leaf)->left = 0;
            (*leaf)->right = 0;
        }
        else if(key < (*leaf)->key_value)
        {
            insert( key, &(*leaf)->left );
        }
        else if(key > (*leaf)->key_value)
        {
            insert( key, &(*leaf)->right );
        }
    }

    The insert function searches, moving down the tree of children nodes, following the prescribed rules, left for a lower value to be inserted and right for a greater value, until it reaches a NULL node–an empty node–which it allocates memory for and initializes with the key value while setting the new node’s child node pointers to NULL. After creating the new node, the insert function will no longer call itself. Note, also, that if the element is already in the tree, it will not be added twice.

    struct node *search(int key, struct node *leaf)
    {
      if( leaf != 0 )
      {
          if(key==leaf->key_value)
          {
              return leaf;
          }
          else if(key<leaf->key_value)
          {
              return search(key, leaf->left);
          }
          else
          {
              return search(key, leaf->right);
          }
      }
      else return 0;
    }

    The search function shown above recursively moves down the tree until it either reaches a node with a key value equal to the value being searched for or reaches an empty (NULL) node, meaning that the value is not stored in the binary tree. It returns a pointer to the matching node, or 0 if the key was not found, and each recursive call passes that result back to the instance of the function that called it.
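To see how insertion order affects the shape of the tree, here is a self-contained sketch. The `bst_` names are hypothetical and chosen so they do not clash with the tutorial's functions; the insert follows the same pointer-to-pointer idea, and the height and in-order helpers are additions for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int key_value;
    struct node *left;
    struct node *right;
};

/* Inserts key below *leaf; duplicates are ignored.
   Returns 0 on success, -1 if malloc fails. */
int bst_insert(int key, struct node **leaf)
{
    assert(leaf != NULL);
    if (*leaf == NULL) {
        *leaf = malloc(sizeof **leaf);
        if (*leaf == NULL)
            return -1;
        (*leaf)->key_value = key;
        (*leaf)->left = NULL;
        (*leaf)->right = NULL;
        return 0;
    }
    if (key < (*leaf)->key_value)
        return bst_insert(key, &(*leaf)->left);
    if (key > (*leaf)->key_value)
        return bst_insert(key, &(*leaf)->right);
    return 0; /* key already present */
}

/* Number of levels in the tree; an empty tree has height 0. */
int bst_height(const struct node *leaf)
{
    if (leaf == NULL)
        return 0;
    int l = bst_height(leaf->left);
    int r = bst_height(leaf->right);
    return 1 + (l > r ? l : r);
}

/* Left subtree, node, right subtree: writes keys in ascending order.
   Returns the updated number of keys written to out. */
size_t bst_inorder(const struct node *leaf, int *out, size_t count)
{
    if (leaf != NULL) {
        count = bst_inorder(leaf->left, out, count);
        out[count++] = leaf->key_value;
        count = bst_inorder(leaf->right, out, count);
    }
    return count;
}

/* Frees children before their parent, like destroy_tree. */
void bst_free(struct node *leaf)
{
    if (leaf != NULL) {
        bst_free(leaf->left);
        bst_free(leaf->right);
        free(leaf);
    }
}
```

Inserting 10, 6, 14, 5, 8, 11, 18 in that order reproduces the example tree, with a height of 3. Inserting 1 through 7 in ascending order gives a height of 7, which is the linked-list-like worst case mentioned earlier.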


    What is spyware?

    Spyware is a general term used to describe software that performs certain behaviors such as advertising, collecting personal information, or changing the configuration of your computer, generally without appropriately obtaining your consent first.

    Spyware is often associated with software that displays advertisements (called adware) or software that tracks personal or sensitive information.

    That does not mean all software that provides ads or tracks your online activities is bad. For example, you might sign up for a free music service, but you “pay” for the service by agreeing to receive targeted ads. If you understand the terms and agree to them, you may have decided that it is a fair tradeoff. You might also agree to let the company track your online activities to determine which ads to show you.

    Other kinds of spyware make changes to your computer that can be annoying and can cause your computer to slow down or crash.

    These programs can change your Web browser’s home page or search page, or add additional components to your browser you don’t need or want. These programs also make it very difficult for you to change your settings back to the way you originally had them.

    The key in all cases is whether or not you (or someone who uses your computer) understand what the software will do and have agreed to install the software on your computer.

    There are a number of ways spyware or other unwanted software can get on your computer. A common trick is to covertly install the software during the installation of other software you want such as a music or video file sharing program.

    Whenever you install something on your computer, make sure you carefully read all disclosures, including the license agreement and privacy statement. Sometimes the inclusion of unwanted software in a given software installation is documented, but it might appear at the end of a license agreement or privacy statement.
     


    Writing for Readability

    There are a lot of ways to solve the same problem in C or C++. This is both good and bad; it is good because you have flexibility. It’s also bad because you have flexibility–the flexibility to choose different solutions to the same problem when it shows up in different places. This is confusing because it obscures the underlying similarity between the problems.

    Using Functions

    Unlike prose, where repeating the same word or phrase may seem redundant, in programming, it’s perfectly fine to use the same construction over and over again. Of course, you may want to turn a repeated chunk of code into a function: this is even more readable because it gives the block of code a descriptive name. (At least you ought to make it descriptive!)

    You can also increase readability by using standard functions and data structures (such as the STL). Doing so avoids the confusion of someone who might ask, “why did you create a new function when you had a perfectly good one already available?” The problem is that people may assume that there’s a reason for the new function and that it somehow differs from the standard version.

    Moreover, by using standard functions you help your reader understand the names of the arguments to the function. There’s much less need to look at the function prototype to see what the arguments mean, or their order, or whether some arguments have default values.

    Use Appropriate Language Features

    There are some obvious things to avoid: don’t use a loop as though it were an if statement. Choose the right data type for your data: if you never need decimal places in a number, use an integer. If you mean for a value to be unsigned, use an unsigned type. When you want to indicate that a value should never change, use const to make it so.

    Try to avoid uncommon constructions unless you have good reason to use them; put another way, don’t use a feature just because the feature exists. One rule of thumb is to avoid do-while loops unless you absolutely need one. People aren’t generally as used to seeing them and, in theory, won’t process them as well. I’ve never run into this problem myself, but think carefully about whether you actually need a do-while loop. Similarly, although the ternary operator is a great way of expressing some ideas, it can also be confusing for programmers who don’t use it very often. A good rule of thumb is to use it only when necessary (for instance, in the initialization list of a constructor) and stick with the more standard if-else construction for everything else. Sure, it’ll make your program four lines longer, but it’ll make it that much easier for most people to read.

    There are some less obvious ways of using standard features. When you are looping, choose carefully between while, do-while, and for. For loops are best when you can fill in each part (initialization, conditional, and increment) with a fairly short expression. While loops are good for watching a sentinel variable whose value can be set in multiple places or whose value depends on some external event such as a network event. While loops are also better when the update step isn’t really a direct “update” to the control variable–for instance, when reading lines from a text file, it might make more sense to use a while loop than a for loop because the control depends on the result of the function call, not the value of the variable of interest:

    while (fgets(buf, sizeof(buf), fp) != NULL)
    {
        /* do stuff with buf */
    }

    It wouldn’t make sense to write this sort of thing as a for loop. (Try it!)

    Unpack Complex Expressions

    There’s no reason to put everything on a single line. If you have a complex calculation with multiple steps and levels of parentheses, it can be extremely helpful to go from a one-line calculation to one that uses temporary variables. This gives you two advantages: first, it makes it easier to follow the expression. Second, you can give a distinct name to each intermediate step, which can help the reader follow what is happening. Often, you’ll want to reuse those intermediate calculations anyway. In addition to mathematical calculations, this principle also applies to nested function calls. The fewer events that take place on a single line of code, the easier it is to follow exactly what’s happening.

    Another advantage to unpacking an expression is that you can put more comments in-line to explain what’s going on and why.
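As a small illustration of unpacking, here is the same calculation written both ways. The function names and the pricing rule (discount first, then tax) are invented for this sketch; they only serve to give the intermediate values something to be named after.

```c
/* Dense version: correct, but the reader has to untangle the factors. */
double total_price_dense(double unit_price, int quantity,
                         double discount_rate, double tax_rate)
{
    return unit_price * quantity * (1.0 - discount_rate) * (1.0 + tax_rate);
}

/* Unpacked version: each intermediate step gets a name and room for a
   comment explaining why it is there. */
double total_price(double unit_price, int quantity,
                   double discount_rate, double tax_rate)
{
    double subtotal   = unit_price * quantity;
    double discount   = subtotal * discount_rate;
    double discounted = subtotal - discount;
    double tax        = discounted * tax_rate;  /* tax applies after the discount */

    return discounted + tax;
}
```

Both functions compute the same value; the second simply makes it possible to see, and to comment on, the order in which the discount and the tax are applied.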

    Avoid Magic Numbers

    Magic numbers are numbers that appear directly in the code without an obvious reason. For instance, what does the number 80 in the following expression mean?

    for( int i = 0; i < 80; ++i )
    {
        printf( "-" );
    }

    It might be the width of the screen, but it might also be the width of a map whose wall is being drawn. You just don’t know. The best solution is to use macros, in C, or constants in C++. This gives you the chance to descriptively name your numbers. Doing so also makes it easier to spot the use of a particular number and differentiate between numbers with the same value that mean different things. Moreover, if you decide you need to change a value, you have a single point where you can make the change, rather than having to sift through your code.
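A minimal sketch of the macro approach in C follows. The names `SCREEN_WIDTH`, `MAP_WIDTH`, and `make_screen_rule` are hypothetical, and the sketch writes into a buffer rather than printing so that the result is easy to inspect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names: each number now says what it means. */
#define SCREEN_WIDTH 80   /* columns on the display */
#define MAP_WIDTH    80   /* tiles across the map: same value, different meaning */

/* Fills buf with a rule of '-' exactly SCREEN_WIDTH characters wide,
   followed by a terminating NUL.
   buf_size must be at least SCREEN_WIDTH + 1. */
void make_screen_rule(char *buf, size_t buf_size)
{
    assert(buf != NULL && buf_size > SCREEN_WIDTH);
    memset(buf, '-', SCREEN_WIDTH);
    buf[SCREEN_WIDTH] = '\0';
}
```

If the screen width later changes, only the `#define` line needs editing, and the map width stays untouched even though it happens to have the same value today.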


    Unicode: What You Can Do About It Today

    by Jeff Bezanson

    If you write an email in Russian and send it to somebody in Russia, it is depressingly unlikely that he or she will be able to read it. If you write software, the burden of this sad state of affairs rests on your shoulders. Given modern hardware resources, it is unacceptable that we can’t yet routinely communicate text in different scripts or containing technical symbols. Fortunately, we are getting there.

    After reading a lot on the subject and incorporating Unicode compatibility into some of my software, I decided to prepare this quick and highly pragmatic guide to digital text in the 21st century (for C programmers, of course). I don’t mind adding my voice to the numerous articles that already exist on this subject, since the world needs as many programmers as possible to pick up these skills as soon as possible.

    I. Encoding text

    Given the variety of human languages on this planet, text is a complex subject. Many are scared away from dealing with world scripts, because they think of the numerous related software problems in the area instead of focusing on what they can actually do with their code to help.

    The first thing to know is that you do not have to worry about most problems with digital text. The most difficult work is handled below the application layer, in OSes, UI libraries, and the C library. To give you an idea of what goes on though, here is a summary of software problems surrounding text:

    • Encoding
      Mapping characters to numbers. Many such mappings exist; once you know the encoding of a piece of text, you know what character is meant by a particular number. Unicode is one such mapping, and a popular one since it incorporates more characters than any other at this time.
    • Display
      Once you know what character is meant, you have to find a font that has the character and render it. This task is much complicated by the need to display both left-to-right and right-to-left text, the existence of combining characters that modify previous characters and have zero width, the fact that some languages require wider character cells than others, and context-sensitive letterforms.
    • Input
      An input method is a way to map keystrokes (most likely several keystrokes on a typical keyboard) to characters. Input is also complicated by bidirectional text.
    • Internationalization (i18n)
      This refers to the practice of translating a program into multiple languages, effectively by translating all of the program’s strings.
    • Lexicography
      Code that processes text as more than just binary data might have to become a lot smarter. The problems of searching, sorting, and modifying letter case (upper/lower) vary per-language. If your application doesn’t need to perform such tasks, consider yourself lucky. If you do need these operations, you can probably find a UI toolkit or i18n library that already implements them.

    If you are savvy with just the first issue (encoding), then OS-vendor-supplied input methods and display routines should magically work with your program. Whether you want to or are able to translate your software is another matter, and compared to proper handling of character encodings it is almost optional (corrupting data is worse than having an unintelligible UI).

    The encoding I’ll talk about is called Unicode. Unicode officially encodes 1,114,112 characters, from 0x000000 to 0x10FFFF. (The idea that Unicode is a 16-bit encoding is completely wrong.) For maximum compatibility, individual Unicode values are usually passed around as 32-bit integers (4 bytes per character), even though this is more than necessary. For convenience, the first 128 Unicode characters are the same as those in the familiar ASCII encoding.

    The consensus is that storing four bytes per character is wasteful, so a variety of representations have sprung up for Unicode characters. The most interesting one for C programmers is called UTF-8. UTF-8 is a “multi-byte” encoding scheme, meaning that it requires a variable number of bytes to represent a single Unicode value. Given a so-called “UTF-8 sequence”, you can convert it to a Unicode value that refers to a character.

    UTF-8 has the property that all existing 7-bit ASCII strings are still valid. UTF-8 only affects the meaning of bytes greater than 127, which it uses to represent higher Unicode characters. A character might require 1, 2, 3, or 4 bytes of storage depending on its value; more bytes are needed as values get larger. To store the full range of possible 32-bit characters, UTF-8 would require a whopping 6 bytes. But again, Unicode only defines characters up to 0x10FFFF, so this should never happen in practice.

    UTF-8 is a specific scheme for mapping a sequence of 1-4 bytes to a number from 0x000000 to 0x10FFFF:

    00000000 -- 0000007F:   0xxxxxxx
    00000080 -- 000007FF:   110xxxxx 10xxxxxx
    00000800 -- 0000FFFF:   1110xxxx 10xxxxxx 10xxxxxx
    00010000 -- 001FFFFF:   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

    The x’s are bits to be extracted from the sequence and glued together to form the final number.
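To make the bit extraction concrete, here is a rough sketch of a decoder that follows the table above. The name `utf8_decode` and its interface are assumptions made for this example; it is not the library routine presented later in the article, and it deliberately skips some validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes one UTF-8 sequence starting at s, following the table above.
   Stores the code point in *cp and returns the number of bytes consumed
   (1 to 4), or 0 if the lead byte or a continuation byte is malformed.
   Sketch only: it does not reject overlong forms, surrogates, or values
   above 0x10FFFF. */
size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);
    size_t len;
    uint32_t value;

    if (s[0] < 0x80)                { len = 1; value = s[0]; }         /* 0xxxxxxx */
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; value = s[0] & 0x1F; }  /* 110xxxxx */
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; value = s[0] & 0x0F; }  /* 1110xxxx */
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; value = s[0] & 0x07; }  /* 11110xxx */
    else return 0;  /* stray continuation byte or invalid lead byte */

    for (size_t i = 1; i < len; ++i) {
        if ((s[i] & 0xC0) != 0x80)   /* each following byte must be 10xxxxxx */
            return 0;
        value = (value << 6) | (s[i] & 0x3F);  /* glue on six more bits */
    }
    *cp = value;
    return len;
}
```

For example, the two bytes 0xC3 0xA9 decode to the value 0xE9 (the letter é), and the three bytes 0xE2 0x82 0xAC decode to 0x20AC (the euro sign).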

    It is fair to say that UTF-8 is taking over the world. It is already used for filenames in Linux and is supported by all mainstream web browsers. This is not surprising considering its many nice properties:

    1. It can represent all 1,114,112 Unicode characters.
    2. Most C code that deals with strings on a byte-by-byte basis still works, since UTF-8 is fully compatible with 7-bit ASCII.
    3. Characters usually require fewer than four bytes.
    4. String sort order is preserved. In other words, sorting UTF-8 strings per-byte yields the same order as sorting them per-character by logical Unicode value.
    5. A missing or corrupt byte in transmission can only affect a single character—you can always find the start of the sequence for the next character just by scanning a couple bytes.
    6. There are no byte-order/endianness issues, since UTF-8 data is a byte stream.

    The only price to pay for all this is that there is no longer a one-to-one correspondence between bytes and characters in a string. Finding the nth character of a string requires iterating over the string from the beginning.
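The cost of losing the byte-to-character correspondence can be sketched as follows. The helper names `utf8_char_count` and `utf8_offset_of` are hypothetical, and both assume the input is valid, NUL-terminated UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* A byte of the form 10xxxxxx continues a character started earlier. */
static int is_continuation(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

/* Counts characters in a UTF-8 string by counting every byte that is
   not a continuation byte. */
size_t utf8_char_count(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; ++s)
        if (!is_continuation((unsigned char)*s))
            ++count;
    return count;
}

/* Returns the byte offset of character number n (0-based). If the string
   has n or fewer characters, returns the offset of the terminating NUL.
   Note the O(n) scan from the start of the string. */
size_t utf8_offset_of(const char *s, size_t n)
{
    assert(s != NULL);
    size_t offset = 0;
    while (s[offset] != '\0' && n > 0) {
        ++offset;                                        /* step past the lead byte */
        while (is_continuation((unsigned char)s[offset]))
            ++offset;                                    /* and its continuation bytes */
        --n;
    }
    return offset;
}
```

For the string "h\xC3\xA9llo" (héllo), strlen() reports 6 bytes while `utf8_char_count` reports 5 characters, and character number 2 (the first l) starts at byte offset 3.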

    See What is UTF-8? for more information about UTF-8.

    Side note: Some consider UTF-8 to be discriminatory, since it allows English text to be stored efficiently at one byte per character while other world scripts require two bytes or more. This is a troublesome point, but it should not get in the way of Unicode adoption. First of all, UTF-8 was not really designed to preferentially encode English text. It was designed to preserve compatibility with the large body of existing code that scans for special characters such as line breaks, spaces, NUL terminators, and so on. Furthermore, the encoding used internally by a program has little impact on the user as long as it is able to represent their data without loss. UTF-8 is a great boon, especially for C programming. Think of it this way: if it allows you to internationalize an application that would have been difficult to convert otherwise, it is much less discriminatory than the alternative.

    II. The C library

    All recent implementations of the standard C library have lots of functions for manipulating international strings. Before reading up on them, it helps to know some vocabulary:

    “Multibyte character” or “multibyte string” refers to text in one of the many (possibly language-specific) encodings that exist throughout the world. A multibyte character does not necessarily require more than one byte to store; the term is merely intended to be broad enough to encompass encodings where this is the case. UTF-8 is in fact only one such encoding; the actual encoding of user input is determined by the user’s current locale setting (selected as an option in a system dialog or stored as an environment variable in UNIX). Strings you get from the user will be in this encoding, and strings you pass to printf() are supposed to be as well. Strings within your program can of course be in any encoding you want, but you might have to convert them for proper display.

    “Wide character” or “wide character string” refers to text where each character is the same size (usually a 32-bit integer) and simply represents a Unicode character value (“code point”). This format is a known common currency that allows you to get at character values if you want to. The wprintf() family is able to work with wide character format strings, and the “%ls” format specifier for normal printf() will print wide character strings (converting them to the correct locale-specific multibyte encoding on the way out).

    The C library also provides functions like towupper() that can convert a wide character from any language to uppercase (if applicable). strftime() can format a date and time string appropriately for the current locale, and strcoll() can do international sorting. These and other functions that depend on locale must be initialized at the beginning of your program using

    #include <locale.h>

    int main(void)
    {
        char *locale;

        /* "" selects the locale from the user's environment */
        locale = setlocale(LC_ALL, "");
        ...
    }

    You don’t have to do anything with the locale string returned by setlocale(), but you can use it to query your user’s locale settings (more on this later).

    The C library pretty much assumes you will be using multibyte strings throughout your program (since that’s what you get as input). Since multibyte strings are opaque, a lot of functions beginning with “mb” are provided to deal with them. Personally, I don’t like not knowing what encoding my strings use. One concrete problem with the multibyte thing is file I/O— a given file could be in any encoding, independent of locale. When you write a file or send data over a network, keeping the multibyte encoding might be a bad idea. (Even if all software uses only the proper locale-independent C library functions, and all platforms support all encodings internally, there is still no single standard for communicating the encoding of a piece of text; email messages and HTML tags do it in various ways.) You also might be able to do more efficient processing, or avoid rewriting code, if you knew the encoding your strings used.

    Your encoding options

    You are free to choose a string encoding for internal use in your program. The choice pretty much boils down to either UTF-8, wide (4-byte) characters, or multibyte. Each has its advantages and disadvantages:

    • UTF-8
      • Pro: compatible with all existing strings and most existing code
      • Pro: takes less space
      • Pro: widely used as an interchange format (e.g. in XML)
      • Con: more complex processing, O(n) string indexing
    • Wide characters
      • Pro: easy to process
      • Con: wastes space
      • Pro/Con: although you can use the syntax L"Hello, world." to easily include wide-character strings in C programs, the size of wide characters is not consistent across platforms (some incorrectly use 2-byte wide characters)
      • Con: should not be used for output, since spurious zero bytes and other low-ASCII characters with common meanings (such as ‘/’ and ‘\n’) will likely be sprinkled throughout the data.
    • Multibyte
      • Pro: no conversions ever needed on input and output
      • Pro: built-in C library support
      • Pro: provides the widest possible internationalization, since in rare cases conversion between local encodings and Unicode does not work well
      • Con: strings are opaque
      • Con: perpetuates incompatibilities. For example, there are three major encodings for Russian. If one Russian sends data to another through your program, the recipient will not be able to read the message if his or her computer is configured for a different Russian encoding. But if your program always converts to UTF-8, the text is effectively normalized so that it will be widely legible (especially in the future) no matter what encoding it started in.

    In this article I will advocate and give explicit instruction on using UTF-8 as an internal string encoding. Many Linux users already set their environment to a UTF-8 locale, in which case you won’t even have to do any conversions. Otherwise you will have to convert multibyte to wide to UTF-8 on input, and back to multibyte on output. Nevertheless, UTF-8 has its advantages.

    III. What to do right now

    Below I’ll outline concrete steps any C programmer could take to bring his or her code up to date with respect to text encoding. I’ll also be presenting a simple C library that provides the routines you need to manipulate UTF-8.

    Here’s your to-do list:

    1. “char” no longer means character
      I hereby recommend referring to character codes in C programs using a 32-bit unsigned integer type. Many platforms provide a “wchar_t” (wide character) type, but unfortunately it is to be avoided since some compilers allot it only 16 bits—not enough to represent Unicode. Wherever you need to pass around an individual character, change “char” to “unsigned int” or similar. The only remaining use for the “char” type is to mean “byte”.
    2. Get UTF-8-clean
      To take advantage of UTF-8, you’ll have to treat bytes higher than 127 as perfectly ordinary characters. For example, say you have a routine that recognizes valid identifier names for a programming language. Your existing standard might be that identifiers begin with a letter:

      int valid_identifier_start(char ch)
      {
          return ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z'));
      }

      If you use UTF-8, you can extend this to allow letters from other languages as follows:

      int valid_identifier_start(char ch)
      {
          return ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z') ||
                  ((unsigned char)ch >= 0xC0));
      }

      A multibyte UTF-8 sequence can only start with a byte value of 0xC0 or greater, so that’s what I used for checking the start of an identifier. Within an identifier, you would also want to allow bytes >= 0x80, which covers the range of UTF-8 continuation bytes.

      Most C string library routines still work with UTF-8, since they only scan for terminating NUL characters. A notable exception is strchr(), which in this context is more aptly named “strbyte()”. Since you will be passing character codes around as 32-bit integers, you need to replace this with a routine such as my u8_strchr() that can scan UTF-8 for a given character. The traditional strchr() returns a pointer to the location of the found character, and u8_strchr() follows suit. However, you might want to know the index of the found character, and since u8_strchr() has to scan through the string anyway, it keeps a count and returns a character index as well.

      With the old strchr(), you could use pointer arithmetic to determine the character index. Now, any use of pointer arithmetic on strings is likely to be broken since characters are no longer bytes. You’ll have to find and fix any code that assumes “(char*)b - (char*)a” is the number of characters between a and b (though it is still of course the number of bytes between a and b).
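To make the byte/character distinction concrete, here is a minimal sketch that counts the characters between two pointers into a UTF-8 string by skipping continuation bytes. The name count_chars_between is hypothetical (it is not one of the library routines listed later), and the code assumes its input is valid UTF-8.

```c
#include <stddef.h>

/* A byte starts a character unless it has the form 10xxxxxx
   (a UTF-8 continuation byte). */
#define IS_CHAR_START(c) (((c) & 0xC0) != 0x80)

/* Hypothetical helper: number of characters in the byte range [a, b).
   Assumes the range is valid UTF-8 and that a points at the start of a
   character. */
size_t count_chars_between(const char *a, const char *b)
{
    size_t count = 0;

    while (a < b) {
        if (IS_CHAR_START((unsigned char)*a))
            count++;
        a++;
    }
    return count;
}
```

For a string containing one two-byte character, the pointer difference b - a and the value returned by this function differ by one, which is exactly the discrepancy that breaks old pointer-arithmetic code.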

    3. Interface with your environment
      Using UTF-8 as an internal encoding is now widespread among C programmers. However, the environment your program runs in will not necessarily be nice enough to feed you UTF-8, or expect UTF-8 output.

      The functions mbstowcs() and wcstombs() convert from and to locale-specific encodings, respectively. “mbs” means multibyte string (i.e. the locale-specific string), and “wcs” means wide character string (universal 4-byte characters). Clearly, if you use wide characters internally, you are in luck here. If you use UTF-8, there is a chance that the user’s locale will be set to UTF-8 and you won’t have to do any conversion at all. To take advantage of that situation, you will have to specifically detect it (I’ll provide a function for it). Otherwise, you will have to convert from multibyte to wide to UTF-8.

      Version 1.6 (1.5.x while in development) of the FOX toolkit uses UTF-8 internally, giving your program a nice all-UTF-8-all-the-time environment. GTK2 and Qt also support UTF-8.
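Returning to the question of detecting a UTF-8 locale: one rough way to do it is to inspect the codeset part of the locale name. The sketch below is not the routine from the library presented later; locale_name_is_utf8 is a hypothetical name, and the code assumes locale names of the form language_TERRITORY.codeset@modifier, which is common on Linux but not guaranteed on every platform.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a locale name such as "en_US.UTF-8"
   names a UTF-8 codeset, 0 otherwise. Pass it the string returned by
   setlocale(LC_CTYPE, NULL) after calling setlocale(LC_ALL, ""). */
int locale_name_is_utf8(const char *locale)
{
    const char *cp;
    char codeset[16];
    size_t n = 0;

    if (locale == NULL || (cp = strchr(locale, '.')) == NULL)
        return 0;               /* no codeset part, e.g. "C" or "POSIX" */
    cp++;

    /* copy the codeset, lowercased and without '-', up to '@' or the end */
    while (*cp != '\0' && *cp != '@' && n < sizeof(codeset) - 1) {
        if (*cp != '-')
            codeset[n++] = (char)tolower((unsigned char)*cp);
        cp++;
    }
    codeset[n] = '\0';

    return strcmp(codeset, "utf8") == 0;
}
```

On POSIX systems, comparing the result of nl_langinfo(CODESET) against "UTF-8" is another option that avoids parsing the locale name by hand.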

    4. Modify APIs to discourage O(n^2) string processing
      The idea of non-constant-time string indexing may worry you. But when you think about it, you rarely need to specifically access the nth character of a string. Algorithms almost never need to make requests like “Quick! Get me the 6th character of this piece of text!” Typically, if you’re accessing characters you’re iterating over the whole string or most of it. UTF-8 is simple enough to process that iterating over characters takes essentially the same time as iterating over bytes.

      In your own code, you can use my u8_inc() and u8_dec() to move through strings. If you develop libraries or languages, be sure to expose some kind of inc() and dec() API so nobody has to move through a string by repeatedly requesting the nth character.
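As a rough illustration of what an inc()/dec() style API can look like, here is a minimal sketch. The my_u8_ names are hypothetical stand-ins rather than the library’s own routines, and the code assumes valid, NUL-terminated UTF-8.

```c
/* A byte starts a character unless it is a continuation byte (10xxxxxx). */
#define IS_CHAR_START(c) (((c) & 0xC0) != 0x80)

/* Hypothetical stand-in for an inc() routine: advance the byte index *i
   to the start of the next character in s. Assumes s[*i] is not the
   terminating NUL. */
void my_u8_inc(const char *s, int *i)
{
    do {
        (*i)++;
    } while (s[*i] != '\0' && !IS_CHAR_START((unsigned char)s[*i]));
}

/* Hypothetical stand-in for dec(): move *i back to the start of the
   previous character. Assumes *i > 0. */
void my_u8_dec(const char *s, int *i)
{
    do {
        (*i)--;
    } while (*i > 0 && !IS_CHAR_START((unsigned char)s[*i]));
}

/* Count characters by walking the string once: O(n) overall, instead of
   the O(n^2) cost of asking for "the nth character" n times. */
int my_u8_count(const char *s)
{
    int i = 0, count = 0;

    while (s[i] != '\0') {
        my_u8_inc(s, &i);
        count++;
    }
    return count;
}
```

The point of the design is that callers keep a byte index and let the routine decide how far to move it, so no caller ever needs to translate a character number into a byte offset inside a loop.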

    IV. Some UTF-8 routines

    Various libraries are available for internationalization and converting between different text encodings. However, I couldn’t find a straightforward set of C routines providing the minimal support needed for using UTF-8 as an internal encoding (although this functionality is often embedded in large UI toolkits and such). I decided to create a small library that could be used to bring UTF-8 to arbitrary C programs.

    This library is quite incomplete; you might want to look at related FSF offerings and libutf8. libutf8 provides the multibyte and wide character C library routines mentioned above, in case your C library doesn’t have them.

    Since performance is sometimes a concern with UTF-8, I made my routines as fast and lightweight as possible. They perform minimal error checking— in particular, they do not bother to determine whether a sequence is valid UTF-8, which can actually be a security problem. I justify this decision by reiterating that the intention of the library is to manipulate an internal encoding; you can enforce that all strings you store in memory be valid UTF-8, enabling the library to make that assumption. Routines for validating and converting from/to UTF-8 are available free from Unicode, Inc.

    Note that my routines do not need to support the many encodings of the world—the C library can handle that. If the current locale is not UTF-8, you call mbstowcs() on user input to convert any encoding (whatever it is) to a wide character string, then use my u8_toutf8() to convert it to the UTF-8 your program is comfortable with. Here’s an example input routine wrapping readline():

    char *get_utf8_input()
    {
        char *line, *u8s;
        unsigned int *wcs;
        int len;

        line = readline("");
        if (locale_is_utf8) {
            return line;
        }
        else {
            /* with a NULL destination, mbstowcs() only counts characters */
            len = mbstowcs(NULL, line, 0)+1;
            wcs = malloc(len * sizeof(int));
            /* the cast assumes wchar_t is 32 bits wide, as on Linux */
            mbstowcs((wchar_t *)wcs, line, len);
            u8s = malloc(len * sizeof(int));
            u8_toutf8(u8s, len*sizeof(int), wcs, len);
            free(line);
            free(wcs);
            return u8s;
        }
    }

    The first call to mbstowcs() uses the special parameter value NULL to find the number of characters in the opaque multibyte string.

    Anyway, on with the routines. They are divided into four groups:

    Group 1: conversions

    /* is c the start of a utf8 sequence? */
    #define isutf(c) (((c)&0xC0)!=0x80)

    /* convert UTF-8 to UCS-4 (4-byte wide characters)
       srcsz = source size in bytes, or -1 if 0-terminated
       sz = dest size in # of wide characters
       returns # characters converted */
    int u8_toucs(unsigned int *dest, int sz, char *src, int srcsz);

    /* convert UCS-4 to UTF-8
       srcsz = number of source characters, or -1 if 0-terminated
       sz = size of dest buffer in bytes
       returns # characters converted */
    int u8_toutf8(char *dest, int sz, unsigned int *src, int srcsz);

    /* single character to UTF-8 */
    int u8_wc_toutf8(char *dest, wchar_t ch);

    Note that the library uses “unsigned int” as its wide character type.
    You can convert a known number of bytes, or a NUL-terminated string. The length of a UTF-8 string is often communicated as a byte count, since that’s what really matters. Recall that you can usually treat a UTF-8 string like a normal C-string with N characters (where N is the number of bytes in the UTF-8 sequence), with the possibility that some characters are >127.
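To give a feel for what a single-character conversion involves, here is a minimal sketch of a UTF-8 encoder. It is written in the spirit of u8_wc_toutf8() but is not the library’s implementation; the name encode_utf8 and its exact return convention are assumptions made for this example.

```c
/* Hypothetical sketch: write the UTF-8 encoding of code point ch to dest
   (which must have room for 4 bytes) and return the number of bytes
   written, or 0 if ch is beyond U+10FFFF. Surrogate code points are not
   rejected. */
int encode_utf8(char *dest, unsigned int ch)
{
    if (ch < 0x80) {                 /* 0xxxxxxx */
        dest[0] = (char)ch;
        return 1;
    }
    if (ch < 0x800) {                /* 110xxxxx 10xxxxxx */
        dest[0] = (char)(0xC0 | (ch >> 6));
        dest[1] = (char)(0x80 | (ch & 0x3F));
        return 2;
    }
    if (ch < 0x10000) {              /* 1110xxxx 10xxxxxx 10xxxxxx */
        dest[0] = (char)(0xE0 | (ch >> 12));
        dest[1] = (char)(0x80 | ((ch >> 6) & 0x3F));
        dest[2] = (char)(0x80 | (ch & 0x3F));
        return 3;
    }
    if (ch < 0x110000) {             /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        dest[0] = (char)(0xF0 | (ch >> 18));
        dest[1] = (char)(0x80 | ((ch >> 12) & 0x3F));
        dest[2] = (char)(0x80 | ((ch >> 6) & 0x3F));
        dest[3] = (char)(0x80 | (ch & 0x3F));
        return 4;
    }
    return 0;
}
```

Note how every lead byte produced here is either below 0x80 or at least 0xC0, and every trailing byte is in the range 0x80 to 0xBF; that is the property the isutf() macro relies on.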

    Group 2: moving through UTF-8 strings

    /* character number to byte offset */
    int u8_offset(char *str, int charnum);

    /* byte offset to character number */
    int u8_charnum(char *s, int offset);

    /* return next character, updating a byte-index variable */
    unsigned int u8_nextchar(char *s, int *i);

    /* move to next character */
    void u8_inc(char *s, int *i);

    /* move to previous character */
    void u8_dec(char *s, int *i);

    Group 3: Unicode escape sequences
    In the absence of Unicode input methods, Unicode characters are often notated using special escape sequences beginning with \u or \U. \u expects up to four hexadecimal digits, and \U expects up to eight. With these routines your program can accept input and give output using such sequences if necessary.

    /* assuming src points to the character after a backslash, read an
       escape sequence, storing the result in dest and returning the number of
       input characters processed */
    int u8_read_escape_sequence(char *src, unsigned int *dest);

    /* given a wide character, convert it to an ASCII escape sequence stored in
       buf, where buf is "sz" bytes. returns the number of characters output. */
    int u8_escape_wchar(char *buf, int sz, unsigned int ch);

    /* convert a string "src" containing escape sequences to UTF-8 */
    int u8_unescape(char *buf, int sz, char *src);

    /* convert UTF-8 "src" to ASCII with escape sequences.
       if escape_quotes is nonzero, quote characters will be preceded by
       backslashes as well. */
    int u8_escape(char *buf, int sz, char *src, int escape_quotes);

    /* utility predicates used by the above */
    int octal_digit(char c);
    int hex_digit(char c);
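As an illustration of the \u and \U convention described above, here is a minimal sketch of a reader for just those two escapes. It is not the library’s u8_read_escape_sequence(), which handles more cases; read_unicode_escape is a hypothetical name, and returning 0 on failure is a choice made for this example.

```c
#include <ctype.h>

/* Hypothetical sketch: src points at the character after a backslash.
   If it is 'u' (up to 4 hex digits) or 'U' (up to 8 hex digits) followed
   by at least one hex digit, store the value in *dest and return the
   number of input characters consumed, including the 'u' or 'U'.
   Otherwise return 0 and leave *dest untouched. */
int read_unicode_escape(const char *src, unsigned int *dest)
{
    int max_digits, i = 1;
    unsigned int value = 0;

    if (src[0] == 'u')
        max_digits = 4;
    else if (src[0] == 'U')
        max_digits = 8;
    else
        return 0;

    while (i <= max_digits && isxdigit((unsigned char)src[i])) {
        int c = tolower((unsigned char)src[i]);
        int digit = isdigit(c) ? c - '0' : c - 'a' + 10;

        value = value * 16 + (unsigned int)digit;
        i++;
    }
    if (i == 1)
        return 0;               /* no hex digits followed the u/U */

    *dest = value;
    return i;
}
```

The returned count lets the caller skip past the escape and continue scanning, which is the same contract the comment on u8_read_escape_sequence() describes.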

    Group 4: replacements for standard functions

    /* return a pointer to the first occurrence of ch in s, or NULL if not
       found. character index of found character returned in *charn. */
    char *u8_strchr(char *s, unsigned int ch, int *charn);

    /* same as the above, but searches a buffer of a given size instead of
       a NUL-terminated string. */
    char *u8_memchr(char *s, unsigned int ch, size_t sz, int *charn);

    /* count the number of characters in a UTF-8 string */
    int u8_strlen(char *s);

    /* given the string returned by setlocale(), determine whether the current
       locale speaks UTF-8 */
    int u8_is_locale_utf8(char *locale);

    /* these functions can print from UTF-8 strings. they make no assumptions
       about locale; you can circumvent them if is_locale_utf8 */
    int u8_vprintf(char *fmt, va_list ap);
    int u8_printf(char *fmt, ...);

     


    Function Pointers

    Example Uses of Function Pointers

    Functions as Arguments to Other Functions

    If you were to write a sort routine, you might want to allow the function’s caller to choose the order in which the data is sorted; some programmers might need to sort the data in ascending order, others might prefer descending order while still others may want something similar to but not quite like one of those choices. One way to let your user specify what to do is to provide a flag as an argument to the function, but this is inflexible; the sort function allows only a fixed set of comparison types (e.g., ascending and descending).

    A much nicer way of allowing the user to choose how to sort the data is simply to let the user pass in a function to the sort function. This function might take two pieces of data and perform a comparison on them. We’ll look at the syntax for this in a bit.

    Callback Functions

    Another use for function pointers is setting up “listener” or “callback” functions that are invoked when a particular event happens. The function is called, and this notifies your code that something of interest has taken place.

    Why would you ever write code with callback functions? You often see it when writing code using someone’s library. One example is when you’re writing code for a graphical user interface (GUI). Most of the time, the user will interact with a loop that allows the mouse pointer to move and that redraws the interface. Sometimes, however, the user will click on a button or enter text into a field. These operations are “events” that may require a response that your program needs to handle. How can your code know what’s happening? By using callback functions! The user’s click should cause the interface to call a function that you wrote to handle the event.

    To get a sense for when you might do this, consider what might happen if you were using a GUI library that had a “create_button” function. It might take the location where a button should appear on the screen, the text of the button, and a function to call when the button is clicked. Assuming for the moment that C (and C++) had a generic “function pointer” type called function, this might look like this:

    void create_button( int x, int y, const char *text, function callback_func );

    Whenever the button is clicked, callback_func will be invoked. Exactly what callback_func does depends on the button; this is why allowing the create_button function to take a function pointer is useful.
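For a more concrete picture, here is a minimal sketch of the same idea using real function pointer syntax (explained in the next section) in place of the imaginary function type. Everything in it is hypothetical: the toy “library” tracks a single button, and simulate_click stands in for the event loop noticing a click.

```c
#include <stddef.h>

/* Toy stand-in for a GUI library's idea of a button. */
struct button {
    int x, y;
    const char *text;
    void (*callback_func)(void);   /* called when the button is clicked */
};

struct button the_button;          /* this toy library tracks one button */

void create_button(int x, int y, const char *text,
                   void (*callback_func)(void))
{
    the_button.x = x;
    the_button.y = y;
    the_button.text = text;
    the_button.callback_func = callback_func;
}

/* Stand-in for the library's event loop detecting a click. */
void simulate_click(void)
{
    if (the_button.callback_func != NULL)
        the_button.callback_func();
}

/* The part you would write: the handler for this particular button. */
int times_clicked = 0;

void on_ok_clicked(void)
{
    times_clicked++;
}
```

Calling create_button(10, 20, "OK", on_ok_clicked) registers the handler; each later call to simulate_click() then runs on_ok_clicked without the library knowing anything about what the handler does.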

    Function Pointer Syntax

    The syntax for declaring a function pointer might seem messy at first, but in most cases it’s really quite straightforward once you understand what’s going on. Let’s look at a simple example:

    void (*foo)(int);

    In this example, foo is a pointer to a function taking one argument, an integer, and that returns void. It’s as if you’re declaring a function called “*foo”, which takes an int and returns void; now, if *foo is a function, then foo must be a pointer to a function. (Similarly, a declaration like int *x can be read as *x is an int, so x must be a pointer to an int.)

    The key to writing the declaration for a function pointer is that you’re just writing out the declaration of a function but with (*func_name) where you’d normally just put func_name.

    Reading Function Pointer Declarations

    Sometimes people get confused when more stars are thrown in:

    void *(*foo)(int *);

    Here, the key is to read inside-out; notice that the innermost element of the expression is *foo, and that otherwise it looks like a normal function declaration. *foo should refer to a function that returns a void * and takes an int *. Consequently, foo is a pointer to just such a function.
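A small sketch may help confirm this reading. The function names are hypothetical; the only thing taken from the text is the declaration of foo itself. Note that without the parentheses around *foo, the line would instead declare an ordinary function named foo that returns a void *.

```c
#include <stddef.h>

/* A function matching the declaration: takes an int *, returns a void *. */
void *as_void_pointer(int *p)
{
    return p;
}

/* Hypothetical demo: stores a value, passes its address through foo, and
   reads the value back through the returned pointer. */
int declaration_demo(void)
{
    int value = 42;
    void *(*foo)(int *);    /* pointer to function (int *) returning void * */
    void *result;

    foo = as_void_pointer;
    result = foo(&value);

    return *(int *)result;
}
```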

    Initializing Function Pointers

    To initialize a function pointer, you must give it the address of a function in your program. The syntax is like any other variable:

    #include <stdio.h>

    void my_int_func(int x)
    {
        printf( "%d\n", x );
    }

    int main()
    {
        void (*foo)(int);
        /* the ampersand is actually optional */
        foo = &my_int_func;

        return 0;
    }

    (Note: all examples are written to be compatible with both C and C++.)

    Using a Function Pointer

    To call the function pointed to by a function pointer, you treat the function pointer as though it were the name of the function you wish to call. The act of calling it performs the dereference; there’s no need to do it yourself:

    #include <stdio.h>

    void my_int_func(int x)
    {
        printf( "%d\n", x );
    }

    int main()
    {
        void (*foo)(int);
        foo = &my_int_func;

        /* call my_int_func (note that you do not need to write (*foo)(2) ) */
        foo( 2 );
        /* but if you want to, you may */
        (*foo)( 2 );

        return 0;
    }

    Note that function pointer syntax is flexible; it can either look like most other uses of pointers, with & and *, or you may omit that part of syntax. This is similar to how arrays are treated, where a bare array decays to a pointer, but you may also prefix the array with & to request its address.

    Function Pointers in the Wild

    Let’s go back to the sorting example where I suggested using a function pointer to write a generic sorting routine where the exact order could be specified by the programmer calling the sorting function. It turns out that the C function qsort does just that.

    From the Linux man pages, we have the following declaration for qsort (from stdlib.h):

    void qsort(void *base, size_t nmemb, size_t size,
               int(*compar)(const void *, const void *));

    Note the use of void*s to allow qsort to operate on any kind of data (in C++, you’d normally use templates for this task, but C++ also allows the use of void* pointers) because void* pointers can point to anything. Because qsort cannot know the size of the individual elements in a void* array, we must give it the size in bytes of each element, size, in addition to the standard requirement of giving the number of elements, nmemb, of the array to be sorted, base.

    But what we’re really interested in is the compar argument to qsort: it’s a pointer to a function that takes two const void *s and returns an int. This allows anyone to specify how to sort the elements of the array base without having to write a specialized sorting algorithm. Note, also, that compar returns an int; the function pointed to should return a negative value (conventionally -1) if the first argument is less than the second, 0 if they are equal, or a positive value (conventionally 1) if the second is less than the first.

    For instance, to sort an array of numbers in ascending order, we could write code like this:

    #include <stdio.h>   /* for printf */
    #include <stdlib.h>  /* for qsort */

    int int_sorter( const void *first_arg, const void *second_arg )
    {
        int first = *(int*)first_arg;
        int second = *(int*)second_arg;
        if ( first < second )
        {
            return -1;
        }
        else if ( first == second )
        {
            return 0;
        }
        else
        {
            return 1;
        }
    }

    int main()
    {
        int array[10];
        int i;
        /* fill array */
        for ( i = 0; i < 10; ++i )
        {
            array[ i ] = 10 - i;
        }
        qsort( array, 10 , sizeof( int ), int_sorter );
        for ( i = 0; i < 10; ++i )
        {
            printf ( "%d\n" ,array[ i ] );
        }

        return 0;
    }

    Typedefs are often used to make code using function pointers somewhat more readable.
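As a brief sketch of how a typedef can help, the example below gives the comparison function pointer type a name and uses it in a small wrapper around qsort. The names compare_fn, compare_ints_descending, and sort_ints are hypothetical and exist only for illustration.

```c
#include <stdlib.h>

/* compare_fn names the type "pointer to a function taking two
   const void * arguments and returning int", which is what qsort expects. */
typedef int (*compare_fn)(const void *, const void *);

/* Comparison for descending order: negative when first should come
   earlier, that is, when it is the larger value. */
int compare_ints_descending(const void *a, const void *b)
{
    int first = *(const int *)a;
    int second = *(const int *)b;

    return (first < second) - (first > second);
}

/* With the typedef, the parameter reads like any other type. */
void sort_ints(int *array, size_t count, compare_fn compare)
{
    qsort(array, count, sizeof(int), compare);
}
```

Compare the last parameter of sort_ints with the last parameter in the declaration of qsort: both have the same type, but the typedef version no longer needs to be read inside-out.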

    Using Polymorphism and Virtual Functions Instead of Function Pointers (C++)

    You can often avoid the need for explicit function pointers by using virtual functions. For instance, you could write a sorting routine that takes a pointer to a class that provides a virtual function called compare:

    class Sorter
    {
        public:
        virtual ~Sorter() {}
        // pure virtual: each derived class supplies its own comparison
        virtual int compare (const void *first, const void *second) = 0;
    };

    // cpp_qsort, a qsort using C++ features like virtual functions
    void cpp_qsort(void *base, size_t nmemb, size_t size, Sorter *compar);

    Inside cpp_qsort, whenever a comparison is needed, compar->compare should be called. For classes that override this virtual function, the sort routine will get the new behavior of that function. For instance:

    class AscendSorter : public Sorter
    {
        /* the parameters must be named so the body can refer to them */
        virtual int compare (const void *first_arg, const void *second_arg)
        {
            int first = *(int*)first_arg;
            int second = *(int*)second_arg;
            if ( first < second )
            {
                return -1;
            }
            else if ( first == second )
            {
                return 0;
            }
            else
            {
                return 1;
            }
        }
    };

    You could then pass a pointer to an instance of AscendSorter to cpp_qsort to sort integers in ascending order.

    But Are You Really Not Using Function Pointers?

    Virtual functions are implemented behind the scenes using function pointers, so you really are using function pointers–it just happens that the compiler makes the work easier for you. Using polymorphism can be an appropriate strategy (for instance, it’s used by Java), but it does lead to the overhead of having to create an object rather than simply pass in a function pointer.
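As a rough illustration of that claim, the Sorter idea can be written in plain C with the function pointer stored explicitly in the object. This is only a sketch of the general idea: C++ compilers typically keep a pointer to a shared table of function pointers in each object rather than the pointers themselves, and the details are implementation-specific. All names here are hypothetical.

```c
/* Hypothetical C rendering of a class with one virtual function. */
struct sorter {
    int (*compare)(const struct sorter *self,
                   const void *first, const void *second);
};

int ascend_compare(const struct sorter *self,
                   const void *first_arg, const void *second_arg)
{
    int first = *(const int *)first_arg;
    int second = *(const int *)second_arg;

    (void)self;                 /* unused; plays the role of "this" */
    return (first > second) - (first < second);
}

int descend_compare(const struct sorter *self,
                    const void *first_arg, const void *second_arg)
{
    /* reverse the arguments to reverse the order */
    return ascend_compare(self, second_arg, first_arg);
}

/* The "virtual call": which function runs depends on the object. */
int sorter_compare(const struct sorter *s, const void *a, const void *b)
{
    return s->compare(s, a, b);
}
```

A struct sorter initialized with ascend_compare behaves like an AscendSorter object: code holding only a pointer to the struct can call sorter_compare without knowing which comparison it will get.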

    Function Pointers Summary

    Syntax

    Declaring

    Declare a function pointer as though you were declaring a function, except with a name like *foo instead of just foo:

    void (*foo)(int);

    Initializing

    You can get the address of a function simply by naming it:

    void foo();
    func_pointer = foo;

    or by prefixing the name of the function with an ampersand:

    void foo();
    func_pointer = &foo;

    Invoking

    Invoke the function pointed to just as if you were calling a function.

    func_pointer( arg1, arg2 );

    or you may optionally dereference the function pointer before calling the function it points to:

    (*func_pointer)( arg1, arg2 );

    Benefits of Function Pointers

    • Function pointers provide a way of passing around instructions for how to do something
    • You can write flexible functions and libraries that allow the programmer to choose behavior by passing function pointers as arguments
    • This flexibility can also be achieved by using classes with virtual functions

    I am grateful to Alex Hoffer and Thomas Carriero for their comments on a draft of this article.


    10 Ways to Take Care of Your Hard Disk

    Are the programs you use regularly running more slowly? Is your four-month-old PC starting to act up? Here is how to fix the problem and speed up your trusty hard disk.

    Owning and using a hard disk without ever scanning it is like owning a luxury car that you only drive and never take in for service. The following tips take little effort, just a bit of your time, and they can bring your hard disk back to life so that it works at full efficiency again.

    1. Scan for viruses

    This is one of the most important practices, one you should take seriously and do regularly. We hardly need to tell you how destructive today’s viruses are; just imagine important files on your hard disk being destroyed or damaged simply because you had no antivirus program installed. Even if you have one installed, don’t be complacent. Check the date of the virus definitions; if they are more than 30 days old, update them to the current version right away for full protection. Then scan every hard disk installed in the system. If possible, schedule a scan to run every week.

    2. Sweep out unused files and junk

    The longer you use a machine, the more old files and junk accumulate on it: old data files, old programs, temporary files left over from browsing the internet, and leftovers from program installations in the Windows temporary files folder. A simple way to get rid of these junk files is to use the Windows Disk Cleanup utility, or the file cleanup option in IE itself (Tools -> Internet Options).

    3. Clear junk out of the nooks and crannies

    Even after you have deleted junk files yourself, plenty of invisible junk may remain on your hard disk. Junk here includes spyware and adware. Finding it requires special tools, programs such as Ad-aware or Spybot Search & Destroy, which can be downloaded free from the internet. Most importantly, remember to update the program’s database before you start scanning the system.

    4. Run Scandisk regularly

    Whenever part of the storage area on a hard disk becomes defective or damaged, we usually call that defect a “bad sector”, meaning that the surface of the magnetic platter in that region is damaged to the point that data cannot be read from it. The remedy is to use the Windows Scandisk utility to locate bad sectors and move the data stored there to other, healthy sectors, for the safety of your files. In the Scandisk window, select the option Scan for and attempt recovery of bad sectors before starting the scan. Also, if you use Windows 98/Me, we recommend turning off the screen saver before starting Scandisk.

    5. Put your data in order

    The Defragmenter program, which you don’t have to look far for because it is included in every version of Windows, rearranges data that has been written to the hard disk haphazardly so that it is orderly and contiguous. This means the drive’s read head doesn’t have to work as hard and data is read faster. Don’t mistakenly think the program will shuffle the files in your folders around or move them into other folders where you can’t find them: defragmenting only rearranges file data on the disk and has no effect on the file and folder structure in Windows.

    6. Keep everything in its place

    You could call this step personal discipline, because a drawer, a wardrobe, and a hard disk all need a good system of organization. It may sound like a tedious job, but if you make it a habit from the start there is almost nothing to do. Anyone who still keeps every type of file, whether Word documents, pictures, videos, music, and so on, jumbled together in a single folder should brace for a headache when it comes time to find a file. If you would rather avoid that, take the time to sort your files into folders starting today.

    7. Back up your data

    No hard disk of any model or brand will last forever. But even when your hard disk finally reaches the end of its life, that doesn’t have to mean all the data on it is lost with it. What you need to do, as a routine, is back up your important files to floppy disks, CDs, DVDs, or anything other than the hard disk you are using. If that is too much trouble, we recommend a thumb drive, which is now very inexpensive; if your wallet allows, choose one with a capacity of 128MB or more.

    8. Empty the trash so that no files are left behind

    When you press Delete to remove a file, in practice it looks as though the file has been removed, but in theory the file has not actually been erased. Windows merely marks that area as free space, and whenever new data is written it can overwrite that location. In addition, Windows places the files you delete in the Recycle Bin in case you change your mind or made a mistake. The observant will notice that even after deleting files, free space on the hard disk does not increase; that is because the data is still sitting in the Recycle Bin awaiting its fate. So if you are sure you no longer need the files, or don’t want anyone rummaging through the bin for your personal data, right-click the Recycle Bin icon and choose Empty Recycle Bin to dispose of its contents for good.

    9. Partition the disk to store data

    Hard disks generally come from the factory unpartitioned; put simply, if you buy an 80GB drive you get an 80GB C: drive. It is better, though, to divide the hard disk into sections, which is called partitioning. For example, an 80GB hard disk divided into two 40GB partitions gives you two drives, C: and D:. Besides reducing the load on the read head and speeding up the hard disk, partitioning lets you keep important files on a drive separate from the one Windows is installed on, which may be damaged by a virus. You can partition the disk while installing Windows XP, but if you didn’t, no matter: there are many programs for this today, the most popular being Partition Magic.

    10. Choose a speed that suits the job

    The methods above can help your hard disk run a little faster. However, if you are looking to buy a new hard disk, consider choosing a speed suited to the kind of work you do. For example, choose an inexpensive model with a platter rotation speed of 5,400 RPM (revolutions per minute) if you only use general programs for tasks such as browsing the internet, sending and receiving email, or word processing. If your work involves photo editing or gaming, you might choose a 7,200 RPM model, or even 10,000 RPM if you mainly edit video. A hard disk with a high rotation speed and a large internal cache will speed up your work even more. Give it a try!


    Debugging in Visual Studio: Avoid Stepping Into Common Functions

    If you do a lot of debugging (and who doesn’t?), you’ve probably encountered many function calls that you’d really like to step into. Sometimes this is easy: you just hit F11 and waltz right in to the function you care about. Sometimes, however, this is not so easy. In particular, if the code makes function calls to produce arguments for the function:

    call_of_interest( new blah( "a", 1 ), my_vector.begin(), my_vector.end() );

    Suddenly, stepping into the function is a huge hassle; first, you’re going to need to step through all of the function calls that produce its arguments. How inconvenient, especially when you are absolutely certain that you don’t need to. After all, how likely is it that calling my_vector.begin() or your specialized operator new has anything to do with how the function of interest works? Probably not very, and if it does, you’ll almost certainly be able to tell that it’s causing problems by looking at the arguments passed to the function once you’ve stepped into it.

    Sure, you could just set a break point inside the function you’re interested in and then hit F5 to continue. But doing that is tedious, especially if you’re only interested in one invocation of that function or if you want to step into a lot of functions. In those cases, it’s much nicer to simply avoid many of the common functions, such as library functions, that are often invoked to produce inputs for functions.

    Various versions of Visual C++ give you different amounts of control over how you can skip stepping into certain functions, and none of the implementations of this feature are documented or officially supported by Microsoft.

    Please note that I cannot guarantee that this functionality will work for you any more than Microsoft will provide that guarantee. Some of the examples require editing the registry; make sure to keep a backup before proceeding!

    Visual Studio 6

    In Visual Studio 6.0, you can edit the file autoexp.dat (which has several other useful features in different versions of Visual Studio), located in “%ProgramFiles%\Microsoft Visual Studio\Common\MSDev98\Bin”. In autoexp.dat, different directives appear under different sections named with “[secname]”.

    To avoid stepping into certain functions, simply add a section, [ExecutionControl], and place under it directives that match either a function name (“myMethod”), a class followed by a method name (“class::method”), or a class followed by a * to skip stepping into all methods of that class (“class::*”).

    You may need to qualify your classes with their namespaces. Here is an example demonstrating how to avoid stepping into operators; add to autoexp.dat and restart Visual Studio to reload the file:

    [ExecutionControl]
    operator new=NoStepInto

    or

    MyClass::operator==NoStepInto

    Visual Studio 7

    The debuggers for Visual Studio 7.0 and later read their NoStepInto configuration information from the registry. You should back up your registry before playing with it. To edit your registry, you’ll need to use regedit (just click on Start, go to the Run menu, type in “regedit”, and hit Enter).

    The registry key you’ll want to change is

    HKEY_CURRENT_USER\Software\Microsoft\VisualStudio\7.0\NativeDE\StepOver

    or

    HKEY_CURRENT_USER\Software\Microsoft\VisualStudio\7.1\NativeDE\StepOver

    if you’re using Visual Studio 2003. You may need to create the key NativeDE and its child StepOver. You can do so in RegEdit by navigating to the “VisualStudio\7.0” key, right-clicking, and selecting “New->Key”.

    Inside the StepOver key, you can create new string values that contain regular expressions to match against function names; you can also include a number for the priority of each expression. Using priority, you can set multiple expressions so that you step into a specific function for a class, but skip stepping into all others. For instance, you might have two strings with the following names and values:

    10 string\:\:append.*=StepInto
    20 string\:\:.*=NoStepInto

    In regular expression parlance, “.*” simply means “match any number of characters”. For instance, the second expression matches any function in the string class. (Note that 10 is the name, and everything following it is the value to enter for that name.)

    Notice that you need to escape the colons because they have special meaning in Visual Studio regular expressions. Similarly, if you wish to match against templates, you need to escape the open and close brackets:

    10 .*\<.*\>=StepInto

    This expression should always step into any template function.

    Visual Studio 8

    Visual Studio 8 has the same basic functionality as Visual Studio 7 with a few improvements. In addition, you should be aware that the key you must modify is now:

    HKEY_LOCAL_MACHINE\Software\Microsoft\VisualStudio\8.0\NativeDE\StepOver

    You’ll need an account with Admin privileges to set this up.

    In addition to the ability to match regular expressions provided in Visual Studio 7, a few new escape sequences have been added to the regular expressions you can use, including “\funct” to match the name of a function, “\scope” to match a scope (e.g., std::my_namespace::etc::so::forth), and “\oper”, which matches any operator. Note that when you use these escape sequences, you must indicate the end of the escape sequence with a colon. So you might write “\scope:” followed by the rest of the expression to match.

    For instance, you could avoid stepping into any overloaded operators by using the expression

    \scope:operator\oper:=NoStepInto

    which might be useful if you are using a lot of classes with simple operators whose behaviors you can trust to be correct.

    Debugging Strategies, Tips, and Gotchas

    Use the Right Tools

    It should go without saying that you should always be using the best tools available; if you’re hunting a segmentation fault, you want to use a debugger. Anything less than that is unnecessary pain. If you’re dealing with bizarre memory issues (or hard-to-diagnose segfaults), use Valgrind on Linux or Purify on Windows.

    Debug the Problem

    My first instinct when debugging is to ask, “is my code too complicated?” Sometimes we’ll all come up with a solution to a problem only to realize that the solution is really hard to get working. So hard, in fact, that it might be easier to solve the original problem in another way. When I see someone struggling to debug a complex mass of code, my first thought is to ask whether there’s a cleaner solution. Often, once you’ve written bad code, you have a much better idea of what the good code should look like. Remember that just because you’ve written it doesn’t mean you should keep it!

    The trick is always to decide whether you’re trying to solve the original problem or just trying to make a particular choice of solution work. If it’s the solution, then it’s possible that your difficulties don’t stem from the problem at all–maybe you’re over-thinking the problem or trying a wrong-headed approach. For instance, I recently needed to parse a file and import some of the data into an Access database to prototype an analysis tool. My first instinct was to write a Ruby script that interfaced directly with Access and inserted all of the data into the database using SQL queries. As I looked at the support for doing this in Ruby, I quickly realized that my “solution” to the problem was going to take a lot longer than the problem should have taken. I reversed course, wrote a script that just output a comma-separated value file, and had my data fully imported in about an hour.

    An Aside on Bad Code

    People are often reluctant to throw out bad code that they’ve written and re-write it. One reason is that code that’s written feels like completed work, and throwing it out feels like going backward. But when you’re debugging, rewriting the code can seem more appealing because you’re probably saving yourself time spent debugging by spending a bit more time coding. The trick is to avoid throwing out the baby with the bath water–remove the bad code, don’t start the whole program over again (unless it’s rotten to the core). Rewrite only the parts that really need it.

    Your second draft will probably be both cleaner and less buggy than the first, and you may avoid issues like having to go back later and rewrite the code just so that you can figure out how it was supposed to work.

    On the other hand, when you’re absolutely sure that code that looks horrible is the right code to use, you’ll want to explain your rationale in a comment so someone (or you) doesn’t come back later and hack it apart.

    Minimize Potential Problems by Avoiding Copy-Paste Syndrome

    Nothing is more frustrating than to realize that you’re debugging the same problem multiple times. Whenever you copy and paste large chunks of code, you leave yourself open to the unknown demons inhabiting that code. If you haven’t debugged it yet, odds are that you’re going to have to. And if you forgot that you copied that code somewhere else, you’re probably going to be debugging the same code more than once. (There are other reasons to avoid copy-paste syndrome; even worse than debugging the same code twice is finding the bug in only one piece of copy-pasted code.)

    The best way to avoid copy-paste syndrome is to use functions to encapsulate as much of your repeat code as possible. Some things can’t easily be avoided in C++; you’re going to write a lot of for loops no matter what you’re doing, so you can’t abstract away the whole looping process. But if you have the same loop body in multiple places, that might be a sign that it should be pulled into a separate function. As a bonus, this makes other future changes to your code easier and allows you to reuse the function without having to find a chunk of code to copy.
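
    As a rough sketch of the idea, suppose the same clamping logic had been pasted into two different loops. The names here (clamp_reading, clamp_all, sum_clamped) are made up for illustration; the point is only that each loop stays where it is while the shared body moves into one function that needs debugging once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the logic that used to be pasted into several loop
// bodies. Forces one reading into the range [lo, hi].
int clamp_reading(int value, int lo, int hi) {
    assert(lo <= hi);
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

// Each caller still writes its own loop, but the body is a single call.
std::vector<int> clamp_all(const std::vector<int>& readings, int lo, int hi) {
    std::vector<int> result;
    result.reserve(readings.size());
    for (std::size_t i = 0; i < readings.size(); ++i) {
        result.push_back(clamp_reading(readings[i], lo, hi));
    }
    return result;
}

// A second loop with a different purpose reuses the same helper.
int sum_clamped(const std::vector<int>& readings, int lo, int hi) {
    int total = 0;
    for (std::size_t i = 0; i < readings.size(); ++i) {
        total += clamp_reading(readings[i], lo, hi);
    }
    return total;
}
```

    If the clamping rule ever turns out to be wrong, there is exactly one place to fix it.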

    When to Copy Code

    Although copying code is usually dangerous, there are times when it may be the best choice. For instance, if you need to make small, irregular tweaks to a chunk of code, but the bulk of it needs to remain the same, then copying, pasting, and careful editing might make sense. By copying the code, you avoid the chance that you introduce new bugs by mistyping the code. It should go without saying that you should have carefully debugged the code you plan to copy before you do so! (But I said it, and I’m not even paid by the word.)

    The second reason to copy code is when you have long variable names and a bad text editor. The best solution is generally to get a better text editor with keyword completion.

    Turn Big Problems Found Late into Small Problems Found Early

    Testing Early

    One advantage of pulling out code and putting it into functions is that you can then separately test those functions. This means that you can sometimes avoid debugging big problems caused by simple bugs in the original functions. Nothing is more frustrating than writing perfectly correct code given how you thought a function (or a class) worked, only to find out that it doesn’t work that way. This kind of unit testing requires some discipline and a good sense of what can go wrong with your code.
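
    Here is a minimal sketch of what such a test might look like, using nothing but assert. The helper trim_spaces is a hypothetical example of a function pulled out of a larger routine; a real project might use a unit testing framework instead, but the idea is the same: pin down the edge cases before other code starts relying on them.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper pulled out of a larger routine: removes leading and
// trailing spaces from a string.
std::string trim_spaces(const std::string& s) {
    std::string::size_type first = s.find_first_not_of(' ');
    if (first == std::string::npos) {
        return "";  // the string is empty or contains only spaces
    }
    std::string::size_type last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Tests for the cases that are easy to get wrong: nothing to trim,
// spaces in the middle, all spaces, and the empty string.
void test_trim_spaces() {
    assert(trim_spaces("  hello  ") == "hello");
    assert(trim_spaces("hello") == "hello");
    assert(trim_spaces("a b") == "a b");
    assert(trim_spaces("    ") == "");
    assert(trim_spaces("") == "");
}
```

    Calling test_trim_spaces() at program start-up (or from a separate test program) is enough to catch a wrong assumption about the helper long before it shows up as a mysterious bug somewhere else.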

    Another advantage of early testing–especially if you write some or all of your tests up-front, before the code–is that you’ll pay more attention to the specific interface to your class. If you can’t test error handling because you’re using an assert instead of an exception or error code, that might be an indication that you should be using some form of error reporting rather than asserts. (Of course, this won’t always be the case–there are times when you just want to verify that your asserts work correctly.) Beyond error-reporting, writing tests is the first time you can test your code’s interface, which is often as valuable as testing that the code works. If the interface to your class is clunky, or your functions have impossible-to-understand, let alone remember, argument lists, it might be time to rethink what you’re doing before you write the underlying code.

    Compiler Warnings

    Many potential bugs can be caught by your compiler. These include using uninitialized variables, accidentally replacing a check for equality with an assignment in a conditional, or, in C++, errors related to mixing types such as pointers and ints. Since this has been covered before, I suggest checking out the article on why you should pay attention to compiler warnings.

    Printf Lies

    Because output is usually buffered (typically by the C or C++ I/O library) before it reaches the screen or a file, the last few printfs before a crash may never appear, so relying on printfs in your debugging process is risky. When possible, use a debugger to figure out what lines of code are the problem rather than narrowing in on the issue with code littered by printfs and couts. (And beware the stray printf that slips in during debugging and, ahem, slips into the final version.)

    Flush Output

    Nevertheless, there are times when you actually need to keep track of some state in a log file–perhaps you simply have too much data that you need to collect, and you need the data from program start-up to the moment the bug occurs. To ensure you collect all of the data, be sure to flush it: you can use fflush in C, or output an endl in C++. fflush takes the FILE pointer you are writing into; for instance, to flush stderr, you would write fflush(stderr);
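
    A small sketch of both approaches follows. The function names log_line_c and log_line_cpp are invented for this example; the calls they wrap (fprintf, fflush, and endl) are the standard ones mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <ostream>
#include <string>

// C style: write one line, then flush so it reaches the file immediately,
// even if the program crashes on the very next statement.
void log_line_c(std::FILE* out, const char* message) {
    assert(out != NULL && message != NULL);
    std::fprintf(out, "%s\n", message);
    std::fflush(out);
}

// C++ style: std::endl writes a newline and then flushes the stream.
void log_line_cpp(std::ostream& out, const std::string& message) {
    out << message << std::endl;
}
```

    Flushing after every line slows logging down, so this is a trade you make while hunting a bug, not necessarily something to leave on everywhere.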

    Check Your Helper Functions

    This should be obvious, but it’s easy to forget in the heat of the moment: always verify that your helper functions work, especially when seemingly simple code is failing. When possible, isolate each helper function and test it individually, then test each of its helper functions. There’s nothing more frustrating than realizing that your original logic was right, but your assumption about a helper function was wrong.

    When Cause Doesn’t Immediately Lead to Effect

    Even if a helper function doesn’t seem to be the immediate source of a problem, its side effects may cause latent problems. For instance, if you have a helper function that can return NULL and you pass its output into a library function dealing with C-strings, you may see the immediate cause as dereferencing a NULL pointer in strcat, but the real cause was the buggy function you wrote earlier (or the fact that you didn’t check for NULL after calling it).
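
    To make that scenario concrete, here is a hedged sketch. Both functions are hypothetical: extension_for stands in for a helper that can return NULL, and append_extension shows the check that keeps the failure close to its real cause instead of letting it surface later inside strcat.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns the file extension for a known type,
// or NULL when the type is not recognized.
const char* extension_for(const char* type) {
    if (std::strcmp(type, "text") == 0) return ".txt";
    if (std::strcmp(type, "log") == 0) return ".log";
    return NULL;  // callers must check for this
}

// Appends the extension to name. Returns false and leaves name untouched
// if the type is unknown or the buffer is too small to hold the result.
bool append_extension(char* name, std::size_t capacity, const char* type) {
    assert(name != NULL && type != NULL);
    const char* ext = extension_for(type);
    if (ext == NULL) {
        return false;  // handle the real cause here, not inside strcat
    }
    if (std::strlen(name) + std::strlen(ext) + 1 > capacity) {
        return false;  // not enough room, including the terminating '\0'
    }
    std::strcat(name, ext);
    return true;
}
```

    Without the NULL check, the crash would be reported inside strcat, which is exactly the kind of misleading “immediate cause” described above.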

    Remember That Code May Be Used in More Than One Place

    Another problem that can come up when debugging is that you discover the problem appears to be the result of a particular function call, set a breakpoint inside that function, and then discover that there are hundreds of calls to the same function throughout the code. Or worse, you don’t notice this until after wasting hours trying to figure out what’s going on, or thinking that the reason for the problem is that the function is being called incorrectly. (When, in fact, the call you are looking at is correct; it simply has different arguments from the call on which the bug occurred.)

    The most obvious solution is to check the call stack after hitting a breakpoint or to set the breakpoint right before the call that is actually the problem. Unfortunately, this doesn’t always help if the same call works thousands of times but fails on the 1001st call. Potential solutions include counting the number of calls to a function and then stepping through that many breakpoints set inside the function, or using a static variable as a counter.
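
    The static counter approach might look something like the sketch below. The names is_target_call and process_item are hypothetical, and 1001 is just the call number from the example above; you would substitute whatever call you have determined to be the failing one.

```cpp
// Counts how many times it has been called and returns true exactly once:
// on the call whose number equals target.
bool is_target_call(unsigned long target) {
    static unsigned long call_count = 0;  // keeps its value between calls
    ++call_count;
    return call_count == target;
}

// Hypothetical function that works a thousand times and then misbehaves.
int process_item(int item) {
    if (is_target_call(1001)) {
        // Set an ordinary breakpoint on the line below; it is reached only
        // on the 1001st call, so there is no need to step through the rest.
        volatile int break_here = item;
        (void)break_here;
    }
    return item * 2;
}
```

    Remember to remove the counter once the bug is found, for the same reason you remove stray printfs.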

    Answers.com Releases Free AnswerTips Tool for Websites and Blogs

    HTMLPrimer.com

    Answers Corporation, creator of Answers.com, began offering its latest webmaster tool, AnswerTips, to websites and blogs. AnswerTips allow sites to provide visitors with instant access to Answers.com’s comprehensive information on four million topics, without having them leave the site or blog.

    AnswerTips are a unique site feature and provide instant background information when a site’s visitor double-clicks a word on an “AnswerTips-enabled” site. Activate an AnswerTip, and without leaving the page, a small information bubble opens, providing definitions, explanations, biographies, historical background and countless other types of relevant information. Unlike other offerings, the AnswerTip provides content on the spot, rather than a number of related search links to follow.

    The free content in an AnswerTip comes from Answers.com’s extensive database of information consisting of four million topics that are licensed from over 120 authoritative reference publications and other resources.

    “It’s about immediate gratification for anyone reading your blog,” said Gil Reich, Vice President of Product Management for Answers Corporation. “Readers get the information they want while remaining engaged in reading your content. Fewer distractions create a better experience on a website, and they save the writer time providing background information on subjects and terminology that his or her audience might be unfamiliar with.”

    The technology behind AnswerTips yields a more productive user experience than a simple dictionary site or program. The tool has the ability to crack acronyms, quote stock prices and retrieve biographical information on everyone from rock musicians to political figures. Using patented technology, AnswerTips scans the text surrounding a word to retrieve the most appropriate information. For instance, it can differentiate between ‘Paris Hilton’ and ‘plaster of Paris’ when only the word “Paris” is double-clicked.

    Currently, AnswerTips technology has been implemented on Answers.com, WikiAnswers (wiki.answers.com) and CBSNews.com, and was beta tested on a few select blogs and websites, including:

    A VC (http://avc.blogs.com/a_vc/2007/02/this_blog_is_an.html)

    CleverClogs (http://www.cleverclogs.org/2006/12/instant_onsite_.html)

    California Polytechnic State University Library (http://www.library.calpoly.edu)

    Write Technology (http://www.writetech.net/2007/02/answer_tips.html)

    In addition to being available directly from Answers.com (http://www.answers.com/main/answertip_landing.jsp), AnswerTips are also available within the widget gallery of TypePad at http://www.sixapart.com/typepad/widgets/publishing-tools/answertips.html.

    Other free tools from Answers.com include 1-Click Answers(TM) (Windows, Mac OS X), which enables AnswerTips within all of your desktop applications, and the Firefox extension (Windows, Mac OS X and Linux). For more about Answers.com webmaster tools, visit http://www.answers.com/main/webmasters.jsp.

    About Answers Corporation

    Answers Corporation (NASDAQ: ANSW) operates the award-winning Answers.com(TM) information portal, delivering comprehensive content on four million topics spanning health, finance, entertainment, business and more. Content includes over 120 licensed titles from leading publishers such as Houghton Mifflin Riverdeep Group PLC, Barron’s, Encyclopedia Britannica, All Media Guide and others; original articles written by Answers.com’s editorial team; community-contributed articles from Wikipedia; and user-generated questions & answers from Answers.com’s industry-leading WikiAnswers(TM) (wiki.answers.com). Founded in 1999 by CEO Bob Rosenschein, Answers.com can be launched directly from within Internet Explorer 7, Firefox and Opera browsers, and its service is integrated into sites like Amazon.com’s A9.com, The New York Public Libraries’ homeworkNYC.org, The New York Times, CBSNews.com and others. Answers.com is also available for mobile devices at mobile.answers.com. For investment information, visit ir.answers.com.
