Ideally, vertex data should live in a static vertex buffer object (one created with the GL_STATIC_DRAW usage hint), with a vertex stride of 32 bytes or a multiple of 32. The preferred draw call is glDrawRangeElements, and indices should be unsigned shorts (GL_UNSIGNED_SHORT).
Vertex data should be stored as GLfloat, GLshort, or GLubyte, because these formats are natively supported by most GPUs.
ATI performance tuning (page 10)
Using VBOs (page 14)
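The recommendations above can be sketched as follows. This is a minimal illustration, not taken from the cited documents: the Vertex layout (position, normal, texcoord) and the helper index_range are hypothetical. Eight 4-byte floats give exactly a 32-byte stride, and the helper computes the start and end vertex indices that glDrawRangeElements expects. The GL calls are shown only in a comment because they need a live context and a function loader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interleaved vertex: 3 + 3 + 2 floats = 8 * 4 bytes = 32 bytes. */
typedef struct {
    float position[3]; /* offset 0  */
    float normal[3];   /* offset 12 */
    float texcoord[2]; /* offset 24 */
} Vertex;

/* Compile-time check that the stride matches the 32-byte recommendation
   (also fails if float is not 4 bytes on this platform). */
static_assert(sizeof(Vertex) == 32, "Vertex stride should be 32 bytes");
static_assert(offsetof(Vertex, texcoord) == 24, "unexpected padding in Vertex");

/* Unsigned short indices, matching GL_UNSIGNED_SHORT. */
typedef uint16_t Index;

/* Find the smallest and largest vertex index referenced by an index buffer.
   These become the start/end arguments of glDrawRangeElements.
   Assumes count > 0. */
static void index_range(const Index *indices, size_t count, Index *start, Index *end)
{
    Index lo = 0xFFFF, hi = 0;
    for (size_t i = 0; i < count; ++i) {
        if (indices[i] < lo) lo = indices[i];
        if (indices[i] > hi) hi = indices[i];
    }
    *start = lo;
    *end = hi;
}

/* With the VBO and index buffer bound (requires a GL context):
     glVertexAttribPointer(loc, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                           (const void *)offsetof(Vertex, position));
     glDrawRangeElements(GL_TRIANGLES, start, end, (GLsizei)count,
                         GL_UNSIGNED_SHORT, (const void *)0);
*/
```

Because indices are 16-bit, a single draw can address at most 65536 distinct vertices; larger meshes would have to be split into several draws.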
UNCONFIRMED: Internally, all GPUs are said to store each vertex position as four values: xyzw. If the w value is not provided, it defaults to 1. In theory, performing this padding in the application before sending the data to OpenGL could improve performance.
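A minimal sketch of the padding described above, assuming positions arrive as tightly packed xyz floats. The function name pad_xyz_to_xyzw is hypothetical. It writes w = 1.0f, the same value OpenGL would otherwise fill in, so the attribute could then be declared with 4 components instead of 3.

```c
#include <assert.h>
#include <stddef.h>

/* Expand packed xyz positions into xyzw with w = 1.0f.
   xyzw must have room for 4 * vertex_count floats. */
static void pad_xyz_to_xyzw(const float *xyz, float *xyzw, size_t vertex_count)
{
    for (size_t i = 0; i < vertex_count; ++i) {
        xyzw[4 * i + 0] = xyz[3 * i + 0];
        xyzw[4 * i + 1] = xyz[3 * i + 1];
        xyzw[4 * i + 2] = xyz[3 * i + 2];
        xyzw[4 * i + 3] = 1.0f; /* default w that GL would supply */
    }
}

/* The attribute would then be specified with size 4 (requires a GL context):
     glVertexAttribPointer(loc, 4, GL_FLOAT, GL_FALSE, 0, (const void *)0);
*/
```

Since the claim is unconfirmed, it is worth measuring: the padded positions take 33% more memory (16 bytes instead of 12 per position), which may cost more bandwidth than the padding saves.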
Shaders typically consume vertex attributes as floating-point values, so on programmable GPUs, specifying all vertex data as float is expected to improve performance by avoiding format conversion.
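One way to apply this is to convert compact integer data to float once on the CPU instead of leaving the conversion to the GPU on every draw. The sketch below uses hypothetical helper names. It mirrors two OpenGL conversion rules: a non-normalized integer attribute (normalized = GL_FALSE) is converted directly to float, and a normalized GL_UNSIGNED_BYTE attribute maps a value c to c / 255.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GLshort-style positions -> float, like a non-normalized attribute. */
static void shorts_to_floats(const int16_t *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = (float)in[i];
}

/* GLubyte-style color channels -> float in [0, 1], like a normalized
   GL_UNSIGNED_BYTE attribute (c / 255). */
static void ubytes_to_unit_floats(const uint8_t *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = (float)in[i] / 255.0f;
}
```

The trade-off is size: a float takes twice the space of a GLshort and four times that of a GLubyte, so this pulls against the compact formats recommended earlier. Which choice is faster depends on whether a given GPU is limited by conversion or by memory bandwidth.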