Document retrieved from a search-engine cache. Address of the original document: http://www.naic.edu/~phil/hardware/nvidia/doc/CUFFT_Library_2.3.pdf

CUDA

CUFFT Library

PG-00000-003_V2.3 June, 2009


Confidential Information

Published by
NVIDIA Corporation
2701 San Tomas Expressway
Santa Clara, CA 95050

Notice

This source code is subject to NVIDIA ownership rights under U.S. and international Copyright laws. This software and the information contained herein is PROPRIETARY and CONFIDENTIAL to NVIDIA and is being provided under the terms and conditions of a Non-Disclosure Agreement. Any reproduction or disclosure to any third party without the express written consent of NVIDIA is prohibited.

NVIDIA MAKES NO REPRESENTATION ABOUT THE SUITABILITY OF THIS SOURCE CODE FOR ANY PURPOSE. IT IS PROVIDED "AS IS" WITHOUT EXPRESS OR IMPLIED WARRANTY OF ANY KIND. NVIDIA DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOURCE CODE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY, NONINFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL NVIDIA BE LIABLE FOR ANY SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOURCE CODE.

U.S. Government End Users. This source code is a "commercial item" as that term is defined at 48 C.F.R. 2.101 (OCT 1995), consisting of "commercial computer software" and "commercial computer software documentation" as such terms are used in 48 C.F.R. 12.212 (SEPT 1995) and is provided to the U.S. Government only as a commercial end item. Consistent with 48 C.F.R. 12.212 and 48 C.F.R. 227.7202-1 through 227.7202-4 (JUNE 1995), all U.S. Government End Users acquire the source code with only those rights set forth herein.

Trademarks

NVIDIA, CUDA, and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.

Copyright © 2006-2009 by NVIDIA Corporation. All rights reserved.

NVIDIA Corporation

Table of Contents

CUFFT Library
  CUFFT Types and Definitions
    Type cufftHandle
    Type cufftResult
    Type cufftReal
    Type cufftDoubleReal
    Type cufftComplex
    Type cufftDoubleComplex
    CUFFT Transform Types
    CUFFT Transform Directions
  CUFFT API Functions
    Function cufftPlan1d()
    Function cufftPlan2d()
    Function cufftPlan3d()
    Function cufftDestroy()
    Function cufftExecC2C()
    Function cufftExecR2C()
    Function cufftExecC2R()
    Function cufftExecZ2Z()
    Function cufftExecD2Z()
    Function cufftExecZ2D()
  Accuracy and Performance
CUFFT Code Examples
  1D Complex-to-Complex Transforms
  1D Real-to-Complex Transforms
  2D Complex-to-Complex Transforms
  2D Complex-to-Real Transforms
  3D Complex-to-Complex Transforms


CUFFT Library
This document describes CUFFT, the NVIDIA® CUDA™ (compute unified device architecture) Fast Fourier Transform (FFT) library. The FFT is a divide-and-conquer algorithm for efficiently computing discrete Fourier transforms of complex or real-valued data sets, and it is one of the most important and widely used numerical algorithms, with applications that include computational physics and general signal processing. The CUFFT library provides a simple interface for computing parallel FFTs on an NVIDIA GPU, which allows users to leverage the floating-point power and parallelism of the GPU without having to develop a custom, GPU-based FFT implementation.

FFT libraries typically vary in terms of supported transform sizes and data types. For example, some libraries only implement Radix-2 FFTs, restricting the transform size to a power of two, while other implementations support arbitrary transform sizes. This version of the CUFFT library supports the following features:

  1D, 2D, and 3D transforms of complex and real-valued data
  Batch execution for doing multiple 1D transforms in parallel
  2D and 3D transform sizes in the range [2, 16384] in any dimension
  1D transform sizes up to 8 million elements
  In-place and out-of-place transforms for real and complex data
  Double-precision transforms on compatible hardware (GT200 and later GPUs)

CUFFT Types and Definitions
The next sections describe the CUFFT types and transform directions:

  Type cufftHandle
  Type cufftResult
  Type cufftReal
  Type cufftDoubleReal
  Type cufftComplex
  Type cufftDoubleComplex
  CUFFT Transform Types
  CUFFT Transform Directions

Type cufftHandle
typedef unsigned int cufftHandle;

is a handle type used to store and access CUFFT plans. For example, the user receives a handle after creating a CUFFT plan and uses this handle to execute the plan.

Type cufftResult
typedef enum cufftResult_t cufftResult;

is an enumeration of values used exclusively as API function return values. The possible return values are defined as follows:
Return Values
  CUFFT_SUCCESS           Any CUFFT operation is successful.
  CUFFT_INVALID_PLAN      CUFFT is passed an invalid plan handle.
  CUFFT_ALLOC_FAILED      CUFFT failed to allocate GPU memory.
  CUFFT_INVALID_TYPE      The user requests an unsupported type.
  CUFFT_INVALID_VALUE     The user specifies a bad memory pointer.
  CUFFT_INTERNAL_ERROR    Used for all internal driver errors.
  CUFFT_EXEC_FAILED       CUFFT failed to execute an FFT on the GPU.
  CUFFT_SETUP_FAILED      The CUFFT library failed to initialize.
  CUFFT_SHUTDOWN_FAILED   The CUFFT library failed to shut down.
  CUFFT_INVALID_SIZE      The user specifies an unsupported FFT size.
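Because every API function reports its outcome through cufftResult, callers usually want a readable message when a call fails. The following is a minimal sketch of such a helper, not part of CUFFT. The enumeration below is a stand-in that only repeats the names from the table above so that the code compiles without the CUDA toolkit; its numeric values are whatever the compiler assigns and are not claimed to match the real header. The function names cufft_result_string() and cufft_check() are hypothetical. When building against the library, the stand-in would be removed in favor of the definitions in cufft.h.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in enumeration (assumption): names copied from the table above,
   numeric values left to the compiler. Real code gets this from cufft.h. */
typedef enum {
    CUFFT_SUCCESS,
    CUFFT_INVALID_PLAN,
    CUFFT_ALLOC_FAILED,
    CUFFT_INVALID_TYPE,
    CUFFT_INVALID_VALUE,
    CUFFT_INTERNAL_ERROR,
    CUFFT_EXEC_FAILED,
    CUFFT_SETUP_FAILED,
    CUFFT_SHUTDOWN_FAILED,
    CUFFT_INVALID_SIZE
} cufftResult;

/* Hypothetical helper: map a result code to the description in the table. */
const char *cufft_result_string(cufftResult r)
{
    switch (r) {
    case CUFFT_SUCCESS:         return "Any CUFFT operation is successful.";
    case CUFFT_INVALID_PLAN:    return "CUFFT is passed an invalid plan handle.";
    case CUFFT_ALLOC_FAILED:    return "CUFFT failed to allocate GPU memory.";
    case CUFFT_INVALID_TYPE:    return "The user requests an unsupported type.";
    case CUFFT_INVALID_VALUE:   return "The user specifies a bad memory pointer.";
    case CUFFT_INTERNAL_ERROR:  return "Used for all internal driver errors.";
    case CUFFT_EXEC_FAILED:     return "CUFFT failed to execute an FFT on the GPU.";
    case CUFFT_SETUP_FAILED:    return "The CUFFT library failed to initialize.";
    case CUFFT_SHUTDOWN_FAILED: return "The CUFFT library failed to shut down.";
    case CUFFT_INVALID_SIZE:    return "The user specifies an unsupported FFT size.";
    }
    return "Unknown cufftResult value.";
}

/* Hypothetical helper: returns 1 on success; otherwise prints a message
   naming the failed call to stderr and returns 0. */
int cufft_check(cufftResult r, const char *what)
{
    assert(what != NULL);
    if (r == CUFFT_SUCCESS)
        return 1;
    fprintf(stderr, "%s failed: %s\n", what, cufft_result_string(r));
    return 0;
}
```

A call site would then read, for example, cufft_check(cufftPlan1d(&plan, 256, CUFFT_C2C, 1), "cufftPlan1d").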

Type cufftReal
typedef float cufftReal;

is a single-precision, floating-point real data type.


Type cufftDoubleReal
typedef double cufftDoubleReal;

is a double-precision, floating-point real data type.

Type cufftComplex
typedef cuComplex cufftComplex;

is a single-precision, floating-point complex data type that consists of interleaved real and imaginary components.

Type cufftDoubleComplex
typedef cuDoubleComplex cufftDoubleComplex;

is a double-precision, floating-point complex data type that consists of interleaved real and imaginary components.

CUFFT Transform Types
The CUFFT library supports complex and real-data transforms. The cufftType data type is an enumeration of the types of transform data supported by CUFFT:
typedef enum cufftType_t {
    CUFFT_R2C = 0x2a,  // Real to complex (interleaved)
    CUFFT_C2R = 0x2c,  // Complex (interleaved) to real
    CUFFT_C2C = 0x29,  // Complex to complex, interleaved
    CUFFT_D2Z = 0x6a,  // Double to double-complex
    CUFFT_Z2D = 0x6c,  // Double-complex to double
    CUFFT_Z2Z = 0x69   // Double-complex to double-complex
} cufftType;

For complex FFTs, the input and output arrays must interleave the real and imaginary parts (the cufftComplex type). The transform size in each dimension is the number of cufftComplex elements. The CUFFT_C2C constant can be passed to any plan creation function to configure a single-precision complex-to-complex FFT. Pass the CUFFT_Z2Z constant to configure a double-precision complex-to-complex FFT.
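The interleaved layout can be illustrated on the host with a short sketch. The type demoComplex and the function interleave() are hypothetical names introduced here; demoComplex merely stands in for cufftComplex (a pair of floats, real part first) so that the code compiles without the CUDA headers.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for cufftComplex (assumption): two floats per element,
   x = real part, y = imaginary part. */
typedef struct { float x; float y; } demoComplex;

/* Pack separate real and imaginary arrays into one interleaved array,
   which is the layout complex transforms expect. */
void interleave(const float *re, const float *im, demoComplex *out, size_t n)
{
    assert(re != NULL && im != NULL && out != NULL);
    for (size_t k = 0; k < n; ++k) {
        out[k].x = re[k];  /* lands at float index 2*k     */
        out[k].y = im[k];  /* lands at float index 2*k + 1 */
    }
}
```

Viewed as a flat float array, element k of the interleaved buffer occupies positions 2*k (real) and 2*k+1 (imaginary), so a transform of size N reads N such pairs.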


For real-to-complex FFTs, the output array holds only the nonredundant complex coefficients. So for an N-element transform, the output array holds N/2+1 cufftComplex terms. For higher-dimensional real transforms of the form N0×N1×...×Nn, the last dimension is cut in half such that the output data is N0×N1×...×(Nn/2+1) complex elements. Therefore, in order to perform an in-place FFT, the user has to pad the input array in the last dimension to (Nn/2+1) complex elements or 2*(Nn/2+1) real elements. Note that the real-to-complex transform is implicitly forward. Passing the CUFFT_R2C constant to any plan creation function configures a single-precision real-to-complex FFT. Passing the CUFFT_D2Z constant configures a double-precision real-to-complex FFT.

The requirements for complex-to-real FFTs are similar to those for real-to-complex. In this case, the input array holds only the nonredundant, N/2+1 complex coefficients from a real-to-complex transform. The output is simply N elements of type cufftReal. However, for an in-place transform, the input size must be padded to 2*(N/2+1) real elements. The complex-to-real transform is implicitly inverse. Passing the CUFFT_C2R constant to any plan creation function configures a single-precision complex-to-real FFT. Passing the CUFFT_Z2D constant configures a double-precision complex-to-real FFT.

For 1D complex-to-complex transforms, the stride between signals in a batch is assumed to be the number of cufftComplex elements in the logical transform size. However, for real-data FFTs, the distance between signals in a batch depends on whether the transform is in-place or out-of-place. For in-place FFTs, the input stride is assumed to be 2*(N/2+1) cufftReal elements or N/2+1 cufftComplex elements. For out-of-place transforms, the input and output strides match the logical transform size (N) and the nonredundant size (N/2+1), respectively.
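The size rules for real-data transforms are plain integer arithmetic, and a small host-side sketch may help when sizing buffers. The function names below are hypothetical and not part of CUFFT; the formulas are the ones stated above, using C integer division for N/2.

```c
#include <assert.h>
#include <stddef.h>

/* Nonredundant complex coefficients of an n-point real transform: n/2 + 1. */
size_t r2c_complex_count(size_t n)
{
    assert(n > 0);
    return n / 2 + 1;
}

/* Real elements the last dimension must provide for an in-place
   real transform: 2 * (n/2 + 1). */
size_t r2c_padded_real_count(size_t n)
{
    return 2 * r2c_complex_count(n);
}

/* Complex elements produced by a real transform of size
   dims[0] x ... x dims[rank-1]; only the last dimension is cut in half. */
size_t r2c_total_complex(const size_t *dims, size_t rank)
{
    assert(dims != NULL && rank > 0);
    size_t total = 1;
    for (size_t d = 0; d + 1 < rank; ++d)
        total *= dims[d];
    return total * r2c_complex_count(dims[rank - 1]);
}
```

For N = 256 this gives 129 complex outputs and a padded in-place row of 258 real elements. A 256×128 real transform produces 256×65 = 16640 complex elements.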

CUFFT Transform Directions
The CUFFT library defines forward and inverse Fast Fourier Transforms according to the sign of the complex exponential term:
#define CUFFT_FORWARD -1
#define CUFFT_INVERSE  1


For higher-dimensional transforms (2D and 3D), CUFFT performs FFTs in row-major or C order. For example, if the user requests a 3D transform plan for sizes X, Y, and Z, CUFFT transforms along Z, Y, and then X. The user can configure column-major FFTs by simply changing the order of the size parameters to the plan creation API functions.

CUFFT performs unnormalized FFTs; that is, performing a forward FFT on an input data set followed by an inverse FFT on the resulting set yields data that is equal to the input scaled by the number of elements. Scaling either transform by the reciprocal of the size of the data set is left for the user to perform as seen fit.
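The sign convention and the missing normalization can both be seen in a tiny host-side example. The sketch below is not CUFFT code: it is a direct 4-point discrete Fourier transform, chosen because for N = 4 the complex exponentials are exactly 1, ±i, and -1, so no trigonometric functions are needed. The names cpx, dft4, DEMO_FORWARD, and DEMO_INVERSE are hypothetical; the two constants simply reuse the values -1 and 1 given above.

```c
#include <assert.h>

#define DEMO_FORWARD (-1)  /* same value as CUFFT_FORWARD */
#define DEMO_INVERSE   1   /* same value as CUFFT_INVERSE */

typedef struct { float re; float im; } cpx;  /* hypothetical stand-in type */

/* Unnormalized 4-point DFT:
     out[k] = sum over n of in[n] * exp(sign * 2*pi*i * k*n / 4)
   With m = (k*n) mod 4, the exponential is cos(pi*m/2) + i*sign*sin(pi*m/2). */
void dft4(const cpx in[4], cpx out[4], int sign)
{
    assert(sign == DEMO_FORWARD || sign == DEMO_INVERSE);
    static const float cos_tab[4] = { 1.0f, 0.0f, -1.0f,  0.0f };
    static const float sin_tab[4] = { 0.0f, 1.0f,  0.0f, -1.0f };
    for (int k = 0; k < 4; ++k) {
        float sum_re = 0.0f, sum_im = 0.0f;
        for (int n = 0; n < 4; ++n) {
            int m = (k * n) % 4;
            float c = cos_tab[m];
            float s = (float)sign * sin_tab[m];
            /* complex multiply-accumulate: in[n] * (c + i*s) */
            sum_re += in[n].re * c - in[n].im * s;
            sum_im += in[n].re * s + in[n].im * c;
        }
        out[k].re = sum_re;
        out[k].im = sum_im;
    }
}
```

Applying dft4 with DEMO_FORWARD and then with DEMO_INVERSE to the input (1, 2, 3, 4) returns (4, 8, 12, 16): the original data scaled by the number of elements, which is the behavior described above. Dividing by 4 recovers the input.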

CUFFT API Functions
The CUFFT API is modeled after FFTW (see http://www.fftw.org), which is one of the most popular and efficient CPU-based FFT libraries. FFTW provides a simple configuration mechanism called a plan that completely specifies the optimal--that is, the minimum floating-point operation (flop)--plan of execution for a particular FFT size and data type. The advantage of this approach is that once the user creates a plan, the library stores whatever state is needed to execute the plan multiple times without recalculation of the configuration. The FFTW model works well for CUFFT because different kinds of FFTs require different thread configurations and GPU resources, and plans are a simple way to store and reuse configurations.

The CUFFT library initializes internal data upon the first invocation of an API function. Therefore, all API functions could return the CUFFT_SETUP_FAILED error code if the library fails to initialize. CUFFT shuts down automatically when all user-created FFT plans are destroyed.

The CUFFT functions are as follows:

  Function cufftPlan1d()
  Function cufftPlan2d()
  Function cufftPlan3d()
  Function cufftDestroy()
  Function cufftExecC2C()
  Function cufftExecR2C()
  Function cufftExecC2R()
  Function cufftExecZ2Z()
  Function cufftExecD2Z()
  Function cufftExecZ2D()

Function cufftPlan1d()
cufftResult cufftPlan1d( cufftHandle *plan, int nx, cufftType type, int batch );

creates a 1D FFT plan configuration for a specified signal size and data type. The batch input parameter tells CUFFT how many 1D transforms to configure.
Input
  plan    Pointer to a cufftHandle object
  nx      The transform size (e.g., 256 for a 256-point FFT)
  type    The transform data type (e.g., CUFFT_C2C for complex to complex)
  batch   Number of transforms of size nx

Output
  plan    Contains a CUFFT 1D plan handle value

Return Values
  CUFFT_SETUP_FAILED   CUFFT library failed to initialize.
  CUFFT_INVALID_SIZE   The nx parameter is not a supported size.
  CUFFT_INVALID_TYPE   The type parameter is not supported.
  CUFFT_ALLOC_FAILED   Allocation of GPU resources for the plan failed.
  CUFFT_SUCCESS        CUFFT successfully created the FFT plan.
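When batch is greater than one, the signals are expected back to back in a single buffer, with the strides given in "CUFFT Transform Types". The sketch below computes where signal number b starts; the function names are hypothetical and the code is host-side arithmetic only, not a CUFFT call.

```c
#include <assert.h>
#include <stddef.h>

/* Start of signal b, in cufftComplex elements, for a batched 1D
   complex-to-complex plan of size nx. */
size_t c2c_batch_offset(size_t nx, size_t b)
{
    assert(nx > 0);
    return b * nx;
}

/* Start of the real input of signal b, in cufftReal elements, for a
   batched 1D real-to-complex plan. In-place input rows are padded to
   2*(nx/2+1) reals; out-of-place input rows hold exactly nx reals. */
size_t r2c_input_offset(size_t nx, size_t b, int in_place)
{
    assert(nx > 0);
    return b * (in_place ? 2 * (nx / 2 + 1) : nx);
}

/* Start of the complex output of signal b, in cufftComplex elements:
   each signal produces nx/2+1 nonredundant coefficients. */
size_t r2c_output_offset(size_t nx, size_t b)
{
    assert(nx > 0);
    return b * (nx / 2 + 1);
}
```

For nx = 256, signal 3 of an in-place real-to-complex batch starts at real element 774, whereas the out-of-place input starts at element 768; its output starts at complex element 387 in both cases.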


Function cufftPlan2d()
cufftResult cufftPlan2d( cufftHandle *plan, int nx, int ny, cufftType type );

creates a 2D FFT plan configuration according to specified signal sizes and data type. This function is the same as cufftPlan1d() except that it takes a second size parameter, ny, and does not support batching.
Input
  plan    Pointer to a cufftHandle object
  nx      The transform size in the X dimension (number of rows)
  ny      The transform size in the Y dimension (number of columns)
  type    The transform data type (e.g., CUFFT_C2R for complex to real)

Output
  plan    Contains a CUFFT 2D plan handle value

Return Values
  CUFFT_SETUP_FAILED   CUFFT library failed to initialize.
  CUFFT_INVALID_SIZE   The nx or ny parameter is not a supported size.
  CUFFT_INVALID_TYPE   The type parameter is not supported.
  CUFFT_ALLOC_FAILED   Allocation of GPU resources for the plan failed.
  CUFFT_SUCCESS        CUFFT successfully created the FFT plan.

Function cufftPlan3d()
cufftResult cufftPlan3d( cufftHandle *plan, int nx, int ny, int nz, cufftType type );

creates a 3D FFT plan configuration according to specified signal sizes and data type. This function is the same as cufftPlan2d() except that it takes a third size parameter nz.
Input
  plan    Pointer to a cufftHandle object
  nx      The transform size in the X dimension
  ny      The transform size in the Y dimension
  nz      The transform size in the Z dimension
  type    The transform data type (e.g., CUFFT_R2C for real to complex)

Output
  plan    Contains a CUFFT 3D plan handle value

Return Values
  CUFFT_SETUP_FAILED   CUFFT library failed to initialize.
  CUFFT_INVALID_SIZE   Parameter nx, ny, or nz is not a supported size.
  CUFFT_INVALID_TYPE   The type parameter is not supported.
  CUFFT_ALLOC_FAILED   Allocation of GPU resources for the plan failed.
  CUFFT_SUCCESS        CUFFT successfully created the FFT plan.
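Since CUFFT treats multidimensional data in row-major (C) order, the element at coordinates (x, y, z) of an nx×ny×nz array sits at a predictable position in the linear buffer. A minimal sketch, assuming the usual C convention in which the last size (nz) varies fastest; index3d() is a hypothetical helper, not a CUFFT function:

```c
#include <assert.h>
#include <stddef.h>

/* Linear index of element (x, y, z) in a row-major nx x ny x nz array.
   z is contiguous in memory, then y, then x. */
size_t index3d(size_t x, size_t y, size_t z,
               size_t nx, size_t ny, size_t nz)
{
    assert(x < nx && y < ny && z < nz);
    return (x * ny + y) * nz + z;
}
```

With nx = 64, ny = 64, and nz = 128, stepping z by one moves one element, stepping y by one moves 128 elements, and stepping x by one moves 8192 elements. The 2D case follows by dropping z: element (row, column) of an nx×ny array is at row*ny + column.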

Function cufftDestroy()
cufftResult cufftDestroy( cufftHandle plan );

frees all GPU resources associated with a CUFFT plan and destroys the internal plan data structure. This function should be called once a plan is no longer needed to avoid wasting GPU memory.
Input
  plan    The cufftHandle object of the plan to be destroyed.

Return Values
  CUFFT_SETUP_FAILED      CUFFT library failed to initialize.
  CUFFT_SHUTDOWN_FAILED   CUFFT library failed to shut down.
  CUFFT_INVALID_PLAN      The plan parameter is not a valid handle.
  CUFFT_SUCCESS           CUFFT successfully destroyed the FFT plan.

Function cufftExecC2C()
cufftResult cufftExecC2C( cufftHandle plan, cufftComplex *idata, cufftComplex *odata, int direction );

executes a CUFFT single-precision complex-to-complex transform plan as specified by direction. CUFFT uses as input data the GPU memory pointed to by the idata parameter. This function stores the Fourier coefficients in the odata array. If idata and odata are the same, this method does an in-place transform.
Input
  plan        The cufftHandle object for the plan to update
  idata       Pointer to the single-precision complex input data (in GPU memory) to transform
  odata       Pointer to the single-precision complex output data (in GPU memory)
  direction   The transform direction: CUFFT_FORWARD or CUFFT_INVERSE

Output
  odata       Contains the complex Fourier coefficients

Return Values
  CUFFT_SETUP_FAILED    CUFFT library failed to initialize.
  CUFFT_INVALID_PLAN    The plan parameter is not a valid handle.
  CUFFT_INVALID_VALUE   The idata, odata, and/or direction parameter is not valid.
  CUFFT_EXEC_FAILED     CUFFT failed to execute the transform on GPU.
  CUFFT_SUCCESS         CUFFT successfully executed the FFT plan.

Function cufftExecR2C()
cufftResult cufftExecR2C( cufftHandle plan, cufftReal *idata, cufftComplex *odata );

executes a CUFFT single-precision real-to-complex (implicitly forward) transform plan. CUFFT uses as input data the GPU memory pointed to by the idata parameter. This function stores the nonredundant Fourier coefficients in the odata array. If idata and odata are the same, this method does an in-place transform. (See "CUFFT Transform Types" for details on real data FFTs.)
Input
  plan    The cufftHandle object for the plan to update
  idata   Pointer to the single-precision real input data (in GPU memory) to transform
  odata   Pointer to the single-precision complex output data (in GPU memory)

Output
  odata   Contains the complex Fourier coefficients

Return Values
  CUFFT_SETUP_FAILED    CUFFT library failed to initialize.
  CUFFT_INVALID_PLAN    The plan parameter is not a valid handle.
  CUFFT_INVALID_VALUE   The idata and/or odata parameter is not valid.
  CUFFT_EXEC_FAILED     CUFFT failed to execute the transform on GPU.
  CUFFT_SUCCESS         CUFFT successfully executed the FFT plan.

Function cufftExecC2R()
cufftResult cufftExecC2R( cufftHandle plan, cufftComplex *idata, cufftReal *odata );

executes a CUFFT single-precision complex-to-real (implicitly inverse) transform plan. CUFFT uses as input data the GPU memory pointed to by the idata parameter. The input array holds only the nonredundant complex Fourier coefficients. This function stores the real output values in the odata array. If idata and odata are the same, this method does an in-place transform. (See "CUFFT Transform Types" for details on real data FFTs.)
Input
  plan    The cufftHandle object for the plan to update
  idata   Pointer to the single-precision complex input data (in GPU memory) to transform
  odata   Pointer to the single-precision real output data (in GPU memory)

Output
  odata   Contains the real-valued output data

Return Values
  CUFFT_SETUP_FAILED    CUFFT library failed to initialize.
  CUFFT_INVALID_PLAN    The plan parameter is not a valid handle.
  CUFFT_INVALID_VALUE   The idata and/or odata parameter is not valid.
  CUFFT_EXEC_FAILED     CUFFT failed to execute the transform on GPU.
  CUFFT_SUCCESS         CUFFT successfully executed the FFT plan.

Function cufftExecZ2Z()
cufftResult cufftExecZ2Z( cufftHandle plan, cufftDoubleComplex *idata, cufftDoubleComplex *odata, int direction );

executes a CUFFT double-precision complex-to-complex transform plan as specified by direction. CUFFT uses as input data the GPU memory pointed to by the idata parameter. This function stores the Fourier coefficients in the odata array. If idata and odata are the same, this method does an in-place transform.
Input
  plan        The cufftHandle object for the plan to update
  idata       Pointer to the double-precision complex input data (in GPU memory) to transform
  odata       Pointer to the double-precision complex output data (in GPU memory)
  direction   The transform direction: CUFFT_FORWARD or CUFFT_INVERSE

Output
  odata       Contains the complex Fourier coefficients

Return Values
  CUFFT_SETUP_FAILED    CUFFT library failed to initialize.
  CUFFT_INVALID_PLAN    The plan parameter is not a valid handle.
  CUFFT_INVALID_VALUE   The idata, odata, and/or direction parameter is not valid.
  CUFFT_EXEC_FAILED     CUFFT failed to execute the transform on GPU.
  CUFFT_SUCCESS         CUFFT successfully executed the FFT plan.


Function cufftExecD2Z()
cufftResult cufftExecD2Z( cufftHandle plan, cufftDoubleReal *idata, cufftDoubleComplex *odata );

executes a CUFFT double-precision real-to-complex (implicitly forward) transform plan. CUFFT uses as input data the GPU memory pointed to by the idata parameter. This function stores the nonredundant Fourier coefficients in the odata array. If idata and odata are the same, this method does an in-place transform. (See "CUFFT Transform Types" for details on real data FFTs.)
Input
  plan    The cufftHandle object for the plan to update
  idata   Pointer to the double-precision real input data (in GPU memory) to transform
  odata   Pointer to the double-precision complex output data (in GPU memory)

Output
  odata   Contains the complex Fourier coefficients

Return Values
  CUFFT_SETUP_FAILED    CUFFT library failed to initialize.
  CUFFT_INVALID_PLAN    The plan parameter is not a valid handle.
  CUFFT_INVALID_VALUE   The idata and/or odata parameter is not valid.
  CUFFT_EXEC_FAILED     CUFFT failed to execute the transform on GPU.
  CUFFT_SUCCESS         CUFFT successfully executed the FFT plan.

Function cufftExecZ2D()
cufftResult cufftExecZ2D( cufftHandle plan, cufftDoubleComplex *idata, cufftDoubleReal *odata );

executes a CUFFT double-precision complex-to-real (implicitly inverse) transform plan. CUFFT uses as input data the GPU memory pointed to by the idata parameter. The input array holds only the nonredundant complex Fourier coefficients. This function stores the real output values in the odata array. If idata and odata are the same, this method does an in-place transform. (See "CUFFT Transform Types" for details on real data FFTs.)
Input
  plan    The cufftHandle object for the plan to update
  idata   Pointer to the double-precision complex input data (in GPU memory) to transform
  odata   Pointer to the double-precision real output data (in GPU memory)

Output
  odata   Contains the real-valued output data

Return Values
  CUFFT_SETUP_FAILED    CUFFT library failed to initialize.
  CUFFT_INVALID_PLAN    The plan parameter is not a valid handle.
  CUFFT_INVALID_VALUE   The idata and/or odata parameter is not valid.
  CUFFT_EXEC_FAILED     CUFFT failed to execute the transform on GPU.
  CUFFT_SUCCESS         CUFFT successfully executed the FFT plan.

Accuracy and Performance
The CUFFT library implements several FFT algorithms, each having different performance and accuracy. The best performance paths correspond to transform sizes that meet two criteria:

  1. Fit in CUDA's shared memory
  2. Are powers of a single factor (for example, powers of two)

These transforms are also the most accurate due to the numeric stability of the chosen FFT algorithm. For transform sizes that meet the first criterion but not the second, CUFFT uses a more general mixed-radix FFT algorithm that is usually slower and less numerically accurate. Therefore, if possible it is best to use sizes that are powers of two or four, or powers of other small primes (such as three, five, or seven). In addition, the power-of-two FFT algorithm in CUFFT makes maximum use of shared memory by blocking subtransforms for signals that do not meet the first criterion.
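An application can test the second criterion before creating a plan. The sketch below checks whether a size is a power of a single small prime; it is a host-side illustration with a hypothetical function name, not the test CUFFT performs internally. The set of primes (2, 3, 5, 7) is taken from the examples in the text and is an assumption beyond that, and the shared-memory criterion is not checked at all.

```c
#include <assert.h>

/* Returns p if n == p^k for some k >= 1 and p in {2, 3, 5, 7};
   returns 0 otherwise (including n == 1 and mixed-factor sizes). */
unsigned single_small_prime_base(unsigned n)
{
    static const unsigned primes[4] = { 2u, 3u, 5u, 7u };
    assert(n > 0);
    for (unsigned i = 0; i < 4; ++i) {
        unsigned m = n;
        /* strip every factor of this prime */
        while (m > 1 && m % primes[i] == 0)
            m /= primes[i];
        if (n > 1 && m == 1)
            return primes[i];
    }
    return 0;
}
```

Sizes such as 256, 243, and 343 pass the check, while 1000 (which mixes factors of two and five) and 12 do not and would therefore fall under the mixed-radix description above.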


For transform sizes that do not meet either criterion above, CUFFT uses an out-of-place, mixed-radix algorithm that stores all intermediate results in CUDA's global GPU memory. Although this algorithm uses optimized transform modules for many factors, it has generally lower performance because global memory has less bandwidth than shared memory. The one exception is large 1D transforms, where CUFFT uses a distributed algorithm that performs a 1D FFT using a 2D FFT, where the dimensions of the 2D transform are factors of the 1D size. This path attempts to utilize the faster transforms mentioned above even if the signal size is too large to fit in CUDA's shared memory.

Many FFT algorithms for real data exploit the conjugate symmetry property to reduce computation and memory cost by roughly half. However, CUFFT does not implement any specialized algorithms for real data, and so there is no direct performance benefit to using real-to-complex (or complex-to-real) plans instead of complex-to-complex. For this release, the real data API exists primarily for convenience, so that users do not have to build interleaved complex data from a real data source before using the library.

For 1D transforms, the performance for real data will either match or be less than the complex equivalent (due to an extra copy in some cases). However, there is usually a performance benefit to using real data for 2D and 3D FFTs, since all transforms but the last dimension operate on roughly half the logical signal size.
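The large-1D path mentioned above relies on writing the 1D size as a product of two factors. How CUFFT chooses those factors is not stated in this document; the sketch below only illustrates the idea by picking the factor pair closest to a square, which is an assumption made for the example. The function name split_near_square() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Split n into rows * cols with rows <= cols and rows as large as
   possible, i.e. the factor pair nearest to a square. */
void split_near_square(size_t n, size_t *rows, size_t *cols)
{
    assert(n > 0 && rows != NULL && cols != NULL);
    size_t best = 1;
    /* a <= n / a is the overflow-free form of a * a <= n */
    for (size_t a = 1; a <= n / a; ++a)
        if (n % a == 0)
            best = a;
    *rows = best;
    *cols = n / best;
}
```

For a 1D size of 1048576 this yields 1024×1024, so both dimensions are powers of two. A prime size such as 13 only splits as 1×13, in which case no such decomposition helps.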


CUFFT Code Examples
This section provides simple examples of 1D, 2D, and 3D complex and real data transforms that use CUFFT to perform forward and inverse FFTs.

1D Complex-to-Complex Transforms
#include <cuda_runtime.h>
#include <cufft.h>

#define NX 256
#define BATCH 10

int main(void)
{
    cufftHandle plan;
    cufftComplex *data;
    cudaMalloc((void**)&data, sizeof(cufftComplex)*NX*BATCH);

    /* Create a 1D FFT plan. */
    cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH);

    /* Use the CUFFT plan to transform the signal in place. */
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);

    /* Inverse transform the signal in place. */
    cufftExecC2C(plan, data, data, CUFFT_INVERSE);

    /* Note:
       (1) Divide by number of elements in data set to get back original data
       (2) Identical pointers to input and output arrays implies in-place
           transformation */

    /* Destroy the CUFFT plan. */
    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}


1D Real-to-Complex Transforms
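#include <cuda_runtime.h>
#include <cufft.h>

#define NX 256
#define BATCH 10

int main(void)
{
    cufftHandle plan;
    cufftComplex *data;
    /* NX/2+1 complex elements per signal: room for the padded real input
       and for the nonredundant complex output. */
    cudaMalloc((void**)&data, sizeof(cufftComplex)*(NX/2+1)*BATCH);

    /* Create a 1D FFT plan. */
    cufftPlan1d(&plan, NX, CUFFT_R2C, BATCH);

    /* Use the CUFFT plan to transform the signal in place. */
    cufftExecR2C(plan, (cufftReal*)data, data);

    /* Destroy the CUFFT plan. */
    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}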

2D Complex-to-Complex Transforms
#include <cuda_runtime.h>
#include <cufft.h>

#define NX 256
#define NY 128

int main(void)
{
    cufftHandle plan;
    cufftComplex *idata, *odata;
    cudaMalloc((void**)&idata, sizeof(cufftComplex)*NX*NY);
    cudaMalloc((void**)&odata, sizeof(cufftComplex)*NX*NY);

    /* Create a 2D FFT plan. */
    cufftPlan2d(&plan, NX, NY, CUFFT_C2C);

    /* Use the CUFFT plan to transform the signal out of place. */
    cufftExecC2C(plan, idata, odata, CUFFT_FORWARD);

    /* Note: idata != odata indicates an out-of-place transformation
       to CUFFT at execution time. */

    /* Inverse transform the signal in place */
    cufftExecC2C(plan, odata, odata, CUFFT_INVERSE);

    /* Destroy the CUFFT plan. */
    cufftDestroy(plan);
    cudaFree(idata);
    cudaFree(odata);
    return 0;
}

2D Complex-to-Real Transforms
#include <cuda_runtime.h>
#include <cufft.h>

#define NX 256
#define NY 128

int main(void)
{
    cufftHandle plan;
    cufftComplex *idata;
    cufftReal *odata;
    /* Only NX*(NY/2+1) complex inputs are read; this allocation is larger. */
    cudaMalloc((void**)&idata, sizeof(cufftComplex)*NX*NY);
    cudaMalloc((void**)&odata, sizeof(cufftReal)*NX*NY);

    /* Create a 2D FFT plan. */
    cufftPlan2d(&plan, NX, NY, CUFFT_C2R);

    /* Use the CUFFT plan to transform the signal out of place. */
    cufftExecC2R(plan, idata, odata);

    /* Destroy the CUFFT plan. */
    cufftDestroy(plan);
    cudaFree(idata);
    cudaFree(odata);
    return 0;
}


3D Complex-to-Complex Transforms
#include <cuda_runtime.h>
#include <cufft.h>

#define NX 64
#define NY 64
#define NZ 128

int main(void)
{
    cufftHandle plan;
    cufftComplex *data1, *data2;
    cudaMalloc((void**)&data1, sizeof(cufftComplex)*NX*NY*NZ);
    cudaMalloc((void**)&data2, sizeof(cufftComplex)*NX*NY*NZ);

    /* Create a 3D FFT plan. */
    cufftPlan3d(&plan, NX, NY, NZ, CUFFT_C2C);

    /* Transform the first signal in place. */
    cufftExecC2C(plan, data1, data1, CUFFT_FORWARD);

    /* Transform the second signal using the same plan. */
    cufftExecC2C(plan, data2, data2, CUFFT_FORWARD);

    /* Destroy the CUFFT plan. */
    cufftDestroy(plan);
    cudaFree(data1);
    cudaFree(data2);
    return 0;
}
