Table of Contents
- Introduction
- Programming Model
- Programming Interface
- Hardware Implementation
- Performance Guidelines
  - Overall Performance Optimization Strategies
  - Maximize Utilization
  - Maximize Memory Throughput
  - Maximize Instruction Throughput
    - Arithmetic Instructions
    - Control Flow Instructions
    - Synchronization Instruction
- CUDA-Enabled GPUs
- C Language Extensions
  - Function Type Qualifiers
  - Variable Type Qualifiers
  - Built-in Vector Types
    - char1, uchar1, char2, uchar2, char3, uchar3, char4, uchar4, short1, ushort1, short2, ushort2, short3, ushort3, short4, ushort4, int1, uint1, int2, uint2, int3, uint3, int4, uint4, long1, ulong1, long2, ulong2, long3, ulong3, long4, ulong4, longlong1, ...
    - dim3
  - Built-in Variables
  - Memory Fence Functions
  - Synchronization Functions
  - Mathematical Functions
  - Texture Functions
  - Time Function
  - Atomic Functions
  - Warp Vote Functions
  - Profiler Counter Function
  - Execution Configuration
  - Launch Bounds
- Mathematical Functions
- C++ Language Constructs
- NVCC Specifics
- Texture Fetching
- Compute Capabilities