1. The reason I'm bothering to write this up: I had installed TensorFlow with pip, but Google keeps changing the source, so freshly downloaded example code had changed so much that it no longer ran properly against my older pip-installed TensorFlow...
    Which was infuriating...
    But the most important reason is that I'll forget all of this otherwise...





  2. This write-up assumes all required dependencies are already installed.
    1. Install every dependency first by following the usual reference guides.
    2. bazel, CUDA, cuDNN, etc. are all assumed to be in place (a quick sanity check follows this list).
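    3. Before building, a rough sanity check I run myself (not part of any official guide): try loading the CUDA and cuDNN shared libraries with ctypes to make sure the linker will be able to find them. The file names below are what my CUDA 7.0 / cuDNN 6.5 v2 install uses; adjust them to your own versions.

      import ctypes

      # Library names from my CUDA 7.0 / cuDNN 6.5 v2 install -- change them to match yours.
      for lib in ("libcudart.so.7.0", "libcudnn.so.6.5"):
          try:
              ctypes.CDLL(lib)
              print "%s: OK" % lib
          except OSError as e:
              print "%s: NOT FOUND (%s)" % (lib, e)

      If a library is reported as NOT FOUND, check that its directory is on LD_LIBRARY_PATH before you build.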


  3. Clone the TensorFlow repository
    • $ git clone --recurse-submodules https://github.com/tensorflow/tensorflow
      • The --recurse-submodules option is needed to fetch the protobuf library that TensorFlow depends on.


  4. Configure the installation
    1. The following assumes CUDA 7.0 and the cuDNN 6.5 toolkit are already installed and configured.
    2. Running the configure command in step 3 below sets up the build so that TensorFlow is built against CUDA.
    3. Run the following command:
      $ TF_UNOFFICIAL_SETTING=1 ./configure
      
      # Same as the official settings above
      
      WARNING: You are configuring unofficial settings in TensorFlow. Because some
      external libraries are not backward compatible, these settings are largely
      untested and unsupported.
      
      Please specify a list of comma-separated Cuda compute capabilities you want to
      build with. You can find the compute capability of your device at:
      https://developer.nvidia.com/cuda-gpus.
      Please note that each additional compute capability significantly increases
      your build time and binary size. [Default is: "3.5,5.2"]: 3.0                # I entered 5.2
      
      Setting up Cuda include
      Setting up Cuda lib64
      Setting up Cuda bin
      Setting up Cuda nvvm
      Configuration finished
      • Enter the compute capability your graphics card supports (if you're not sure, see the sketch after this list)
        • In my case, 5.2 (GTX 960)
      • The location of python
        • In my case, the default (/usr/bin/python)
      • Build TensorFlow with GPU support
        • In my case, y
      • The location of the CUDA 7.0 toolkit
        • In my case, /usr/local/cuda-7.0
      • The location of the cuDNN 6.5 v2 library
        • In my case, /usr/local/cuda-7.0 (installing cuDNN copies it into the CUDA 7.0 install directory, so it's the same path)
      • Set these five options during configure.
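    4. If you don't know your card's compute capability, besides the NVIDIA page above, here is a small sketch of my own (not from the docs) that asks the CUDA driver directly through ctypes. It assumes libcuda.so from the NVIDIA driver can be loaded, and it skips error checking for brevity.

      import ctypes

      # CUDA driver library (installed with the NVIDIA driver, not the toolkit).
      # On some systems you may need "libcuda.so.1" instead.
      cuda = ctypes.CDLL("libcuda.so")
      cuda.cuInit(0)

      count = ctypes.c_int()
      cuda.cuDeviceGetCount(ctypes.byref(count))

      for ordinal in range(count.value):
          device = ctypes.c_int()
          cuda.cuDeviceGet(ctypes.byref(device), ordinal)

          name = ctypes.create_string_buffer(100)
          cuda.cuDeviceGetName(name, len(name), device)

          major, minor = ctypes.c_int(), ctypes.c_int()
          cuda.cuDeviceComputeCapability(ctypes.byref(major), ctypes.byref(minor), device)

          print "%s: compute capability %d.%d" % (name.value, major.value, minor.value)

      For my GTX 960 this should come out as 5.2, which is the value I entered above.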



  5. Build your target with GPU support
    1. From the root of the source tree, run the following commands (a quick post-install check follows this list).
      1. Note: if you are going to use CUDA, build with the second command.
      2. "--config=cuda" is needed to enable the GPU support
      3. Run the following commands:

        # For the CPU-only build:

        $ bazel build -c opt //tensorflow/tools/pip_package:build_pip_package


        # To build with GPU support:

        $ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

        $ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg


        # The name of the .whl file will depend on your platform.

        # Substitute your own .whl file name below.

        $ sudo -H pip install /tmp/tensorflow_pkg/tensorflow-0.5.0-py2-none-any.whl

        # In my case:

        $ sudo -H pip install /tmp/tensorflow_pkg/tensorflow-0.5.0-cp27-none-linux_x86_64.whl

      4. ...
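    2. A quick check I do right after the pip install (my own habit, not an official step): from a directory outside the source tree, import the package and print where it was loaded from, to make sure Python picked up the installed wheel and not the tensorflow/ directory inside the checkout.

      import tensorflow as tf

      # This should point into site-packages, not into the cloned source tree.
      print tf.__file__

      If you run python from inside the source root, the import can pick up the source directory instead of the installed package, so run this from somewhere else (e.g. your home directory).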



  6. Test Python with TensorFlow & CUDA
    1. Run the following commands:
      $ python
      ...
      >>> import tensorflow as tf
      >>> hello = tf.constant('Hello, TensorFlow!')
      >>> sess = tf.Session()
      >>> print sess.run(hello)
      Hello, TensorFlow!
      >>> a = tf.constant(10)
      >>> b = tf.constant(32)
      >>> print sess.run(a + b)
      42
      >>>
      
    2. If the output above looks right, the basic setup is done.
    3. Run some other scripts to confirm the GPU is actually being used (see the sketch below, and the sample log that follows).
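    4. To see explicitly which device each op lands on, here is a small sketch along the lines of the usual device-placement example, assuming the 0.x-era Session/ConfigProto API that this build installs:

      import tensorflow as tf

      a = tf.constant([1.0, 2.0, 3.0])
      b = tf.constant([4.0, 5.0, 6.0])
      c = a + b

      # log_device_placement makes the runtime print which device each op is assigned to;
      # with a GPU build they should be reported on /gpu:0.
      sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
      print sess.run(c)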

# When the GPU is being used, output like the following means everything is working.



/usr/bin/python2.7 /home/juce/study/tf/tf_org_tutorials/05_rnn/ptb/ptb_word_lm.py --data_path=/home/juce/study/tf/tf_org_tutorials/05_rnn/ptb/dataset --model=small
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 8
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:903] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:103] Found device 0 with properties: 
name: GeForce GTX 960
major: 5 minor: 2 memoryClockRate (GHz) 1.367
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.31GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:127] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:137] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Allocating 3.02GiB bytes.
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:52] GPU 0 memory begins at 0xb02780000 extends to 0xbc3873000
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00GiB
I tensorflow/core/common_runtime/direct_session.cc:59] Direct session inter op parallelism threads: 8
Epoch: 1 Learning rate: 1.000
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:242] PoolAllocator: After 3396 get requests, put_count=2221 evicted_count=1000 eviction_rate=0.450248 and unsatisfied allocation rate=0.669906
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:254] Raising pool_size_limit_ from 100 to 110
0.004 perplexity: 4996.585 speed: 3511 wps
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:242] PoolAllocator: After 4004 get requests, put_count=3171 evicted_count=1000 eviction_rate=0.315358 and unsatisfied allocation rate=0.463536
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:254] Raising pool_size_limit_ from 256 to 281
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:242] PoolAllocator: After 4008 get requests, put_count=4024 evicted_count=1000 eviction_rate=0.248509 and unsatisfied allocation rate=0.26023
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:254] Raising pool_size_limit_ from 655 to 720
0.104 perplexity: 845.587 speed: 4938 wps
0.204 perplexity: 624.975 speed: 4856 wps
0.304 perplexity: 505.087 speed: 4832 wps
0.404 perplexity: 435.207 speed: 4818 wps
0.504 perplexity: 390.053 speed: 4805 wps
0.604 perplexity: 351.450 speed: 4797 wps
0.703 perplexity: 325.042 speed: 4798 wps



