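- I am writing this down, tedious as it is, because I installed tensorflow with pip and the folks at Google keep changing the source.
Newly downloaded source code had changed in so many places that it would not run properly against the tensorflow I had installed earlier with pip,
which was infuriating.
The most important reason, though, is that I would otherwise forget the steps.
- For now, assume that all required dependencies are already installed.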
- Install all dependencies by following these sites:
- https://www.tensorflow.org/versions/master/get_started/os_setup.html#installing_from_sources
- http://www.complexity.co.kr/?p=2476
- bazel, CUDA, cuDNN, etc. are all assumed to be installed.
- Clone the TensorFlow repository
- $ git clone --recurse-submodules https://github.com/tensorflow/tensorflow
- The --recurse-submodules option is required to fetch the protobuf library that TensorFlow depends on.
- Configure the installation
- The following assumes that CUDA 7.0 and the cuDNN 6.5 library are already installed and fully set up.
- Running step 3.3 configures the build so that TensorFlow is built with CUDA support.
- Run the following command:
$ TF_UNOFFICIAL_SETTING=1 ./configure
# Same as the official settings above
WARNING: You are configuring unofficial settings in TensorFlow. Because some
external libraries are not backward compatible, these settings are largely
untested and unsupported.
Please specify a list of comma-separated Cuda compute capabilities you want to
build with. You can find the compute capability of your device at:
https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases
your build time and binary size.
[Default is: "3.5,5.2"]: 3.0   # 5.2 in my case
Setting up Cuda include
Setting up Cuda lib64
Setting up Cuda bin
Setting up Cuda nvvm
Configuration finished
- Enter the compute capability that your graphics card supports.
- In my case, 5.2 (GTX 960).
- the location of python
- In my case, the default (/usr/bin/python).
- build TensorFlow with GPU support
- In my case, y.
- the location of CUDA 7.0 toolkit
- In my case, /usr/local/cuda-7.0
- the location of CUDNN 6.5 V2 library
- In my case, /usr/local/cuda-7.0 (the same path, because installing cuDNN means copying its files into the CUDA 7.0 install directory).
- Set the five options above.
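Before running ./configure, it can help to confirm that the CUDA path you are about to enter looks like a toolkit install. The sketch below is only an illustration: the helper name `missing_cuda_subdirs` is hypothetical, and the list of subdirectories is taken from the "Setting up Cuda include / lib64 / bin / nvvm" lines in the configure output above, not from any official check.

```python
from __future__ import print_function

import os

# Subdirectories named in the configure output above
# ("Setting up Cuda include", "lib64", "bin", "nvvm").
EXPECTED_SUBDIRS = ("include", "lib64", "bin", "nvvm")


def missing_cuda_subdirs(cuda_root, expected=EXPECTED_SUBDIRS):
    """Return the expected subdirectories that do not exist under cuda_root."""
    return [name for name in expected
            if not os.path.isdir(os.path.join(cuda_root, name))]


if __name__ == "__main__":
    # Path used in this walkthrough; change it to match your own install.
    root = "/usr/local/cuda-7.0"
    missing = missing_cuda_subdirs(root)
    if missing:
        print("Missing under %s: %s" % (root, ", ".join(missing)))
    else:
        print("All expected subdirectories found under %s" % root)
```

An empty result only means the directory layout looks plausible; it does not verify the CUDA or cuDNN versions.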
- Build your target with GPU support
- Run the following commands from the root of the source tree.
- Note: if you are going to use CUDA, build with the second command.
- "--config=cuda" is needed to enable the GPU support
- Run the following commands.
# CPU-only build:
$ bazel build -c opt //tensorflow/tools/pip_package:build_pip_package
# To build with GPU support:
$ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
# The name of the .whl file will depend on your platform.
# Replace the .whl file name below with the name of your own file.
$ sudo -H pip install /tmp/tensorflow_pkg/tensorflow-0.5.0-py2-none-any.whl
# In my case:
# sudo -H pip install /tmp/tensorflow_pkg/tensorflow-0.5.0-cp27-none-linux_x86_64.whl
- ...
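Because the wheel's file name depends on the platform, one way to avoid typing it by hand is to list whatever build_pip_package wrote into the output directory. This is a minimal sketch, assuming the /tmp/tensorflow_pkg directory used above; the helper name `find_wheels` is hypothetical, and the script only prints the install command rather than running it.

```python
from __future__ import print_function

import glob
import os


def find_wheels(pkg_dir):
    """Return the sorted paths of all .whl files directly inside pkg_dir."""
    return sorted(glob.glob(os.path.join(pkg_dir, "*.whl")))


if __name__ == "__main__":
    # Output directory passed to build_pip_package in the commands above.
    wheels = find_wheels("/tmp/tensorflow_pkg")
    if not wheels:
        print("No wheel found; run build_pip_package first.")
    for path in wheels:
        # Print the command instead of executing it, so nothing is installed by accident.
        print("sudo -H pip install " + path)
```

If the directory holds wheels from several builds, every one of them is listed, so check which file is the one you just built.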
- Test Python with TensorFlow & CUDA
- Run the following commands.
$ python
...
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print sess.run(hello)
Hello, TensorFlow!
>>> a = tf.constant(10)
>>> b = tf.constant(32)
>>> print sess.run(a + b)
42
>>>
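If the commands above print the expected results, the basics are done.
- Run other scripts to confirm that the GPU is actually being used.
# When the GPU is in use, output like the following means everything is working.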
/usr/bin/python2.7 /home/juce/study/tf/tf_org_tutorials/05_rnn/ptb/ptb_word_lm.py --data_path=/home/juce/study/tf/tf_org_tutorials/05_rnn/ptb/dataset --model=small
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 8
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:903] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:103] Found device 0 with properties:
name: GeForce GTX 960
major: 5 minor: 2 memoryClockRate (GHz) 1.367
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.31GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:127] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:137] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Allocating 3.02GiB bytes.
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:52] GPU 0 memory begins at 0xb02780000 extends to 0xbc3873000
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00GiB
I tensorflow/core/common_runtime/direct_session.cc:59] Direct session inter op parallelism threads: 8
Epoch: 1 Learning rate: 1.000
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:242] PoolAllocator: After 3396 get requests, put_count=2221 evicted_count=1000 eviction_rate=0.450248 and unsatisfied allocation rate=0.669906
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:254] Raising pool_size_limit_ from 100 to 110
0.004 perplexity: 4996.585 speed: 3511 wps
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:242] PoolAllocator: After 4004 get requests, put_count=3171 evicted_count=1000 eviction_rate=0.315358 and unsatisfied allocation rate=0.463536
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:254] Raising pool_size_limit_ from 256 to 281
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:242] PoolAllocator: After 4008 get requests, put_count=4024 evicted_count=1000 eviction_rate=0.248509 and unsatisfied allocation rate=0.26023
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:254] Raising pool_size_limit_ from 655 to 720
0.104 perplexity: 845.587 speed: 4938 wps
0.204 perplexity: 624.975 speed: 4856 wps
0.304 perplexity: 505.087 speed: 4832 wps
0.404 perplexity: 435.207 speed: 4818 wps
0.504 perplexity: 390.053 speed: 4805 wps
0.604 perplexity: 351.450 speed: 4797 wps
0.703 perplexity: 325.042 speed: 4798 wps
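Scanning a log like the one above by eye gets tedious, so here is a rough sketch that pulls out the two things worth checking: whether a GPU device was created, and how perplexity and speed evolve. The regular expressions are written against the exact line shapes shown in this log and are an assumption; other TensorFlow versions may format these lines differently. The function names are hypothetical.

```python
from __future__ import print_function

import re

# Matches e.g. "Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960, ..."
DEVICE_RE = re.compile(
    r"Creating TensorFlow device \((/gpu:\d+)\) -> \(device: \d+, name: ([^,]+),")
# Matches e.g. "0.104 perplexity: 845.587 speed: 4938 wps"
PROGRESS_RE = re.compile(r"^(\d+\.\d+) perplexity: (\d+\.\d+) speed: (\d+) wps")


def gpu_devices(log_lines):
    """Return (device, name) pairs for every GPU device the log reports creating."""
    devices = []
    for line in log_lines:
        match = DEVICE_RE.search(line)
        if match:
            devices.append((match.group(1), match.group(2)))
    return devices


def progress(log_lines):
    """Return (epoch_fraction, perplexity, words_per_second) for each progress line."""
    rows = []
    for line in log_lines:
        match = PROGRESS_RE.match(line.strip())
        if match:
            rows.append((float(match.group(1)), float(match.group(2)),
                         int(match.group(3))))
    return rows


if __name__ == "__main__":
    # A few lines copied from the log above.
    sample = [
        "I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] Creating TensorFlow device "
        "(/gpu:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0)",
        "Epoch: 1 Learning rate: 1.000",
        "0.004 perplexity: 4996.585 speed: 3511 wps",
        "0.104 perplexity: 845.587 speed: 4938 wps",
    ]
    print(gpu_devices(sample))
    print(progress(sample))
```

An empty list from `gpu_devices` would suggest the run fell back to the CPU, which usually means the build was done without --config=cuda.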