This blog post documents my experience with GCP in September 2020.
I worked with four Google services:
1. Storage
2. Functions
3. Cloud Run
4. Scheduler
Before working with these, you need to:
A. Create a project
B. Create a service account and download the JSON key file; this is the authentication needed to access GCP's services
C. Enable the APIs of all the services above
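For reference, after downloading the key you can point the client libraries at it and enable the APIs from a shell. This is a sketch, assuming the key was saved as key.json (a name I made up):

# Point the client libraries at the downloaded service account key
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/key.json"
# Enable the APIs for the four services used in this post
gcloud services enable storage.googleapis.com cloudfunctions.googleapis.com run.googleapis.com cloudscheduler.googleapis.com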
1. Storage:
Inside Storage you can create different buckets.
Inside each bucket, files are stored as objects (also called blobs).
There are no real folders, just object names whose "/" segments simulate directories.
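For reference, here is a minimal sketch of writing and reading objects with the Python client (the bucket and object names are made up):

from google.cloud import storage

client = storage.Client()  # picks up the service account credentials

bucket = client.bucket("my-example-bucket")

# The "/" in the object name is what simulates a folder
blob = bucket.blob("models/model.pkl")
blob.upload_from_filename("model.pkl")
blob.download_to_filename("model_copy.pkl")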
2. Functions
These are simply Python functions that get invoked when you call the function's URL after it is deployed.
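For reference, a minimal HTTP function looks like the sketch below. The function name and message are made up; GCP hands the function a Flask request object.

# main.py, deployed as an HTTP-triggered Cloud Function
def hello_http(request):
    # request is a flask.Request; query parameters arrive via request.args
    name = request.args.get("name", "world")
    return f"Hello, {name}!"

It can then be deployed with something like:
gcloud functions deploy hello_http --runtime python38 --trigger-http --allow-unauthenticated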
3. Cloud Run
This service runs Docker images that have been built beforehand. The steps are as follows:
1. Open a Cloud Shell and set the active project:
gcloud config set project [PROJECT ID]
2. Create a folder for the project
3. In the project folder, add the following (a sketch of these files follows the steps below):
A. A Dockerfile
B. requirements.txt
C. All Python scripts, data, and models
4. Run Google's equivalent of "docker build":
gcloud builds submit --tag gcr.io/[PROJECT ID]/[CONTAINER NAME]
5. Deploy the container using the equivalent of "docker run":
gcloud run deploy --image gcr.io/[PROJECT ID]/[CONTAINER NAME] --platform managed --allow-unauthenticated --memory 4Gi --cpu 2
6. A URL is generated; opening it invokes the container.
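For reference, here is a minimal sketch of the files from step 3. The file names and contents are illustrative, not from my actual project; the key point is that the app listens on 0.0.0.0:8080, which Cloud Run expects.

app.py:

from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    # Replace with the real work the container should do
    return "Hello from Cloud Run"

if __name__ == "__main__":
    # Cloud Run sends traffic to port 8080 by default
    app.run(host="0.0.0.0", port=8080)

Dockerfile:

FROM python:3.8-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]

requirements.txt just needs to list flask, plus whatever else the scripts import.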
4. Scheduler
To call one of the services on a schedule, make sure the job targets the service over HTTP and that the service account is authorized to call it using an OIDC token.
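For reference, creating such a job from the command line looks like the sketch below. The job name, schedule, URL, and service account e-mail are all placeholders:

gcloud scheduler jobs create http daily-job --schedule="0 6 * * *" --uri="https://my-service-xyz.a.run.app/" --http-method=GET --oidc-service-account-email="scheduler@[PROJECT ID].iam.gserviceaccount.com"

This calls the given URL every day at 06:00 with an OIDC token minted for that service account.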
Lessons learned
1. Cloud Run and Functions are designed for short-lived web requests, or maybe web scraping, but not for heavyweight analytics.
2. Mapping the port was a pain. Make sure the container listens on port 8080, as configured in the Dockerfile.
3. To serve Python scripts, use Flask. Specifically, use app.run(host='0.0.0.0', port=8080), as in the Cloud Run sketch above.
4. A good architecture is to have Cloud Scheduler trigger the functions on a schedule.
5. Containers built on GCP are stored in Google Container Registry. A container can be deployed to Cloud Run, Compute Engine, or a Kubernetes cluster, and this can be done easily using a button at the top of the registry page.
6. To copy data to or from GCS, use the command gsutil cp with a gs:// path (example after this list).
7. To schedule jobs on Linux, use a crontab entry of the form * * * * * xx/xx/xx/python script.py, where xx/xx/xx is the path to the Python interpreter of the environment (example after this list).
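To make lessons 6 and 7 concrete (the bucket name, paths, and schedule below are made up):

# Lesson 6: copy a file to a bucket and back
gsutil cp results.csv gs://my-example-bucket/results/results.csv
gsutil cp gs://my-example-bucket/results/results.csv ./results_copy.csv

# Lesson 7: crontab entry (edit with crontab -e) running a script every day at 06:00
0 6 * * * /home/me/venv/bin/python /home/me/project/script.py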