Getting Started with Google Gemini API: Installation and First API Call in Python
Learn how to install the google-generativeai SDK, configure your API key, make your first generate_content call, and parse responses. A complete hands-on beginner tutorial for Google Gemini.
Why Google Gemini for Agent Development
Google Gemini represents Google DeepMind's most capable family of large language models. Unlike earlier Google AI offerings that required complex GCP setup, the Gemini API is accessible through a simple Python SDK with a free tier generous enough for prototyping entire agent systems. Gemini models natively support text, images, video, audio, and code — making them uniquely suited for building multi-modal agents.
The google-generativeai SDK is the official Python client. It handles authentication, request formatting, streaming, and response parsing so you can focus on building agent logic rather than managing HTTP calls.
Prerequisites
Before you begin, ensure you have:
- Python 3.9 or later installed
- A Google AI Studio API key (free at aistudio.google.com)
- Basic familiarity with Python
Step 1: Install the SDK
Install the official Google Generative AI package:
pip install google-generativeai
Verify the installation:
python -c "import google.generativeai as genai; print('SDK installed successfully')"
Step 2: Configure Your API Key
There are two ways to provide your API key. The recommended approach uses an environment variable:
export GOOGLE_API_KEY="your-api-key-here"
Then in your Python code, configure the SDK:
import google.generativeai as genai
import os
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
For quick experiments you can pass the key directly, but never commit API keys to version control:
genai.configure(api_key="your-api-key-here") # Only for local testing
Step 3: Make Your First API Call
The core interaction pattern in Gemini is generate_content. Here is the simplest possible call:
import google.generativeai as genai
import os
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-2.0-flash")
response = model.generate_content("Explain what an AI agent is in three sentences.")
print(response.text)
The GenerativeModel class is your primary interface. You specify which model to use — gemini-2.0-flash is fast and cost-effective, while gemini-2.0-pro offers stronger reasoning for complex tasks.
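If you are unsure which model ids your API key can access, the SDK can enumerate them with genai.list_models(). A minimal sketch — the network call itself is shown in the comment, and the helper function below (a name invented for this example) holds the plain filtering logic:

```python
# With a configured API key, the real call looks like:
#
#   for m in genai.list_models():
#       if "generateContent" in m.supported_generation_methods:
#           print(m.name)
#
# The filter itself, factored out as a hypothetical helper:
def supports_text_generation(model_record):
    """True if a record from genai.list_models() can serve generate_content."""
    return "generateContent" in getattr(model_record, "supported_generation_methods", [])
```

Models that only support embeddings are filtered out, so the remaining names are safe to pass to GenerativeModel.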
Step 4: Parse the Response Object
The response object contains more than just text. Understanding its structure is important for building robust agents:
response = model.generate_content("What is retrieval augmented generation?")
# The generated text
print(response.text)
# Safety ratings for content filtering
for candidate in response.candidates:
    print(f"Finish reason: {candidate.finish_reason}")
    for rating in candidate.safety_ratings:
        print(f"  {rating.category}: {rating.probability}")
# Token usage statistics
print(f"Prompt tokens: {response.usage_metadata.prompt_token_count}")
print(f"Response tokens: {response.usage_metadata.candidates_token_count}")
print(f"Total tokens: {response.usage_metadata.total_token_count}")
The usage_metadata field is critical for cost tracking in production agents. Each model has different pricing per million tokens, and monitoring usage prevents unexpected bills.
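As a sketch of such cost tracking, the helper below converts usage_metadata counts into a dollar estimate. The per-million-token prices are placeholders invented for this example — check Google's current pricing page for real numbers:

```python
# Placeholder rates (USD per million tokens) -- NOT real prices.
PRICE_PER_MILLION = {
    "gemini-2.0-flash": {"input": 0.10, "output": 0.40},
}

def estimate_cost(model_name, prompt_tokens, output_tokens):
    """Estimate the cost of one call from usage_metadata token counts."""
    rates = PRICE_PER_MILLION[model_name]
    return (prompt_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# Usage with a real response object:
#   cost = estimate_cost("gemini-2.0-flash",
#                        response.usage_metadata.prompt_token_count,
#                        response.usage_metadata.candidates_token_count)
```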
Step 5: Configure Generation Parameters
Control the model's behavior with generation configuration:
model = genai.GenerativeModel(
    "gemini-2.0-flash",
    generation_config=genai.GenerationConfig(
        temperature=0.2,         # Lower = more deterministic
        top_p=0.8,               # Nucleus sampling threshold
        top_k=40,                # Token selection pool size
        max_output_tokens=1024,  # Maximum response length
    ),
)
response = model.generate_content("Write a function to sort a list in Python.")
print(response.text)
For agent applications, a lower temperature (0.1-0.3) produces more reliable tool-calling behavior, while higher values (0.7-1.0) work better for creative content generation.
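One way to manage this split is a small preset table keyed by task type. The preset names and values below are illustrative choices, not SDK constants; the keyword names match GenerationConfig, and the SDK also accepts a generation_config argument per generate_content call, so you can switch presets per request:

```python
# Illustrative presets -- tune these for your own workload.
GENERATION_PRESETS = {
    "tool_calling": {"temperature": 0.2, "top_p": 0.8, "max_output_tokens": 1024},
    "creative":     {"temperature": 0.9, "top_p": 0.95, "max_output_tokens": 2048},
}

def preset_for(task):
    """Fall back to the conservative tool_calling preset for unknown tasks."""
    return GENERATION_PRESETS.get(task, GENERATION_PRESETS["tool_calling"])

# With the real SDK:
#   config = genai.GenerationConfig(**preset_for("creative"))
#   response = model.generate_content(prompt, generation_config=config)
```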
Step 6: System Instructions
System instructions set the agent's persona and behavioral guidelines. They persist across the entire conversation:
model = genai.GenerativeModel(
    "gemini-2.0-flash",
    system_instruction="You are a senior Python developer. Always provide complete, runnable code examples. Explain tradeoffs between different approaches.",
)
response = model.generate_content("How should I handle database connections in a FastAPI app?")
print(response.text)
System instructions are the foundation of every agent you build with Gemini. They define what the agent does, how it responds, and what constraints it operates under.
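They also carry across turns: the SDK's start_chat / send_message interface keeps the conversation history alongside the system instruction. A minimal sketch, with the helper name invented for this example (it works against any object exposing the same chat interface):

```python
def run_conversation(model, turns):
    """Send each user turn in order and collect the model's replies."""
    chat = model.start_chat()  # history persists across send_message calls
    return [chat.send_message(turn).text for turn in turns]

# With the real SDK:
#   model = genai.GenerativeModel("gemini-2.0-flash",
#                                 system_instruction="You are a concise tutor.")
#   replies = run_conversation(model, ["What is a decorator?", "Show an example."])
```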
Common Pitfalls
API key not found: Ensure the environment variable is set in the same shell session where you run Python. Use os.environ.get("GOOGLE_API_KEY") with a fallback for debugging.
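A defensive loader along those lines — fail fast with a clear message instead of a KeyError deep inside the SDK (the function name is just a suggestion):

```python
import os

def load_api_key(env_var="GOOGLE_API_KEY"):
    """Return the API key or raise a readable error if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it in the same shell that runs Python."
        )
    return key

# genai.configure(api_key=load_api_key())
```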
Rate limiting: The free tier allows 15 requests per minute for Gemini Pro. Implement exponential backoff for production agents.
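A generic exponential-backoff wrapper, sketched here with a bare Exception catch for brevity — in production you would catch the SDK's specific rate-limit exception instead:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry a zero-argument callable with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the original error
            # Delays of roughly 1s, 2s, 4s, ... plus random jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))

# Usage:
#   response = with_backoff(lambda: model.generate_content(prompt))
```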
Response blocked by safety filters: If response.text raises an error, check response.prompt_feedback to see which safety category triggered the block.
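Since response.text raises a ValueError in that case, a small accessor can turn the crash into a logged fallback. The helper name is invented for this example; it works on any object with the same attributes, which also makes it easy to unit-test with a stub:

```python
def safe_text(response):
    """Return the generated text, or None if the response was blocked."""
    try:
        return response.text
    except ValueError:
        # Blocked by safety filters; surface the feedback instead of crashing.
        print(f"Response blocked. Prompt feedback: {getattr(response, 'prompt_feedback', None)}")
        return None
```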
FAQ
What is the difference between Gemini Flash and Gemini Pro?
Gemini Flash is optimized for speed and cost — it responds faster and costs significantly less per token. Gemini Pro offers stronger reasoning, better instruction following, and higher accuracy on complex tasks. For most agent development, start with Flash and upgrade to Pro only for tasks where Flash falls short.
Is the Gemini API free to use?
Google AI Studio offers a free tier with rate limits (typically 15 requests per minute for Pro, 30 for Flash). This is sufficient for development and prototyping. For production workloads, you pay per million tokens through either AI Studio or Vertex AI.
Can I use Gemini with async Python code?
Yes. The SDK provides generate_content_async for use with asyncio. This is essential for building non-blocking agent systems that handle multiple requests concurrently.
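A concurrency sketch: fan several prompts out in parallel with asyncio.gather. Here `model` is any object exposing generate_content_async, as the real GenerativeModel does; the helper name is an invention of this example:

```python
import asyncio

async def ask_all(model, prompts):
    """Run several prompts concurrently and return their texts in order."""
    async def ask(prompt):
        response = await model.generate_content_async(prompt)
        return response.text
    return await asyncio.gather(*(ask(p) for p in prompts))

# With the real SDK:
#   results = asyncio.run(ask_all(model, ["Q1", "Q2", "Q3"]))
```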