A Beginner's Guide for Python AI Engineers: Lesson One

2026-06-14


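Learning Python is neither especially hard nor entirely trivial. What matters is getting the environment set up properly and building a solid foundation, so that writing code later goes smoothly. These notes lay out a complete path for learning Python from scratch, from installing the environment to writing a small tool.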

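Installing `Python`

Download the installer from the official Python website. After installing, add the install path to your system's PATH environment variable so that Python can be used from anywhere. The exact steps differ by operating system, so look them up for yours.

Mac users can also install it with brew in a single command: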

brew install python

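Pay attention to which Python version you get. The figure below shows the version information at a glance.

Once installation and configuration are done, run python3 --version in a terminal. If a version number is printed, the installation succeeded.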

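Version management with `pyenv`

What is pyenv for? In short, it manages and switches between Python versions. Why install it? As the figure above shows, there are many Python versions: some libraries are incompatible with older versions, and others do not support the newest ones. A version manager lets you switch freely between versions and use whichever one you need.

Installing pyenv is simple too: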

brew install pyenv

After installing, add the environment variables:

echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.zshrc
echo '[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.zshrc
echo 'eval "$(pyenv init - zsh)"' >> ~/.zshrc
# Restart the shell so the configuration takes effect
exec "$SHELL"

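Test it with pyenv -v. If a version number is printed, pyenv is installed.

Normally, when you install a Python version with pyenv, it does its best to fetch what the build needs. Occasionally the build still fails because some system dependencies are missing, so it is best to install them in advance: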

brew install openssl@3 readline sqlite3 xz tcl-tk@8 libb2 zstd zlib pkgconfig

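With the dependencies in place, you can install a specific version. Judging from the version information, 3.12 is probably the most widely used right now: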

pyenv install 3.12

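Virtual environments with `venv`

Python's venv creates an isolated "little room" in which you install the libraries you need. Nothing in it affects the global environment, which avoids version conflicts.

# Create a virtual environment
python3 -m venv .venv
# Activate the virtual environment
source .venv/bin/activate
# Leave the virtual environment
deactivate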
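
A quick way to check from inside Python whether a virtual environment is active, as a minimal sketch. The helper name `in_virtualenv` is my own, chosen for illustration; it relies on `sys.prefix` differing from `sys.base_prefix` when the interpreter runs inside a venv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

Run it once before and once after source .venv/bin/activate; the interpreter path should change to one under .venv.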

The package manager `pip`

pip is the default package manager that ships with Python. It installs, upgrades, and uninstalls packages:

pip install requests

Writing my `Hello World`

Create a Python project, add a new file named main.py, and put the following in it:

print("Hello World")

Run python3 main.py in a terminal and you will see the output in the console. The door to the programming world is now open.

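Basics review

Data types and variables

Variable names may contain only letters, digits, and underscores. They cannot start with a digit or contain special characters.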

name = "hboot"
age = 18
user_name = 'admin'

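The five core data types are Number, String, Boolean, List, and Dictionary. The number 0, the empty string '', the empty list [], and similar empty values are all treated as False in a boolean context.

# Note: the names list and dict shadow the built-in types; they are kept here for brevity.
list = []
list.append(18)
dict = {"name": "hboot", "age": 18}
dict["name"] = "hboost"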
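
To see the truthiness rule in action, here is a small sketch that passes a handful of values through `bool()`. The two lists are my own sample values, not part of the original notes.

```python
# Values that count as False in a boolean context ("falsy" values).
falsy_values = [0, 0.0, "", [], {}, (), None, False]
for value in falsy_values:
    print(repr(value), "->", bool(value))

# Non-zero numbers and non-empty containers count as True,
# even the string "0" and a list holding a single 0.
truthy_values = [1, -1, "0", [0], {"a": 1}]
for value in truthy_values:
    print(repr(value), "->", bool(value))
```

This is why `if items:` is the usual way to test whether a list is non-empty.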

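Functions

Declare a function with def. Parameters can have default values, but parameters with defaults must come after those without:

def get_name(name, age=18):
    return f"hello, {name}! You are {age} years old."

print(get_name(dict["name"]))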

*args collects any number of positional arguments into a tuple; **kwargs collects any number of keyword arguments into a dictionary.

def get_name(*args, **kwargs):
    print(f"hello, {args[0]}! You are {kwargs['age']} years old.")

get_name("hboot", age=18)

lambda creates an anonymous function. It suits short, one-off logic that does not need a named function:

get_name = lambda name, age=18: f"hello, {name}! You are {age} years old."
print(get_name(dict["name"]))

A decorator, applied with the @ symbol, takes a function as its argument and returns a new function. It lets you add behavior to a function without modifying its body:

def decorator(func):
    def wrapper(*args, **kwargs):
        print("Before function execution.")
        result = func(*args, **kwargs)
        print("After function execution.")
        return result

    return wrapper

@decorator
def func():
    print("Function execution.")
    return "Function result."

print(func())
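
One detail worth knowing: a plain wrapper hides the original function's name and docstring. The sketch below is my own illustrative example (the names `timed` and `add` are not from the original notes); it uses `functools.wraps` to preserve that metadata while timing the call.

```python
import functools
import time


def timed(func):
    """Print how long the wrapped function took to run."""
    @functools.wraps(func)  # keep the original __name__ and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.6f}s")
        return result
    return wrapper


@timed
def add(a, b):
    """Add two numbers."""
    return a + b


print(add(1, 2))
print(add.__name__)  # "add", not "wrapper", thanks to functools.wraps
```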

Classes (`Class`) and objects (`Object`)

A Class defines the skeleton; an Object is a concrete thing built from it:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        print(f"{self.name} is {self.age} years old.")

    def say_hello(self):
        print(f"Hello, my name is {self.name}.")
        print(f"I am {self.age} years old.")

person = Person("Alice", 25)
person.say_hello()

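__init__ is the class's constructor (strictly speaking, its initializer), called when an object is created. self refers to the current object and is used to access its attributes and methods.

Encapsulation, inheritance, polymorphism: a class encapsulates one kind of thing, and instantiating it produces a concrete object.

Inheritance is standing on the shoulders of giants: you inherit someone else's attributes and methods, then extend them:

class AdvancedPerson(Person):
    def __init__(self, name, age):
        super().__init__(name, age)
        self.skills = []

    def add_skill(self, skill):
        self.skills.append(skill)
        print(f"{self.name} has learned {skill}.")

advanced_person = AdvancedPerson("Bob", 30)
advanced_person.add_skill("Python")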

You can also override Python's built-in special methods to change default behavior:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __str__(self):
        return f"{self.name} is {self.age} years old."

    def __len__(self):
        return len(self.name)

person = Person("Alice", 25)
print(person)
print(len(person))

Conditionals and loops

Use if for conditional checks, and combine conditions with and, or, and not:

age = 18

if age < 18:
    print("You are a minor.")
elif age >= 18 and age < 65:
    print("You are an adult.")
else:
    print("You are a senior.")

for iterates over a sequence. range() generates a sequence of numbers:

fruits = ['apple', 'banana', 'orange']

for item in fruits:
    print(item)

for i in range(1, 6):
    print(i)

A while loop keeps running until its condition becomes false:

i = 1
while i <= 5:
    print(i)
    i += 1

Inside a loop, break exits the loop, and continue skips the rest of the current iteration and moves on to the next one.
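
A short sketch of break and continue working together. The function `first_even_squares` and its arguments are hypothetical, made up for this illustration.

```python
def first_even_squares(numbers, limit):
    """Collect squares of even numbers, stopping once a value exceeds limit."""
    squares = []
    for n in numbers:
        if n > limit:
            break      # leave the loop entirely
        if n % 2 != 0:
            continue   # skip odd numbers and go to the next iteration
        squares.append(n * n)
    return squares


print(first_even_squares(range(1, 20), 8))  # [4, 16, 36, 64]
```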

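For brevity, a list comprehension can fold a multi-line for/if into a single line, and it usually runs a little faster as well:

evens = []
# for / if statements
for i in range(1, 6):
    if i % 2 == 0:
        evens.append(i)

# list comprehension
evens = [x for x in range(1, 6) if x % 2 == 0]
print(evens)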

File operations: `I/O`

Use a with open(...) statement to read and write files:

with open('log.txt','w',encoding='utf-8') as f:
    f.write('Hello World')

A new file log.txt now appears in the current directory, containing Hello World.

To read a file, change the mode to r:

with open('log.txt','r',encoding='utf-8') as f:
    print(f.read())

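Reading everything at once is fine for small files, but with a large file it can exhaust memory. In that case, read one line at a time:

with open('log.txt','r',encoding='utf-8') as f:
    for line in f:
        print(line)          # the line still ends with its newline
        print(line.strip())  # strip() removes the newline and surrounding whitespace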

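Importing modules

Use import to bring in a module:

import math

print(math.sqrt(16))

Some library names are long, so you can give the imported module a shorter alias:

import math as m

print(m.sqrt(16))

If you need only one thing from a module, import it by name:

from math import sqrt

print(sqrt(16))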

Error handling

Use try...except...finally to catch errors:

import math

try:
    # print(1/0)
    # print(math.sqrt(-1))
    print(int('a'))
except ZeroDivisionError:
    print("Cannot divide by zero.")
except ValueError:
    print("Invalid value.")
except Exception:
    print("An error occurred.")
finally:
    print("Finally block executed.")

The key is to catch specific error types, so that you can give an appropriate error message.
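
As a small illustration of why specific exception types matter, the sketch below reports a different message for each failure. The function `safe_divide` is a hypothetical helper written for this example.

```python
def safe_divide(a: str, b: str):
    """Parse two strings as integers and divide them, reporting specific errors."""
    try:
        return int(a) / int(b)
    except ValueError as exc:
        # int() raises ValueError for text that is not a valid integer
        print(f"Invalid value: {exc}")
    except ZeroDivisionError:
        print("Cannot divide by zero.")
    return None


print(safe_divide("10", "2"))  # 5.0
print(safe_divide("a", "2"))   # reports the invalid value, then None
print(safe_divide("10", "0"))  # reports the division by zero, then None
```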

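Core knowledge, next level

Type annotations

You can annotate data types directly to make the code clearer:

name: str = "hboot"

def get_name(name: str) -> str:
    return f"hello, {name}!"

print(get_name(name))

Python does not enforce type checks at runtime. Type annotations mainly serve tooling (type checkers and editor hints) and make code easier to maintain.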
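
A minimal sketch showing that annotations really are not enforced: the call with the "wrong" type still runs. The function `greet` is my own example name.

```python
def greet(name: str) -> str:
    return f"hello, {name}!"


# The annotation says str, but Python does not check it at runtime.
print(greet(123))

# Annotations are stored as metadata that tools can inspect.
print(greet.__annotations__)
```

Catching the `greet(123)` mistake is the job of an external type checker or the editor, not the interpreter.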

Generators

To save memory, use a generator for lazy evaluation: values are produced one at a time, only when requested:

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

for i in fibonacci():
    if i > 100:
        break
    print(i)
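
To make the memory point concrete, here is a rough sketch that compares a list with a generator expression and takes a few values from an endless generator using `itertools.islice`. The generator `squares` is a made-up example, and the size comparison uses `sys.getsizeof`, which measures only the container object itself.

```python
import sys
from itertools import islice


def squares():
    """Yield square numbers forever, one at a time."""
    n = 0
    while True:
        yield n * n
        n += 1


# islice takes only the first five values from the endless generator.
first_five = list(islice(squares(), 5))
print(first_five)  # [0, 1, 4, 9, 16]

# A list holds every element in memory; a generator expression does not.
as_list = [n * n for n in range(100_000)]
as_gen = (n * n for n in range(100_000))
print(sys.getsizeof(as_list) > sys.getsizeof(as_gen))  # True
```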

Context managers

To make sure resources are closed and released properly, the @contextmanager decorator turns a function into a context manager for use with the with statement:

from contextlib import contextmanager

@contextmanager
def open_file(filename, mode):
    f = open(filename, mode)
    try:
        yield f
    finally:
        f.close()

with open_file('log.txt', 'r') as f:
    print(f.read())

Magic methods: `Pythonic`

The __init__ and __str__ methods shown earlier in the class section are two of many such magic methods. Defining them makes working with your objects more elegant.

Take the context manager covered above: by implementing __enter__ and __exit__, instances of a class can be used with the with statement:

class OpenFile:
    def __init__(self, filename, mode='r'):
        self.filename = filename
        self.mode = mode

    def __enter__(self):
        self.f = open(self.filename, self.mode)
        return self.f

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.f:
            self.f.close()

with OpenFile('log.txt') as f:
    print(f.read())

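Asynchronous programming

Much like the syntax used in front-end JavaScript, asynchronous code is written with the async and await keywords:

import asyncio

async def hello(name: str):
    print(f'Hello {name}!')
    await asyncio.sleep(1)
    print(f'Bye {name}!')

asyncio.run(hello('hboot'))
print("hello world")

asyncio.sleep() stands in for an asynchronous call here; awaiting it hands control back to the event loop. asyncio.run() runs the coroutine and blocks the current thread until it finishes, so "hello world" is printed last.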

To run several tasks concurrently, use asyncio.gather(); the event loop switches between them while they wait:

async def batch_hello():
    
    await asyncio.gather(
        hello('Alice'),
        hello('Bob'),
        hello('Charlie')
    )
    print('All done!')
    
asyncio.run(batch_hello())

A Semaphore limits how many tasks run concurrently:

import asyncio

async def hello(name: str, sem: asyncio.Semaphore):
    async with sem:
        print(f'Hello {name}!')
        await asyncio.sleep(1)
        print(f'Bye {name}!')

async def batch_hello():
    # limit concurrency to 3
    sem = asyncio.Semaphore(3)

    tasks = [hello(f'name{i}', sem) for i in range(10)]
    await asyncio.gather(*tasks)
    print('All done!')

asyncio.run(batch_hello())

Unlike asyncio.gather(), which waits for the whole batch to finish, asyncio.as_completed() lets you handle each result as soon as it is ready, without tasks waiting on one another.

The code below gives the tasks different run times. You can clearly see that whenever one task finishes, a new one starts immediately:

import asyncio
import random

async def hello(name: str, sem: asyncio.Semaphore):
    async with sem:
        print(f'Hello {name}!')
        await asyncio.sleep(random.randint(1, 5))
        print(f'Bye {name}!')

async def batch_hello():
    # limit concurrency to 3
    sem = asyncio.Semaphore(3)

    tasks = [hello(f'name{i}', sem) for i in range(10)]
    # await asyncio.gather(*tasks)

    for task in asyncio.as_completed(tasks):
        await task

    print('All done!')

asyncio.run(batch_hello())

If you need finer control, such as setting a timeout or cancelling tasks, use asyncio.wait.
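
A minimal sketch of asyncio.wait with a timeout, under my own assumptions: the coroutine `work`, the task names, and the durations are invented for illustration. Tasks that are still pending when the timeout expires are not cancelled automatically, so the sketch cancels them explicitly.

```python
import asyncio


async def work(name: str, seconds: float) -> str:
    await asyncio.sleep(seconds)
    return name


async def run_with_timeout(timeout: float):
    # asyncio.wait expects Task objects, so wrap each coroutine first.
    tasks = [
        asyncio.create_task(work("fast", 0.05)),
        asyncio.create_task(work("slow", 5)),
    ]
    done, pending = await asyncio.wait(tasks, timeout=timeout)

    # Cancel whatever did not finish in time.
    for task in pending:
        task.cancel()
    # Wait for the cancellations to complete before returning.
    await asyncio.gather(*pending, return_exceptions=True)

    return sorted(task.result() for task in done), len(pending)


finished, cancelled = asyncio.run(run_with_timeout(0.5))
print(finished, cancelled)  # ['fast'] 1
```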

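Commonly used standard library modules

The Python standard library offers a wealth of functionality. Here are some commonly used modules:

`json`

The json module provides JSON encoding and decoding:

import json

# serialize to str
print(json.dumps({"name": "hboot", "age": 18}))
# deserialize to dict
print(json.loads('{"name": "hboot", "age": 18}'))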

`os`

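The os module provides operating-system functionality:

import os

# create a directory
os.mkdir("test")
# remove an (empty) directory
os.rmdir("test")

# read an environment variable
print(os.environ.get("PATH"))

# run a system command
os.system("ls")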

`sys`

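The sys module provides functionality related to the Python interpreter:

import sys

# command-line arguments
print(sys.argv)
# interpreter version information
print(sys.version)
# interpreter implementation information
print(sys.implementation)
# platform the interpreter is running on
print(sys.platform)
# path of the interpreter executable
print(sys.executable)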

`pathlib`

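The pathlib module provides path objects for working with file and directory paths:

from pathlib import Path

# current directory
print(Path.cwd())
# parent of the current directory
print(Path.cwd().parent)
# check whether a path exists
print(Path('.').exists())
# join paths
print(Path('.') / 'test.py')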

`shutil`

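The shutil module provides functions for file and directory operations:

import os
import shutil
from pathlib import Path

# create a directory
os.mkdir('test_dir')
# create a file
Path('test_dir/test.py').touch()
# copy a directory tree
shutil.copytree('test_dir', 'test_dir_copy')
# move a directory
shutil.move('test_dir', 'test_dir_move')
# delete a directory tree
shutil.rmtree('test_dir_move')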

`logging`

The logging module records log messages:

import logging

logging.basicConfig(level=logging.DEBUG)
logging.debug('This is a debug message')
logging.info('This is an info message')
logging.warning('This is a warning message')
logging.error('This is an error message')
logging.critical('This is a critical message')

`argparse`

The argparse module parses command-line arguments:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--name', help='your name')
args = parser.parse_args()
print(f'Hello, {args.name}')

When running the script, append the --name argument, for example: python3 main.py --name hboot.
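
A slightly fuller sketch with a default value and a typed argument. The `--times` option and the `build_parser` helper are my own additions for illustration; passing a list to `parse_args()` makes the parser easy to try without a shell.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Greet someone.")
    # default is used when --name is not given on the command line
    parser.add_argument("--name", default="world", help="your name")
    # type=int converts the raw string; a bad value exits with a usage error
    parser.add_argument("--times", type=int, default=1, help="how many greetings")
    return parser


# Parse an explicit argument list instead of reading sys.argv.
args = build_parser().parse_args(["--name", "hboot", "--times", "2"])
for _ in range(args.times):
    print(f"Hello, {args.name}")
```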

`re`

The re module handles regular-expression matching:

import re

# match at the start of a string
print(re.match('hello', 'hello world'))
# extract email addresses
print(re.findall(r'[\w]+@[\w]+\.[\w]+', 'hello world, my email is '))
# extract phone numbers
print(re.findall(r'1[3-9]\d{9}', 'hello world, my phone number is 13812345678'))
# replace text
print(re.sub(r'[\d]+', '*', 'hello world, my phone number is 13812345678'))

`datetime`

The datetime module handles dates and times:

from datetime import datetime

# current time
print(datetime.now())
# build an object from a specific date
print(datetime(2020, 1, 1))
# build an object from a timestamp
print(datetime.fromtimestamp(1577836800))
# parse a string into a datetime object
print(datetime.strptime('2020-01-01', '%Y-%m-%d'))
# format a datetime as a string
print(datetime.now().strftime('%Y-%m-%d'))

`collections`

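collections provides enhanced container data types that go beyond the built-in ones. Counter counts how often each element occurs; defaultdict creates a dictionary whose missing keys get a default value produced by the factory function you pass in:

from collections import Counter, defaultdict

print(Counter([1, 1, 2, 3, 3, 3, 4, 4, 4, 4]))
print(Counter('hello world'))

print(defaultdict(lambda: 'N/A'))
print(defaultdict(list))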
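
A typical use of both, sketched with a made-up word list: grouping items with `defaultdict(list)` and finding the most frequent element with `Counter.most_common`.

```python
from collections import Counter, defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# defaultdict(list) creates an empty list the first time a key is accessed,
# so there is no need to check whether the key already exists.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
print(dict(by_letter))

# most_common(n) returns the n most frequent elements with their counts.
print(Counter("hello world").most_common(1))  # [('l', 3)]
```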

`random`

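The random module generates random numbers:

import random

# random float in [0, 1)
print(random.random())
# random integer in a range (both ends included)
print(random.randint(0, 10))
# pick one element at random
print(random.choice([1, 2, 3, 4, 5]))
# pick a given number of distinct elements
print(random.sample([1, 2, 3, 4, 5], 3))
# shuffle a list in place
numbers = [1, 2, 3, 4, 5]
random.shuffle(numbers)
print(numbers)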

`math`

The math module performs mathematical operations:

import math

# pi
print(math.pi)
# sine
print(math.sin(math.pi / 2))
# natural logarithm
print(math.log(math.e))
# logarithm with a given base
print(math.log(100, 10))
# exponential
print(math.exp(1))
# absolute value
print(math.fabs(-100))
# square root
print(math.sqrt(16))

`glob`

The glob module finds files that match a pattern:

import glob

# list all .py files in the current directory
print(glob.glob('*.py'))

`zipfile`

The zipfile module works with ZIP files:

import zipfile
from pathlib import Path

Path('test.py').touch()
Path('test.txt').touch()

# create a ZIP file
with zipfile.ZipFile('test.zip', 'w') as zip_file:
    zip_file.write('test.py')
    zip_file.write('test.txt')

print(zipfile.is_zipfile('test.zip'))

# read a ZIP file
with zipfile.ZipFile('test.zip', 'r') as zip_file:
    print(zip_file.namelist())

`itertools`

itertools is a set of efficient iteration tools for handling complex loops, permutations, and combinations:

from itertools import product, permutations

# Cartesian product
for i in product([1, 2], [3, 4]):
    print(i)

# permutations
for i in permutations([1, 2, 3], 2):
    print(i)
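
The difference between permutations and combinations is easy to mix up, so here is a small side-by-side sketch using `itertools.combinations` as well. The sample list is my own.

```python
from itertools import combinations, permutations

items = [1, 2, 3]

# Permutations: order matters, so (1, 2) and (2, 1) are both included.
print(list(permutations(items, 2)))

# Combinations: order does not matter, so each pair appears once.
print(list(combinations(items, 2)))  # [(1, 2), (1, 3), (2, 3)]
```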

`Jupyter lab`

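Jupyter lab is a development environment built on Jupyter Notebook. It offers syntax highlighting, autocompletion, unit testing, chart plotting, data visualization, and more.

# install
pip install jupyterlab
# start
jupyter lab

After it starts, an editing interface opens in the browser. Choose an environment and a .ipynb file is created; type code into it and run it.

shift + enter runs the current cell.

If you need to install a dependency, you can do so right in a cell with !pip install xxx.

To load a Python file from the current project, use %load xxx.py. To execute a file, use %run xxx.py.

Jupyter lab is arguably a superb tool for learning Python, much more convenient than writing and running scripts locally.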

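Building a CLI application

The requirement is clear. First, get the argument passed to the CLI:

import sys

# get the JSON file path from the command line
json_file = sys.argv[1]

Then read the specified JSON file:

import json

# read the file
with open(json_file, "r", encoding="utf-8") as f:
    data = json.load(f)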

Next, call a large language model. DeepSeek is used as the example here, following the Python sample on its official site:

import json
import os
from openai import OpenAI

api_key = os.getenv("API_KEY")
base_url = os.getenv("BASE_URL")
client = OpenAI(api_key=api_key, base_url=base_url)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant. Translate the text into English.",
        },
        # message content must be a string, so serialize the parsed JSON
        {"role": "user", "content": json.dumps(data, ensure_ascii=False)},
    ],
    stream=False,
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)

translated_text = response.choices[0].message.content
print(translated_text)

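Once you have the translation, write it to the file:

import json

# translated_text holds the model's reply (response.choices[0].message.content)
with open(json_file, "w", encoding="utf-8") as f:
    json.dump(
        {"original": data, "translated": translated_text},
        f,
        ensure_ascii=False,
        indent=4,
    )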

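Use dotenv (installed with pip install python-dotenv) to parse the .env file and obtain API_KEY and BASE_URL. Call load_dotenv() before reading the variables with os.getenv:

from dotenv import load_dotenv

# load environment variables from .env
load_dotenv()

Test run: python3 main.py test.json, then wait for the model to respond. When it finishes, test.json contains the translation result.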

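The code can be improved in the following ways:

Add error handling with try...except, for example for an invalid API key or network errors.
Validate the arguments up front, for example missing or malformed arguments.
Write through a temporary file so that file writes are safe.
Add an if __name__ == "__main__" guard so the module can be imported safely.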
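
The four points above could be sketched roughly as follows. This is one possible arrangement under my own assumptions, not the finished tool: `write_json_atomically`, `main`, and the `translate` parameter are hypothetical names, and `translate` is only a stand-in for the model call so the sketch runs without a network connection.

```python
import json
import os
import sys
import tempfile


def write_json_atomically(path, payload):
    """Write to a temporary file first, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(payload, f, ensure_ascii=False, indent=4)
        os.replace(tmp_path, path)  # replaces the target in one step
    except BaseException:
        # Remove the leftover temporary file if the write did not complete.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def main(argv, translate=lambda text: text):
    """Return an exit code: 0 on success, 1 on runtime errors, 2 on bad usage."""
    # Validate the arguments before doing any work.
    if len(argv) != 2:
        print("usage: python3 main.py FILE.json", file=sys.stderr)
        return 2
    json_file = argv[1]
    try:
        with open(json_file, "r", encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, json.JSONDecodeError) as exc:
        print(f"cannot read {json_file}: {exc}", file=sys.stderr)
        return 1
    try:
        # translate is a hypothetical stand-in for the model call.
        translated = translate(json.dumps(data, ensure_ascii=False))
    except Exception as exc:  # e.g. a bad API key or a network failure
        print(f"translation failed: {exc}", file=sys.stderr)
        return 1
    write_json_atomically(json_file, {"original": data, "translated": translated})
    return 0


if __name__ == "__main__":
    # Demo on a throwaway file; a real script would end with sys.exit(main(sys.argv)).
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "test.json")
        with open(demo, "w", encoding="utf-8") as f:
            json.dump({"text": "你好"}, f, ensure_ascii=False)
        print(main(["main.py", demo], translate=lambda text: text.upper()))
```

Returning an exit code from main instead of calling sys.exit inside it keeps the function easy to import and test.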
