A Python regular expression for extracting Chinese text - 445IT之家

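When scraping websites with Python, you will sometimes find pages that are bilingual, in Chinese and English, when you only need the Chinese. How do you write a regular expression for that? Here is the answer first: [\u3002\uff1b\uff0c\uff1a\u201c\u201d\uff08\uff09\u3001\uff1f\u300a\u300b\u4e00-\u9fa5]. The range \u4e00-\u9fa5 matches Chinese characters, and the other escapes match common Chinese punctuation marks.
You can run the example below first to see how it works.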
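The punctuation escapes in the character class above are hard to read at a glance. As a minimal sketch, the snippet below prints the code point, the glyph, and the Unicode name of each one. The name `PUNCT` is an assumption introduced here for illustration; `unicodedata` is part of the Python standard library.

```python
import unicodedata

# The twelve punctuation escapes from the character class (PUNCT is a hypothetical name).
PUNCT = "\u3002\uff1b\uff0c\uff1a\u201c\u201d\uff08\uff09\u3001\uff1f\u300a\u300b"

for ch in PUNCT:
    # Show the code point in U+XXXX form, the character itself, and its Unicode name.
    print(f"U+{ord(ch):04X}  {ch}  {unicodedata.name(ch)}")
```

If a page uses other marks, such as the fullwidth exclamation mark, their escapes would have to be added to the class in the same way.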
import re
s = '''I am ten years old now, I am studying at a primary school, and I am in grade four. There are many subjects for me to learn, among them, I like Chinese the most. Chinese is our country's language, it has more than five thousand years of history. I am so interested in Chinese culture, and learning Chinese well can help me understand Chinese culture better.
我现在十岁了，我在一所小学上学，我现在读四年级。我要学很多的科目，在这些科目当中，我最喜欢语文。汉语是我们国家的语言，有超过五千年的历史。我对中国的历史很感兴趣，学好语文能让我更好的了解中国历史。'''
# Collect every Chinese character or Chinese punctuation mark, one character per match
t = re.findall(r'[\u3002\uff1b\uff0c\uff1a\u201c\u201d\uff08\uff09\u3001\uff1f\u300a\u300b\u4e00-\u9fa5]', s)
# Join the single-character matches back into one string
print(''.join(t))
Running it prints only the Chinese paragraph, punctuation included; the English paragraph is dropped.
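The example above matches one character at a time and joins the results. A small variation, sketched here under the assumption that you want each contiguous stretch of Chinese as a separate item, is to add a `+` quantifier and precompile the pattern. The names `CHINESE_RUN`, `extract_chinese`, and `sample` are hypothetical and not part of the original post.

```python
import re

# Same character class as in the article, with "+" so each match is a whole run.
CHINESE_RUN = re.compile(
    r'[\u3002\uff1b\uff0c\uff1a\u201c\u201d\uff08\uff09\u3001\uff1f\u300a\u300b\u4e00-\u9fa5]+'
)

def extract_chinese(text):
    """Return the runs of Chinese characters and punctuation in text, in order."""
    return CHINESE_RUN.findall(text)

sample = "Python 爬虫 tutorial：提取中文，忽略 English。"
print(extract_chinese(sample))
# prints ['爬虫', '：提取中文，忽略', '。']
```

Note that \u4e00-\u9fa5 covers the commonly used characters, while the CJK Unified Ideographs block itself runs to \u9fff and rarer characters live in extension blocks, so the range may need widening for unusual text.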


Author: liuying
